Difference between revisions of "Replication Server Performance Tuning"

From SybaseWiki
Jump to: navigation, search
(Replicating System Procedures like sp_addlogin / sp_password)
 
(29 intermediate revisions by 7 users not shown)
Line 1: Line 1:
==Performance Tuning==
+
Here are some rules of thumb to tune a replication server (mainly Warm Standby setup). They may not always be suitable for your environment, but it can be used as a starting point for further improvements. The list is not complete.
  
Here are some rules of thumb to tune a replication server. They may not always be suitable for your environment, but it can be used as a starting point for further improvements.
+
==RepAgent==
 
+
===RepAgent===
+
 
Some basic settings need to be changed to get a better througput from the RepAgent to the RepServer. Change these setting with the sp_config_rep_agent stored procedure and do a restart of the RepAgent afterwards.
 
Some basic settings need to be changed to get a better througput from the RepAgent to the RepServer. Change these setting with the sp_config_rep_agent stored procedure and do a restart of the RepAgent afterwards.
 
* "scan batch size", set to "10000"
 
* "scan batch size", set to "10000"
Line 13: Line 11:
 
In the output of sp_help_rep_agent the markers are shown like this (example): (2954283,4). The first number is the page number and second one the record number on the page.
 
In the output of sp_help_rep_agent the markers are shown like this (example): (2954283,4). The first number is the page number and second one the record number on the page.
  
===SQM===
+
==SQM==
 
Do not spend a lot of time on this. Just make sure that the stable devices are on raw disk when your operating system supports it.
 
Do not spend a lot of time on this. Just make sure that the stable devices are on raw disk when your operating system supports it.
  
===SQT===
+
==STS==
The most important configuration parameter for the SQT is "sqt_max_cache_size". To check if this parameter has been set too low, run an "admin who,sqt" at regular intervals and check if the columns "removed" or "full" contain a non-zero value. Be aware that the output of "admin who,sqt" also shows the DSI threads. So when "removed" or "full" are non-zero for a DSI thread, change the memory setting at the connection-level using "alter connection to <dataserver.database> set dsi_sqt_max_cache_size to '<new-value>'". Otherwise change the "sqt_max_cache_size" configuration parameter using "rs_configure".
+
Increase the value of sts_cachesize so rows of the RSSD system tables can be cached in memory. A fairly good indication of the value of this configuration parameter can be determined by adding the rowcount of all system tables and use the total number at the value. Add something like 10% to allow for growth.
 +
configure replication server set sts_cachesize to '550'
 +
Cache the system tables itself by running these commands:
 +
configure replication server set sts_full_cache_rs_classes to 'on'
 +
configure replication server set sts_full_cache_rs_columns to 'on'
 +
configure replication server set sts_full_cache_rs_config to 'on'
 +
configure replication server set sts_full_cache_rs_databases to 'on'
 +
configure replication server set sts_full_cache_rs_datatype to 'on'
 +
configure replication server set sts_full_cache_rs_diskaffinity to 'on'
 +
configure replication server set sts_full_cache_rs_functions to 'on'
 +
configure replication server set sts_full_cache_rs_objects to 'on'
 +
configure replication server set sts_full_cache_rs_publications to 'on'
 +
configure replication server set sts_full_cache_rs_queues to 'on'
 +
configure replication server set sts_full_cache_rs_repdbs to 'on'
 +
configure replication server set sts_full_cache_rs_routes to 'on'
 +
configure replication server set sts_full_cache_rs_sites to 'on'
 +
configure replication server set sts_full_cache_rs_systext to 'on'
 +
configure replication server set sts_full_cache_rs_translation to 'on'
 +
configure replication server set sts_full_cache_rs_users to 'on'
 +
configure replication server set sts_full_cache_rs_version to 'on'
 +
In RepServer version before 15.0 you could technically also cache the rs_locater table. '''Do not''' cache this table. A crash of Repserver can then lead to inconsistencies.
  
===Replicating Stored Procedures===
+
Also, you should be aware that while caching these tables will improve performance of replication in general and will reduce CPU utilization by repserver and dataserver, it can impact operations that cause the table caches to be rebuilt.  Specifically, if rs_objects and rs_columns are cached, it will severely increase the time required to create, drop, or alter replication definitions.  Repserver's CPU usage will spike during this time as it is forced to rebuild its cache.
 +
 
 +
One user experienced this after executing the above commands to cache the tables on RepServer 12.5.  A support call to Sybase lked to un-caching rs_objects and rs_columns.  From a private communication with Sybase support:
 +
CR 500890 More efficient processing of changes to cached that are identified to sts_full_cache_<tablename>
 +
Workaround:It is not necessary to fully cache tables.  STS cache automatically does that for RS and is quite efficient.
 +
If there are frequent changes to RS like adding/dropping repdefs and subscriptions, it is best NOT to fully cache those tables,
 +
but to increase the sts_cachesize to hold more entries
 +
The user's experience was restricted purely to slow adding/dropping of repdefs, whereas subscriptions were added/dropped with normal speed. Sybase support further communicated the fact that this will not be changed in later versions.
 +
 
 +
==SQT==
 +
The most important configuration parameter for the SQT is "sqt_max_cache_size". To check if this parameter has been set too low, run an "admin who,sqt" at regular intervals and check if the columns "removed" or "full" contain a non-zero value. Be aware that the output of "admin who,sqt" also shows the DSI threads. So when "removed" or "full" are non-zero for a DSI thread, change the memory setting at the connection-level using "alter connection to <dataserver.database> set dsi_sqt_max_cache_size to '<new-value>'". Otherwise change the "sqt_max_cache_size" configuration parameter using "rs_configure" or (better!) "configure replication server".
 +
 
 +
It's best to start increasing "sqt_max_cache_size" before changing settings at the connection level.
 +
 
 +
For RepServer 12.6 do not increase "sqt_max_cache_size" to much. Oversizing the cache will in fact decrease performance. SQT cache is sufficiently sized if you do not observe, or observe only infrequently, transactions being flushed from cache because there is not enough room to store more in cache.
 +
 
 +
==DIST==
 +
For standard warm standby setups you can safely disable the distributor thread (DIST) to save some unneeded overhead within Replication Server. When you want to define subscriptions using the logical connection (for instance for Multi Site Availability) you will need the DIST thread so either leave it as is, or re-enable the distributor thread when it has been disabled.
 +
Disabling the DIST can be done with "alter logical connection to <logical-ds.logical-db> set distribution off".
 +
 
 +
==Replicating Stored Procedures==
 
Using stored procedure replication can greatly improve the performance of a replicated environment. Before using this feature check if the tables that are modified through the stored procedure are also marked for replication. For warm standby environments this is usually done with the sp_reptostandby stored procedure.
 
Using stored procedure replication can greatly improve the performance of a replicated environment. Before using this feature check if the tables that are modified through the stored procedure are also marked for replication. For warm standby environments this is usually done with the sp_reptostandby stored procedure.
 
Since data at the replicate site is now modified through a stored proc, some side effects can occur.
 
Since data at the replicate site is now modified through a stored proc, some side effects can occur.
 
* Tables using identity columns will have a different value in their identity column on the primary and replicate site.
 
* Tables using identity columns will have a different value in their identity column on the primary and replicate site.
 
* Inserted/Updated values from global variables (like @@spid), various system functions like getdate() will also differ between primary and replicate, causing a difference in data between sites.
 
* Inserted/Updated values from global variables (like @@spid), various system functions like getdate() will also differ between primary and replicate, causing a difference in data between sites.
 +
* You cannot use nested stored procedure replication (stored procedure a executes b, and a and b are both marked for replication). See http://www.sybase.com/detail?id=1039889 for recovery instructions when you got "RepAgent(n): Nested replicated stored procedure detected. Transaction log may be corrupt. Please contact SYBASE Technical Support.". It might be neccesary to run the recovery steps twice.
 
Check if this applies to your situation before using stored procedure replication.
 
Check if this applies to your situation before using stored procedure replication.
  
Line 29: Line 68:
 
There is no need to create anything else (like subscribtions or other stuff).
 
There is no need to create anything else (like subscribtions or other stuff).
  
==Replicating System Procedures like sp_addlogin / sp_password / etc.==
+
==Modifying the RSSD using sql scripts==
 
+
(from sybase.com): Every time you run a SQL script that modifies the Replication Server System Database (RSSD), you must shut down Replication Server before running the script, then restart it after running the script. This is because of changes to the way heterogeneous datatype information is cached.
This seems to be a classic problem at warm-standby environments. Mark A. Parsons found a solution for this, but here is a much easier one. The description given below is a kind of template based on the requirement to replicate the execution of sp_addlogin. You can easily modify it for other Sybase supplied system stored procedures.
+
* Setup a database (e.g. myDatabase) for warm standby as you normally do.
+
* Grant the neccesary roles to the maintenance user to succesfully execute system stored procedures (like sso_role for sp_addlogin)
+
* create a wrapper stored procedure like this template
+
  
use myDatabase
+
==Minimal Column Replication==
  go
+
With minimal column replication updates and deletes will be handled with greater efficiency by the RepServer. Before implementing this for a table check the following:
create procedure sp_addlogin @loginame varchar(30),
+
* the table has a unique index
                              @passwd  varchar(31)
+
* the columns in that index will never be changed by an update statement
as
+
Once you have verified this it's easy to implement minimal column replication. Here is a template:
-- trick the RepAgent
+
   create replication definition <repdef-name>
commit tran
+
  with primary at &lt;logical-connection&gt;.&lt;database&gt;
    
+
   with all tables named '&lt;table-name&gt;'
exec sybsystemprocs..sp_addlogin @loginame, @passwd
+
  (
    
+
    &lt;column-list-with-datatypes&gt;
-- Start a new transaction (only at the standby side)
+
  )
if  proc_role("replication_role") > 0
+
  primary key (&lt;column_list&gt;)
    begin tran
+
  replicate minimal columns
go
+
The replication definition is defined at the logical connection and so it will also be used after a "switch active".
  
* mark the new procedure for replication: sp_setrepproc sp_addlogin,"function"
+
==Configure for SMP systems==
 +
When replication server runs on a multiprocessor machine you can take advantage of the multi-threaded capability of RepServer. Run this command:
 +
configure replication server set smp_enable to 'on'
 +
A reboot is required to activate the setting.
  
That's all. Each execution of sp_addlogin in the myDatabase will create a login at the primary and the standby server.
 
  
Just a couple of remarks:
+
[[Category:RepServer]]
* The template given above is not complete with regard to the parameters that need to be passed to sybsystemprocs..sp_addlogin. It's just a template.
+
* even when a transaction fails on the primary side (for instance when the login already exists) it will be replicated to the standby side failing there as well. This will bring down the DSI connection. You can add some extra checks in the template to prevent common errors. To exit the procedure prematurely in those cases, do not give a "commit", but a "rollback".
+
* When you do not need to replicate table modifications in the myDatabase, do not mark the database with sp_reptostandby. This is to prevent unnecessary transactions flowing through the RepAgent to the Replication Server.
+
* This method has not been tested thorougly.
+
* Do not mark the sp_addlogin stored procedure in sybsystemprocs for replication. When you do this, RepAgent adds an implicit "begin transaction" before the execution of the stored procedure, causing it to fail.
+
* There is no need to change the Sybase supplied system stored procedures so this method is save with regard to upgrades.
+

Latest revision as of 22:01, 10 February 2010

Here are some rules of thumb to tune a replication server (mainly Warm Standby setup). They may not always be suitable for your environment, but it can be used as a starting point for further improvements. The list is not complete.

RepAgent

Some basic settings need to be changed to get a better througput from the RepAgent to the RepServer. Change these setting with the sp_config_rep_agent stored procedure and do a restart of the RepAgent afterwards.

  • "scan batch size", set to "10000"
  • "scan timeout", set to "5" (seconds) (not really a performance booster)
  • "send buffer size", set to "16k"
  • Check if "batch ltl" is set to "true".

With the sp_help_rep_agent stored procedure you can see if the RepAgent can keep up with the activity in the transaction log. Output of this procedure (with the "scan" option") shows the current and the end marker. The end marker is the last page/row number of the transaction log, and the current marker is the page/row number currently being scanned by the RepAgent. In the output of sp_help_rep_agent the markers are shown like this (example): (2954283,4). The first number is the page number and second one the record number on the page.

SQM

Do not spend a lot of time on this. Just make sure that the stable devices are on raw disk when your operating system supports it.

STS

Increase the value of sts_cachesize so rows of the RSSD system tables can be cached in memory. A fairly good indication of the value of this configuration parameter can be determined by adding the rowcount of all system tables and use the total number at the value. Add something like 10% to allow for growth.

configure replication server set sts_cachesize to '550'

Cache the system tables itself by running these commands:

configure replication server set sts_full_cache_rs_classes to 'on'
configure replication server set sts_full_cache_rs_columns to 'on'
configure replication server set sts_full_cache_rs_config to 'on'
configure replication server set sts_full_cache_rs_databases to 'on'
configure replication server set sts_full_cache_rs_datatype to 'on'
configure replication server set sts_full_cache_rs_diskaffinity to 'on'
configure replication server set sts_full_cache_rs_functions to 'on'
configure replication server set sts_full_cache_rs_objects to 'on'
configure replication server set sts_full_cache_rs_publications to 'on'
configure replication server set sts_full_cache_rs_queues to 'on'
configure replication server set sts_full_cache_rs_repdbs to 'on'
configure replication server set sts_full_cache_rs_routes to 'on'
configure replication server set sts_full_cache_rs_sites to 'on'
configure replication server set sts_full_cache_rs_systext to 'on'
configure replication server set sts_full_cache_rs_translation to 'on'
configure replication server set sts_full_cache_rs_users to 'on'
configure replication server set sts_full_cache_rs_version to 'on'

In RepServer version before 15.0 you could technically also cache the rs_locater table. Do not cache this table. A crash of Repserver can then lead to inconsistencies.

Also, you should be aware that while caching these tables will improve performance of replication in general and will reduce CPU utilization by repserver and dataserver, it can impact operations that cause the table caches to be rebuilt. Specifically, if rs_objects and rs_columns are cached, it will severely increase the time required to create, drop, or alter replication definitions. Repserver's CPU usage will spike during this time as it is forced to rebuild its cache.

One user experienced this after executing the above commands to cache the tables on RepServer 12.5. A support call to Sybase lked to un-caching rs_objects and rs_columns. From a private communication with Sybase support:

CR 500890 More efficient processing of changes to cached that are identified to sts_full_cache_<tablename>
Workaround:It is not necessary to fully cache tables.  STS cache automatically does that for RS and is quite efficient. 
If there are frequent changes to RS like adding/dropping repdefs and subscriptions, it is best NOT to fully cache those tables,
but to increase the sts_cachesize to hold more entries

The user's experience was restricted purely to slow adding/dropping of repdefs, whereas subscriptions were added/dropped with normal speed. Sybase support further communicated the fact that this will not be changed in later versions.

SQT

The most important configuration parameter for the SQT is "sqt_max_cache_size". To check if this parameter has been set too low, run an "admin who,sqt" at regular intervals and check if the columns "removed" or "full" contain a non-zero value. Be aware that the output of "admin who,sqt" also shows the DSI threads. So when "removed" or "full" are non-zero for a DSI thread, change the memory setting at the connection-level using "alter connection to <dataserver.database> set dsi_sqt_max_cache_size to '<new-value>'". Otherwise change the "sqt_max_cache_size" configuration parameter using "rs_configure" or (better!) "configure replication server".

It's best to start increasing "sqt_max_cache_size" before changing settings at the connection level.

For RepServer 12.6 do not increase "sqt_max_cache_size" to much. Oversizing the cache will in fact decrease performance. SQT cache is sufficiently sized if you do not observe, or observe only infrequently, transactions being flushed from cache because there is not enough room to store more in cache.

DIST

For standard warm standby setups you can safely disable the distributor thread (DIST) to save some unneeded overhead within Replication Server. When you want to define subscriptions using the logical connection (for instance for Multi Site Availability) you will need the DIST thread so either leave it as is, or re-enable the distributor thread when it has been disabled. Disabling the DIST can be done with "alter logical connection to <logical-ds.logical-db> set distribution off".

Replicating Stored Procedures

Using stored procedure replication can greatly improve the performance of a replicated environment. Before using this feature check if the tables that are modified through the stored procedure are also marked for replication. For warm standby environments this is usually done with the sp_reptostandby stored procedure. Since data at the replicate site is now modified through a stored proc, some side effects can occur.

  • Tables using identity columns will have a different value in their identity column on the primary and replicate site.
  • Inserted/Updated values from global variables (like @@spid), various system functions like getdate() will also differ between primary and replicate, causing a difference in data between sites.
  • You cannot use nested stored procedure replication (stored procedure a executes b, and a and b are both marked for replication). See http://www.sybase.com/detail?id=1039889 for recovery instructions when you got "RepAgent(n): Nested replicated stored procedure detected. Transaction log may be corrupt. Please contact SYBASE Technical Support.". It might be neccesary to run the recovery steps twice.

Check if this applies to your situation before using stored procedure replication.

Usually a stored procedure is marked for replication like this: sp_setrepproc "<procedure name>","function" There is no need to create anything else (like subscribtions or other stuff).

Modifying the RSSD using sql scripts

(from sybase.com): Every time you run a SQL script that modifies the Replication Server System Database (RSSD), you must shut down Replication Server before running the script, then restart it after running the script. This is because of changes to the way heterogeneous datatype information is cached.

Minimal Column Replication

With minimal column replication updates and deletes will be handled with greater efficiency by the RepServer. Before implementing this for a table check the following:

  • the table has a unique index
  • the columns in that index will never be changed by an update statement

Once you have verified this it's easy to implement minimal column replication. Here is a template:

 create replication definition <repdef-name>
 with primary at <logical-connection>.<database>
 with all tables named '<table-name>'
 (
   <column-list-with-datatypes>
 )
 primary key (<column_list>)
 replicate minimal columns

The replication definition is defined at the logical connection and so it will also be used after a "switch active".

Configure for SMP systems

When replication server runs on a multiprocessor machine you can take advantage of the multi-threaded capability of RepServer. Run this command:

configure replication server set smp_enable to 'on'

A reboot is required to activate the setting.