Replication Server 15.7.1 outbound queue segments accumulating

Former Member
0 Kudos

Hi

Architecture: two Sybase ASE 15.7 SP62 servers with Replication Server 15.7.1 SP304 on Solaris 10.

Four databases are replicated using MSA.

During business hours, the outbound queue of our biggest database accumulates up to a thousand segments. By the end of the day it catches up.

During peak hours, I launched an rs_ticket to follow 76 small inserts into a dummy table. In the result below, the problem seems to be between the DIST and the DSI:

HEADER  BD   PDB           EXEC          exec_b       DIST          DSI           RDB           NB SECONDS
------  ---  ------------  ------------  -----------  ------------  ------------  ------------  ----------
START   db1  16:23:24:970  16:23:24:946  11720904356  16:23:24:980  18:27:14:596  18:27:14:846  7429
END     db1  16:23:25:000  16:23:24:976  11720954638  16:23:25:163  18:27:14:830  18:27:14:886  7429
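
As a rough check of where the time goes, the hop-by-hop latency can be computed from the START row of the ticket above. This is a minimal Python sketch, not part of the original post: the helper names `parse_ts` and `stage_latencies` are made up for illustration, the stage order is assumed to be PDB, EXEC, DIST, DSI, RDB as in the header, and it assumes all timestamps fall on the same day (no midnight rollover).

```python
from datetime import timedelta

# Assumed stage order, taken from the column header of the ticket output.
STAGES = ["PDB", "EXEC", "DIST", "DSI", "RDB"]


def parse_ts(text):
    """Parse an 'hh:mm:ss:mmm' timestamp into a timedelta since midnight."""
    hours, minutes, seconds, millis = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes,
                     seconds=seconds, milliseconds=millis)


def stage_latencies(timestamps):
    """Return [(from_stage, to_stage, seconds)] for consecutive stages."""
    times = [parse_ts(timestamps[name]) for name in STAGES]
    return [
        (STAGES[i], STAGES[i + 1], (times[i + 1] - times[i]).total_seconds())
        for i in range(len(STAGES) - 1)
    ]


# START row of the ticket, values copied from the post.
start_row = {
    "PDB": "16:23:24:970",
    "EXEC": "16:23:24:946",
    "DIST": "16:23:24:980",
    "DSI": "18:27:14:596",
    "RDB": "18:27:14:846",
}

latencies = stage_latencies(start_row)
slowest = max(latencies, key=lambda item: item[2])

for src, dst, secs in latencies:
    print(f"{src:>4} -> {dst:<4} {secs:10.3f} s")
print("slowest hop:", slowest[0], "->", slowest[1])
```

With the values posted, nearly all of the delay falls on the DIST -> DSI hop (about 7429.6 s), in line with the NB SECONDS column. The small negative PDB -> EXEC value is probably clock skew between the hosts rather than a real effect.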

We tried changing many different SQT parameters on the DSI connections, but with no luck.

My next step would be to analyze the stats counters, but I am a newbie in this area.

Any ideas on how to proceed with this issue?

Thanks, Vince

Accepted Solutions (0)

Answers (4)

Former Member
0 Kudos

The problem is fixed. There is no delay.

To summarize, what helped is:

- Fewer transactions against the same "allpages"-locked table

- The threaded kernel seems to run queries more efficiently and to benefit from the statement cache.

Former Member
0 Kudos

Hi Mark,

To answer your previous question, there was no excessive cpu usage.

I did two things that seem to help a lot (no delay this morning):

1- Thanks to the MDA tables, I identified a huge number of logical I/Os against the same table (13 million rows). I reported this problem to the application manager. He disabled a flag in the application that was impacting this table and cleaned it from 13 million rows down to 3 rows (a big cleanup indeed!). It was generating so many DELETE statements against the standby ASE that replication could never catch up, so we dropped the subscription, flushed the queue, and recreated the subscription with dump marker.

2- I also switched the kernel mode from "process" to "threaded" and defined a syb_default_pool pool with 14 engines

Result: this morning there is no delay in replication. Let's see if that is still the case in the coming days.

Thanks a lot for your guidance.

Vincent

Former Member
0 Kudos

To keep you posted:

- no ZFS compression/encryption/snapshot ==> ok

- An rs_ticket run in a quiet period, compared with one in a heavy period, shows a slower DSI during peak time

- Interesting fact: on a quiet day (lower number of inbound segments) there is no delay (sorry for the new facts, as I just joined the company)

- I am going to install a script to collect the "big" queries from the MDA tables on the RDB

- I noticed some slowdowns on the network between the repserver and the RDB (pings going from half a second to 3 seconds from time to time, and ssh takes a long time to connect to the RDB UNIX server). Could this be an issue, or is it slow only when initiating the connection between the repserver and the RDB?
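
One way to put numbers on the connection-setup question is to time repeated TCP connects from the repserver host to the RDB listener. The following is only a sketch in Python under stated assumptions: `connect_times` and `summarize` are hypothetical helper names, and the host name and port in the usage note are placeholders, not values from this thread. It measures connection establishment only. Since a DSI normally keeps its connection to the RDB open, slow connects alone would not necessarily explain a queue backlog, whereas slow round trips on an established connection could.

```python
import socket
import statistics
import time


def connect_times(host, port, samples=5, timeout=5.0):
    """Time `samples` TCP connection setups to host:port, in seconds."""
    results = []
    for _ in range(samples):
        started = time.perf_counter()
        # create_connection raises OSError on failure or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            results.append(time.perf_counter() - started)
    return results


def summarize(times):
    """Return min/median/max of the measured connect times."""
    return {
        "min": min(times),
        "median": statistics.median(times),
        "max": max(times),
    }
```

Usage would look like `summarize(connect_times("rdb-host.example", 5000, samples=20))`, run once during a quiet period and once during peak hours so the two summaries can be compared.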

To summarize, my two troubleshooting paths are: determine the slow queries on the RDB and see whether the network could be an issue.

Thanks,

Former Member
0 Kudos

Hi Mark,

Thanks for your answer!

1- The delay is just for one database.

2- I am going to run the rs_ticket tonight when the replication is quiesced. We will then be able to compare.

3- On the RDB, the transactions are simple 'INSERT/DELETE/UPDATEs'. No stored procedure is called on RDB (only on PDB). No triggers and no computed column functions being fired.

4- On the RDB,

+ I ran sp_object_stats for 5 minutes against this problematic database to see whether there was any lock contention on any tables, but none was reported.

+ I looked at our sp_sysmon reports but could not find anything abnormal, though I don't consider myself an expert at reading sysmon reports either.

+ I looked at the statement cache hit ratio ==> 100%. I did not see any re-use with sp_monitorconfig.

Looking at the CPU/IO performance of replicated Xacts is a good idea. I could use the MDA tables to identify the "long"/IO/CPU-consuming Xacts, or even sp_sysmon with the right option. Any advice on how to capture the replicated Xacts that take a long time in the RDB?

Another point I wanted to add: the repserver is using stable devices stored in a ZFS filesystem. I wonder if this could affect performance.

Note: the repserver is using eRSSD (SQLAnywhere).

Thanks, Vincent