SAP Replication Server with two (correlated?) problems

We're replicating from one Primary ASE to three Replicate ASE's, using a MSA setup with two RS (PRS and RRS) on two sites. Every RS is "serving" two ASE's, and the setup is fully switchable, including a reverse of the replication flow. RS version is 15.7.1 SP102 and ASE version is 15.7.0 ESD#4.2.

I have noticed a slow increased amount of Memory Usage on both RS's, from a normal 3-5 % usage up to 30-40 % during a time of 2-3 weeks.

At the same time, I can see an increase on all three ASE's in the "number of open objects", displayed with sp_monitorconfig 'all'.

If I restart my connections or RS's, the Memory Usage drops down to a "normal" state on both RS, and the "number of open objects" also drops down to a "normal" state on all ASE's ! After this, it all starts ticking up again, slowly...

Did a bit of googlin' and found at least one known problem (the memory leak) https://launchpad.support.sap.com/#/notes/2221960/E that could be my type of memory problem, but I cannot find any reports having both of them or the "number of open objects" caused by RS...

The solution in the note I found is to upgrade RS (of course) and the workaround is to deactivate the RS option "dist_cmd_direct_replicate" at primary connection side or server-wide.

This parameter is however not described very well in the documentation, so I am curious about how it would affect any performance.

So my questions are;

  • Have anyone out there any experience from one or both of these problem in a similar MSA setup ?
  • If so, how did you fix the problem(s) ?
  • Have anyone any experience from changing the parameter "dist_cmd_direct_replicate", and the outcome from changing it ?

Please Advice

Thanks

/Mike

Add comment
10|10000 characters needed characters exceeded

  • Get RSS Feed

7 Answers

  • Best Answer
    Apr 12, 2017 at 01:00 PM

    Yeah, dbcc des should work (see this link - http://nntp-archive.sybase.com/nntp-archive/action/article/%3C473869a4@forums-1-dub%3E - for a discussion of the issue I mentioned earlier, plus notes on using dbcc des); alternatively you may be able to find the LWPs in monCachedProcedures (see Rob's comment at the end of that same link).

    -----------------

    I'd suggest you set dynamic_sql back to its default value of off, restart the repserver, then see if RS memory usage remain low, and also verify the ASE's number of open objects also remains low.

    With dynamic_sql = on, SRS will submit queries to the RDB via prepared statements; this is a performance-related setting which has a similar effect as enabling statement cache in the RDS (ie, dynamic_sql = on = queries are compiled once into LWPs and then re-used for subsequent query submissions). By setting dynamic_sql = off, you're telling SRS to *not* use prepared statements (which I'm guessing is the cause of your current issues - excessive memory usage and higher number of open objects counts).

    ***NOTE*** Make sure the replicate ASEs are configured to use a statement cache (literal autoparam = 1 ; statement cache size = XXXX), otherwise you'll likely see a sizeable performance degradation in the replicate ASEs due to the DSI connections having to undergo an expensive compilation phase for *every* query submitted to the replicate ASE.

    Add comment
    10|10000 characters needed characters exceeded

  • Apr 12, 2017 at 12:08 PM

    Have you always had this problem or did it just start recently? If the problem just started recently, have there been any recent changes to the repservers (eg, any config changes)?

    Are you seeing this problem with 1, 2 or all 3 ASE's?

    Have you tracked down the type of objects that are causing an increase in the ASE's number of open objects? (Are they by any chance lightweight procedures (LWPs) - aka temporary procs; in the past these showed up as procedures with system-generated names formatted like *ss...ss*)

    --------------

    Obviously (?) a memory leak in RS shouldn't cause an increase in the number of open objects in the target ASE.

    --------------

    What you've described (especially with regards to ASE) sounds like an issue I've seen when a client application fails to re-use (or clean up/delete stale) prepared statements => client memory usage goes up, ASE's number of open objects (related to an excessive volume of LWPs/temp-procs) goes up, and a restart of the client application 'fixes' the issue. (In your case the RS would be the client application).

    NOTE: Prepared statements are associated with an active database connection; a restart of the database connection (eg, bouncing RS; bouncing DSI) will cause all prepared statements (and associated LWPs) to be dropped, which in turn should lead to a drop in memory usage and number of open objects.

    I'd be curious to see what settings you have for the following configs (used by RS to manage an inventory of prepared statements submitted to the ASE) in both repservers:

    dynamic_sql

    dynamic_sql_cache_management

    dynamic_sql_cache_size

    Add comment
    10|10000 characters needed characters exceeded

  • Apr 12, 2017 at 12:34 PM

    Hi Mark,

    I actually do not know if we have had this problem before, as it is now, but I do know that we had one occasion a month ago on our Primary ASE when "Number of open objects" did ran out, then I could clearly see in the process list that a LOT of processes got into an "EXEC" state. But then it was a new application that did not reconnect properly.

    This time it is happening on all the three Replicate ASE's, and they have all got Standby licenses (SF), so they do not have any active connections at all.

    We have changed two things in January; I added a second license to both RS's, so they now run on two engines each and we moved the script that runs rs_ticket away from our Primary ASE now to be run from en external server instead of locally.

    Yes, the RS reboot and/or connection restart makes the number of open objects to drop !

    We are seeing the problem on all three Replicate ASE's.

    I have not tracked down the objects yet, is it "dbcc des" I should use and write the output to a file for analyzing it ?

    Both my RS's are configured equally, and the config is;

    1> admin config, 'dynamic_sql'
    2> go
    Configuration Config Value Run Value Default Value Legal Values Datatype Status

    ----------------------------------------------------------------------------------------------------------------------------------------------
    dynamic_sql on on off list: on,off string Connection/route restart required
    dynamic_sql_cache_management mru mru fixed list: mru,fixed string Connection/route restart required
    dynamic_sql_cache_size 2048 2048 100 range: 1,65535 integer Connection/route restart required

    (3 rows affected)
    ===============================================================================

    Cheers

    Mike

    Add comment
    10|10000 characters needed characters exceeded

  • Apr 12, 2017 at 01:32 PM

    These are the types of objects that are increasing according to the output from dbcc des;

    objname=*sq2136322818_0055194197ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2136334493_0101857914ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2136477912_0610685807ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2136776425_1820207394ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2136894352_1820207394ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2136895423_1820207394ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2137011208_1500200008ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2137062192_1820207394ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2137376664_1820207394ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2137494591_0679312931ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2137506266_1820207394ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2137546646_0169304417ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2137806921_0608524408ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2137869580_0590626820ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2137870651_1617653249ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2138170235_1820207394ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2138210615_0273550946ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2138233965_1225473354ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2138430510_1820207394ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2138431581_1820207394ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2138534620_1969750368ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2138651476_1820207394ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2138690785_1820207394ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2138808712_1820207394ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)
    objname=*sq2138951060_1820207394ss* dol_rowno=0 dol_vattcnt=0 dol_status: 0x00: (0x0000)

    Currently there are about >15.000 of them and still increasing...

    I do not recall why we turned on the "dynamic_sql", since we have had it that way more or less for years, but from my setup script there is a note saying that is should be related to the "dsi_bulk_copy" parameters ?

    And for your information, the "literal autoparam" is set to 1 already and the "statement cache" = 4096 !

    I will go ahead and try the change in our test environment

    Thanks for your suggestions

    Cheers

    Mike

    Add comment
    10|10000 characters needed characters exceeded

  • Apr 12, 2017 at 01:41 PM

    Yeah, those look like LWPs, and the increasing numbers of them sounds about right when either a) the LWPs are not being re-used (a bug in SRS?) or b) there are a huge number of unique queries (ie, SRS not able to re-use prepared statements).

    -------------

    *shrug* I'm not sure what (if any) relationship there is between dynamic_sql (use of prepared statements and the creation of related LWPs) and dsi_bulk_copy (uses the bulk copy libraries - think bcp - for large batches of insert statements)... will have to sleep on that one ... *shrug*

    -------------

    Your (ASE) statement cache settings look good. I'd keep an eye on sp_sysmon results; just after the data cache section is section for stored procs; if you see a large number of proc drops/creates then you may want to increase statement cache size in order to reduce the volume of proc compilations. ("Duh, Mark!" ?)

    Add comment
    10|10000 characters needed characters exceeded

  • Apr 13, 2017 at 02:48 PM

    On the off chance your repservers really are generating 15K+ different/unique queries, you'll want to keep an eye on the ASE statement and procedure caches as well as number of open objects.

    re: statement cache - you'll basically be moving 15K+ prepared statements from the client into ASE's statement cache; as mentioned earlier, use sp_sysmon to keep an eye on statement cache thrashing

    re: procedure cache - you'll basically be replacing 15K+ LWPs (related to prepared statements) with 15K+ LWPs (related to entries in statement cache); net result is that proc cache usage would likely remain similar

    re: number of open objects - I'm not sure if entries in statement cache count against number of open objects, but the LWPs definitely will; net result is that you could still see number of open objects jump up

    Again, just something to keep a look out for if you disable the repservers' use of dynamic sql (aka prepared statements).

    --------------

    Also, if you do end up with 15K+ different/unique queries, I'd suggest tracking down said queries and try to figure out if the volumes make sense ... but that gets into a whole 'nother discussion if you do find repserver is submitting 15K+ different/unique queries ...

    Add comment
    10|10000 characters needed characters exceeded

  • Nov 27, 2017 at 01:50 PM

    Mark, I have totally forgotten to update this thread with the findings and my result after the changes in our RS's !

    By turning off the "dynamic_sql" globally in our Replication Servers, both our problems has now disappeared;

    - Replication Server memory usage is back to normal

    - Number of open objects is no longer increasing up to the limits anymore

    Problems solved

    Thanks

    Cheers /Mike

    Add comment
    10|10000 characters needed characters exceeded

    • Good to hear you fixed the symptom but curious ... did tech support ever come back with an explanation? [Would've been nice if they had found the root cause and/or confirmed the issue is addressed in a new RS version.]

Skip to Content