
Anyway to make transactions not complete until log records make it to the repserver?

Jul 31, 2017 at 02:37 AM



(note, I'm using the word "primary" here for what Sybase typically calls the "active" database)

We occasionally want to do a "switch active" for a routine hardware failure or OS problem (tell the repserver to use what *was* the warm standby db as the new primary db)

But with the typical replication setup, transactions still sitting in the old primary database's transaction log can be lost when we switch active while the old primary server is down.

Is there any way to set things on a primary ASE database so that transactions in that database don't commit until they've finished replicating into the repserver?

I understand the tradeoffs here: a problem in the repserver could block access to the primary db (or, at least, slow it down). And this isn't guaranteeing the data made it to the replicate dbs, just to the repserver. But I think it's a feature that would be useful in certain environments.

Thanks in advance
Ben Slade


3 Answers

Mark A Parsons Aug 01, 2017 at 02:28 PM

If you've got the money, can wade through the convoluted setup, and don't mind the synchronous 2-phase commit ... you could implement the ASE/SRS HA/DR solution (aka synchronous rep; aka streaming rep). 'course, that's going a bit further than what you want (ie, you have to wait for the txn to complete on the secondary database before it's marked 'complete' in the primary database).


The next idea would be to beef up the repagent/SRS connectivity and throughput by tweaking the repagent and SRS configs. Repagent configs would include 'priority' (set to 0?), 'retry timeout' (set to 1?), 'scan timeout' (set to 1?). SRS configs ... you'll likely need to opt for the SRS/ASO option to allow for use of some of the newer performance-related configs. And of course, work with the hardware/network guys to make sure you've got the shortest/fastest network hop from repagent to repserver.
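For illustration, the repagent side of that tuning might look roughly like the following (a hedged sketch: `pdb` is a placeholder database name, and the specific values are just the ones floated above; check `sp_config_rep_agent` in the ASE docs for your version before committing to them):

```sql
-- Hypothetical repagent tuning, run in the primary ASE. 'pdb' is a placeholder.

-- Lower 'scan timeout' so the repagent re-scans the txn log sooner when idle.
exec sp_config_rep_agent pdb, 'scan timeout', '1'

-- Shorten the wait before the repagent retries a failed repserver connection.
exec sp_config_rep_agent pdb, 'retry timeout', '1'

-- 'priority' (0 = highest) affects how the repagent task is scheduled.
exec sp_config_rep_agent pdb, 'priority', '0'

-- Review the current repagent settings for the database.
exec sp_config_rep_agent pdb
```

Whether shaving the timeouts to 1 second actually helps depends on how bursty the log traffic is; the point is only to keep the repagent draining the log as aggressively as possible.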


Of course, none of the above really matters if you've got a batch process spitting out several 10's of millions of DML commands in a short period of time, with those more-important txns getting stranded in the pdb after you perform that unplanned switch.

In this scenario you may have to live with the fact that you're going to lose some transactions in the event of an unplanned switch (active).

However, what you might want to consider is whether or not you can break your transactions into categories, eg, must-get-replicated-at-all-cost, would-really-like-to-make-sure-these-get-replicated-but-not-the-end-of-the-world-if-they-dont, and these-can-be-easily-resumed-in-the-new-primary-db. If you can do this then you might want to look at multipath replication, with the main objective being to route the 'high priority' transactions over their own path (so they're not held back/up by huge-txn-count maintenance and batch operations).

'course, multipath rep also requires the SRS/ASO option, but it's a lot easier to setup and doesn't suffer from the performance overhead of synchronous 2-phase commits (aka, ASE/SRS HA/DR always-on, synchronous rep).
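On the ASE side, a multipath setup might be sketched roughly like this (hedged: the path name, repserver name, credentials, and table are all placeholders, and the exact `sp_replication_path` syntax should be verified against the docs for your ASE/SRS versions):

```sql
-- In the primary ASE: define an additional replication path to the repserver
-- (path name, server name, and rs login are placeholders).
exec sp_replication_path pdb, 'add', 'hi_pri_path', 'MY_REPSERVER', 'rs_user', 'rs_password'

-- Bind the high-priority table(s) to the dedicated path so their txns
-- aren't queued behind huge batch/maintenance traffic on the default path.
exec sp_replication_path pdb, 'bind', 'table', 'dbo.important_txns', 'hi_pri_path'

-- Review the configured paths and bindings.
exec sp_replication_path pdb, 'list'
```

Each path gets its own repagent sender and its own queues in the repserver, which is what keeps the high-priority stream from stalling behind the bulk stream.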


Yeah, the ASE/SRS HA/DR solution is too heavy/expensive for what I'm looking for. I just want synchronous from the primary ASE server to the first point in the repserver where it's safely stored (but not necessarily made it through all steps in repserver)

My understanding is that, someday, the rep agents will start using a new communications protocol related to the SRS HA/DR implementation, and that new protocol would be used even if you're not using SRS HA/DR. Maybe at that point, some new rep agent options will show up.

Anyway, thanks for the detailed writeup.

Avinash Kothare Jul 31, 2017 at 07:10 PM

SAP/Sybase Warm Standby replication is an asynchronous mechanism.

So active (primary) and standby (replicate) are loosely connected via replication server in between.

As such there is always a chance of losing a few transactions on the active side if the switch is abrupt.

To minimize transaction loss in a controlled switch you can:

- first cut off all user connections on the active side (no new DML coming in),

- use your own mechanism (or rs_ticket) to send a last marker transaction across (confirming all txns have gone from active to standby),

- verify that the last txn arrived (standby matches active),

and then do the active/standby switch.
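The rs_ticket step above might look like this (a sketch; the marker string is arbitrary, and `rs_ticket_history` must exist in the replicate database, which the SRS scripts normally set up):

```sql
-- On the ACTIVE database, after quiescing user connections,
-- drop a marker into the replication stream:
exec rs_ticket 'switch_marker'

-- On the STANDBY database, poll until the marker lands:
select h1, rdb_t
from   rs_ticket_history
where  h1 = 'switch_marker'
```

Once the row appears in `rs_ticket_history`, everything logged before the ticket has drained through to the standby and the switch can proceed.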




Re: To minimize transaction loss in a controlled switch you can...

Yes, I understand, but my question was about an uncontrolled switch.

Ben Slade Aug 23, 2017 at 06:16 PM

Answering my own question....

Could you:

  • open a transaction in a primary db.
  • do a million DML commands (or 1 DML command that affects a million rows) in that transaction
  • hold the transaction open
  • on a separate connection, run rs_ticket in that primary database
  • once the rs_ticket command makes it to the replicate database (shows up in rs_ticket_history), commit the transaction in the first connection?

I ran a quick test and it looks like the rs_ticket command causes the open transaction log entries in the first connection to get written to the repserver, even though they don't get applied to the replicate db until the commit transaction command gets replicated.
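The sequence above can be sketched as follows (hedged: the table, transaction name, and ticket string are made up for illustration; this relies on the observed-but-undocumented behavior that the repagent forwards open-txn log records ahead of the commit):

```sql
-- Connection 1 (primary db): open a large txn but do NOT commit yet.
begin tran big_load
update orders set status = 'archived' where created < '2016-01-01'  -- ~1M rows
-- ...leave the transaction open...

-- Connection 2 (same primary db): send a ticket; its log records land
-- BEHIND the open txn's records, so it can't reach SRS before they do.
exec rs_ticket 'gate_1'

-- Poll the REPLICATE db until the ticket arrives:
select h1 from rs_ticket_history where h1 = 'gate_1'

-- Once it shows up, the open txn's log records have reached the repserver,
-- so connection 1 can now issue:  commit tran big_load
```

In effect this is a hand-rolled "commit gate": the ticket's arrival proves the preceding log records made it to the repserver's stable queue, which is exactly the guarantee asked about in the original question (with the same caveat that nothing has been applied to the replicate db yet).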


The repagent tends to forward all txns to the repserver as it reads the pages from the log, though I've seen some instances where a small open txn (eg, a single row insert/update/delete) doesn't get forwarded until some other activity gets written to the txn log. [I'm guessing it has more to do with whether or not a new page is allocated to the txn log thus allowing the repagent to flush/complete the current page and move to the new page.]

NOTE: A possible exception might exist if a txn's begin tran and rollback tran records reside on the same txn log page, with the idea being that the repagent is smart enough not to forward this particular txn to the repserver; this is just a guess on my behalf ... would have to get someone with repagent internals knowledge to confirm/deny.

In your test if you perform that 1mil row txn (but don't commit it; and don't send the rs_ticket) and wait a little bit (10-30secs), you should be able to see the 1mil row txn in the repserver via sysadmin dump_queue.
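For reference, inspecting the inbound queue might look something like this (hedged: the queue number/type are placeholders you'd read from `admin who, sqm`, and the seg/blk/cnt conventions should be checked in the SRS reference manual):

```sql
-- In the repserver (isql into SRS): find the queue number and type
-- for the primary connection's inbound queue (q_type 1 = inbound).
admin who, sqm

-- Dump that queue's contents; 103 and 1 are placeholder values taken
-- from the admin who, sqm output, and the -1/-1/-2 seg/blk/cnt values
-- should be confirmed against the sysadmin dump_queue documentation.
sysadmin dump_queue, 103, 1, -1, -1, -2
```

If the open 1mil row txn's records are in the stable queue, they'll show up in the dump even though nothing has been applied to the rdb yet.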

Since the default behavior of the repserver is to maintain the order of txns based on their *commit* time, repserver will hold onto the open 1mil row txn while allowing the rs_ticket txn (considered to be a committed txn) to continue on its merry way to the rdb.

'course, there's no real magic going on here ... the txn log is processed as a (fifo) queue so the 1mil row txn will be processed first by the repagent, followed by the rs_ticket; if, during the processing of that 1mil row txn, something goes awry (ASE shuts down; SRS shuts down), then the 1mil row txn *and* the rs_ticket will be stranded in the pdb. ("Duh, Mark!" ?)


Agree with Mark.

RepAgent just sends the transaction log records (committed or not) to the stable queue in chronological order, to optimize data transfer.

When it comes time to apply these on the replicate side, only commit order is honored.

So rs_ticket, or any other single-row committed insert/update in a replicated user table, will be applied on the replicate side before the "big open transaction" that is still waiting to commit/rollback.

You could write a tandem transaction, where rs_ticket (or an equivalent single-row committed insert/update) reaches the replicate first; after checking for it on the replicate, you commit your application transaction on the primary.

Just note this is still asynchronous operation and subject to replication latency.

We use a similar mechanism for switching our DSS reporting application: it shifts connections from REPLICATE to PRIMARY based on a time-stamped row that is replicated every minute and arrives with reasonable latency.
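That heartbeat pattern might be sketched like this (hedged: the table, column, and job scheduling are all hypothetical; any replicated single-row table updated on a timer works the same way):

```sql
-- Hypothetical heartbeat table, marked for replication, seeded with one row.
create table rep_heartbeat (beat_time datetime not null)
insert rep_heartbeat values (getdate())

-- On the PRIMARY, a scheduled job runs every minute:
update rep_heartbeat set beat_time = getdate()

-- On the REPLICATE, the application checks latency before trusting the data:
select datediff(ss, beat_time, getdate()) as latency_secs
from   rep_heartbeat
```

If `latency_secs` stays within an acceptable bound, the replicate is considered current enough to serve the reporting workload; if it blows out, the application falls back to (or stays on) the primary.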