cancel
Showing results for 
Search instead for 
Did you mean: 

Checking consistency of a transaction executed on a DSI connection.

sladebe
Active Participant
0 Kudos

Where I work, we've had a long running problem where high volume replication of large varbinary columns (~16k) cause the repserver to occasionally crash. When the repserver crashes we've noted occasional lost varbinary rows of data within a successfully applied transaction (to be exact, it's a lost replication command within the transaction. But typically, it's a lost insert for us). Over the years, we've opened several tickets (er, incidents) with SAP about this, but haven't been able to track down the cause.

The problem is exacerbated by the fact that the 64 bit version of the repserver for Linux seems to have stacktraces disabled (CRs 811377, 341442), so it's very difficult to identify the code causing the problem.

FYI, the crash/data loss problem seems to be load related. We've worked around this problem (mostly) by getting rid of routes, turning off SMP threading, and splitting up replication into separate repservers (for different databases) so there's less load for each repserver. We also run data check scripts looking for missing rows.

Note, we've also seen similar "missing insert within a successful transaction" data loss when doing a normal shutdown of a repserver, while replication was active. So now we do a 'suspend log transfer from all" and let the repserver drain before shutting down a repserver.

My question is, is there anyway to have the repserver check the consistency of a transaction (ie., a group of inserts/deletes/updates within a begin transaction/commit transaction set of commands). For example, a check on the number of RCL commands applied for the transaction.

Note, setting dsi_row_count_validation doesn't help, since the insert command is never run.

Thanks in advance.
Ben

Accepted Solutions (0)

Answers (2)

Answers (2)

sladebe
Active Participant
0 Kudos

FYI, there is now a Knowledge Base Article (KBA), number 2681937, that talks about the missing repserver stracktrace problem described above. Basically, it's fixed in RS 15.7.1 SP305 or above (CR 794854)

bonusbrevis
Participant
0 Kudos

Have you tried SQL Statement Replication?

Repagent sends both the DML statement and exec rs_sqldml @rs_rowcount= to the logical inbound queue. DSI checks whether affected row count is matched, if row count mismatches, RS returns error 5186 and DSI is suspended by default. You can then handle the error accordingly. This assumes the problem is in the generation of the RS commands. If what fails is the application to the target, then it probably won't help. It's not really a check per se, but it may help in your situation. I don't have large varbinary columns in our setup, our larger fields are of the TEXT datatype instead, so I have not seen the behavior you mention.

My other thought is that maybe the Data Assurance Option may have a feature that would help you with this, but I have no experience with it.

sladebe
Active Participant
0 Kudos

Thanks for the suggestion. My understanding is that SQL Statement Replication, although it can be very efficient in some situations, won't always guarantee consistency on the replicate, depending on the type of activity. Is this a correct?

bonusbrevis
Participant
0 Kudos

I have never seen any instance where the rowcounts go out of sync using Statement Replication in our implementaiton of it.. What I have seen is that at times the execution of the sqldml fails and suspends the DSI because of two other issues that crop up on our implementation now and then - 'set dsi_quoted_identifier' and the character set switch we did from iso-1 to utf-8 a couple of years ago. Seems that at times the sqldml fails and it appears as a syntax error in the RS log and a suspended DSI. We just disable statement replication until the offending transactions pass and resume it again. This happens perhaps four times a year, so it has not been worth investigating.