I am replicating data from sourcedb1 (syb ASE AIX 1254 2k pages) to targetdb1 (syb ASE linux 1254 2k pages) and recently we added targetdb2 (syb ASE Linux 157 8k pages)
targetdb1 only contains a subset of table and targetdb2 is a warm standby database.
When I execute larger transactions, there are executed fast on both target servers, no issues.
But the inbound queue of my LDS keeps this transaction long time and is using disk space. My monitoring script assumes that there is a large backlog. 150000 record inserts arrives in seconds on both target servers, but takes over 15 minutes to be removed to be removed from the inbound LDS queue.
How long does replication server stores this information in the inbound stable queue? And is there a way to control this?
Best regards,
Toon Eysakkers
Hi,
Data in the inbound queue, regardless of whether it is warm standby or non-warm standby, stays in the queue until the oldest transaction in the queue is successfully applied at the target (generally) and the "save_interval" is met.
The more technical answer also includes ...and when the SQT & SQM threads are given time to do this cleanup work.
There is another question here. When you added the warm standby, did you mark the tables with sp_reptostandby ? If so, transactions for all tables (and optionally DDL and sp* procedures) are now getting into the inbound queue, whereas before, they may have not. Therefore, legitimately, you might actually have more data in the stable device before RS applies the commit to the standby database, which makes the segments eligible for release of the stable device space.
The inbound queue is NOT in transaction order. Therefore, a transaction that has a lot of other transaction work recorded during that transaction, will not release the stable device space until the first transaction has been committed (and this is relative). You will be able to see this by doing an admin who, sqt as the display shows you where the oldest open transaction starts, from the SQT point of view.
If this only started happening after you added the warm standby, and you had always had the database marked with sp_reptostandby, make sure that after initialization of the warm standby, you have resumed the ACTIVE DSI as well as the standby DSI.
There is also the possibility that your RS is now so busy, that the SQT and SQM don't check often enough to see if there are segments that are no longer needed in the stable device.
I'm assuming that you have already taken a look at the RS log and see no warning or error messages.
I would suggest opening up an incident and let someone in support take a look at the issue.
If there's no true backlog (eg, rs_ticket goes through right away) then (off the top of my head) I can think of a couple reasons for a delay in freeing up the queue's segments:
- DSI save interval has been set (config name = 'save_interval')
- DSI has been configured to allow non-blocking commits (config name = 'dsi_non_blocking_commit'); behind the scenes this implements an implicit save interval for the connection's recently processed segments
I'd want to get a detailed look at the queue's used segment count. Try running the following in the RSSD:
=======================
-- NOTE: For segments allocated to routes you'll need to join to
-- rs_sites (instead of rs_databases)
select db.dsname,
db.dbname,
s.q_type,
-- all segments that have been allocated to a queue
sum(case when s.used_flag >= 1 then 1 else 0 end) as alloc_count,
-- used but not-yet applied segments have used_flag = 1
sum(case when s.used_flag = 1 then 1 else 0 end) as used_count,
-- applied but awaiting cleanup segments have used_flag > 1
-- actual number represents when segment can be deallocated
sum(case when s.used_flag > 1 then 1 else 0 end) as saved_count
from rs_segments s,
rs_databases db
where s.used_flag >= 1
and s.q_number = db.dbid
group by db.dsname, db.dbname, s.q_type
order by 4 desc
go
=======================
If used_count is (relatively) large then you have transactions that haven't been applied to the RDB yet.
If saved_count is (relatively) large then your connection has been configured to save the transactions (on those segments) for some period of time.
Message was edited by: Mark A Parsons; added alloc_count to query
Add a comment