Skip to Content

Sybase ASE heavy I/O during moderate COMMIT

Feb 02 at 04:56 PM


avatar image

In summary, our question is: Does COMMIT re-read log pages?

We ask because we have a performance problem whose explanation seems to be that.

Our night batch has a period when several processes perform heavy bulk inserts. They don't use bcp but a vendor's application using Open Client Bulk-library routines. Transactions have a size of 200 thousand to 600 thousand inserted rows each and perform slow-bcp. sysprocesses most common values for cmd are BULK INSERT and COMMIT TRANSACTION. Sometimes, the duration of COMMIT increases a lot. We studied one case that we think that behave as the other, less studied cases. That process opened a transaction but had to wait on a lock. When the lock was released and it continued, its COMMIT took a long time, 20 minutes or more. During this COMMIT, sysprocesses shown very high values in physical_io (567293 was the highest), but we found no explanation on statistics reported in ASEMON's <servername>_CnxActiv table. Since we have neither 'statement statistics active' nor 'per object statistics active' set, we have few I/O statistics at hand. During the COMMIT, sysprocesses.status switched between sleeping, runnable and probably others.

That was a very bizarre case. Work to be commited was moderate, less than 300 K inserted rows. Why so many I/Os? Too many for the writing of log and dirty pages. Our theory is that ASE did read a large portion of syslogs, from the start of the transaction to present. It amounted to several gigabytes due to concurrent transactions consuming a lot of log at high speed, and it was much larger than usual because the lock wait gave more time to others to write more log. But, in theory, there is no need for Sybase ASE to read that log. We have no replication and no triggers, which could be two common reasons for re-reading log pages.

Has anyone seen this before? Any advice?

ASE 15.7 SP 52 64-bit on SPARC Sun Solaris

Thanks in advance,

Mariano Corral Herranz

10 |10000 characters needed characters left characters exceeded
* Please Login or Register to Answer, Follow or Comment.

1 Answer

Mark A Parsons Feb 03 at 03:49 AM

If you happen to have a smallish (log) cache(let) then you could be seeing earlier transactional activity being flushed to disk, just to be read back in as part of the commit operation.

This could especially be the case for a small cache, a large transaction, and a deferred operation (and trigger code) that has to re-read the log pages (long since flushed to disk) in order to 're-play' the deferred operations (or in the case of a trigger, read the inserted/deleted pages from the log).

A similar situation can occur if your database happens to be a primary database in an active replication setup, and the repagent has been down/stopped for a period of time. In this scenario the secondary trunc point, and most of the transactions that occurred since the last time the secondary trunc point was updated, may have been flushed to disk ... said log pages then have to be re-read into cache when the repagent is started up (ie, the repagent has to go find the secondary trunc point and then read/process all the log activity since then).

Assuming you've got sp_sysmon running at the time, and assuming this is an issue of having to read log pages from disk, you should (hopefully) be able to see a high volume of reads on your log devices (in sp_sysmon's detailed disk IO section). [Depending on which collections you have active, I would think ASEMON should also show a (relatively) high volume of reads on your log devices ... ??]

And, of course, as a general rule ... if you routinely see (largish) volumes of physical read IOs on your log devices, then you may want to take a look at what you can do to reduce the need to (re)read log pages from disk, eg, increase the size of your log cache, work with developers/application-teams to reduce the size and duration of transactions (yeah, I know, wishful thinking).

Show 1 Share
10 |10000 characters needed characters left characters exceeded

Thanks, Mark. Good idea looking at per-device ASEMON statistics. They show, as expected, a very high volume of writes, much higher than reads, typical of bulk copying. On log devices writes outnumbered reads by 8 to one: about 8000 writes/sec vs 1000 reads/sec. Very few reads on data devices during that period. These statistics are moreoless consistent with our hypothesis of COMMIT re-reading log pages; not a clear proof, but consistent.

Cache pools are large: 8 GB for the 4-K buffer (page size is 4 K) and 10 GB for 32. I cannot remember the log io size at that moment (we changed it from 4 to 32 at some point.) I think large caches explain why writes outnumbered reads: ASE didn't need to re-read most of the log pages from disk as many were in cache. The figure for reads from log devices was very uniform during nine consecutive ASEMON statistics periods, at 950 reads per sec; we conclude this was the disk speed, although it looks low to me.

Logsegment is really huge, 40 GB.

None of the reasons we can imagine for re-reading the log were present: no triggers, no Replication Server replication, no deferred updates or deletes (transactions did mainly bulk inserts)

When bulk inserts ran more smoothly, with fewer concurrent transactions, writes were accordingly lower (moreoless at half) but reads were much lower (one fifth.) Probably due to cache holding most of the requested log pages.