Table level statistics inaccuracy question


"Some statistics, particularly cluster ratios, may be slightly inaccurate because not all page allocations and deallocations are recorded during changes made by data modification queries. Run update statistics or create index to correct any inconsistencies."

I am curious under what conditions table-level statistics like DPCR become slightly inaccurate, and why the documentation mentions that 'not all page allocations and deallocations are recorded during changes made by data modification queries.' So I did some research in SAP Notes, Google, and older documentation, and also raised a case with SAP support, but I could only get partial answers without explanation. Anyway, below is my understanding/summary of table-level statistics inaccuracy, based on my shallow knowledge. Please correct me or shed some light on it:

For table-level statistics (pagecnt/SPUT/DPCR):

1. The flush happens periodically, performed by the housekeeper process.

2. A flush can also be triggered before the scavenge process, but it is not guaranteed, because a deadlock can occur.

If a deadlock happens and, after the scavenge, the DES is gone from the cache, the new pagecnt cannot be flushed into systabstats to reflect the new value.

3. If there is a spinlock on the DES and, at the same time, the first OAM page for the table is exclusively latched, pagecnt will not be flushed to systabstats this time, but it can be flushed the next time.

4. For performance reasons, the Tally Manager works in a non-transactional fashion, which can allow some inaccuracy in stats like pagecnt/dpcr.

--> I guess some page splits happen during inserts, but pagecnt should still be recorded. However, during a rollback, for performance reasons, those deallocated pages are recorded with estimated values.

5. DES not in cache because of a crash: the new pagecnt cannot be flushed to systabstats to reflect the new value.

6. DES not in cache because of a rollback: the new pagecnt cannot be flushed to systabstats to reflect the new value.

But I can't imagine how the DES could be lost during a rollback...

7. SPUT: "Space utilization uses the average row size and number of rows to compute the expected minimum number of data pages, and compares it to the current number of pages."

datarowsize is populated with an estimate that may be inaccurate, particularly if the table has variable-width columns (for instance, NULLable columns).

8. reorg rebuild updates pagecnt accurately, but does not update datarowsize in systabstats.

9. Updating the table statistics improves the datarowsize accuracy.

10. Spinlock contention on the DES can slow ASE down, but it is not a cause of inaccuracy.
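As I understand point 7, SPUT boils down to comparing a computed minimum page count against the actual page count, so an inaccurate datarowsize feeds straight into the ratio. A rough sketch of that comparison (my own simplification: the real ASE formula accounts for per-page/per-row overheads and the locking scheme, and the function names and the overhead value below are assumptions, not the actual implementation):

```python
def expected_min_pages(row_count, avg_row_size, page_size=2048, page_overhead=32):
    """Rough expected minimum number of data pages for a table."""
    usable = page_size - page_overhead          # space left for rows on a page
    rows_per_page = max(1, usable // avg_row_size)
    return -(-row_count // rows_per_page)       # ceiling division

def space_utilization(row_count, avg_row_size, actual_pages, page_size=2048):
    """SPUT-style ratio: expected minimum pages vs. pages actually used."""
    return expected_min_pages(row_count, avg_row_size, page_size) / actual_pages
```

With this sketch you can see the sensitivity to datarowsize: for 1,000 rows on 100 actual pages, an avg_row_size of 100 gives a ratio of 0.5, while an underestimated avg_row_size of 50 would report an even lower (worse-looking) utilization.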

Accepted Solutions (1)


Former Member

It sounds like it is referring to minimally logged operations like select into / (fast) bulk copy. 


Hi Jason,

Right... these minimally logged DMLs work in a non-transactional fashion and might use estimated values in pursuit of performance. Anyway, they are only 'slightly inaccurate', and it looks like few people really care about this. Thanks for your answer, Jason.

I will wait another 2 days before closing this discussion, hoping the 'minimally logged DML' you mentioned is the only reason for inaccurate table statistics (except special cases like deadlocks/crashes). That will be easier for me to remember.




former_member182259
Contributor

Actually, there is another possible consideration....or two:

When updating a DOL clustered index, the data row the index key points to doesn't move as it does in APL. This can distort data page cluster ratios if it happens a lot.

When doing large deletes on DOL, the rows are only marked for deletion. The housekeeper GC is the process that comes around and rewrites the page or deallocates it as necessary. Unfortunately, each engine in ASE only has an HKGC pending queue of 50,000 entries, so if you overrun the HKGC queue (HKGC overflow in monOpenObjectActivity), the housekeeper doesn't have a record of the overflowed rows to delete. The same thing happens with a crash or a shutdown with nowait. This is likely to leave the dpcr as if the deletes hadn't happened yet - which could distort things a tad... or more than a tad if you're whacking huge amounts of data.
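If you want to check whether you've been overrunning the queue, the MDA table Jeff mentions can be queried directly. A sketch, assuming the Hkgc* counter columns available in ASE 15.7 and later (verify the column names against your version's MDA documentation):

```sql
-- Objects whose housekeeper GC queue has overflowed since the counters
-- were last cleared (column availability depends on the ASE version)
select DBName, ObjectName, HkgcRequests, HkgcPending, HkgcOverflows
from master..monOpenObjectActivity
where HkgcOverflows > 0
order by HkgcOverflows desc
```

A nonzero HkgcOverflows for a table would be a hint that its statistics (and dpcr) are lagging behind the deletes, per Jeff's explanation.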


Thanks Jeff and I'm honored to get your reply. That's exactly what I wanted to know.

Answers (2)



Thanks Jeff, Jason and Kevin!

kevin_sherlock
Contributor

This all depends on the locking scheme, but I would say the greatest reason for cluster ratio inaccuracies over time probably comes from the fact that the dpagecrcnt/ipagecrcnt/drowcrcnt numbers are used in the calculation. Remember that cluster ratios are "derived" statistics. That is, they are derived at runtime from the other statistics in systabstats (which come from memory, as you state). The "*crcnt" values are measures of how often an "extent jump" is done while scanning a chain of pages. I don't believe these are "in-memory" values that are kept up to date as DML occurs.

Add to this that the pagecnt/leafcnt/rowcnt values can also have some accuracy issues (although it seems only in special circumstances), as you describe above, and the logical conclusion is that if you want THE most accurate cluster ratio calculations, you need to update statistics (without sampling) for the table/index you are interested in, or rebuild/recreate the indexes. I don't know of any other way to get the *crcnt values updated.
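In practice that comes down to something like the following (run in the relevant database; the table name is a placeholder):

```sql
-- Full, unsampled scans so the *crcnt values - and therefore the derived
-- cluster ratios - are recomputed from scratch
update statistics my_table
go
update index statistics my_table
go
```

The refreshed values can then be inspected with optdiag (e.g. `optdiag statistics <db>..my_table`), which prints the derived cluster ratios alongside the raw systabstats counters.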

It would be nice to have an option whereby, for instance, a DML operation that scanned the table/index would update the *crcnt values (in memory, to be flushed later) as part of the scan.

Also, "rowsize" as you mention above is probably not maintained in memory either. It would only be updated by update statistics or index creation.