Skip to Content
0
May 15, 2013 at 07:56 AM

Primary key scan faster than matched index scan for ORDER BY clause?

115 Views

Hi experts,

Please help me to explain and understand performance cases described below. Hope this will be lesson learned for me and others. I tried to improve time execution of query and I found interesting outcome, worth to share it. 4 questions from me are marked inside text.

Case 1:

  • There is a table PA2002 that has over 62 mln of records.
  • 39 598 distinct values for PERNR
  • It has primary key PA2002~0 that consists of 8 fields: MANDT, PERNR, SUBTY, OBJPS, SPRPS, ENDDA, BEGDA, SEQNR
  • There is a query with WHERE condition on fields MANDT, PERNR, BEGDA <=, ENDDA >=
  • There is ORDER BY clause in same sequence order as PRIMARY KEY PA2002~0 fields
  • Execution time of query took over 9 seconds, see execution plan:

We see that required steps are:

  • 1 - Get data from table using part of primary index (MANDT and PERNR) and then filter content based on predicates BEGDA and ENDDA.
  • 2 - From this result set just find index by rowid.
  • No need of sorting for ORDER BY clause.

QUESTION 1: Does it mean that values in primary index are always sorted according to key sequence (I think it is from primary key definition)?

Case 2:

Total costs does not look so bad, but the duration 9 seconds was something I tried to improve. So I created secondary index, dedicated for query fields MANDT, PERNR, BEGDA, ENDDA so the WHERE in query matches exactly. Below are results.

  • Index PA2002~Z01 was automatically chosen by optimizer.
  • Time is reduced from 9 to 3.5 seconds.
  • However if we look at the explain execution plan, it theoretically looks much worse (IO costs are 1 not 2, but all in total give value 4, CPU costs seems to be very high now):

  • Now in addition next step appeared - SORT ORDER BY - and it has high CPU-Costs, over 15 mln.
  • Wider part of index is used - 3 fields and not 2: MANDT, PERNR, BEGDA (ENDDA is filter predicate which means that index result set is still filtered by this value).

Statistics about indexes and data distribution for PA2002 table:

QUESTION 2: If there are two or more "<=" or ">=" comparisons in query, index can use only one field (as BEGDA in case above), right? Same effect would be if Z01 index contained only MANDT, PERNR and BEGDA? What is best index proposal?

QUESTION 3: As we see final general estimation is 3, estimated rows count 4, same as in first case. However time is reduced much (almost 3 times less). I read that we should not compare estimations to real duration, but if we try, this execution plan looks worse but gives better result. Why it is so? Does the index improves reading so much?

Case 3:

If we assume that index speeds up query execution so much, I wonder why optimizer is not choosing same execution plan for other table, same query and same conditions (only amount of rows is much less):

  • PA2001 table has now over 6.7 mln records
  • 30 777 distinct values for PERNR
  • Even if same index Z01 exist, optimizer selects query based on index PA2001~0:

Table statistics:

QUESTION 4: Why optimizer does not choose secondary key Z01 which matches better WHERE condition for PA2001, and it chose Z01 for PA2002 table?

I have one guess:

- PA2001 has 6 793 233 rows, divided by number of distinct persons PERNR 30 777 gives 221 rows in average for person. Filtering by BEGDA and ENDDA date is assumed to be faster than final sorting when Z01 key would be used (using primary key does not require 3rd step SORT).

- PA2002 has much more rows - 62 126 700, divided by number of distinct persons PERNR 39 598 gives 1 569 rows in average for person. That is why optimizer assumes it is faster to search by index and then sort it finally.

I wonder if using index Z01 would be give faster execution for PA2001 as well, but I cannot check it as this is SAP standard code so I cannot add hint to query.

Thank you in advance for any hints and possible improvements suggestions.

Regards 😊

Adam