Solved: How long does it take to compile a DB2 access plan...

former_member598328 · ‎12-13-2018

I have coded a function module to extract data from the totals table, faglflext. I expected it to be fairly slow to run, as I have a where condition based on rassc, which is not the first column in any index. However, I found an index scan of an index containing rassc (as the last column) performed adequately in a test environment which contained around 98% of our production data. The function module is used infrequently, and a response time of several minutes is acceptable. Unfortunately, the first run in production was unacceptably slow, at around 30 minutes; running again an hour or two later took only seconds. Could it possibly take that long to build an access plan?

I noticed in testing that the first time I executed the query after changing it, or when running it after a gap of perhaps a day or more, it would take several minutes, but it could be run again soon after in seconds.

The table is not buffered, and in any case, if I understand correctly, buffering would not be used because the query joins several tables. To confirm that the difference was not down to buffering, I added the BYPASSING BUFFER option; run time was still much faster after the first time.

I understand that the first run may need to build the access plan, and that the plan is stored in a cache and may be lost if the query is not reused for a while. I assumed this was the reason for the longer first run, and implemented the code in production.

In production, the first query, on a quiet system at the weekend, ran for around 30 minutes. When repeated an hour later, it returned the same data in seconds. The table contains over 80 million rows, but only 10 were selected. Running again to select around 700 rows for a different trading partner also ran in seconds. Looking at the access plan for the long run, the query was executed as expected.

Could it really take half an hour to build an access plan for a query joining four tables? I could rewrite it as separate queries, finishing with a FOR ALL ENTRIES instead of the joins, but I am not sure that will help, as in the test environment (with prod-like data) performance is slightly worse.

Are there any other reasons the first run would be slower?

Why might the first run take so much longer in production than in test, even with almost the same data, when subsequent runs are similar in the two environments?

Any suggestions appreciated.

Frank-Martin · ‎12-13-2018

Hi Antony,

A prepare of a very complex SQL statement may take some seconds, but I would not expect it to take minutes. You can check this if you find the SQL statement in "DBACOCKPIT -> performance -> SQL cache". Change the default layout and include the column "Preparation Time".

You described that the query cannot use RASSC as the first column in an index. Does the optimizer use the predicate on this column only as a "sargable" predicate, or does it use a JUMPSCAN? Only predicates that can be used as START/STOP predicates during an index scan limit the number of index pages that need to be read. If the predicate is only used as a "sargable" predicate, the query probably needs to do a full index scan to apply the predicate. This may be very inefficient, depending on the size of the index.
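To make the distinction concrete, here is a minimal Python sketch (a toy model, not Db2 code). It treats an index as a sorted list of key tuples. The column order (rclnt, rbukrs, ryear, rassc) and all values are invented for illustration and do not reflect the real FAGLFLEXT indexes. A predicate on the leading columns turns into a START/STOP range located by binary search, whereas a predicate on the trailing column alone has to be checked against every entry.

```python
from bisect import bisect_left, bisect_right

# Toy "index": sorted key tuples (rclnt, rbukrs, ryear, rassc).
# Column order and values are made up for illustration only.
index = sorted(
    (clnt, bukrs, year, rassc)
    for clnt in ("100",)
    for bukrs in ("1000", "2000", "3000", "4000")
    for year in range(2000, 2020)
    for rassc in ("TP01", "TP02", "TP03", "TP04", "TP05")
)

def scan_with_start_stop(index, rclnt, rbukrs, rassc):
    """Leading columns bound the scanned range; rassc is filtered inside it."""
    lo = bisect_left(index, (rclnt, rbukrs))                  # START key
    hi = bisect_right(index, (rclnt, rbukrs, float("inf")))   # STOP key
    visited = index[lo:hi]
    matches = [key for key in visited if key[3] == rassc]
    return matches, len(visited)

def scan_sargable_only(index, rassc):
    """No usable leading column: every index entry must be inspected."""
    matches = [key for key in index if key[3] == rassc]
    return matches, len(index)

matches, visited = scan_with_start_stop(index, "100", "2000", "TP03")
print(len(matches), visited)   # 20 matches after visiting 100 entries

matches, visited = scan_sargable_only(index, "TP03")
print(len(matches), visited)   # 80 matches after visiting all 400 entries
```

In the toy model the sargable-only scan always touches the whole index, no matter how few rows qualify, which is the effect described above.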

Based on the symptoms you describe, I would conclude that the first query runs slowly due to a buffer pool warm-up effect. The next execution may benefit from data and index pages that are already in the buffer pool. You can also check this by looking at the columns "Data Physical Reads" and "Index Physical Reads" in "DBACOCKPIT -> performance -> SQL cache".

Maybe you can provide some more details about the access plan of your query.

Regards

Frank

Frank-Martin · ‎12-14-2018

Hi Antony,

I see. Without a good index the query may not get much faster. If the tables are larger in production, the buffer pool warm-up effect may simply be larger.

There is nothing obvious in the query that can be tuned. The redundant duplicate predicate on rldnr does not help but should not hurt much either. The strange clause "(SELECT Q1.$C0 FROM (VALUES 0) AS Q1 WHERE (? = ? SELECTIVITY 1.000000))" may be a result of this.

Regarding the OR part: yes, Db2 may split this into separate scans.

Regards

Frank

Frank-Martin · ‎12-14-2018

Hi Antony,

Yes. No buffering is used on the application server side. The data buffer pool is the place where DB2 keeps data in memory to avoid reading it from disk over and over again. If a query needs a lot of data or index pages for the first time, it may be much slower, since the pages need to be read into the buffer pool from disk. If it is executed again, most of the data may still be in the buffer pool, depending on the buffer pool size.
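The warm-up effect can be sketched with a toy page cache in Python. This is only an illustration: the class `ToyBufferPool`, the plain LRU eviction, and the page counts are all assumptions made for the example, and Db2's real page replacement is more sophisticated. The point is simply that the first run pays for physical reads and the second run does not, provided the pool is large enough to keep the pages.

```python
from collections import OrderedDict

class ToyBufferPool:
    """Tiny LRU page cache; a stand-in for the idea, not Db2's algorithm."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.physical_reads = 0

    def get_page(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # hit: mark as recently used
            return
        self.physical_reads += 1              # miss: page comes from disk
        self.pages[page_id] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict least recently used

def run_query(pool, page_ids):
    """Touch every page the query needs; return physical reads it caused."""
    before = pool.physical_reads
    for page_id in page_ids:
        pool.get_page(page_id)
    return pool.physical_reads - before

query_pages = range(500)          # hypothetical pages touched by the query

big_pool = ToyBufferPool(capacity=1000)
first_run = run_query(big_pool, query_pages)     # cold: 500 physical reads
second_run = run_query(big_pool, query_pages)    # warm: 0 physical reads

small_pool = ToyBufferPool(capacity=100)
small_first = run_query(small_pool, query_pages)   # 500 physical reads
small_second = run_query(small_pool, query_pages)  # still 500: pages were evicted
print(first_run, second_run, small_first, small_second)
```

The second pair of runs shows why the benefit depends on the buffer pool size: if the pool cannot hold the pages the query touches, the repeat run is no faster in this model.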

Your ratios "BP Gets / Rows Processed = 65,899" and "Rows Read / Rows Processed = 20,322" do not look good. For each row processed, Db2 needs to do 65,899 buffer pool lookups. Most of the rows that are read turn out to be irrelevant and are not returned by the query. Those are indicators that no good index exists.

Looking at your IXSCAN output, currently only the index columns RCLNT and RBUKRS can be used efficiently as START/STOP predicates. The predicates on RLDNR and RASSC can only be used as "sargable" predicates. "Sargable" predicates do not limit the number of index pages that need to be evaluated, so they are less efficient. No JUMPSCAN is used. A JUMPSCAN may not be attractive if the index columns between RCLNT, RBUKRS and RASSC have a high cardinality.

I wonder why there is an additional "((Q6.RBUKRS = ?) OR (Q6.RASSC = ?))" predicate in the index scan. Can you paste the ABAP statement and the SQL statement text?
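As a rough model of why cardinality matters for a jump scan, the Python sketch below probes a sorted toy index once per distinct value of a "gap" column that sits between the leading column and the column being searched. The three-column layout (lead, gap, target), the helper names, and the data are assumptions for illustration only; this is not how Db2 implements JUMPSCAN internally, just a way to count probes.

```python
from bisect import bisect_left

def build_index(n_gap, n_target, lead="1000"):
    """Sorted toy index of (lead, gap, target) keys; all values are invented."""
    return sorted((lead, g, t) for g in range(n_gap) for t in range(n_target))

def jump_scan(index, lead, target):
    """Probe once per distinct gap value instead of reading every entry."""
    matches, probes = [], 0
    pos = bisect_left(index, (lead,))
    while pos < len(index) and index[pos][0] == lead:
        gap = index[pos][1]
        probes += 1
        # Small index lookup for this (lead, gap) prefix.
        i = bisect_left(index, (lead, gap, target))
        while i < len(index) and index[i] == (lead, gap, target):
            matches.append(index[i])
            i += 1
        # Jump to the first entry with the next gap value (gap is an integer here).
        pos = bisect_left(index, (lead, gap + 1))
    return matches, probes

low_card = build_index(n_gap=4, n_target=250)     # 1000 entries, 4 gap values
high_card = build_index(n_gap=500, n_target=2)    # 1000 entries, 500 gap values

print(jump_scan(low_card, "1000", 1)[1])    # 4 probes
print(jump_scan(high_card, "1000", 1)[1])   # 500 probes
```

Both toy indexes have 1000 entries, but the number of probes grows with the number of distinct gap values, so with high cardinality the jump scan approaches the cost of reading the whole range.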

Regards

Frank
