
Possible I/O Performance Problem - 'direct path read'

Former Member
0 Kudos

We are encountering intermittent performance problems in our ECC system and have found that there are occasionally very high I/O response times. In particular, we identified high 'direct path read' wait times for one session we analysed.

A user reported a problem scrolling in transaction MIRO, so we performed a trace in ST12:

It took on average ~30 seconds to scroll one page. The above was the trace for that period.

The DB performed a FTS on RBKP:

Number of rows:

Shouldn’t be too big of a deal, right?

I identified the Oracle session and extracted the ASH data. The poor response times appear to be related to the wait event ‘direct path read’. I plotted the session against the wait event with 'Time Waited' (filtered on object RBKP):

From this (for me), there is clearly an I/O bottleneck. However, why is the optimizer choosing a ‘direct path read’ to retrieve this data? It would also be good to know how reliable the ASH stats are in relation to these events.
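For reference, an extract like this can be pulled with a query roughly along these lines (a sketch only; the SID is a placeholder and V$ACTIVE_SESSION_HISTORY requires the Diagnostics Pack licence):

    SELECT ash.sample_time,
           ash.session_id,
           ash.event,
           ash.time_waited,                  -- microseconds; only populated on the last sample of a wait
           o.object_name
      FROM v$active_session_history ash
      JOIN dba_objects o
        ON o.object_id = ash.current_obj#
     WHERE ash.session_id = 1234             -- placeholder: SID of the traced work process
       AND ash.event      = 'direct path read'
       AND o.object_name  = 'RBKP'
     ORDER BY ash.sample_time;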

Thanks,

Tony

Accepted Solutions (1)

stefan_koehler
Active Contributor
0 Kudos

Hi Tony,

hmm ... sounds familiar to me

> However, why is the optimizer choosing a ‘direct path read’ to retrieve this data?

It is not the optimizer in this case - it is the runtime engine. Why is it done that way? Because your table is larger than "_small_table_threshold". A full explanation, including notes about the various changes in the runtime engine regarding the statistics- and object-driven approach, can be found here: Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_st...
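As a quick crosscheck (a sketch only - the x$ views need a SYSDBA connection and hidden parameters are release-dependent), you can compare the threshold with the table's block count from the statistics:

    SELECT n.ksppinm AS parameter, v.ksppstvl AS value
      FROM x$ksppi n, x$ksppcv v
     WHERE n.indx = v.indx
       AND n.ksppinm = '_small_table_threshold';

    SELECT owner, table_name, blocks, num_rows   -- blocks = blocks below the HWM per the statistics
      FROM dba_tables
     WHERE table_name = 'RBKP';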

> It would also be good to know how reliable the ASH stats are in relation to these events?

ASH is not the right source of information here, as discussed earlier.

You don't know how many blocks each wait event (= request) tried to retrieve, nor how many blocks the OS delivered for each call (and in which way). 1 second for 8 KB may be pretty bad; 1 second for 1 MB may not be that bad. You need to correlate the Oracle I/O requests to the OS calls (e.g. pread on Solaris - not quite sure about the corresponding OS call on HP-UX) and crosscheck them there. Bad I/O response times are not necessarily caused by the I/O subsystem. Just as an example - I blogged about an issue on Solaris / ZFS and AIX / JFS2 some time ago.
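If you want to see the size of each request on the Oracle side, an extended SQL trace of the session shows it (a sketch; SID and serial# are placeholders) - each 'direct path read' wait line in the resulting trace file carries the block count of that request, which you can then line up with the OS calls:

    -- enable wait tracing for the work process session
    EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => 1234, serial_num => 5678, waits => TRUE, binds => FALSE);

    -- ... reproduce the MIRO scrolling ...

    EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => 1234, serial_num => 5678);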

> Shouldn’t be too big of a deal, right?

That depends on the LHWM and HHWM (assuming you are using ASSM), not on the number of rows.
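A sketch for checking that (segment owner 'SAPSR3' is an assumption - adjust to your schema); DBMS_SPACE.SPACE_USAGE reports the block usage of an ASSM segment, i.e. what a full scan below the HWM actually has to read:

    SET SERVEROUTPUT ON
    DECLARE
      l_unf  NUMBER; l_unf_b  NUMBER;
      l_fs1  NUMBER; l_fs1_b  NUMBER;
      l_fs2  NUMBER; l_fs2_b  NUMBER;
      l_fs3  NUMBER; l_fs3_b  NUMBER;
      l_fs4  NUMBER; l_fs4_b  NUMBER;
      l_full NUMBER; l_full_b NUMBER;
    BEGIN
      DBMS_SPACE.SPACE_USAGE(
        segment_owner => 'SAPSR3', segment_name => 'RBKP', segment_type => 'TABLE',
        unformatted_blocks => l_unf,  unformatted_bytes => l_unf_b,
        fs1_blocks => l_fs1, fs1_bytes => l_fs1_b,
        fs2_blocks => l_fs2, fs2_bytes => l_fs2_b,
        fs3_blocks => l_fs3, fs3_bytes => l_fs3_b,
        fs4_blocks => l_fs4, fs4_bytes => l_fs4_b,
        full_blocks => l_full, full_bytes => l_full_b);
      DBMS_OUTPUT.PUT_LINE('Full blocks below HWM : ' || l_full);
      DBMS_OUTPUT.PUT_LINE('Partially used blocks : ' || (l_fs1 + l_fs2 + l_fs3 + l_fs4));
      DBMS_OUTPUT.PUT_LINE('Unformatted blocks    : ' || l_unf);
    END;
    /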

See you on Thursday

Regards

Stefan

Former Member
0 Kudos

Hi Stefan,

I was hoping you'd come in on this - didn't want to be cheeky and send direct.

Here are the other wait event stats for the problematic period:

I know this is not granular enough to determine the root cause, but to me it certainly points to I/O.

Looking forward to an in-depth discussion on Thursday!

Cheers,

Tony

stefan_koehler
Active Contributor
0 Kudos

Hi Tony,

I don't bite ... everybody (and especially my clients) can send me any request at any time by e-mail. I usually learn something new or am reminded of forgotten stuff by answering questions - so I benefit every time as well.

> Here are the other wait event stats for the problematic period

Is this data specific to the corresponding Oracle sessions (of the swapping SAP WPs) running MIRO, or is it system-wide? It seems like it is system-wide.

However "db file parallel read" / "db file scattered read" (Blog 1 / Blog 2) maybe caused by nested loop batching, table / index pre-fetching or FTS without direct path reads. The only thing that you can say without looking into a crystal ball or guessing is that you spent round about 10048 seconds on doing I/O by the marked wait events. For example CPU is completely missing in the screenshot.

Regards

Stefan

Former Member
0 Kudos

Hi Stefan,

Good to know - thanks.

> Is this data specific to the corresponding Oracle sessions (swapping SAP WPs) by running MIRO or system-wide? It seems like it is system-wide.

Yes, system-wide. I reset the stats when the issue was reported and refreshed them periodically throughout the analysis.

> For example CPU is completely missing in the screenshot.

Here are the top events by wait time (including CPU):
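For anyone repeating this analysis, the system time model puts CPU next to the cumulative wait figures (a sketch; values in v$sys_time_model are microseconds):

    SELECT stat_name, ROUND(value / 1e6) AS seconds
      FROM v$sys_time_model
     WHERE stat_name IN ('DB time', 'DB CPU', 'background cpu time');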

Cheers,

Tony

Answers (2)

0 Kudos

Hi,

I came across this and may be late to this post, but it could help others in the future. We had the same situation: whenever MIRO is run, it goes to direct path reads and hence causes high I/O. There is an SAP note and a blog explaining this situation and why everything goes to direct path read, bypassing the DB cache. This also only happens during the call to MRM_DUPLICATE_INVOICE_CHECK.

We resolved the situation by adding the index RBKP~2, which is also mentioned in the SAP note. With the index in place, the MIRO load is now insignificant compared to other activities in the system. Previously it was the top I/O load in the database.

Below is the reference blog, which points to the SAP note explaining why a full table scan is done and how to resolve it with the index.

https://blogs.sap.com/2015/01/28/focus-is-on-miro-performance/

SAP Note 134660 – Logistics Invoice Verification: Performance (RBKP)
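As a quick check on the database side before creating RBKP~2 (a sketch - the index naming on the database may differ between releases, and LISTAGG needs 11gR2 or later), you can list which indexes already exist on RBKP and their columns:

    SELECT i.owner, i.index_name,
           LISTAGG(c.column_name, ', ') WITHIN GROUP (ORDER BY c.column_position) AS index_columns
      FROM dba_indexes i
      JOIN dba_ind_columns c
        ON c.index_owner = i.owner AND c.index_name = i.index_name
     WHERE i.table_name = 'RBKP'
     GROUP BY i.owner, i.index_name
     ORDER BY i.index_name;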

Regards,

Jennah

ashish_vikas
Active Contributor
0 Kudos

It is going for a FULL TABLE SCAN, so it needs to scan the complete table to fetch the records.

Check if you have an index on table RBKP covering the fields in the WHERE clause; if not, create one and check whether it improves performance. (Also search SAP Notes - there may be something for this.)
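A sketch for that check (the SQL_ID is a placeholder, taken from the ST12 trace or V$SQL) - the plan output shows the TABLE ACCESS FULL on RBKP together with the filter predicates, i.e. the WHERE-clause columns an index would have to cover:

    -- find the statement against RBKP first (placeholder filter)
    SELECT sql_id, child_number, executions, sql_text
      FROM v$sql
     WHERE sql_text LIKE '%RBKP%';

    -- then display its plan including the access / filter predicates
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', NULL, 'TYPICAL'));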

best regards

ashish

Former Member
0 Kudos

Thanks Ashish, but I want to know why it's going for a 'direct path read', i.e. why it is bypassing the buffer cache.