08-05-2011 5:18 AM
Hi Experts,
I have another scenario that needs your advice. The requirement is to query around 6 million records from EKBE, EKPO and EKKO:
EKBE = 72,748,186 rows
EKPO = 45,705,116 rows
EKKO = 6,793,483 rows
SELECT /*+ FULL(HS) PARALLEL(HS,4)
           INDEX_FFS(D,"EKPO~0") PARALLEL_INDEX(D,"EKPO~0",4)
           INDEX_FFS(H,"EKKO~0") PARALLEL_INDEX(H,"EKKO~0",4) */
       COUNT(*)
FROM EKKO H, EKPO D, EKBE HS
WHERE H.EBELN = D.EBELN
AND D.EBELN = HS.EBELN
AND D.EBELP = HS.EBELP
AND HS.VGABE = '1'
AND HS.BWART = '101'
AND HS.BLDAT >= '20091001'
AND HS.BLDAT <= '20110730'
AND HS.ZEKKN = 00
AND HS.WERKS IN ( '1001','1002','1003','1004','1005','1008','1009','1010','1011','1013','1014','1015','1018','1019','1020','1021','1022','1024','1025','1027')
Is that too much? Would it impact the performance of the whole database?
Please advise.
Thank You and Best Regards
Fernand
Edited by: Fernad Lesmana on Aug 5, 2011 6:19 AM
08-05-2011 5:22 AM
I have changed to the real query and used FTS on all 3 tables. Please advise.
SELECT /*+ FULL(HS) PARALLEL(HS,4) FULL(D) PARALLEL (D,4) FULL(H) PARALLEL(H,4) */
HS.ebeln,HS.ebelp,HS.vgabe,HS.gjahr,HS.belnr,HS.bldat,HS.xblnr,HS.dmbtr,HS.wrbtr,HS.waers,HS.shkzg,D.werks,D.matnr,D.matkl,D.retpo,H.lifnr,H.bedat,H.bsart
FROM EKKO H, EKPO D, EKBE HS
WHERE H.EBELN = D.EBELN
AND D.EBELN = HS.EBELN
AND D.EBELP = HS.EBELP
AND HS.VGABE = '1'
AND HS.BWART = '101'
AND HS.BLDAT >= '20110101'
AND HS.BLDAT <= '20110131'
AND HS.ZEKKN = 00
AND HS.WERKS IN ( '1001','1002','1003','1004','1005','1008','1009','1010','1011','1013','1014','1015','1018','1019','1020','1021','1022','1024','1025','1027')
SELECT STATEMENT ( Estimated Costs = 503,076 , Estimated #Rows = 0 )
--- 20 PX COORDINATOR
    --- 19 PX SEND QC (RANDOM) :TQ10004
        --- 18 HASH JOIN BUFFERED
            |-- 4 PX RECEIVE
            |   --- 3 PX SEND HASH :TQ10002
            |       --- 2 PX BLOCK ITERATOR
            |           --- 1 TABLE ACCESS FULL EKKO
            |               ( Estim. Costs = 23,035 , Estim. #Rows = 6,090,450 )
            |               Estim. CPU-Costs = 1,099,664,583 Estim. IO-Costs = 23,008
            |               Access Predicates
            --- 17 PX RECEIVE
                --- 16 PX SEND HASH :TQ10003
                    --- 15 HASH JOIN BUFFERED
                        |-- 9 PX JOIN FILTER CREATE :BF0000
                        |   --- 8 PX RECEIVE
                        |       --- 7 PX SEND HASH :TQ10000
                        |           --- 6 PX BLOCK ITERATOR
                        |               --- 5 TABLE ACCESS FULL EKBE
                        |                   ( Estim. Costs = 168,844 , Estim. #Rows = 6,024,996 )
                        |                   Estim. CPU-Costs = 20,186,066,269 Estim. IO-Costs = 168,346
                        |                   Access Predicates Filter Predicates
                        --- 14 PX RECEIVE
                            --- 13 PX SEND HASH :TQ10001
                                --- 12 PX JOIN FILTER USE :BF0000
                                    --- 11 PX BLOCK ITERATOR
                                        --- 10 TABLE ACCESS FULL EKPO
                                            ( Estim. Costs = 311,156 , Estim. #Rows = 41,541,670 )
                                            Estim. CPU-Costs = 38,887,618,861 Estim. IO-Costs = 310,196
                                            Access Predicates
Edited by: Fernad Lesmana on Aug 5, 2011 10:10 AM
08-05-2011 9:42 AM
Fernad,
before you play with FTS, find out how many records your EKBE table gives back with your selection criteria.
This is the first point. The second point: why exactly do you need a join if all selection criteria come from EKBE only and you are selecting COUNT(*)?
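Such a check on EKBE alone could look like the following (a sketch only; the client value '100' is just an example, and MANDT is added here explicitly because the SAP kernel would normally supply it):

```sql
-- Sketch: count EKBE rows alone for the given selection criteria,
-- without joining EKPO/EKKO. Client '100' is an assumed example value.
SELECT COUNT(*)
FROM   EKBE HS
WHERE  HS.MANDT = '100'
AND    HS.VGABE = '1'
AND    HS.BWART = '101'
AND    HS.BLDAT BETWEEN '20091001' AND '20110730'
AND    HS.ZEKKN = 00
AND    HS.WERKS IN ('1001','1002','1003','1004','1005','1008','1009','1010',
                    '1011','1013','1014','1015','1018','1019','1020','1021',
                    '1022','1024','1025','1027')
```

The row count this returns tells you how much data really has to flow into any subsequent join.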
Yuri
08-05-2011 9:56 AM
Hi Yuri,
Total selection data from EKBE is around 6 million rows, and based on EBELN and EBELP I need to get almost the same amount of data from EKPO (PO details), plus the header data from EKKO (PO header). That is the reason for joining all 3 tables.
The other option is to do FAE (FOR ALL ENTRIES), but currently it is very slow:
EKBE FAE -> based on 6 million rows, FAE into EKPO, then distinct on EBELN to FAE into EKKO.
Any advice?
Thank You and Best Regards
Fernand
08-05-2011 10:05 AM
Some information for the table and index
Table EKKO
Last statistics date 12.06.2011
Analyze Method Sample 609,045 Rows
Number of rows 6,090,450
Number of blocks allocated 385,888
Number of empty blocks 0
Average space 0
Chain count 0
Average row length 426
Partitioned NO
UNIQUE Index EKKO~0
Column Name #Distinct
MANDT 1
EBELN 6,090,450
Last statistics date 12.06.2011
Analyze Method Sample 607,340 Rows
Levels of B-Tree 2
Number of leaf blocks 22,230
Number of distinct keys 6,073,400
Average leaf blocks per key 1
Average data blocks per key 1
Clustering factor 5,530,630
Table EKBE
Last statistics date 12.06.2011
Analyze Method Sample 6,605,256 Rows
Number of rows 66,052,560
Number of blocks allocated 2,823,562
Number of empty blocks 0
Average space 0
Chain count 0
Average row length 295
Partitioned NO
UNIQUE Index EKBE~0
Column Name #Distinct
MANDT 1
EBELN 3,200,510
EBELP 543
ZEKKN 2
VGABE 4
GJAHR 4
BELNR 4,365,636
BUZEI 744
Last statistics date 12.06.2011
Analyze Method Sample 6,267,218 Rows
Levels of B-Tree 3
Number of leaf blocks 534,220
Number of distinct keys 62,672,180
Average leaf blocks per key 1
Average data blocks per key 1
Clustering factor 62,102,850
Table EKPO
Last statistics date 12.06.2011
Analyze Method Sample 4,154,167 Rows
Number of rows 41,541,670
Number of blocks allocated 5,202,733
Number of empty blocks 0
Average space 0
Chain count 0
Average row length 845
Partitioned NO
UNIQUE Index EKPO~0
Column Name #Distinct
MANDT 1
EBELN 3,136,035
EBELP 629
Last statistics date 12.06.2011
Analyze Method Sample 4,146,389 Rows
Levels of B-Tree 3
Number of leaf blocks 181,650
Number of distinct keys 41,463,890
Average leaf blocks per key 1
Average data blocks per key 1
Clustering factor 25,995,890
08-05-2011 10:18 AM
Hi,
well, I need to investigate a bit further, but I would already like to say
that you must include MANDT in your test statements, as it will be included by the SAP kernel automatically.
Since MANDT is always the first field in a lot of indexes, it is always a good idea to include it in the WHERE clause.
Going parallel on all tables is rarely beneficial.
This tends to result in HASH or SORT MERGE joins, which can utilize a significant amount of TEMP space in RAM and on disk.
In general, you try to "divide and conquer":
go FTS on the biggest table and try to do nested loops to the others to save TEMP resources.
This might not result in the best plan, but it might be less disturbing for the rest of the system.
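As a sketch, that "FTS on the biggest table, nested loops to the rest" idea could be hinted like this (one possible hint set only, using the same aliases as above; whether the optimizer honors it depends on your statistics and release, and MANDT/client handling is assumed as discussed):

```sql
-- Sketch: parallel full scan on EKBE (the biggest table), then
-- nested-loop lookups via the unique primary-key indexes of EKPO and EKKO.
SELECT /*+ LEADING(HS D H)
           FULL(HS) PARALLEL(HS,8)
           USE_NL(D) INDEX(D "EKPO~0")
           USE_NL(H) INDEX(H "EKKO~0") */
       HS.ebeln, HS.ebelp, HS.belnr, D.werks, D.matnr, H.lifnr, H.bedat
FROM   EKBE HS, EKPO D, EKKO H
WHERE  HS.mandt = '100'           -- assumed example client
AND    D.mandt  = HS.mandt
AND    D.ebeln  = HS.ebeln
AND    D.ebelp  = HS.ebelp
AND    H.mandt  = HS.mandt
AND    H.ebeln  = HS.ebeln
AND    HS.vgabe = '1'
AND    HS.bwart = '101'
AND    HS.bldat BETWEEN '20110101' AND '20110131'
AND    HS.zekkn = 00
AND    HS.werks IN ('1001','1002','1003','1004','1005','1008','1009','1010',
                    '1011','1013','1014','1015','1018','1019','1020','1021',
                    '1022','1024','1025','1027')
```

The point of USE_NL here is only to avoid the buffered hash joins (and their TEMP usage) shown in the plan above, at the price of index lookups per EKBE row.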
And don't overdo it. See my "bad experience" comment in your other thread.
Volker
Rechecked: try only /*+ FULL(HS) PARALLEL(HS,8) */
I do not think the parallel approach on the other tables is beneficial in that join.
Edited by: Volker Borowski on Aug 5, 2011 11:25 AM
08-05-2011 11:49 AM
> Rechecked: try only /*+ FULL(HS) PARALLEL(HS,8) */
> I do not think the parallel approach on the other tables is beneficial in that join.
>
> Edited by: Volker Borowski on Aug 5, 2011 11:25 AM
Hello Volker,
Side question: do you know if Oracle is clever enough to communicate the results of the parallel scans on HS to the one process that is supposed to do the nested loops afterwards? This sounds like a tricky little task.
Yuri
08-06-2011 7:52 AM
> Hello Volker,
>
> Side question: do you know if Oracle is clever enough to communicate the results of the parallel scans on HS to the one process that is supposed to do the nested loops afterwards? This sounds like a tricky little task.
>
> Yuri
Hi Yuri,
actually, in this case, it would not be done that way.
In a nested loop join, every PQ process does the join for its own part of the complete result set itself.
As it produces packages of results, it feeds them to the process that initiated the statement.
This is done without an intermediate result set being built: as the data comes in, it is handed to
the PQ "master" process, so it requires no TEMP space (this is why I like these NL joins more when using PQ;
even if they might be less efficient for the individual statement, they don't kill other stuff that requires TEMP).
The difference is that your result set might not be ordered the same way as if you run single-threaded,
because the first X rows may come from process p001, the next Y rows from p011, the next ones from p005, and so on.
With hash or sort-merge joins, on the other hand, you can have the problem that a parallel result set stored in TEMP needs to be fed to
a single thread doing the join. Or the other way round: a single thread builds the hash table while
the PQ slaves wait in line with no activity until the first one is ready.
You might then assume something is wrong with PQ when watching ST04.
Volker
08-08-2011 10:14 AM
Indeed, that sounds logical. Why did I not come up with it myself?