07-17-2009 3:18 PM
2 questions (not really sure if it's against the rules to pose two questions in one post, but here they go):
Question 1: We have in MSEG one index by CHARG (MSEGZ01) and another one by WERKS + MATNR (MSEGM)
(statistics 100%)
nr rows: 24,991,486
matnr: 225,480
werks: 90
charg: 6,882,957
CHARG is in fact more selective than MATNR + WERKS,
even if nr distinct values charg < nr distinct values werks x nr distinct values matnr
(because in reality each material appears only in 1 or 2 plants)
The following query, then, uses MSEGM and not MSEGZ01 (which would be faster).
select *
from mseg
where mandt = :a1
and werks = :a2
and matnr = :a3
and charg = :a4
Apart from using hints (I don't want to force Z01 because the report has other optional parameters, and if filled other indexes should be selected), any ideas on how this is usually solved? For now the solution we have is to not use WERKS in the where condition (and then CHARG is more selective than MATNR, and Z01 is used).
Question 2: In our MSEG, the density for CHARG is not 1/distinct values (and that happens, I think, because we have MSEG with histograms - don't ask me why, but we do). Question is, how is this density calculated?
select table_name, column_name, num_distinct, 1/num_distinct, density, num_buckets
from all_tab_columns
where table_name = 'MSEG'
and column_name = 'CHARG'
TABLE_NAME COLUMN_NAME NUM_DISTINCT 1/NUM_DISTINCT DENSITY NUM_BUCKETS
------------------------------ ------------------------------ ------------ -------------- ---------- -----------
MSEG CHARG 6882957 1,4529E-07 ,000115903 61
I saw somewhere that the formula would be
sum(count^2) / ( sum(count)^2 )
but that doesn't seem to produce the expected density:
select sum(sq) / (sum(cnt) * sum(cnt))
from ( select charg, count(*)*count(*) sq, count(*) cnt from sapr3.mseg group by charg )
SUM(SQ)/(SUM(CNT)*SUM(CNT))
---------------------------
,038637666
07-17-2009 4:02 PM
Hi Rui Dantas,
for 1:
could you post your index statistics, please?
i assume both indexes contain client as first field?
starting with 10g distinct keys (index statistic) plays an important role.
Furthermore you probably don't want to use subsitute values, right? (which could make a change)
Furthermore i found out that in10.2.0.4 distinct keys are not only relvant for index selectivty but for
talbe selectivty as well which makes a change as well.
And as always, make sure you have set the current parameter recommendations for your db
(use the attached sql in note 1171650 for an automated Oracle DB parameter check)
for 2:
what i found in the net somewhere:
density = u03A3 cnt2 / ( num_rowsu02DC * u03A3 cnt )
u201Cthe sum of the squared frequencies of all non-popular values divided by the
sum of the frequencies of all non-popular values times the count of rows
with not null values of the histogram columnu201D.
But i didn't found a definition for "frequencies of all non-popular values"...
and another update:
The calculation of density in case of a histogram is documented in the US
patent 6732085:
http://www.freepatentsonline.com/6732085.html
Kind regards,
Hermann
Edited by: Hermann Gahm on Jul 17, 2009 5:16 PM
07-17-2009 4:02 PM
Hi Rui Dantas,
for 1:
could you post your index statistics, please?
i assume both indexes contain client as first field?
starting with 10g distinct keys (index statistic) plays an important role.
Furthermore you probably don't want to use subsitute values, right? (which could make a change)
Furthermore i found out that in10.2.0.4 distinct keys are not only relvant for index selectivty but for
talbe selectivty as well which makes a change as well.
And as always, make sure you have set the current parameter recommendations for your db
(use the attached sql in note 1171650 for an automated Oracle DB parameter check)
for 2:
what i found in the net somewhere:
density = u03A3 cnt2 / ( num_rowsu02DC * u03A3 cnt )
u201Cthe sum of the squared frequencies of all non-popular values divided by the
sum of the frequencies of all non-popular values times the count of rows
with not null values of the histogram columnu201D.
But i didn't found a definition for "frequencies of all non-popular values"...
and another update:
The calculation of density in case of a histogram is documented in the US
patent 6732085:
http://www.freepatentsonline.com/6732085.html
Kind regards,
Hermann
Edited by: Hermann Gahm on Jul 17, 2009 5:16 PM
07-17-2009 4:27 PM
Hi Hermann,
for 1:
i use 9.2.
i am not exactly sure what info you are asking me, but:
NONUNIQUE Index MSEG~Z01
Column Name #Distinct
CHARG 6,882,957
NONUNIQUE Index MSEG~M
Column Name #Distinct
MANDT 1
MATNR 225,480
WERKS 90
LGORT 694
BWART 152
SOBKZ 7
So, there are in fact some things I ommited to make it simpler:
- MSEG~M has MANDT (but only 1 distinct value, so it shouldn't make a difference)
- MSEG~M has other fields (after the 3 we use, so it shouldn't make a difference)
I understand that the selectivity for two fields (in this case MATNR + WERKS) is the product of the selectivity of each field, so it is expected that it is higher than CHARG alone. The problem is that in reality selectivity (matnr + werks) is much lower than selectivity (matnr) * selectivity (werks), because all materials do not exist in all plants.
I suppose this is a frequent problem for db optimization (and cause of wrong cost estimation), so that's why I was asking what is usually done in these situations.
for 2:
i also thought just after posting that I would have to remove the popular values, because they are problably not used in the calculation of the average density. I'll check if there are popular values in this table's histograms, and also look more carefully to your formula, which seems different from what I had found.
Thanks.
07-17-2009 5:52 PM
Hi Rui Dantas,
for 1.
in the ST05 popup in the bottom there should be a button "index statistics".
These values are interesting for both indexes.
the basic problem is that the optimizer assumes that WERKS and MATNR
are not correlated. But in your database there is some correlation between
those two.
Oracle 9i: Correlations are not considered at all.
Oracle 10.2.0.2: Correlations are only considered for the index access
by taking into account the distinct keys of the index.
Oracle 10.2.0.4: Correlations are considered using dynamic sampling
and simplified column group statistics.
Oracle 11g: General column group statistics are available
(independent of existing indexes).
So what to do?
It's not easy and i don't know if there is a real good solution.
What i did in the past was:
- hint
- patch the satistics (e.g. make CHARG more attractive (more distinct values))
- some self made optimization in abap (evaluate where and hint in special cases)
- substitute values and histograms
- recomend an upgrade from 10.2.0.2 to 10.2.0.4
each thing has pros- and cons- ... sorry
Kind regards,
Hermann
07-17-2009 6:30 PM
Hi Hermann,
Perfect, thanks. The abap dynamic thing is what we've been doing; I guess I'll have to wait for 10g for further improvements.
By the way, about your "patch statistics" solution: to avoid this value being overwritten every time statistics are calculated, can I "freeze" only that column, or would I have to lock the whole table against stat updates (like is done, for example, in oss note 1020260)?
The index statistics:
NONUNIQUE Index MSEG~Z01
Last statistics date 2009-07-11
Analyze Method mple 24,991,486 Rows
Levels of B-Tree 2
Number of leaf blocks 93,802
Number of distinct keys 6,882,957
Average leaf blocks per key 1
Average data blocks per key 2
Clustering factor 18,991,009
NONUNIQUE Index MSEG~M
Last statistics date 2009-07-11
Analyze Method mple 24,991,486 Rows
Levels of B-Tree 3
Number of leaf blocks 183,363
Number of distinct keys 872,451
Average leaf blocks per key 1
Average data blocks per key 19
Clustering factor 16,626,875
Again, thanks a lot for the expert tips.
07-18-2009 7:45 AM
Hi Rui Dantas,
thanks for the statistics. The distinct keys are there but not used for cost calculation in 9i.
In 10g the selectivity would be: min(distinct(MATNR) * distinct(WERKS) ; distinct key of that index) or something
like that... . but i'm not sure if with binds and histograms density will still be used for selectivity... . Maybe you
can try to adjust the density of that column... .
In ORACLE 9 you can only lock the table in DBSTATC (that is SAP level and brtools evaluate it). On DB level
the stats are not locked and can be overwritten in ORACLE 9i. In ORACLE 10g one can lock statistics at
DB level.
Yes, wait for 10g... you'll enjoy...
-much more fun and things to do with new features... solutions for old problems... and new problems...
and new methods to approach them...
Kind regards,
Hermann
07-19-2009 11:03 AM
You should also remember that one seldom goes against MSEG directly.
If you cannot restrict the access through MSKPF (? parent) you should look at the number of index tables (4 to 6? 8?)
This way you can access a restricted set of keys that will give you the right information to access MSEG by key, or not at all if the information you need is already in one of the index tables...
Have fun
07-17-2009 4:31 PM
Hi Rui,
I agree with you to do not force an index.
Try use substitute values hint as Hermann suggested.
With the values probably the optimizer will choose your Z01 index.
%_HINTS ORACLE '&SUBSTITUTE VALUES&'
Regards, Fernando Da Ró
07-22-2009 4:21 PM
Hi.
>
> Hi Rui,
> Try use substitute values hint as Hermann suggested.
> With the values probably the optimizer will choose your Z01 index.
>
>
%_HINTS ORACLE '&SUBSTITUTE VALUES&'
SUBSTITUTE VALUES is more difficult to test to know for sure (because in some cases Z01 would be used and not in others), but I would say the problem would still be there (oracle doesn't know the correlation between the two fields), so it would still wrongly estimate the number of rows. Anyway I would rather not use SUBSTITUTE VALUES.
>
> If the index Z01 is your index, then you could change that index, why don't you add werks or matnr as an additional field?
> WERKS is short.
Yep, well thought, werks would solve the problem. MSEG~Z01 is already around 600 Mb only with CHARG, so I was hoping not to have to make it bigger, but thanks for the suggestion.
Anyway, I consider the 1st part of the question answered with the correlations thing in 10g.
For the 2nd part, I still haven't found the time to test the formula, but I'll get there soon.
07-19-2009 2:21 PM
If the index Z01 is your index, then you could change that index, why don't you add werks or matnr as an additional field?
WERKS is short.
Siegfried
07-20-2009 9:21 AM