Solved: Column(s) selectivity

former_member182566 · ‎07-17-2009

2 questions (not really sure if it's against the rules to pose two questions in one post, but here they go):

Question 1: We have in MSEG one index by CHARG (MSEG_{Z01) and another one by WERKS + MATNR (MSEG}M)

(statistics 100%)

nr rows: 24,991,486
matnr:      225,480     
werks:           90
charg:    6,882,957

CHARG is in fact more selective than MATNR + WERKS,

even if nr distinct values charg < nr distinct values werks x nr distinct values matnr

(because in reality each material appears only in 1 or 2 plants)

The following query, then, uses MSEG_{M and not MSEG}Z01 (which would be faster).

select *
from   mseg 
where  mandt = :a1
and    werks = :a2
and    matnr = :a3
and    charg = :a4

Apart from using hints (I don't want to force Z01 because the report has other optional parameters, and if filled other indexes should be selected), any ideas on how this is usually solved? For now the solution we have is to not use WERKS in the where condition (and then CHARG is more selective than MATNR, and Z01 is used).

Question 2: In our MSEG, the density for CHARG is not 1/distinct values (and that happens, I think, because we have MSEG with histograms - don't ask me why, but we do). Question is, how is this density calculated?


select table_name, column_name, num_distinct, 1/num_distinct, density, num_buckets
from   all_tab_columns
where  table_name = 'MSEG'
and   column_name = 'CHARG'

TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT 1/NUM_DISTINCT    DENSITY NUM_BUCKETS
------------------------------ ------------------------------ ------------ -------------- ---------- -----------
MSEG                           CHARG                               6882957     1,4529E-07 ,000115903          61

I saw somewhere that the formula would be

sum(count^2) / ( sum(count)^2 )

but that doesn't seem to produce the expected density:

select sum(sq) / (sum(cnt) * sum(cnt))
from ( select charg, count(*)*count(*) sq, count(*) cnt from sapr3.mseg group by charg )

SUM(SQ)/(SUM(CNT)*SUM(CNT))
---------------------------
                 ,038637666

former_member192616 · ‎07-17-2009

Hi Rui Dantas,

for 1:

could you post your index statistics, please?

i assume both indexes contain client as first field?

starting with 10g distinct keys (index statistic) plays an important role.

Furthermore you probably don't want to use subsitute values, right? (which could make a change)

Furthermore i found out that in10.2.0.4 distinct keys are not only relvant for index selectivty but for

talbe selectivty as well which makes a change as well.

And as always, make sure you have set the current parameter recommendations for your db

(use the attached sql in note 1171650 for an automated Oracle DB parameter check)

for 2:

what i found in the net somewhere:

density = u03A3 cnt2 / ( num_rowsu02DC * u03A3 cnt )

u201Cthe sum of the squared frequencies of all non-popular values divided by the

sum of the frequencies of all non-popular values times the count of rows

with not null values of the histogram columnu201D.

But i didn't found a definition for "frequencies of all non-popular values"...

and another update:

The calculation of density in case of a histogram is documented in the US

patent 6732085:

http://www.freepatentsonline.com/6732085.html

Kind regards,

Hermann

Edited by: Hermann Gahm on Jul 17, 2009 5:16 PM

former_member192616 · ‎07-17-2009

Hi Rui Dantas,

for 1:

could you post your index statistics, please?

i assume both indexes contain client as first field?

starting with 10g distinct keys (index statistic) plays an important role.

Furthermore you probably don't want to use subsitute values, right? (which could make a change)

Furthermore i found out that in10.2.0.4 distinct keys are not only relvant for index selectivty but for

talbe selectivty as well which makes a change as well.

And as always, make sure you have set the current parameter recommendations for your db

(use the attached sql in note 1171650 for an automated Oracle DB parameter check)

for 2:

what i found in the net somewhere:

density = u03A3 cnt2 / ( num_rowsu02DC * u03A3 cnt )

u201Cthe sum of the squared frequencies of all non-popular values divided by the

sum of the frequencies of all non-popular values times the count of rows

with not null values of the histogram columnu201D.

But i didn't found a definition for "frequencies of all non-popular values"...

and another update:

The calculation of density in case of a histogram is documented in the US

patent 6732085:

http://www.freepatentsonline.com/6732085.html

Kind regards,

Hermann

Edited by: Hermann Gahm on Jul 17, 2009 5:16 PM

former_member182566 · ‎07-17-2009

Hi Hermann,

for 1:

i use 9.2.

i am not exactly sure what info you are asking me, but:

NONUNIQUE  Index   MSEG~Z01                     

Column Name                     #Distinct       
CHARG                                  6,882,957
                                                                                NONUNIQUE  Index   MSEG~M                       
                                                
Column Name                     #Distinct                                                      
MANDT                                          1
MATNR                                    225,480
WERKS                                         90
LGORT                                        694
BWART                                        152
SOBKZ                                          7

So, there are in fact some things I ommited to make it simpler:

- MSEG~M has MANDT (but only 1 distinct value, so it shouldn't make a difference)

- MSEG~M has other fields (after the 3 we use, so it shouldn't make a difference)

I understand that the selectivity for two fields (in this case MATNR + WERKS) is the product of the selectivity of each field, so it is expected that it is higher than CHARG alone. The problem is that in reality selectivity (matnr + werks) is much lower than selectivity (matnr) * selectivity (werks), because all materials do not exist in all plants.

I suppose this is a frequent problem for db optimization (and cause of wrong cost estimation), so that's why I was asking what is usually done in these situations.

for 2:

i also thought just after posting that I would have to remove the popular values, because they are problably not used in the calculation of the average density. I'll check if there are popular values in this table's histograms, and also look more carefully to your formula, which seems different from what I had found.

Thanks.

former_member192616 · ‎07-17-2009

Hi Rui Dantas,

for 1.

in the ST05 popup in the bottom there should be a button "index statistics".

These values are interesting for both indexes.

the basic problem is that the optimizer assumes that WERKS and MATNR

are not correlated. But in your database there is some correlation between

those two.

Oracle 9i: Correlations are not considered at all.

Oracle 10.2.0.2: Correlations are only considered for the index access

by taking into account the distinct keys of the index.

Oracle 10.2.0.4: Correlations are considered using dynamic sampling

and simplified column group statistics.

Oracle 11g: General column group statistics are available

(independent of existing indexes).

So what to do?

It's not easy and i don't know if there is a real good solution.

What i did in the past was:

- hint

- patch the satistics (e.g. make CHARG more attractive (more distinct values))

- some self made optimization in abap (evaluate where and hint in special cases)

- substitute values and histograms

- recomend an upgrade from 10.2.0.2 to 10.2.0.4

each thing has pros- and cons- ... sorry

Kind regards,

Hermann

former_member182566 · ‎07-17-2009

Hi Hermann,

Perfect, thanks. The abap dynamic thing is what we've been doing; I guess I'll have to wait for 10g for further improvements.

By the way, about your "patch statistics" solution: to avoid this value being overwritten every time statistics are calculated, can I "freeze" only that column, or would I have to lock the whole table against stat updates (like is done, for example, in oss note 1020260)?

The index statistics:

NONUNIQUE Index MSEG~Z01

Last statistics date 2009-07-11

Analyze Method mple 24,991,486 Rows

Levels of B-Tree 2

Number of leaf blocks 93,802

Number of distinct keys 6,882,957

Average leaf blocks per key 1

Average data blocks per key 2

Clustering factor 18,991,009

NONUNIQUE Index MSEG~M

Last statistics date 2009-07-11

Analyze Method mple 24,991,486 Rows

Levels of B-Tree 3

Number of leaf blocks 183,363

Number of distinct keys 872,451

Average leaf blocks per key 1

Average data blocks per key 19

Clustering factor 16,626,875

Again, thanks a lot for the expert tips.

former_member192616 · ‎07-18-2009

Hi Rui Dantas,

thanks for the statistics. The distinct keys are there but not used for cost calculation in 9i.

In 10g the selectivity would be: min(distinct(MATNR) * distinct(WERKS) ; distinct key of that index) or something

like that... . but i'm not sure if with binds and histograms density will still be used for selectivity... . Maybe you

can try to adjust the density of that column... .

In ORACLE 9 you can only lock the table in DBSTATC (that is SAP level and brtools evaluate it). On DB level

the stats are not locked and can be overwritten in ORACLE 9i. In ORACLE 10g one can lock statistics at

DB level.

Yes, wait for 10g... you'll enjoy...

-much more fun and things to do with new features... solutions for old problems... and new problems...

and new methods to approach them...

Kind regards,

Hermann

Former Member · ‎07-19-2009

You should also remember that one seldom goes against MSEG directly.

If you cannot restrict the access through MSKPF (? parent) you should look at the number of index tables (4 to 6? 8?)

This way you can access a restricted set of keys that will give you the right information to access MSEG by key, or not at all if the information you need is already in one of the index tables...

Have fun

former_member182114 · ‎07-17-2009

Hi Rui,

I agree with you to do not force an index.

Try use substitute values hint as Hermann suggested.

With the values probably the optimizer will choose your Z01 index.

%_HINTS ORACLE '&SUBSTITUTE VALUES&'

Regards, Fernando Da Ró

former_member182566 · ‎07-22-2009

Hi.

>


> Hi Rui,
> Try use substitute values hint as Hermann suggested.
> With the values probably the optimizer will choose your Z01 index.
> 
> %_HINTS ORACLE '&SUBSTITUTE VALUES&'

SUBSTITUTE VALUES is more difficult to test to know for sure (because in some cases Z01 would be used and not in others), but I would say the problem would still be there (oracle doesn't know the correlation between the two fields), so it would still wrongly estimate the number of rows. Anyway I would rather not use SUBSTITUTE VALUES.

>


> If the index Z01 is your index, then you could change that index, why don't you add werks or matnr as an additional field?
> WERKS is short.

Yep, well thought, werks would solve the problem. MSEG~Z01 is already around 600 Mb only with CHARG, so I was hoping not to have to make it bigger, but thanks for the suggestion.

Anyway, I consider the 1st part of the question answered with the correlations thing in 10g.

For the 2nd part, I still haven't found the time to test the formula, but I'll get there soon.

former_member194613 · ‎07-19-2009

If the index Z01 is your index, then you could change that index, why don't you add werks or matnr as an additional field?

WERKS is short.

Siegfried

former_member194613 · ‎07-20-2009

???