
DB statistics for BW DSO table

Former Member
0 Kudos

Hello,

We have performance problems with one of our DSO tables, which contains 16 million records of CATS data.

The ABAP program reads from the DSO table via a secondary index (the trace shows that the index is used), but we still get a CPU cost of about 500k and an I/O cost of 280.

I was looking at the statistics for this table, and two things got my attention:

1. The statistics are about one year old (last date 17.06.2009)

2. The deviation is 24% over the last year (3.2m changed / 1.6m added)

Are the statistics causing the problems with the select on the index? And why doesn't BRCONNECT update the statistics automatically? It runs every day and does update quite a lot of other tables.

Accepted Solutions (1)

lbreddemann
Active Contributor
0 Kudos

> We have performance problems with one of our DSO tables, which contains 16 million records of CATS data.

> The ABAP program reads from the DSO table via a secondary index (the trace shows that the index is used), but we still get a CPU cost of about 500k and an I/O cost of 280.

As far as I understand your post, you are happy with the index access - so what's the issue?

As long as the correct execution plan is chosen, the absolute costs are not that important.

Much more important is: how much actual work is necessary for your ABAP program's query?

Maybe the index is not optimal for the selection!?

Maybe too much data is selected?

> I was looking at the statistics for this table, and two things got my attention:

> 1. The statistics are about one year old (last date 17.06.2009)

So what? Statistics don't get old; they stay "fresh". There is no "best before" date for statistics.

The only thing that can happen is that your data changes over time and the statistics no longer fit it properly.

> 2. The deviation is 24% over the last year (3.2m changed / 1.6m added)

One quarter of the data is usually not that much - but since certain attributes of the statistics can change heavily even when only a small part of the data is changed, this is just a rule of thumb.

We would have to know more about the statistics as they are now and about the data to tell you more.

regards,

Lars

Former Member
0 Kudos

It's a lookup for a data load process.

What it does is select all records for a given employee.


SELECT
  "CALDAY" , "REPTT" , "QUANTITY"
FROM
  "/BIC/AEAHRO00400"
WHERE
  "EMPLOYEE" = :A0


Execution Plan

-----------------------------------------------------------------------------------------
System: PBW

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |  2302 | 66758 |   269   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID| /BIC/AEAHRO00400   |  2302 | 66758 |   269   (0)|
|*  2 |   INDEX RANGE SCAN          | /BIC/AEAHRO0040003 |  2302 |       |    13   (0)|
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("EMPLOYEE"=:A0)
 

The secondary index contains only EMPLOYEE.
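
For reference, the secondary index from the plan presumably amounts to this (a sketch; the actual DDL was not posted):


-- Single-column secondary index backing the INDEX RANGE SCAN above
CREATE INDEX "/BIC/AEAHRO0040003" ON "/BIC/AEAHRO00400" ("EMPLOYEE");
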

We use the internal table in the rest of the program, so DB access is minimal. We used to do single selects for certain days to get the hours, but it didn't make much difference.

Since the execution plan looks good (correct me if I'm wrong, please), the statistics were my last resort.

lbreddemann
Active Contributor
0 Kudos

> Since the execution plan looks good (correct me if I'm wrong, please), the statistics were my last resort.

Please - think about this last sentence for a while!

What are the statistics for?

What is their sole purpose?

To provide the cost-based optimizer with information about the data in your tables so that it can figure out the best way to execute your query.

Now you say: the execution plan is good, but maybe the statistics are making the performance bad!?

You certainly see the logical mistake here.

The important next question is (and I posted it before): what is the actual work to be done for this query?

Are really just 2302 rows read?

Are really just 269 logical IOs done?

Depending on what the answers to these questions are, other tuning options may turn up.
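
A quick way to check this on the Oracle side is the cursor cache (a sketch; the LIKE filter is just one way to find the statement):


-- Real execution figures for the statement:
-- buffer_gets/executions gives the logical I/Os per execution
SELECT sql_id, executions, buffer_gets, disk_reads, rows_processed
  FROM v$sql
 WHERE sql_text LIKE '%/BIC/AEAHRO00400%';
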

regards,

Lars

Former Member
0 Kudos

> Please - think about this last sentence for a while!

> What are the statistics for?

> What is their sole purpose?

> To provide the cost-based optimizer with information about the data in your tables so that it can figure out the best way to execute your query.

> Now you say: the execution plan is good, but maybe the statistics are making the performance bad!?

> You certainly see the logical mistake here.

The way I thought about it was that maybe the CBO could not determine the right way to access the table because the statistics haven't been updated. So, because the index is hit, the CBO has done a good job - is that what you are saying?

> The important next question is (and I posted it before): what is the actual work to be done for this query?

> Are really just 2302 rows read?

> Are really just 269 logical IOs done?

> Depending on what the answers to these questions are, other tuning options may turn up.

It is not a query but an initial data load for BW, which loads about 10m records, and for a subset of these the BW data load loops through the table I mentioned before. So the select statement gets executed quite a lot.

I need to know the number of hours someone has worked in a given period of time, and that for almost every one of the 10m records. E.g. how many hours has employee 123455 worked in the past week/2 weeks/4 weeks/16 weeks in relation to a given date?

So my thought was to select all the dates (records from the /bic/aeahro00400 table) and process them in ABAP rather than do a select with LT/GE in the WHERE clause, because my understanding is that only EQ predicates will hit the index.

lbreddemann
Active Contributor
0 Kudos

> It is not a query but an initial data load for BW, which loads about 10m records, and for a subset of these the BW data load loops through the table I mentioned before. So the select statement gets executed quite a lot.

Database-wise it is a query.

> I need to know the number of hours someone has worked in a given period of time, and that for almost every one of the 10m records. E.g. how many hours has employee 123455 worked in the past week/2 weeks/4 weeks/16 weeks?

Ever thought about using aggregation and grouping at the DB level?

> So my thought was to select all the dates (records from the /bic/aeahro00400 table) and process them in ABAP rather than do a select with LT/GE in the WHERE clause, because my understanding is that only EQ predicates will hit the index.

Your understanding is wrong!

What you're doing seems to be slow-by-slow processing of something that could/should be done directly at the DB level.

If I understand your requirement correctly, it might be better to create an index on the time column and have the database do a range scan against it.

That way, even if you just use standard SQL features (e.g. no analytic functions), you would do 4 index range scans instead of one select for every employee there is.
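
A minimal sketch of that idea, assuming the column names from the plan above (the index name and the bind variable names are illustrative, not from the original system):


-- Illustrative index on the time column
CREATE INDEX "/BIC/AEAHRO00400~CAL" ON "/BIC/AEAHRO00400" ("CALDAY");

-- One range scan plus aggregation per period, instead of one select per employee
SELECT "EMPLOYEE", SUM("QUANTITY") AS total_hours
  FROM "/BIC/AEAHRO00400"
 WHERE "CALDAY" BETWEEN :period_start AND :period_end
 GROUP BY "EMPLOYEE";


Run once per period (week/2 weeks/4 weeks/16 weeks), that gives the per-employee totals in 4 scans.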

regards,

Lars

Former Member
0 Kudos

OK, I will try that. Thanks!

Answers (3)

Former Member
0 Kudos

Thanks a lot.

volker_borowski2
Active Contributor
0 Kudos

Hi,

RSANAORA might be able to do what you like, depending on release and SP level.

But as Lars pointed out, think about whether it really makes sense, especially if the table is big.

When this is an initial load, it might be a very good option to have statistics runs in the very early phase of the load.

Reason: the stats tell the optimizer the table is empty (i.e. 8 blocks), so in case of a select you might get full scans of tables or small indexes, and in this early phase these may indeed be "the best" access plans. But after the first 30 seconds or so the table holds its first MBs, and now things start to go wrong.

"New" or "fresh" stats make a huge difference when an object changes from small to midsize.

If a table is midsize or big, stats can be years old and still be "good" enough to provide the required information.
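
To see what the optimizer currently believes about such a table, a check like this helps (a sketch; an owner filter would normally be added):


-- Optimizer's current picture of the table: size and statistics age
SELECT num_rows, blocks, last_analyzed
  FROM dba_tables
 WHERE table_name = '/BIC/AEAHRO00400';
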

Hope this helps

Volker

Former Member
0 Kudos

I have a question about running DB statistics for DSO tables from a process chain. There is a way to run DB statistics from a process chain for InfoCubes, DTPs and InfoPackages, but I would like to run it for some DSO tables. Maybe with an ABAP program?

Thanks!

lbreddemann
Active Contributor
0 Kudos

Hello Lyudmilla,

To my knowledge, there isn't a report or any other option to schedule the collection of DB optimizer statistics for DSO objects.

That's likely because once the DSO tables have left their initial empty state and some data has been loaded, the statistics will

a) be covered by the standard statistics job whenever necessary

and

b) not make a difference concerning execution plans.

As DSO active data tables usually have only one key index and maybe a custom-built index for reporting, the available access paths for the data will very likely dominate the optimizer's calculation.

In plain English: it's not there because you very likely wouldn't need it.

If you do in fact feel that you need it, this should be closely analyzed.

It really shouldn't be necessary to calculate new statistics every time you load/delete data; the statistics should provide a model of the data distribution that is good enough for the optimizer to come up with a good execution plan.

best regards,

Lars

anindya_bose
Active Contributor
0 Kudos

It could be that the threshold value for automatic statistics updates has been set above 24% (e.g. 25% or 50%).

So unless the variation exceeds that threshold, the statistics will not be updated automatically. You can update them from DB20.

Steps:

1. Enter the table name

2. Press refresh

3. If the difference is huge, press "Create statistics" (as this operation may take time, run it in a quiet period).

Before doing this, you can activate and adjust the table indexes from SE14.
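
On the Oracle side, the manual collection behind DB20 corresponds roughly to this (a sketch; the owner name is a placeholder that depends on your installation):


-- Gather fresh optimizer statistics for the DSO active table
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'SAPR3',   -- placeholder owner, check your schema
    tabname => '/BIC/AEAHRO00400',
    cascade => TRUE);     -- refresh the index statistics as well
END;
/
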

Regards

Anindya