Application Development Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.

Table Design and Performance Issues

Former Member

Hi,

I need to fetch order numbers for a given plant, workcenter and for scheduled finish date from AFKO table.

There are various conditions involved as in :

a) To check whether it is a released order

b) To check whether it is a created order etc.

These conditions bring in joins and, in some places, SELECT ... FOR ALL ENTRIES queries.

The report works fine; the only issue is performance, especially with the AFKO and AFPO tables. These tables have close to 500,000 records each, so fetching data from them on non-key fields (plant and date) takes close to 25 seconds even in SE11. Executing the report takes about one and a half minutes, given the additional join conditions and FOR ALL ENTRIES selections.
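For context, the expensive access is essentially a selection on non-key fields, roughly along these lines (a simplified sketch; GLTRS is the scheduled finish date in AFKO, and the real report additionally joins AFPO and work-center data):

```abap
* Simplified sketch of the slow access path: AFKO is read on a
* non-key field, so the database cannot use the primary key.
TABLES: afko.

DATA: lt_afko TYPE TABLE OF afko.

SELECT-OPTIONS: s_gltrs FOR afko-gltrs.

START-OF-SELECTION.
  SELECT * FROM afko
    INTO TABLE lt_afko
    WHERE gltrs IN s_gltrs.   " full scan without a matching index
```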

For this, I have come up with three designs:


Design 1
----------
Run a program (PROGRAM1) in the background that, for the given user conditions, selects the required fields (approximately 10) from AFKO and AFPO and puts them into a Z-table. The next time the user runs his program (PROGRAM2), the data will be fetched from the Z-table. As the Z-table will hold fewer records, fetching them will take less time.
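A minimal sketch of what PROGRAM1 could do, assuming a hypothetical staging Z-table ZORD_CACHE with the required fields:

```abap
* PROGRAM1 (scheduled in background): refresh the staging Z-table.
* ZORD_CACHE and its structure are hypothetical.
DATA: lt_cache TYPE TABLE OF zord_cache.

SELECT k~aufnr k~gltrs p~dwerk
  FROM afko AS k INNER JOIN afpo AS p
    ON k~aufnr = p~aufnr
  INTO CORRESPONDING FIELDS OF TABLE lt_cache
  WHERE k~gltrs IN s_gltrs.

* Replace the previous snapshot in one unit of work.
DELETE FROM zord_cache.
INSERT zord_cache FROM TABLE lt_cache.
COMMIT WORK.
```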


Problems foreseen
-----------------
a) The Z-table will hold exactly the same data as AFKO and AFPO, so data is being duplicated.

b) The user will enter the data in this program and run it in background mode. If the user forgets to run this program, there is a possibility of stale data in the Z-table.

Design 2
---------
Instead of creating a Z-table, create a secondary index on AFKO based on date and plant, since date and plant will always be the conditions used to fetch from these tables.

Problems Foreseen
------------------
a) Not sure what implications creating an index on a base table will have on other programs.

b) Also not sure whether any extra access/authorization is required to create an index on a base table.

Design 3
---------
Similar to Design 1, but in this case I will store only the date (scheduled finish date) and the order number in the Z-table. The duplication problem is then limited to the date stored in the Z-table. As the date is one of the criteria used to search for the order numbers, if I create an index on this Z-table on the date, the response would be quick; afterwards I would be searching AFKO on its primary key, so again the response would be quick (considering that AFKO is already keyed on AUFNR, the order number).
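The two-step read in Design 3 might look like this (ZORD_DATE is a hypothetical Z-table holding only the scheduled finish date and the order number):

```abap
* Step 1: narrow down the order numbers via the slim, indexed Z-table.
DATA: lt_orders TYPE TABLE OF zord_date,
      lt_afko   TYPE TABLE OF afko.

SELECT * FROM zord_date
  INTO TABLE lt_orders
  WHERE gltrs IN s_gltrs.

* Step 2: read AFKO on its primary key field AUFNR.
IF NOT lt_orders[] IS INITIAL.
  SELECT * FROM afko
    INTO TABLE lt_afko
    FOR ALL ENTRIES IN lt_orders
    WHERE aufnr = lt_orders-aufnr.
ENDIF.
```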

Problems Foreseen
------------------
As mentioned earlier, duplication has now been reduced to a single field, and the possibility of stale data has likewise been reduced to the date field.

Especially for the designs that involve a Z-table, some prime questions I should be asking are:

a) How frequently will PROGRAM1 be running ? (Every day)

b) How much data will this Z-table store?

c) When will the records in the Z-table be deleted? (Otherwise it would be just another AFKO table and the response time would be just as slow.)

Are there any better designs than these to nail this problem? Kindly share. I especially want a design that removes the need for a Z-table. Is this possible?

I am aware that there are some open points which you may want to question. Feel free, as design involves a lot of questions, although I have tried my best to elucidate my points.

Thank you for going through this entire post.

Regards,

Subramanian V.

1 ACCEPTED SOLUTION

Former Member

Hello Subramanian,

Just a few thoughts of mine here -

1. As you have rightly said, maintaining the Z-table is going to be a cause for concern. Also, the number of entries in the Z table might even become greater than the number of entries in the actual table, because each user would be giving different selection criteria. Isn't that correct?

2. Have you explored all the avenues to optimize the ABAP part of the program? Indexes and ABAP optimization could give you some good results.

3. If you have tried all the possible options that you know (including the previous suggestion), you might want to consider doing some kind of parallel processing. You could split the selection conditions and use something like

 CALL FUNCTION func IN BACKGROUND TASK.

or

 CALL FUNCTION func STARTING NEW TASK taskname.

I shall let you know if I can think of something better.

Regards,

Anand Mandalika.

11 REPLIES

Former Member

Hi Subramanian,

I faced the same situation here. Have a look at this as well:

1. You can use PROGRAM1 to create a file that stores the data extracted from the AFKO and AFPO tables daily, weekly or monthly. You can use a file naming convention such as YYYYMMDD (with extensions).

2. When you run PROGRAM2, it should generate the report from the data present in the generated file.

3. Thus, as per your suggestion, there will be no duplicate Z table, rather a file, and hence the delay will be small. Each file will be distinct, as the names are generated based on the date, so reporting becomes easier if a report is to be generated for monthly, weekly or daily data updates.

Hope this is another option without creating a Z table.

If you come across better solution please let me know,

Thanks and Regards,

Kathirvel


Hi Kathirvel,

Thanks for going through that entire post of mine and posting a reply. I am quite skeptical about having a file store database data. Duplication still exists, not in tables but in files, and having to store this data date-wise in a file increases the complexity multifold. Fetching data would take much more time, as I cannot apply any search criteria to the file; I have to read the entire file.

Nevertheless, it is an option. I shall keep you posted, once I get a solution.

Regards,

Subramanian V.

Former Member

Couple of quick thoughts on a lazy Saturday afternoon. Hope all are having a nice weekend.

In regards to avoiding a Z table, you can always do an EXPORT to INDX, and likewise an IMPORT from INDX later.
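A quick sketch of the INDX route (the area 'ZZ' and the ID value here are arbitrary choices):

```abap
* Write a selection result to the INDX cluster table ...
EXPORT lt_orders = lt_orders
  TO DATABASE indx(zz) ID 'ORDER_SNAPSHOT'.

* ... and read it back later, possibly from a different program.
IMPORT lt_orders = lt_orders
  FROM DATABASE indx(zz) ID 'ORDER_SNAPSHOT'.
IF sy-subrc <> 0.
  " no snapshot available yet - select from AFKO/AFPO instead
ENDIF.
```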

In regards to creating an index, I always try to remember that we spent a lot of money to buy a database management system, and indexing capability is one of the features you paid for. An index should have only a positive or neutral impact on other reading programs, with a small and probably undetectable decrease in performance for updating programs.

The Z table you are describing is very much like the matchcodes that we used heavily in R/2 and that are now considered obsolete in R/3. As of 4.6C, matchcodes are hidden under a menu option in SE11.

Maintaining another table is just a different way of adding work to the database manager, but in a way that no other users would ever be able to take advantage of. You might as well add the index instead; then not only will your program be able to use it, but so will other tools like SE16.

Enough rambling for now, back to painting the kitchen but at least the football game will be on soon!



Thanks for your response, Poornanand.

a) For point no. 1, the entries will not be greater than in the base table, because the entries are copied from the base table; even if a new user runs the program, data is not stored per user or per time. If the same data is fetched, it will simply be overwritten.

b) For point 2, yes, I have explored the index route; unfortunately, the search criteria do not fall within any secondary index already defined. Perhaps I need to create a new index. All that I am fretful of is that it should not have ripple effects on other transactions/programs.

c) For point 3, it would have been wonderful to do that, but those select queries are interdependent, so I cannot do even that.

Regards,

Subramanian V.


Hi Subramanian,

Since you have already tried out the first two options, let us have a deeper look at the third option - parallel processing.

You have mentioned that this option cannot be employed because <i>the select queries are interdependent</i>. This is something I'm not very clear about. Assuming that the report has a selection screen, I was suggesting that you may be able to split the <i>conditions for selection</i>, not the selection process itself.

For example, let us say there are two (<i>interdependent</i>, as you say) SELECT statements which run for, say, 6 plants. Now you can run three different tasks in which the two SELECTs run for 2 plants each.

Hope you got the point. If you think that I have erred somewhere, then please oblige to correct me. Else, let me know what you think of this approach.

Regards,

Anand Mandalika.


Thanks for your suggestion.

I should close this thread, one of the reasons being that I just found out that we actually want stale data. As amusing and surprising as it may seem, that is still the requirement. So a Z-table has to be used, or this can be done using the change log method with the CDHDR and CDPOS tables.

Just for curiosity's sake, I would want to dwell more on the parallel processing of select queries. From your example, I figure that, assuming I have six plants in my internal table, I run my next select query in such a way that three new tasks are opened for the plants in that internal table (say, splitting the internal table into groups of 2). In the end, I collect all the results and display them.


For example:
Plant
-----
A
B
C
D
E
F

A , B --> Task 1
C , D --> Task 2
E , F --> Task 3
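In ABAP terms, that split could be dispatched roughly like this (Z_SELECT_ORDERS is a hypothetical RFC-enabled function module that runs the selects for a subset of plants):

```abap
DATA: lv_task(20) TYPE c,
      lv_index(4) TYPE n.

* One asynchronous task per group of two plants.
LOOP AT lt_plant_groups INTO ls_group.
  lv_index = sy-tabix.
  CONCATENATE 'TASK' lv_index INTO lv_task.
  CALL FUNCTION 'Z_SELECT_ORDERS'
    STARTING NEW TASK lv_task
    PERFORMING receive_result ON END OF TASK
    TABLES
      it_plants = ls_group-plants.
ENDLOOP.
```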

Let me know if I am on the right track.

Some questions that come into my mind are :

a) 6 plants split into groups of 2: what happens if there are 10, 100, 1000, 10000, a million and so forth? Perhaps I need a lesson on parallel processing concepts, eh!!

b) How do I know whether all the tasks have completed? I am sure there is some flag that I need to check. Can you throw some light on this point?

Thanks for sharing this piece of information.

Regards,

Subramanian V.

P.S. - Activity in ABAP has been so hectic, I had to go 4 pages back just to dig up my post.


In regards to "How do I know whether all tasks get completed?", view the help on CALL FUNCTION ... STARTING NEW TASK. It has example code at the end that shows how to know when the new tasks have completed and how to get data back from them.
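The pattern in that help example looks roughly like this (names here are illustrative; each completed task bumps a counter that the main program waits on):

```abap
DATA: gv_done TYPE i,
      gv_sent TYPE i.

* Callback invoked when an asynchronous task finishes.
FORM receive_result USING p_task.
  RECEIVE RESULTS FROM FUNCTION 'Z_SELECT_ORDERS'
    TABLES et_orders = lt_part.
  APPEND LINES OF lt_part TO gt_orders.
  gv_done = gv_done + 1.
ENDFORM.

* In the main program, after starting gv_sent tasks:
WAIT UNTIL gv_done >= gv_sent.
```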


Hello Subramanian,

How you would want to split the input is something that you will have to decide for yourself. Some questions you might want to consider would be :

1. How many parallel processes do you want to have?

2. Does your system have the resources to execute the required number of parallel tasks?

There's some excellent documentation on <a href="http://help.sap.com/saphelp_46c/helpdata/en/fa/096e92543b11d1898e0000e8322d00/content.htm"><b>Implementing Parallel Processing</b></a>, which I'm sure you will find to be very helpful.

Go through the information in the link and please do get back if you have any more doubts.

=====================================================================

By the way, since you are saying that you have a very large amount of data, one of the options that you must consider (which I guess is usually forgotten) is Extract Datasets (or field-groups). It might be construed that their use is restricted to cases where the lines have different structures, but in my experience they have, on occasion, given some really good results when handling large volumes of data. Please do give it a thought.
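For reference, the basic extract-dataset pattern is the following (the fields shown are illustrative):

```abap
TABLES: afko.

* Define the sort key (HEADER) and fill the extract dataset.
FIELD-GROUPS: header.
INSERT afko-gltrs afko-aufnr INTO header.

SELECT * FROM afko WHERE gltrs IN s_gltrs.
  EXTRACT header.
ENDSELECT.

* Extracts are sorted and processed outside the selection loop.
SORT.
LOOP.
  WRITE: / afko-aufnr, afko-gltrs.
ENDLOOP.
```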

Regards,

Anand Mandalika.

<i>P.S.</i> You can still use parallel processing if you decide to go for the Z-Table.

Former Member

Hi Subramanian,

I think your problem needs to be addressed through parallel processing. I had a report that was extrapolated to run for 9 days, processing 10 million customers. It never got that far, though, because it would crash after an hour or two (memory use).

After adapting it for parallel processing, the report ran in 15 minutes.

Send me an e-mail if you would like additional information, as a 3rd party parallel processing framework (that I am affiliated with) was used. I'm not sure it's appropriate to discuss it here in SDN.

You can e-mail me on sbardzn@basistzchnklkgizs.ckm (replace z with e, replace k with o).

Cheers,

Scott


Thanks Scott.

I would be very much interested in that additional information.

Regards,

Subramanian V.