Former Member

buffering data using hash table - is this wise?

Hi,

I'm required (for a forecasting Z-development) to fill a huge table with data from sales orders, deliveries, and the material master. Since the table is an infostructure and the users want to be able to search and report on quite a few material master fields, the resulting table has a long key, and therefore I need to take extra care that I fill it correctly.

For performance reasons: is it a good idea to use hashed tables (one for each relevant material master table, e.g. MARA and MVKE) as a data buffer in the function module I use to create 'initial' (= key-only) lines for this purpose? Or is there a better way to get consistency and performance at the same time?

(The hashed tables are global to the function group and are never refreshed by the function module; the function module is called about 20,000 times by the main program. On every call it looks into the hashed table first, and only when it doesn't find the information it needs there does it fetch it from the database.)
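
To make the pattern concrete, here is a minimal sketch of such a look-aside buffer, assuming a function group with a global hashed table keyed on MATNR (all names and the field selection are illustrative, not the actual development):

    " Global data in the function group's TOP include (illustrative)
    TYPES: BEGIN OF ty_mara_buf,
             matnr TYPE mara-matnr,
             mtart TYPE mara-mtart,
             matkl TYPE mara-matkl,
           END OF ty_mara_buf.

    DATA gt_mara_buf TYPE HASHED TABLE OF ty_mara_buf
                     WITH UNIQUE KEY matnr.

    " Function module: look in the buffer first, hit the database
    " only on a miss, and remember what was fetched
    FUNCTION z_get_mara_buffered.
    "  IMPORTING iv_matnr TYPE mara-matnr
    "  EXPORTING es_mara  TYPE ty_mara_buf

      DATA ls_buf TYPE ty_mara_buf.

      READ TABLE gt_mara_buf INTO es_mara
           WITH TABLE KEY matnr = iv_matnr.
      IF sy-subrc = 0.
        RETURN.                          " buffer hit
      ENDIF.

      " Buffer miss: fetch from the database and remember the result
      SELECT SINGLE matnr mtart matkl FROM mara
             INTO ls_buf
             WHERE matnr = iv_matnr.
      IF sy-subrc = 0.
        INSERT ls_buf INTO TABLE gt_mara_buf.
        es_mara = ls_buf.
      ENDIF.
    ENDFUNCTION.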


4 Answers

  • Best Answer
    Aug 22, 2007 at 11:28 AM

    hi Rob,

    If you use the READ TABLE statement many times on a huge internal table, it is wise to create it as a hashed table.
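
    For example (a minimal sketch; the table and key value are illustrative):

      " O(1) lookup on a hashed table - but only with the full unique key
      DATA: lt_mat TYPE HASHED TABLE OF mara WITH UNIQUE KEY matnr,
            ls_mat TYPE mara.

      READ TABLE lt_mat INTO ls_mat WITH TABLE KEY matnr = 'MAT-0001'.
      IF sy-subrc <> 0.
        " not in the table (yet)
      ENDIF.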

    hope this helps

    ec


    • Former Member

      Eric,

      I realize that. The thing is, the table starts off small (one entry) and is bound to get quite big in the end, since I keep adding an entry on every 'miss' (i.e. every record that wasn't in there yet).

      I also hope that the lifetime of the table (global in the function group) will be long enough (= the complete runtime of the report) so that I don't lose the buffer and have to start over, because fetching all that data from the database again is bound to cause a performance hit.

  • Aug 22, 2007 at 02:58 PM

    Hashed tables allow the fastest access of all internal tables, but you always need the full table key.

    I do not really understand your question:

    > Is it a good idea to use hashed tables (one for each relevant material master
    > table, e.g. MARA and MVKE) as a data buffer in the function module I use to
    > create 'initial' (= key-only) lines for this purpose?

    How many tables do you want to create, what are the keys, how do you access them, and how many lines do you expect?

    What is the problem?

    Siegfried


    • Former Member

      Is it safe to assume that the most performant way to give a function module a sort of 'look-aside buffer' (i.e. to limit the number of actual database reads) is to provide it with a global hashed table?

      'Safe' also means: can I be sure the function module's global hashed table won't be lost, unless I clear it myself, for as long as the main program keeps running? (It's perfectly all right, of course, that the hashed table gets deleted when the main program stops running.)

  • Aug 24, 2007 at 12:28 PM

    The hash key is available as long as the hashed table is there.

    Note that a COLLECT on a standard table builds an implicit hash key, but you must not change the standard table as long as you use COLLECT: changes destroy the implicit hash key, which makes the COLLECT slow. But that is something different.

    Hashed tables are the best choice if you know that you will always access single lines and you always have the full unique table key.

    Sorted tables are the best choice when you want to access key ranges, i.e. using only a leading part of the key.
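
    For illustration (a sketch only, assuming MVKE's key fields MATNR/VKORG/VTWEG):

      " Hashed: single-record access, full unique key required
      DATA: lt_hash  TYPE HASHED TABLE OF mvke
                     WITH UNIQUE KEY matnr vkorg vtweg,
            " Sorted: also supports partial access via the leading key
            lt_sort  TYPE SORTED TABLE OF mvke
                     WITH UNIQUE KEY matnr vkorg vtweg,
            ls_mvke  TYPE mvke,
            lv_matnr TYPE mvke-matnr.

      READ TABLE lt_hash INTO ls_mvke
           WITH TABLE KEY matnr = lv_matnr
                          vkorg = '1000'
                          vtweg = '10'.

      " Optimized range access on the sorted table (leading key part)
      LOOP AT lt_sort INTO ls_mvke WHERE matnr = lv_matnr.
        " one sales organization / channel line per iteration
      ENDLOOP.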

    Siegfried


  • Former Member
    Aug 24, 2007 at 02:41 PM

    There are a number of issues to consider:

    A hashed table is an efficient way to store a data buffer, but filling it one record at a time on each miss may not be the best way to fill it: an initial SELECT to populate it with all required values will be much more efficient in terms of database access, provided you can define an appropriate WHERE clause. If you are processing all or most of a table, reading it all in one go will be quickest.
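
    A sketch of the up-front variant (the WHERE restriction is purely illustrative; one array fetch replaces thousands of single SELECTs):

      DATA gt_mara_buf TYPE HASHED TABLE OF mara WITH UNIQUE KEY matnr.

      " One round trip to the database instead of one SELECT SINGLE per miss
      SELECT * FROM mara
             INTO TABLE gt_mara_buf
             WHERE mtart IN ('FERT', 'HALB').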

    Total available memory could be an issue. A 64-bit platform gives you the most memory, but there are a number of system-imposed limits which could mean this is still not enough.

    Are you buffering only the data you read, or also the data you will be writing to the new table? Buffering the data to write and then inserting it in one go may cause issues: you may need to commit updates every few thousand records, which also allows memory to be freed up. Attempting too many inserts between commits may cause the database to run out of space for rollback logging, and, depending on your database, there may also be space issues for redo logs.
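
    A hedged sketch of such packeted inserts (ZSALES_INFO and the packet size of 1,000 are purely illustrative):

      CONSTANTS lc_packet TYPE i VALUE 1000.

      DATA: gt_result TYPE STANDARD TABLE OF zsales_info, " filled elsewhere
            lt_packet TYPE STANDARD TABLE OF zsales_info,
            ls_line   TYPE zsales_info,
            lv_lines  TYPE i.

      LOOP AT gt_result INTO ls_line.
        APPEND ls_line TO lt_packet.
        DESCRIBE TABLE lt_packet LINES lv_lines.
        IF lv_lines >= lc_packet.
          INSERT zsales_info FROM TABLE lt_packet.
          COMMIT WORK.                " keep rollback/redo usage small
          CLEAR lt_packet.
        ENDIF.
      ENDLOOP.

      IF lt_packet IS NOT INITIAL.    " flush the last partial packet
        INSERT zsales_info FROM TABLE lt_packet.
        COMMIT WORK.
      ENDIF.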

    You could use a CLASS instance with METHODS instead of a FUNCTION GROUP / MODULE. I'm not sure which uses memory more efficiently, or whether there is any difference, but with an object instance you definitely control when it frees up its memory: it retains the memory for as long as it is instantiated.
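
    A minimal class-based sketch (again with illustrative names); the buffer lives exactly as long as the instance does:

      CLASS lcl_mara_buffer DEFINITION.
        PUBLIC SECTION.
          METHODS get_material
            IMPORTING iv_matnr       TYPE mara-matnr
            RETURNING VALUE(rs_mara) TYPE mara.
        PRIVATE SECTION.
          DATA mt_buffer TYPE HASHED TABLE OF mara WITH UNIQUE KEY matnr.
      ENDCLASS.

      CLASS lcl_mara_buffer IMPLEMENTATION.
        METHOD get_material.
          " look-aside: buffer first, database only on a miss
          READ TABLE mt_buffer INTO rs_mara WITH TABLE KEY matnr = iv_matnr.
          IF sy-subrc <> 0.
            SELECT SINGLE * FROM mara INTO rs_mara WHERE matnr = iv_matnr.
            IF sy-subrc = 0.
              INSERT rs_mara INTO TABLE mt_buffer.
            ENDIF.
          ENDIF.
        ENDMETHOD.
      ENDCLASS.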

    How many material master records do you have? 200,000 such records could easily be read into an internal table and held for lookup, with multiple fields: e.g. 1,000 bytes of fields per record would only require approximately 200 MB of memory, which is quite small really. On 32-bit systems, internal tables have most of 2 GB available in total, I think; 64-bit has more.

    The overhead of communication between the database and the application server means that selecting each record individually as you need it would probably be over 10 times slower than selecting everything up front. If you select everything and don't need it all, you are effectively trading a bit of memory for a performance improvement.

    Processing sales orders and deliveries against the material master would best be done in packets, using the PACKAGE SIZE addition to limit the total number of records processed at any point in time.
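
    For example (the table and packet size are illustrative):

      DATA lt_vbap TYPE STANDARD TABLE OF vbap.

      " Each loop pass fills lt_vbap with the next packet of rows
      SELECT * FROM vbap
             INTO TABLE lt_vbap
             PACKAGE SIZE 5000.
        " process this packet of at most 5,000 rows,
        " then the table is refilled with the next packet
      ENDSELECT.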

    Hope this is of some help.

    Andrew


    • Former Member

      The speed of FOR ALL ENTRIES varies according to the database platform and some profile parameters.

      I believe the code is converted into one of two possible WHERE clause formats by the application server (there may be more options; I have only seen these two):

      1. WHERE matnr = value_1 OR matnr = value_2 OR matnr = value_3 ... value_n

      2. WHERE matnr IN (value_1, value_2, value_3, ... value_m)

      In option 1, "n" can be very low (around 10 in some cases) so that the DB optimiser is not confused into a full table scan.

      In option 2, "m" is likely to be a few hundred, and this is a lot quicker, but it is not as widely supported as option 1.

      There are OSS notes on this and the related parameter settings. Looking at an SQL trace (ST05) of a SELECT ... FOR ALL ENTRIES will show which form is being used.
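
      A minimal FOR ALL ENTRIES sketch (tables are illustrative); note that the driver table must not be empty, otherwise the WHERE condition is dropped and the whole table is selected:

        DATA: lt_keys TYPE STANDARD TABLE OF mara, " filled elsewhere
              lt_mvke TYPE STANDARD TABLE OF mvke.

        IF lt_keys IS NOT INITIAL.
          SELECT * FROM mvke
                 INTO TABLE lt_mvke
                 FOR ALL ENTRIES IN lt_keys
                 WHERE matnr = lt_keys-matnr.
        ENDIF.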

      As you have described it, this may not be relevant, due to the complexity of your selections.

      Andrew