Former Member

buffering data using hash table - is this wise?

Hi,

I'm required (for a forecasting Z-development) to fill a huge table with data from sales orders, deliveries, and the material master. Since the table is an infostructure and the users want to be able to search and report on quite a few material master fields, the resulting table has a long key, and therefore I need to take extra care that I fill it correctly.

For performance reasons: is it a good idea to use hashed tables (one for each relevant material master table, e.g. MARA and MVKE) as a data buffer in the function module I use to create 'initial' (= key-only) lines for this purpose? Or is there a better way to get consistency and performance at the same time?

(The hashed tables are global to the function group and are never refreshed by the function module; the function module is called about 20,000 times by the main program. On every call it looks into the hashed table first, and only when it doesn't find the information it needs there does it fetch it from the database.)
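
To make the pattern concrete, here is a minimal sketch of such a look-aside buffer, assuming a function group with a global hashed table keyed on MATNR (all names and the field selection are illustrative, not the actual development):

    " Global data in the function group's TOP include (illustrative)
    TYPES: BEGIN OF ty_mara_buf,
             matnr TYPE mara-matnr,
             mtart TYPE mara-mtart,
             matkl TYPE mara-matkl,
           END OF ty_mara_buf.

    DATA gt_mara_buf TYPE HASHED TABLE OF ty_mara_buf
                     WITH UNIQUE KEY matnr.

    " Function module: look in the buffer first, hit the database
    " only on a miss, and remember what was fetched
    FUNCTION z_get_mara_buffered.
    "  IMPORTING iv_matnr TYPE mara-matnr
    "  EXPORTING es_mara  TYPE ty_mara_buf

      DATA ls_buf TYPE ty_mara_buf.

      READ TABLE gt_mara_buf INTO es_mara
           WITH TABLE KEY matnr = iv_matnr.
      IF sy-subrc = 0.
        RETURN.                          " buffer hit
      ENDIF.

      " Buffer miss: fetch from the database and remember the result
      SELECT SINGLE matnr mtart matkl FROM mara
             INTO ls_buf
             WHERE matnr = iv_matnr.
      IF sy-subrc = 0.
        INSERT ls_buf INTO TABLE gt_mara_buf.
        es_mara = ls_buf.
      ENDIF.
    ENDFUNCTION.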


4 Answers

  • Best Answer
    Aug 22, 2007 at 11:28 AM

    hi Rob,

    If you use the READ TABLE statement many times on a huge internal table, it is wise to create it as a hashed table.
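
    For example (a minimal sketch; the table and key value are illustrative):

      " O(1) lookup on a hashed table - but only with the full unique key
      DATA: lt_mat TYPE HASHED TABLE OF mara WITH UNIQUE KEY matnr,
            ls_mat TYPE mara.

      READ TABLE lt_mat INTO ls_mat WITH TABLE KEY matnr = 'MAT-0001'.
      IF sy-subrc <> 0.
        " not in the table (yet)
      ENDIF.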

    hope this helps

    ec


    • Former Member

      Eric,

      I realize that. The thing is, the table starts off small (one entry) and is bound to get quite big in the end, since I keep adding an entry on every 'miss' (i.e. every record that wasn't in there yet).

      I also hope that the lifetime of the table (global in the function group) will be long enough (= the complete runtime of the report) so that I don't lose the buffer and have to start over, because fetching all that data from the database again is bound to cause a performance hit.

  • Aug 22, 2007 at 02:58 PM

    Hashed tables allow the fastest access of all internal tables, but you always need the full table key.

    I do not really understand your question:

    > Is it a good idea to use hashed tables (one for each relevant material master
    > table, e.g. MARA and MVKE) as a data buffer in the function module I use to
    > create 'initial' (= key-only) lines for this purpose?

    How many tables do you want to create, what are the keys, how do you access them, and how many lines do you expect?

    What is the problem?

    Siegfried


    • Former Member

      Is it safe to assume that the most performant way to give a function module a sort of 'look-aside buffer' (i.e. to limit the number of actual database reads) is to provide it with a global hashed table?

      'Safe' also means: can I be sure the function module's global hashed table won't be lost, unless I clear it myself, for as long as the main program keeps running? (It's perfectly all right, of course, that the hashed table gets deleted when the main program stops running.)

  • Aug 24, 2007 at 12:28 PM

    The hash key is available as long as the hashed table is there.

    Note that a COLLECT on a standard table builds an implicit hash key, but you must not change the standard table as long as you use COLLECT: changes destroy the implicit hash key, which makes the COLLECT slow. But that is something different.

    Hashed tables are the best choice if you know that you will always access single lines and you always have the full unique table key.

    Sorted tables are the best choice when you want to access key ranges, i.e. using only a leading part of the key.
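
    For illustration (a sketch only, assuming MVKE's key fields MATNR/VKORG/VTWEG):

      " Hashed: single-record access, full unique key required
      DATA: lt_hash  TYPE HASHED TABLE OF mvke
                     WITH UNIQUE KEY matnr vkorg vtweg,
            " Sorted: also supports partial access via the leading key
            lt_sort  TYPE SORTED TABLE OF mvke
                     WITH UNIQUE KEY matnr vkorg vtweg,
            ls_mvke  TYPE mvke,
            lv_matnr TYPE mvke-matnr.

      READ TABLE lt_hash INTO ls_mvke
           WITH TABLE KEY matnr = lv_matnr
                          vkorg = '1000'
                          vtweg = '10'.

      " Optimized range access on the sorted table (leading key part)
      LOOP AT lt_sort INTO ls_mvke WHERE matnr = lv_matnr.
        " one sales organization / channel line per iteration
      ENDLOOP.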

    Siegfried


  • Former Member
    Aug 24, 2007 at 02:41 PM

    There are a number of issues to consider:

    A hashed table is an efficient way to store a data buffer, but filling it one record at a time on each miss may not be the best way to fill it: an initial SELECT to populate it with all required values will be much more efficient in terms of database access, provided you can define an appropriate WHERE clause. If you are processing all or most of a table, reading it all in one go will be quickest.
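
    A sketch of the up-front variant (the WHERE restriction is purely illustrative; one array fetch replaces thousands of single SELECTs):

      DATA gt_mara_buf TYPE HASHED TABLE OF mara WITH UNIQUE KEY matnr.

      " One round trip to the database instead of one SELECT SINGLE per miss
      SELECT * FROM mara
             INTO TABLE gt_mara_buf
             WHERE mtart IN ('FERT', 'HALB').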

    Total available memory could be an issue. A 64-bit platform gives you the most memory, but there are a number of system-imposed limits which could mean this is still not enough.

    Are you buffering only the data you read, or also the data you will be writing to the new table? Buffering the data to write and then inserting it in one go may cause issues: you may need to commit updates every few thousand records, which also allows memory to be freed up. Attempting too many inserts between commits may cause the database to run out of space for rollback logging, and, depending on your database, there may also be space issues for redo logs.
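
    A hedged sketch of such packeted inserts (ZSALES_INFO and the packet size of 1,000 are purely illustrative):

      CONSTANTS lc_packet TYPE i VALUE 1000.

      DATA: gt_result TYPE STANDARD TABLE OF zsales_info, " filled elsewhere
            lt_packet TYPE STANDARD TABLE OF zsales_info,
            ls_line   TYPE zsales_info,
            lv_lines  TYPE i.

      LOOP AT gt_result INTO ls_line.
        APPEND ls_line TO lt_packet.
        DESCRIBE TABLE lt_packet LINES lv_lines.
        IF lv_lines >= lc_packet.
          INSERT zsales_info FROM TABLE lt_packet.
          COMMIT WORK.                " keep rollback/redo usage small
          CLEAR lt_packet.
        ENDIF.
      ENDLOOP.

      IF lt_packet IS NOT INITIAL.    " flush the last partial packet
        INSERT zsales_info FROM TABLE lt_packet.
        COMMIT WORK.
      ENDIF.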

    You could use a CLASS instance with METHODS instead of a FUNCTION GROUP / MODULE. I'm not sure which uses memory more efficiently, or whether there is any difference, but with an object instance you definitely control when it frees up its memory: it retains the memory for as long as it is instantiated.
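
    A minimal class-based sketch (again with illustrative names); the buffer lives exactly as long as the instance does:

      CLASS lcl_mara_buffer DEFINITION.
        PUBLIC SECTION.
          METHODS get_material
            IMPORTING iv_matnr       TYPE mara-matnr
            RETURNING VALUE(rs_mara) TYPE mara.
        PRIVATE SECTION.
          DATA mt_buffer TYPE HASHED TABLE OF mara WITH UNIQUE KEY matnr.
      ENDCLASS.

      CLASS lcl_mara_buffer IMPLEMENTATION.
        METHOD get_material.
          " look-aside: buffer first, database only on a miss
          READ TABLE mt_buffer INTO rs_mara WITH TABLE KEY matnr = iv_matnr.
          IF sy-subrc <> 0.
            SELECT SINGLE * FROM mara INTO rs_mara WHERE matnr = iv_matnr.
            IF sy-subrc = 0.
              INSERT rs_mara INTO TABLE mt_buffer.
            ENDIF.
          ENDIF.
        ENDMETHOD.
      ENDCLASS.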

    How many material master records do you have? 200,000 such records could easily be read into an internal table and held for lookup, with multiple fields: e.g. 1,000 bytes of fields per record would only require approximately 200 MB of memory, which is quite small really. On 32-bit systems, internal tables have most of 2 GB available in total, I think; 64-bit has more.

    The overhead of communication between the database and the application server means that selecting each record individually as you need it would probably be over 10 times slower than selecting everything up front. If you select everything and don't need it all, you are effectively trading a bit of memory for a performance improvement.

    Processing sales orders and deliveries against the material master would best be done in packets, using the PACKAGE SIZE addition to limit the total number of records processed at any point in time.
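
    For example (the table and packet size are illustrative):

      DATA lt_vbap TYPE STANDARD TABLE OF vbap.

      " Each loop pass fills lt_vbap with the next packet of rows
      SELECT * FROM vbap
             INTO TABLE lt_vbap
             PACKAGE SIZE 5000.
        " process this packet of at most 5,000 rows,
        " then the table is refilled with the next packet
      ENDSELECT.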

    Hope this is of some help.

    Andrew


    • Former Member

      The speed of FOR ALL ENTRIES varies according to the database platform and some profile parameters.

      I believe the code is converted into one of two possible WHERE clause formats by the application server (there may be more options; I have only seen these two):

      1. WHERE matnr = value_1 OR matnr = value_2 OR matnr = value_3 ... value_n

      2. WHERE matnr IN (value_1, value_2, value_3, ... value_m)

      In option 1, "n" can be very low (around 10 in some cases) so that the DB optimiser is not confused into a full table scan.

      In option 2, "m" is likely to be a few hundred, and this is a lot quicker, but it is not as widely supported as option 1.

      There are OSS notes on this and the related parameter settings. Looking at an SQL trace (ST05) of a SELECT ... FOR ALL ENTRIES will show which form is being used.
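
      A minimal FOR ALL ENTRIES sketch (tables are illustrative); note that the driver table must not be empty, otherwise the WHERE condition is dropped and the whole table is selected:

        DATA: lt_keys TYPE STANDARD TABLE OF mara, " filled elsewhere
              lt_mvke TYPE STANDARD TABLE OF mvke.

        IF lt_keys IS NOT INITIAL.
          SELECT * FROM mvke
                 INTO TABLE lt_mvke
                 FOR ALL ENTRIES IN lt_keys
                 WHERE matnr = lt_keys-matnr.
        ENDIF.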

      As you have described it, this may not be relevant, due to the complexity of your selections.

      Andrew