
Data Cluster Mystery: Data Cluster vs. RAWSTRING

Aug 28, 2017 at 04:31 PM


Hi,
In NW 7.51, SAP converted the "classic" structure of data cluster (export/import) tables from multiple rows of data type LRAW to a single row of the "new" data type RAWSTRING (see ABAP News for Release 7.51 – New Structure for Export/Import Tables).

The question that remains is: what is the most size-efficient way to save huge datasets in the database on a legacy system (NW releases earlier than 7.51):
the old (multiple-row) data cluster table, or a simple DB table with a single RAWSTRING field?

I have done some tests, and the results were quite confusing.
I saved an internal table of MARA with 5,000 records.
For the conversion to XSTRING I used the EXPORT TO DATA BUFFER (COMPRESSION ON) statement, roughly as in the sketch below.
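A minimal sketch of that conversion step (the table name, the 5,000-row selection, and the variable names are my own, not the original report):

" Assumption: a 5,000-record sample of MARA, as described above.
DATA lt_mara   TYPE STANDARD TABLE OF mara.
DATA lv_buffer TYPE xstring.

SELECT * FROM mara INTO TABLE lt_mara UP TO 5000 ROWS.

" Serialize the internal table into a compressed data cluster (XSTRING).
EXPORT itab = lt_mara TO DATA BUFFER lv_buffer COMPRESSION ON.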

The results were:

  1. Data cluster table for ITAB (not converted to XSTRING) – 664KB.
  2. Data cluster table for converted XSTRING – 272KB.
  3. DB table with RAWSTRING – 536KB.

Is there any explanation for it?


2 Answers

Horst Keller
Aug 29, 2017 at 07:22 AM

According to the following program, the length of the data cluster is exactly the same for an internal table stored in a data buffer or in the CLUSTD fields of the two possible export/import table structures, with compression both on and off.

How did you count? Don't forget that besides the cluster itself, there are also the administrative columns of the export/import table structures. Of course, this overhead is higher for the old tabular storage.

" Fill an internal table with 100,000 random integers as test data.
DATA(rnd) = cl_abap_random_int=>create(
               seed = CONV i( sy-uzeit )
               min = 1
               max = 1000 ).
TYPES itab TYPE TABLE OF i WITH EMPTY KEY.
DATA(itab) = VALUE itab( FOR i = 1 UNTIL i > 100000
                         ( rnd->get_next( ) ) ).

DATA(out) = cl_demo_output=>new( ).

out->next_section( 'Compression off' ).

" Medium 1: data cluster in an XSTRING data buffer.
DATA data_buffer TYPE xstring.
EXPORT itab = itab TO DATA BUFFER data_buffer
                   COMPRESSION OFF.

" Medium 2: whole cluster in the RAWSTRING column of an export/import table.
EXPORT itab = itab TO DATABASE demo_indx_blob(rn)
                      ID 'RandomNumbers' COMPRESSION OFF.
SELECT SINGLE clustd
       FROM demo_indx_blob
       WHERE relid = 'RN' AND
             id    = 'RandomNumbers'
       INTO  @DATA(indx_blob_clustd).

" Medium 3: cluster split over several LRAW lines of an export/import table.
EXPORT itab = itab TO DATABASE demo_indx_table(rn)
                   ID 'RandomNumbers' COMPRESSION OFF.
SELECT *
       FROM demo_indx_table
       WHERE relid = 'RN' AND
             id    = 'RandomNumbers'
       ORDER BY srtf2
       INTO TABLE @DATA(indx_table).
" Reassemble the cluster from the chunks; CLUSTR holds the used length
" of CLUSTD in each line.
DATA(indx_table_clustd) = VALUE xstring( ).
LOOP AT indx_table INTO DATA(indx_table_wa).
  CONCATENATE indx_table_clustd
              indx_table_wa-clustd(indx_table_wa-clustr)
              INTO indx_table_clustd
              IN BYTE MODE.
ENDLOOP.

out->write( xstrlen( data_buffer )
  )->write( xstrlen( indx_blob_clustd )
  )->write( xstrlen( indx_table_clustd ) ).

CLEAR data_buffer.
DELETE FROM DATABASE demo_indx_table(rn) ID 'RandomNumbers'.
DELETE FROM DATABASE demo_indx_blob(rn)  ID 'RandomNumbers'.

out->next_section( 'Compression on' ).

EXPORT itab = itab TO DATA BUFFER data_buffer
                   COMPRESSION ON.

EXPORT itab = itab TO DATABASE demo_indx_blob(rn)
                      ID 'RandomNumbers' COMPRESSION ON.
SELECT SINGLE clustd
       FROM demo_indx_blob
       WHERE relid = 'RN' AND
             id    = 'RandomNumbers'
       INTO  @indx_blob_clustd.

EXPORT itab = itab TO DATABASE demo_indx_table(rn)
                   ID 'RandomNumbers' COMPRESSION ON.
SELECT *
       FROM demo_indx_table
       WHERE relid = 'RN' AND
             id    = 'RandomNumbers'
       ORDER BY srtf2
       INTO TABLE @indx_table.
indx_table_clustd = VALUE xstring( ).
LOOP AT indx_table INTO indx_table_wa.
  CONCATENATE indx_table_clustd
              indx_table_wa-clustd(indx_table_wa-clustr) 
              INTO indx_table_clustd
              IN BYTE MODE.
ENDLOOP.

out->write( xstrlen( data_buffer )
  )->write( xstrlen( indx_blob_clustd )
  )->write( xstrlen( indx_table_clustd ) ).

DELETE FROM DATABASE demo_indx_table(rn) ID 'RandomNumbers'.
DELETE FROM DATABASE demo_indx_blob(rn)  ID 'RandomNumbers'.

out->display( ).

Hi Horst,
I've been waiting for your answer :)

Regarding the measuring:

1. Since there isn't any API/command to query the record size in the DB (at least none I'm aware of), I created 3 new DB tables and checked the DB table sizes.
2. I'm not sure I understood the purpose of your test program. As far as I understand, all 3 variables should contain exactly the same value (regardless of the way they are stored in the DB). Hence, all 3 must be exactly the same size, shouldn't they?
My question was about the DB size.


The test program proves that all three clusters have the same size.

  • EXPORT TO DATA BUFFER: well, you have an XSTRING with the data cluster. If you store it as a RAWSTRING in a self-defined database table, you get the additional administrative cost of at least one key field.
  • EXPORT TO DATABASE with a BLOB: one RAWSTRING in the database, plus the predefined administrative columns (RELID, key fields, self-defined fields) once per data cluster.
  • EXPORT TO DATABASE into RAW fields: the data cluster is split over several lines, with redundant administrative columns in each line. (See the sketch below for the two table layouts.)
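A rough sketch of the two layouts as ABAP types (field names follow the standard INDX template; lengths are approximate and for illustration only):

TYPES:
  " Classic tabular layout: one line per cluster chunk; the administrative
  " columns are repeated in every single line.
  BEGIN OF classic_line,
    relid  TYPE c LENGTH 2,     " area
    id     TYPE c LENGTH 30,    " cluster key
    srtf2  TYPE i,              " line counter of the split cluster
    clustr TYPE i,              " used length of CLUSTD in this line
    clustd TYPE x LENGTH 2886,  " LRAW chunk of the data cluster
  END OF classic_line,
  " New layout: administrative columns only once, whole cluster in one field.
  BEGIN OF blob_line,
    relid  TYPE c LENGTH 2,
    id     TYPE c LENGTH 30,
    clustd TYPE xstring,        " entire data cluster as RAWSTRING
  END OF blob_line.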

So in fact, I don't understand your question. Where is the mystery?


Why is the size of the data cluster table ("classic" multiple rows of LRAW) smaller than that of a simple DB table with a single RAWSTRING field (when both store exactly the same XSTRING value)?


Is it? Your numbers above don't say that, do they? In fact, from the description it is not quite clear what you've done.


Well, so it seems (unless I've missed something):

2. Data cluster table ("classic" multiple rows of LRAW) for the converted XSTRING – 272KB.

3. Simple DB table with RAWSTRING (for the same converted XSTRING) – 536KB.


Just to be complete:

"all 3 variables should contain exactly the same value (regardless of the way they are stored in the DB)"

Not really. As a rule, the size is about the same. But for optimization reasons, the binary content of the data cluster can differ between clusters that are stored in one string and clusters that are split over several lines.

Therefore, you must always use the appropriate IMPORT, as in the sketch below.
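A minimal sketch of what "the appropriate IMPORT" means, assuming the buffer and the database clusters were filled by the matching EXPORT statements from the listings above:

DATA itab        TYPE TABLE OF i WITH EMPTY KEY.
DATA data_buffer TYPE xstring.

" A cluster that was written with EXPORT ... TO DATA BUFFER
" must be read back with IMPORT ... FROM DATA BUFFER:
IMPORT itab = itab FROM DATA BUFFER data_buffer.

" A cluster that was written with EXPORT ... TO DATABASE must be read
" back with IMPORT ... FROM DATABASE (same table, area, and ID):
IMPORT itab = itab FROM DATABASE demo_indx_blob(rn) ID 'RandomNumbers'.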

Horst Keller
Aug 30, 2017 at 05:49 AM

To summarize:

  • We are speaking about export/import tables. Those are transparent database tables, not to be mixed up with the special cluster tables.
  • The statement EXPORT serializes ABAP data into a binary format called a data cluster, with or without compression.
  • A data cluster can be stored in different media.
  • For storing in an export/import table there are two ways: the whole cluster in one RAWSTRING, or the data cluster split up into several LRAW fields.

You claim that storing one and the same binary data in one RAWSTRING uses more space on the database than splitting it into several LRAW fields. If that is true, the finding should be independent of data clusters. You should be able to reproduce it by writing any xstring to a RAWSTRING or to several LRAW fields, as sketched below. Then it is a question for the database. If you can't reproduce it, use Open SQL to examine your data clusters in order to find the difference.
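A hedged sketch of such a reproduction. The tables ZRAWSTR (key ID, column BLOB of type RAWSTRING) and ZLRAW (keys ID and SRTF2, columns CLUSTR of type INT4 and CLUSTD of type LRAW 2886) are assumptions and would have to be created in the DDIC first:

" Build an arbitrary binary payload, independent of any data cluster.
DATA any_xstring TYPE xstring.
DATA one_byte    TYPE x LENGTH 1 VALUE 'A5'.
DO 500000 TIMES.
  CONCATENATE any_xstring one_byte INTO any_xstring IN BYTE MODE.
ENDDO.

" 1) The whole string into the RAWSTRING table.
DATA raw_line TYPE zrawstr.
raw_line-id   = 'TEST'.
raw_line-blob = any_xstring.
MODIFY zrawstr FROM raw_line.

" 2) The same string split into 2886-byte LRAW chunks.
DATA lraw_line TYPE zlraw.
DATA offset TYPE i.
DATA(total) = xstrlen( any_xstring ).
WHILE offset < total.
  DATA(chunk) = nmin( val1 = 2886 val2 = total - offset ).
  lraw_line-id     = 'TEST'.
  lraw_line-srtf2  = lraw_line-srtf2 + 1.
  lraw_line-clustr = chunk.
  lraw_line-clustd = any_xstring+offset(chunk).
  MODIFY zlraw FROM lraw_line.
  offset = offset + chunk.
ENDWHILE.

" Then compare the sizes of ZRAWSTR and ZLRAW on database level
" (for example in DBACOCKPIT).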


(Almost) correct.
Sorry about the confusion between data cluster (export/import) tables and cluster tables.

The question is whether it's really a database-related issue (for any RAWSTRING vs. multiple-LRAW-rows table) or something in the data cluster export/compression mechanism.

I had to abandon my assumption regarding the XSTRING length and re-checked it in my report.

My actual program consists of two steps:

1. EXPORT itab = itab TO DATA BUFFER data_buffer COMPRESSION ON.

2. EXPORT data_buffer2 = data_buffer TO DATABASE (COMPRESSION ON, which is the default).

I was surprised to find out the following results:

  1. data_buffer size after the first step/compression - 484143 bytes.
  2. data_buffer2 size after the second step/compression (either via EXPORT TO DATABASE or EXPORT TO DATA BUFFER) - 202322 bytes.

I.e., there is additional compression on top of the initial compression from itab to XSTRING.

(That made me wonder whether this compression could be applied endlessly :), but there was no surprise here: from the third compression on, the data buffer size increases every time. See the sketch below.)
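A small sketch of that experiment (names are mine; the dataset is again an assumed MARA sample):

" Compress once, then keep re-compressing the resulting buffer and
" watch the size: it shrinks once more, then starts to grow.
DATA itab   TYPE STANDARD TABLE OF mara.
DATA buffer TYPE xstring.
DATA next   TYPE xstring.

SELECT * FROM mara INTO TABLE itab UP TO 5000 ROWS.

EXPORT itab = itab TO DATA BUFFER buffer COMPRESSION ON.
cl_demo_output=>write( xstrlen( buffer ) ).

DO 5 TIMES.
  EXPORT buffer = buffer TO DATA BUFFER next COMPRESSION ON.
  cl_demo_output=>write( xstrlen( next ) ).
  buffer = next.
ENDDO.

cl_demo_output=>display( ).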

P.S.

After understanding that the difference lies in the additional compression, I added an extra

EXPORT TO DATA BUFFER COMPRESSION ON statement and re-checked the results.

This time the simple DB table with RAWSTRING was smaller than the data cluster table (240KB), as expected. (See the sketch below.)
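A hedged sketch of that workaround. The table ZRAWSTR (key ID, column BLOB of type RAWSTRING) is the same assumption as above; the point is to compress twice before storing, so the plain RAWSTRING table gets the same double compression that EXPORT TO DATABASE applies implicitly:

DATA itab    TYPE STANDARD TABLE OF mara.
DATA buffer  TYPE xstring.
DATA buffer2 TYPE xstring.

SELECT * FROM mara INTO TABLE itab UP TO 5000 ROWS.

" First compression: itab -> data cluster.
EXPORT itab = itab TO DATA BUFFER buffer COMPRESSION ON.
" Second compression: wrap the already compressed cluster once more.
EXPORT buffer = buffer TO DATA BUFFER buffer2 COMPRESSION ON.

" Store the doubly compressed cluster in the self-defined RAWSTRING table.
DATA(raw_line) = VALUE zrawstr( id = 'MARA' blob = buffer2 ).
MODIFY zrawstr FROM raw_line.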


I also suspected something like this and did the same (double compression of the data buffer) in my example, but didn't see any difference.


Here is my program, which doesn't show any "double compression":

DATA(rnd) = cl_abap_random_int=>create(
               seed = CONV i( sy-uzeit )
               min = 1
               max = 1000 ).
TYPES itab TYPE TABLE OF i WITH EMPTY KEY.
DATA(itab) = VALUE itab( FOR i = 1 UNTIL i > 100000
                         ( rnd->get_next( ) ) ).

DATA(out) = cl_demo_output=>new( ).

" First compression: serialize itab into a data cluster.
DATA data_buffer TYPE xstring.
EXPORT itab = itab TO DATA BUFFER data_buffer
                   COMPRESSION ON.
out->write( xstrlen( data_buffer ) ).

" Second compression: compress the already compressed buffer again.
DATA data_buffer_again TYPE xstring.
EXPORT buffer = data_buffer TO DATA BUFFER data_buffer_again
                               COMPRESSION ON.
out->write( xstrlen( data_buffer_again ) ).

EXPORT buffer = data_buffer TO DATABASE demo_indx_blob(rn)
                            ID 'RandomNumbers' COMPRESSION ON.
SELECT SINGLE clustd
       FROM demo_indx_blob
       WHERE relid = 'RN' AND
             id    = 'RandomNumbers'
       INTO  @DATA(indx_blob_clustd).
out->write( xstrlen( indx_blob_clustd ) ).

EXPORT buffer = data_buffer TO DATABASE demo_indx_table(rn)
                            ID 'RandomNumbers' COMPRESSION ON.
SELECT *
       FROM demo_indx_table
       WHERE relid = 'RN' AND
             id    = 'RandomNumbers'
       ORDER BY srtf2
       INTO TABLE @DATA(indx_table).
DATA(indx_table_clustd) = VALUE xstring( ).
LOOP AT indx_table INTO DATA(indx_table_wa).
  CONCATENATE indx_table_clustd
              indx_table_wa-clustd(indx_table_wa-clustr)
              INTO indx_table_clustd
              IN BYTE MODE.
ENDLOOP.
out->write( xstrlen( indx_table_clustd ) ).


out->display( ).


At least in my test system (NW 7.4 SP9), I do see a difference:

  1. data_buffer - 211913 bytes
  2. data_buffer_again - 211001 bytes.

Anyway, I guess it depends on your dataset.
In my original example, I used the contents of DB table MARA for itab.


Yep, a small difference, but not a factor of 2.


Please check with a larger dataset (the MARA table, for example), or, even simpler, just add a dummy field (a CHAR250 with empty values, for example) to the ITAB structure.


Now, I see it too.

DATA(rnd) = cl_abap_random_int=>create(
               seed = CONV i( sy-uzeit )
               min = 1
               max = 1000 ).

" The empty CHAR255 column makes each row mostly blank padding.
TYPES:
  char255 TYPE c LENGTH 255,
  BEGIN OF struct,
    col1 TYPE i,
    col2 TYPE char255,
  END OF struct,
  itab TYPE TABLE OF struct WITH EMPTY KEY.
DATA(itab) = VALUE itab( FOR i = 1 UNTIL i > 100000
                         ( col1 = rnd->get_next( ) ) ).
DATA(out) = cl_demo_output=>new( ).

DATA data_buffer TYPE xstring.
EXPORT itab = itab TO DATA BUFFER data_buffer
                   COMPRESSION ON.
out->write( xstrlen( data_buffer ) ).

DATA data_buffer_again TYPE xstring.
EXPORT buffer = data_buffer TO DATA BUFFER data_buffer_again
                               COMPRESSION ON.
out->write( xstrlen( data_buffer_again ) ).

Looks strange.
