Solved: Concurrent compression streams with CL_ABAP_GZIP_TEXT_STREAM

former_member602713 · ‎04-03-2019

Hi,

CL_ABAP_GZIP_TEXT_STREAM requires an output handler that implements the interface method if_abap_gzip_text_handler~use_out_buf, which is defined as static.

This works fine when I need a single compression stream, e.g. looping through a single table and writing the data out through a single gzip stream.

However, when I need multiple independent compression streams running concurrently (writing records to multiple files at the same time), this does not work: the user_outbuf class has static variables and methods, so the buffer contents of the different streams get mixed together. For example: looping through one table and writing its records out, while querying another table for related records and writing those to a different compression stream that uses the same handler class user_outbuf.

Is there a cleaner way to solve this problem than redefining the same class under a different name over and over again (i.e. replicating the class definitions and implementations) to get concurrency support, and then referencing:
uref1 TYPE REF TO user_outbuf1,
uref2 TYPE REF TO user_outbuf2,
etc

Thanks,
Jay 🙂

Example code:

REPORT TEST.

" Define a buffer handler class
CLASS user_outbuf DEFINITION.
  PUBLIC SECTION.
    INTERFACES if_abap_gzip_text_handler.
    CLASS-DATA:
         buffer     TYPE x LENGTH 1000, " predefine size of the buffer
         buffer_len TYPE i VALUE -1. " -1 means the total length of buffer
ENDCLASS.

CLASS user_outbuf IMPLEMENTATION.
  METHOD if_abap_gzip_text_handler~use_out_buf.
   WRITE: / buffer.
  ENDMETHOD.
ENDCLASS.

START-OF-SELECTION.

DATA:
      uref       TYPE REF TO user_outbuf,
      uref2      TYPE REF TO user_outbuf,
      csref      TYPE REF TO CL_ABAP_GZIP_TEXT_STREAM.

" create an instance of the output handler
CREATE OBJECT uref.
" create an instance of the gzip compression class
CREATE OBJECT csref
   EXPORTING  CONVERSION      = 'DEFAULT'
              OUTPUT_HANDLER  = uref.
" set up the output buffer of the stream
csref->set_out_buf(
  IMPORTING
      out_buf   = uref->buffer
      out_buf_len = uref->buffer_len
).
" compress some data
CALL METHOD  csref->compress_text_stream
    EXPORTING
      TEXT_IN = 'Some text'
      TEXT_IN_LEN = -1.
CALL METHOD  csref->compress_text_stream_end
    EXPORTING  TEXT_IN        = 'Last text'
               TEXT_IN_LEN    = -1.
" create a second handler instance. Because if_abap_gzip_text_handler~use_out_buf
"  is static, the buffer variables it reads also have to be static (CLASS-DATA),
"  so uref2 shares the buffer of uref and cannot be used concurrently
"  by another compression instance
CREATE OBJECT uref2.
WRITE: /  uref2->buffer.

Sandra_Rossi · ‎04-03-2019

Your code doesn't reflect your question, as it doesn't have parallel compression.
You receive the GZIP instance (parameter GZIP_STREAM), so if you retain which output handler corresponds to which GZIP instance, you'll be able to determine the right output handler (I admit that the SAP design is weird).
Why do you need to compress in parallel?
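The suggestion above (remember which output handler belongs to which GZIP instance) can be sketched as follows. This is a minimal sketch, assuming that the static method if_abap_gzip_text_handler~use_out_buf receives the calling stream in GZIP_STREAM and the flushed bytes in OUT_BUF / OUT_BUF_LEN, as Sandra's later example in this thread uses them. The names lcl_sink, lcl_dispatcher, open and routes are hypothetical. Only one class implements the static method; each stream is registered in a static hashed table together with its own sink object, and SET_OUT_BUF points the stream at that sink's instance buffer. It has not been tested against a specific release.

```abap
REPORT zgzip_dispatch_sketch.

" Hypothetical per-stream sink: each instance owns its buffer and its result
CLASS lcl_sink DEFINITION.
  PUBLIC SECTION.
    DATA: buffer     TYPE x LENGTH 1000,
          buffer_len TYPE i VALUE -1,
          result     TYPE xstring.
ENDCLASS.
CLASS lcl_sink IMPLEMENTATION.
ENDCLASS.

" Hypothetical dispatcher: implements the static USE_OUT_BUF once and routes
" each call to the sink registered for the calling GZIP_STREAM instance
CLASS lcl_dispatcher DEFINITION.
  PUBLIC SECTION.
    INTERFACES if_abap_gzip_text_handler.
    CLASS-METHODS open
      IMPORTING sink          TYPE REF TO lcl_sink
      RETURNING VALUE(stream) TYPE REF TO cl_abap_gzip_text_stream.
  PRIVATE SECTION.
    TYPES: BEGIN OF ty_route,
             stream TYPE REF TO cl_abap_gzip_text_stream,
             sink   TYPE REF TO lcl_sink,
           END OF ty_route.
    CLASS-DATA routes TYPE HASHED TABLE OF ty_route WITH UNIQUE KEY stream.
ENDCLASS.

CLASS lcl_dispatcher IMPLEMENTATION.
  METHOD open.
    stream = NEW cl_abap_gzip_text_stream( conversion     = 'DEFAULT'
                                           output_handler = NEW lcl_dispatcher( ) ).
    " the stream writes into this sink's own (instance) buffer
    stream->set_out_buf( IMPORTING out_buf     = sink->buffer
                                   out_buf_len = sink->buffer_len ).
    INSERT VALUE #( stream = stream sink = sink ) INTO TABLE routes.
  ENDMETHOD.

  METHOD if_abap_gzip_text_handler~use_out_buf.
    " find the sink that belongs to the stream which is flushing its buffer
    DATA(sink) = routes[ stream = gzip_stream ]-sink.
    CONCATENATE sink->result out_buf(out_buf_len) INTO sink->result IN BYTE MODE.
  ENDMETHOD.
ENDCLASS.

START-OF-SELECTION.
  DATA(sink1) = NEW lcl_sink( ).
  DATA(sink2) = NEW lcl_sink( ).
  DATA(zip1) = lcl_dispatcher=>open( sink1 ).
  DATA(zip2) = lcl_dispatcher=>open( sink2 ).

  " interleave writes to both streams, as in a header/item loop
  zip1->compress_text_stream( text_in = `header 1;` ).
  zip2->compress_text_stream( text_in = `item 1.1;` ).
  zip1->compress_text_stream_end( text_in = `header 2;` ).
  zip2->compress_text_stream_end( text_in = `item 2.1;` ).

  " each sink should decompress to its own stream's text only
  DATA text TYPE string.
  cl_abap_gzip=>decompress_text( EXPORTING gzip_in = sink1->result IMPORTING text_out = text ).
  ASSERT text = `header 1;header 2;`.
  cl_abap_gzip=>decompress_text( EXPORTING gzip_in = sink2->result IMPORTING text_out = text ).
  ASSERT text = `item 1.1;item 2.1;`.
```

In this sketch the entries in ROUTES are never removed; a longer-running program would delete a stream's entry once COMPRESS_TEXT_STREAM_END has been called for it.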

jrodriguezferna · ‎04-03-2019

I'm not sure I understand your problem, but I think this code could help you solve it.

report zzzjc_test01.

" Define a buffer handler class
class user_outbuf definition.
  public section.
    interfaces if_abap_gzip_text_handler.
    class-data:
      buffer     type x length 1000, " predefine size of the buffer
      buffer_len type i value -1. " -1 means the total length of buffer
endclass.

class user_outbuf implementation.
  method if_abap_gzip_text_handler~use_out_buf.
    write: / buffer.
  endmethod.
endclass.

start-of-selection.

  types begin of lty_list_buffers.
  types uref  type ref to user_outbuf.
  types csref type ref to cl_abap_gzip_text_stream.
  types end   of lty_list_buffers.

  types lty_list_buffers_tt type standard table of lty_list_buffers with empty key.

  data lt_list_buffers type lty_list_buffers_tt.

  " your input text as an internal table
  data lt_input_text type standard table of string with empty key.

  loop at lt_input_text assigning field-symbol(<ls_input_text>).

    append initial line to lt_list_buffers assigning field-symbol(<ls_list_buffers>).

    <ls_list_buffers>-uref = new user_outbuf( ).

    <ls_list_buffers>-csref
        = new cl_abap_gzip_text_stream(
            conversion = 'DEFAULT'
            output_handler = <ls_list_buffers>-uref ).

    <ls_list_buffers>-csref->set_out_buf(
        importing
        out_buf     = <ls_list_buffers>-uref->buffer
        out_buf_len = <ls_list_buffers>-uref->buffer_len
    ).

    <ls_list_buffers>-csref->compress_text_stream(
      exporting
        text_in     = <ls_input_text>
        text_in_len = -1
    ).

    <ls_list_buffers>-csref->compress_text_stream_end(
      exporting
        text_in     = <ls_input_text>
        text_in_len = -1
    ).

  endloop.

  loop at lt_list_buffers assigning <ls_list_buffers>.
    write / <ls_list_buffers>-uref->buffer.
  endloop.

Sandra_Rossi · ‎04-04-2019

Here is an example demonstrating point #2, using a static internal table GZIPPERS. The code probably looks complex because I tried to make it reusable (in fact, I don't consider it a good reusable class right now; I'd like to rewrite it completely, but I don't have time at the moment ;-)).

" PART 1 : REUSABLE CODE
INTERFACE lif_gzip_output_handler_new.
  METHODS use_out_buf
    IMPORTING
      out_buf     TYPE xsequence
      out_buf_len TYPE i DEFAULT 0
      part        TYPE i.
  METHODS get_out_buf EXPORTING ref_out_buf TYPE REF TO data ref_out_buf_len TYPE REF TO data.
ENDINTERFACE.
CLASS lcl_gzip_text_stream_new DEFINITION.
  PUBLIC SECTION.
    INTERFACES if_abap_gzip_text_handler.
    TYPES : BEGIN OF ty_zipper,
              gzip_stream    TYPE REF TO cl_abap_gzip_text_stream,
              output_handler TYPE REF TO lif_gzip_output_handler_new,
            END OF ty_zipper,
            ty_zippers TYPE HASHED TABLE OF ty_zipper WITH UNIQUE KEY gzip_stream.
    METHODS constructor
      IMPORTING
        ref_string     TYPE REF TO string
        in_buf_len     TYPE i DEFAULT 200
        compress_level TYPE i DEFAULT 6
        conversion     TYPE abap_encod DEFAULT 'DEFAULT'
        use_outbuf     TYPE REF TO lif_gzip_output_handler_new.
    METHODS next_chunk.
    DATA: done TYPE abap_bool READ-ONLY.
  PRIVATE SECTION.
    CLASS-DATA:
      gzippers TYPE ty_zippers.
    DATA:
      ref_string TYPE REF TO string,
      in_buf_len TYPE i,
      in_offset  TYPE i,
      gzipper    TYPE REF TO cl_abap_gzip_text_stream.
ENDCLASS.
CLASS lcl_gzip_text_stream_new IMPLEMENTATION.

  METHOD constructor.
    me->ref_string = ref_string.
    me->in_buf_len = in_buf_len.
    done = abap_false.
    gzipper = NEW cl_abap_gzip_text_stream(
        compress_level = compress_level
        conversion     = conversion
        output_handler = me ).
    use_outbuf->get_out_buf( IMPORTING ref_out_buf = DATA(ref_out_buf) ref_out_buf_len = DATA(ref_out_buf_len) ).
    ASSIGN ref_out_buf->* TO FIELD-SYMBOL(<out_buf>).
    ASSIGN ref_out_buf_len->* TO FIELD-SYMBOL(<out_buf_len>).
    gzipper->set_out_buf( IMPORTING out_buf = <out_buf> out_buf_len = <out_buf_len> ).
    INSERT VALUE ty_zipper( gzip_stream = gzipper output_handler = use_outbuf ) INTO TABLE gzippers.
  ENDMETHOD.

  METHOD next_chunk.
    CHECK done = abap_false.
    ASSIGN ref_string->* TO FIELD-SYMBOL(<string>).
    IF in_offset >= strlen( <string> ).
      done = abap_true.
      RETURN.
    ENDIF.


    IF in_offset + in_buf_len < strlen( <string> ).
      DATA(chunk) = <string>+in_offset(in_buf_len).
      gzipper->compress_text_stream( text_in = chunk ).
      ADD in_buf_len TO in_offset.
    ELSE.
      chunk = <string>+in_offset.
      gzipper->compress_text_stream_end( text_in = chunk ).
      in_offset = strlen( <string> ).
      done = abap_true.
    ENDIF.
  ENDMETHOD.

  METHOD if_abap_gzip_text_handler~use_out_buf.
    DATA(user_outbuf) = gzippers[ gzip_stream = gzip_stream ]-output_handler.
    user_outbuf->use_out_buf(
        out_buf     = out_buf
        out_buf_len = out_buf_len
        part        = part ).
  ENDMETHOD.
ENDCLASS.

" PART 2 : DEMO CODE
CLASS lcl_use_outbuf DEFINITION.
  PUBLIC SECTION.
    INTERFACES lif_gzip_output_handler_new.
    DATA: gzip_data   TYPE xstring READ-ONLY,
          out_buf     TYPE x LENGTH 100,
          out_buf_len TYPE i VALUE -1.
ENDCLASS.
CLASS lcl_use_outbuf IMPLEMENTATION.
  METHOD lif_gzip_output_handler_new~use_out_buf.
    gzip_data = gzip_data && out_buf(out_buf_len).
  ENDMETHOD.
  METHOD lif_gzip_output_handler_new~get_out_buf.
    ref_out_buf = REF #( out_buf ).
    ref_out_buf_len = REF #( out_buf_len ).
  ENDMETHOD.
ENDCLASS.

START-OF-SELECTION.
  DATA(a) = `Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod `
         && `tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim ven`
         && `iam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea com`
         && `modo consequat. Duis aute irure dolor in reprehenderit in voluptate veli`
         && `t esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat c`
         && `upidatat non proident, sunt in culpa qui officia deserunt mollit anim id`
         && ` est laborum.`.

  DATA(use_outbuf1) = NEW lcl_use_outbuf( ).
  DATA(use_outbuf2) = NEW lcl_use_outbuf( ).
  DATA(compressor1) = NEW lcl_gzip_text_stream_new( ref_string = REF #( a ) in_buf_len = 90 use_outbuf = use_outbuf1 ).
  DATA(compressor2) = NEW lcl_gzip_text_stream_new( ref_string = REF #( a ) in_buf_len = 60 use_outbuf = use_outbuf2 ).
  WHILE compressor1->done = abap_false OR compressor2->done = abap_false.
    compressor1->next_chunk( ).
    DO 2 TIMES.
      compressor2->next_chunk( ).
    ENDDO.
  ENDWHILE.

  DATA a_again TYPE string.
  cl_abap_gzip=>decompress_text( EXPORTING gzip_in = use_outbuf1->gzip_data IMPORTING text_out = a_again ).
  ASSERT a_again = a.
  cl_abap_gzip=>decompress_text( EXPORTING gzip_in = use_outbuf2->gzip_data IMPORTING text_out = a_again ).
  ASSERT a_again = a.

former_member602713 · ‎04-25-2019

An interesting approach using a "shim" layer in between to select the correct stream / buffer, which does seem to work! Thanks for the example 😄

former_member602713 · ‎04-03-2019

Hi Sandra, Juan,

Thanks for responding, much appreciated. 🙂

>> point 1 - code doesn't reflect the question

The code was meant to demonstrate that the new handler reference uref2 (note the 2 at the end of the name), created after uref has been used, returns the same value for buffer. This shows that the two cannot be used concurrently, because buffer is a static variable. If two or more compression streams were active at the same time, their data would intertwine and become corrupt.
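The behaviour described here comes from how ABAP stores attributes: CLASS-DATA exists once per class and is shared by every instance, while DATA exists once per instance. The small sketch below (class names hypothetical) shows the difference in isolation. Note that the buffer handed to SET_OUT_BUF does not itself have to be CLASS-DATA; Sandra's later example in this thread passes references to instance attributes of lcl_use_outbuf.

```abap
REPORT zstatic_vs_instance_buffer.

" Hypothetical: CLASS-DATA, one copy shared by all instances (like user_outbuf)
CLASS lcl_static_buf DEFINITION.
  PUBLIC SECTION.
    CLASS-DATA buffer TYPE x LENGTH 4.
ENDCLASS.
CLASS lcl_static_buf IMPLEMENTATION.
ENDCLASS.

" Hypothetical: DATA, a separate copy per instance
CLASS lcl_instance_buf DEFINITION.
  PUBLIC SECTION.
    DATA buffer TYPE x LENGTH 4.
ENDCLASS.
CLASS lcl_instance_buf IMPLEMENTATION.
ENDCLASS.

START-OF-SELECTION.
  CONSTANTS c_bytes TYPE x LENGTH 4 VALUE '01020304'.

  DATA(static1) = NEW lcl_static_buf( ).
  DATA(static2) = NEW lcl_static_buf( ).
  static1->buffer = c_bytes.
  " static2 sees the value written through static1: same storage
  ASSERT static2->buffer = c_bytes.

  DATA(inst1) = NEW lcl_instance_buf( ).
  DATA(inst2) = NEW lcl_instance_buf( ).
  inst1->buffer = c_bytes.
  " inst2 keeps its own buffer, which is still initial
  ASSERT inst2->buffer IS INITIAL.
```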

>> point 3 - why do I need to compress in parallel

Say I need to extract specific records from both BKPF (journal headers) and BSEG (journal entries). I first query BKPF and loop through each record, and for each record I query the associated journal lines from BSEG for that header. Both tables are streamed simultaneously (concurrently) to different files, aptly named BKPF.csv.gz and BSEG.csv.gz...

>> Juan's example

Yes, your example works when writing serially to independent files, but it would not work if we need to write to different files simultaneously, because they all use the same class user_outbuf with a static CLASS-DATA buffer, meaning all instances share the same buffer value (as demonstrated by the last 2 lines of my example).

>> Sandra's point 2 - the SAP design is weird

Yes, I agree. It looks like it's designed so that only one stream per buffer class is in use at any one time.

>> Thoughts / options thus far:

1) Define a buffer class for every instance we intend to use concurrently (bkpf_outbuf, bseg_outbuf, etc.). This does not feel like good OO design, especially if we don't know until runtime how many levels (classes) we will need.

2) Take a local copy of the SAP class CL_ABAP_GZIP_TEXT_STREAM and tweak the interface so that it's not static; then this problem should go away (I hope).

Thanks,
Jay 🙂

former_member602713 · ‎04-04-2019

Quick update on this: I attempted to clone the CL_ABAP_GZIP_TEXT_STREAM class, but it references kernel methods, which would require editing the abkmeth.seg file (https://help.sap.com/doc/abapdocu_752_index_htm/7.52/en-US/abenkernel_methods.htm; the definitions can be viewed through RSKMETH) to enable them in the new class. Unfortunately, this does not appear to be transport-friendly.

Instead, I will try the other option: defining a buffer class per table name. 😞