
ABAP character encoding conversion

marco-silva
Participant

Hello,

I'm trying to export a data file in the ISO-8859-15 encoding. This character set includes characters such as €, Š, š, Ž, ž, Œ, œ and Ÿ.

I've looked into the ABAP tools available for this purpose, especially through this great blog from sandra.rossi.

Here is my test code:

*&---------------------------------------------------------------------*
*& Report ZBC_FILE_ENCODING
*&---------------------------------------------------------------------*
*& Encoding test report
*&---------------------------------------------------------------------*

REPORT zbc_file_encoding.

DATA: gv_file              TYPE text255,
      gt_file              LIKE STANDARD TABLE OF gv_file,
      gv_filename          TYPE string,
      gv_path              TYPE string,
      gv_fullpath          TYPE string,
      gv_bin_filesize      TYPE i,
      gt_bin_data          TYPE solix_tab,
      gv_xfile             TYPE xstring,
      gv_sfile             TYPE string,
      gv_sap_codepage      TYPE cpcodepage,
      gv_default_file_name TYPE string,
      gv_external_name     TYPE tcp00a-cpattr,
      go_abap_conv_obj     TYPE REF TO cl_abap_conv_obj,
      gv_incode            TYPE cpcodepage,
      gv_outcode           TYPE cpcodepage.

gv_file = 'A;B;C;é;€;Š;š;Ž;ž;Œ;œ;Ÿ'.
APPEND gv_file TO gt_file.
gv_file = 'D;E;F;é;€;Š;š;Ž;ž;Œ;œ;Ÿ'.
APPEND gv_file TO gt_file.

LOOP AT gt_file INTO gv_file.
  CONCATENATE gv_sfile gv_file cl_abap_char_utilities=>cr_lf INTO gv_sfile.
ENDLOOP.

gv_external_name = 'ISO-8859-15'.

CALL FUNCTION 'SCP_CODEPAGE_BY_EXTERNAL_NAME'
  EXPORTING
    external_name = gv_external_name
  IMPORTING
    sap_codepage  = gv_sap_codepage
  EXCEPTIONS
    OTHERS        = 1.

IF sy-subrc NE 0.
  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.

CALL FUNCTION 'SCP_GET_CODEPAGE_NUMBER'
  EXPORTING
    database_also = ' '
  IMPORTING
    appl_codepage = gv_incode
  EXCEPTIONS
    OTHERS        = 1.

IF sy-subrc NE 0.
  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.

gv_outcode = gv_sap_codepage.

CREATE OBJECT go_abap_conv_obj
  EXPORTING
    incode   = gv_incode
    outcode  = gv_outcode
    miss     = 'S'
    ctrlcode = '.'
  EXCEPTIONS
    OTHERS   = 1.

IF sy-subrc NE 0.
  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.

CALL METHOD go_abap_conv_obj->convert
  EXPORTING
    inbuff    = gv_sfile
    outbufflg = gv_bin_filesize
  IMPORTING
    outbuff   = gv_xfile
  EXCEPTIONS
    OTHERS    = 1.

IF sy-subrc NE 0.
  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.

CALL FUNCTION 'SCMS_XSTRING_TO_BINARY'
  EXPORTING
    buffer        = gv_xfile
  IMPORTING
    output_length = gv_bin_filesize
  TABLES
    binary_tab    = gt_bin_data.

CONCATENATE 'test_encoding_' gv_external_name '.csv' INTO gv_default_file_name.

CALL METHOD cl_gui_frontend_services=>file_save_dialog
  EXPORTING
    default_file_name = gv_default_file_name
  CHANGING
    filename          = gv_filename
    path              = gv_path
    fullpath          = gv_fullpath
  EXCEPTIONS
    OTHERS            = 1.

IF sy-subrc NE 0.
  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.

CALL METHOD cl_gui_frontend_services=>gui_download
  EXPORTING
    bin_filesize         = gv_bin_filesize
    filename             = gv_fullpath
    filetype             = 'BIN'
    show_transfer_status = ' '
  CHANGING
    data_tab             = gt_bin_data
  EXCEPTIONS
    OTHERS               = 1.

IF sy-subrc NE 0.
  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.

Unfortunately, the result is not what I expected.

Notepad++ reports the format as Windows-1252, the special characters are garbled, and even the carriage return and line feed are not recognized.

Any idea what I'm doing wrong and how to achieve my goal?

Thanks in advance for your help.

Best regards,

Marco Silva

1 ACCEPTED SOLUTION

Sandra_Rossi
Active Contributor

It's just a Notepad++ issue. The code page is not stored in the file, so Notepad++ cannot reliably determine it. It uses a very simple algorithm that sniffs the first bytes and guesses, with a high risk of false positives.

Manually force Notepad++ to treat the file as ISO-8859-15 via the menu, and it will display the characters according to the ISO-8859-15 character set.

PS: thanks for the minimal reproducible example! (no time lost by people who try to answer)
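
To check the bytes independently of any editor's guess, a minimal sketch along these lines could help. In ISO-8859-15 the characters €, Š, š, Ž, ž, Œ, œ, Ÿ are the single bytes A4, A6, A8, B4, B8, BC, BD, BE. The same byte values are also valid in Windows-1252 (where they show as ¤ ¦ ¨ ´ ¸ ¼ ½ ¾), which is why a detector has nothing to tell the two apart. The report name is hypothetical, and the sketch assumes the system accepts the external name ISO-8859-15, as the question suggests.

```abap
REPORT zbc_check_bytes. " hypothetical report name

DATA: lv_text TYPE string,
      lv_xstr TYPE xstring,
      lv_hex  TYPE string.

lv_text = `€ŠšŽžŒœŸ`.

TRY.
    " Encode the text as ISO-8859-15 (external name taken from the question).
    lv_xstr = cl_abap_codepage=>convert_to( source   = lv_text
                                            codepage = `ISO-8859-15` ).
  CATCH cx_root.
    " Any conversion or code page error ends up here in this sketch.
    WRITE / 'Conversion failed'.
    RETURN.
ENDTRY.

" Assigning an xstring to a string yields its hexadecimal representation.
lv_hex = lv_xstr.
WRITE / lv_hex.
```

If the hex string matches the byte values listed above, the conversion produced ISO-8859-15 and only the editor's display is off.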

10 REPLIES

former_member259807
Active Participant

Hi Marco,

Does it have to be ISO-8859-15, or is this sufficient as well?

marco-silva
Participant

Hello,

I have to produce a file that is legally required to be in the ISO-8859-15 encoding. But I guess the main point is to support the Euro symbol (€).

Anyway, since the ISO-8859-15 attribute is listed in SAP table TCP00A for code page 1164, shouldn't the system be capable of creating the file in the right encoding?

Thanks.

Marco

I guess you still have to make sure the file itself uses the correct encoding, but in Notepad++ at least you can see a correct representation by selecting that encoding.

Update:

This is done by not adding the CRLF to the table lines and not converting them. Just pass the table (gt_file) to gui_download with code page 1164.
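
The suggestion above could look roughly like this. It is only a sketch: the report name and the target path are hypothetical, and it assumes the text-mode download (filetype 'ASC') writes the line breaks itself, so no CRLF is appended to the table lines.

```abap
REPORT zbc_download_codepage. " hypothetical report name

DATA: gv_line     TYPE text255,
      gt_file     LIKE STANDARD TABLE OF gv_line,
      gv_fullpath TYPE string.

gv_line = 'A;B;C;é;€;Š;š;Ž;ž;Œ;œ;Ÿ'.
APPEND gv_line TO gt_file.
gv_line = 'D;E;F;é;€;Š;š;Ž;ž;Œ;œ;Ÿ'.
APPEND gv_line TO gt_file.

" Hypothetical path; the original report gets it from file_save_dialog.
gv_fullpath = 'C:\temp\test_encoding_ISO-8859-15.csv'.

cl_gui_frontend_services=>gui_download(
  EXPORTING
    filename = gv_fullpath
    filetype = 'ASC'   " text mode, no manual conversion to xstring
    codepage = '1164'  " SAP code page listed for ISO-8859-15 in TCP00A
  CHANGING
    data_tab = gt_file
  EXCEPTIONS
    OTHERS   = 1 ).

IF sy-subrc NE 0.
  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
ENDIF.
```

Compared with the original report, this drops the CL_ABAP_CONV_OBJ conversion and the SCMS_XSTRING_TO_BINARY step and leaves the encoding to the download itself.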

Thank you Sandra.

So I guess my file is created correctly (even though the line breaks are not interpreted correctly).

Hmm, the # does not look right. It is a # and neither CR nor LF. Is that because of MISS='S' and CTRLCODE='.' (control characters are substituted by SUBSTC, which is # by default)? Use CTRLCODE='T' instead (worth a try).

Anyway, why not use the recommended class CL_ABAP_CODEPAGE?
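
For reference, a minimal sketch of what the conversion step might look like with CL_ABAP_CODEPAGE instead of CL_ABAP_CONV_OBJ. The report name is hypothetical, and the sketch assumes the external name ISO-8859-15 is accepted on the system; all exceptions are caught generically to keep it short.

```abap
REPORT zbc_file_encoding_cp. " hypothetical report name

DATA: gv_sfile TYPE string,
      gv_xfile TYPE xstring,
      gx_error TYPE REF TO cx_root,
      gv_msg   TYPE string.

" One test line terminated by CR/LF, as in the original report.
CONCATENATE 'A;B;C;é;€;Š;š;Ž;ž;Œ;œ;Ÿ' cl_abap_char_utilities=>cr_lf INTO gv_sfile.

TRY.
    " Convert the character string to bytes in the target code page.
    gv_xfile = cl_abap_codepage=>convert_to( source   = gv_sfile
                                             codepage = `ISO-8859-15` ).
  CATCH cx_root INTO gx_error.
    " Unknown code page or unconvertible characters end up here in this sketch.
    gv_msg = gx_error->get_text( ).
    MESSAGE gv_msg TYPE 'E'.
ENDTRY.

" gv_xfile can then be passed to SCMS_XSTRING_TO_BINARY and downloaded
" with filetype 'BIN', exactly as in the original report.
```

Since the control characters are not substituted here, CR and LF should reach the file as bytes 0D and 0A rather than as #.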

You're right. I started with class CL_ABAP_CODEPAGE, but since I thought I wasn't getting the expected result, I tried others and ended up with CL_ABAP_CONV_OBJ. I'll go back to CL_ABAP_CODEPAGE; it seems to perform the task correctly.

Thanks a lot for your help!

AFAIK, Notepad++ uses the uchardet project to identify the code page. Check https://gitlab.freedesktop.org/uchardet/ if you want to raise an issue 🙂

Raymond Giuseppi I guess that the only way to identify the code page/character set is to count the number of occurrences of each character and compare them to a model of statistical character counts per language/character set. With the given characters it can't work, but with this text in Estonian it should (I hope): Caron, tuntud ka kui hachek, kiil, tšekk, ümberpööratud ümbermõõt, ümberpööratud müts, on diakriitik.

UPDATE: hmmm, no, it doesn't work. Maybe statistics are not available for Estonian...

Marco SILVA by the way, why ISO-8859-15 and not UTF-8? The first 3 bytes of a UTF-8 file can contain a BOM that identifies it as UTF-8, which is much more practical than ISO.
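
If UTF-8 were an option, a sketch of the BOM idea could look like this. The report name is hypothetical, and it assumes the file is then written in binary mode as in the original report. The BOM is the three bytes EF BB BF placed in front of the encoded content.

```abap
REPORT zbc_file_utf8_bom. " hypothetical report name

DATA: gv_sfile TYPE string,
      gv_xfile TYPE xstring.

CONCATENATE 'A;B;C;é;€;Š;š;Ž;ž;Œ;œ;Ÿ' cl_abap_char_utilities=>cr_lf INTO gv_sfile.

TRY.
    " Encode the text as UTF-8.
    gv_xfile = cl_abap_codepage=>convert_to( source   = gv_sfile
                                             codepage = `UTF-8` ).
  CATCH cx_root.
    WRITE / 'Conversion failed'.
    RETURN.
ENDTRY.

" Prepend the UTF-8 byte order mark so editors can identify the encoding.
CONCATENATE cl_abap_char_utilities=>byte_order_mark_utf8 gv_xfile
       INTO gv_xfile IN BYTE MODE.

" gv_xfile can now go through SCMS_XSTRING_TO_BINARY and a 'BIN' download.
```

This does not help when ISO-8859-15 is a legal requirement, as stated earlier in the thread, but it removes the guessing on the editor side.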