Solved: Character conversion: Unicode to non-unicode

jack_graus2 · 11-02-2017

There are a lot of posts on this subject, but none of them makes clear to me how it works.

Data in SAP is maintained in Unicode format. But some external partners require the data to be converted into a different non-Unicode character set.

Example:

In SAP we have a character ß = C39F.

Our interface partner expects character set CP437. In this character set the code for character ß = E1.

So I expect SAP Unicode character C39F to be converted into character E1.

How can I accomplish this?

Regards Jack

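DATA: lo_xml            TYPE REF TO if_ixml,
      lo_encoding       TYPE REF TO if_ixml_encoding,
      lo_document       TYPE REF TO if_ixml_document,
      lo_stream_factory TYPE REF TO if_ixml_stream_factory,
      lo_stream         TYPE REF TO if_ixml_ostream,
      lo_renderer       TYPE REF TO if_ixml_renderer,
      lo_element        TYPE REF TO if_ixml_element,
      lv_data           TYPE xstring.

" Render a small XML document into lv_data, requesting CP437 as output encoding.
lo_xml            = cl_ixml=>create( ).
lo_encoding       = lo_xml->create_encoding( byte_order = 0 character_set = 'CP437' ).
lo_document       = lo_xml->create_document( ).
lo_stream_factory = lo_xml->create_stream_factory( ).
lo_stream         = lo_stream_factory->create_ostream_xstring( lv_data ).
lo_stream->set_encoding( lo_encoding ).
lo_renderer       = lo_xml->create_renderer( ostream = lo_stream document = lo_document ).
lo_element        = lo_document->create_element_ns( `data` ).
lo_element->set_attribute_ns( name = 'attribute' value = `AAAAßßßß` ).
lo_document->append_child( lo_element ).
lo_renderer->render( ).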

Sandra_Rossi · 11-02-2017

Your code is correct, but a few code pages are not fully described in table TCP00A (Relationship between standardized name and SAP code page number), and CP437 is one of them. For instance, the same code would work with code page "iso-8859-1".

To complete TCP00A, see the example for GB18030 in SAP Note 1901768 (code page undefined for gb18030).

Go to transaction SCP. Enter code page 1107 (SAP code page for CP437).

Change the code page.

Add attribute H 0001 CP437 and SAVE.

Try your program again.
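
To check whether the name is now resolved, one option is to map the external name back to an SAP code page number. The following is only a sketch: it uses function module SCP_CODEPAGE_BY_EXTERNAL_NAME, and the idea that this function module reads the same TCP00A attributes as the iXML library is an assumption, not something confirmed in this thread. The variable name is illustrative.

```abap
DATA lv_codepage TYPE cpcodepage.

" Resolve the external (standardized) name to an SAP code page number.
" Assumption: the lookup relies on the TCP00A entry maintained in SCP.
CALL FUNCTION 'SCP_CODEPAGE_BY_EXTERNAL_NAME'
  EXPORTING
    external_name = 'CP437'
  IMPORTING
    sap_codepage  = lv_codepage
  EXCEPTIONS
    not_found     = 1
    OTHERS        = 2.

IF sy-subrc = 0.
  WRITE: / 'CP437 resolves to SAP code page', lv_codepage.
ELSE.
  WRITE: / 'CP437 is not mapped to any SAP code page'.
ENDIF.
```

If the name is not found, the attribute entry is probably missing or was not saved.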

EDIT: the Eszett character (ß) is the Unicode code point U+00DF; C39F is its UTF-8 encoding, not the Unicode character itself. When the encoding name you pass is invalid or unknown (that is, not found in TCP00A), it is ignored, so the rendering to the XSTRING variable falls back to UTF-8.
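
The difference between the code point and its encodings can be made visible with a small test that converts the same character twice. This is a minimal sketch, assuming class cl_abap_conv_out_ce is available (it is marked obsolete on newer releases, where cl_abap_conv_codepage would be the replacement) and taking the code page number 1107 from the steps above. Variable names are illustrative.

```abap
DATA: lv_text  TYPE string VALUE `ß`,
      lv_utf8  TYPE xstring,
      lv_cp437 TYPE xstring.

" Encode the character as UTF-8.
cl_abap_conv_out_ce=>create( encoding = 'UTF-8' )->convert(
  EXPORTING data   = lv_text
  IMPORTING buffer = lv_utf8 ).

" Encode the same character with SAP code page 1107
" (the SAP code page for CP437 according to this thread).
cl_abap_conv_out_ce=>create( encoding = '1107' )->convert(
  EXPORTING data   = lv_text
  IMPORTING buffer = lv_cp437 ).

" XSTRING values are written as hexadecimal.
WRITE: / 'UTF-8:', lv_utf8,
       / '1107: ', lv_cp437.
```

Comparing the two hexadecimal values shows whether the partner would receive the two-byte UTF-8 sequence or the single-byte value of the target code page.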

jack_graus2 · 11-02-2017

That solves the problem.

That also explains to me the missing link between code page and character set.

Most characters are now correctly converted into the destination character set. Characters that are missing from it are converted into the replacement character, which is '#' by default. That is all fine.

A further conversion, in the 'nice to have' category, concerns the character Ë. It is not available in the destination character set, so it gets converted into #. It would be nice to have it converted into E instead. Do you know whether this is possible?

Thanks and regards

Sandra_Rossi · 04-29-2019

Sorry, I missed your question (please use Comment instead of Answer so that I'm automatically notified, or copy/paste my name).

Yes, it's possible with SCP, but it may be a little complex if you're not used to working with code pages. Another solution is to first replace the accented characters with the function module SCP_REPLACE_STRANGE_CHARS, and then do the code page conversion.
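
The second approach could look roughly like the sketch below. It only shows the replacement step; the cleaned text would then go through the same rendering code as in the question. The variable names are illustrative, and fixed-length character fields are used on the assumption that the function module expects type C parameters.

```abap
DATA: lv_text    TYPE c LENGTH 40 VALUE 'AAAAßßßßË',
      lv_cleaned TYPE c LENGTH 40.

" Replace accented and other special characters with plain equivalents.
" Caution: the function module may also rewrite characters that the
" target code page does support (possibly ß), so check the result.
CALL FUNCTION 'SCP_REPLACE_STRANGE_CHARS'
  EXPORTING
    intext  = lv_text
  IMPORTING
    outtext = lv_cleaned
  EXCEPTIONS
    OTHERS  = 1.

IF sy-subrc = 0.
  " lv_cleaned can now be passed on to the code page conversion.
  WRITE: / lv_cleaned.
ELSE.
  WRITE: / 'Replacement failed, return code', sy-subrc.
ENDIF.
```

Because the replacement happens before the conversion, it is worth comparing the output with what the interface partner expects for characters that already exist in the destination character set.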