11-15-2016 2:30 PM
Hi all,
SAP NetWeaver 7.4, ERP 6.0 EHP7.
I have a UTF-8 file that contains the character 'Ñ' represented as a double-byte character - 0xC391.
The file is placed on the application server, and an ABAP program then opens and reads it. The OPEN DATASET statement specifies ENCODING UTF-8.
The file does not have a BOM at the start, and when the program reads the records containing 'Ñ', it takes each of the two bytes separately and incorrectly interprets them as two characters beginning with 'Ã' rather than as a single 'Ñ'.
I have inserted a call to method CL_ABAP_FILE_UTILITIES=>CHECK_UTF8 and the file is recognised as being UTF-8.
If I open the file in Notepad and save it with encoding UTF-8, a BOM is inserted in the first three bytes of the file. If I then use this file as input the program correctly interprets the two bytes as 'Ñ'.
It is optional to have a BOM in a UTF-8 file and I have not come across any documentation stating that a BOM is required in a UTF-8 file for it to be correctly interpreted in ABAP.
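To make the byte-level picture concrete, here is a minimal sketch in Python (not ABAP; it only illustrates the encoding facts and says nothing about how OPEN DATASET behaves). It shows that 'Ñ' encodes to the two bytes C3 91, that the UTF-8 BOM is the three bytes EF BB BF, and that a conforming UTF-8 decoder yields 'Ñ' with or without the BOM. The helper name `decode_utf8` is made up for this example.

```python
import codecs

# UTF-8 encoding of 'Ñ' (U+00D1): the two bytes 0xC3 0x91.
n_tilde = "Ñ".encode("utf-8")

# The optional UTF-8 byte-order mark: the three bytes EF BB BF.
bom = codecs.BOM_UTF8


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    # 'utf-8-sig' strips a leading BOM and otherwise behaves like 'utf-8'.
    return data.decode("utf-8-sig")


without_bom = decode_utf8(n_tilde)
with_bom = decode_utf8(bom + n_tilde)

print(n_tilde.hex(), bom.hex(), without_bom == with_bom == "Ñ")
```

Since the byte sequence is valid UTF-8 either way, the BOM should not be needed to decode it, which is why the behaviour I am seeing looks wrong to me.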
Is it necessary to have a BOM in a UTF-8 file if it contains double-byte characters?
Are there any ways of dealing with this situation in ABAP rather than having to insert a BOM before processing the file?
Thanks in advance.
Billy Johnson
11-15-2016 3:13 PM
BOM is optional.
Regards,
Raymond
11-15-2016 3:56 PM
Hello Raymond,
Apologies for misleading you. It is '0xC3' '0x91' rather than '0xC391'.
The file was opened IN TEXT MODE, and I tried it both with and without the SKIPPING BYTE-ORDER MARK addition.
Regards
Billy
11-15-2016 8:46 PM
"It is '0xC3' '0x91' rather than '0xC391'" : it's the same thing, i.e. 2 bytes
11-16-2016 6:56 AM
Into which type of data object do you READ DATASET, and what do you see in AL11 when you click on the file?
Regards,
Raymond
11-16-2016 7:31 AM
11-15-2016 9:01 PM
à is the Unicode character U+00C3 (0xC383 in UTF-8 - https://fr.wikipedia.org/wiki/%C3%83 ), and also 0xC3 in Latin 1, etc.
As you have run CHECK_UTF8 successfully, I guess your file is UTF-8 and is read correctly, but there's a subsequent misinterpretation.
Which byte values do you get in the ABAP debugger right after READ DATASET?
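As an illustration of the kind of misinterpretation I suspect, here is a small Python sketch (Python rather than ABAP, purely to show the byte arithmetic). The choice of Latin-1 and Windows-1252 as the "wrong" code pages is an assumption for the example, not something confirmed for your system, and `code_points` is a made-up helper.

```python
raw = bytes([0xC3, 0x91])  # the two bytes stored in the file for 'Ñ'

as_utf8 = raw.decode("utf-8")      # correct reading: one character, U+00D1
as_latin1 = raw.decode("latin-1")  # misreading: U+00C3 'Ã' plus control char U+0091
as_cp1252 = raw.decode("cp1252")   # misreading: U+00C3 'Ã' plus U+2018


def code_points(text: str) -> list:
    """Return the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Re-encoding the misread Latin-1 text as UTF-8 turns 2 bytes into 4;
# the first two are C3 83, i.e. the UTF-8 form of 'Ã'.
double_encoded = as_latin1.encode("utf-8")

print(code_points(as_utf8))
print(code_points(as_latin1))
print(code_points(as_cp1252))
print(double_encoded.hex())
```

If the value you see right after READ DATASET corresponds to a single 'Ñ' (U+00D1), the read itself is fine and the problem lies in a later step; if you already see 'Ã' (U+00C3) there, the bytes were decoded with a single-byte code page.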