Former Member

Strange Character Issue in CSV file

Hello Experts,

My scenario is Proxy to File. The receiver file should be a pipe-delimited CSV file. We send customer information from ECC through PI to a file, and the data covers customers from different countries (for example Russian and Turkish). To handle the special characters we initially used UTF-8, but it failed to convert the Russian and Turkish characters. We then switched to UTF-16, which handled all the special characters, but it added some special characters to the first field value in the CSV file (þÿ1200101). The receiver application accepts only the UTF-8 and UTF-16 formats. I tried UTF-16LE, but the application does not support it. Could you please help me remove the special characters? Is there a way to handle this with a UDF, without Java mapping?

Thank you.

Regards,

Sanjay.


2 Answers

  • Jan 01, 2015 at 04:13 PM

    Sanjay,

    All characters (Russian, Turkish, ...) can be encoded in UTF-8. UTF-8 is the de facto standard. Please use UTF-8 in SAP PI, and set UTF-8 encoding in the SAP proxy (set in the SM59 RFC destination).

    If þÿ is removed in the middleware, that is data loss.

    FYI: "thy" is an archaic or dialect form of "your". Unicode.


    Iceland.PNG (27.6 kB)

    • Sanjay,

      I agree with @Stefan Grube.

      A BOM is added to the start of a text stream or file to give a text editor a heads-up about the encoding.

      UTF-8 - BOM - 0xEF,0xBB,0xBF (if the text editor does not support UTF-8, it is displayed as ï»¿).

      UTF-16 - BOM - U+FEFF (if the text editor does not support UTF-16, it is displayed as þÿ or ÿþ, depending on endianness).

      The Unicode standard neither requires nor recommends a BOM for UTF-8.

      Please use UTF-8. There is no character which cannot be represented in UTF-8.

      Please use the foxe editor as your default editor (do not use Microsoft Notepad, as it adds a BOM to the start of files).
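
      To make the BOM explanation concrete, here is a minimal Java sketch (plain JDK, not PI-specific; the class and method names are made up for illustration). It encodes the first field value from the question with Java's "UTF-16" charset and then decodes the same bytes as ISO-8859-1, which is roughly what an editor or receiver does when it does not recognise the BOM. The first two characters it sees are þ (0xFE) and ÿ (0xFF).

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Java's "UTF-16" charset writes a big-endian BOM (bytes FE FF) before the text.
    static byte[] encodeUtf16WithBom(String text) {
        return text.getBytes(StandardCharsets.UTF_16);
    }

    // Decode raw bytes the way an ISO-8859-1 viewer would: one byte, one character.
    static String viewAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = encodeUtf16WithBom("1200101");
        // First two bytes are the BOM.
        System.out.printf("%02X %02X%n", bytes[0] & 0xFF, bytes[1] & 0xFF); // FE FF
        // The same two bytes, seen as ISO-8859-1 characters (U+00FE, U+00FF).
        String seen = viewAsLatin1(bytes);
        System.out.println((int) seen.charAt(0) + " " + (int) seen.charAt(1)); // 254 255
    }
}
```

      So the two extra characters in front of 1200101 are the BOM itself, shown in the wrong encoding, and not part of the field value.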

  • Jan 01, 2015 at 11:39 PM

    This is a byte order mark, which is a standard part of UTF-16.

    Byte order mark - Wikipedia, the free encyclopedia

    If the receiver does not support UTF-16LE, you could try UTF-16BE instead.

    I wonder why Russian or Turkish letters fail with UTF-8, as those characters can all be represented in UTF-8.
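
    As a rough illustration of the difference in plain Java (the class and method names are hypothetical, and whether the receiver accepts BOM-less UTF-16BE is an assumption that has to be tested): the generic "UTF-16" charset prepends a BOM when encoding, while the explicit "UTF-16BE" charset writes the same big-endian bytes without one.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Variants {
    // Generic UTF-16: Java writes the BOM FE FF, then big-endian code units.
    static byte[] withBom(String text) {
        return text.getBytes(StandardCharsets.UTF_16);
    }

    // Explicit UTF-16BE: the same big-endian code units, but no BOM.
    static byte[] bigEndianNoBom(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(withBom("1")));        // FE FF 00 31
        System.out.println(hex(bigEndianNoBom("1"))); // 00 31
    }
}
```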

    Are you checking the file with an editor that is able to display UTF-8 characters correctly?

    Do not use Microsoft Notepad for this purpose!
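
    A small Java sketch of that check, under the assumption that the "failed" characters were only displayed with the wrong encoding (the sample names and the class are invented for illustration): a Russian and a Turkish name survive a UTF-8 round trip unchanged, but the same bytes look broken when a viewer decodes them as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // A Russian name and a Turkish name separated by a pipe, written with
    // Unicode escapes so the encoding of this source file does not matter.
    static final String SAMPLE = "\u0418\u0432\u0430\u043D|\u015E\u00FCkr\u00FC";

    // Encode to UTF-8 bytes and decode them again as UTF-8.
    static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decode the UTF-8 bytes as ISO-8859-1, as a viewer without UTF-8 support would.
    static String misreadAsLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(SAMPLE.equals(roundTripUtf8(SAMPLE)));   // true
        System.out.println(SAMPLE.equals(misreadAsLatin1(SAMPLE))); // false
    }
}
```

    If the round trip is clean but the file still looks wrong, the problem is more likely in the viewer or in the encoding the receiver assumes than in UTF-8 itself.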
