Unicode collation

I read that Unicode preserves the collating sequence of the 128 ASCII characters. Yet when I listed function groups in our Unicode DEV system and our non-Unicode PRD system for comparison, I noticed that the underscore "_" character sorts lower than the alphabetic characters in Unicode and higher in non-Unicode. In ASCII, the underscore is coded as 95 (0x5F), which places it after the uppercase letters and before the lowercase letters. I haven't yet found a Unicode table showing the numeric assignment of Unicode characters, though it would be quite large. This may have unexpected ramifications that I will be watching for. Comments encouraged!


1 Answer

  • Sep 20, 2010 at 07:34 AM

    Hi Jeffrey,

    Unicode systems use the ICU (International Components for Unicode) library for sorting.

    Please have a look at

    http://userguide.icu-project.org/collation

    for further info.

    If you want to check the sorting independent of ABAP, please test it with

    http://minaret.info/test/sort.msp

    or

    http://demo.icu-project.org/icu-bin/locexp?_=root&d_=de&x=col

    There you will see that the underscore is sorted in a higher sequence than "normal" characters (which is compatible with non-Unicode).

    In ABAP, you can also copy the program RSCP0102 to the customer namespace and adapt the words provided by this report.

    Regarding underscore, it will give you the same result as in the links mentioned.

    So the sorting result you saw in your DEV system may have been produced in binary mode, in which case the sort order is different.
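
To illustrate what a binary sort does with the underscore, here is a minimal Python sketch. It is not ABAP and does not use ICU; it only shows plain code point ordering, which is what Python's default string comparison does. The object names in the list are made up for the example.

```python
# Hypothetical function group names, invented for this illustration.
names = ["ZTEST_A", "ZTESTA", "ztest_a", "ztesta"]


def binary_sort(items):
    """Sort strings by code point, i.e. a plain binary comparison.

    Python compares str values code point by code point, so sorted()
    without a key is enough to reproduce this kind of ordering.
    """
    return sorted(items)


# Code points of the characters involved:
# "A" = 65, "Z" = 90, "_" = 95, "a" = 97
print([(ch, ord(ch)) for ch in "AZ_a"])

# In a binary sort the underscore lands after the uppercase letters
# and before the lowercase letters.
print(binary_sort(names))
# → ['ZTESTA', 'ZTEST_A', 'ztest_a', 'ztesta']
```

A linguistic collation such as the one ICU provides does not compare raw code points, so its position for the underscore relative to letters can differ from this output.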

    Please also have a look at SAP notes 50337 and 952625.

    Best regards,

    Nils Buerckel

    SAP AG

    P.S.

    This link gives you a good description of how sorting works in non-Unicode systems:

    /people/hannes.kuehnemund/blog/2008/08/15/sort-varietes-between-operating-systems

    Edited by: Nils Buerckel on Sep 22, 2010 11:16 AM
