UTF-8 in Sybase ASE char columns && Arabic codepoints

Former Member

Note: This question is about Sybase Adaptive Server Enterprise 15.x or 16. I'm not entirely sure whether this forum is the correct one. If you know a better one, please point me to it. Thanks.

We store text data as UTF-8 in Sybase VARCHAR columns, with a UNIQUE INDEX on the column. We get a duplicate key violation when inserting the following two Arabic words:

بمشًْذالسيّد
بمشًذالسيد

A closer look at the UTF-8 encoding of the two words shows the following bytes in hex:

d8  a8
d9  85
d8  b4
d9  8b
d9  92  <---- ARABIC SUKUN  
d8  b0
d8  a7
d9  84
d8  b3
d9  8a
d9  91  <---- ARABIC SHADDA 
d8  af

d8  a8
d9  85
d8  b4
d9  8b
d8  b0
d8  a7
d9  84
d8  b3
d9  8a
d8  af

That is, the two words are "nearly" identical: the first word contains just two additional characters, ARABIC SUKUN and ARABIC SHADDA, which affect how the word is pronounced. (I'm German and have no further knowledge of Arabic; I don't even know what the words above mean. They came into our database as biographic data.)
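For anyone who wants to reproduce the analysis, here is a minimal Python sketch that decodes the two byte sequences from the hex dumps and lists what differs. It only uses the standard `unicodedata` module; the names `WORD_1`, `WORD_2` and `describe` are made up for this illustration.

```python
import unicodedata

# UTF-8 bytes of the two words, copied from the hex dumps above.
WORD_1 = bytes.fromhex("d8a8 d985 d8b4 d98b d992 d8b0 d8a7 d984 d8b3 d98a d991 d8af")
WORD_2 = bytes.fromhex("d8a8 d985 d8b4 d98b d8b0 d8a7 d984 d8b3 d98a d8af")


def describe(raw: bytes) -> list[tuple[str, str, str]]:
    """Return (code point, general category, Unicode name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in raw.decode("utf-8")
    ]


# Characters that occur in the first word but not in the second.
extra = sorted(set(WORD_1.decode("utf-8")) - set(WORD_2.decode("utf-8")))

for ch in extra:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
# U+0651 Mn ARABIC SHADDA
# U+0652 Mn ARABIC SUKUN
```

Both extra characters have the general category `Mn` (nonspacing mark), i.e. they are combining marks attached to the preceding letter rather than letters of their own.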

Is it possible that Sybase ASE discards the two code points U+0651 and U+0652 (UTF-8 bytes 0xd991 and 0xd992) when building the INDEX, as some kind of normalization? And if so, can this be avoided through configuration?
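To illustrate the suspicion, the following Python sketch compares the two words once byte for byte and once after dropping all nonspacing marks (Unicode category `Mn`). This is not ASE's algorithm; whether the server's sort order ignores these marks is an assumption here, not something confirmed. The helper `strip_nonspacing_marks` is hypothetical and exists only for this example.

```python
import unicodedata

# The two words, decoded from the UTF-8 bytes shown in the hex dumps above.
WORD_1 = bytes.fromhex(
    "d8a8 d985 d8b4 d98b d992 d8b0 d8a7 d984 d8b3 d98a d991 d8af"
).decode("utf-8")
WORD_2 = bytes.fromhex(
    "d8a8 d985 d8b4 d98b d8b0 d8a7 d984 d8b3 d98a d8af"
).decode("utf-8")


def strip_nonspacing_marks(text: str) -> str:
    """Drop every character whose Unicode general category is Mn."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# A binary comparison sees two different keys.
binary_equal = WORD_1 == WORD_2

# A comparison that ignores nonspacing marks (assumed behaviour, for
# illustration only) sees the same key twice.
mark_insensitive_equal = (
    strip_nonspacing_marks(WORD_1) == strip_nonspacing_marks(WORD_2)
)

print(binary_equal)            # False
print(mark_insensitive_equal)  # True
```

If the index compared keys the second way, the two words would collide exactly as observed, while a purely binary comparison would keep them apart.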

Accepted Solutions (0)

Answers (1)

Hi Matthias,

This is probably better handled in a support incident as this is unlikely to be something others have come across. If you want to go that route then please log an incident and ask for me by name in the description and we can take it from there.

If you would prefer to handle it here, though, we can do that.

Either way I will need to see the output from sp_helpsort and the DDL of the table and the index.

Cheers,
Andy Ashwood
Senior Support Engineer, SAP