Note: This question is about Sybase Adaptive Server Enterprise 15.x or 16. I'm not sure whether this forum is the right place for it. If you know a better one, please point me to it. Thanks.
We store text data as UTF-8 in Sybase VARCHAR columns, with a UNIQUE INDEX on the column. We get a duplicate key violation when inserting the following two Arabic words:
A closer look at the UTF-8 bytes, in hex, shows:
Word 1:
d8 a8 d9 85 d8 b4 d9 8b
d9 92 <---- ARABIC SUKUN
d8 b0 d8 a7 d9 84 d8 b3 d9 8a
d9 91 <---- ARABIC SHADDA
d8 af

Word 2:
d8 a8 d9 85 d8 b4 d9 8b d8 b0 d8 a7 d9 84 d8 b3 d9 8a d8 af
In other words, the two words are nearly identical: the first one merely contains two additional characters, ARABIC SUKUN and ARABIC SHADDA, which affect how the word is pronounced. (I'm German and don't know Arabic; I don't even know what these words mean. They came into our database as biographic data.)
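To double-check the comparison outside the database, here is a small Python sketch. It is not Sybase code; it only decodes the bytes from the hex dump and lists the characters that occur in the first word but not in the second. The helper names `describe` and `only_in_first` are made up for this illustration.

```python
import unicodedata

# UTF-8 bytes of the two words, copied from the hex dump in this post.
WORD1 = bytes.fromhex(
    "d8 a8 d9 85 d8 b4 d9 8b d9 92 d8 b0 d8 a7 d9 84 d8 b3 d9 8a d9 91 d8 af"
)
WORD2 = bytes.fromhex(
    "d8 a8 d9 85 d8 b4 d9 8b d8 b0 d8 a7 d9 84 d8 b3 d9 8a d8 af"
)


def describe(raw):
    """Decode UTF-8 bytes and return (code point, Unicode name) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in raw.decode("utf-8")
    ]


def only_in_first(first, second):
    """List the characters of `first` that do not occur anywhere in `second`."""
    second_chars = set(second.decode("utf-8"))
    return [
        item
        for item, ch in zip(describe(first), first.decode("utf-8"))
        if ch not in second_chars
    ]


if __name__ == "__main__":
    for code_point, name in only_in_first(WORD1, WORD2):
        print(code_point, name)
```

The two characters it reports are the combining marks U+0652 and U+0651, so the stored byte strings really are different and the collision must come from how the index compares them.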
Is it possible that Sybase ASE discards the two characters U+0651 (ARABIC SHADDA, UTF-8 bytes d9 91) and U+0652 (ARABIC SUKUN, UTF-8 bytes d9 92) when building the index, as some kind of normalization? And if so, can this be avoided by configuration?
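To illustrate what I suspect, here is a hedged Python sketch of a comparison that ignores combining marks. This is purely an assumption about the kind of behavior that would explain the error, not a description of what ASE actually does. The helper `strip_marks` is hypothetical and drops every character of Unicode category Mn, which also removes the ARABIC FATHATAN (d9 8b) present in both words.

```python
import unicodedata

# UTF-8 bytes of the two words, copied from the hex dump in this post.
WORD1 = bytes.fromhex(
    "d8 a8 d9 85 d8 b4 d9 8b d9 92 d8 b0 d8 a7 d9 84 d8 b3 d9 8a d9 91 d8 af"
)
WORD2 = bytes.fromhex(
    "d8 a8 d9 85 d8 b4 d9 8b d8 b0 d8 a7 d9 84 d8 b3 d9 8a d8 af"
)


def strip_marks(raw):
    """Decode UTF-8 bytes and drop all non-spacing marks (category Mn)."""
    text = raw.decode("utf-8")
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    # A byte-wise (binary) comparison sees two distinct keys.
    print("binary equal:        ", WORD1 == WORD2)
    # A mark-insensitive comparison sees one and the same key.
    print("equal ignoring marks:", strip_marks(WORD1) == strip_marks(WORD2))
```

Under a binary comparison the two values are distinct, while a comparison that ignores such marks treats them as the same key, which is the symptom we see with the UNIQUE INDEX.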