on 07-23-2018 9:55 AM
Note: This question is about Sybase Adaptive Server Enterprise 15.x or 16. I'm not sure whether this forum is the right place for it. If you know a better one, please point me to it. Thanks.
We store UTF-8 text in Sybase VARCHAR columns with a UNIQUE INDEX on the column. We get a duplicate key violation when inserting the following two Arabic words:
بمشًْذالسيّد
بمشًذالسيد
A closer look at the UTF-8 bytes of the two words, in hex:
First word:
d8 a8
d9 85
d8 b4
d9 8b
d9 92 <---- ARABIC SUKUN
d8 b0
d8 a7
d9 84
d8 b3
d9 8a
d9 91 <---- ARABIC SHADDA
d8 af
Second word:
d8 a8
d9 85
d8 b4
d9 8b
d8 b0
d8 a7
d9 84
d8 b3
d9 8a
d8 af
That is, the two words are nearly identical: the first word contains only two additional characters, ARABIC SUKUN and ARABIC SHADDA, which affect the pronunciation. (I'm German and have no knowledge of Arabic; I don't even know what the words above mean. They came into our database as biographic data.)
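For anyone who wants to reproduce the comparison outside the database, here is a minimal Python sketch that decodes the UTF-8 bytes listed above and names the code points present only in the first word. The variable names and the `describe` helper are illustrative and not part of any Sybase tooling.

```python
import unicodedata

# UTF-8 bytes copied from the hex dump above (first word, then second word).
first_hex = "d8a8 d985 d8b4 d98b d992 d8b0 d8a7 d984 d8b3 d98a d991 d8af"
second_hex = "d8a8 d985 d8b4 d98b d8b0 d8a7 d984 d8b3 d98a d8af"

first = bytes.fromhex(first_hex).decode("utf-8")
second = bytes.fromhex(second_hex).decode("utf-8")


def describe(text):
    """Return (code point, Unicode name) pairs for each character (illustrative helper)."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Code points that occur in the first word but not in the second.
second_points = describe(second)
extra = [cp for cp in describe(first) if cp not in second_points]

for code, name in extra:
    print(code, name)
```

Each two-byte sequence in the dump is one Unicode code point, so the first word has 12 code points and the second has 10.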
Is it possible that Sybase ASE discards the two code points U+0651 and U+0652 (UTF-8 bytes 0xd991 and 0xd992) when building the index, as some kind of normalization? And if so, can this be avoided through configuration?
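To make the suspicion concrete, the following Python sketch shows what it would look like if a comparison ignored combining marks. This only illustrates the hypothesis. Whether ASE's sort order actually behaves this way is not established in this thread, and `strip_marks` is a hypothetical helper, not an ASE function.

```python
import unicodedata

# The two words from the question, written as escapes so the source is unambiguous.
first = "\u0628\u0645\u0634\u064b\u0652\u0630\u0627\u0644\u0633\u064a\u0651\u062f"
second = "\u0628\u0645\u0634\u064b\u0630\u0627\u0644\u0633\u064a\u062f"


def strip_marks(text):
    """Drop every non-spacing mark (Unicode category Mn). Hypothetical helper."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Compared code point by code point, the words differ.
print(first == second)

# With non-spacing marks ignored, they collapse to the same string,
# which a unique index would then treat as a duplicate key.
print(strip_marks(first) == strip_marks(second))
```

Note that this sketch also drops U+064B (ARABIC FATHATAN), which appears in both words, so it cannot tell whether only SUKUN and SHADDA are ignored or all such marks.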
Hi Matthias,
This is probably better handled in a support incident as this is unlikely to be something others have come across. If you want to go that route then please log an incident and ask for me by name in the description and we can take it from there.
If you prefer to handle it here, though, we can do that.
Either way I will need to see the output from sp_helpsort and the DDL of the table and the index.
Cheers,
Andy Ashwood
Senior Support Engineer, SAP