This one is a bit tricky, as the behaviour of the LENB function in Excel is "weird".
The idea is that the function would return the number of bytes used for an input string. That already tells us, that this is about how Excel (!) stores the strings - not about strings per se.
So, for LENB the deal is: if the default character set of the operating system (!) supports double-byte characters (DBCS) then the LENB will return 2 times the number of characters in the string as the number of bytes.
If the default character set of the OS does not support DBCS LENB behaves like LEN, so for each character in the string, only one byte is counted.
See LEN, LENB functions for reference.
Now, the problem here is that this has not much to do with how HANA (or other DBMS) store bytes.
HANA uses CESU-8 string encoding which allows for storing only the significant bytes (e.g. characters from the ASCII range don't need more than 1 byte, while characters with larger Unicode point numbers, like the Korean(?) ones you gave as an example, take two or three bytes).
The point here is that by using the
LENGTH(to_binary (... ))
approach you get the actual number of bytes used in CESU-8 to represent the characters.
Two differences here between Excel and HANA:
Excel doesn't use CESU-8 but the DBCS character encoding scheme of the OS.
Due to that, the byte-counting functions for strings are different: CESU-8 can have between 1 and 3 bytes and need to be checked for each character while the Excel version simply assumes two bytes for each character in the string.
Obviously, if you want to emulate the Excel behaviour in HANA, just use
LENGTH (...) *2 as EXCEL_LENB
and you're done.
Another thread with similar discussion
https://archive.sap.com/discussions/thread/682964
Hope it helps
Thanks
Ashraf
maybe you can try this
SELECT LENGTH (to_binary('length in char')) as length FROM DUMMY;
Add comment