on 01-09-2019 5:14 AM
HI.
I want to get the string length as BYTE in HANA.HANA seems to have no function like Datalength in MSSQL.
How can I get the length of a string in bytes?
Thanks.
This one is a bit tricky, as the behaviour of the LENB function in Excel is "weird".
The idea is that the function would return the number of bytes used for an input string. That already tells us, that this is about how Excel (!) stores the strings - not about strings per se.
So, for LENB the deal is: if the default character set of the operating system (!) supports double-byte characters (DBCS) then the LENB will return 2 times the number of characters in the string as the number of bytes.
If the default character set of the OS does not support DBCS LENB behaves like LEN, so for each character in the string, only one byte is counted.
See LEN, LENB functions for reference.
Now, the problem here is that this has not much to do with how HANA (or other DBMS) store bytes.
HANA uses CESU-8 string encoding which allows for storing only the significant bytes (e.g. characters from the ASCII range don't need more than 1 byte, while characters with larger Unicode point numbers, like the Korean(?) ones you gave as an example, take two or three bytes).
The point here is that by using the
LENGTH(to_binary (... ))
approach you get the actual number of bytes used in CESU-8 to represent the characters.
Two differences here between Excel and HANA:
Excel doesn't use CESU-8 but the DBCS character encoding scheme of the OS.
Due to that, the byte-counting functions for strings are different: CESU-8 can have between 1 and 3 bytes and need to be checked for each character while the Excel version simply assumes two bytes for each character in the string.
Obviously, if you want to emulate the Excel behaviour in HANA, just use
LENGTH (...) *2 as EXCEL_LENB
and you're done.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi lars.breddemann ,
I don't think LENGTH(...)*2 would emulate the excel behaviour,
the result hyesoo provide is 25~
seems like Korean character count as 2, and there are 12 Korean characters and one SPACE in the string
so length = 12 * 2 + 1 = 25
so things got much complicate => to identify each character is double-type or single-byte, then count them together..
Jinni
If that's the case, then the LENB function description (link provided in my answer) does not work as described.
If DBCS is used, all characters should be counted as 2 bytes - space is explicitly considered to be a character. So it seems that the documentation lacks the exception for space counted as a single byte.
As for the SQL command to emulate this, I'd use the OCCURRENCES_REGEXPR here:
SELECT
(LENGTH(REPLACE('한글이어디까지 써지나보자', ' ', '')) * 2)
+ (OCCURRENCES_REGEXPR (' ' IN '한글이어디까지 써지나보자')) AS EXCEL_LENB
FROM DUMMY;
In fact, I'd probably write a scalar function to encapsulate this non-obvious behaviour and thereby reduce the average "WTF per line of code" factor for the SELECT statement:
create or replace function EXCEL_LENB(IN string NVARCHAR(5000))
returns LENB INT
as
begin
declare _SPACE constant NVARCHAR(1) := ' ';
LENB := (LENGTH(REPLACE(:string, :_SPACE, '')) * 2)
+ (OCCURRENCES_REGEXPR (:_SPACE IN :string));
end;
SELECT
EXCEL_LENB('한글이어디까지 써지나보자') AS EXCEL_LENB
FROM DUMMY;
Hi lars.breddemann and HyeSoo Yoon,
I guess not only SPACE count as 1 in excel LENB function....
what if a string mix Korean and English characters ? LENB("한글이어디까지 써지나보자abcde") = ?
would English character count as 2 or 1? and the numeric character? ex. '12345+-*%$='....
probably the scalar function will not be as simple as we image
Jinni
What should I do in this case?
SELECT (LENGTH(REPLACE('가나다라마 바사 아자차(10ml)', ' ', '')) * 2) + (OCCURRENCES_REGEXPR (' ' IN '가나다라마 바사 아자차(10ml)')) AS EXCEL_LENB FROM DUMMY;
The return Value is 34.
However, in the DATALENGTH function of MSSQL, it is 28.
It may not be easy to implement this as a DATALENGTH value.
Thanks.
hyesoo: you want to re-implement a proprietary function that does not work as documented and you ask what you should do? Either you find the correct specification and implement a custom function in HANA based on it or you take one step further back and look at why you need to have this function in HANA.
As already explained, the function does not represent the actual size of the string in HANA but only in Excel. Why do you need to have this information in your HANA system?
To be clear on this: this function does not give you the "true" memory requirement of a string, but how much memory Excel uses to encode the string internally. With a different encoding, this value changes.
So, why do you need to have a "string-size-in-Excel" function in HANA?
I know what you mean.
In the end, it is different from the byte length function used in Excel or MSSQL because the string encoding method is different.
The reason I raised this question, This is because of the interface with other databases, for example, Oracle.
When the length of a specific column in Oracle is 30 characters, HANA needs the logic to check that the column value of this interface target does not exceed 30 characters.
Because this column is the column in the item master, we need to add logic to B1's Transaction NOTI.
So I asked how to get the string byte length to implement this logic.
Thank you.
Now that's a completely different question then!
How different DBMS store data differs widely between setups. Your example, Oracle, allows for many different storage settings, including UTF versions (which are used with Unicode SAP systems). On top of that, the length() semantics change depending on whether a fixed-byte character set is used or not. For Unicode systems, length() commonly returns the number of characters - not bytes.
Finally, what specific logic do you believe you need to change in your BI solution because of this difference?
Anyhow - this question and the discussions are already at the limit of what can be done in this forum format. I recommend opening a new question addressing your actual problem.
maybe you can try this
SELECT LENGTH (to_binary('length in char')) as length FROM DUMMY;
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Another thread with similar discussion
https://archive.sap.com/discussions/thread/682964
Hope it helps
Thanks
Ashraf
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
83 | |
10 | |
10 | |
9 | |
7 | |
6 | |
5 | |
5 | |
4 | |
3 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.