cancel
Showing results for 
Search instead for 
Did you mean: 

How to get Datalength in HANA ?

0 Kudos

HI.

I want to get the string length as BYTE in HANA.

HANA seems to have no function like Datalength in MSSQL.

How can I get the length of a string in bytes?

Thanks.

Accepted Solutions (1)

Accepted Solutions (1)

lbreddemann
Active Contributor

This one is a bit tricky, as the behaviour of the LENB function in Excel is "weird".

The idea is that the function would return the number of bytes used for an input string. That already tells us, that this is about how Excel (!) stores the strings - not about strings per se.

So, for LENB the deal is: if the default character set of the operating system (!) supports double-byte characters (DBCS) then the LENB will return 2 times the number of characters in the string as the number of bytes.

If the default character set of the OS does not support DBCS LENB behaves like LEN, so for each character in the string, only one byte is counted.

See LEN, LENB functions for reference.

Now, the problem here is that this has not much to do with how HANA (or other DBMS) store bytes.

HANA uses CESU-8 string encoding which allows for storing only the significant bytes (e.g. characters from the ASCII range don't need more than 1 byte, while characters with larger Unicode point numbers, like the Korean(?) ones you gave as an example, take two or three bytes).

The point here is that by using the

LENGTH(to_binary (... )) 

approach you get the actual number of bytes used in CESU-8 to represent the characters.

Two differences here between Excel and HANA:
Excel doesn't use CESU-8 but the DBCS character encoding scheme of the OS.
Due to that, the byte-counting functions for strings are different: CESU-8 can have between 1 and 3 bytes and need to be checked for each character while the Excel version simply assumes two bytes for each character in the string.

Obviously, if you want to emulate the Excel behaviour in HANA, just use

LENGTH (...) *2 as EXCEL_LENB

and you're done.

0 Kudos

Thank you for your sincere reply.

bdc_erp
Participant

Hi lars.breddemann ,

I don't think LENGTH(...)*2 would emulate the excel behaviour,

the result hyesoo provide is 25~

seems like Korean character count as 2, and there are 12 Korean characters and one SPACE in the string

so length = 12 * 2 + 1 = 25

so things got much complicate => to identify each character is double-type or single-byte, then count them together..

Jinni

0 Kudos

So, the SQL Statement is...

SELECT LENGTH(REPLACE('한글이어디까지 써지나보자', ' ', '')) * 2 + (LENGTH('한글이어디까지 써지나보자') - LENGTH(REPLACE('한글이어디까지 써지나보자', ' ', ''))) FROM DUMMY;

Thank you so much.

lbreddemann
Active Contributor

If that's the case, then the LENB function description (link provided in my answer) does not work as described.

If DBCS is used, all characters should be counted as 2 bytes - space is explicitly considered to be a character. So it seems that the documentation lacks the exception for space counted as a single byte.

As for the SQL command to emulate this, I'd use the OCCURRENCES_REGEXPR here:

SELECT 
         (LENGTH(REPLACE('한글이어디까지 써지나보자', ' ', '')) * 2)
       + (OCCURRENCES_REGEXPR (' ' IN '한글이어디까지 써지나보자'))  AS EXCEL_LENB   
 FROM DUMMY;

In fact, I'd probably write a scalar function to encapsulate this non-obvious behaviour and thereby reduce the average "WTF per line of code" factor for the SELECT statement:

create or replace function EXCEL_LENB(IN string NVARCHAR(5000))
 returns LENB INT
 as
 begin
 declare _SPACE constant NVARCHAR(1) := ' ';
 
   LENB := (LENGTH(REPLACE(:string, :_SPACE, '')) * 2)
         + (OCCURRENCES_REGEXPR (:_SPACE IN :string));
 end;
 
 SELECT 
         EXCEL_LENB('한글이어디까지 써지나보자') AS EXCEL_LENB   
 FROM DUMMY;

bdc_erp
Participant
0 Kudos

Hi lars.breddemann and HyeSoo Yoon,

I guess not only SPACE count as 1 in excel LENB function....

what if a string mix Korean and English characters ? LENB("한글이어디까지 써지나보자abcde") = ?

would English character count as 2 or 1? and the numeric character? ex. '12345+-*%$='....

probably the scalar function will not be as simple as we image

Jinni

0 Kudos

What should I do in this case?

SELECT (LENGTH(REPLACE('가나다라마 바사 아자차(10ml)', ' ', '')) * 2) + (OCCURRENCES_REGEXPR (' ' IN '가나다라마 바사 아자차(10ml)')) AS EXCEL_LENB FROM DUMMY;

The return Value is 34.

However, in the DATALENGTH function of MSSQL, it is 28.

It may not be easy to implement this as a DATALENGTH value.

Thanks.

lbreddemann
Active Contributor
0 Kudos

hyesoo: you want to re-implement a proprietary function that does not work as documented and you ask what you should do? Either you find the correct specification and implement a custom function in HANA based on it or you take one step further back and look at why you need to have this function in HANA.

As already explained, the function does not represent the actual size of the string in HANA but only in Excel. Why do you need to have this information in your HANA system?

To be clear on this: this function does not give you the "true" memory requirement of a string, but how much memory Excel uses to encode the string internally. With a different encoding, this value changes.

So, why do you need to have a "string-size-in-Excel" function in HANA?

0 Kudos

I know what you mean.

In the end, it is different from the byte length function used in Excel or MSSQL because the string encoding method is different.

The reason I raised this question, This is because of the interface with other databases, for example, Oracle.

When the length of a specific column in Oracle is 30 characters, HANA needs the logic to check that the column value of this interface target does not exceed 30 characters.

Because this column is the column in the item master, we need to add logic to B1's Transaction NOTI.

So I asked how to get the string byte length to implement this logic.

Thank you.

lbreddemann
Active Contributor
0 Kudos

Now that's a completely different question then!

How different DBMS store data differs widely between setups. Your example, Oracle, allows for many different storage settings, including UTF versions (which are used with Unicode SAP systems). On top of that, the length() semantics change depending on whether a fixed-byte character set is used or not. For Unicode systems, length() commonly returns the number of characters - not bytes.

Finally, what specific logic do you believe you need to change in your BI solution because of this difference?
Anyhow - this question and the discussions are already at the limit of what can be done in this forum format. I recommend opening a new question addressing your actual problem.

Answers (2)

Answers (2)

bdc_erp
Participant
0 Kudos

maybe you can try this

SELECT LENGTH (to_binary('length in char')) as length FROM DUMMY;
0 Kudos

Thanks, but byte length and the binary length are different.

BINARY Length return 37, but the actual byte length is 25 in Excel.

lbreddemann
Active Contributor
0 Kudos

@hyesoo As you compare this to some function in MS Excel, please share what function(s) you use on the input data.

0 Kudos

I use function LENB() in Excel ...

And... the return Value is 25

Thanks.

former_member456023
Contributor
0 Kudos

Another thread with similar discussion

https://archive.sap.com/discussions/thread/682964

Hope it helps

Thanks
Ashraf

0 Kudos

Thank you, but I need to write an SQL statement that can get the length of the bytes in the string from the HANA DB.