How can we detect the language of a string using the LANGUAGE DETECTION parameter in HANA Text Analysis?

Apr 10 at 06:27 PM

Hi

I tried using the LANGUAGE DETECTION parameter of Text Analysis to detect the language of a string, but it does not detect languages properly. For example, Chinese text is identified as Japanese, even though Chinese is supported by HANA.

Can you please suggest a better way to derive the language of the documents stored in a table?

create fulltext index LANG_TEST_INDEX_TA_DOC2 on LANG_TEST(DOCUMENT)
CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER'
LANGUAGE DETECTION ('EN','DE','JA','ZF','ZH')
TEXT ANALYSIS ON;

Thanks and Regards

Srujan Gannamaneni

You may want to add example texts that lead to the misidentification. Best would be to provide a complete test case including the table structure, the data and your queries.

Hi Lars,

Please find the steps and respective output below.

create COLUMN table lang_test
(
id integer primary key,
document nvarchar(1000)
);

create fulltext index LANG_TEST_INDEX_TA_DOC2 on LANG_TEST(DOCUMENT)
CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER'
LANGUAGE DETECTION ('EN','ZF','ZH','JA')
TEXT ANALYSIS ON;

----English----
INSERT INTO lang_test
(ID,DOCUMENT)
VALUES
(1,'I am Speaking English');

---- Hindi (not supported) ----
INSERT INTO lang_test
(ID,DOCUMENT)
VALUES
(2,'मैं हिंदी बोल रहा हूँ');

----German----
INSERT INTO lang_test
(ID,DOCUMENT)
VALUES
(3,'Ich spreche Deutsch');

----Japanese----
INSERT INTO lang_test
(ID,DOCUMENT)
VALUES
(4,'私は日本語を話している');

----Korean----
INSERT INTO lang_test
(ID,DOCUMENT)
VALUES
(5,'나는 한국어로 말하고있다.');

----Simplified Chinese----
INSERT INTO lang_test
(ID,DOCUMENT)
VALUES
(6,'我在说简体中文');

----Traditional Chinese----
INSERT INTO lang_test
(ID,DOCUMENT)
VALUES
(7,'我說的是繁體中文');

select * from "$TA_LANG_TEST_INDEX_TA_DOC2"
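
To see more directly which language was assigned to each document, the output can be reduced to one row per document and language. This is only a minimal sketch: it assumes the generated $TA table carries the source table's key column (ID) and a TA_LANGUAGE column, and TOKEN_COUNT is just an illustrative alias.

```sql
-- Sketch: one row per document and detected language.
-- Assumes the $TA table exposes the source key column ID and TA_LANGUAGE.
SELECT ID,
       TA_LANGUAGE,
       COUNT(*) AS TOKEN_COUNT   -- number of extracted tokens per document/language
FROM "$TA_LANG_TEST_INDEX_TA_DOC2"
GROUP BY ID, TA_LANGUAGE
ORDER BY ID;
```

If a document shows a language code other than the expected one, that row points to the misdetected text.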

It does not recognize any language other than the one passed first to the LANGUAGE DETECTION parameter.
So I tried specifying just one language, say Japanese.

With that setting, both Hindi and Chinese were reported as Japanese.

Please let me know if I have missed anything.

Thanks and Regards

Srujan Gannamaneni
