Hi
I have tried using the LANGUAGE DETECTION parameter of a full-text index with text analysis to detect the language of a string, but it does not detect languages properly. For example, when I tried Chinese text, it was classified as Japanese, even though Chinese is supported by HANA.
Can you please suggest a better way to derive the language of the documents stored in a table?
create fulltext index LANG_TEST_INDEX_TA_DOC2 on LANG_TEST(DOCUMENT) CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' LANGUAGE DETECTION ('EN','DE','JA','ZF','ZH') TEXT ANALYSIS ON;
Thanks and Regards
Srujan Gannamaneni
You may want to add example texts that lead to the misidentification. Ideally, provide a complete test case including the table structure, the data and your queries.
Hi Lars,
Please find the steps and respective output below.
create COLUMN table lang_test (
  id integer primary key,
  document nvarchar(1000)
);

create fulltext index LANG_TEST_INDEX_TA_DOC2 on LANG_TEST(DOCUMENT)
  CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER'
  LANGUAGE DETECTION ('EN','ZF','ZH','JA')
  TEXT ANALYSIS ON;

----English----
INSERT INTO lang_test (ID,DOCUMENT) VALUES (1,'I am Speaking English');
----Hindi---- (not supported)
INSERT INTO lang_test (ID,DOCUMENT) VALUES (2,'मैं हिंदी बोल रहा हूँ');
----German----
INSERT INTO lang_test (ID,DOCUMENT) VALUES (3,'Ich spreche Deutsch');
----Japanese----
INSERT INTO lang_test (ID,DOCUMENT) VALUES (4,'私は日本語を話している');
----Korean----
INSERT INTO lang_test (ID,DOCUMENT) VALUES (5,'나는 한국어로 말하고있다.');
----Simplified Chinese----
INSERT INTO lang_test (ID,DOCUMENT) VALUES (6,'我在说简体中文');
----Traditional Chinese----
INSERT INTO lang_test (ID,DOCUMENT) VALUES (7,'我說的是繁體中文');

select * from "$TA_LANG_TEST_INDEX_TA_DOC2";
It does not recognize any language other than the first one passed to the LANGUAGE DETECTION parameter.
So I tried specifying only one language, say Japanese ('JA').
It then classified the Hindi and Chinese texts as Japanese.
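One way to see why these texts can end up labelled Japanese is to check which of the configured candidates are even possible from the characters alone. The following is a minimal Python sketch, not how HANA's language detection works internally. It relies on a hand-written, hypothetical script-to-language table (`SCRIPT_RULES`) and only inspects Unicode character names. A Han-only string such as the two Chinese samples is compatible with ZH, ZF and JA alike, and the Hindi and Korean samples match none of the candidates in this test case. A detector restricted to the configured list presumably still has to return one of them.

```python
import unicodedata

# Hypothetical script-to-language table, checked in priority order: the first
# script that occurs in the text decides. Kana and Hangul come before Han
# ideographs because Japanese and Korean text may also contain Han characters,
# while Chinese text normally contains Han characters only.
SCRIPT_RULES = [
    ("HIRAGANA", {"JA"}),
    ("KATAKANA", {"JA"}),
    ("HANGUL", {"KO"}),
    ("CJK UNIFIED IDEOGRAPH", {"ZH", "ZF", "JA"}),
    ("DEVANAGARI", {"HI"}),
    ("LATIN", {"EN", "DE"}),
]


def scripts_in(text):
    """Return the set of rule prefixes whose script occurs in text."""
    found = set()
    for ch in text:
        name = unicodedata.name(ch, "")  # "" for characters without a name
        for prefix, _ in SCRIPT_RULES:
            if name.startswith(prefix):
                found.add(prefix)
    return found


def plausible_languages(text, candidates):
    """Return the candidate codes consistent with the text's script.

    An empty set means no candidate fits, so any label chosen from
    `candidates` for this text is a forced guess.
    """
    found = scripts_in(text)
    for prefix, langs in SCRIPT_RULES:
        if prefix in found:
            return langs & set(candidates)
    return set()


candidates = ["EN", "ZF", "ZH", "JA"]
for text in ["मैं हिंदी बोल रहा हूँ", "我在说简体中文", "私は日本語を話している"]:
    print(text, sorted(plausible_languages(text, candidates)))
```

With the candidate list above, the Hindi sample yields an empty set, the Simplified Chinese sample yields JA, ZF and ZH, and only the Japanese sample, which contains hiragana, narrows to JA. Script alone cannot separate Simplified from Traditional Chinese, or Chinese from kana-free Japanese. A check like this can at most flag documents whose detected language is implausible.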
Please let me know if I have missed anything.
Thanks and Regards
Srujan Gannamaneni