
How can we detect the language of a string using the LANGUAGE DETECTION parameter in HANA Text Analysis?

Hi,

I tried using the LANGUAGE DETECTION parameter of a Text Analysis full-text index to detect the language of a string, but the languages are not detected correctly. For example, Chinese text is classified as Japanese, even though Chinese is supported by HANA.

Could you please suggest a better way to derive the language of the documents stored in a table?

create fulltext index LANG_TEST_INDEX_TA_DOC2 on LANG_TEST(DOCUMENT)
CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER'
LANGUAGE DETECTION ('EN','DE','JA','ZF','ZH')
TEXT ANALYSIS ON;

Thanks and Regards

Srujan Gannamaneni


  • You may want to add example texts that lead to the misidentification. Ideally, provide a complete test case including the table structure, the data, and your queries.

  • Hi Lars,

    Please find the steps and respective output below.

    create COLUMN table lang_test
    (
    id integer primary key,
    document nvarchar(1000)
    );
    
    create fulltext index LANG_TEST_INDEX_TA_DOC2 on LANG_TEST(DOCUMENT)
    CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER'
    LANGUAGE DETECTION ('EN','ZF','ZH','JA')
    TEXT ANALYSIS ON;
    
    ----English----
    INSERT INTO lang_test
    (ID,DOCUMENT)
    VALUES
    (1,'I am Speaking English');
    
    ----Hindi---- Not supported
    INSERT INTO lang_test
    (ID,DOCUMENT)
    VALUES
    (2,'मैं हिंदी बोल रहा हूँ');
    
    ----German----
    INSERT INTO lang_test
    (ID,DOCUMENT)
    VALUES
    (3,'Ich spreche Deutsch');
    
    ----Japanese----
    INSERT INTO lang_test
    (ID,DOCUMENT)
    VALUES
    (4,'私は日本語を話している');
    
    ----Korean----
    INSERT INTO lang_test
    (ID,DOCUMENT)
    VALUES
    (5,'나는 한국어로 말하고있다.');
    
    ----Simplified Chinese----
    INSERT INTO lang_test
    (ID,DOCUMENT)
    VALUES
    (6,'我在说简体中文');
    
    ----Traditional Chinese----
    INSERT INTO lang_test
    (ID,DOCUMENT)
    VALUES
    (7,'我說的是繁體中文');
    
    select * from "$TA_LANG_TEST_INDEX_TA_DOC2";

    The index does not recognize any language other than the one passed first to the LANGUAGE DETECTION parameter.
    So I tried specifying just one language, say Japanese.

    In that case, Hindi and Chinese were both reported as Japanese.

    Please let me know if I have missed anything.

    Thanks and Regards

    Srujan Gannamaneni
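
  • To see which language was actually assigned to each row in the test case above, the $TA table can be grouped by document. This is only a sketch: it assumes the default layout of the generated table, where the source key column (ID) is carried over and the detected language is stored in TA_LANGUAGE. TA_ROWS is just an illustrative alias.

    ```sql
    -- Sketch: one line per document and detected language.
    -- Assumes the default $TA_<index name> layout: the source key column (ID)
    -- plus the standard TA_LANGUAGE column.
    SELECT ID,
           TA_LANGUAGE,
           COUNT(*) AS TA_ROWS   -- number of $TA rows produced for this document
    FROM "$TA_LANG_TEST_INDEX_TA_DOC2"
    GROUP BY ID, TA_LANGUAGE
    ORDER BY ID;
    ```

    If a document produced no rows under the EXTRACTION_CORE_VOICEOFCUSTOMER configuration, it may not appear in this result at all, so a missing ID is not necessarily a detection failure.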


0 Answers