Former Member

HANA Text analysis Language detection

Hi Experts,

I have a scenario where I need to process document content that contains mostly German words plus some English ones. Ideally, the text analysis table $TA should show the appropriate language for each of these words, but unfortunately it gives DE as the language for all tokens.

E.g., in a document with German content there are words like "Sentiment", "in", "store", etc., which should be identified as English but are instead identified as German.

I created the text index with LANGUAGE DETECTION ('EN','DE').
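
For reference, an index definition of this kind might look roughly like the sketch below. The schema, table, column, and index names are hypothetical, and the text analysis configuration is an assumption, since the setup above only specifies the LANGUAGE DETECTION clause. The second statement shows one way to check which language was assigned to each token.

```sql
-- Hypothetical object names, for illustration only.
-- The CONFIGURATION value is an assumption; only LANGUAGE DETECTION ('EN','DE')
-- comes from the setup described in this question.
CREATE FULLTEXT INDEX "DOC_IDX" ON "MYSCHEMA"."DOCUMENTS" ("CONTENT")
    TEXT ANALYSIS ON
    CONFIGURATION 'LINGANALYSIS_FULL'
    LANGUAGE DETECTION ('EN', 'DE');

-- Inspect the language recorded for each token in the generated $TA table.
SELECT "TA_TOKEN", "TA_LANGUAGE", "TA_TYPE"
FROM "MYSCHEMA"."$TA_DOC_IDX"
ORDER BY "TA_COUNTER";
```

With this kind of setup, the TA_LANGUAGE column is where the DE value shows up for every token.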

Any info on this? @Anthony Waite

-Avinash


2 Answers

  • Best Answer
    Apr 19, 2016 at 06:50 PM

    Hi Avinash,

    Please refer to the Language Detection Problems discussion for the openSAP: Text Analytics with SAP HANA Platform course.

    Note: you may need to register in order to see the post.

    Cheers


  • Apr 19, 2016 at 09:57 PM

    Hi Avinash,

    I think there is currently no support for multi-language texts, as shown below:

    It may be choosing the primary language of the column, which is why you are getting only DE.

    Regards,

    Krishna Tangudu


    26.PNG (19.7 kB)

    • Former Member

      Yeah Krishna,

      Looks like mixed-language text is not supported. It detects only a single language based on the input. We will have to wait for future releases 😊

      -Avinash