Former Member

PAL ANOMALY DETECTION Structure/Architecture

I have another question about SAP PAL Anomaly Detection (AD), which in my case is installed on HANA version 1.

I have a column that holds ordinal data, for example Red = 1, Orange = 2 and Yellow = 3, so that the distances between the colours are captured in a single variable. However, for this variable/column the cluster centres are reported as 0, which does not make sense given that I encoded the values numerically/ordinally. Could it be that AD filtered out the variable because of its characteristics?

In case this is an issue specific to the AD algorithm: to my understanding AD works like K-Means. Could I simply use a K-Means that determines its number of clusters itself as an alternative? I would of course have to calculate the distances to the cluster centres myself, but that is a topic for another thread.

Thanks in advance

Best regards

Nicholas

1 Answer

  • Best Answer
    Former Member
    Nov 14, 2017 at 09:30 AM

    Hello,

    A column containing colors like yours is normally called categorical data in PAL.

    Normally it should be converted to binary columns, such as (1, 0, 0) for Red, (0, 1, 0) for Orange and (0, 0, 1) for Yellow. The number of columns equals the number of categories, and each category sets exactly one column to 1 and leaves the rest at 0.

    Since AD does not support categorical values, the conversion has to be done manually.
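To illustrate the manual conversion described above, here is a minimal sketch in plain Python. It is not PAL code: the function name `one_hot` and the sample values are hypothetical, and in practice the same mapping would be applied to the table before it is passed to AD.

```python
def one_hot(values, categories):
    """Map each categorical value to a 0/1 row with one column per category."""
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)   # one column per category, all 0
        row[index[value]] = 1         # exactly one column is set to 1
        rows.append(row)
    return rows


# Hypothetical sample data; the category order fixes the column order.
categories = ["Red", "Orange", "Yellow"]
encoded = one_hot(["Red", "Yellow", "Orange"], categories)
```

With this column order, Red becomes (1, 0, 0), Orange (0, 1, 0) and Yellow (0, 0, 1), as in the example above.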

    Alternatively, you can use K-means instead, as you mentioned. In that case you have to calculate the distances and identify the anomalous points yourself once you have all the clusters, for example with a SQL query.
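The post-processing step after K-means could look roughly like the sketch below. It assumes the cluster centres are already available (for example read from the K-means result table), uses Euclidean distance, and flags the `top_n` points farthest from their nearest centre. The function names, the sample data and the `top_n` rule are assumptions for illustration, not how AD itself decides.

```python
import math


def nearest_centre_distance(point, centres):
    """Euclidean distance from a point to its closest cluster centre."""
    return min(math.dist(point, centre) for centre in centres)


def flag_anomalies(points, centres, top_n):
    """Return the indices of the top_n points farthest from any centre."""
    distances = [nearest_centre_distance(p, centres) for p in points]
    ranked = sorted(range(len(points)), key=lambda i: distances[i], reverse=True)
    return sorted(ranked[:top_n]), distances


# Hypothetical centres and points, standing in for the K-means output.
centres = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.0, 1.0), (10.0, 9.0), (5.0, 5.0), (1.0, 0.0)]
anomalies, distances = flag_anomalies(points, centres, top_n=1)
```

Here the point (5.0, 5.0) lies far from both centres and is the one flagged. A fixed distance threshold could be used instead of `top_n` if that fits the data better.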

    Best Regards

    Zee

    • Former Member

      Hello,

      Well, the reason is simple.

      AD does not recognize your ordering; it simply treats the column as a numerical variable.

      K-means does not recognize your ordering either.

      In this situation I suggest starting the encoding at 0, which still preserves your ordering and similarity.

      Then map the result to the nearest category if it comes back as a floating-point number.
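As a rough sketch of that suggestion: the codes below start at 0, and a fractional value returned for this column (for example a cluster centre) is mapped back to the closest category. The dictionary `codes`, the helper `nearest_category` and the sample values are hypothetical.

```python
# Ordinal encoding starting at 0; the order Red < Orange < Yellow is kept.
codes = {"Red": 0, "Orange": 1, "Yellow": 2}


def nearest_category(value, codes):
    """Return the category whose code is closest to a (possibly fractional) value."""
    return min(codes, key=lambda name: abs(codes[name] - value))


# Hypothetical centre values returned for the colour column.
labels = [nearest_category(v, codes) for v in (0.2, 1.3, 1.8)]
```

Picking the closest code explicitly avoids relying on Python's built-in `round`, which rounds halves to the nearest even integer.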

      Hope it helps.

      Best Regards

      Zee