Former Member

PAL Anomaly Detection Score/Distance Value

I am currently using PAL to run Anomaly Detection natively on HANA v1 in a couple of instances. However, I am having some difficulty understanding some of the output and the required input. My questions should be easy for someone who knows their way around.

(1) What is the formal definition of the 'Score' output in the 'Statistics Table' of Anomaly Detection? My understanding is that, with a Euclidean distance function, the Score is the distance from the observation to its local cluster centre, since I defined outliers relative to the local cluster. However, when I tried to reconstruct the 'SCORE' using Euclidean distance, the values did not match; it looks as if the Score was computed using a weighting matrix. If so, how was the weighting matrix used?

My config for ANOMALY DETECTION:

INSERT INTO #PAL_CONTROL_TBL VALUES ('THREAD_NUMBER',2,null,null);
-- GROUP_NUMBER: not set, left at its default
INSERT INTO #PAL_CONTROL_TBL VALUES ('INIT_TYPE',4,null,null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('DISTANCE_LEVEL',2,null,null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('MAX_ITERATION',100,null,null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('NORMALIZATION',2,null,null);

INSERT INTO #PAL_CONTROL_TBL VALUES ('OUTLIER_DEFINE',1,null,null);
Everything else is left at the defaults given in the PAL documentation.

Best Regards

Nicholas


1 Answer

  • Best Answer
    Former Member
    Nov 14, 2017 at 09:17 AM

    Hello,

    "Score" is defined either as the distance from a point to its own cluster center or as the sum of its distances to all cluster centers, depending on the parameter OUTLIER_DEFINE.
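The two definitions can be sketched as follows. This is only an illustration under stated assumptions: plain Euclidean distance, no normalization, and a point assigned to its nearest center. The function names and the sample data are hypothetical and not part of PAL.

```python
import math

def euclidean(p, q):
    """Plain Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def score_own_center(point, centers):
    # Assumption: the point belongs to its nearest center,
    # so the score is the smallest of the distances.
    return min(euclidean(point, c) for c in centers)

def score_all_centers(point, centers):
    # Alternative definition: sum of the distances to every center.
    return sum(euclidean(point, c) for c in centers)

# Made-up data for illustration only.
centers = [(0.0, 0.0), (6.0, 8.0)]
point = (3.0, 4.0)

own = score_own_center(point, centers)
total = score_all_centers(point, centers)
print(own, total)
```

If a manual reconstruction matches neither value, the inputs used for the recalculation (for example the centers read back from the output table) are worth checking first.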

    The results of the example in the PAL documentation can be reconstructed this way as well.

    If you come across a case where this does not hold, could you please provide a full SQL script so we can reproduce it?

    Best Regards

    Zee


    • Former Member

      Hi Nicholas,

      My guess is that some of the columns of your center table are typed as integer.

      Whether or not the columns are integer, PAL always uses the double type internally to calculate the centers and the score/distance. The centers are converted to integer only when they are written to the center table, because some of its columns are integer. In that case the center table may lose some precision.
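A small sketch of how such precision loss could make a recomputed score differ from the reported one. The numbers are made up, and rounding to the nearest integer is an assumption; the actual conversion on output may behave differently (for example, truncation).

```python
import math

def euclidean(p, q):
    """Plain Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical observation and the center as computed internally in double.
point = (10.0, 3.0)
center_exact = (4.6, 7.4)

# Assumption: the center is rounded when stored in integer columns.
center_as_int = tuple(float(round(c)) for c in center_exact)

score_exact = euclidean(point, center_exact)      # what is computed internally
score_from_int = euclidean(point, center_as_int)  # what a manual check would give

print(center_as_int, score_exact, score_from_int)
```

Here the two scores differ by more than 0.5, even though the same Euclidean formula is used in both cases.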

      May I suggest that you use double for all columns of the output center table and try your case again?

      In case my guess is wrong, could you provide a complete example, including a test data set and the full parameters?

      Best Regards

      Zee