PAL Anomaly Detection Score/Distance Value

Nov 08, 2017 at 03:42 PM

Former Member

I am currently using PAL to run Anomaly Detection natively on HANA v1 in a couple of instances. However, I am having some difficulty understanding parts of the output and the required input. My questions should be easy for someone who knows their way around.

(1) What is the formal definition of the 'Score' output in the 'Statistics Table' of Anomaly Detection? As I understand it, with a Euclidean distance function the Score should be the distance from an observation to its own local cluster centre, since I defined outliers relative to the local cluster. However, when I tried to reconstruct the 'SCORE' from the Euclidean distance, the result looked as if the Score had been computed using a weighting matrix. If so, how was the weighting matrix applied?

My config for ANOMALY DETECTION:

INSERT INTO #PAL_CONTROL_TBL VALUES ('THREAD_NUMBER',2,null,null);
-- GROUP_NUMBER is not set explicitly, so the PAL default applies.
INSERT INTO #PAL_CONTROL_TBL VALUES ('INIT_TYPE',4,null,null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('DISTANCE_LEVEL',2,null,null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('MAX_ITERATION',100,null,null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('NORMALIZATION',2,null,null);

INSERT INTO #PAL_CONTROL_TBL VALUES ('OUTLIER_DEFINE',1,null,null);
Everything else default as in PAL Documentation

Best Regards

Nicholas


1 Answer

Best Answer
Former Member
Nov 14, 2017 at 09:17 AM

Hello,

The "Score" is defined either as the distance from a point to its own cluster center or as the total distance from the point to all cluster centers, depending on the parameter OUTLIER_DEFINE.
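As a rough illustration of these two definitions, here is a minimal Python sketch. It is not PAL code: the function names and sample values are hypothetical, Euclidean distance is assumed, and "its cluster center" is assumed to be the nearest center, as in k-means. Which OUTLIER_DEFINE value selects which definition should be checked in the PAL documentation.

```python
import math


def euclidean(p, q):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def score_own_center(point, centers):
    """Distance to the point's own center (assumed here: the nearest one)."""
    return min(euclidean(point, c) for c in centers)


def score_all_centers(point, centers):
    """Total distance from the point to every cluster center."""
    return sum(euclidean(point, c) for c in centers)


# Hypothetical data: two cluster centers and one observation.
centers = [(0.0, 0.0), (3.0, 4.0)]
point = (3.0, 0.0)

own = score_own_center(point, centers)     # distance 3 to (0, 0)
total = score_all_centers(point, centers)  # 3 + 4
print(own, total)
```

The two scores differ for the same point, so a reconstruction only matches if it uses the definition that the configured OUTLIER_DEFINE value selects.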

The results of the example in the PAL documentation can be reconstructed this way as well.

If you come across a case where this does not hold, could you please post a full SQL script so that we can reproduce it?

Best Regards

Zee

Former Member

Hi Zee,

Thanks for your reply. You are right, the score is indeed defined as the distance to the center of the cluster the point was assigned to. However, I could not reconstruct the score with my configuration (Euclidean distance, normalized). Here is what I did: I normalized the result data table, computed the differences to the centers table, and from these differences calculated the Euclidean (Pythagorean) distance. Unfortunately, my score does not match the PAL score.
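For reference, the reconstruction described above can be sketched in Python as follows. This is only a sketch under assumptions: NORMALIZATION = 2 is taken to mean column-wise min-max scaling, the helper names and data are hypothetical, and the thread does not say whether PAL reports the centers in original or normalized units. The sketch shows that the point and the center must be on the same scale before the differences are taken.

```python
import math


def euclidean(p, q):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def fit_min_max(rows):
    """Return a function that scales a row column-wise to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]

    def scale(row):
        # Constant columns are mapped to 0.0 to avoid division by zero.
        return tuple(
            (v - lo) / (hi - lo) if hi != lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        )

    return scale


# Hypothetical input data and a hypothetical center in original units.
data = [(0.0, 10.0), (2.0, 20.0), (4.0, 30.0), (10.0, 50.0)]
center_raw = (2.0, 20.0)

scale = fit_min_max(data)
point_norm = scale(data[3])      # normalized observation
center_norm = scale(center_raw)  # center scaled with the same min/max

consistent = euclidean(point_norm, center_norm)  # both normalized
mixed = euclidean(point_norm, center_raw)        # scales mixed up
print(consistent, mixed)
```

If the two values are far apart, as here, a scale mismatch between the data and the centers table is one possible reason why a hand-built score does not match.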

Maybe you have another idea?

Thanks again for your effort.

Best regards

Nicholas

Former Member

Hi Nicholas,

My guess is that some columns of your center table are typed as integer.

Whether the columns are integer or not, PAL always uses the double type internally to calculate the centers and the score/distance. However, if some columns are typed as integer, the center values are converted to integer when they are written to the center table. In that case the center table may lose precision.

I suggest typing all columns of the output center table as double and trying your case again.
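The effect of such a conversion can be illustrated with a small Python sketch. The values are hypothetical, and truncation is assumed only for illustration; rounding would lose precision in a similar way.

```python
import math


def euclidean(p, q):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical observation and the center as computed internally in double.
point = (1.0, 2.0)
center_double = (1.6, 2.8)

# The same center as it would appear in an integer-typed center table
# (assumption: the fractional part is dropped).
center_int = tuple(float(int(v)) for v in center_double)

score_internal = euclidean(point, center_double)  # based on exact centers
score_rebuilt = euclidean(point, center_int)      # based on truncated centers
print(score_internal, score_rebuilt)
```

Here the score rebuilt from the integer center differs clearly from the one based on the double center, which is the kind of mismatch described above.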

If my guess is wrong, could you provide a complete example, including a test data set and the full parameters?

Best Regards

Zee
