on 11-08-2017 3:42 PM
I am currently using PAL to run Anomaly Detection natively on HANA v1 in a couple of instances; however, I am having some difficulty understanding some of the output and the required input. My questions should be easy for someone who knows their way around.
(1) What is the formal definition of the 'Score' output in the 'Statistics Table' of Anomaly Detection? My understanding is that, with a Euclidean distance function, the Score is the distance from the observation to its respective local cluster centre, since I defined the outlier as the local-cluster outlier. However, when I tried to reconstruct the 'SCORE' by Euclidean distance, it seemed as if the Score had been computed using a weighting matrix. If so, how was the weighting matrix used?
My config for ANOMALY DETECTION:
INSERT INTO #PAL_CONTROL_TBL VALUES ('THREAD_NUMBER',2,null,null);
-- INSERT INTO #PAL_CONTROL_TBL VALUES ('GROUP_NUMBER',,null,null); -- not set, left at its default
INSERT INTO #PAL_CONTROL_TBL VALUES ('INIT_TYPE',4,null,null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('DISTANCE_LEVEL',2,null,null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('MAX_ITERATION',100,null,null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('NORMALIZATION',2,null,null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('OUTLIER_DEFINE',1,null,null);
Everything else is left at the defaults given in the PAL documentation.
Best Regards
Nicholas
Hello,
The "Score" is defined either as the distance from a point to its own cluster center or as the total distance from the point to all centers, depending on the parameter OUTLIER_DEFINE.
The results of the example in the PAL documentation can be reconstructed this way as well.
If you come across a case where this does not hold, could you please post a full SQL script so we can reproduce it?
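To make the two definitions concrete, here is a minimal Python sketch. It is not PAL code: the function names, the centers and the observation are made up for illustration, and it assumes plain Euclidean distance with no normalization.

```python
import math

def euclidean(p, q):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def score_own_center(point, centers, assigned):
    """Distance from the point to the center of the cluster it was assigned to."""
    return euclidean(point, centers[assigned])

def score_all_centers(point, centers):
    """Sum of the distances from the point to every cluster center."""
    return sum(euclidean(point, c) for c in centers)

# Hypothetical data: two cluster centers and one observation in cluster 0.
centers = [(0.0, 0.0), (3.0, 4.0)]
point = (3.0, 0.0)

own = score_own_center(point, centers, assigned=0)
total = score_all_centers(point, centers)
print(own, total)
```

Which of the two applies depends on OUTLIER_DEFINE, so a hand reconstruction has to use the same definition as the run it is compared against.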
Best Regards
Zee
Hi Zee,
thanks for your reply. You are right: it is indeed defined as the distance to the center of the cluster the point was assigned to. However, I could not reconstruct the distance score as configured (Euclidean and normalized). What I did: I normalized the results data table, computed the differences from the centres table, and from these computed the Pythagorean (Euclidean) distance. Unfortunately, my score does not match the PAL score.
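For reference, the reconstruction described above can be sketched in Python as follows. This is only an illustration under assumptions: it assumes that the normalization is a per-column min-max scaling, that the centres are given on the original scale and are scaled with the data's min and max, and all values are made up. PAL may handle any of these differently.

```python
import math

def euclidean(p, q):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def column_bounds(rows):
    """Per-column minimum and maximum of a list of rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def min_max_normalize(rows, lo, hi):
    """Scale each column to [0, 1] using the given bounds (assumed scheme)."""
    return [
        tuple((v - l) / (h - l) if h != l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]

# Hypothetical data table, centres table and cluster assignment.
data = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (9.0, 90.0)]
centers = [(2.0, 20.0), (9.0, 90.0)]
assignment = [0, 0, 0, 1]

lo, hi = column_bounds(data)
norm_data = min_max_normalize(data, lo, hi)
norm_centers = min_max_normalize(centers, lo, hi)

# Score as the distance to the assigned centre, in normalized space.
scores = [euclidean(p, norm_centers[k]) for p, k in zip(norm_data, assignment)]
print(scores)
```

If the centres table already holds normalized values, the second `min_max_normalize` call has to be dropped, which is one possible source of a mismatch.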
Maybe you have another idea?
Thanks again for your effort.
Best regards
Nicholas
Hi Nicholas,
My guess is that some of the columns of your center table are typed as integer.
Integer or not, PAL always uses the double type internally to calculate the centers and the score/distance, but it converts the center values to integer when writing them to the center table if those columns are integer. In that case there may be some precision loss in the center table.
May I suggest that you use double columns throughout the output center table and try your case again?
In case my guess is wrong, could you provide a complete example, including a test data set and the full parameters?
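The effect of the precision loss can be illustrated with a small Python sketch. The numbers are made up, and the conversion is modelled as truncation with `int()`, which is an assumption; the actual conversion on HANA may round instead.

```python
import math

def euclidean(p, q):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical observation and the center as computed internally in double.
point = (2.0, 5.0)
center_double = (1.6, 4.7)

# The same center as it would appear in integer output columns
# (truncation is assumed here).
center_as_int = tuple(float(int(c)) for c in center_double)

exact = euclidean(point, center_double)      # distance using the double center
from_int = euclidean(point, center_as_int)   # distance recomputed from the integer center
print(exact, from_int)
```

A score recomputed from the integer center can therefore differ noticeably from the score that was calculated internally with the double center.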
Best Regards
Zee