PAL Anomaly Detection Score/Distance Value

Former Member

I am currently using PAL to run Anomaly Detection natively on HANA v1 in a couple of instances. However, I am having some difficulty understanding some of the output and the required input. My questions should be easy for someone who knows their way around.

(1) What is the formal definition of the 'Score' output in the 'Statistics Table' from Anomaly Detection? As I understand it, with a Euclidean distance function the Score should be the distance from the observation to its respective local cluster centre, since I defined a local cluster outlier. However, when I tried to reconstruct the 'SCORE' using the Euclidean distance, it seemed as if the Score had been computed using a weighting matrix. If so, how was the weighting matrix used?

My config for ANOMALY DETECTION:

INSERT INTO #PAL_CONTROL_TBL VALUES ('THREAD_NUMBER',2,null,null);
-- GROUP_NUMBER is not set, so its default applies
INSERT INTO #PAL_CONTROL_TBL VALUES ('INIT_TYPE',4,null,null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('DISTANCE_LEVEL',2,null,null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('MAX_ITERATION',100,null,null);
INSERT INTO #PAL_CONTROL_TBL VALUES ('NORMALIZATION',2,null,null);

INSERT INTO #PAL_CONTROL_TBL VALUES ('OUTLIER_DEFINE',1,null,null);
Everything else is left at its default, as in the PAL documentation.

Best Regards

Nicholas

Accepted Solutions (1)

Hello,

"Score" is defined as the distance from a point to its own cluster center, or as the total distance from the point to all centers, depending on the parameter OUTLIER_DEFINE.

The results of the example in the PAL documentation can be reconstructed as well.
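
As a minimal sketch of the two score definitions described above, the following Python snippet computes both variants for a single point. The function names and the mode labels `own_center` and `all_centers` are hypothetical, and the snippet deliberately does not say which OUTLIER_DEFINE value selects which variant; that mapping should be taken from the PAL documentation.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def score(point, centers, assigned, mode):
    """Hypothetical reconstruction of the score for one point.

    mode="own_center":  distance to the center the point is assigned to
    mode="all_centers": sum of the distances to every center
    """
    if mode == "own_center":
        return euclidean(point, centers[assigned])
    if mode == "all_centers":
        return sum(euclidean(point, c) for c in centers)
    raise ValueError("unknown mode: " + mode)

# Made-up data: two centers and one point assigned to center 0.
centers = [(0.0, 0.0), (3.0, 4.0)]
point = (3.0, 0.0)

own = score(point, centers, 0, "own_center")     # distance to (0, 0) only
total = score(point, centers, 0, "all_centers")  # distance to (0, 0) plus distance to (3, 4)
print(own, total)
```

If a manually computed value matches neither variant, the difference probably lies in the inputs (normalization, column types) rather than in the distance formula.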

If you come across a case where this does not hold, could you please provide a full SQL script so we can reproduce it?

Best Regards

Zee

Former Member

Hi Zee,

Thanks for your reply. You are right, it is indeed defined as the distance to the center of the cluster the point was assigned to. However, I could not reconstruct the distance score with my configuration (Euclidean distance, normalized). What I did: I normalized the results data table and computed the differences using the centres table. From these I computed the Euclidean (Pythagorean) distance. Unfortunately, my score does not match the PAL score.
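
The reconstruction steps described above can be sketched in Python as follows. This is only an illustration under assumptions: it treats the normalization as per-column min-max scaling, and it assumes that the centres table is in the original units and must be scaled with the same minimum and maximum as the data. Both points should be checked against the PAL documentation. All names and numbers are hypothetical.

```python
import math

def min_max_params(rows):
    """Per-column (min, max) pairs taken from the input data."""
    return [(min(col), max(col)) for col in zip(*rows)]

def scale(row, params):
    """Min-max scale one row; a constant column maps to 0.0."""
    return [(v - lo) / (hi - lo) if hi != lo else 0.0
            for v, (lo, hi) in zip(row, params)]

def reconstruct_score(row, center, params):
    """Euclidean distance between the scaled row and the scaled center."""
    r = scale(row, params)
    c = scale(center, params)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, c)))

# Made-up data table and a single center given in the original units.
data = [(0.0, 10.0), (10.0, 30.0), (5.0, 20.0)]
center = (5.0, 20.0)

params = min_max_params(data)
rebuilt = reconstruct_score(data[1], center, params)
print(rebuilt)
```

Scaling the data but not the centres (or the other way round) is one simple way to end up with a score that does not match.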

Maybe you have another idea?

Thanks again for your effort.

Best regards

Nicholas


Hi Nicholas,

My guess is that some of the columns of your center table are typed as integer.

Whether or not the columns are integer, PAL always uses the double type internally to calculate the centers and the score/distance. It converts the values to integer only when writing to the center table, if some of its columns are integer. So there may be some precision loss in the center table in that case.

May I suggest that you use double columns throughout the output center table and try your case again?

In case my guess is wrong, could you provide a complete example, including a test data set and the full parameters?
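
To illustrate the size of the effect, here is a small Python sketch with made-up numbers. It assumes the integer conversion behaves like rounding to the nearest integer, which is an assumption; truncation would give a different but similarly mismatched result.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical center as it would be held internally in double precision.
true_center = (2.4, 7.6)
# The same center after being written to integer-typed columns
# (assumed here to round to the nearest integer).
stored_center = tuple(round(v) for v in true_center)

point = (5.0, 3.0)
internal = euclidean(point, true_center)   # what the score would be based on
rebuilt = euclidean(point, stored_center)  # what a manual reconstruction yields
print(internal, rebuilt)
```

The two distances differ noticeably even though the point and the distance formula are identical, which is the kind of mismatch described in the question.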

Best Regards

Zee
