Hi SAP PA team,
I have been working on modeling anomaly detection in SAP HANA and found that SAP HANA PAL provides an "ANOMALY" function which, per its documentation, uses K-means with a distance function (it identifies the furthest points as outliers).
However, consider a data set whose shape is closer to a wide, flat ellipse (large width, small height). PAL's ANOMALY function is likely to fail in this scenario.
In this example, the true anomaly (red) lies close to the centroid X, while the yellow point will be flagged as the anomaly because it is farther from the centroid.
In this case, even non-anomalous points (high probability) can lie far from the centroid, while an anomalous point (low probability) can appear very close to it. Hence, for anomaly detection we should prefer a probability-based model such as the GMM (Gaussian mixture model) function over the ANOMALY function.
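To make the concern concrete, here is a minimal sketch in plain Python. It is not PAL code and does not call the ANOMALY or GMM functions. It assumes a single synthetic cluster that is wide in x and narrow in y, and it fits one Gaussian to it (the simplest case of a mixture model, with one component). The cluster parameters, the coordinates of the "red" and "yellow" points, and all function names are made up for illustration.

```python
import math
import random


def fit_gaussian(points):
    """Estimate the mean and 2x2 covariance of 2-D points (a one-component mixture)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def euclidean_distance(point, mean):
    """Distance to the centroid: the kind of score a distance-based detector ranks by."""
    return math.hypot(point[0] - mean[0], point[1] - mean[1])


def log_density(point, mean, cov):
    """Log of the fitted 2-D Gaussian density at the given point."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Squared Mahalanobis distance, using the closed-form inverse of a 2x2 matrix.
    maha_sq = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * maha_sq - math.log(2 * math.pi) - 0.5 * math.log(det)


# Hypothetical data: a wide, flat cluster (std. dev. 10 in x, 0.5 in y).
rng = random.Random(0)
cluster = [(rng.gauss(0, 10), rng.gauss(0, 0.5)) for _ in range(2000)]
mean, cov = fit_gaussian(cluster)

red = (0.0, 5.0)      # assumed true anomaly: near the centroid, far outside the narrow axis
yellow = (20.0, 0.0)  # assumed normal point: far from the centroid, along the wide axis

for name, point in (("red", red), ("yellow", yellow)):
    print(name,
          "distance:", round(euclidean_distance(point, mean), 2),
          "log-density:", round(log_density(point, mean, cov), 2))
```

Under these assumptions, the yellow point has the larger distance to the centroid, so a pure distance ranking would flag it first, while the red point has the much lower log-density, so a density-based ranking would flag it instead.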
We would appreciate it if an expert from the PA team (@Jayanta Roy Orla Cullen) could shed more light on using the "ANOMALY" function in this scenario, and on why K-means with distance was implemented as the default algorithm for identifying anomalies rather than probability of presence.
Thanks,
Hasan Rafiq