Hello experts,
Could you please help me understand whether there is a way to specify the expected number of clusters before running the DenStream clustering algorithm in Smart Data Streaming?
How and where is this setting configured?
Thank you.
The purpose of the DenStream Clustering algorithm is to take in a stream of data and determine whether it contains any clusters. If it does, the algorithm reports each cluster by its center, its radius and its weight. The weight is essentially the count of events that fit in the cluster, minus events that have aged out.
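To make "center, radius and weight" concrete, here is a minimal sketch of a micro-cluster summary as described in the original DenStream paper (Cao et al., 2006). It is not SAP's internal implementation. The class name `MicroCluster` and its methods are hypothetical and used only for illustration. In the paper, "aging out" is a gradual exponential fade of 2^(-lambda * elapsed time) rather than a hard removal of old events.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    """Hypothetical textbook DenStream micro-cluster (not an SAP API)."""
    lam: float                                 # decay rate ("Lambda")
    last_update: float = 0.0
    weight: float = 0.0                        # decayed event count
    cf1: list = field(default_factory=list)    # decayed linear sum per dimension
    cf2: list = field(default_factory=list)    # decayed squared sum per dimension

    def decay_to(self, now):
        # Every stored quantity fades by 2^(-lambda * elapsed time).
        factor = 2 ** (-self.lam * (now - self.last_update))
        self.weight *= factor
        self.cf1 = [v * factor for v in self.cf1]
        self.cf2 = [v * factor for v in self.cf2]
        self.last_update = now

    def insert(self, point, now):
        # Age the existing summary first, then add the new event with weight 1.
        self.decay_to(now)
        if not self.cf1:
            self.cf1 = [0.0] * len(point)
            self.cf2 = [0.0] * len(point)
        self.weight += 1.0
        self.cf1 = [s + x for s, x in zip(self.cf1, point)]
        self.cf2 = [s + x * x for s, x in zip(self.cf2, point)]

    def center(self):
        return [s / self.weight for s in self.cf1]

    def radius(self):
        # Square root of the summed per-dimension variance around the center.
        var = sum(s2 / self.weight - (s1 / self.weight) ** 2
                  for s1, s2 in zip(self.cf1, self.cf2))
        return math.sqrt(max(var, 0.0))


mc = MicroCluster(lam=0.25)
mc.insert([1.0, 2.0], now=0)
mc.insert([3.0, 2.0], now=0)
print(mc.center(), mc.radius(), mc.weight)   # [2.0, 2.0] 1.0 2.0
mc.decay_to(4)                               # 4 time units with no new events
print(mc.center(), mc.radius(), mc.weight)   # [2.0, 2.0] 1.0 1.0
```

Decay scales every sum by the same factor, so the center and radius stay put while the weight shrinks. The weight is the quantity that reflects events "aging out".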
As an unsupervised learning algorithm, a DenStream Clustering model does not start with a set of known (or trained) clusters, so no, you cannot specify an explicit number of clusters ahead of time. You can, however, set a maximum number of expected clusters using the "Max. Number of Categories" parameter. The example in the documentation uses a field with binary values such as male/female or true/false. In that case you expect at most 2 clusters, since there are only 2 possible values.
See "Model Properties and Parameters Reference" in the Smart Data Streaming documentation.
Coming back to the idea that the purpose of the DenStream Clustering algorithm is to identify previously unknown clusters, the other key tuning parameters are Epsilon, Beta, Mu and Lambda. These determine, respectively:
- how wide an area may be considered a single cluster
- how far out an individual event has to be to be considered an outlier
- when to start identifying micro-clusters
- how fast old events age out of the calculation
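The roles of these four parameters can be sketched using the rules from the original DenStream paper (Cao et al., 2006), under the assumption that Smart Data Streaming follows the textbook algorithm. In the paper, Beta works through a weight threshold of Beta * Mu rather than a distance. A micro-cluster whose weight falls below Beta * Mu is treated as an outlier. The function names below are hypothetical and are not part of any SAP API. In Smart Data Streaming the values themselves are set as model parameters.

```python
def decayed_weight(weight, lam, elapsed):
    """Lambda: a micro-cluster's weight fades by 2^(-lam * elapsed)."""
    return weight * 2 ** (-lam * elapsed)


def absorbs(radius_after_merge, epsilon):
    """Epsilon: a point may join a micro-cluster only if the merged radius stays within epsilon."""
    return radius_after_merge <= epsilon


def classify(weight, mu, beta):
    """Mu and Beta: label a micro-cluster by its (decayed) weight, as in the DenStream paper."""
    if weight >= mu:
        return "core"        # dense enough to count as part of a real cluster
    if weight >= beta * mu:
        return "potential"   # may grow into a core micro-cluster
    return "outlier"         # too light; candidate for pruning


# Hypothetical settings: Mu = 10, Beta = 0.25, so the outlier threshold is 2.5.
for t in (0, 4, 8):
    w = decayed_weight(12.0, lam=0.5, elapsed=t)
    print(t, w, classify(w, mu=10, beta=0.25))
# 0 12.0 core
# 4 3.0 potential
# 8 0.75 outlier
```

With no new events arriving, a larger Lambda makes a cluster slide from "core" to "outlier" sooner. Raising Mu or Beta makes it harder for a group of events to qualify as a cluster at all.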
Thank you, Robert. That helped.