Skip to Content

​Using partitioning with spatial clustering function

Hi,

I’m trying to find spatial clusters for each user in data set with anonymized users tracks.

The sample of dataset:

SHAPE ID ACCURACY RESPONDENTIDSTR EVENT_TIME

01E9030000000046251AD642400000F256A7F14B400000000000000000 44903950 50 id_31688 2017-03-09 07:03:21

01E903000000006C7473ED42400000FE0C6FDE4B400000000000000000 37169955 50 id_40436 2017-02-08 02:25:24

01E90300000000445EC5ED424000000A2C80DE4B400000000000000000 37169956 50 id_40436 2017-02-08 02:27:25

01E903000000006C7473ED42400000FE0C6FDE4B400000000000000000 37169957 50 id_40436 2017-02-08 02:36:31

01E903000000004A1D89DF42400000E01C09CF4B400000000000000000 37245390 50 id_94313 2017-02-08 12:29:44

01E90300000000B24577ED42400000A6576FDE4B400000000000000000 37245607 50 id_40436 2017-02-08 11:17:18

01E90300000000F6E8579248400000C8E64FE44B400000000000000000 37250321 50 id_63311 2017-02-08 10:04:49

So I use RANK ( ) OVER to divide dataset into partitions by user id to which apply the DBSCAN clustering function.

Like in the ST_ClusterID() (CLUSTER BY LOCATION ...) and RANK() (CLUSTER BY LOCATION ... ORDER BY REVENUE) sample in SAP HANA Use Cases for Spatial Clustering http://bit.ly/2h1cCsL

But unfortunately this query does not work as expected and apply DBSCAN clustering for entire dataset.

For example for RESPONDENTIDSTR='id_40436' when I’m using RANK() OVER for all dataset it returns 25 unique clusters id

SELECT clust.cluster_dbsacan_id, ST_EnvelopeAggr(clust.SHAPE) AS SHAPE, clust.RESPONDENTIDSTR, COUNT(*) AS Panel_count

from(

SELECT OBJECTID, SHAPE, ACCURACY, EVENT_TIME, RESPONDENTIDSTR,

RANK() OVER (PARTITION BY RESPONDENTID ORDER BY EVENT_TIME) AS ranking,

ST_ClusterID() OVER (CLUSTER BY SHAPE USING DBSCAN EPS 0.0005 MINPTS 15) AS cluster_dbsacan_id

FROM GISDEV.TIBURON_30DAYS

WHERE ACCURACY < 100 ) clust

GROUP BY clust.cluster_dbsacan_id, clust.RESPONDENTIDSTR

HAVING clust.cluster_dbsacan_id>0 AND RESPONDENTIDSTR='id_40436'

ORDER BY clust.cluster_dbsacan_id

But when I’m applying using RANK() OVER for records containing only RESPONDENTIDSTR='id_40436' it returns 4 unique clusters id.

SELECT clust.cluster_dbsacan_id, ST_EnvelopeAggr(clust.SHAPE) AS SHAPE, clust.RESPONDENTIDSTR, COUNT(*) AS Panel_count

from(

SELECT OBJECTID, SHAPE, ACCURACY, EVENT_TIME, RESPONDENTIDSTR,

RANK() OVER (PARTITION BY RESPONDENTID ORDER BY EVENT_TIME) AS ranking,

ST_ClusterID() OVER (CLUSTER BY SHAPE USING DBSCAN EPS 0.0005 MINPTS 15) AS cluster_dbsacan_id

FROM GISDEV.TIBURON_30DAYS

WHERE ACCURACY < 100 AND RESPONDENTIDSTR='id_40436') clust

GROUP BY clust.cluster_dbsacan_id, clust.RESPONDENTIDSTR

HAVING clust.cluster_dbsacan_id>0 AND RESPONDENTIDSTR='id_40436'

ORDER BY clust.cluster_dbsacan_id

How do I write sql, so it apply DBSCAN clustering to each unique RESPONDENTIDSTR selection separately.

Add comment
10|10000 characters needed characters exceeded

  • Get RSS Feed

0 Answers