
Support vector ranking with Data Partition in PA 3.2

former_member186543
Active Contributor
0 Kudos

Hi All,

We are trying to build a "support vector ranking" model for a ranking case in PA 3.2 connected to HANA DB.

Before we start, we have a general question about the HANA Partition function available in PA 3.2, which we want to use so that we can also get the model statistics and model comparison functions.

Currently we see that this function offers only random or sequential options. For "support vector ranking", however, the random split should be made on the query ID column rather than on individual rows, so that the grouping ID column is taken into account. During configuration of the HANA partition step, the user is not asked for this information in the SVR case.

So we wanted to ask: does the HANA partition step currently handle the SVR algorithm's specific requirement of splitting the data by group for training, validation, etc.?

Looping in jayanta.roy , antoine.chabert for quick reference and guidance.

Thanks,

Hasan

Accepted Solutions (1)


Former Member
0 Kudos

Hi Hasan,

Just so everyone else watching this thread is updated: the problem you described is a valid issue. Since the existing partition node in Expert Analytics slices the data either randomly or stratified based on a feature, it won't work in your scenario. Unfortunately, at this time, I am unable to suggest a workaround, but I'll add this to the team's backlog so that we support this kind of partitioning in the future.

Regards,

Jayant

Answers (1)


achab
Product and Topic Expert
0 Kudos

Hello Hasan, I am not very familiar with SVR. However, looking at the documentation, it seems to me that what you are looking for might be covered by the Stratified Partition option. Can you please check at

https://help.sap.com/viewer/94dbf2ba9d4047618880187451c3b253/3.3/en-US/1b1a1293d76f484c8ea127793fb39...

and revert?

Kind regards

Antoine

former_member186543
Active Contributor
0 Kudos

Hi Antoine,

Thanks a lot my friend for your answer!

Stratified sampling won't fit in this case. The SVR algorithm takes as input a set of records identified by a column called the group ID. So there are multiple groups, and records are ranked within each group.

So there should probably be a strategy where, out of all the data, 70% of the groups (by group ID) are taken as training, 10% as validation and 20% as test. That way each group's internal ranking labels are not lost when the validation metrics are calculated.
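For anyone wanting to prepare such a split outside the partition node, here is a minimal sketch of the idea in plain Python. It is not a PA or HANA function; the function name `split_by_group` and the column names `query_id`, `doc` and `rank_label` are hypothetical. The groups are shuffled and assigned to train, validation or test, and every row simply follows its group.

```python
import random

def split_by_group(rows, group_key, fractions=(0.7, 0.1, 0.2), seed=42):
    """Split rows into train/validation/test while keeping each group whole.

    fractions are (train, validation, test) shares of the GROUPS, not of the rows.
    """
    # Collect the distinct group IDs and shuffle them reproducibly.
    group_ids = sorted({row[group_key] for row in rows})
    random.Random(seed).shuffle(group_ids)

    n_train = round(len(group_ids) * fractions[0])
    n_valid = round(len(group_ids) * fractions[1])

    # Assign each group (not each row) to exactly one partition.
    assignment = {}
    for i, gid in enumerate(group_ids):
        if i < n_train:
            assignment[gid] = "train"
        elif i < n_train + n_valid:
            assignment[gid] = "validation"
        else:
            assignment[gid] = "test"

    # Every row follows its group, so in-group ranking labels stay together.
    parts = {"train": [], "validation": [], "test": []}
    for row in rows:
        parts[assignment[row[group_key]]].append(row)
    return parts

# Hypothetical data: 10 query groups with 3 ranked records each.
rows = [{"query_id": q, "doc": d, "rank_label": d}
        for q in range(10) for d in range(3)]
parts = split_by_group(rows, "query_id")
```

The resulting partitions could then be loaded as separate tables, assuming the downstream steps accept pre-split data.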

Stratified sampling, on the other hand, helps to produce a balanced data set when the population is distributed across multiple classes.

Edit: Just to add further, the validation metrics suited to ranking algorithms are not MAPE, MSE, RMSE, etc. Ranking algorithms usually use search engine ranking metrics such as NDCG ( https://en.wikipedia.org/wiki/Discounted_cumulative_gain ). Please check whether this is covered in the HANA model statistics and model compare functions.
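To make the metric concrete, here is a minimal sketch of NDCG computed per group and then averaged, in plain Python. It uses the linear-gain form of DCG from the linked article (relevance divided by log2 of position + 1); some implementations use 2^relevance - 1 as the gain instead. The function names and the column keys passed to `mean_ndcg` are hypothetical and are not part of the HANA model statistics.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for relevances listed in ranked order."""
    # Position 1 has discount log2(2) = 1, position 2 has log2(3), and so on.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))

def ndcg(relevances_in_predicted_order):
    """DCG of the predicted order divided by the DCG of the ideal order."""
    ideal = dcg(sorted(relevances_in_predicted_order, reverse=True))
    if ideal == 0:
        return 0.0  # no relevant records in this group
    return dcg(relevances_in_predicted_order) / ideal

def mean_ndcg(rows, group_key, score_key, label_key):
    """Average NDCG over groups; each group is ranked by its predicted score."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)
    values = []
    for members in groups.values():
        ranked = sorted(members, key=lambda r: r[score_key], reverse=True)
        values.append(ndcg([r[label_key] for r in ranked]))
    return sum(values) / len(values) if values else 0.0
```

Because the metric is computed within each group, it only makes sense when a group's records all sit in the same partition, which is why the split has to respect the group ID.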

Let me know your thoughts !

Thanks,

Hasan