
Support vector ranking with Data Partition in PA 3.2

Nov 13, 2017 at 09:15 AM



Hi All,

We are trying to build a "support vector ranking" model for a ranking case in PA 3.2 connected to HANA DB.

Before we start, we have a general question about the HANA Partition function available in PA 3.2, which we want to use so that we also get the model statistics and model comparison functions.

Currently this function offers only a random or a sequential option. For support vector ranking (SVR), however, the random split should be made on the query ID column rather than on individual rows, so that all records sharing a group ID stay in the same partition. The configuration of the HANA Partition step does not ask the user for this information when SVR is used.

So our question is: does the HANA Partition step currently handle the SVR-specific requirement of splitting the data by group for training, validation, and so on?

Looping in Jayanta Roy and Antoine CHABERT for quick reference and guidance.




2 Answers

Best Answer
Jayanta Roy
Nov 28, 2017 at 01:04 PM

Hi Hasan,

Just so everyone else watching this thread is updated: the problem you described is a valid issue. The existing partition node in Expert Analytics slices the data either randomly or stratified on a feature, so it won't work in your scenario. Unfortunately, I am unable to suggest a workaround at this time, but I'll add this to the team's backlog so that we can support this kind of partitioning in the future.



Nov 17, 2017 at 08:40 AM

Hello Hasan, I am not very familiar with SVR. However, looking at the doc, it seems to me that what you are looking for might be covered by the Stratified Partition option. Can you please check at

and revert?

Kind regards



Hi Antoine,

Thanks a lot my friend for your answer!

Stratified sampling won't fit in this case. The SVR algorithm takes as input a set of records identified by a group ID column. There are multiple groups, and records are ranked within each group.

So the strategy should probably be to split by group ID: for example, 70% of the groups go to training, 10% to validation, and 20% to test. That way each group's internal ranking labels are not lost when the validation metrics are calculated.
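To make the idea concrete, here is a minimal sketch of a group-wise split in plain Python, run outside of PA. It is not a feature of the HANA Partition step; the function name `split_by_group`, the column name `query_id`, and the sample rows are all hypothetical and only illustrate the 70/10/20 split by group ID described above.

```python
import random

def split_by_group(rows, group_col, fractions=(0.7, 0.1, 0.2), seed=42):
    """Assign whole groups (not individual rows) to train/validation/test.

    fractions = (train, validation, test); whatever is left after the
    train and validation groups are taken goes to test.
    """
    # Shuffle the distinct group IDs, not the rows themselves.
    group_ids = sorted({row[group_col] for row in rows})
    random.Random(seed).shuffle(group_ids)

    n_groups = len(group_ids)
    n_train = round(fractions[0] * n_groups)
    n_valid = round(fractions[1] * n_groups)
    train_ids = set(group_ids[:n_train])
    valid_ids = set(group_ids[n_train:n_train + n_valid])

    splits = {"train": [], "validation": [], "test": []}
    for row in rows:
        if row[group_col] in train_ids:
            splits["train"].append(row)
        elif row[group_col] in valid_ids:
            splits["validation"].append(row)
        else:
            splits["test"].append(row)
    return splits

# Hypothetical data: 10 queries with 3 ranked records each.
rows = [
    {"query_id": q, "record": r, "rank_label": r}
    for q in range(10)
    for r in range(3)
]
splits = split_by_group(rows, "query_id")
```

Because the shuffle is applied to group IDs, every record of a given query lands in exactly one partition, so the ranking within each group stays intact.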

Stratified sampling, on the other hand, helps produce a balanced data set when the population is spread across multiple classes.

Edit: Just to add further, the validation metrics suited to ranking algorithms are not MAPE, MSE, RMSE, etc. Ranking algorithms usually use search engine ranking metrics such as NDCG. Could you please check whether this is covered in the HANA model statistics and model compare functions?
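For reference, here is a small sketch of how NDCG can be computed per group in plain Python. It uses the linear-gain form of DCG (relevance divided by log2 of position + 1); other variants, such as the exponential gain 2^rel - 1, also exist. The function names and the sample relevance lists are hypothetical, and this says nothing about what the HANA model statistics actually compute.

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulative gain of relevance labels in predicted order."""
    rels = relevances if k is None else relevances[:k]
    # Position 0 is discounted by log2(2) = 1, position 1 by log2(3), ...
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(rels))

def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

def mean_ndcg(groups, k=None):
    """Average NDCG over groups (one relevance list per query/group ID)."""
    return sum(ndcg(rels, k) for rels in groups) / len(groups)

# Hypothetical relevance labels, listed in the order the model ranked them.
perfect_order = [3, 2, 1]
reversed_order = [1, 2, 3]
score = mean_ndcg([perfect_order, reversed_order])
```

The metric is computed within each group and then averaged, which is why the validation partition needs complete groups.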

Let me know your thoughts!