
Support vector ranking with Data Partition in PA 3.2

Hi All,

We are trying to build a "support vector ranking" model for a ranking use case in PA 3.2, connected to a HANA database.

Before trying this, we have a general question about the HANA Partition function available in PA 3.2. We want to use it so that we can also get the model statistics and model comparison functions.

Currently this function only offers a random or a sequential option. For "support vector ranking", however, the random split should be based on the query ID column rather than on individual rows, so that the grouping ID column is taken into account. Yet when the HANA Partition step is configured for SVR, the user is not asked for this information.
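
To illustrate the problem, here is a minimal sketch of a plain row-level random split on hypothetical toy data (the column names query_id, doc and rank are assumptions, not PA/HANA names). With 5 query groups of 4 records each and a 70/30 split, the 6 test rows cannot consist of whole groups, so at least one query's ranked records always end up in both partitions.

```python
import random


def row_level_split(rows, train_fraction=0.7, seed=0):
    """Shuffle individual rows and cut them into train/test, ignoring groups."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def groups_split_across(train, test, group_key="query_id"):
    """Return the IDs of groups that have rows in both partitions."""
    train_groups = {r[group_key] for r in train}
    test_groups = {r[group_key] for r in test}
    return sorted(train_groups & test_groups)


# Hypothetical toy data: 5 queries, each with 4 ranked records.
rows = [{"query_id": q, "doc": d, "rank": d + 1} for q in range(5) for d in range(4)]
train, test = row_level_split(rows)
print(len(train), len(test), groups_split_across(train, test))
```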

So we wanted to ask: does the HANA Partition step currently handle the SVR algorithm's specific requirement of splitting the data by group for training, validation, etc.?

Looping in Jayanta Roy and Antoine CHABERT for quick reference and guidance.

Thanks,

Hasan


2 Answers

  • Best Answer
    Nov 28, 2017 at 01:04 PM

    Hi Hasan,

    Just so everyone else watching this thread is updated: the problem you described is a valid issue. The existing partition node in Expert Analytics slices the data either randomly or stratified on a feature, so it won't work in your scenario. Unfortunately, at this time I am unable to suggest a workaround, but I'll add this to the team's backlog so that we support this kind of partitioning in the future.

    Regards,

    Jayant


  • Nov 17, 2017 at 08:40 AM

    Hello Hasan, I am not very familiar with SVR. However, looking at the documentation, it seems that what you are looking for might be covered by the Stratified Partition option. Can you please check

    https://help.sap.com/viewer/94dbf2ba9d4047618880187451c3b253/3.3/en-US/1b1a1293d76f484c8ea127793fb3963f.html

    and let me know?

    Kind regards

    Antoine


    • Hi Antoine,

      Thanks a lot for your answer, my friend!

      Stratified sampling won't fit in this case. The SVR algorithm takes as input a set of records identified by a column called the group ID. There are multiple groups, and the records are ranked within each group.
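
      One way to see why the group ID matters is the classic pairwise formulation often used for ranking SVMs: training examples are built from pairs of records in the same group, never across groups. The sketch below assumes that formulation; whether PA/HANA's SVR implementation works exactly this way is not confirmed in this thread, and the keys group_id, features and relevance are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_examples(records):
    """Turn ranked records into (feature_difference, label) pairs.

    Pairs are formed only between records that share a group_id; records from
    different groups are never compared. The keys 'group_id', 'features' and
    'relevance' are hypothetical column names used for illustration.
    """
    by_group = defaultdict(list)
    for rec in records:
        by_group[rec["group_id"]].append(rec)

    examples = []
    for group in by_group.values():
        for a, b in combinations(group, 2):
            if a["relevance"] == b["relevance"]:
                continue  # a tie says nothing about which record should rank higher
            diff = [xa - xb for xa, xb in zip(a["features"], b["features"])]
            label = 1 if a["relevance"] > b["relevance"] else -1
            examples.append((diff, label))
    return examples


records = [
    {"group_id": "q1", "features": [1.0, 0.0], "relevance": 2},
    {"group_id": "q1", "features": [0.5, 0.5], "relevance": 1},
    {"group_id": "q1", "features": [0.0, 1.0], "relevance": 0},
    {"group_id": "q2", "features": [0.2, 0.2], "relevance": 1},
    {"group_id": "q2", "features": [0.9, 0.1], "relevance": 1},
]
for diff, label in pairwise_examples(records):
    print(diff, label)
```

      If a split scatters a group's records across partitions, each partition loses the within-group pairs that carry the ranking signal.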

      So there should probably be a strategy where, out of all the data, 70% of the groups (selected by group ID) are used for training, 10% for validation and 20% for testing. That way, each group's internal ranking labels are not lost when the validation metrics are calculated.
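
      A minimal sketch of this group-level 70/10/20 split, assuming a generic row format with a hypothetical group_id key (this is not a PA or HANA function): whole groups are shuffled and assigned to a partition, so no group is ever divided.

```python
import random


def group_split(rows, group_key="group_id", train_frac=0.7, valid_frac=0.1, seed=42):
    """Assign whole groups, never individual rows, to train/validation/test.

    The fractions refer to groups; every group not chosen for train or
    validation goes to test (about 20% with the defaults).
    """
    group_ids = sorted({r[group_key] for r in rows})
    random.Random(seed).shuffle(group_ids)

    n_train = round(len(group_ids) * train_frac)
    n_valid = round(len(group_ids) * valid_frac)
    train_ids = set(group_ids[:n_train])
    valid_ids = set(group_ids[n_train:n_train + n_valid])

    splits = {"train": [], "validation": [], "test": []}
    for r in rows:
        if r[group_key] in train_ids:
            splits["train"].append(r)
        elif r[group_key] in valid_ids:
            splits["validation"].append(r)
        else:
            splits["test"].append(r)
    return splits


# Hypothetical data: 10 groups with 3 ranked records each.
rows = [{"group_id": g, "doc": d, "rank": d + 1} for g in range(10) for d in range(3)]
splits = group_split(rows)
print({name: len(part) for name, part in splits.items()})
```

      Because the fractions apply to groups, the row counts only match 70/10/20 when the groups are of similar size.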

      Stratified sampling, on the other hand, helps produce a balanced data set when the population is distributed across multiple classes.
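
      For contrast, a minimal sketch of stratified sampling on a class label (the label key and the 80/20 class mix are hypothetical): each class is split separately, so class proportions are preserved in both partitions. Note that stratifying on the group ID column instead would send roughly 70% of every group's rows to train and 30% to test, splitting every group, which is the opposite of what ranking needs.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="label", train_frac=0.7, seed=0):
    """Split rows so that each class keeps (roughly) the same share in train and test."""
    by_class = defaultdict(list)
    for r in rows:
        by_class[r[label_key]].append(r)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]  # a local list, safe to shuffle in place
        rng.shuffle(members)
        cut = round(len(members) * train_frac)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Hypothetical imbalanced data: 80 rows of class "A", 20 rows of class "B".
rows = [{"label": "A", "id": i} for i in range(80)] + [{"label": "B", "id": i} for i in range(80, 100)]
train, test = stratified_split(rows)
print(len(train), len(test), sum(r["label"] == "B" for r in test))
```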

      Edit: Just to add further, the validation metrics suited to ranking algorithms are not MAPE, MSE, RMSE, etc. Ranking algorithms usually use search-engine ranking metrics such as NDCG (https://en.wikipedia.org/wiki/Discounted_cumulative_gain). Can you please check whether this is covered by the HANA model statistics and model comparison functions?
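
      A minimal sketch of NDCG averaged over groups, using the gain form rel_i / log2(i + 1) (some implementations use 2^rel_i - 1 as the gain instead). The input format of (group_id, predicted_score, true_relevance) tuples is an assumption for illustration. Because NDCG is computed within each group, the test partition needs whole groups for the metric to make sense.

```python
import math
from collections import defaultdict


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over positions i = 1..n."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))


def ndcg(relevances_in_predicted_order):
    """DCG of the predicted order divided by the DCG of the ideal (sorted) order."""
    ideal = dcg(sorted(relevances_in_predicted_order, reverse=True))
    return dcg(relevances_in_predicted_order) / ideal if ideal > 0 else 0.0


def mean_ndcg(rows):
    """Average NDCG over groups; rows are (group_id, predicted_score, true_relevance)."""
    by_group = defaultdict(list)
    for group_id, score, relevance in rows:
        by_group[group_id].append((score, relevance))

    scores = []
    for items in by_group.values():
        items.sort(key=lambda item: item[0], reverse=True)  # rank by predicted score
        scores.append(ndcg([relevance for _, relevance in items]))
    return sum(scores) / len(scores)


# Hypothetical scored test data for two query groups.
rows = [
    ("q1", 0.9, 3), ("q1", 0.5, 2), ("q1", 0.1, 0),  # predicted order matches the true order
    ("q2", 0.8, 0), ("q2", 0.3, 1),                  # predicted order is reversed
]
print(round(mean_ndcg(rows), 4))
```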

      Let me know your thoughts !

      Thanks,

      Hasan