
Delegation restrictions with SAP PA 3.1 when delegating to Native Spark/Hadoop

Team,

Currently, with delegation, we have the following restrictions with APL:

Restriction

Cases when model training is not delegated to APL:

  • In the Recommendation and Social Analysis modules.
  • When the model uses a custom cutting strategy.
  • When the model uses the option to compute a decision tree.

1) Will the same restrictions apply to delegation with Native Spark as well?

2) Why would a cutting strategy affect delegation if I want to specify a cutting strategy manually (70% training, 20% validation, 10% testing)?

3) If I specify a manual cutting strategy, will all the data be replicated to the PA server?



2 Answers

  • Best Answer
    Dec 06, 2016 at 07:43 AM

    Hi, first things first: let's not confuse APL delegation (HANA world) with Native Spark Modeling (Hadoop/Spark world) ;-)

    The reference for Native Spark Modeling restrictions is SAP Note https://launchpad.support.sap.com/#/notes/2391541. This answers 1).

    2) This is a current restriction: in this case, APL delegation does not kick in. I do not have the detailed answer at hand, but I can involve the appropriate product manager if need be. Can you please explain why this is a concern?

    3) When one of these restrictions prevents APL delegation (HANA world) or Native Spark Modeling (Hadoop/Spark world), it does not mean that the data is replicated. The data is queried from the underlying database or data lake and transferred to the PA server (or to the desktop, if Desktop is used) for processing. The real bottleneck in predictive projects is not necessarily the creation of the predictive model, but rather the scoring of new data rows, and scoring is processed purely in-database.

    I hope this helps. Thanks & regards, Antoine


    • Thank you, Antoine, for the detailed explanation.

      1) Since the data is queried, we are still bringing data over to the PA server for processing, and if I am querying more than a million rows, this would definitely affect performance.

      2) As for the cutting strategy, from what I have seen, data scientists prefer a customized method to relying on SAP's cutting strategy, because it gives them more control.

  • Dec 07, 2016 at 09:40 AM

    On point 2, this is a technical restriction: the underlying APL stored procedure takes one table as the input dataset in its signature, so it does not support custom cutting strategies where the user specifies two or three input datasets (estimation, validation and test).
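To make the mismatch concrete, here is a minimal, generic sketch of what a user-defined 70/20/10 cutting strategy produces: three separate datasets rather than one. It is only an illustration under stated assumptions; `custom_cut` and its parameters are hypothetical names, the random shuffle is one possible splitting rule, and none of this is SAP's or APL's implementation. A procedure whose signature accepts a single input table has no slot for three such datasets.

```python
import random
from typing import Dict, List, Sequence, Tuple


def custom_cut(
    rows: Sequence[dict],
    percentages: Tuple[int, int, int] = (70, 20, 10),
    seed: int = 42,
) -> Dict[str, List[dict]]:
    """Hypothetical helper: split rows into estimation/validation/test sets.

    Uses integer percentages so the set sizes are exact, shuffles with a
    fixed seed for reproducibility, and assigns any rounding remainder
    to the test set.
    """
    if sum(percentages) != 100:
        raise ValueError("percentages must sum to 100")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_est = n * percentages[0] // 100
    n_val = n * percentages[1] // 100
    # The result is three distinct datasets, not one input table.
    return {
        "estimation": shuffled[:n_est],
        "validation": shuffled[n_est:n_est + n_val],
        "test": shuffled[n_est + n_val:],
    }


if __name__ == "__main__":
    parts = custom_cut([{"id": i} for i in range(1000)])
    for role, data in parts.items():
        print(role, len(data))
```

With 1,000 rows this yields 700 estimation, 200 validation and 100 test rows, which is exactly the "two or three input datasets" case that a one-table procedure signature cannot take as-is.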
