
Delegation restrictions with SAP PA 3.1 when delegating to Native Spark/Hadoop

Dec 05, 2016 at 09:45 PM




Currently, with delegation, we have the following restrictions with APL:


Cases when model training is not delegated to APL:

  • In the Recommendation and Social Analysis modules.
  • When the model uses a custom cutting strategy.
  • When the model uses the option to compute a decision tree.

1) Will the same restrictions apply to delegation with Native Spark as well?

2) Why would a cutting strategy affect delegation, for example if I want to manually specify a cutting strategy (70% training, 20% validation, 10% testing)?

3) If I specify a manual cutting strategy, will all the data be replicated to the PA server?


2 Answers

Best Answer
Dec 06, 2016 at 07:43 AM

Hi, first things first: let's not confuse APL delegation (HANA world) and Native Spark Modeling (Hadoop/Spark world) ;-)

The reference for Native Spark Modeling restrictions is the SAP note. This answers 1).

2) This is a current restriction where APL delegation does not kick in. I do not have the detailed answer at hand, but I can involve the appropriate product manager if need be. Can you please explain why this is a concern?

3) When a restriction applies to APL delegation (HANA world) or Native Spark Modeling (Hadoop/Spark world), it does not mean that the data is replicated. The data is queried from the underlying database or data lake and transferred to the PA server (or to the desktop, if the desktop is used) for processing. The real bottleneck for predictive projects is not necessarily the creation of the predictive model but rather the scoring of new data rows, and scoring is processed purely in-database.
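To make the "scoring in-database" idea concrete, here is a minimal sketch. It is not the code SAP PA generates: SQLite stands in for the real database, and the `customers` table, its columns and the model coefficients are all hypothetical. The point is only that the score is computed by the SQL engine, so the input rows never leave the database.

```python
import sqlite3

# SQLite stands in for HANA / Hive here; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, age REAL, income REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 30.0, 40000.0), (2, 50.0, 90000.0)],
)

# Hypothetical coefficients of an already trained linear model.
INTERCEPT, W_AGE, W_INCOME = 0.1, 0.01, 0.000005

# The scoring formula is pushed down as a SQL expression, so the database
# evaluates it row by row and only the resulting scores come back.
scores = conn.execute(
    "SELECT id, ? + ? * age + ? * income AS score FROM customers ORDER BY id",
    (INTERCEPT, W_AGE, W_INCOME),
).fetchall()

for row_id, score in scores:
    print(row_id, round(score, 4))
```

Training without delegation works the other way round: the rows are fetched out of the database and processed on the PA server.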

I hope this helps. Thanks & regards, Antoine


Thank you, Antoine, for the detailed explanation.

1) Since the data is queried, we are still bringing data over to the PA server for processing, and if I am querying more than a million rows this would definitely affect performance.

2) As for the cutting strategy, from what I have seen, data scientists prefer a customized method over relying on SAP's cutting strategy, because it gives them more control.

Dec 07, 2016 at 09:40 AM

On point 2, this is a technical restriction: the underlying APL stored procedure takes one table as its input dataset in its signature. Such a procedure does not support custom cutting strategies where the user specifies 2 or 3 input datasets (estimation, validation and test).
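As a rough illustration of why the two do not fit together, the sketch below performs a custom 70/20/10 cut in plain Python. The function name `custom_cut` and the sample data are hypothetical and this is not APL code. What matters is the shape of the result: a custom cut yields three separate datasets, whereas the stored procedure described above accepts a single input table.

```python
import random

def custom_cut(rows, percents=(70, 20, 10), seed=0):
    """Shuffle rows and split them into estimation, validation and test sets.

    `percents` are whole-number percentages that should sum to 100;
    any rounding remainder goes to the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible cut
    n = len(shuffled)
    n_est = n * percents[0] // 100
    n_val = n * percents[1] // 100
    estimation = shuffled[:n_est]
    validation = shuffled[n_est:n_est + n_val]
    test = shuffled[n_est + n_val:]
    return estimation, validation, test

# Hypothetical dataset of 1000 row identifiers.
rows = list(range(1000))
estimation, validation, test = custom_cut(rows)
print(len(estimation), len(validation), len(test))
```

The three returned lists correspond to the estimation, validation and test inputs a user would supply with a custom cutting strategy.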


Thank you, Marc, for the additional clarification!