
Delegation restrictions with SAP PA 3.1 when delegating to Native Spark/Hadoop

former_member197234
Participant

Team

Currently with delegation we have the following restriction with APL.

Cases when model training is not delegated to APL:

  • In the Recommendation and Social Analysis modules.
  • When the model uses a custom cutting strategy.
  • When the model uses the option to compute a decision tree.

1) Will the same restrictions apply to delegation with Native Spark as well?

2) Why would a cutting strategy affect delegation if I want to manually specify one (70% training, 20% validation, 10% testing)?

3) If I specify a manual cutting strategy, will all the data be replicated to the PA server?

Accepted Solutions (1)

achab
Product and Topic Expert

Hi, first things first: let's not confuse APL delegation (HANA world) with Native Spark Modeling (Hadoop/Spark world) 😉

The reference for Native Spark Modeling restrictions is SAP note https://launchpad.support.sap.com/#/notes/2391541. This answers 1).

2) This is a current restriction where APL delegation does not kick in. I do not have the detailed answer at hand, but I can involve the appropriate product manager if need be. Can you please explain why this is a concern?

3) When the restrictions related to APL delegation (HANA world) or Native Spark Modeling (Hadoop/Spark world) apply, it does not mean that the data is replicated. The data is queried from the underlying database or data lake and transferred for processing to the PA server (or to the desktop, if the desktop is used). The real bottleneck for predictive projects is usually not the creation of the predictive model but rather the scoring of new data rows, and scoring is processed purely in-database.
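
To illustrate the difference, here is a rough sketch of the two data flows (this is not the actual PA/APL implementation; the connection string, table and column names are invented for the example):

# Rough sketch of the two data flows described above; not actual PA/APL code.
# The connection string, table and column names are invented for illustration.
import pandas as pd
from sqlalchemy import create_engine

# Assumes the sqlalchemy-hana dialect is installed; any SQL source would do.
engine = create_engine("hana://user:password@hana-host:30015")

# Model training: the rows are queried from the database and transferred to
# the machine running PA (server or desktop), where the algorithm runs.
training_df = pd.read_sql("SELECT * FROM SALES.CHURN_TRAINING", engine)
# ... train the model locally on training_df ...

# Scoring of new rows: the trained model is applied in-database, for example
# as generated SQL, so the new rows never leave the database.
scoring_sql = """
CREATE TABLE SALES.CHURN_SCORES AS (
    SELECT CUSTOMER_ID,
           0.0 AS CHURN_SCORE  -- the generated scoring expression goes here
    FROM SALES.CHURN_NEW_ROWS
)
"""
with engine.begin() as conn:
    conn.exec_driver_sql(scoring_sql)

The point is the direction of the data movement: training pulls the queried rows out of the database, while scoring pushes the computation down to where the data sits.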

I hope this helps. Thanks & regards, Antoine

former_member197234
Participant

Thank you, Antoine, for the detailed explanation.

1) Since the data is queried, we are still bringing data over to the PA server for processing, and if I am querying more than a million rows this would definitely affect performance.

2) As for the cutting strategy, based on what I have seen, data scientists prefer a customized method rather than relying on SAP's cutting strategy, as it gives them more control.

Answers (1)

marc_daniau
Advisor

On point 2, this is a technical restriction: the underlying APL stored procedure takes one table as the input dataset in its signature. Such a procedure does not support custom cutting strategies where the user specifies two or three input datasets (estimation, validation and test).
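
To make this concrete, here is a purely illustrative sketch (not APL code) of what a manual 70/20/10 cut produces, i.e. three separate datasets that the single-table procedure signature has no way to receive:

# Purely illustrative; not APL code. A manual 70/20/10 cutting strategy
# produces three separate datasets, whereas the APL stored procedure
# signature accepts exactly one input table.
import pandas as pd

def manual_cut(df: pd.DataFrame, seed: int = 42):
    """Split a dataset into 70% estimation, 20% validation and 10% test."""
    shuffled = df.sample(frac=1.0, random_state=seed)  # shuffle the rows
    n = len(shuffled)
    estimation = shuffled.iloc[:int(n * 0.7)]
    validation = shuffled.iloc[int(n * 0.7):int(n * 0.9)]
    test = shuffled.iloc[int(n * 0.9):]
    return estimation, validation, test

# estimation, validation, test = manual_cut(my_dataset)

With delegation, the cut is decided on the APL side from the single table it receives, which is why a manual cut like this one means training is not delegated.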

former_member197234
Participant

Thank you, Marc, for the additional clarification!