on 07-26-2017 2:42 PM
Hello Community,
I have a procedure with R code. In it, I impute missing values and analyse which variables of my dataset are highly correlated. I tested the code in RStudio and it works fine, but in HANA I get the following error and I have no idea why.
SAP DBTech JDBC: [2048]: column store error: search table error: [34082] Execution of R script failed.;Timeout during R script execution or receiving result data
My procedure:
DROP Table "THESIS"."Case_1::Case1_CORRELATION";
CREATE Column TABLE "THESIS"."Case_1::Case1_CORRELATION" ("NAMES" VARCHAR(40), "RESULT" INTEGER );
DROP PROCEDURE "THESIS"."CORRELATION";
CREATE PROCEDURE "THESIS"."CORRELATION"(IN input "_SYS_BIC"."Case_1/CA_VAR_SAMPLE",OUT result "THESIS"."Case_1::Case1_CORRELATION")
LANGUAGE RLANG AS
BEGIN
library(caret)
library(randomForest)
library(mice)
#only numeric values
resultNumeric<-input[,sapply(input, is.numeric)]
# perform mice imputation, based on random forests.
miceMod <- mice(data= resultNumeric,m=5, visitSequence = 1:ncol(resultNumeric), method="rf")
# generate the completed data.
miceOutput <- complete(miceMod)
# find attributes that are highly correlated
highlyCorrelated <- findCorrelation(cor(miceOutput), cutoff=0.8,verbose=FALSE, names=FALSE)
highlyCorrelated.names<-names(miceOutput[highlyCorrelated])
# Identifying variable names of highly correlated variables
highlyCorCol <- colnames(miceOutput)[highlyCorrelated]
#save which variables are highly correlated
result.corr<-as.list(highlyCorrelated.names)
result.names<-as.list(names(input))
index <- match(result.names, result.corr, nomatch = 0)
index.x <- ifelse(index>0, 1,0)
result<-data.frame(NAMES= names(input), RESULT= as.integer(index.x))
END;
CALL "THESIS"."CORRELATION"("_SYS_BIC"."Case_1/CA_VAR_SAMPLE","THESIS"."Case_1::Case1_CORRELATION") WITH OVERVIEW;
R Version: 3.2.3
Rserve:1.7.3
Does anyone have an idea why the error happened?
Thanks in advance.
OK, how long does this R code run? Have you set up the cer_timeout parameter?
Then it looks like your code actually runs longer than 5 minutes (= 300 seconds) and the timeout parameter is doing its job.
One way to approach this would be to increase the parameter so that the procedure doesn't hit the timeout limit.
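As a rough sketch of that approach: the R integration settings live in the calcEngine section of indexserver.ini, so the timeout could be raised with a statement like the one below. The value 1800 (seconds) is only an illustrative assumption, not a recommendation; pick something that matches how long the script takes in RStudio.

```sql
-- Check whether cer_timeout has been set explicitly
-- (no row may simply mean the default is in use).
SELECT FILE_NAME, LAYER_NAME, SECTION, KEY, VALUE
  FROM M_INIFILE_CONTENTS
 WHERE FILE_NAME = 'indexserver.ini'
   AND SECTION = 'calcEngine'
   AND KEY = 'cer_timeout';

-- Raise the R script timeout; 1800 is an assumed example value in seconds.
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
  SET ('calcEngine', 'cer_timeout') = '1800'
  WITH RECONFIGURE;
```

Changing the configuration this way requires the INIFILE ADMIN system privilege.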
Another option (in addition, not as an alternative) would be to check whether you could supplement steps in R with PAL algorithms, to get better performance and less data transfer.
Hi,
Is it correct to use tables as parameter types here: CREATE PROCEDURE "THESIS"."CORRELATION"(IN input "_SYS_BIC"."Case_1/CA_VAR_SAMPLE",OUT result "THESIS"."Case_1::Case1_CORRELATION")
I think you should create table types for them.
See SAP_HANA_SQL_and_System_Views_Reference_en.pdf, section 3.9.1.50, CREATE PROCEDURE Statement (Procedural).
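If you want to try that, a minimal sketch for the output parameter could look like this. The type name TT_CORRELATION is made up for illustration; the columns are copied from the result table in the question. The input type would need the column list of the calculation view, which is not shown in this thread, so it is left out here.

```sql
-- Hypothetical table type mirroring "THESIS"."Case_1::Case1_CORRELATION".
CREATE TYPE "THESIS"."TT_CORRELATION" AS TABLE (
    "NAMES"  VARCHAR(40),
    "RESULT" INTEGER
);
```

The procedure header would then declare OUT result "THESIS"."TT_CORRELATION" instead of the table name.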