Hello,
Where can I find more information about the internals of the communication between SAP HANA and the R server? What I'm trying to find out is whether R receives the data in bulk or entry by entry.
For example, assume I have an INPUT_TABLE with a column PERIOD (a date) and a column VALUE (a double), and the following implementation:
CREATE PROCEDURE CALC_STATS( )
LANGUAGE SQLSCRIPT AS
BEGIN
    input_data = SELECT * FROM INPUT_TABLE;
    CALL CALC_INPUT( :input_data, T_OUTPUT_TABLE );
    INSERT INTO OUTPUT_TABLE SELECT * FROM :T_OUTPUT_TABLE;
END;
And for the RLANG procedure I have:
CREATE PROCEDURE CALC_INPUT( IN input_table INPUT_TABLE, OUT result T_OUTPUT_TABLE )
LANGUAGE RLANG AS
BEGIN
    input_period <- input_table$PERIOD
    input_value <- sum( as.double( input_table$VALUE ) )
    result <- data.frame( PERIOD = input_period, VALUE = input_value )
END;
After I run CALC_STATS(), the sum() call in my RLANG procedure treats input_table$VALUE as a vector holding the values from the whole table; that is, I get the sum over ALL entries, whereas input_period seems to contain only the value for the current entry. What does this mean?
Does the R script get some sort of cursor and then pull data as necessary? If I have 100 million entries, do all of them get sent to the R server at once (assuming I do SELECT * FROM ...)?
In the example above, given the following input:
INPUT_TABLE
PERIOD, VALUE
2012-11-01, 100
2012-11-02, 200
2012-11-03, 300
I would expect to get:
OUTPUT_TABLE
PERIOD, VALUE
2012-11-01, 100
2012-11-02, 200
2012-11-03, 300
Instead, I have:
OUTPUT_TABLE
PERIOD, VALUE
2012-11-01, 600
2012-11-02, 600
2012-11-03, 600
Why is that? Wouldn't R process an entry at a time?
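For reference, the same output can be reproduced in plain R outside HANA. This is only a minimal sketch, assuming the procedure body sees the whole table as a single data frame; the data frame built here is a hand-made stand-in for INPUT_TABLE, not what HANA actually sends:

```r
# Hand-made stand-in for INPUT_TABLE, assuming it arrives in R in one piece.
input_table <- data.frame(
  PERIOD = as.Date(c("2012-11-01", "2012-11-02", "2012-11-03")),
  VALUE  = c(100, 200, 300)
)

# Same statements as the body of CALC_INPUT.
input_period <- input_table$PERIOD                  # vector with one date per row
input_value  <- sum(as.double(input_table$VALUE))   # a single number

# data.frame() recycles the length-1 VALUE to match the length of PERIOD.
result <- data.frame(PERIOD = input_period, VALUE = input_value)
print(result)
```

Here result has three rows, and every row carries the same total in VALUE, which matches the output I see from the procedure.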
Thanks in advance for reading thus far and for any assistance you can provide.
Genc