Skip to Content
0
Jan 20, 2017 at 12:28 AM

HANA PAL Gradient Boosted Decision Tree bug and documentation irregularity

313 Views Last edit Jan 25, 2017 at 05:53 AM 4 rev

I would like to get a second opinion of the example code provided in the documentation for gradient boosted decision trees, shown here. It appears that the classification implementation of the procedure is only successful if there are two classifiers. Otherwise there is the following error:

Could not execute 'CALL DEVUSER.PAL_GBDT_PREDICT( PAL_GBDT_PREDICT_TBL, PAL_GBDT_CONTROL_TBL, PAL_GBDT_MODEL_TBL, ...' in 100 ms 971 µs . 
SAP DBTech JDBC: [2048]: column store error: search table error:  [2620] executor: plan operation failed;not all requested columns in output table "$$P4$$_SYS_SS_CE_169321_vers2_lang12_type1_139936671594496:2_TMP_CALL" have the same size.' 

This can be easily tested by adding a third variable to the data set provided, which is a simple change from B to C. I have directly copied the example code from the docs, with this change indicated below. I am running the latest HANA2 Express Edition.

Incidentally, the training data set provided in the docs is identical to the prediction data set provided. Whilst this is not technically incorrect, it is irregular. This is also the case for the regression example.

Furthermore, I believe the bug described above was discovered by the author of the documentation example. This can be seen in the structure of the data set.

INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(1.0, 10.0, 100, 1.0, 'A');
...
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(4.0, 40.0, 400, 4.0, 'B');
...
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(9.0, 90.0, 900, 2.0, 'B');

I won't go into the explanation except to say that the final 'B' looks to have been changed from a 'C'. I am of the opinion that this was done deliberately to ensure the example worked.

Here is the code to reproduce the error:

SET SCHEMA DEVUSER; 

DROP TYPE PAL_GBDT_TRAIN_T;
CREATE TYPE PAL_GBDT_TRAIN_T AS TABLE ("ATTRIBUTE1" DOUBLE, "ATTRIBUTE2" DOUBLE, "ATTRIBUTE3" DOUBLE,"ATTRIBUTE4" DOUBLE,"LABEL" varchar(50));

DROP TYPE PAL_GBDT_CONTROL_T;
CREATE TYPE PAL_GBDT_CONTROL_T AS TABLE(NAME VARCHAR(50), INTARGS INTEGER, DOUBLEARGS DOUBLE, STRINGARGS VARCHAR(100));

DROP TYPE PAL_GBDT_CV_T;
CREATE TYPE PAL_GBDT_CV_T AS TABLE("PARAM_NAME" VARCHAR(5000),"PARAM_VALUE" VARCHAR(5000));

DROP TYPE PAL_GBDT_MODEL_T;
CREATE TYPE PAL_GBDT_MODEL_T AS TABLE( "LINE_" INTEGER,	"KEY_" VARCHAR(5000), "VALUE_" VARCHAR(5000));

DROP TABLE PAL_GBDT_TRAIN_PDATA_TBL;
CREATE TABLE PAL_GBDT_TRAIN_PDATA_TBL("POSITION" INT, "SCHEMA_NAME" NVARCHAR(256), "TYPE_NAME" NVARCHAR(256), "PARAMETER_TYPE" VARCHAR(7));

INSERT INTO PAL_GBDT_TRAIN_PDATA_TBL VALUES (1,'DEVUSER','PAL_GBDT_TRAIN_T','IN');
INSERT INTO PAL_GBDT_TRAIN_PDATA_TBL VALUES (2,'DEVUSER','PAL_GBDT_CONTROL_T','IN');
INSERT INTO PAL_GBDT_TRAIN_PDATA_TBL VALUES (3,'DEVUSER','PAL_GBDT_CV_T','OUT');
INSERT INTO PAL_GBDT_TRAIN_PDATA_TBL VALUES (4,'DEVUSER','PAL_GBDT_MODEL_T','OUT');

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_DROP('DEVUSER','PAL_GBDT_TRAIN');

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE('AFLPAL','GBDTTRAIN','DEVUSER','PAL_GBDT_TRAIN',PAL_GBDT_TRAIN_PDATA_TBL);

DROP TABLE PAL_GBDT_TRAIN_TBL;
CREATE COLUMN TABLE PAL_GBDT_TRAIN_TBL LIKE PAL_GBDT_TRAIN_T;

DROP TABLE PAL_GBDT_CONTROL_TBL;
CREATE COLUMN TABLE PAL_GBDT_CONTROL_TBL LIKE PAL_GBDT_CONTROL_T;
	
DROP TABLE PAL_GBDT_CV_TBL;
create column table PAL_GBDT_CV_TBL	like PAL_GBDT_CV_T;

DROP TABLE PAL_GBDT_MODEL_TBL;
CREATE COLUMN TABLE PAL_GBDT_MODEL_TBL 	LIKE PAL_GBDT_MODEL_T;

INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(1.0, 10.0, 100, 1.0, 'A');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(1.1, 10.1, 100, 1.0, 'A');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(1.2, 10.2, 100, 1.0, 'A');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(1.3, 10.4, 100, 1.0, 'A');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(1.2, 10.3, 100, 1.0, 'A');

INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(4.0, 40.0, 400, 4.0, 'B');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(4.1, 40.1, 400, 4.0, 'B');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(4.2, 40.2, 400, 4.0, 'B');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(4.3, 40.4, 400, 4.0, 'B');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(4.2, 40.3, 400, 4.0, 'B');

INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(9.0, 90.0, 900, 2.0, 'C'); --changed
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(9.1, 90.1, 900, 1.0, 'C'); --changed
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(9.2, 90.2, 900, 2.0, 'C'); --changed
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(9.3, 90.4, 900, 1.0, 'C'); --changed
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(9.2, 90.3, 900, 1.0, 'C'); --changed

INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('LEARNING_RATE',  null, 0.5,  null);
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('MIN_SPLIT_LOSS', null, 0.0,  null);
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('FOLD_NUM',       5,    null, null);
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('ITER_NUM',       4,    null, null);
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('CV_METRIC',      null, null, 'ERROR_RATE');
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('REF_METRIC',     null, null, 'AUC');
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('MAX_TREE_DEPTH', 6,    null, null);
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('RANGE_LEARNING_RATE',  null, null, '[0.1,3,1.0]');
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('RANGE_MIN_SPLIT_LOSS', null, null, '[0.1,3,1.0]');

CALL DEVUSER.PAL_GBDT_TRAIN(PAL_GBDT_TRAIN_TBL,	PAL_GBDT_CONTROL_TBL,PAL_GBDT_CV_TBL,PAL_GBDT_MODEL_TBL) WITH overview;

DROP TYPE PAL_GBDT_PREDICT_T;
CREATE TYPE PAL_GBDT_PREDICT_T AS TABLE ( "ID" INTEGER,	"ATTRIBUTE1" DOUBLE, "ATTRIBUTE2" DOUBLE, "ATTRIBUTE3" DOUBLE,"ATTRIBUTE4" DOUBLE);

DROP TYPE PAL_GBDT_CONTROL_T;
CREATE TYPE PAL_GBDT_CONTROL_T AS TABLE( NAME VARCHAR(50), 	INTARGS INTEGER, DOUBLEARGS DOUBLE, STRINGARGS VARCHAR(100));

DROP TYPE PAL_GBDT_RESULT_T;
CREATE TYPE PAL_GBDT_RESULT_T as table(	"ID" INTEGER,"LABEL" VARCHAR(50),"PROB" DOUBLE);

drop table PAL_GBDT_RESULT_TBL;
create table PAL_GBDT_RESULT_TBL like PAL_GBDT_RESULT_T;

DROP TABLE PAL_GBDT_PREDICT_PDATA_TBL;
CREATE TABLE PAL_GBDT_PREDICT_PDATA_TBL("POSITION" INTEGER, "SCHEMA_NAME" NVARCHAR(256), "TYPE_NAME" NVARCHAR(256), "PARAMETER_TYPE" VARCHAR(7));

INSERT INTO PAL_GBDT_PREDICT_PDATA_TBL VALUES (1,'DEVUSER','PAL_GBDT_PREDICT_T','IN');
INSERT INTO PAL_GBDT_PREDICT_PDATA_TBL VALUES (2,'DEVUSER','PAL_GBDT_CONTROL_T','IN');
INSERT INTO PAL_GBDT_PREDICT_PDATA_TBL VALUES (3,'DEVUSER','PAL_GBDT_MODEL_T','IN');
INSERT INTO PAL_GBDT_PREDICT_PDATA_TBL VALUES (4,'DEVUSER','PAL_GBDT_RESULT_T','OUT');

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_DROP('DEVUSER','PAL_GBDT_PREDICT');

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE('AFLPAL',	'GBDTPREDICT',	'DEVUSER','PAL_GBDT_PREDICT',PAL_GBDT_PREDICT_PDATA_TBL);

DROP TABLE PAL_GBDT_PREDICT_TBL;
CREATE COLUMN TABLE PAL_GBDT_PREDICT_TBL LIKE PAL_GBDT_PREDICT_T;

DROP TABLE PAL_GBDT_CONTROL_TBL;
CREATE COLUMN TABLE PAL_GBDT_CONTROL_TBL LIKE PAL_GBDT_CONTROL_T;

INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(1,  1.0, 10.0, 100, 1);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(2,  1.1, 10.1, 100, 1);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(3,  1.2, 10.2, 100, 1);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(4,  1.3, 10.4, 100, 1);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(5,  1.2, 10.3, 100, 3);

INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(6,  4.0, 40.0, 400, 3);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(7,  4.1, 40.1, 400, 3);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(8,  4.2, 40.2, 400, 3);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(9,  4.3, 40.4, 400, 3);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(10, 4.2, 40.3, 400, 3);

INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(11, 9.0, 90.0, 900, 2);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(12, 9.1, 90.1, 900, 1);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(13, 9.2, 90.2, 900, 2);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(14, 9.3, 90.4, 900, 1);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(15, 9.2, 90.3, 900, 1);

CALL DEVUSER.PAL_GBDT_PREDICT(PAL_GBDT_PREDICT_TBL,	PAL_GBDT_CONTROL_TBL,PAL_GBDT_MODEL_TBL,PAL_GBDT_RESULT_TBL) WITH overview;

SELECT * FROM PAL_GBDT_RESULT_TBL;

*I did get multi-classification GBDT working using a different data set:

https://answers.sap.com/questions/106570/gradient-boosting-machine-error.html?childToView=112328#answer-112328