Skip to Content

HANA PAL Gradient Boosted Decision Tree bug and documentation irregularity

I would like to get a second opinion of the example code provided in the documentation for gradient boosted decision trees, shown here. It appears that the classification implementation of the procedure is only successful if there are two classifiers. Otherwise there is the following error:

Could not execute 'CALL DEVUSER.PAL_GBDT_PREDICT( PAL_GBDT_PREDICT_TBL, PAL_GBDT_CONTROL_TBL, PAL_GBDT_MODEL_TBL, ...' in 100 ms 971 µs . 
SAP DBTech JDBC: [2048]: column store error: search table error:  [2620] executor: plan operation failed;not all requested columns in output table "$$P4$$_SYS_SS_CE_169321_vers2_lang12_type1_139936671594496:2_TMP_CALL" have the same size.' 

This can be easily tested by adding a third variable to the data set provided, which is a simple change from B to C. I have directly copied the example code from the docs, with this change indicated below. I am running the latest HANA2 Express Edition.

Incidentally, the training data set provided in the docs is identical to the prediction data set provided. Whilst this is not technically incorrect, it is irregular. This is also the case for the regression example.

Furthermore, I believe the bug described above was discovered by the author of the documentation example. This can be seen in the structure of the data set.

INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(1.0, 10.0, 100, 1.0, 'A');
...
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(4.0, 40.0, 400, 4.0, 'B');
...
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(9.0, 90.0, 900, 2.0, 'B');

I won't go into the explanation except to say that the final 'B' looks to have been changed from a 'C'. I am of the opinion that this was done deliberately to ensure the example worked.

Here is the code to reproduce the error:

SET SCHEMA DEVUSER; 

DROP TYPE PAL_GBDT_TRAIN_T;
CREATE TYPE PAL_GBDT_TRAIN_T AS TABLE ("ATTRIBUTE1" DOUBLE, "ATTRIBUTE2" DOUBLE, "ATTRIBUTE3" DOUBLE,"ATTRIBUTE4" DOUBLE,"LABEL" varchar(50));

DROP TYPE PAL_GBDT_CONTROL_T;
CREATE TYPE PAL_GBDT_CONTROL_T AS TABLE(NAME VARCHAR(50), INTARGS INTEGER, DOUBLEARGS DOUBLE, STRINGARGS VARCHAR(100));

DROP TYPE PAL_GBDT_CV_T;
CREATE TYPE PAL_GBDT_CV_T AS TABLE("PARAM_NAME" VARCHAR(5000),"PARAM_VALUE" VARCHAR(5000));

DROP TYPE PAL_GBDT_MODEL_T;
CREATE TYPE PAL_GBDT_MODEL_T AS TABLE( "LINE_" INTEGER,	"KEY_" VARCHAR(5000), "VALUE_" VARCHAR(5000));

DROP TABLE PAL_GBDT_TRAIN_PDATA_TBL;
CREATE TABLE PAL_GBDT_TRAIN_PDATA_TBL("POSITION" INT, "SCHEMA_NAME" NVARCHAR(256), "TYPE_NAME" NVARCHAR(256), "PARAMETER_TYPE" VARCHAR(7));

INSERT INTO PAL_GBDT_TRAIN_PDATA_TBL VALUES (1,'DEVUSER','PAL_GBDT_TRAIN_T','IN');
INSERT INTO PAL_GBDT_TRAIN_PDATA_TBL VALUES (2,'DEVUSER','PAL_GBDT_CONTROL_T','IN');
INSERT INTO PAL_GBDT_TRAIN_PDATA_TBL VALUES (3,'DEVUSER','PAL_GBDT_CV_T','OUT');
INSERT INTO PAL_GBDT_TRAIN_PDATA_TBL VALUES (4,'DEVUSER','PAL_GBDT_MODEL_T','OUT');

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_DROP('DEVUSER','PAL_GBDT_TRAIN');

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE('AFLPAL','GBDTTRAIN','DEVUSER','PAL_GBDT_TRAIN',PAL_GBDT_TRAIN_PDATA_TBL);

DROP TABLE PAL_GBDT_TRAIN_TBL;
CREATE COLUMN TABLE PAL_GBDT_TRAIN_TBL LIKE PAL_GBDT_TRAIN_T;

DROP TABLE PAL_GBDT_CONTROL_TBL;
CREATE COLUMN TABLE PAL_GBDT_CONTROL_TBL LIKE PAL_GBDT_CONTROL_T;
	
DROP TABLE PAL_GBDT_CV_TBL;
create column table PAL_GBDT_CV_TBL	like PAL_GBDT_CV_T;

DROP TABLE PAL_GBDT_MODEL_TBL;
CREATE COLUMN TABLE PAL_GBDT_MODEL_TBL 	LIKE PAL_GBDT_MODEL_T;

INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(1.0, 10.0, 100, 1.0, 'A');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(1.1, 10.1, 100, 1.0, 'A');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(1.2, 10.2, 100, 1.0, 'A');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(1.3, 10.4, 100, 1.0, 'A');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(1.2, 10.3, 100, 1.0, 'A');

INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(4.0, 40.0, 400, 4.0, 'B');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(4.1, 40.1, 400, 4.0, 'B');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(4.2, 40.2, 400, 4.0, 'B');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(4.3, 40.4, 400, 4.0, 'B');
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(4.2, 40.3, 400, 4.0, 'B');

INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(9.0, 90.0, 900, 2.0, 'C'); --changed
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(9.1, 90.1, 900, 1.0, 'C'); --changed
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(9.2, 90.2, 900, 2.0, 'C'); --changed
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(9.3, 90.4, 900, 1.0, 'C'); --changed
INSERT INTO PAL_GBDT_TRAIN_TBL VALUES(9.2, 90.3, 900, 1.0, 'C'); --changed

INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('LEARNING_RATE',  null, 0.5,  null);
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('MIN_SPLIT_LOSS', null, 0.0,  null);
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('FOLD_NUM',       5,    null, null);
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('ITER_NUM',       4,    null, null);
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('CV_METRIC',      null, null, 'ERROR_RATE');
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('REF_METRIC',     null, null, 'AUC');
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('MAX_TREE_DEPTH', 6,    null, null);
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('RANGE_LEARNING_RATE',  null, null, '[0.1,3,1.0]');
INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('RANGE_MIN_SPLIT_LOSS', null, null, '[0.1,3,1.0]');

CALL DEVUSER.PAL_GBDT_TRAIN(PAL_GBDT_TRAIN_TBL,	PAL_GBDT_CONTROL_TBL,PAL_GBDT_CV_TBL,PAL_GBDT_MODEL_TBL) WITH overview;

DROP TYPE PAL_GBDT_PREDICT_T;
CREATE TYPE PAL_GBDT_PREDICT_T AS TABLE ( "ID" INTEGER,	"ATTRIBUTE1" DOUBLE, "ATTRIBUTE2" DOUBLE, "ATTRIBUTE3" DOUBLE,"ATTRIBUTE4" DOUBLE);

DROP TYPE PAL_GBDT_CONTROL_T;
CREATE TYPE PAL_GBDT_CONTROL_T AS TABLE( NAME VARCHAR(50), 	INTARGS INTEGER, DOUBLEARGS DOUBLE, STRINGARGS VARCHAR(100));

DROP TYPE PAL_GBDT_RESULT_T;
CREATE TYPE PAL_GBDT_RESULT_T as table(	"ID" INTEGER,"LABEL" VARCHAR(50),"PROB" DOUBLE);

drop table PAL_GBDT_RESULT_TBL;
create table PAL_GBDT_RESULT_TBL like PAL_GBDT_RESULT_T;

DROP TABLE PAL_GBDT_PREDICT_PDATA_TBL;
CREATE TABLE PAL_GBDT_PREDICT_PDATA_TBL("POSITION" INTEGER, "SCHEMA_NAME" NVARCHAR(256), "TYPE_NAME" NVARCHAR(256), "PARAMETER_TYPE" VARCHAR(7));

INSERT INTO PAL_GBDT_PREDICT_PDATA_TBL VALUES (1,'DEVUSER','PAL_GBDT_PREDICT_T','IN');
INSERT INTO PAL_GBDT_PREDICT_PDATA_TBL VALUES (2,'DEVUSER','PAL_GBDT_CONTROL_T','IN');
INSERT INTO PAL_GBDT_PREDICT_PDATA_TBL VALUES (3,'DEVUSER','PAL_GBDT_MODEL_T','IN');
INSERT INTO PAL_GBDT_PREDICT_PDATA_TBL VALUES (4,'DEVUSER','PAL_GBDT_RESULT_T','OUT');

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_DROP('DEVUSER','PAL_GBDT_PREDICT');

CALL SYS.AFLLANG_WRAPPER_PROCEDURE_CREATE('AFLPAL',	'GBDTPREDICT',	'DEVUSER','PAL_GBDT_PREDICT',PAL_GBDT_PREDICT_PDATA_TBL);

DROP TABLE PAL_GBDT_PREDICT_TBL;
CREATE COLUMN TABLE PAL_GBDT_PREDICT_TBL LIKE PAL_GBDT_PREDICT_T;

DROP TABLE PAL_GBDT_CONTROL_TBL;
CREATE COLUMN TABLE PAL_GBDT_CONTROL_TBL LIKE PAL_GBDT_CONTROL_T;

INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(1,  1.0, 10.0, 100, 1);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(2,  1.1, 10.1, 100, 1);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(3,  1.2, 10.2, 100, 1);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(4,  1.3, 10.4, 100, 1);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(5,  1.2, 10.3, 100, 3);

INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(6,  4.0, 40.0, 400, 3);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(7,  4.1, 40.1, 400, 3);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(8,  4.2, 40.2, 400, 3);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(9,  4.3, 40.4, 400, 3);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(10, 4.2, 40.3, 400, 3);

INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(11, 9.0, 90.0, 900, 2);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(12, 9.1, 90.1, 900, 1);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(13, 9.2, 90.2, 900, 2);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(14, 9.3, 90.4, 900, 1);
INSERT INTO PAL_GBDT_PREDICT_TBL VALUES(15, 9.2, 90.3, 900, 1);

CALL DEVUSER.PAL_GBDT_PREDICT(PAL_GBDT_PREDICT_TBL,	PAL_GBDT_CONTROL_TBL,PAL_GBDT_MODEL_TBL,PAL_GBDT_RESULT_TBL) WITH overview;

SELECT * FROM PAL_GBDT_RESULT_TBL;

*I did get multi-classification GBDT working using a different data set:

https://answers.sap.com/questions/106570/gradient-boosting-machine-error.html?childToView=112328#answer-112328

Add comment
10|10000 characters needed characters exceeded

  • Get RSS Feed

1 Answer

  • Best Answer
    avatar image
    Former Member
    Feb 13, 2017 at 03:31 AM

    Hi Michael,

    Thanks for your question. We noticed this issue and is investigating this issue. For now, you can use the workaround - adding statement

    INSERT INTO PAL_GBDT_CONTROL_TBL VALUES('VERBOSE_OUTPUT', 1, null, null) - which is simiar to

    the one stated in https://answers.sap.com/questions/106570/gradient-boosting-machine-error.html?childToView=112328#answer-112328.

    Best Regards

    Jerome

    Add comment
    10|10000 characters needed characters exceeded