cancel
Showing results for 
Search instead for 
Did you mean: 

HANA update of table slowly, looking for improvments of update procedure

Former Member
0 Kudos

Hello,

I'm facing problem with very slow table updates. Currently I perform several 1000 updates on a table sequential. I would like to know some improvements for a better performance.

I will give a short introduce on what I'm doing exactly:

My starting situation is a table with sequences for different properties. Each value of the sequence represents one measured value for one day.

SEQUENCEPROPERTY
22;25;30;24;20Temperature
23;25;30;20;18Temperature
60;70;50;40;60Humidity
......

On this  I perform a sequential pattern mining using the SPADE implementation in R, getting as result a frequent sequence and the support of it. Let's say the result looks like following one:


IDFREQ-SEQUENCESUPPORTPROPERTY
125;301Temperature
223;250.5Temperature
360;70;500.33Humidity
........


Now I have a procedure which creates dynamical a table where each frequent sequence represents one column. The column name consist's of the property and the ID of the frequent sequence it represents (e.g "Temperature_1" stands for the freq. sequence "25:30"). Additional there are the columns sequence and property.

This table represents if a  frequent sequence occurs in a sequence or not. Therefore I fill them with 1 (occurs) or 0 (occurs not). E.g. as I mentioned column "Temperature_1" stands for the freq. sequence "25;30" this occurs in the sequence "22;25;30;24;20" so it will be "1" as it occurs also in "23;25;30;20;18" the second row in column "Temperature_1" will be also "1" but the third will be "0" as it's a totally other property.


Temperature_1Temperature_2Humidity_3SEQUENCEPROPERTY
10022;25;30;24;20Temperature
11023;25;30;20;18Temperature
00160;70;50;40;60Humidity
...............


Filling the table is also where I facing the performance problems.


In a first step I just insert the sequences and properties in the related columns.


To fill the table now with "0" or "1" for  occurrence or absence of a frequent sequence in a sequence I use the following algorithm:



-- iteration over all columns (frequent sequences)

FOR freqSequenceID IN 1..:maxFreqSequenceID DO

  --get the propertie and the frequent sequence which the current column represents

  SELECT "PROPERTY" INTO property FROM  FREQUENT_SEQUENCE_TBL WHERE "ID" = :freqSequenceID;

  SELECT "SEQUENCE" INTO freqSequence FROM  FREQUENT_SEQUENCE_TBL WHERE "ID" = :freqSequenceID;

  -- concatenate the name of current column

  colName := CONCAT(:property,'_');

  colName := CONCAT(:colName,freqSequenceID);

  -- update all rows in the column with "0"

  insertVar := CONCAT ('UPDATE OCCURENCE_TBL SET  "', :colName);

  insertVar := CONCAT (insertVar, '" = 0');

  EXECUTE IMMEDIATE (:insertVar);

  -- update all rows where the current frequent sequence occurs in the sequence with one

  --      UPDATE OCCURENCE_TBL SET  OCCURENCE_TBL.colName = 1 FROM OCCURENCE_TBL

  --      INNER JOIN "SEQUENCE_TBL" ON OCCURENCE_TBL."SEQUENCE_ID" = SEQUENCE_TBL."SEQUENCE_ID" AND OCCURENCE_TBL."PROPERTY" = :property

  --      WHERE LOCATE ( SEQUENCE_TBL."SEQUENCE", :freqSequence) >= 1 AND SEQUENCE_TBL."PROPERTY" = :property

  insertVar := CONCAT ('UPDATE O SET  O."', :colName);

  insertVar := CONCAT (insertVar, '" = 1 FROM OCCURENCE_TBL AS O INNER JOIN "SEQUENCE_TBL" AS S ON O."SEQUENCE_ID" = S."SEQUENCE_ID" AND O."PROPERTY" = ''');

  insertVar := CONCAT (insertVar, :property);

  insertVar := CONCAT (insertVar, '''WHERE LOCATE ( S."SEQUENCE", ''');

  insertVar := CONCAT (insertVar, :freqSequence);

  insertVar := CONCAT (insertVar,''') >= 1 AND S."PROPERTY" = ''');

  insertVar := CONCAT (insertVar, :property);

  insertVar := CONCAT (insertVar, '''');

  EXECUTE IMMEDIATE (:insertVar);

 

END FOR;

I think that the slow performance of it is because a update is sequential done for each column and not together at all in the end.

I tried already to collect all this update statement and execute them after the loop but it seems that with EXEC / EXECUTE IMMEDIATE only one statement is possible.

Has anyone an idea how to improve the update process of this table?

When it's possible it could be also a virtual table, it must be not persistent. In the next step it would be used for a classification with CART.

Thank you in advance

Matthias

Accepted Solutions (1)

Accepted Solutions (1)

lbreddemann
Active Contributor
0 Kudos

Not sure I really get the data model completely, but there is no specific limitation on the UPDATE command when used in EXECUTE IMMEDIATE.

If you can write a query that would produce the 1 or 0 value for each column at once, then you can easily perform the update in one go.

example:

drop table o;

create column table o (A_IND int, B_IND int, C_IND int, oname nvarchar(256) primary key);

The statement to generate the 1 or 0 indication can look like this:

select

    case

        when (locate(lower(object_name), 'a') > 0)

     then 1

     else 0

    end  as a_ind,

    case

        when (locate(lower(object_name), 'b') > 0)

        then 1

        else 0

    end  as b_ind,

    case

        when (locate(lower(object_name), 'c') > 0)

        then 1

        else 0

    end  as c_ind,

    object_name

from objects:

Now pre-fill the table:

insert into o (oname) (select distinct object_name from objects where object_name is not null);

The update command would then be:

update o set

      o.a_ind = oo.a_ind,

      o.b_ind = oo.b_ind,

      o.c_ind = oo.c_ind

from

  (select

     case

         when (locate(lower( object_name), 'a') > 0)

         then 1

         else 0

     end  as a_ind,

     case

         when (locate(lower( object_name), 'b') > 0)

         then 1

         else 0

     end  as b_ind,

     case

         when (locate(lower( object_name), 'c') > 0)

         then 1

         else 0

     end  as c_ind,

     object_name

  from objects) oo, o

where  oo.object_name = o.oname;

It's actually pretty straight forward and also works in an EXECUTE IMMEDIATE.

One important thing to remember is that with column tables in SAP HANA every update creates a full new record - it's not only the touched columns that get stored, but all columns. So updating as many columns as possible and required in one go is a good way to avoid performance penalties.

Former Member
0 Kudos

Maybe I'm doing something wrong but when I try to but more then one UPDATE command inside EXECUTE IMMEDIATE I get an error;

simplified, I mean something like:


EXECUTE IMMEDIATE('UPDATE O SET "a_ind" = 1; UPDATE O SET "b_ind" = 1;');

lbreddemann
Active Contributor
0 Kudos

OK, the problem here is that you try to run multiple update commands at once.

That is not allowed and should not be what you want.

Instead, have one update command which you use to update all columns - just as I've demonstrated.

Former Member
0 Kudos

As you have demonstrated it I have do hard code the columns which should be updated and can't keep it dynamical.

If I have a updated table with new sequences I will also get other frequent sequences. In your solution I must adjust then the whole code for 1, 0 indication.

lbreddemann
Active Contributor
0 Kudos

What is stopping you from creating dynamically new parts of the update statement, similar how you currently create multiple update statements?

The statement shown is only marginally more complex and you would need to create two parts per "frequent sequence" loop iteration:

The update part - all columns to fill

and

the select part.

I guess it's up to you what you want to do: brute force updating the table with poor performance, or a little less trivial dynamic statement creation.

Former Member
0 Kudos

Okay I understood know your solution completely, did't saw before that it could run also in loop iteration and be executed after it.

Answers (0)