on 06-15-2016 2:29 PM
Hello,
I'm facing problem with very slow table updates. Currently I perform several 1000 updates on a table sequential. I would like to know some improvements for a better performance.
I will give a short introduce on what I'm doing exactly:
My starting situation is a table with sequences for different properties. Each value of the sequence represents one measured value for one day.
SEQUENCE | PROPERTY |
---|---|
22;25;30;24;20 | Temperature |
23;25;30;20;18 | Temperature |
60;70;50;40;60 | Humidity |
... | ... |
On this I perform a sequential pattern mining using the SPADE implementation in R, getting as result a frequent sequence and the support of it. Let's say the result looks like following one:
ID | FREQ-SEQUENCE | SUPPORT | PROPERTY |
---|---|---|---|
1 | 25;30 | 1 | Temperature |
2 | 23;25 | 0.5 | Temperature |
3 | 60;70;50 | 0.33 | Humidity |
.. | .. | .. | .. |
Now I have a procedure which creates dynamical a table where each frequent sequence represents one column. The column name consist's of the property and the ID of the frequent sequence it represents (e.g "Temperature_1" stands for the freq. sequence "25:30"). Additional there are the columns sequence and property.
This table represents if a frequent sequence occurs in a sequence or not. Therefore I fill them with 1 (occurs) or 0 (occurs not). E.g. as I mentioned column "Temperature_1" stands for the freq. sequence "25;30" this occurs in the sequence "22;25;30;24;20" so it will be "1" as it occurs also in "23;25;30;20;18" the second row in column "Temperature_1" will be also "1" but the third will be "0" as it's a totally other property.
Temperature_1 | Temperature_2 | Humidity_3 | SEQUENCE | PROPERTY |
---|---|---|---|---|
1 | 0 | 0 | 22;25;30;24;20 | Temperature |
1 | 1 | 0 | 23;25;30;20;18 | Temperature |
0 | 0 | 1 | 60;70;50;40;60 | Humidity |
... | ... | ... | ... | ... |
Filling the table is also where I facing the performance problems.
In a first step I just insert the sequences and properties in the related columns.
To fill the table now with "0" or "1" for occurrence or absence of a frequent sequence in a sequence I use the following algorithm:
-- iteration over all columns (frequent sequences)
FOR freqSequenceID IN 1..:maxFreqSequenceID DO
--get the propertie and the frequent sequence which the current column represents
SELECT "PROPERTY" INTO property FROM FREQUENT_SEQUENCE_TBL WHERE "ID" = :freqSequenceID;
SELECT "SEQUENCE" INTO freqSequence FROM FREQUENT_SEQUENCE_TBL WHERE "ID" = :freqSequenceID;
-- concatenate the name of current column
colName := CONCAT(:property,'_');
colName := CONCAT(:colName,freqSequenceID);
-- update all rows in the column with "0"
insertVar := CONCAT ('UPDATE OCCURENCE_TBL SET "', :colName);
insertVar := CONCAT (insertVar, '" = 0');
EXECUTE IMMEDIATE (:insertVar);
-- update all rows where the current frequent sequence occurs in the sequence with one
-- UPDATE OCCURENCE_TBL SET OCCURENCE_TBL.colName = 1 FROM OCCURENCE_TBL
-- INNER JOIN "SEQUENCE_TBL" ON OCCURENCE_TBL."SEQUENCE_ID" = SEQUENCE_TBL."SEQUENCE_ID" AND OCCURENCE_TBL."PROPERTY" = :property
-- WHERE LOCATE ( SEQUENCE_TBL."SEQUENCE", :freqSequence) >= 1 AND SEQUENCE_TBL."PROPERTY" = :property
insertVar := CONCAT ('UPDATE O SET O."', :colName);
insertVar := CONCAT (insertVar, '" = 1 FROM OCCURENCE_TBL AS O INNER JOIN "SEQUENCE_TBL" AS S ON O."SEQUENCE_ID" = S."SEQUENCE_ID" AND O."PROPERTY" = ''');
insertVar := CONCAT (insertVar, :property);
insertVar := CONCAT (insertVar, '''WHERE LOCATE ( S."SEQUENCE", ''');
insertVar := CONCAT (insertVar, :freqSequence);
insertVar := CONCAT (insertVar,''') >= 1 AND S."PROPERTY" = ''');
insertVar := CONCAT (insertVar, :property);
insertVar := CONCAT (insertVar, '''');
EXECUTE IMMEDIATE (:insertVar);
END FOR;
I think that the slow performance of it is because a update is sequential done for each column and not together at all in the end.
I tried already to collect all this update statement and execute them after the loop but it seems that with EXEC / EXECUTE IMMEDIATE only one statement is possible.
Has anyone an idea how to improve the update process of this table?
When it's possible it could be also a virtual table, it must be not persistent. In the next step it would be used for a classification with CART.
Thank you in advance
Matthias
Not sure I really get the data model completely, but there is no specific limitation on the UPDATE command when used in EXECUTE IMMEDIATE.
If you can write a query that would produce the 1 or 0 value for each column at once, then you can easily perform the update in one go.
example:
drop table o;
create column table o (A_IND int, B_IND int, C_IND int, oname nvarchar(256) primary key);
The statement to generate the 1 or 0 indication can look like this:
select
case
when (locate(lower(object_name), 'a') > 0)
then 1
else 0
end as a_ind,
case
when (locate(lower(object_name), 'b') > 0)
then 1
else 0
end as b_ind,
case
when (locate(lower(object_name), 'c') > 0)
then 1
else 0
end as c_ind,
object_name
from objects:
Now pre-fill the table:
insert into o (oname) (select distinct object_name from objects where object_name is not null);
The update command would then be:
update o set
o.a_ind = oo.a_ind,
o.b_ind = oo.b_ind,
o.c_ind = oo.c_ind
from
(select
case
when (locate(lower( object_name), 'a') > 0)
then 1
else 0
end as a_ind,
case
when (locate(lower( object_name), 'b') > 0)
then 1
else 0
end as b_ind,
case
when (locate(lower( object_name), 'c') > 0)
then 1
else 0
end as c_ind,
object_name
from objects) oo, o
where oo.object_name = o.oname;
It's actually pretty straight forward and also works in an EXECUTE IMMEDIATE.
One important thing to remember is that with column tables in SAP HANA every update creates a full new record - it's not only the touched columns that get stored, but all columns. So updating as many columns as possible and required in one go is a good way to avoid performance penalties.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
What is stopping you from creating dynamically new parts of the update statement, similar how you currently create multiple update statements?
The statement shown is only marginally more complex and you would need to create two parts per "frequent sequence" loop iteration:
The update part - all columns to fill
and
the select part.
I guess it's up to you what you want to do: brute force updating the table with poor performance, or a little less trivial dynamic statement creation.
User | Count |
---|---|
93 | |
10 | |
10 | |
9 | |
9 | |
7 | |
6 | |
5 | |
5 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.