Application Development Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.

How to increase Performance with DELETE FROM TABLE?

leracka
Explorer

Hey,

I'm currently writing a program to delete data from multiple tables. Each table can be very big (up to 10+ billion entries), and I need to delete a few hundred million rows from them weekly. The problem: all my attempts so far have been very slow.

My last two attempts were the following:

  1. Open a cursor to SELECT the data I want to delete.
  2. FETCH the data from the cursor in packages (mostly around 20,000 rows).
  3. Use the fetched data in a DELETE FROM table.

The problem here: I can't use a COMMIT WORK, because it would close the cursor.

Second attempt:

  1. SELECT all the data into an internal table.
  2. Take the data from that itab in packages of around 20,000 in a loop.
  3. Delete each package with a DELETE FROM table.
  4. COMMIT WORK afterwards.

The main problem here: if I select all the data into an internal table, it gets very big very fast and we run into memory issues.

And both attempts are VERY SLOW.

Before those two attempts, I used a DELETE WHERE without a SELECT. The problem there: I can't issue a COMMIT WORK after a certain threshold.

For example: one run took 17 seconds to SELECT all the data (~2,000,000 rows across all tables combined) and pack it into packages of 20,000, skipping only the deletion.
Adding the deletion to the exact same program, it took well over 45 minutes and ran into our timeout.

So both solutions select the data really fast and do perform the deletion, but the deletion itself is very slow. Since it's a HANA DB, we can't use an index here (at least that's what we were told).

So what can we do?

18 REPLIES

raymond_giuseppi
Active Contributor

Is SAP Note 3320379 (ABAP: Performance improvements for EXPORT and DELETE) suitable for your kernel? It's referenced from 2000002 (FAQ: SAP HANA SQL Optimization, an easy-to-remember number). Also look at SAP Notes 2351294 and 2823243.

Otherwise, try parallelizing the record deletions without triggering general performance problems.
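A minimal sketch of one way to parallelize, assuming a hypothetical RFC-enabled function module Z_DELETE_PACKAGE that runs the packaged DELETE loop for one table in its own work process:

```abap
" Hedged sketch only: fan the per-table deletions out via asynchronous RFC.
" Z_DELETE_PACKAGE is a hypothetical RFC-enabled function module; the table
" list here is only an example.
DATA(lt_tables) = VALUE string_table( ( `/DMF/UFC_TS` ) ).
DATA lv_task TYPE c LENGTH 16.

LOOP AT lt_tables INTO DATA(lv_tabname).
  lv_task = |DEL_{ sy-tabix }|.
  " Each task runs in a separate dialog work process of the RFC server group.
  CALL FUNCTION 'Z_DELETE_PACKAGE'
    STARTING NEW TASK lv_task
    DESTINATION IN GROUP DEFAULT
    EXPORTING
      iv_tabname = lv_tabname.
ENDLOOP.
```

In a real program you would also limit the number of concurrent tasks and collect the results via a RECEIVING callback, so the deletions don't starve the rest of the system.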

leracka
Explorer

I'll look at that link. But it is already parallelized; I'm talking about the performance of each task. In reality there are 300+ tasks that need to be done.

Edit: the Note can't be implemented on our system.

raymond_giuseppi
Active Contributor

  • What kind of tables are those, standard or custom?
  • Are changes to those tables logged (dbtablog)?
  • Can they be partitioned?

leracka
Explorer

  1. Standard.
  2. Logging is disabled in the technical settings.
  3. What do you mean by that? I'll need to look up what that is.

raymond_giuseppi
Active Contributor

Read some documentation on SAP HANA table partitioning, such as Table Partitioning.

Then consider partitioning on some date/month criterion, so the DELETE could mostly be replaced with dropping partitions.
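To make the idea concrete, a hedged HANA SQL sketch (the table name, column, and boundary values are only illustrative, and real constraints apply, e.g. the partitioning column normally has to be part of the primary key; check the Table Partitioning guide for your release):

```sql
-- Illustrative only: range-partition the table by month on a timestamp column.
ALTER TABLE "/DMF/UFC_TS" PARTITION BY RANGE ("TSTMP_FR")
  ( PARTITION '20240101000000' <= VALUES < '20240201000000',
    PARTITION '20240201000000' <= VALUES < '20240301000000',
    PARTITION OTHERS );

-- Aging out a month then becomes a cheap partition drop instead of
-- hundreds of millions of single-row deletions:
ALTER TABLE "/DMF/UFC_TS"
  DROP PARTITION '20240101000000' <= VALUES < '20240201000000';
```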

Sandra_Rossi
Active Contributor

It will be slow anyway.

Timeout? Why do you run it in dialog?

What is the original DELETE statement you need to adapt?

Which database indexes exist?

Partitions could help, as Raymond said, but maybe you don't need them.

leracka
Explorer

There are no indexes on those tables besides the primary keys.

We don't run it in dialog but in background. We set a timeout value in seconds on the selection screen to limit executions that run into a dead end. In the test we set it to 3600, i.e. 1 hour, because 1 hour is already far too long an execution time for one task.

The deletion is already parallelized; I don't know what you mean by "partitions". The tables are all standard SAP tables, no Z* tables, if that helps.

We already tried using an index a few months back in another attempt (where we used DELETE WHERE), but it didn't help at all, and the indexes got VERY big (multiple TB of space), so we had to delete them immediately.

And what do you mean by "original DELETE"? There is no original. The standard has ways to delete data from those tables, but they are also too slow; that's why I have to write this program now.

Sandra_Rossi
Active Contributor

Original DELETE is the DELETE you want to do first, without any optimization. Just one DELETE.

DELETE FROM table WHERE ???????????

You still haven't provided any information about the structure of the table: which lines you need to delete, which columns are queried, whether they belong to the primary key, at which position in the primary key, and so on.

Partitions in tables = a database concept: look up the definition on the Web, it's explained everywhere.

What is this "timeout value" concept on the selection screen? I've never heard of it, so I guess it's custom timeout logic you have implemented, which means your point is simply that you want to improve the performance. If you say "timeout" without any further explanation, people will think of the system timeout, not a custom one. There's a system timeout in dialog but none in background.

If you don't give details, people will just give you general recommendations, and that won't help you.

leracka
Explorer

The tables are the standard UDF tables like /DMF/UFC_TS etc. The columns do not belong to the primary key. Examples are: TIME_GRAN, TS_SOURCE, DIAG_ID, LOC_ID, PROD_ID, TSTMP_FR. They are filled from the selection screen, so if nothing is entered, the WHERE condition for that column is dropped.

An example where I used DELETE WHERE; I already modified it to include a COMMIT WORK in between, but I don't know whether that really helps:

WHILE subrc = 0.
  IF i_testmodus = abap_true.
    SELECT COUNT(*)
      FROM /dmf/ufc_ts AS t
      INTO @DATA(res)
      UP TO @i_package_size ROWS
      WHERE t~diag_id IN @i_diag_id
        AND t~prod_id IN @i_prod_guid
        AND t~loc_id = @i_loc_guid
        AND t~sales_org_id IN @i_sales_org_id
        AND t~ts_source IN @i_ts_src
        AND t~tstmp_fr >= @i_tstmp_fr
        AND t~tstmp_to <= @i_tstmp_to.
    subrc = sy-subrc.
    r_res += sy-dbcnt.
    IF subrc <> 0 AND subrc <> 4.
      MESSAGE s023 DISPLAY LIKE 'E'.
      LEAVE LIST-PROCESSING.
    ENDIF.
  ELSE.
    DELETE FROM /dmf/ufc_ts
      WHERE diag_id IN @i_diag_id
        AND prod_id IN @i_prod_guid
        AND loc_id = @i_loc_guid
        AND sales_org_id IN @i_sales_org_id
        AND ts_source IN @i_ts_src
        AND tstmp_fr >= @i_tstmp_fr
        AND tstmp_to <= @i_tstmp_to
      UP TO @i_package_size ROWS.
    subrc = sy-subrc.
    r_res += sy-dbcnt.
    IF subrc <> 0 AND subrc <> 4.
      MESSAGE s024 DISPLAY LIKE 'E'.
      LEAVE LIST-PROCESSING.
    ENDIF.
  ENDIF.
  COMMIT WORK.
ENDWHILE.


matt
Active Contributor

Rather than use a cursor, why not use something like

DO.
  DELETE FROM tab WHERE condition UP TO 20000 ROWS.
  IF sy-subrc IS NOT INITIAL.
    EXIT. 
  ENDIF.
  COMMIT WORK.
ENDDO.

But it depends on your release. UP TO is a fairly recent addition, I believe.

leracka
Explorer

That's exactly what we did a year ago, but the performance wasn't satisfying for our customers, so we have to develop something new.

Sandra_Rossi
Active Contributor

Did you try:

  1. Do just one SELECT to get the primary key values (I assume below that the key is made of prod_id only) of all records to delete.
  2. Write these key values to a new work table ZDELETE_UFC_TS, with a package number and i_package_size lines per package number.
  3. Loop over the total number of packages; for each package, do DELETE FROM /dmf/ufc_ts WHERE EXISTS ( SELECT * FROM ZDELETE_UFC_TS WHERE package_number = @package_number AND ZDELETE_UFC_TS~prod_id = /dmf/ufc_ts~prod_id ), then COMMIT WORK.
  4. Clean up your work table.
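A minimal sketch of steps 3-4, assuming (as above) a key of prod_id only and a hypothetical work table ZDELETE_UFC_TS with fields PACKAGE_NUMBER and PROD_ID:

```abap
" Hedged sketch: delete package by package via the hypothetical work table,
" committing after each package so each database LUW stays small.
SELECT MAX( package_number ) FROM zdelete_ufc_ts INTO @DATA(lv_max_package).

DO lv_max_package TIMES.
  DATA(lv_package) = sy-index.
  " Semi-join against the current package of key values.
  DELETE FROM /dmf/ufc_ts
    WHERE EXISTS ( SELECT * FROM zdelete_ufc_ts
                     WHERE package_number = @lv_package
                       AND prod_id       = /dmf/ufc_ts~prod_id ).
  COMMIT WORK.
ENDDO.

" Step 4: clean up the work table afterwards.
DELETE FROM zdelete_ufc_ts.
```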

Sandra_Rossi
Active Contributor

When I say "one SELECT", it can be via a cursor, so that the key values are written package by package; open it WITH HOLD so the COMMIT WORK after each INSERT doesn't close the cursor (which would dump at the next FETCH).
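A hedged sketch of that filling step (ZDELETE_UFC_TS and its fields PACKAGE_NUMBER/PROD_ID are hypothetical, and the WHERE condition is only an example):

```abap
" WITH HOLD keeps the database cursor open across COMMIT WORK.
DATA lt_keys    TYPE STANDARD TABLE OF zdelete_ufc_ts.
DATA lv_cursor  TYPE cursor.
DATA lv_package TYPE i VALUE 0.

OPEN CURSOR WITH HOLD @lv_cursor FOR
  SELECT prod_id FROM /dmf/ufc_ts WHERE ts_source IN @i_ts_src.

DO.
  " Fetch one package of key values at a time to keep memory bounded.
  FETCH NEXT CURSOR @lv_cursor
    INTO CORRESPONDING FIELDS OF TABLE @lt_keys
    PACKAGE SIZE 20000.
  IF sy-subrc <> 0.
    EXIT.
  ENDIF.
  lv_package += 1.
  LOOP AT lt_keys ASSIGNING FIELD-SYMBOL(<ls_key>).
    <ls_key>-package_number = lv_package.
  ENDLOOP.
  INSERT zdelete_ufc_ts FROM TABLE @lt_keys.
  COMMIT WORK.  " does not invalidate the cursor thanks to WITH HOLD
ENDDO.

CLOSE CURSOR @lv_cursor.
```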

leracka
Explorer

sandra.rossi I tried that approach today, but in an AMDP procedure:

DECLARE client VARCHAR(3) := '';
DECLARE row_cnt INT;

SELECT SESSION_CONTEXT ('CLIENT') AS client INTO client FROM DUMMY;

data = SELECT TOP :i_del_cnt
              TS.MANDT, TS.DIAG_ID, TS.AGGR_PROFILE_ID, TS.PROD_ID,
              TS.LOC_ID, TS.SALES_ORG_ID, TS.DISTR_CHNL_ID, TS.ORDER_CHNL_ID,
              TS.TS_SOURCE, TS.TIME_GRAN, TS.TSTMP_FR, TS.OFR_ID, TS.EXT_EVENT_ID
         FROM "/DMF/UFC_TS" AS TS
        INNER JOIN :i_prod_ids AS P ON TS.prod_id = P.prod_id
        WHERE time_gran = :i_time_granularity
          AND ts_source = :i_ts_source
          AND diag_id   = :i_diag_id
          AND loc_id    = :i_loc_id
          AND tstmp_fr >= :i_startdate
          AND tstmp_fr <= :i_enddate
          AND mandt     = :client;

SELECT COUNT(*) INTO row_cnt FROM :data;

IF :row_cnt > 0 THEN
  DELETE FROM "/DMF/UFC_TS" AS TS
   WHERE EXISTS ( SELECT * FROM :data AS DTS
                   WHERE TS.MANDT           = DTS.MANDT
                     AND TS.DIAG_ID         = DTS.DIAG_ID
                     AND TS.AGGR_PROFILE_ID = DTS.AGGR_PROFILE_ID
                     AND TS.PROD_ID         = DTS.PROD_ID
                     AND TS.LOC_ID          = DTS.LOC_ID
                     AND TS.SALES_ORG_ID    = DTS.SALES_ORG_ID
                     AND TS.DISTR_CHNL_ID   = DTS.DISTR_CHNL_ID
                     AND TS.ORDER_CHNL_ID   = DTS.ORDER_CHNL_ID
                     AND TS.TS_SOURCE       = DTS.TS_SOURCE
                     AND TS.TIME_GRAN       = DTS.TIME_GRAN
                     AND TS.TSTMP_FR        = DTS.TSTMP_FR
                     AND TS.OFR_ID          = DTS.OFR_ID
                     AND TS.EXT_EVENT_ID    = DTS.EXT_EVENT_ID )
   WITH HINT ( CS_ITAB_IN_SUBQUERY );
  e_deleted_count = ::ROWCOUNT;
END IF;

But that attempt is also slow: it deletes ~5,279 rows per second.

leracka
Explorer

I used a package size of 20k rows per deletion, with a COMMIT WORK right after.

leracka
Explorer

5k* not 20k

Sandra_Rossi
Active Contributor

As I said in my very first comment, "it will be slow anyway".

I'm unable to answer precisely because I still don't know the primary key of your table. If it's made of PROD_ID only, then I don't understand why you delete using many columns, which will make it slower.

Also, if you want to discuss performance, you should post the execution plan.