[HDB] Avoid duplicate entries

Joseph_BERTHE
Active Contributor

Hi Experts,

I'm quite new to SQL, so sorry if my question is not that interesting...

I would like to avoid duplicate rows in a SAP HANA table, so I started by creating an AFTER INSERT trigger on my table that does a SELECT on the same table to check for duplicate rows and then does a ROLLBACK.

It seems that I can neither do a SELECT from the current table nor do a ROLLBACK inside the trigger.

So my question is: how can I do what I want to do?

Here is what I want to do:


create TRIGGER "NEO_6NR5ZT2KPZ3MK165C5CFGBBEF"."demo1"
    after INSERT on "NEO_6NR5ZT2KPZ3MK165C5CFGBBEF"."T_IOT_697268A983FC91F50620_1"
    for each row
BEGIN
    DECLARE SAMPLE_COUNT INT;

    SELECT COUNT(*) INTO SAMPLE_COUNT
        FROM "NEO_6NR5ZT2KPZ3MK165C5CFGBBEF"."T_IOT_697268A983FC91F50620_1";

    IF :SAMPLE_COUNT = 0 THEN
        ROLLBACK;
    END IF;
END;

Regards,

Accepted Solutions (0)

Answers (1)

Former Member

The normal way to avoid duplicate keys is to create a primary key or a unique index. Is this not an option for you?
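
For reference, that could look like the following. This is only a sketch - the poster's actual table and column names aren't known at this point in the thread, so the names here are illustrative:

create column table readings (
    ts        timestamp,
    device_id int,
    value     decimal(5,3),
    primary key (ts, device_id)   -- option 1: declare the key at creation time
);

-- option 2: add a unique index to an existing table; duplicate inserts
-- then fail with a unique constraint violation
create unique index idx_readings_unique on readings (ts, device_id);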

Joseph_BERTHE
Active Contributor

Hello,

Thanks for your answer. Unfortunately it is not possible for me to have a primary key, because the data comes from sensors which send the same data every 30 seconds, while the main values do not change very often. In the end I have a lot of rows with the same values except for the timestamp.

Regards,

Former Member

I still don't get it. If you want to accept duplicate data apart from the timestamp, you can also include the timestamp in the primary key.

Joseph_BERTHE
Active Contributor

Let's take an example.

I have a temperature sensor which measures the current temperature (of course). My gateway reads the data every 30s and then sends it to SAP HANA IoT.

As you can imagine, the temperature has not changed during that time (30s), so I end up with a lot of useless data, which I want to avoid.

I could do this in my gateway server, but that would be a lot of work, whereas a simple trigger would do the job easily.

Table:

| TimeStamp | DeviceId | Value |

Regards,

Former Member

Thanks, now I understand better: you want to avoid identical values with consecutive timestamps. I think the trigger approach is quite unusual and it would take me some time to troubleshoot it. Perhaps you can think about a kind of staging table that is later compressed into another table to exclude the redundant information - similar to the E and F fact tables in SAP BW.
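
A sketch of that staging idea (table and column names are made up for illustration): raw readings land in a staging table, and a periodic job copies only the rows whose value differs from the previous reading of the same device, using the LAG window function, then clears the staging table:

-- staging table receives every sensor reading as-is
create column table readings_staging (ts timestamp, device_id int, value decimal(5,3));

-- compressed target table keeps only value changes
create column table readings_clean (ts timestamp, device_id int, value decimal(5,3));

-- periodic compression step
insert into readings_clean (ts, device_id, value)
select ts, device_id, value
from (
    select ts, device_id, value,
           lag(value) over (partition by device_id order by ts) as prev_value
    from readings_staging
)
where prev_value is null or value <> prev_value;

delete from readings_staging;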

former_member185132
Active Contributor

Use the SIGNAL command to block inserts. Here's the code I used; it blocks ALL inserts into the NUMBERS table. Wrap the SIGNAL statement in an IF block to block only the duplicates.


create TRIGGER test_trigger
    after INSERT on numbers
    for each row
BEGIN
    DECLARE duplicate CONDITION FOR SQL_ERROR_CODE 10001;

    SIGNAL duplicate SET MESSAGE_TEXT = 'Duplicate Entry';
END;

Joseph_BERTHE
Active Contributor

Hmm... creating a duplicate table is not what I expected, and I don't know the BW architecture very well, so I guess E and F are a pair: one table for trash and the other for good data.

I don't think I will implement this solution.

But why does SAP HANA not allow us to do such a thing when other databases do? It is not a strange requirement at all!

I will look into the gateway; maybe the algorithm could be changed.

Joseph_BERTHE
Active Contributor

Nice, but how can I detect a duplicate entry if I cannot do a SELECT on the same table?

former_member185132
Active Contributor

You're right, a trigger isn't allowed to access the table being changed, even with a SELECT query. That means my previous response isn't going to work.

How about creating a compound unique constraint on this table with the timestamp field and any other fields? That might work.


ALTER TABLE tbl ADD UNIQUE (f1,f2);

Joseph_BERTHE
Active Contributor

Hello,

I have already answered this question above.

Regards,

former_member185132
Active Contributor

Yes, I did read your previous comment about PKs, but I don't agree.

Let's take a step back and say you could write logic to stop the duplicates, however the word "duplicate" is defined in this context.

That logic would have to identify records having the same values for some combination of fields, and then prevent new records from being created for those combinations, whether by triggers or any other mechanism.

So if there is a set of fields that could allow a trigger to detect duplication, then why would that same set of fields not be appropriate as a primary key or unique index? What would the trigger do that the DB doesn't automatically do when a PK/unique index is defined?

Former Member

As already outlined above, Joseph wants to avoid consecutive duplicate values that differ only in the timestamp. That is something a primary key or unique constraint can never support.

lbreddemann
Active Contributor

Looking at your code sample and your description I don't think you want to prevent duplicates in the DB sense of the word.

What I understand is that you want to sort of "compress" subsequent reads that do not provide new data into a single row, and only keep a row when the temperature reading has actually changed.

Is that right?

If so, then triggers (currently) don't allow that in SAP HANA.

Have you considered using the UPSERT command?

It replaces/updates existing records and simply inserts non-matching ones.

Something like this may work...

create column table readings (ts timestamp, meter_id int, temperature_celcius decimal(5,3));

insert into readings values (current_timestamp, 1, 23.041);

select * from readings;
/*
TS                      METER_ID    TEMPERATURE_CELCIUS
2015-06-24 07:55:27.251 1           23.041
*/

insert into readings values (timestamp'2015-06-24 07:55:27.251', 2, 12.041);

select * from readings;
/*
TS                      METER_ID    TEMPERATURE_CELCIUS
2015-06-24 07:55:27.251 1           23.041
2015-06-24 07:55:27.251 2           12.041
*/

upsert readings values (timestamp'2015-06-24 07:59:27.251', 2, 12.041)
       where meter_id = 2 and temperature_celcius = 12.041;

select * from readings;
/*
TS                      METER_ID    TEMPERATURE_CELCIUS
2015-06-24 07:55:27.251 1           23.041
2015-06-24 07:59:27.251 2           12.041
              -^
*/

BTW: your guess about the "trash" table in BW is not quite right. There's a much broader concept behind it, which allows managing data visibility and loading at the level of loading units.

- Lars