cancel
Showing results for 
Search instead for 
Did you mean: 

Duplicates on SELECT DISTINCT statement in HANA DB

roberto_vacca2
Active Contributor
0 Kudos

Hello everyone,

I'm trying to figure out if the DISTINCT statement, with a list of fields, on HANA DB without the ORDER BY statement in the SELECT, may be source of inconsistencies in the uploaded data.

In HANA data they are loaded into memory, but you can know whether it is technically possible that duplicate records are loaded, In the face of standard code that uses the DISTINCT instructions without any sort really?.

After upgrade, Standard code seems to use DISTINCT statements without any ORDER BY instruction.

Thanks for the attention, regards

horst_keller
Product and Topic Expert
Product and Topic Expert
0 Kudos

"Someone Told us That in ABAP SELECT DISTINCT without the ORDER BY clause, may cause duplication of records."

Who told you so? In Open SQL, DISTINCT compares the whole line and there is no need for ORDER BY. It is the business of the database how to achieve the result.

You seem to mix up DISTINCT with other ORDER BY issues. As a rule, the sequence of rows in the result set delivered by the DB is undefined if there is no ORDER BY. This is true especially for HANA. But pool or cluster tables had a sort order. Though never documented, people relied on that. After depooling and declustering on HANA, such programs must be adjusted by adding ORDER BYs.

Accepted Solutions (1)

Accepted Solutions (1)

horst_keller
Product and Topic Expert
Product and Topic Expert

If you are talking about Open SQL, the behavior of DISTINCT is described here.

If you are talking about HANA SQL, the behavior of DISTINCT is described here.

There's no dependency from the ORDER BY clause mentioned there.

The SAP HANA Database offers an SQL interface to its relational databases that is generally independent from HANA's internal memory management and which behaves like standard SQL for the most part.

roberto_vacca2
Active Contributor

Thanks,

Someone Told us That in HANA, the ABAP SELECT DISTINCT without the ORDER BY clause, may cause duplication of records. So is it necessary? We could may think that this is true if data is in memory and ordered by casual.

I'm talking about SELECT DISTINCT -abap instruction. standards like SQL, as you say, it should check duplicates in the selection, deleting adjacent duplicates. As the documentation say: "That DISTINCT Specifies only one copy of each set of duplicate records selected Should be returned".

I saw a lot of instructions in code Standard That do not use the ORDER BY clause.

I hope I've been clear enough. Let me know your ideas.

Thanks a lot Bye

horst_keller
Product and Topic Expert
Product and Topic Expert

"Someone Told us That in HANA, the ABAP SELECT DISTINCT without the ORDER BY clause, may cause duplication of records."

Who told you so? It is not true. The databases sort themselves if DISTINCT is given. No ORDER BY necessary. For DISTINCT, the complete rows of the result set as defined in the SELECT list are taken into account.

You need ORDER BY in order to receive a defined sequence of the result set. Some people relied on an implicit ordering for pool and cluster tables that gets lost after depooling and declustering on HANA. Maybe you mix that up.

roberto_vacca2
Active Contributor
0 Kudos

Thanks Horst.

I was quite sure that "DISTINCT" can't betray its meaning 🙂 Someone in the consultant's world.

I was talking about standard table, not cluster tables necessary.

ORDER BY is only for the final output as far as I know.

Thanks for your advice.

I was wondering about an other thing:

We've just upgraded on HANA DB and we're looking out from ST12 (tracing transactions), that Net Time for an instruction like this, is consuming 3-4 times more than in Oracle:

HANA SYSTEM:

ORACLE SYSTEM

SELECT SINGLE maktx INTO fh_diba-cdescr FROM makt

WHERE matnr EQ fh_diba-matnr AND spras EQ sy-langu.

We can't figure out how is this possible. I've seen that indexes are not available anymore like MAKT~M, but I'm confident that's not the question.

I think there's something more at DB level needing some adjustment.

Thanks a lot for your advices.

Bye

horst_keller
Product and Topic Expert
Product and Topic Expert

SAP HANA is not made for SELECT SINGLE, even Hasso says so ...

roberto_vacca2
Active Contributor

Hi Horst, Thanks for reply.

HANA is not designed for the ABAP ECC Oracle way and I Imagine that HANA Paradigm is different, but every company that comes from ECC 6 it will have probably thousand of "SELECT SINGLE" instructions, even in standard code itself. This means a big change in programming approach and a big rework. This is Not a really save-time

We could expect the same or less computational time , but not 3 times more. I need to investigate more on hardware of Oracle machine and Hana machine, but if the hardware is the same, this situation will take us to a bottleneck.

So what is the best alternative to a SELECT SINGLE in this case?

Could you tell me your ideas about this?

Thanks, I appreciate!

Bye

horst_keller
Product and Topic Expert
Product and Topic Expert
roberto_vacca2
Active Contributor
0 Kudos

Hi Horst, we developed a report to get some statistic using best golden rules and we get same result in time of before.

Every kind of selection with many rows (like SELECT - ENDSELECT, SELECT * , SELECT FOR ALL ENTRIES, OPEN CURSOR with FETCH, JOIN, etc..), is much longer than before in Oracle. We're trying to find out if it's a problem related to DB Proxy access time .

I verified that the quantity of records needs same percentage of time in HANA and ORACLE, (double perc. in the first selections and quite the same perc. in the end) but the time to read data is much more expensive.

Do you know how to check this?

Thanks a lot Roberto

Answers (1)

Answers (1)

Please check if the mandt (client) is used in the select qwery.

Best Regards,

Akash

roberto_vacca2
Active Contributor
0 Kudos

Hi

In the ABAP sentence, MANDT is defined by compiler.

Anyway i've never used CLIENT SPECIFIED clause in select statement.

Thanks.

0 Kudos

Hi,

I was talking about HANA development (native) not from ABAP end.

I was thinking if you are using native query n getting multiple results that might be because of data from different client. So i suggested. Anyways i think you are getting many replies so hopefully you will get your solution.

BR,

Akash