Why does my SELECT statement take so long when the primary key is present?

Hi everyone!

I cannot figure out why my SQL statement takes so long when the primary key field is included.

This slowness shows up in other artifacts as well, such as my calculation views. Perhaps there is a misunderstanding on my part about how to use or model this artifact. I would appreciate it if you could explain.

Accepted Solutions (1)

lbreddemann
Active Contributor

——- Updated answer (after actually testing my ideas)

TL;DR

In this case the difference in runtime comes from the unloading and reloading of the very large data structure that makes up the primary key column.

Because the primary key column has only unique values, it cannot be well compressed and ends up being very large compared to the other columns. HANA’s memory management tries to avoid out-of-memory situations by freeing up memory early and removes the big primary key column data from memory right after using it.

The next time a query needs the data, it has to be reloaded into memory - that's what takes so long (a few seconds in this case). Changing the "unloading" behaviour to keep the data in memory makes the query with the primary key as fast as the one without.

A more verbose version of this can be found here: https://lbreddemann.org/one-in-a-million/

——-

Ok, with just the query text and the effect you observed there is not much to go on and analyze. I would recommend using EXPLAIN PLAN and PLANVIZ as usual to get some more insights.
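For reference, a minimal sketch of what that could look like (the table and column names are the ones quoted later in this thread; the plan lands in the standard EXPLAIN_PLAN_TABLE view):

   -- sketch: capture the plan for the slow statement
   EXPLAIN PLAN FOR
     SELECT TOP 1000 aaa, aai, abi, _validfrom, _validto FROM io_test;

   -- then inspect the generated operators
   SELECT operator_name, operator_details, output_size
     FROM explain_plan_table;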

However, I have a suspicion here:

Could it be that the three other columns (not the ID column) contain quite a lot of repeating data?

If so, then it is not all that surprising to see the difference in runtime.
See, when HANA returns records, it tries to work with the internal representation of the actual values (value IDs) for as long as possible. That is because in many cases these value IDs are much smaller and easily allow processing with SIMD instructions.
Only when the actual value of a record field is absolutely required, e.g. when it has to be handed over to the client, is it filled in. That process is called "materialization", and HANA tries to employ a "late materialization" approach.

So far, so good.

Now, you run your query with columns that each have, say, 100,000 different values.
That's 300,000 different values HANA has to keep within reach to materialize all result records. Too easy - that's quick.

Next, you run the same query but include the ID column, which by definition has a different value for every single entry. Here HANA has to keep 300,000 + 88 million (or whatever your table count was) values ready to materialize the records.

We can assume that the lookup of actual values via the value ID is done as a dictionary lookup (i.e. via a hash function -> O(1)), but looking up 88 million records is still two orders of magnitude more effort than 300K.

On top of that comes, of course, the additional memory that is required to keep the 88 million barely compressible values of the primary key.

To be clear here, the important bit is that the primary key contains only unique values. This effect would show with any field that only contains unique values if the table is large.
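If you want to see this for your own table, a sketch along these lines should work (M_CS_COLUMNS is the column-store monitoring view; IO_TEST is the table name used later in this thread):

   -- row count vs. distinct count tells you how well each column compresses
   SELECT column_name, "COUNT", distinct_count,
          compression_type, memory_size_in_total
     FROM m_cs_columns
    WHERE table_name = 'IO_TEST';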

At this point, you might say: "But I only take 1000 records".

That's true, but by definition TOP and LIMIT are applied to the final result set, which obviously has to be computed completely first.

So, what can you do here?

Reduce the number of records that are processed. Put in a filter condition that makes sense for your use case.
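Something along these lines (a sketch - the filter column and value are placeholders for whatever fits your use case):

   -- let HANA cut the data down before materialization kicks in
   SELECT TOP 1000 aaa, aai, abi, _validfrom, _validto
     FROM io_test
    WHERE _validfrom >= '2020-01-01';  -- placeholder condition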

michael_piesche
Active Contributor

lbreddemann, that is a good argument you bring up here. So, assuming that finding the TOP 1000 records takes the same time for both approaches, the difference between 4 ms and 6921 ms could more likely be explained by the fact that HANA retrieves and returns values for columns with many identical values faster than values for columns with truly unique values (since HANA has column-based storage with compression of column values).

avidotto, you might want to give that assumption a shot and analyse more thoroughly how many distinct values there are in your other selected columns versus the unique key. Also try to find the column in your table with the most distinct values besides the unique key. The performance difference is too large for me to fully believe that assumption, but after all you have close to 100 million records, which is in general a lot for a database to handle.
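A sketch of that analysis (column names taken from the statement quoted further down in this thread):

   -- compare the cardinality of each selected column with the unique key
   SELECT COUNT(*)             AS total_rows,
          COUNT(DISTINCT aaa)  AS distinct_aaa,  -- the primary key
          COUNT(DISTINCT aai)  AS distinct_aai,
          COUNT(DISTINCT abi)  AS distinct_abi
     FROM io_test;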

lbreddemann
Active Contributor

Yes, that's pretty much what I wrote - except that HANA uses a column store.

The materialization of records is done per column.

A quick reproduction on an HXE 2 SP04 with 95 million records and only 3,500 and 12,000 different values in the non-key columns confirmed my suspicion.

The majority of time is spent during the materialization.

avidotto

Hi Lars, thank you so much.

I have a bunch of similar values in that dataset, which really does explain the performance difference. I got the point about the result set materialization - very enlightening.

Thank you all for your time helping me with this understanding.

lbreddemann
Active Contributor

One correction here: the dictionary lookup is not done via a hash function. Instead, the column dictionary is sorted and the lookup is done via binary search.

lbreddemann
Active Contributor

Sorry, but I have to amend this answer again.

Thinking this through, it became obvious that the dictionary lookup per se cannot be the deciding factor for the increased runtime.

As written in my first correction (not sure where my head was when I first wrote the answer), the per-column value dictionary is searched via binary search. Binary search has a worst-case runtime of O(log n). That means even two orders of magnitude more values only require a handful of extra steps: log2(300,000) ≈ 18 versus log2(88,000,000) ≈ 26, so roughly 8 more comparisons per lookup.

So, no, this cannot possibly explain the ca. 1750x runtime increase!

After some more test'n'try I found that the primary key column (AAA) data structures are huge compared with the other columns.

AAA weighed in at 1.03 GB, with 725 MB just for the dictionary. The other columns combined came to 330 MB for everything.
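These numbers can be read from the monitoring views - a sketch using M_CS_ALL_COLUMNS, which reports the dictionary part of each column separately:

   -- per-column memory footprint, dictionary size listed separately
   SELECT column_name, memory_size_in_total, main_memory_size_in_dict
     FROM m_cs_all_columns
    WHERE table_name = 'IO_TEST'
    ORDER BY memory_size_in_total DESC;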

So what, you may think - and so did I. My HANA Express VM runs with 30 GB RAM; that's plenty for this table.

Nevertheless, I found that column AAA got unloaded after every single time I accessed it.
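(The unload events can be checked via the M_CS_UNLOADS monitoring view - a quick sketch:)

   -- which columns got unloaded, when, and why?
   SELECT unload_time, table_name, column_name, reason
     FROM m_cs_unloads
    WHERE table_name = 'IO_TEST'
    ORDER BY unload_time DESC;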

I tried to avoid this by setting UNLOAD PRIORITY for the table to 0, but nothing changed.

Checking the global.ini parameter [memoryobjects] unload_upper_bound showed that it was set to 800M. So, whenever memory usage reached or surpassed 800M, HANA would start unloading objects - and that included the AAA/primary key column.

After bumping this parameter up sufficiently (effectively disabling this limit on my machine) the table stayed in memory.
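For completeness, a sketch of the two knobs involved - the new limit value below is a placeholder, and on a shared or production system you would certainly not just disable the unload limit the way I did on my Express VM:

   -- keep this table in memory as long as possible (0 = unload last)
   ALTER TABLE io_test UNLOAD PRIORITY 0;

   -- raise the unload threshold in global.ini, section [memoryobjects]
   ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
     SET ('memoryobjects', 'unload_upper_bound') = '25000000000'  -- placeholder
     WITH RECONFIGURE;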

And now, the runtime of the query is much closer to the one without the primary key column:

   Statement 'select top 1000 aaa, aai, abi, _validfrom, _validto from io_test'
   successfully executed in 6 ms 745 µs (server processing time: 4 ms 632 µs)

TL;DR

Yes, the runtime was different because of the many unique values - but only because the column was so large that HANA unloaded it right after using it.

When the column is in memory, both queries perform nearly identically.

michael_piesche
Active Contributor

Kudos to you lbreddemann, this story and your solution are definitely worth another blog entry of yours 😉

Love it.

lbreddemann
Active Contributor

Thanks a lot @Michael Piesche!

For the interested, the blog post can be found at https://lbreddemann.org/one-in-a-million/.

agentry_src
Active Contributor

Great analysis, Michael P and Lars B. I enjoyed reading the detailed explanation, which I only came across by accident via LinkedIn. Your blog about Inside Track Melbourne 2020 caught my eye. Then I decided to read some of your other blogs, and right there in Feb 2020 was this detailed explanation. I am still learning about HANA and did not even think about the loading/unloading possibility.

Thanks again to both of you!

lbreddemann
Active Contributor

Thanks, Mike - I noticed there had been a new reader on my website. It's up to four now 😄

Silly thing that you didn't get a notification about this question and its comments... (did I mention I liked the old, old SDN forum better...?)

Cheers

Lars

Answers (2)

michael_piesche
Active Contributor

Have you compared this behaviour against other large tables? After all, you have almost 100 million entries! Did you try LIMIT at the end instead of TOP after the SELECT?

Did you check the records that are retrieved by each of the two calls? The only thing I can think of is that TOP 1000 with the key gives you the very first 1000 entries with the lowest IDs, whereas TOP 1000 without the key may give you a 'random' 1000 entries that are easier to fetch than those with the primary key. I couldn't find any info, though, about this assumption with TOP n and no key selected.
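(For reference, the two variants in question - column names taken from the statement quoted earlier in this thread; HANA accepts both forms:)

   -- TOP directly after SELECT
   SELECT TOP 1000 aaa, aai, abi, _validfrom, _validto FROM io_test;

   -- LIMIT at the end of the statement
   SELECT aaa, aai, abi, _validfrom, _validto FROM io_test LIMIT 1000;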


avidotto

Hi Michael, thank you for your reply.

"Did you try LIMIT at the end instead of TOP after the SELECT?"

I did, and nothing changed.

"Did you check the records that are retrieved by each of the two calls?"

For both queries the result set is the same as well.

I'm still trying to figure it out.

agentry_src
Active Contributor

Two thoughts: one is whether the table is indexed - possibly it should be. How many fields are in the table, and is it organized by rows or columns? (It should be columnar for best use of HANA's capabilities.) The second is to try adding an ORDER BY ID clause when running the query.
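(A sketch of that second suggestion, reusing the column names from the statement quoted earlier:)

   -- make the TOP 1000 deterministic with an explicit sort on the key
   SELECT TOP 1000 aaa, aai, abi, _validfrom, _validto
     FROM io_test
    ORDER BY aaa;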

As mentioned above, there are almost 100 million records, which is a bit excessive. In light of that, what is your database admin (DBA) doing about archiving records? Despite it being HANA, there are still optimization tasks that should be performed to keep the database running efficiently.

Cheers, Mike


avidotto

Hi Mike, I really do appreciate your thoughts.

For sure there are a variety of tasks to be performed to optimize HANA, and we'll keep that in mind.

The question is why the query runs quite fast once the primary key is removed from the requested result set - and what I should do to get better performance for such a statement.

Edit: I tested with ORDER BY as well.

agentry_src
Active Contributor

Hi Alex,

I suspect that including the primary key causes all the records to be analyzed before the selection takes place (which is not really very efficient). Possibly it could be improved if the table were indexed, but I am not that familiar with indexes on columnar tables (if that is the case with this table and with HANA). Without the primary key, it presumably only grabs the first 1000 records it receives. I would think the internal workings need to be reviewed with someone who knows the guts of how HANA works before a correct answer can be provided. If you could provide the table script, it might give some more insight into how it is behaving.

Regards, Mike