on 05-23-2016 9:34 AM
I am new to SAP HANA and came across several issues when handling large tables (>2 billion rows).
As far as I understood, such tables need to be partitioned when written, but apart from that one should not need to bother. I am therefore quite surprised to run into trouble constantly when querying large tables. Performance reasons for splitting such large tables aside, I am wondering whether there is really no other option.
To be more specific, I am working in the following setup:
- A table "T" with primary key "a" and roughly 2.2 billion rows
- A table "S" with several million rows
- A decomposition of "T" into "T1" and "T2" of similar size such that T = T1 union T2
On these tables the following statements fail:
1.
select T.a, S.a
from T
inner join S
on T.a=S.a
--> SAP DBTech JDBC: [2048]: column store error: search table error: [2598] column search intermediate result exceeds 2 billion rows limitation
2.
select b, count(a)
from (
select a, b from T1
union
select a, b from T2
) group by b;
--> SAP DBTech JDBC: [2048]: column store error: search table error: [34104] Intermediate result is too large in CalculationEngine.
3.
create column table MYTABLE as (
select a from T
) partition by roundrobin partitions 16;
--> [129]: transaction rolled back by an internal error: exception 10001001: Search result size limit exceeded: 2236482994
In general, it appears that large tables (>2 billion rows) can be handled whenever they are already partitioned and persisted. In contrast, when such a result is created during a query (e.g. by an outer join or a union), HANA mostly terminates with an error.
Is this behaviour intended by HANA?
Is there an explicit rule what HANA can handle?
Are there hints or other routines to overcome the described issues?
I really appreciate any clarifications.
Cheers,
Mapkyc
Please refer to OSS note 2154870 - How-To: Understanding and defining SAP HANA Limitations
Some of the scenarios shown by you are listed in this note.
Thanks,
Suresh
Thanks for pointing me to the "Limitations"-Note.
It doesn't really improve my understanding of the issue though. The note particularly says:
In very specific scenarios limits for (intermediate) result sets exist, e.g. 2147483648 (2 billion) or 4294967296 (4 billion). If a limit is hit, an error like the following is issued:
search table error: [2598] column search intermediate result exceeds 2 billion rows limitation
search table error: [2999] Docid sequence with more than 4 billion elements not supported
search table error: [2724] Olap temporary data size exceeded 31/32 bit limit
exception 10001001: Search result size limit exceeded: <result_size>
In order to prevent these errors, you have to optimize the underlying database requests (see SAP Note 2000002).
This is really vague. What are "very specific scenarios"? And what does "optimize the underlying database requests" generally mean?
Hi Houssem,
it seems that although you can create tables with more than 2 billion rows (by partitioning), HANA is in general simply not capable of working with them. This also holds for intermediate results which exceed 2 billion rows. You will find queries that work on such tables, but I couldn't find a characterisation of when they work.
Accordingly, the only solution that I found is to break up my queries into several smaller ones. Which can become very cumbersome... 😞
Cheers,
mapkyc
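For reference, the "break it up" approach for the join in statement 1 usually means pre-filtering both sides on disjoint key ranges and accumulating the partial results in a staging table, so that no single intermediate result crosses the 2 billion row limit. A hedged sketch (only "T", "S" and column "a" come from this thread; the staging table JOIN_RESULT and the range boundaries are made up for illustration and would have to be chosen from the actual value range of "a"):

```sql
-- Hypothetical staging table for the partial join results
create column table JOIN_RESULT (a bigint)
    partition by roundrobin partitions 8;

-- Each insert processes one disjoint key range of "a";
-- repeat with adjusted boundaries until the full range is covered.
insert into JOIN_RESULT
    select T.a
    from T inner join S on T.a = S.a
    where T.a >= 0 and T.a < 1000000000;

insert into JOIN_RESULT
    select T.a
    from T inner join S on T.a = S.a
    where T.a >= 1000000000 and T.a < 2000000000;
```

This only works when the filter column allows carving the data into ranges of manageable size, which is part of why the approach quickly becomes cumbersome.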
The limitations around 2 billion records per table partition as well as the limitations around the intermediate result set sizes have been documented and explained quite extensively.
When you write that you don't understand what it means to optimise the database requests in order to avoid data-processing situations where these limits are hit, what exactly is it that you don't get?
Looking at your queries, I wonder whether these are just general "try out" queries or if there is an actual application use linked to them.
For example:
select b, count(a)
from (
select a, b from T1
union
select a, b from T2
) group by b;
is a pretty odd statement in terms of semantics: you want a count of a, grouped by b, over the distinct result of the union.
This could be easily rewritten to
SELECT b, sum(cnt_a)
FROM (
    select b, count(a) cnt_a from T1 group by b
    UNION ALL
    select b, count(a) cnt_a from T2 group by b
)
GROUP BY b;
Which - as long as b is not an (almost) unique key with 2 billion distinct values, but something that actually leads to some record aggregation - would avoid the otherwise necessary large intermediate result set required for the original UNION.
Having written that now, I'm eager to learn about the many use cases where you have to break up your queries in a cumbersome fashion.
After nearly 6 years of working with SAP HANA, most of the cases I've seen where these limitations come up were either test scenarios (like this one) or naive data model implementations.
Hello,
I have seen this issue multiple times.
Truncating the table will solve the issue.
Hi,
as you already mentioned, these tables need to be partitioned on writing. Could you please try to create the structure of the table, partition it, and re-load it with the data?
Regards
-Seb.
Thanks for your message, Seb.
Not sure what you mean. Statements 1 and 2 do not refer to writing the table, so I guess you refer to statement 3.
When I first create a partitioned structure of the table, I can fill it with more than 2 billion rows - but only if this is done in several steps. I.e. the following works:
create column table MYTAB (
...
) partition by roundrobin partitions 8;
insert into MYTAB (
select * from T1
);
insert into MYTAB (
select * from T2
);
But not:
create column table MYTAB (
...
) partition by roundrobin partitions 8;
insert into MYTAB (
select * from T
);
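When no T1/T2 decomposition exists, the same two-step idea can be applied by chunking the single insert on key ranges instead. A hedged sketch, assuming "a" is a numeric key of "T" (the column type and the range boundaries are illustrative, not from this thread):

```sql
-- Hypothetical target table; columns simplified for illustration
create column table MYTAB (
a bigint
) partition by roundrobin partitions 8;

-- Each insert reads only a slice of T, keeping every
-- intermediate result below the 2 billion row limit.
insert into MYTAB select a from T where a < 1000000000;
insert into MYTAB select a from T where a >= 1000000000 and a < 2000000000;
insert into MYTAB select a from T where a >= 2000000000;
```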
mapkyc