Skip to Content
Mar 10, 2017 at 03:48 PM

SAP HANA - Optimize a join [two columns store tables, OLTP use case]



I have an issue with inefficient execution plan of a SELECT joining two big tables.

  • Expected execution (CPU) time under 1ms (returns 1 row, no aggregation).
  • Actual time around 8ms => consumes 50% of my CPUs (it's OLTP use case => this query gets executed many times by many application threads).
select B.MY_VALUE
from A
join B on B.PK_COL_1 = A.COL1 and B.PK_COL_2 = ? and B.PK_COL_3 = A.COL2
group by B.MY_VALUE

Setup / context

  • 40 millions rows in A
  • 70 millions rows in B
  • Unique values in A.INDEXED_COLUMN + index (non-unique) upon the column.
  • Join on all three B's PK columns => returns 1 single row
  • There are just a few distinct values both in B.NON_SELECTIVE_COLUMN and B.PK_COL_2 => highly inefficient to search by these two columns.

Expected execution plan

  • Use A.INDEXED_COLUMN to find the ONE row in A table.
  • Nested loop - Use the values from table A to find the ONE row in table B by it's primary key.
  • As it processes and returns ONE single row, the execution should take much less than 1ms (true for ORACLE, MSSQL and HANA in ROW STORE).

Actual execution plan

  • The execution takes 8ms when both tables are columnar (AVG time taken from statistics in the HOST_SQL_PLAN_CACHE system view).
  • Actual execution plan is attached: query-planplv.txt (I had to rename it - .plv files are not allowed).
  • Hana Studio does not show the step which takes most time. I can find it manually in the XML but I don't understand the crypto-language used there :)
  • In the logical plan view, it seems that Hana decides to scan the B table for the values/filters given for the PK_COL_2 and NON_SELECTIVE_COLUMN columns - despite Hana expects this to yield over 16 millions of matching rows. Could this be the thing that takes the long time?

I have tried

  • Dozens of hints and their combinations, but with no luck.
  • I spent two days by searching and reading Hana-performance related blogs, SCN and documentation (Troubleshooting and Performance Analysis Guide and SAP Notes).

My questions

  1. Could you please help me to improve this query?
  2. What is the efficient execution plan for this query? I have already learned that there are many differences in the column store approach - so while a nested loop would be the best strategy in traditional databases, it may not be the case with column store...?
  3. Could you please help me to understand the actual execution plan: what is the time-consuming thing? Some hints to understand the meaning of the XML?
  4. All performance-related stuff I found was related to OLAP performance only. Are there any resources related to OLTP performance tuning?

Thanks, Ondrej


overview.png (13.5 kB)
plan.png (13.1 kB)
query-planplv.txt (39.9 kB)
query-planplv.txt (39.9 kB)