cancel
Showing results for 
Search instead for 
Did you mean: 

Index usage in HANA

Former Member
0 Kudos

Hi

I'm learning HANA and I have a question about the usage of indexes and query plans. I use HANA SPS6 Developer Edition VM on Cloudshare for my tests.

I populated a column table with 15,000,000 of rows and I'm trying to test some queries. A simple example:

select * from "LEONID"."CDRs" where NUM_ORIG = '9054052020680';

This query returns one row, even though values in NUM_ORIG column (defined as VARCHAR(15)) doesn't have to be unique. The query consistently takes 15 ms. to finish.

Then, I create an index on NUM_ORIG column:

create index idx_CDRs_NUM_ORIG on "LEONID"."CDRs"(NUM_ORIG);

From INDEXES view, I can see that HANA chooses "INVERTED VALUE" index type for the index. Then, I run my query again, and it consistently returns in 1 ms. Apparently, the index get used because it is the only change that made between the tests. If I drop the index, the query response time returns to 15 ms.

To my great surprise, I don't find any mention of index usage in the query plan. Actually, the query plans for the query with and without indexes looks exactly the same. Is it an expected behavior? Should I monitor the index usage in other place? See the output from EXPLAIN_PLAN_TABLE below, I omitted some obvious information to make it slightly more compact:

OPERATOR_NAME,OPERATOR_DETAILS,TABLE_SIZE,OUTPUT_SIZE,SUBTREE_COST,OPERATOR_ID,PARENT_OPERATOR_ID,LEVEL,POSITION

"COLUMN SEARCH","CDRs.CDR_ID, CDRs.CC_ORIG, CDRs.AC_ORIG, CDRs.NUM_ORIG, CDRs.CC_DEST, CDRs.AC_DEST, CDRs.NUM_DEST, CDRs.STARTTIME, CDRs.ENDTIME, CDRs.DURATION (LATE MATERIALIZATION)",?,1,0.000008691042685,1,?,1,1

"  COLUMN TABLE","FILTER CONDITION: CDRs.NUM_ORIG = '9054052020680'",15000000,1,?,2,1,2,1

I tried to examine the query plan also using SAP HANA Studio, but still see the same plan with and without index and there is no mention of index usage in the plan.

Another related question: is there any best practices for indexing in HANA? I'm asking about it because in other SAP column-based RDBMS, Sybase IQ, the approach for indexing is quite different from traditional row-based RDBMSs, such as Sybase ASE, Oracle or SQL Server.

Thanks in advance

Accepted Solutions (0)

Answers (4)

Answers (4)

Former Member

Hi,

Check this,

http://www.saphana.com/docs/DOC-3169

http://www.saphana.com/community/blogs/blog/2011/09/15/any-attribute-as-an-index

Syntax

CREATE [UNIQUE] [BTREE | CPBTREE] INDEX <index_name> ON <table_name> (<column_name_order>, ...) [ASC | DESC]

Syntax Elements


<index_name> ::= [<schema_name>.]<identifier>  <column_name_order> ::= <column_name> [ASC | DESC] 

Regards

NK

justin_molenaur2
Contributor
0 Kudos

Hi all, this is an interesting discussion. Just wanted to jump in here with a little bit of feedback and some observations from the system I am working on.

I did a similar test as Leonid regarding indices on a table with no keys, and had a volume of around 700,000 records.

SELECT *

FROM "MOLJUS02"."MARA"

WHERE "MATNR" = 'TRM108274';

First run time was about 75ms, but subsequent executions were around 8ms

CREATE INDEX IDX_MOLJUS02_MARA_MATNR on "MOLJUS02"."MARA" ("MATNR");

SELECT *

FROM "MOLJUS02"."MARA"

WHERE "MATNR" = 'TRM108274';

First run time was about 75ms, but subsequent executions were around 8ms

So what I observed is that the runtime was not improved at all after introducing an index.

I decided to take this a step further to dive into a more detailed testing. My interest here is because in the system I am working on now, there are a lot of tables that are non SLT-delivered. These also happen not to have any primary keys defined, and therefore could not make use of the Inverted Index mentioned in this thread. In discussions with a colleague, it was indicated that introducing primary keys on the tables may benefit the runtimes of the models built on top of these.

Aside from the bad design principle of not having keys defined to ensure data contstraints, I was trying to reach the conclusion to the answer of "do primary keys speed up queries?", as seen from a pure SELECT perspective as well as within an Analytic View (star schema) perspective.

Purpose: Determine efficiencies gained in performance by using table with primary keys defined

Hypothesis: Tables with primary keys will perform better with SELECT and JOIN operations due to the use of the inverted index.

Testing methodology

  1. Create copies of two sets of tables, using SLT delivered tables as template. Remove the keys of the tables and insert the same data.
  2. Ensure that the tables are merged and inverted indices are present on the tables with keys
  3. Perform a SELECT test on table with single key choosing a single or IN list of materials and compare results
  4. Perform a SELECT test on table with multi key choosing a single column to apply filter on material and compare results
  5. Create attribute views on all 4 of the tables created in step 1, ensuring primary keys match to Key Attributes.
  6. Create a basic analytical view on a large fact that consumes the attribute views in step 5. All joins are defined as left outer with a n:1 relationship.
  7. Perform a SELECT test on the analytical view using highly aggregated columns from attribute views, forcing a join.
  8. Perform a SELECT test on the analytical view using the most detailed grain in the attribute views, forcing a join.
  9. Perform a SELECT test on the analytical view using the most detailed grain in the attribute views, forcing a join and applying a single MATERIAL filter to the dimension.
  10. Each test above is executed multiple times and average run times are shown.

Table Sizes

MARA (keys MANDT/MATNR) – 703,000

KNVV (keys MADNT, KUNNR, VKORG, SPART, VTWEG) – 726,000

CE12000 (Fact – many keys) – 1.1 billion records

SELECT Results

Query

With Keys

No Keys

Variance from keys to non-keys

MARA (single column key)

6.6ms

11ms

+5.4ms (81%)

KNVV (multi column key)

4ms

5ms

+1ms (25%)

KNVV (smaller query set)

2ms

3ms

+1ms (33%)

Model Results

Dimension and query

With Keys

No Keys

Variance from keys to non-keys

MARA - aggregated

784ms

823ms

+39ms (5%)

MARA – detailed

5,497ms

5,427ms

-70ms (~)

KNVV – aggregated

780ms

750ms

-30ms (3.85%)

KNVV - detailed

1,685ms

1,682ms

-2ms (~)

MARA - filter applied249ms246ms-3ms (~)

So analyzing what I have seen, the conclusion that I have come to is that the presence of a primary key and thus the inverted index actually had no effect on the runtimes when compared to the equivalent data structures that had no keys/indices defined.

Any thoughts on these results or ways to point out how it could be done better? Ultimate goal is performance optimization, and right now this (keys and indices) doesn't look like a very effective angle for improving Analytic View performance.

Regards,

Justin

Former Member
0 Kudos

Hey Justin,

Unfortunately the use of indexes hasn't been well documented. Hoping and team will help with this in 2014. I don't have time for a detailed analysis right now, but here's what I found:

- Indexes will almost never help with Analytic Views because they run in the OLAP engine and are used for aggregations.

- Indexes can help with SQL, Attribute Views and Calc Views, in certain situations. This is situations where the default index on a column isn't well sorted to return the information you wanted.

The latter case is what's happening in 's case. If you do something similar and create a large table with individual document IDs, then an INDEX will help with SELECT * performance.

It can equally help where you have joins between two tables and are looking to find individual records. I have never found indexes to help with OLAP-style aggregations.

Hope this helps you.


John

lbreddemann
Active Contributor
0 Kudos

Hey there,

just real quickly...

- I am planning to put out something on indexes next year. Currently there's just no time for me to prepare this properly.

- It actually doesn't matter which engine accesses the table, as the engine doesn't determine the index usage. There is a thin functional layer around the actual columns (attribute engine - ok, it is something called 'engine'...) that takes requests for column data and decides per column, per request whether or not an index should be used for this access.

This is also the reason why the explain plan cannot tell you about upfront: the SQL layer doesn't know about the column store indexes.

Very roughly, indexes on column store tables can be useful, when

a) the table is really big and scanning takes a lot of time

b) the number of records we are looking for is relatively small

However, inverted indexes don't help with sorting at all - due to the necessary mix-up of at least one main and one delta store and the fact that the sort order of the whole table can change during compression optimization.

As mentioned above, I've something half ready on this topic and will try to push it out next year.

Cheers, Lars

lbreddemann
Active Contributor
0 Kudos

Hey Justin,

the thing here is: OLAP queries are the anti-use case for inverted indexes.

For OLAP queries, we want aggregation, volume scanning and dimension joining.

Inverted indexes really rather help with very specific filter conditions on large tables that happen often.

For the primary key definitions: don't think primary keys as performance enhancements. They are not. Primary key carry semantic meaning in your data model, they provide identity to each record.

- Lars

justin_molenaur2
Contributor
0 Kudos

Thanks for the details Lars, much appreciated. I think I have to take a look back at the details around joins and select from the HPI course for some more info.

One quick question here on the following statement

"Inverted indexes really rather help with very specific filter conditions on large tables that happen often"

Couldn't this mean the same as applying a filter on an attribute view (dimension)? Could this not speed up the reads on that specific table before moving into the join phase of an OLAP query?

Regards and Happy New Year!

Justin

Former Member
0 Kudos

Thanks for quick responses. Unfortunately, answers provided so far don't answer my question. The concept of inverted index is clear to me. My question is why I don't see any mention of index usage in the query plan, even though the index is apparently used by my query.

vivekbhoj
Active Contributor
0 Kudos

I guess Lars or Ravindra can help you better regarding no mention of index usage in the query plan

lbreddemann
Active Contributor
0 Kudos

Hi Leonid,

the usage of inverted indexes is currently not exposed in the explain plan or the plan visualization.

Developments on this are on the way though...

cheers, Lars

Former Member
0 Kudos

Lars, thank you very much for the information. I must admit that I'm very surprised about it, all RDBMSs that I have worked with until now has such a capability. I hope that SAP will fix the issue soon, otherwise query plans are not very useful.

Allow me to ask some related questions:

1. I'd like to understand the usage of optimizer statistics in HANA. Does HANA collects the optimizer statistics by itself? I see that CREATE STATISTICS statement does exist in HANA, so in which cases it should be used? I tried to run CREATE STATISTICS on some columns in my test table, but STATISTICS view is still empty. Is it possible to see optimizer estimations vs. actual values somehow?

2. What types of table joins are offered by HANA? In most of modern "traditional" databases, like SAP Sybase ASE, SAP SQL Anywhere, Oracle, SQL Server, nested-loop, hash and sort-merge joins are available, while SAP Sybase IQ offers also additional join types. I just can't find it in HANA documentation and it is not clear from query plans.

I can split my questions above to different discussion topics, if you like.

Thanks in advance

Leonid Gvirtz

lbreddemann
Active Contributor
0 Kudos

Leonid Gvirtz wrote:

Lars, thank you very much for the information. I must admit that I'm very surprised about it, all RDBMSs that I have worked with until now has such a capability. I hope that SAP will fix the issue soon, otherwise query plans are not very useful.

Well, lots of the information are available in the PlanViz and even if you today don't explicitly see the usage of an inverted index on column store tables, you can already clearly see where time is spend and how much data is moved between different processing steps (plan operations - POP).

So it's not as bad as you put it and sure enough we're working on improving the options to analyze query execution.

Leonid Gvirtz wrote:

Allow me to ask some related questions:

1. I'd like to understand the usage of optimizer statistics in HANA. Does HANA collects the optimizer statistics by itself? I see that CREATE STATISTICS statement does exist in HANA, so in which cases it should be used? I tried to run CREATE STATISTICS on some columns in my test table, but STATISTICS view is still empty. Is it possible to see optimizer estimations vs. actual values somehow?

The CREATE STATISTICS command is there to provide statistic information for objects that can be used in a SQL query as a source table and that don't provide the information by themselves.

Currently this applies to smart data data access tables as these tables not stored within SAP HANA.

For row store and column store tables, the SAP HANA core data structures automatically provide the information required to perform query optimization (like # of distinct values, # of total records, min/max values etc.). No additional statistics collection is required for this.

And sure enough you can check the estimation against the actual values.

Look like you missed out on PlanViz quite a bit...

Leonid Gvirtz wrote:

2. What types of table joins are offered by HANA? In most of modern "traditional" databases, like SAP Sybase ASE, SAP SQL Anywhere, Oracle, SQL Server, nested-loop, hash and sort-merge joins are available, while SAP Sybase IQ offers also additional join types. I just can't find it in HANA documentation and it is not clear from query plans.

Actually more join algorithms are available on each of those (bitmap join, merge join cartesian, semi-anti join... ) but the three you mentioned are clearly the most well known ones.

Anyhow, most of those join algorithms aim at single thread record level execution and are not particularly cache conscious.

In SAP HANA the join functions are implemented with massive parallel execution on in memory data structures in mind. And you're right, these algorithms have not been explained in the documentation (yet).

Again, documentation is work in progress and for the 2 years of SAP HANA being generally available the set of documentation grew considerably.

Looking back at the documentation of Oracle 7 or MS SQL Server 6.5 I don't find extensive explanations of join algorithms there either.

Anyway, even with the short live span of SAP HANA there actually *is* lots of information available outside the official documentation documents.

You may want to check the saphana.com site for this:

JOIN: http://www.saphana.com/docs/DOC-3170

PARALLEL JOIN: http://www.saphana.com/docs/DOC-3174

oh and look:

INDICES: http://www.saphana.com/docs/DOC-3169

Leonid Gvirtz wrote:

I can split my questions above to different discussion topics, if you like.

Thanks in advance

Leonid Gvirtz

No worries about splitting the post.

But now it's up to you to educate yourself. So have fun reading and watching the videos

Personally I find this thread has been answered and won't further engage in this one.

For new, specific questions, feel free to open a new thread.

- Lars

vivekbhoj
Active Contributor
0 Kudos

Hi Leonid,

DO you know how Column Store works in HANA?

If no, then you can learn more about it by attending OpenHPI sessio at:

https://openhpi.de/course/imdb2013

If you know then i guess you know about Attribute Vector and Value IDs.

HANA creates Inverted Index which maps all record IDs to Value IDs.

To monitor indexes, you can check INDEXES and INDEX_COLUMNS Views in SYS schema

You can get information about logical index structure.

Also check M_RS_INDEXES, you can get information for physical storage of indexes on rowstore columns.

Regards,

Vivek