Index usage in HANA

Former Member · ‎09-26-2013

Hi

I'm learning HANA and I have a question about the usage of indexes and query plans. I use HANA SPS6 Developer Edition VM on Cloudshare for my tests.

I populated a column table with 15,000,000 of rows and I'm trying to test some queries. A simple example:

select * from "LEONID"."CDRs" where NUM_ORIG = '9054052020680';

This query returns one row, even though values in NUM_ORIG column (defined as VARCHAR(15)) doesn't have to be unique. The query consistently takes 15 ms. to finish.

Then, I create an index on NUM_ORIG column:

create index idx_CDRs_NUM_ORIG on "LEONID"."CDRs"(NUM_ORIG);

From INDEXES view, I can see that HANA chooses "INVERTED VALUE" index type for the index. Then, I run my query again, and it consistently returns in 1 ms. Apparently, the index get used because it is the only change that made between the tests. If I drop the index, the query response time returns to 15 ms.

To my great surprise, I don't find any mention of index usage in the query plan. Actually, the query plans for the query with and without indexes looks exactly the same. Is it an expected behavior? Should I monitor the index usage in other place? See the output from EXPLAIN_PLAN_TABLE below, I omitted some obvious information to make it slightly more compact:

OPERATOR_NAME,OPERATOR_DETAILS,TABLE_SIZE,OUTPUT_SIZE,SUBTREE_COST,OPERATOR_ID,PARENT_OPERATOR_ID,LEVEL,POSITION

"COLUMN SEARCH","CDRs.CDR_ID, CDRs.CC_ORIG, CDRs.AC_ORIG, CDRs.NUM_ORIG, CDRs.CC_DEST, CDRs.AC_DEST, CDRs.NUM_DEST, CDRs.STARTTIME, CDRs.ENDTIME, CDRs.DURATION (LATE MATERIALIZATION)",?,1,0.000008691042685,1,?,1,1

" COLUMN TABLE","FILTER CONDITION: CDRs.NUM_ORIG = '9054052020680'",15000000,1,?,2,1,2,1

I tried to examine the query plan also using SAP HANA Studio, but still see the same plan with and without index and there is no mention of index usage in the plan.

Another related question: is there any best practices for indexing in HANA? I'm asking about it because in other SAP column-based RDBMS, Sybase IQ, the approach for indexing is quite different from traditional row-based RDBMSs, such as Sybase ASE, Oracle or SQL Server.

Thanks in advance

Former Member · ‎09-26-2013

Hi,

Check this,

http://www.saphana.com/docs/DOC-3169

http://www.saphana.com/community/blogs/blog/2011/09/15/any-attribute-as-an-index

Syntax

CREATE [UNIQUE] [BTREE | CPBTREE] INDEX <index_name> ON <table_name> (<column_name_order>, ...) [ASC | DESC]

Syntax Elements

<index_name> ::= [<schema_name>.]<identifier> <column_name_order> ::= <column_name> [ASC | DESC]

Regards

NK

justin_molenaur2 · ‎12-18-2013

Hi all, this is an interesting discussion. Just wanted to jump in here with a little bit of feedback and some observations from the system I am working on.

I did a similar test as Leonid regarding indices on a table with no keys, and had a volume of around 700,000 records.

SELECT *

FROM "MOLJUS02"."MARA"

WHERE "MATNR" = 'TRM108274';

First run time was about 75ms, but subsequent executions were around 8ms

CREATE INDEX IDX_MOLJUS02_MARA_MATNR on "MOLJUS02"."MARA" ("MATNR");

SELECT *

FROM "MOLJUS02"."MARA"

WHERE "MATNR" = 'TRM108274';

First run time was about 75ms, but subsequent executions were around 8ms

So what I observed is that the runtime was not improved at all after introducing an index.

I decided to take this a step further to dive into a more detailed testing. My interest here is because in the system I am working on now, there are a lot of tables that are non SLT-delivered. These also happen not to have any primary keys defined, and therefore could not make use of the Inverted Index mentioned in this thread. In discussions with a colleague, it was indicated that introducing primary keys on the tables may benefit the runtimes of the models built on top of these.

Aside from the bad design principle of not having keys defined to ensure data contstraints, I was trying to reach the conclusion to the answer of "do primary keys speed up queries?", as seen from a pure SELECT perspective as well as within an Analytic View (star schema) perspective.

Purpose: Determine efficiencies gained in performance by using table with primary keys defined

Hypothesis: Tables with primary keys will perform better with SELECT and JOIN operations due to the use of the inverted index.

Testing methodology

Create copies of two sets of tables, using SLT delivered tables as template. Remove the keys of the tables and insert the same data.
Ensure that the tables are merged and inverted indices are present on the tables with keys
Perform a SELECT test on table with single key choosing a single or IN list of materials and compare results
Perform a SELECT test on table with multi key choosing a single column to apply filter on material and compare results
Create attribute views on all 4 of the tables created in step 1, ensuring primary keys match to Key Attributes.
Create a basic analytical view on a large fact that consumes the attribute views in step 5. All joins are defined as left outer with a n:1 relationship.
Perform a SELECT test on the analytical view using highly aggregated columns from attribute views, forcing a join.
Perform a SELECT test on the analytical view using the most detailed grain in the attribute views, forcing a join.
Perform a SELECT test on the analytical view using the most detailed grain in the attribute views, forcing a join and applying a single MATERIAL filter to the dimension.
Each test above is executed multiple times and average run times are shown.

Table Sizes

MARA (keys MANDT/MATNR) – 703,000

KNVV (keys MADNT, KUNNR, VKORG, SPART, VTWEG) – 726,000

CE12000 (Fact – many keys) – 1.1 billion records

SELECT Results

Query	With Keys	No Keys	Variance from keys to non-keys
MARA (single column key)	6.6ms	11ms	+5.4ms (81%)
KNVV (multi column key)	4ms	5ms	+1ms (25%)
KNVV (smaller query set)	2ms	3ms	+1ms (33%)

Model Results

Dimension and query	With Keys	No Keys	Variance from keys to non-keys
MARA - aggregated	784ms	823ms	+39ms (5%)
MARA – detailed	5,497ms	5,427ms	-70ms (~)
KNVV – aggregated	780ms	750ms	-30ms (3.85%)
KNVV - detailed	1,685ms	1,682ms	-2ms (~)
MARA - filter applied	249ms	246ms	-3ms (~)

So analyzing what I have seen, the conclusion that I have come to is that the presence of a primary key and thus the inverted index actually had no effect on the runtimes when compared to the equivalent data structures that had no keys/indices defined.

Any thoughts on these results or ways to point out how it could be done better? Ultimate goal is performance optimization, and right now this (keys and indices) doesn't look like a very effective angle for improving Analytic View performance.

Regards,

Justin

Former Member · ‎09-26-2013

Thanks for quick responses. Unfortunately, answers provided so far don't answer my question. The concept of inverted index is clear to me. My question is why I don't see any mention of index usage in the query plan, even though the index is apparently used by my query.

vivekbhoj · ‎09-26-2013

Hi Leonid,

DO you know how Column Store works in HANA?

If no, then you can learn more about it by attending OpenHPI sessio at:

https://openhpi.de/course/imdb2013

If you know then i guess you know about Attribute Vector and Value IDs.

HANA creates Inverted Index which maps all record IDs to Value IDs.

To monitor indexes, you can check INDEXES and INDEX_COLUMNS Views in SYS schema

You can get information about logical index structure.

Also check M_RS_INDEXES, you can get information for physical storage of indexes on rowstore columns.

Regards,

Vivek

Index usage in HANA

Accepted Solutions (0)

Answers (4)

Answers (4)

Re: Customize body mail for medium type 5 external...

Re: Combine Models in New Model

Re: CIG Master Data Full Load - Success but didn't...

Re: CIG Master Data Full Load - Success but didn't...

Group Reporting Rule Based COI - TP calculation