Skip to Content
avatar image
Former Member

Index usage in HANA

Hi

I'm learning HANA and I have a question about the usage of indexes and query plans. I use HANA SPS6 Developer Edition VM on Cloudshare for my tests.

I populated a column table with 15,000,000 of rows and I'm trying to test some queries. A simple example:

select * from "LEONID"."CDRs" where NUM_ORIG = '9054052020680';

This query returns one row, even though values in NUM_ORIG column (defined as VARCHAR(15)) doesn't have to be unique. The query consistently takes 15 ms. to finish.

Then, I create an index on NUM_ORIG column:

create index idx_CDRs_NUM_ORIG on "LEONID"."CDRs"(NUM_ORIG);

From INDEXES view, I can see that HANA chooses "INVERTED VALUE" index type for the index. Then, I run my query again, and it consistently returns in 1 ms. Apparently, the index get used because it is the only change that made between the tests. If I drop the index, the query response time returns to 15 ms.

To my great surprise, I don't find any mention of index usage in the query plan. Actually, the query plans for the query with and without indexes looks exactly the same. Is it an expected behavior? Should I monitor the index usage in other place? See the output from EXPLAIN_PLAN_TABLE below, I omitted some obvious information to make it slightly more compact:

OPERATOR_NAME,OPERATOR_DETAILS,TABLE_SIZE,OUTPUT_SIZE,SUBTREE_COST,OPERATOR_ID,PARENT_OPERATOR_ID,LEVEL,POSITION

"COLUMN SEARCH","CDRs.CDR_ID, CDRs.CC_ORIG, CDRs.AC_ORIG, CDRs.NUM_ORIG, CDRs.CC_DEST, CDRs.AC_DEST, CDRs.NUM_DEST, CDRs.STARTTIME, CDRs.ENDTIME, CDRs.DURATION (LATE MATERIALIZATION)",?,1,0.000008691042685,1,?,1,1

"  COLUMN TABLE","FILTER CONDITION: CDRs.NUM_ORIG = '9054052020680'",15000000,1,?,2,1,2,1

I tried to examine the query plan also using SAP HANA Studio, but still see the same plan with and without index and there is no mention of index usage in the plan.

Another related question: is there any best practices for indexing in HANA? I'm asking about it because in other SAP column-based RDBMS, Sybase IQ, the approach for indexing is quite different from traditional row-based RDBMSs, such as Sybase ASE, Oracle or SQL Server.

Thanks in advance

Add comment
10|10000 characters needed characters exceeded

  • Get RSS Feed

4 Answers

  • avatar image
    Former Member
    Sep 26, 2013 at 12:12 PM

    Thanks for quick responses. Unfortunately, answers provided so far don't answer my question. The concept of inverted index is clear to me. My question is why I don't see any mention of index usage in the query plan, even though the index is apparently used by my query.

    Add comment
    10|10000 characters needed characters exceeded

    • Leonid Gvirtz wrote:

      Lars, thank you very much for the information. I must admit that I'm very surprised about it, all RDBMSs that I have worked with until now has such a capability. I hope that SAP will fix the issue soon, otherwise query plans are not very useful.

      Well, lots of the information are available in the PlanViz and even if you today don't explicitly see the usage of an inverted index on column store tables, you can already clearly see where time is spend and how much data is moved between different processing steps (plan operations - POP).

      So it's not as bad as you put it and sure enough we're working on improving the options to analyze query execution.

      Leonid Gvirtz wrote:

      Allow me to ask some related questions:

      1. I'd like to understand the usage of optimizer statistics in HANA. Does HANA collects the optimizer statistics by itself? I see that CREATE STATISTICS statement does exist in HANA, so in which cases it should be used? I tried to run CREATE STATISTICS on some columns in my test table, but STATISTICS view is still empty. Is it possible to see optimizer estimations vs. actual values somehow?

      The CREATE STATISTICS command is there to provide statistic information for objects that can be used in a SQL query as a source table and that don't provide the information by themselves.

      Currently this applies to smart data data access tables as these tables not stored within SAP HANA.

      For row store and column store tables, the SAP HANA core data structures automatically provide the information required to perform query optimization (like # of distinct values, # of total records, min/max values etc.). No additional statistics collection is required for this.

      And sure enough you can check the estimation against the actual values.

      Look like you missed out on PlanViz quite a bit...

      Leonid Gvirtz wrote:

      2. What types of table joins are offered by HANA? In most of modern "traditional" databases, like SAP Sybase ASE, SAP SQL Anywhere, Oracle, SQL Server, nested-loop, hash and sort-merge joins are available, while SAP Sybase IQ offers also additional join types. I just can't find it in HANA documentation and it is not clear from query plans.

      Actually more join algorithms are available on each of those (bitmap join, merge join cartesian, semi-anti join... ) but the three you mentioned are clearly the most well known ones.

      Anyhow, most of those join algorithms aim at single thread record level execution and are not particularly cache conscious.

      In SAP HANA the join functions are implemented with massive parallel execution on in memory data structures in mind. And you're right, these algorithms have not been explained in the documentation (yet).

      Again, documentation is work in progress and for the 2 years of SAP HANA being generally available the set of documentation grew considerably.

      Looking back at the documentation of Oracle 7 or MS SQL Server 6.5 I don't find extensive explanations of join algorithms there either.

      Anyway, even with the short live span of SAP HANA there actually *is* lots of information available outside the official documentation documents.

      You may want to check the saphana.com site for this:

      JOIN: http://www.saphana.com/docs/DOC-3170

      PARALLEL JOIN: http://www.saphana.com/docs/DOC-3174

      oh and look:

      INDICES: http://www.saphana.com/docs/DOC-3169

      Leonid Gvirtz wrote:

      I can split my questions above to different discussion topics, if you like.

      Thanks in advance

      Leonid Gvirtz

      No worries about splitting the post.

      But now it's up to you to educate yourself. So have fun reading and watching the videos 😊

      Personally I find this thread has been answered and won't further engage in this one.

      For new, specific questions, feel free to open a new thread.

      - Lars

  • avatar image
    Former Member
    Sep 26, 2013 at 08:47 AM

    Hi,

    Check this,

    http://www.saphana.com/docs/DOC-3169

    http://www.saphana.com/community/blogs/blog/2011/09/15/any-attribute-as-an-index

    Syntax

    CREATE [UNIQUE] [BTREE | CPBTREE] INDEX <index_name> ON <table_name> (<column_name_order>, ...) [ASC | DESC]

    Syntax Elements


    <index_name> ::= [<schema_name>.]<identifier>  <column_name_order> ::= <column_name> [ASC | DESC] 

    Regards

    NK

    Add comment
    10|10000 characters needed characters exceeded

  • Sep 26, 2013 at 09:11 AM

    Hi Leonid,

    DO you know how Column Store works in HANA?

    If no, then you can learn more about it by attending OpenHPI sessio at:

    https://openhpi.de/course/imdb2013

    If you know then i guess you know about Attribute Vector and Value IDs.

    HANA creates Inverted Index which maps all record IDs to Value IDs.

    To monitor indexes, you can check INDEXES and INDEX_COLUMNS Views in SYS schema

    You can get information about logical index structure.

    Also check M_RS_INDEXES, you can get information for physical storage of indexes on rowstore columns.

    Regards,

    Vivek

    Add comment
    10|10000 characters needed characters exceeded

  • avatar image
    Former Member
    Dec 18, 2013 at 07:48 PM

    Hi all, this is an interesting discussion. Just wanted to jump in here with a little bit of feedback and some observations from the system I am working on.

    I did a similar test as Leonid regarding indices on a table with no keys, and had a volume of around 700,000 records.

    SELECT *

    FROM "MOLJUS02"."MARA"

    WHERE "MATNR" = 'TRM108274';

    First run time was about 75ms, but subsequent executions were around 8ms

    CREATE INDEX IDX_MOLJUS02_MARA_MATNR on "MOLJUS02"."MARA" ("MATNR");

    SELECT *

    FROM "MOLJUS02"."MARA"

    WHERE "MATNR" = 'TRM108274';

    First run time was about 75ms, but subsequent executions were around 8ms

    So what I observed is that the runtime was not improved at all after introducing an index.

    I decided to take this a step further to dive into a more detailed testing. My interest here is because in the system I am working on now, there are a lot of tables that are non SLT-delivered. These also happen not to have any primary keys defined, and therefore could not make use of the Inverted Index mentioned in this thread. In discussions with a colleague, it was indicated that introducing primary keys on the tables may benefit the runtimes of the models built on top of these.

    Aside from the bad design principle of not having keys defined to ensure data contstraints, I was trying to reach the conclusion to the answer of "do primary keys speed up queries?", as seen from a pure SELECT perspective as well as within an Analytic View (star schema) perspective.

    Purpose: Determine efficiencies gained in performance by using table with primary keys defined

    Hypothesis: Tables with primary keys will perform better with SELECT and JOIN operations due to the use of the inverted index.

    Testing methodology

    1. Create copies of two sets of tables, using SLT delivered tables as template. Remove the keys of the tables and insert the same data.
    2. Ensure that the tables are merged and inverted indices are present on the tables with keys
    3. Perform a SELECT test on table with single key choosing a single or IN list of materials and compare results
    4. Perform a SELECT test on table with multi key choosing a single column to apply filter on material and compare results
    5. Create attribute views on all 4 of the tables created in step 1, ensuring primary keys match to Key Attributes.
    6. Create a basic analytical view on a large fact that consumes the attribute views in step 5. All joins are defined as left outer with a n:1 relationship.
    7. Perform a SELECT test on the analytical view using highly aggregated columns from attribute views, forcing a join.
    8. Perform a SELECT test on the analytical view using the most detailed grain in the attribute views, forcing a join.
    9. Perform a SELECT test on the analytical view using the most detailed grain in the attribute views, forcing a join and applying a single MATERIAL filter to the dimension.
    10. Each test above is executed multiple times and average run times are shown.

    Table Sizes

    MARA (keys MANDT/MATNR) – 703,000

    KNVV (keys MADNT, KUNNR, VKORG, SPART, VTWEG) – 726,000

    CE12000 (Fact – many keys) – 1.1 billion records

    SELECT Results

    Query

    With Keys

    No Keys

    Variance from keys to non-keys

    MARA (single column key)

    6.6ms

    11ms

    +5.4ms (81%)

    KNVV (multi column key)

    4ms

    5ms

    +1ms (25%)

    KNVV (smaller query set)

    2ms

    3ms

    +1ms (33%)

    Model Results

    Dimension and query

    With Keys

    No Keys

    Variance from keys to non-keys

    MARA - aggregated

    784ms

    823ms

    +39ms (5%)

    MARA – detailed

    5,497ms

    5,427ms

    -70ms (~)

    KNVV – aggregated

    780ms

    750ms

    -30ms (3.85%)

    KNVV - detailed

    1,685ms

    1,682ms

    -2ms (~)

    MARA - filter applied 249ms 246ms -3ms (~)

    So analyzing what I have seen, the conclusion that I have come to is that the presence of a primary key and thus the inverted index actually had no effect on the runtimes when compared to the equivalent data structures that had no keys/indices defined.

    Any thoughts on these results or ways to point out how it could be done better? Ultimate goal is performance optimization, and right now this (keys and indices) doesn't look like a very effective angle for improving Analytic View performance.

    Regards,

    Justin

    Add comment
    10|10000 characters needed characters exceeded

    • Former Member Lars Breddemann

      Thanks for the details Lars, much appreciated. I think I have to take a look back at the details around joins and select from the HPI course for some more info.

      One quick question here on the following statement

      "Inverted indexes really rather help with very specific filter conditions on large tables that happen often"

      Couldn't this mean the same as applying a filter on an attribute view (dimension)? Could this not speed up the reads on that specific table before moving into the join phase of an OLAP query?

      Regards and Happy New Year!

      Justin