Skip to Content
avatar image
Former Member

HANA Star Join in a Normalized OLTP System

Hi expert,

In a HANA system primarily intended for OLTP, usually the tables are normalized because that's needed for all the insert/update/delete operations. In such an environment to create OLAP models, from my point of view there could be 2 ways. Let me elaborate using an example, the fact comes from T1, T2 and T3 and Dimension 1 comes from T4 and T5 whilst Dimension 2 comes from T6 and T7. Now there are 2 ways to model it:

1. Use a star join where the data foundation is a Calculation View with T1 join T2 join T3, dimension 1 is a Calculation View with T4 join T5 and dimension 2 is another Calculation View with T6 and T7. This is essentially a virtual star schema. The fact and dimensions are not denormalized physically in the database.

2. Instead of a star join, just use a relational-styled modelling approach. (T1 join T2 join T3) left outer join T4 LOJ T5 LOJ T6 LOJ T7.

You see here HANA is not used as an EDW where data is denormalized physically, would you please suggest what's the difference between Option 1 and Option 2? I understand that the idea of star-schema is the denormalization of tables to facilitate SELECT operations and also putting the fact table centralized, but both Option 1 and Option 2 achieve the denormalization virtually and the left outer join to dimensions making fact as real facts, so what's the difference?

Thanks and kind regards,

Add comment
10|10000 characters needed characters exceeded

  • Get RSS Feed

2 Answers

  • Best Answer
    Mar 29, 2017 at 01:48 AM

    I think by listing the two alternatives you are half-way towards an answer.

    Having long join-chains is a two-fold problem for a star-schema query:

    1. the query optimizer has to evaluate all possible join combinations to find the best order of joins. That can take a very long time.

    2. The star schema joins all dimensions directly with the fact table, so you don't get join connections to the tables of other dimensions. In other words, you cannot possibly join "outside" of the star, but always only against the fact table. For the arbitrary join connections of a normalised data model, this is not true - the optimiser may choose a join path between any of the tables, if the data model permits it.

    On top of that, the star schema forces all aggregation and grouping to happen on the fact table, while all filtering is done via selections on the dimension tables and later joining to the fact table.

    For best performance, the assumption is that you want to aggregate and group the fact table records after the filtering has been done. With the standard 2-way joins and the star schema this is not possible, as the fact table is hit by the first join operation.

    In contrast, the star join makes exactly this possible: collecting the filters on from the dimensions first and apply them together to the fact table, before aggregating and grouping it.

    In addition, this approach also allows for simply leaving out joins against dimensions that have not been filtered and not been requested in the output.

    Hope that makes it a bit clearer.

    Add comment
    10|10000 characters needed characters exceeded

  • avatar image
    Former Member
    Mar 29, 2017 at 04:44 AM

    Hi Lars,

    Many thanks for your explanation, so even if we are to build an EDW using HANA, I reckon it's not necessary to build a denormalized physical data model, is my understanding right?


    Add comment
    10|10000 characters needed characters exceeded

    • That's a fairly different topic, yes ... maybe worthwhile to have in another discussion. Meanwhile, if your question is answered and the mystery of the Star Join has been solved, how about accepting the answer and closing the question?