0

# HANA Star Join in a Normalized OLTP System

Mar 28, 2017 at 10:47 PM

211

Former Member

Hi expert,

In a HANA system primarily intended for OLTP, usually the tables are normalized because that's needed for all the insert/update/delete operations. In such an environment to create OLAP models, from my point of view there could be 2 ways. Let me elaborate using an example, the fact comes from T1, T2 and T3 and Dimension 1 comes from T4 and T5 whilst Dimension 2 comes from T6 and T7. Now there are 2 ways to model it:

1. Use a star join where the data foundation is a Calculation View with T1 join T2 join T3, dimension 1 is a Calculation View with T4 join T5 and dimension 2 is another Calculation View with T6 and T7. This is essentially a virtual star schema. The fact and dimensions are not denormalized physically in the database.

2. Instead of a star join, just use a relational-styled modelling approach. (T1 join T2 join T3) left outer join T4 LOJ T5 LOJ T6 LOJ T7.

You see here HANA is not used as an EDW where data is denormalized physically, would you please suggest what's the difference between Option 1 and Option 2? I understand that the idea of star-schema is the denormalization of tables to facilitate SELECT operations and also putting the fact table centralized, but both Option 1 and Option 2 achieve the denormalization virtually and the left outer join to dimensions making fact as real facts, so what's the difference?

Thanks and kind regards,
James

Lars Breddemann
Mar 29, 2017 at 01:48 AM
0

I think by listing the two alternatives you are half-way towards an answer.

Having long join-chains is a two-fold problem for a star-schema query:

1. the query optimizer has to evaluate all possible join combinations to find the best order of joins. That can take a very long time.

2. The star schema joins all dimensions directly with the fact table, so you don't get join connections to the tables of other dimensions. In other words, you cannot possibly join "outside" of the star, but always only against the fact table. For the arbitrary join connections of a normalised data model, this is not true - the optimiser may choose a join path between any of the tables, if the data model permits it.

On top of that, the star schema forces all aggregation and grouping to happen on the fact table, while all filtering is done via selections on the dimension tables and later joining to the fact table.

For best performance, the assumption is that you want to aggregate and group the fact table records after the filtering has been done. With the standard 2-way joins and the star schema this is not possible, as the fact table is hit by the first join operation.

In contrast, the star join makes exactly this possible: collecting the filters on from the dimensions first and apply them together to the fact table, before aggregating and grouping it.

In addition, this approach also allows for simply leaving out joins against dimensions that have not been filtered and not been requested in the output.

Hope that makes it a bit clearer.

Share
Former Member Mar 29, 2017 at 04:44 AM
0

Hi Lars,

Many thanks for your explanation, so even if we are to build an EDW using HANA, I reckon it's not necessary to build a denormalized physical data model, is my understanding right?

Regards,
James

Show 3 Share

As usual, the answer is "it depends" as there always can be reasons to still go for a materialised approach. However, generally, I'd say: yep, that's it.

Just look at what BW 4/ON HANA does: most of the models are now based on the DSOs and not on InfoCubes with fact tables.

Former Member

to digress a bit...I'm looking forward to having an HANA equivalent of Oracle's CDC package so that we can have a delta management framework to rely upon, the ETL in HANA in inevitable after all for a materialised approach....

That's a fairly different topic, yes ... maybe worthwhile to have in another discussion. Meanwhile, if your question is answered and the mystery of the Star Join has been solved, how about accepting the answer and closing the question?