Skip to Content
author's profile photo Former Member
Former Member

Analytic View Modeling: Using attribute on fact instead of attribute view?

Hi all, this discussion came up in this thread and I thought that it might deserve it's own thread. The entire idea revolves around attempting to expose the primary key of the attribute view from the fact table (data foundation) rather than the attribute view itself. Is this possible and does anyone have any practical experience with this? If yes, what is the behavior like in the front end tools?

Add two instances of the same field, one for use in join only

Connect the fact to the attribute view join field.

Hide the attribute view field, and only show the fact table field

Essentially the aim is to leverage the value of the referential join in an analytic view, only executing the join when it's absolutely needed. By definition, the referential join behaves as a left outer join in case no fields from the right table are selected, otherwise it's an inner join.

In most traditional DW implementations, there is the use of a surrogate key in order to map the attribute to the fact - therefore to get the natural key one would HAVE to execute the join. For example, to see the MATNR natural key on a given fact table, one would have to execute the join on a surrogate key num and retrieve the natural key, SQL like below.

SELECT B.MATNR, A.SUM(SALES) FROM

FACT_TABLE A INNER JOIN MATERIAL_TABLE B

ON A.MAT_KEY_NUM = B.MAT_KEY_NUM

Now in HANA, we are making all joins on a natural key - so on the fact table there is no surrogate key for MATNR, only the actual natural key values. So keeping the above thread and discussion in mind, the thought is that the referential join could be leverage here for scenarios where queries are being executed that only pull the natural key of a given dimension and no related attributes (from the right table). I see this as the normal execution path for a huge portion of use cases, where a user only wants to execute by MATNR (with no text or other attributes) and execute a SUM operation on the related fact measures. Here it seems that modeling the natural key on the fact to be exposed instead of the attribute would be extremely advantageous.

If fields from the right table (attribute view) are NOT selected, then the SQL would be similar to this (logical only - I'm sure HANA builds it quite differently 😉 )

SELECT MATNR, SUM(SALES) FROM

FACT_TABLE

If fields from the right table (attribute view) ARE selected, then the SQL would be similar to this (logical only - I'm sure HANA builds it quite differently 😉 )

SELECT B.MATNR, A.SUM(SALES) FROM

FACT_TABLE A INNER JOIN MATERIAL_TABLE B

ON A.MAT_KEY_NUM = B.MAT_KEY_NUM

So I am imagine scenarios where a user is selecting 5 natural keys (with no other attributes) and performing a SUM operation on a fact, in this case it would be possible to eliminate ALL joins and keep all operations within the fact itself.

Any thoughts?

Thanks,

Justin

Capture.JPG (24.8 kB)
Capture2.JPG (27.9 kB)
Capture4.JPG (21.7 kB)
Capture5.JPG (22.9 kB)
Add a comment
10|10000 characters needed characters exceeded

Assigned Tags

Related questions

2 Answers

  • Best Answer
    author's profile photo Former Member
    Former Member
    Posted on Aug 16, 2013 at 02:04 PM

    After some further investigation, it looks like the answer is pretty clear. To get this behavior to work (only executing joins when a non-joined attribute column is used), simply use a left outer join. The plan optimizer seems to be aware that the joined column has the natural key and no join is required when just using that column in a left outer.

    There should be no need to create the additional aliased fields for use in referential joins as discussed above except for the cases when an inner join is truly required (for referential join). Typically, fact to dimension joins (with SAP data) are going to be a left outer anyhow - not every row in COPA (for example) will have a Plant or Material populated - so an inner would 'drop' values from the measures where these are null - which is undesirable behavior in most cases.

    Anyhow, to the details on how I examined both approaches:

    - Created an analytical view with two attached attribute views

    - One was joined N:1 with a referential join

    - One was joined N:1 with a left outer join

    - Create statements to test 4 scenarios (use joined column, use attribute column for both the Left and Referential joins)

    -- SELECT only joined column from a Left Outer
    SELECT AT_BUSINESS_UNIT_MATNR, SUM(VVREV) FROM "_SYS_BIC"."copa poc/AN_COPA/olap"
    GROUP BY AT_BUSINESS_UNIT_MATNR

    No join to attribute view is executed, self only

    -- SELECT only joined column from a Referential
    SELECT MATNR, SUM(VVREV) FROM "_SYS_BIC"."copa-poc/AN_COPA/olap"
    GROUP BY MATNR

    Join is executed

    -- SELECT joined column + attribute from a Left Outer
    SELECT AT_BUSINESS_UNIT_MATNR, BUSINESS_UNIT, SUM(VVREV) FROM "_SYS_BIC"."copa-poc/AN_COPA/olap"
    GROUP BY AT_BUSINESS_UNIT_MATNR, BUSINESS_UNIT

    Join is executed for additional attribute on attribute view

    -- SELECT joined column + attribute from a Referential
    SELECT MATNR, MATKL, SUM(VVREV) FROM "_SYS_BIC"."copa-poc/AN_COPA/olap"
    GROUP BY MATNR, MATKL

    Two joins are executed for each column from attribute view

    So from here, it's pretty clear that the avoidance of joins when only the key attribute column is selected is achieved by user the left outer join. This looks to be a huge performance consideration within HANA as compared to typical DW architectures that rely on surrogate keys.

    Thanks,

    Justin


    Capture.JPG (52.2 kB)
    Capture1.JPG (37.4 kB)
    Capture2.JPG (42.6 kB)
    Capture3.JPG (51.7 kB)
    Capture4.JPG (65.6 kB)
    Add a comment
    10|10000 characters needed characters exceeded

    • Former Member

      Just want to add that this came up in a recent conversation with a colleague and this was a huge selling point in model performance. The ability to hook a dimension into a fact, but only execute the join when the non-joined column is requested makes a whole lot of sense from a performance perspective.

      LO is usually preferable anyhow from a functionality perspective as well.

      Regards,

      Justin

  • Posted on Aug 13, 2013 at 02:24 PM

    Hi Justin,

    We have implemented following 2 approaches:

    1) Use the Attribute views and perform the joins as mentioned above.

    2) Don't use the attribute views, but perform the joins in the Data foundation itself. In this approach, you can join the data foundation "fact table" with the attribute view tables (attribute and that to the text table).

    The second approach is the typical approach used by SAP and you can see it when you import BW DSO to HANA to generate the Analytic view. Please note that you may not achieve the "re-usability" as provided by the Attribute view architecture.

    In both the approaches, the performance is optimized by HANA and the join path is defined by the HANA optimizer.

    The Front end tool used is a third party tool which interfaces with HANA over JDBC / ODBC interface with SQL statements fired against the HANA Information models.

    So far, the performance has been quite good for both the options.

    Regards,

    Ravi

    Add a comment
    10|10000 characters needed characters exceeded

    • Former Member

      Just to clarify your approaches, comments in <brackets>

      1) Use the Attribute views and perform the joins as mentioned above.

      <this is the standard HANA approach for analytic views>

      2) Don't use the attribute views, but perform the joins in the Data foundation itself. In this approach, you can join the data foundation "fact table" with the attribute view tables (attribute and that to the text table).

      <I assume you don't mean "don't use the attribute view", rather "use the natural key from the fact RATHER than the attribute view", correct?>

      The Front end tool used is a third party tool which interfaces with HANA over JDBC / ODBC interface with SQL statements fired against the HANA Information models.

      <So when you access a given model from the tool, what MATNR would be presented? Both or only one? If only one, which one is shown? Or does your tool only fire SQL statements without any 'native' UI?>

      Thanks,

      Justin

Before answering

You should only submit an answer when you are proposing a solution to the poster's problem. If you want the poster to clarify the question or provide more information, please leave a comment instead, requesting additional details. When answering, please include specifics, such as step-by-step instructions, context for the solution, and links to useful resources. Also, please make sure that you answer complies with our Rules of Engagement.
You must be Logged in to submit an answer.

Up to 10 attachments (including images) can be used with a maximum of 1.0 MB each and 10.5 MB total.