
HANA calculation view with heavy joins and calculated columns (performance)

May 26, 2017 at 08:33 PM

Former Member

Hi experts,

Could you please take a look at the problem described below?

Calculation view description (please also see an image):

Projection 6 gets its data from a table that contains roughly 700,000,000 records and grows by roughly 100,000,000 a month. The same applies to Projection 7 – it reads from a different table, but that table is 1:1 with the table feeding Projection 6. Projection 8 reads from an even larger table. All tables are SAP standard tables. The joins that come after the union only add some master data and texts; they don’t affect performance drastically and can be ignored.

All the necessary filters are applied to decrease the data volumes. Users have an option to filter on calendar month period (for example, start: 201701, end: 201704).

Problem:

All the tables feeding projections 6, 7 and 8 are very different in structure. That’s why I can’t use a union – joins are required. Aggregations are also required before Union 1 to combine the data from Projection 4 and Join 9. With such data volumes we’re experiencing performance issues: slow runtimes and high memory and CPU consumption. I don’t really know the reason behind the performance issue, but I have several assumptions:

1) Join 9 contains a calculation that derives a date (VARCHAR of length 8) from a timestamp (a DECIMAL field of length 14 that stores date and time, for example 20170526112351). I need it converted to a date so I can aggregate the data before doing further joins – that improves performance. I can also perform this calculation in Projection 6 (where the timestamp comes from), but it doesn’t really change the situation – the calculation increases memory consumption either way. The formula looks like this: leftstr(string("ACTIVITY_DATE"),8).
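To make the two conversion strategies concrete, here is a small Python sketch (Python stands in for the HANA expression language; the HANA-side equivalent of the arithmetic variant is an assumption and should be verified on your revision). The point is that the time portion of a YYYYMMDDHHMMSS decimal can be dropped either by string slicing, as the formula above does, or by pure integer arithmetic, which avoids a per-row string conversion:

```python
def date_via_string(ts: int) -> str:
    # Equivalent of leftstr(string("ACTIVITY_DATE"), 8):
    # convert to string, keep the first 8 characters (YYYYMMDD).
    return str(ts)[:8]

def date_via_arithmetic(ts: int) -> int:
    # Arithmetic alternative: drop the 6 time digits (HHMMSS) by
    # integer division. In a HANA expression this would be something
    # like int("ACTIVITY_DATE" / 1000000) -- an assumption, check the
    # exact decimal/int conversion behavior on your system.
    return ts // 1_000_000

ts = 20170526112351
print(date_via_string(ts))      # '20170526'
print(date_via_arithmetic(ts))  # 20170526
```

Both variants identify the same calendar day; whether the arithmetic form is actually cheaper in the calculation view engine would need to be confirmed with Visualize Plan.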

2) Could the reason for the performance issues be not this conversion, but rather the fact that the tables behind projections 6, 7 and 8 are partitioned, and that this somehow affects the performance of the joins?

Questions:

If assumption 1) is correct, is there a more efficient way of converting this timestamp (DECIMAL) to a date (VARCHAR)?

Can assumption 2) be correct at all?

Maybe there’s some other potential reason that I’m missing here?

capture.jpg (54.2 kB)

2 Answers

Former Member May 31, 2017 at 04:04 PM

Hi Vlad,

What is your HANA system version?

Have you tried running Visualize Plan for this view? What is the current runtime of the view?

1) Assumption 1 can be verified by creating the logic as a 'Generated Column' at table level and seeing whether performance changes.

2) Whether partitioning is causing the performance issue can be verified by checking the network transfer time in the 'Visualize Plan' of the view.

How and where are the user filters created in the view (Variable or Input Parameter)? Is it possible to apply this filter at the lowest projection as an input parameter (maybe at Projection 6)?

Regards,

Venkat

Former Member

Hi,

Thank you for the reply.

HANA database release is 1.00.122.07.

I have tried Visualize Plan. The view can run for several minutes.

1) Could you please give more background on 'Generated Columns' (some useful links would be much appreciated)? Is it something I can create for standard SAP CRM tables? And does it really make sense to create them for tables with roughly 700,000,000 records?

2) As for the partitions, in Visualize Plan I can see that most of the runtime is spent in the joins across partitions. That's why I have this suspicion.

My user filter criteria are already applied at the lowest level (Projection 6). It's an input parameter with a stored procedure: the user enters start and end calendar months like '201703' and '201704', and the stored procedure converts them into decimal values similar to timestamps, like 201703000000 and 201704235959. So at the lowest level the filter looks like this: timestamp >= 201703000000 and timestamp <= 201704235959.
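As a quick illustration of that conversion, here is a minimal Python sketch of what the stored procedure does as described above (function and variable names are illustrative, not the actual procedure):

```python
def month_bounds(start_month: str, end_month: str) -> tuple[int, int]:
    # As described in the post: pad the month strings into
    # timestamp-like decimal values.
    # '201703' -> 201703000000 (append '000000' for the lower bound)
    # '201704' -> 201704235959 (append '235959' for the upper bound)
    low = int(start_month + "000000")
    high = int(end_month + "235959")
    return low, high

low, high = month_bounds("201703", "201704")
print(low, high)  # 201703000000 201704235959
```

The view's filter then becomes timestamp >= low and timestamp <= high on the DECIMAL column.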

Best regards,
Vlad

Zahid Yener Jul 13, 2017 at 12:10 PM

Hi,

Here is what you can do:

1. Apply the filters as early as possible.

2. Try to use Input Parameters.

3. Try to simplify your CV if possible try to break in parts.

4. Try to use UNIONs if you need 2 fact tables.

5. If you need to use joins, use LEFT OUTER/REFERENTIAL joins (I personally use LEFT OUTER) instead of INNER.

6. On joins, set 'Optimize Join' to TRUE.

7. Only project the columns you need. Don't bring up all the columns in a table to the next node.

8. Avoid using IF-THEN-ELSE calculations

9. Aggregate as early as possible

10. Try to avoid joining on calculated columns

11. Try to create calculated columns in Aggregation Node.

Your goal should be to work only with the data you need. If you are trying to work on all the data, that's a problem.
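The "filter early, aggregate early" points above can be sketched in a toy Python example (the data and numbers are made up purely for illustration; in a calculation view these steps map to the filter on the lowest projection, an aggregation node before the join, and the master-data join afterwards):

```python
from collections import defaultdict

# Made-up fact rows; one falls outside the user's month range.
fact = [
    {"month": 201703, "cust": "A", "amount": 10},
    {"month": 201703, "cust": "A", "amount": 5},
    {"month": 201704, "cust": "B", "amount": 7},
    {"month": 201612, "cust": "A", "amount": 99},  # filtered out
]

# 1. Filter early: apply the user's month range first.
filtered = [r for r in fact if 201703 <= r["month"] <= 201704]

# 2. Aggregate early: sum per (month, customer) before any join.
agg = defaultdict(int)
for r in filtered:
    agg[(r["month"], r["cust"])] += r["amount"]

# 3. Join the now-small aggregate with master data (texts).
master = {"A": "Customer A", "B": "Customer B"}
result = [(m, master[c], v) for (m, c), v in sorted(agg.items())]
print(result)  # [(201703, 'Customer A', 15), (201704, 'Customer B', 7)]
```

The join in step 3 touches 2 rows instead of 4; at the scale in the question (hundreds of millions of rows) the same ordering is what keeps the join's intermediate result small.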

Also check the links and documents below for best practices on tuning and performance in SAP HANA:

https://blogs.sap.com/2014/03/26/hana-modeling-good-practices/

https://blogs.sap.com/2015/11/03/the-art-and-science-of-hana-performance-modeling/

http://www.hdespot.com/wp-content/uploads/2015/11/SAPHDE_Webinar-The-Art-Science-of-Tuning-HANA-Models-for-Performance_Abani_Pattanayak_SAP-Nov_2015.pdf
