
How to get big data to work with SAP HANA and Lumira

Dec 13, 2017 at 12:24 AM


Hi,

We are trying to handle large-scale hospital data using HANA and Lumira. Let me describe the situation:

We have a patient encounter table with 1,000,000 rows by 50 columns, a diagnosis table with 10,000,000 rows by 50 columns, and a prescription table with 100,000,000 rows by 50 columns.

All tables can be joined through a common ENCOUNTER_ID, which is unique to each patient encounter. One patient encounter can have multiple diagnoses and multiple prescriptions.

So far I have tried a left outer join of the encounter table to both the diagnosis and prescription tables.

My issue is that when I run this join, memory maxes out and the HANA server crashes.

My questions:

1. Is HANA limited in this Cartesian join? What is the limit for joins in HANA given, say, 200 GB of memory?

2. How do normal developers model large datasets where the filters need to change "on the fly" during analysis?

For example, I know that if I only had encounters from the year 2009, my encounter table would be only 100,000 rows. Similarly, choosing only one type of diagnosis might reduce the diagnosis table to 50,000 rows, and a join on just these records is easier on the database. With Lumira, however, we want to be able to pick and choose the filters for our results from the WHOLE space.

We are using Lumira server with live HANA access, but even at the HANA level, calculation views with 3 dimension joins are crashing the server.

Any help or links to blogs to mitigate these issues would be appreciated.

Thank you,

Matt

Also under consideration is moving to 1000x the data size, e.g. a prescription table with 1 trillion rows.


2 Answers

Lars Breddemann
Dec 17, 2017 at 04:05 AM
0

You asked:

My questions:

1. Is HANA limited in this Cartesian join? What is the limit for joins in HANA given, say, 200 GB of memory?

2. How do normal developers model large datasets where the filters need to change "on the fly" during analysis?


To 1.: SAP HANA processes data in memory, which means that all base data, as well as the intermediate data required to perform a computation, needs to fit into memory. For many/most configurations, a 50:50 ratio of "storage" memory to "processing" memory is part of the sizing rules. There is no specific limitation for certain join types, though. Generally, SAP HANA tries to leverage all available memory for its operations (if possible).

To 2.: There is certainly no "normal" developer. However, the general design principles for SAP HANA information views and schemas are explained in the developer's guides and in many online tutorials and courses.
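
To put rough numbers on point 1, here is a back-of-the-envelope sketch in Python. The row counts come from the question; everything else is an assumption made for illustration: child rows spread evenly over encounters, all 150 columns materialised, a hypothetical 8 bytes per value with no compression, and half of the 200 GB available for processing. Real column-store figures will differ, so treat this as an order-of-magnitude estimate only.

```python
# Row counts taken from the question.
ENCOUNTERS = 1_000_000
DIAGNOSES = 10_000_000
PRESCRIPTIONS = 100_000_000

# Assumption: child rows are spread evenly over encounters.
diag_per_enc = DIAGNOSES // ENCOUNTERS        # average diagnoses per encounter
presc_per_enc = PRESCRIPTIONS // ENCOUNTERS   # average prescriptions per encounter

# Joining both child tables on ENCOUNTER_ID pairs every diagnosis with
# every prescription of the same encounter, so the row counts multiply.
joined_rows = ENCOUNTERS * diag_per_enc * presc_per_enc

# Assumption: 3 x 50 columns at a hypothetical 8 bytes each, uncompressed.
BYTES_PER_ROW = 150 * 8
joined_gb = joined_rows * BYTES_PER_ROW / 1e9

# Assumption: 50% of a 200 GB server is usable as processing memory.
working_gb = 200 * 0.5

print(f"{joined_rows:,} rows, ~{joined_gb:,.0f} GB needed vs {working_gb:.0f} GB available")
```

Under these assumptions the row-level join is about a thousand times larger than the encounter table, which is why the intermediate result, not the join type, is the likely problem.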



Thanks Lars,

I understand the in-memory limitation and the general design principles. However, I just wanted to know if there are any real-world public examples of HANA being used for big data.

How can I check the "size" in GB of a table?

--Matt

0

You can review the storage requirements of a table in the runtime information of the table in SAP HANA studio as well as in the M_CS_TABLES/M_CS_COLUMNS system views. All this can be found in the documentation.
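
As a rough sketch of the system-view route, the query below sums the in-memory size per table from M_CS_TABLES (which has one row per table partition). The schema name HOSPITAL is a hypothetical placeholder, and the small Python helper only converts the byte count to GB; how you run the query (SQL console, client driver) is up to you. Note that the figure reflects what is currently loaded into memory.

```python
# Hypothetical schema name, replace with your own.
SCHEMA = "HOSPITAL"

# M_CS_TABLES lists column-store tables; sizes are reported in bytes.
# The "?" is a bind parameter for the schema name.
SIZE_QUERY = """
SELECT table_name,
       SUM(record_count)         AS records,
       SUM(memory_size_in_total) AS bytes_in_memory
FROM   m_cs_tables
WHERE  schema_name = ?
GROUP  BY table_name
ORDER  BY SUM(memory_size_in_total) DESC
"""


def bytes_to_gb(n_bytes: int) -> float:
    """Convert a byte count to GB (1 GB = 1024**3 bytes here)."""
    return n_bytes / 1024**3


# Example: a table reported with 5 * 1024**3 bytes occupies 5.0 GB.
print(bytes_to_gb(5 * 1024**3))
```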

As for real-world public examples, https://www.sap.com/products/hana/customer-reviews.html lists many customer stories. If this is too much marketing for your taste, then just look around the blog section here in the SAP Community. I remember that, years ago, @John Appleby published https://blogs.sap.com/2014/10/21/build-your-own-wikipedia-keynote-part-1-build-and-load-data/, for example.

Maybe a comment on the "BIG DATA" notion: BIG DATA does not imply that SAP HANA delivers a big volume of data to an analytics client (or any client). The idea is to process the data where it is stored and to avoid sending it "around". So, ideally, information models are built so that the analytics clients can specify what result the user needs to see and HANA provides just this result to the client tools.
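
To illustrate the idea with a toy example: the sketch below uses Python's built-in sqlite3 purely as a stand-in for HANA, with made-up table contents. It compares a row-level join of both child tables with a query that aggregates each child table per ENCOUNTER_ID first and returns only the filtered summary. This is one possible modelling approach, not the only one, and the table and column names are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE encounter   (encounter_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE diagnosis   (encounter_id INTEGER, code TEXT);
CREATE TABLE prescription(encounter_id INTEGER, drug TEXT);
""")
con.executemany("INSERT INTO encounter VALUES (?, ?)", [(1, 2009), (2, 2010)])
con.executemany("INSERT INTO diagnosis VALUES (?, ?)",
                [(1, "A"), (1, "B"), (2, "A")])
con.executemany("INSERT INTO prescription VALUES (?, ?)",
                [(1, "x"), (1, "y"), (1, "z"), (2, "x")])

# Row-level join of both child tables: diagnoses and prescriptions of the
# same encounter are paired with each other, so their counts multiply.
naive_rows = con.execute("""
    SELECT COUNT(*)
    FROM encounter e
    LEFT JOIN diagnosis d    ON d.encounter_id = e.encounter_id
    LEFT JOIN prescription p ON p.encounter_id = e.encounter_id
""").fetchone()[0]

# Pre-aggregated: each child table is reduced to one row per encounter
# before the join, and only the filtered result is returned to the client.
summary = con.execute("""
    SELECT e.encounter_id, e.year,
           COALESCE(d.n, 0) AS diagnoses,
           COALESCE(p.n, 0) AS prescriptions
    FROM encounter e
    LEFT JOIN (SELECT encounter_id, COUNT(*) AS n
               FROM diagnosis GROUP BY encounter_id) d
           ON d.encounter_id = e.encounter_id
    LEFT JOIN (SELECT encounter_id, COUNT(*) AS n
               FROM prescription GROUP BY encounter_id) p
           ON p.encounter_id = e.encounter_id
    WHERE e.year = 2009
    ORDER BY e.encounter_id
""").fetchall()

print(naive_rows)
print(summary)
```

The second query never produces more than one row per encounter, regardless of how many diagnoses or prescriptions each encounter has.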

0
Sarhan Polatates Dec 13, 2017 at 08:29 AM
-1

Hi Matt,

John Appleby is the right person for this question.

Cheers,

Sarhan.
