
SAP HANA python API - machine learning use cases


Hi,

what are the typical use cases where I can develop my models in a Jupyter notebook with data in SAP HANA?

I am asking because most of the machine learning use cases I know have data in formats like WAV, TXT or CSV, stored in a data lake (e.g. Hadoop), or streamed from IoT sensors. HANA memory is very expensive, so it would make no sense to load this data into HANA.

Do you know any use cases or similar scenarios? Any links?

BR

Robert

Accepted Solutions (1)


henrique_pinto
Active Contributor

Hi Robert,

you're right that if you're handling large volumes of data, you will typically not store that data permanently in HANA. In theory, with HANA NSE you could even consider it, since NSE data now sits on disk instead of in memory (HANA in this case behaves like a regular disk-based, cache-enabled DB). But then your argument would be that you would never store big data in a DB, especially for ML, and you would be better served with a data lake, since you can then use tools like Spark to distribute the compute.

However, before being an in-memory DB, HANA is an in-memory compute engine (fun fact: IMCE was one of HANA's many early internal names). In essence, that means you don't necessarily need to store the data in HANA's memory; you can (and should, where valid) leverage HANA's in-database compute engines to process data quickly, even if the data is not stored in HANA. I'm writing a blog post, to be published soon, about some tests I've done with NSE and also with virtual tables via the hana_ml lib.
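To make that concrete, here is a minimal sketch of pointing a hana_ml DataFrame at a table whose data does not live in HANA's memory, e.g. an NSE page-loadable table or a virtual table over a data lake. The connection details, schema and table name (`VT_CLICKSTREAM`) are placeholders, not anything from a real system:

```python
# Sketch: hana_ml DataFrames are lazy SQL views, so they work the same
# whether the table is in-memory, NSE page-loadable, or virtual.
# Connection details and names below are placeholders.
from hana_ml import dataframe

conn = dataframe.ConnectionContext(
    address="hana-host", port=39015, user="ML_USER", password="***"
)

# Wrapping the table costs nothing: no data is read until a query runs.
remote = conn.table("VT_CLICKSTREAM", schema="LAKE")

# The aggregation is translated to SQL and executed by HANA's engines.
summary = remote.agg([("count", "USER_ID", "EVENTS")], group_by="USER_ID")
print(summary.select_statement)  # inspect the SQL HANA will execute
```

The point of the sketch is that the compute (the GROUP BY) happens wherever HANA decides to execute it, not in the Python process.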

Typically, the advantage of HANA's in-memory compute is scoring real-time data from SAP applications. That way, we can bring the model into production in the actual business transaction apps in an easy way.

Answers (3)


AndreasForster
Product and Topic Expert

Hello Robert,

ML can provide value by improving or automating business decisions. Very often the data for such processes is kept in an SAP HANA system, i.e. under BW-on-HANA or BW/4HANA. With the hana_ml wrapper, Data Scientists can easily train ML models without having to extract the data out of the system, thus avoiding data duplication, improving data governance and keeping the architecture lean. Since the data is not moved, the hand-over from Data Scientist to IT for deployment also becomes easier.

The ML models can be integrated into larger workflows, i.e. you could deploy a model as a REST API on Data Intelligence for inference and provide a chatbot frontend with Conversational AI for your end users to get predictions on the fly. All without having to extract the data.

The actual use cases can be very different depending on the industry or department. I worked on projects, for example, to estimate the quality of a customer's product based on different raw materials, to estimate the fair price of a used product, or generally around customer analytics. I understand it is planned to expose time-series forecasting through the ml_wrapper. This would be another major area, i.e. for demand forecasting, financial forecasting, etc.

Please also feel free to ping me directly. Greetings, Andreas

abdel_dadouche
Active Contributor

Hi mount_bertl

The SAP HANA Python API provides two major components: one is the SAP HANA DataFrame, and the other is access to the APL & PAL algorithm wrappers.

The SAP HANA DataFrame gives you access to your SAP HANA data and lets you apply transformations, aggregations and other functions at the database level instead of locally.

You can also collect the data and, in the end, use it like any Pandas DataFrame with your preferred visualization or ML libraries.
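A small sketch of that pattern: aggregate in the database, then collect only the small result set into pandas. The connection details and the `SALES` table with its columns are placeholders, not from a real system:

```python
# Sketch: push the heavy work (filter/aggregate/sort) down to HANA,
# then collect only the final, small result as a pandas DataFrame.
# Connection details and table/column names are placeholders.
from hana_ml import dataframe

conn = dataframe.ConnectionContext(
    address="hana-host", port=39015, user="ML_USER", password="***"
)

sales = conn.table("SALES")

# All of this is composed into one SQL statement executed in HANA.
top = (sales.agg([("sum", "REVENUE", "TOTAL")], group_by="COUNTRY")
            .sort("TOTAL", desc=True)
            .head(10))

# Only the 10-row result crosses the wire.
pdf = top.collect()
pdf.plot.bar(x="COUNTRY", y="TOTAL")  # from here on, any pandas workflow applies
```

The design point is that `collect()` is the only step that moves data out of HANA, so you call it as late and on as little data as possible.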

And with the second, you get access to the SAP HANA libraries for Machine Learning. SAP HANA provides access to 90+ industry-standard algorithms like Linear Regression, K-means, Apriori etc. (PAL), but also to the automated algorithms from KXEN (APL).
Not all algorithms have been wrapped in Python yet, but that's the ambition!
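As an illustration of the PAL wrappers, here is a sketch of training and scoring a linear regression entirely in-database. The connection details, table names (`USED_CAR_PRICES`, `NEW_LISTINGS`) and columns are placeholders; the constructor signature follows the older hana_ml releases, where the connection context is passed to the algorithm class:

```python
# Sketch: training a PAL algorithm through the Python wrapper.
# fit()/predict() generate and execute PAL procedure calls in HANA;
# the training data never leaves the database.
# Connection details and table/column names are placeholders.
from hana_ml import dataframe
from hana_ml.algorithms.pal.linear_model import LinearRegression

conn = dataframe.ConnectionContext(
    address="hana-host", port=39015, user="ML_USER", password="***"
)

train = conn.table("USED_CAR_PRICES")   # e.g. ID, MILEAGE, AGE, PRICE

lr = LinearRegression(conn)
lr.fit(train, key="ID", features=["MILEAGE", "AGE"], label="PRICE")

# Score new rows, again in-database; collect only the predictions.
scored = lr.predict(conn.table("NEW_LISTINGS"), key="ID")
print(scored.collect())
```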

For the list of algorithms available from PAL please check: https://help.sap.com/doc/0172e3957b5946da85d3fde85ee8f33d/2.0.03/en-US/html/hana_ml.algorithms.pal.h...

For the list of algorithms available from APL please check: https://help.sap.com/doc/0172e3957b5946da85d3fde85ee8f33d/2.0.03/en-US/html/hana_ml.algorithms.apl.h...

You can also check arun.godwin.patel's blog series about the SAP HANA Python library:

- https://blogs.sap.com/2018/12/17/diving-into-the-hana-dataframe-python-integration-part-1/

- https://blogs.sap.com/2019/01/28/diving-into-the-hana-dataframe-python-integration-part-2/

You can also consider using SAP HANA, express edition, which comes with a free developer license for up to 32 GB of RAM. I personally ran some tests loading CSV files, and it turned out that some of my 4 GB data files compressed down to a couple of hundred MB once loaded.
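For experiments like that, one convenient route (assuming a recent hana_ml release that ships `create_dataframe_from_pandas`) is to read the CSV with pandas and bulk-load it into a column table. The file name, table name and connection details below are placeholders:

```python
# Sketch: loading a local CSV into SAP HANA, express edition.
# File name, table name and connection details are placeholders.
import pandas as pd
from hana_ml import dataframe
from hana_ml.dataframe import create_dataframe_from_pandas

conn = dataframe.ConnectionContext(
    address="hxehost", port=39015, user="ML_USER", password="***"
)

pdf = pd.read_csv("sensor_log.csv")

# Creates the column table and bulk-inserts the rows; HANA's columnar
# compression typically shrinks the footprint well below the CSV size.
hdf = create_dataframe_from_pandas(
    conn, pdf, table_name="SENSOR_LOG", force=True  # force=True replaces the table
)
print(hdf.count())  # row count, computed in the database
```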

From what I remember, SAP HANA, express edition also lets you use SAP HANA streaming capabilities (to be confirmed, however).

And last but not least, with SAP HANA, express edition you can get the binaries and install them wherever you want, download a pre-built VM (assuming your host meets the minimum system requirements in either case), or spin up a new instance on AWS, Google Cloud or Microsoft Azure (the order here is purely alphabetical; no preference implied ;-)).

Hope this helps you see the benefits more clearly.

And of course this is definitely open to discussion.

@bdel


Hi Abdel,

Thx for the answer, but I think I expressed my question the wrong way, because I have already implemented models with this Python API, with PAL and HANA DataFrames. So technically everything is clear to me.

I only want to know the typical use cases that are implemented and deployed this way. What are typical real-world business examples?

Thx

Robert