Skip to Content

Performance issue when shipping data out of BW/HANA to Azure DataLake

Hello Experts

Within our project we are shipping the data from SAP BW/HANA to Azure DataLake Gen 1 and witnessing severe performance issues with the overall runtimes for data transfer. Options we have tried so far include

Option 1 - Azure SSIS IR

SAP BW/HANA Hosted on Azure VM - North Europe > Calculation View (170 Columns - Attributes & Metrics, 5 Million Records) > SSIS Packages Hosted on Azure SSIS IR - West Europe > Azure DataLake Gen 1 - West Europe

Total Runtime - 88Mins

Option 2 - Azure Data Factory V2

SAP BW/HANA Hosted on Azure VM - North Europe > Calculation View (170 Columns - Attributes & Metrics, 5 Million Records) > Azure Data Factory V2 with Self Hosted IR - West Europe > Azure DataLake Gen 1 - West Europe

Total Runtime - 80Mins

BW/HANA System Details

SAP HANA - 1.00.122.23.1548298510 (fa/hana1sp12)

Platform - SUSE Linux Enterprise Server 12.4

ODBC Driver - HDB_CLIENT

HANA ODBC Version - 1.00.120.139.1589320942

We reached out to Microsoft support and they seem to suggest this is to do with ODBC driver! Has anyone come across similar situations and what was done to improve the performance?

PS: We tried to tweak the config./parameters at the SSIS Package level (Engine Threads, Batch Size, Buffer Size..)too but that had no noticeable impact on the performance

Thanks

Abhishek Shanbhogue

Add a comment
10|10000 characters needed characters exceeded

  • Thanks Witalij Rudnicki, Please see my response

    We are reading data from a HANA calculation view (6 projections unioned together) for 175 columns with 3Million records

    Q1: Have you done any other investigation breaking that total time into parts it takes e.g. selecting data in HANA, transferring the volume of data, saving it int data lake?

    Yes, These runtimes are for loading data from HANA CV to Azure Data Lake, we tried to use default parameters where it took ~1Hr and after changing the SSIS parameters (Engine Threads, Batch Size, Buffer Size..) it didnt have any noticeable reduction in runtimes

    Q2: What is the size of data volume once it is persisted in Azure?

    We are reading 175 columns from HANA CV having 3 Million records

    Q3:Have you tried parallelization of the data movement to create multiple streams reading partitions of data in HANA?

    Have tried parameter changes for parallelisation on the Azure side by changing SSIS package and Azure Data Factory V2 but that didnt help. I could try to create a temp table with partitions and load data from CV to check if this improves but this will mean an additional step before shipping data out of HANA to Azure!

    Thanks

    Abhishek shanbhogue

  • Have you done any other investigation breaking that total time into parts it takes e.g. selecting data in HANA, transferring the volume of data, saving it int data lake? What is the size of data volume once it is persisted in Azure? Have you tried parallelization of the data movement to create multiple streams reading partitions of data in HANA?

  • Lars Breddemann - Apologies on tagging you here but I see you have responded to some of the queries related to HANA Client/ODBC performance, any help here would be much appreciated

    Thanks

    Abhishek Shanbhogue

Assigned Tags

Related questions

1 Answer

  • Best Answer
    Posted on Jul 26, 2020 at 03:06 PM

    Posing this as an answer for the benefit of others who have similar situation. We have done various tests and could conclude the following

    - Leaner the query better the performance, In our case we were trying to read 177 columns from HANA view and that was degrading the performance. If I query the same view for 60 columns then its performance is much better

    - Splitting the columns into smaller chunks and then merging is another option but this will still have some performance implications

    - Changing the packet size from default value could have significant performance benefit but this again works with lean query and not for a wide column set. We tried with PACKETSIZE = 2; in the connection string and this went well

    Add a comment
    10|10000 characters needed characters exceeded

Before answering

You should only submit an answer when you are proposing a solution to the poster's problem. If you want the poster to clarify the question or provide more information, please leave a comment instead, requesting additional details. When answering, please include specifics, such as step-by-step instructions, context for the solution, and links to useful resources. Also, please make sure that you answer complies with our Rules of Engagement.
You must be Logged in to submit an answer.

Up to 10 attachments (including images) can be used with a maximum of 1.0 MB each and 10.5 MB total.