
This blog provides a comprehensive guide for users seeking a solution for processing large files from an SFTP source and transferring the data to a HANA database. The solution outlined in this guide employs SAP Integration Suite. Processing large files has always posed challenges related to performance, memory usage, and overall efficiency, all of which are addressed by the solution shared in this guide.

 

Background Information

During a recent migration project, we were challenged with designing a solution to replace an existing integration built on the Informatica platform with SAP BTP Integration Suite (Cloud Integration). During this project, a crucial task emerged: retrieving large files from an external SFTP server and seamlessly loading the data into an on-premise SAP HANA database table.

 

Systems and Platforms

  • SAP BTP Integration Suite (Cloud Integration): Integration Platform.
  • SAP HANA Database On-Premise: Target destination for the data.
  • External SFTP (SSH File Transfer Protocol): Source location for the large files.

Business Requirement

Regularly monitor the external SFTP folder for incoming files and, upon detection, initiate the transfer of data to the on-premise HANA database table through SAP Cloud Integration. The graphic below is a diagram that represents the solution we deployed.

[Image1.png: diagram of the deployed solution]

The Challenge with this Integration

While file processing within SAP Cloud Integration is generally straightforward and handled seamlessly by the platform, challenges arise when dealing with large files, approximately 400 MB or larger. Depending on the environment's capacity, unforeseen issues may occur, such as messages becoming stuck in processing status and being automatically aborted after 24 hours without completion.

Providing a specific recommendation on the maximum initial file size is challenging, because memory consumption varies significantly with the data content and the characteristics of each processing step. Operations such as CSV-to-XML conversion and message mapping process the payload as a whole, resulting in considerable memory usage.

To address this, it is advisable to divide the input into manageable chunks for more efficient processing.
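
To make the idea concrete outside of Cloud Integration, here is a minimal plain-Python sketch of that principle; the file name and chunk size are hypothetical.

```python
# Minimal sketch (plain Python, not Cloud Integration code) of the chunking
# principle: read a large CSV in fixed-size blocks instead of loading it whole,
# so peak memory scales with the block size rather than the file size.
from itertools import islice

def read_in_chunks(path, rows_per_chunk=100_000):
    """Yield (header, rows) pairs with at most rows_per_chunk rows each."""
    with open(path, "r", encoding="utf-8") as handle:
        header = handle.readline()
        while True:
            rows = list(islice(handle, rows_per_chunk))
            if not rows:
                break
            yield header, rows

# Hypothetical usage: each chunk is converted, mapped, and sent on its own.
# for header, rows in read_in_chunks("large_input.csv"):
#     process(header, rows)
```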

 

Solution Implemented

The solution we implemented uses the General Splitter capability, with two instances of the General Splitter employed in the iFlow.

(The General Splitter splits a composite message comprising N messages into N individual messages based on configured conditions; see https://help.sap.com/docs/cloud-integration/sap-cloud-integration/general-and-iterating-splitter.)

The first one was responsible for breaking down the original input files into smaller pieces, while the second one facilitated sending the data to HANA in batches. This approach enables parallel processing with a defined number of concurrent processes, allowing the complete payload to be sent to HANA efficiently.

[Image2.png: iFlow design]

1 - SFTP Sender to trigger the iFlow

2 - File General Splitter to process a defined number of records at a time (for example, a file with 1,000,000 rows will be split into 10 branches of 100,000 rows each; see the sketch after the configuration list below)

 

[Image3.png: File General Splitter configuration]

  • Grouping: the number of rows in each block
  • Parallel Processing: enables the split messages to be processed in parallel
  • Number of Concurrent Processes: the number of threads executing in parallel
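
These settings are configured directly in the iFlow; the snippet below is only a plain-Python analogy of how the Grouping value and a bounded number of concurrent processes interact, using the row counts from step 2 and a hypothetical concurrency value.

```python
# Plain-Python analogy (not Cloud Integration code) of the splitter settings:
# Grouping = rows per block, Number of Concurrent Processes = parallel workers.
# The concrete values here are hypothetical.
from concurrent.futures import ThreadPoolExecutor

GROUPING = 100_000           # rows per split message
CONCURRENT_PROCESSES = 5     # how many blocks are processed in parallel

def process_block(block_id, rows):
    # stands in for: CSV-to-XML conversion, mapping, and JDBC batches per block
    return f"block {block_id}: {len(rows)} rows"

rows = [f"row-{i}" for i in range(1_000_000)]   # stand-in for the file content
blocks = [rows[i:i + GROUPING] for i in range(0, len(rows), GROUPING)]

with ThreadPoolExecutor(max_workers=CONCURRENT_PROCESSES) as pool:
    results = list(pool.map(lambda item: process_block(*item), enumerate(blocks)))

print(len(results))   # 10 blocks for a 1,000,000-row file with GROUPING=100,000
```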

3 - Convert the original CSV payload to XML (using the standard CSV to XML converter; the payload shapes for steps 3-5 are illustrated in the sketch after step 5)

4 - Graphical mapping to create the SQL XML payload for HANA (field transformation to meet the HANA format). Reference document for details: https://help.sap.com/docs/SAP_NETWEAVER_750/5cf7d2de571a45cc81f91261668b7361/2e96fd3f2d14e869e100000...

5 - JDBC General Splitter to send records to HANA in batches of a defined size
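
Steps 3-5 are configured with standard iFlow components (CSV to XML converter, graphical mapping, General Splitter). The sketch below is only a plain-Python illustration of the payload shapes involved: CSV rows converted to record XML, mapped into an XML SQL INSERT structure along the lines of the format documented for the JDBC receiver (see the reference in step 4), and grouped into batches. The table, column, and element names are hypothetical; the exact structure expected by the JDBC receiver should be taken from the referenced documentation.

```python
# Plain-Python illustration of steps 3-5 (not Cloud Integration code).
# Table, column, and element names are hypothetical.
import csv, io
from xml.etree import ElementTree as ET

csv_payload = "ID,NAME,AMOUNT\n1,Alpha,10.5\n2,Beta,22.0\n3,Gamma,7.25\n"

# Step 3: CSV rows -> one record (dict) per row
rows = list(csv.DictReader(io.StringIO(csv_payload)))

# Step 4: map the records into an XML SQL INSERT payload for the JDBC receiver
def to_insert_payload(batch, table="TARGET_TABLE"):
    root = ET.Element("root")
    for row in batch:
        stmt = ET.SubElement(root, "Statement")
        tbl = ET.SubElement(stmt, "dbTableName", action="INSERT")
        ET.SubElement(tbl, "table").text = table
        access = ET.SubElement(tbl, "access")
        for column, value in row.items():
            ET.SubElement(access, column).text = value
    return ET.tostring(root, encoding="unicode")

# Step 5: send the records in batches of a defined size instead of all at once
BATCH_SIZE = 2
batches = [rows[i:i + BATCH_SIZE] for i in range(0, len(rows), BATCH_SIZE)]
for batch in batches:
    payload = to_insert_payload(batch)
    # each payload would be sent to HANA through the JDBC receiver channel
    print(payload)
```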

[Image4.png: JDBC General Splitter configuration]

6 - Send the data to HANA

7 - Gather to merge all the HANA responses

8 - Gather to merge all the records processed

9 - Exception handling that sends an alert email to a distribution list; afterwards, an escalation event is raised

 

Benefits/Advantages of the implemented solution

This solution utilizes an approach that harnesses the strengths of SAP Cloud Integration while addressing common challenges associated with data processing and integration. The emphasis on parallelization, error prevention, and leveraging out-of-the-box capabilities contributes to the efficiency, reliability, and scalability of the overall solution.

 

Conclusion

Overall, the solution not only meets the specific requirements of the migration project but also sets the foundation for a robust and configurable data integration framework. This approach facilitates seamless data flow from external sources to the HANA Database, emphasizing reliability, scalability, and effective handling of potential challenges associated with large file processing.

 

I would be more than happy to hear from the SAP Community about similar experiences with this or other integration requirements.
Please share your feedback in the comments below; I welcome any thoughts on this solution and encourage you to share similar requirements and how you solved them.

Thank you for reading!!

 Fellipe Mendes
