Transformation Flow partition generation with SAP ...

Sefan_Linders

You want to partition a Transformation Flow in SAP Datasphere, to reduce the memory consumption? That’s what we discuss in this blog post and provide the Python code to realize this. It makes use of the SAP Datasphere Command Line Interface (CLI) to read and write Transformation Flows, and to write a Task Chain that holds them together as one.

Recently I tried loading a large amount of data from the SAP Datasphere BW Bridge into SAP Datasphere Core, using a Transformation Flow in Delta Mode. As you might know, the Bridge runs its own SAP HANA database, separate from the SAP HANA database that SAP Datasphere “Core” runs. When using a Transformation Flow to load from Bridge to Core, the data transfer goes over SDA. Both Core and Bridge were minimally sized, so I needed to be careful with the memory consumption of the operation. And that is what you can usually achieve with partitioning, so that the data is transferred in smaller packets using a pre-defined set of criteria. However, Transformation Flows do not support partitioning yet.

The alternative? Creating multiple Transformation Flows, each with a WHERE clause in the SQL definition, effectively creating your own partitioned Flows. And since I didn’t feel like creating 20 of this manually, I thought this would be nice to give the CLI another swing. And some python code of course.

The original Transformation Flow

The original Transformation Flow is rather straightforward: we read from one Delta (remote) Table and write it into another Delta (local) Table. In this case we are writing from and to a Local Delta Table, which means that the Transformation Flow is in Delta Mode. Below you see the screenshots of the main work area and the source SQL definition.

Figure 1: The original un-partitioned Transformation Flow

Figure 2: The SQL definition of the source node

SELECT "Change_Type",
   "Change_Date",
   "ID",
   "SOMESTRING"
FROM "UC2_DLT_SRC_Delta"

What we want to accomplish is to copy this Transformation Flow for each partition that we want to create, add a WHERE clause to the SQL definition, and bind these Flows together in one Task Chain.

The code (and running it)

I’ve uploaded the code into this Git repo: https://github.com/SAP-samples/btp-global-center-of-excellence-samples/tree/main/DSP_object_generati.... The repo is shared with a few other code samples, only the folder “DSP_object_generation” is needed for what we achieve in this blog. The folder also contains the code for Remote Table to View generation, which is described in a previous blog.

Requirements:

Python to run this. I developed this using 3.10.6.
The latest SAP Datasphere CLI. I used 2024.9.0.

To run it:

Create a dsp_logon_data.json file in the root of the folder. You can use “dsp_logon_data.json.template” as a template. The hdb* parameters do not have to be set for transformation flow generation, they only apply to the previous blog where we read remote table definitions from the database.
Edit the parameters in main.py:
- Technical space name
- Technical transformation flow name (this one should have been created already)
- Column to partition on (in my example this is a String column)
- Partition definition (in my example these are Strings)
Run main.py

The output

For each partition, a Transformation Flow is generated, as you can see in below screenshot. Also depicted is the WHERE statement added to one of the generated Transformation Flows. Besides that, a Task Chain is generated which runs all generated Transformation Flows in sequence.

Figure 3: An overview of the generated objects

Figure 4: The WHERE statement added to the generated Transformation Flow SQL definition

Figure 5: The generated Task Chain that binds all generated Transformation Flows together

Figure 6: Running the generated Task Chain

Conclusion

This blog post describes how to use the SAP Datasphere CLI and object generation using Python to partition a Transformation Flow. This is merely another example of what you can achieve with the CLI and some coding. This example might help you directly but applies to a rather straightforward Transformation Flow where we only want to add a WHERE clause, repeat this for each partition, and bind them together with a Task Chain. In anyway, I hope you enjoyed the read and are inspired to build something yourself.