Former Member

SDI Initial Load from HIVE tables to HANA

We configured a HIVE adapter using the SDI DP agent. We have a huge table in the HIVE DB with 125 fields.

We created a virtual table on top of the HIVE table and a flowgraph that pulls the data based on a date filter.

We have around 10 million records per day, and it takes around 1 hour to read the data from HIVE.

When we look at the Hadoop YARN logs, the job completed within 4 minutes.

Can you help me with the questions below?

1) Are there any settings in the SDI DP agent (such as increasing the number of threads/processes) to improve the speed of the data loads?

2) How do we know how many threads the DP agent is using when pulling the data from HIVE?

3) How do we monitor the loads in the DP agent?

Thanks

Srini


1 Answer

  • Jul 31, 2017 at 06:44 PM

    Hi Srini,

    There are a couple of parameters that we can tweak to speed up loading the data into HANA, as I mentioned in this answer ( https://answers.sap.com/questions/128689/sap-hana-sdi-flowgraph-vs-insert-into-select.html ).

    In addition to the two things mentioned there (configuring the fetch size in the agent and partitioning on the source), you can also try to partition the load with task partitions, which are accessible through the flowgraph settings. This enables actual parallel loading (or sequential loading, if necessary).

    In your case, you can add task partitions based on the date column you mentioned. You can then also specify the number of parallel jobs that should run.
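
To illustrate the idea of partitioning on a date column, here is a minimal Python sketch that splits a date range into contiguous, non-overlapping partitions and builds one filter predicate per partition. It is not SDI or flowgraph code: the task partitions themselves are defined in the flowgraph settings, and the function names and the `LOAD_DATE` column are hypothetical, chosen only for this example.

```python
from datetime import date, timedelta


def date_partitions(start: date, end: date, parts: int) -> list[tuple[date, date]]:
    """Split the half-open range [start, end) into contiguous day ranges.

    Returns at most `parts` ranges; each range is also half-open, so no
    day is loaded twice and no day is skipped.
    """
    total_days = (end - start).days
    if total_days <= 0 or parts <= 0:
        raise ValueError("need end > start and parts > 0")
    parts = min(parts, total_days)  # never create empty partitions
    base, extra = divmod(total_days, parts)
    ranges = []
    lo = start
    for i in range(parts):
        # The first `extra` partitions get one additional day.
        hi = lo + timedelta(days=base + (1 if i < extra else 0))
        ranges.append((lo, hi))
        lo = hi
    return ranges


def partition_predicates(column: str, ranges: list[tuple[date, date]]) -> list[str]:
    """Build one filter expression per partition (column name is hypothetical)."""
    return [
        f"{column} >= '{lo.isoformat()}' AND {column} < '{hi.isoformat()}'"
        for lo, hi in ranges
    ]


if __name__ == "__main__":
    week = date_partitions(date(2017, 7, 1), date(2017, 7, 8), 3)
    for predicate in partition_predicates("LOAD_DATE", week):
        print(predicate)
```

Half-open ranges are used so that adjacent partitions never overlap, which matters when several partitions are loaded in parallel.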

    For more information on task partitions and the performance boost that we see, please have a look at this blog post: https://blogs.sap.com/2017/01/24/task-partitioning-enhances-initial-load-performance-in-hana-smart-data-integration/

    Please let me know if you have any questions.

    Thanks,
    Timo
