Extract BSEG table using SLT/Data Services to flat files

yogesh_joshi
Discoverer

The scenario is ERP > SLT > DS > flat files.

We need to do an initial extract of SAP table BSEG, which has billions of records. SLT is set up to trigger the initial load. DS has to fetch this data from SLT and generate multiple flat files.

When this data is read from SLT, DS automatically includes an additional column, DI_SEQUENCE_NUMBER (int, starting from 0, mapped through to the Query output, and the output is already sorted on this column).

We want to write the first 10 million records into one flat file, the next 10 million into another, and so on. What is the best approach to designing a flow to handle this, given the billions of records? There is no need to group the incoming data or apply any further transforms.

Edit: the SLT-to-DS data transfer uses RFC.

Any ideas or suggestions are much appreciated, thanks for your time.

Yogesh

Accepted Solutions (0)

Answers (2)

former_member208402
Active Contributor

Hi Yogesh Joshi,

You can achieve this by using a loop. Take a count of your source data and divide it by 10 million; you will get a value, say 10, which means you have to loop your dataflow 10 times.

Initialize variables outside the loop:

$G_Loop = 1;

$G_max_loop = 10;   (the value you got after dividing your source count by 10 million)

$G_INITIAL = 1;

$G_MAX = 10000000;
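
If you would rather derive $G_max_loop than hard-code it, here is a minimal sketch; the datastore name DS_ERP is a placeholder, and pulling the count via sql() with a ceil() round-up is my assumption, not something from the original setup:

# Hypothetical: fetch the source row count through a direct datastore connection
# (sql() returns a varchar; DS converts it implicitly in the arithmetic below)
$G_COUNT = sql('DS_ERP', 'SELECT COUNT(*) FROM BSEG');
# Round up so a partial last chunk still gets its own file and iteration
$G_max_loop = ceil($G_COUNT / 10000000);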

Set the target file name using a variable, e.g. $G_FILENAME = 'filename_' || $G_Loop;

Now connect source -----> Query -----> target file.

In the Query's where condition, give:

DI_SEQUENCE_NUMBER >= $G_INITIAL and DI_SEQUENCE_NUMBER <= $G_MAX   (which will be DI_SEQUENCE_NUMBER >= 1 and DI_SEQUENCE_NUMBER <= 10000000; note that DI_SEQUENCE_NUMBER starts from 0, so start $G_INITIAL at 0 if you do not want to drop the very first record)

This will load 10 million records to the file (filename_1, since $G_Loop = 1).

Now, inside the loop, add a script after the dataflow:

$G_INITIAL = $G_MAX + 1;   (this value will be 10000001)

$G_MAX = $G_MAX + 10000000;   (this value will be 20000000)

$G_Loop = $G_Loop + 1;   (this value will be 2)

Now the where condition will be

DI_SEQUENCE_NUMBER >= $G_INITIAL and DI_SEQUENCE_NUMBER <= $G_MAX   (which will be DI_SEQUENCE_NUMBER >= 10000001 and DI_SEQUENCE_NUMBER <= 20000000)

This will load the next 10 million records to the file (filename_2, since $G_Loop = 2).

The file name will be different every time, so one dataflow in a loop creates multiple files for you, with 10 million records in each; a consolidated sketch follows below.
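
For reference, here is a consolidated sketch of the scripts around the While Loop, with the dataflow sitting inside the loop between the two inner scripts. Starting the window at 0 is my adjustment, since DI_SEQUENCE_NUMBER starts from 0; everything else follows the steps above:

# Script before the While Loop: initialize the control variables
$G_Loop = 1;
$G_max_loop = 10;   # source count divided by 10 million, rounded up
$G_INITIAL = 0;     # DI_SEQUENCE_NUMBER starts from 0, so the first window begins there
$G_MAX = 9999999;   # first window 0..9999999 = 10 million records

# While Loop condition: $G_Loop <= $G_max_loop

# Script inside the loop, before the dataflow: a fresh file name per iteration
# (following the expression above; add an explicit to_char() if the implicit
# int-to-varchar conversion complains in your version)
$G_FILENAME = 'filename_' || $G_Loop;

# Script inside the loop, after the dataflow: advance the window by 10 million
$G_INITIAL = $G_MAX + 1;
$G_MAX = $G_MAX + 10000000;
$G_Loop = $G_Loop + 1;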

Change this logic according to your requirement.

Thanks,

Ravi kiran

yogesh_joshi
Discoverer

Hi Ravi,

Thanks for checking this and for the suggestions. What you have suggested is to read from the ODP source repeatedly with filters, and that doesn't seem to work for us. I have two questions:

1. Using DI_SEQUENCE_NUMBER in the Query's where clause trips something up: the job doesn't get executed, with errors returned by the ODP API, I believe. Are you able to run this without any issues?

2. Similar thing with the ODP source inside a loop: the first iteration runs fine and we get the first 10 million records, but when the second iteration starts, there are errors returned from within the ODP API again, I think. Are you able to read the ODP source repeatedly in a loop?
