How to Split a huge data XML file before PI Picks it from Sender Folder

AbdulHammed
Explorer
0 Kudos

Hi Experts,

My scenario is File to Proxy. The sender places an XML file in the sender folder that contains a huge amount of data, say 200k records. Is there any way to split the file before PI picks it up, so that it is processed further as smaller files of, for example, 1000 records each?

Please help me with this.

Thanks

Accepted Solutions (0)

Answers (3)


former_member190293
Active Contributor
0 Kudos

Hi Abdul!

As an alternative, you could split the source file into smaller ones with an operating system script before PI processes it.

I had a similar task some time ago and wrote a Perl script for Unix for it:

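#!/usr/bin/perl
# Usage: <script> <source.xml> <records-per-file>
use strict;
use warnings;
use XML::Simple;

my ($source, $reccount) = @ARGV;
die "Usage: $0 <source.xml> <records-per-file>\n"
  unless defined $source && defined $reccount && $reccount =~ /^[1-9]\d*$/;

my $find      = 1;   # index of the current output file
my $cur       = 0;   # records written to the current output file
my $newfile   = 1;   # 1 when the next record must start a new file
my $msgclosed = 1;   # 1 when the current file already has its closing tag
my $fo;              # handle of the current output file

my $xml = XML::Simple->new;
# ForceArray keeps {Customer} an array reference even if the source holds a single record
my $data = $xml->XMLin($source, ForceArray => ['Customer']);

foreach my $e (@{ $data->{Customer} })
{
  if ($newfile == 1)
  {
    open($fo, '>', "res$find.xml") or die "Cannot open res$find.xml: $!";
    print $fo qq{<?xml version="1.0" encoding="UTF-8"?>
<ns1:MT_Customers xmlns:ns1="urn:jdbc_connect">};
    $newfile   = 0;
    $msgclosed = 0;
  }
  print $fo "<Customer>";
  print $fo $xml->XMLout($e, RootName => undef, KeepRoot => 0);
  print $fo "</Customer>";
  $cur = $cur + 1;
  if ($cur == $reccount)
  {
    print $fo "\n</ns1:MT_Customers>";
    close($fo);
    $find      = $find + 1;
    $cur       = 0;
    $newfile   = 1;
    $msgclosed = 1;
  }
}

# Close the last file if it holds fewer than $reccount records
if ($msgclosed == 0)
{
  print $fo "\n</ns1:MT_Customers>";
  close($fo);
}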

It simply splits the source XML into several files; the second parameter gives the number of "Customer" elements per output message.
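For a file of the size mentioned in the question (around 200k records), parsing the whole document into memory at once may be expensive. A streaming variant of the same idea is sketched below. It is written in Python rather than Perl so that it needs only the standard library, and it is a sketch under assumptions, not a tested replacement for the script above. It assumes that the records are unqualified "Customer" elements directly under the root and reuses the wrapper element from the Perl script. The function name split_xml and the output file names are hypothetical. Each record is copied together with all of its child elements, so the parent-child structure inside a record is kept.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def split_xml(source, out_dir, record_tag="Customer", per_file=1000,
              root_open='<ns1:MT_Customers xmlns:ns1="urn:jdbc_connect">',
              root_close='</ns1:MT_Customers>'):
    """Stream `source` and write files holding at most `per_file` records.

    Assumes records are unqualified `record_tag` elements directly under
    the root. Returns the list of files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []   # paths of the chunk files produced so far
    batch = []     # serialized records waiting to be written

    def flush():
        # Write the pending records as one well-formed document.
        if not batch:
            return
        path = out_dir / f"res{len(written) + 1}.xml"
        with open(path, "w", encoding="utf-8") as out:
            out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            out.write(root_open + "\n")
            out.writelines(batch)
            out.write(root_close + "\n")
        written.append(path)
        batch.clear()

    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            if root is None:
                root = elem        # the first start event is the root element
            continue
        depth -= 1
        # depth == 1 after an end event means a direct child of the root closed
        if depth == 1 and elem.tag == record_tag:
            batch.append(ET.tostring(elem, encoding="unicode"))
            root.remove(elem)      # drop the processed record to keep memory flat
            if len(batch) == per_file:
                flush()
    flush()                        # last, possibly smaller, chunk
    return written
```

A call such as split_xml("customers.xml", "out", per_file=1000) would write out/res1.xml, out/res2.xml and so on. In practice the chunks should be written to a staging directory and moved into the folder that the sender channel polls only once they are complete, so that PI does not pick up a half-written file.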

Regards, Evgeniy.

former_member210091
Participant
0 Kudos

Hey Abdul,

XML files typically have parent-child relationships. If you split the file into smaller chunks, you run the risk of losing these dependencies.

Is it possible for the source application to send you a flat file (.txt, .csv, etc.)? If yes, you can easily control the number of records per message with the FCC parameter "Recordsets per Message".
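As a rough illustration of that idea, the content conversion settings of the sender file channel could look like the fragment below. This is only a sketch under assumptions: the record name Customer, the document name and namespace are borrowed from the Perl example in this thread, and the field names Id, Name and City are hypothetical. With one record per recordset, "Recordsets per Message" effectively becomes the number of records per message.

```
Document Name:            MT_Customers
Document Namespace:       urn:jdbc_connect
Recordset Structure:      Customer,1
Recordsets per Message:   1000

Customer.fieldSeparator:  ,
Customer.endSeparator:    'nl'
Customer.fieldNames:      Id,Name,City
```

With these values a flat file of 200k lines would be turned into about 200 messages of 1000 Customer records each.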

Thanks,

iaki_vila
Active Contributor
0 Kudos

Hi Abdul,

In my opinion, for huge files that need some processing inside PI (ESR mappings), you can split the file with an OS command before the adapter picks it up, as Anupam explains in "Optimum File Size for various file scenarios in PI part-2" (Process Integration, SCN Wiki).

Another way is to avoid a proxy on the receiver side: PI only moves the file from one path to another in chunk mode, into an NFS directory that your ECC endpoint can access.

Regards.