Concurrency at the hot folder level in a clustered environment

Former Member
0 Kudos

We have an NFS share mounted in our hybris project that is used to receive batch interface files delivered over an SFTP connection by external middleware. The folder structure is:

/NFS_DATA/transfert/transfer/incoming/products (product XML files processed through JAXB)
/NFS_DATA/transfert/transfer/incoming/orders (order TXT files processed through the hot folder)
.. other folders ..

We are facing an issue processing the orders in one clustered environment. The process is as follows:
1) The middleware delivers one set of orders every hour, consisting of three TXT files placed in the "/orders" folder: "order-header.txt", "order-lines.txt" and "order-lines-scheduling.txt".
2) A hybris cronjob runs every two hours, picks up all the files inside this folder and renames them to CSV, so that the hot folder can process them now that they are CSV.
3) Every file has its own key, defined in the hot folder ImpEx (sketched just below this list). For example, to process order-header.csv the ImpEx guarantees INSERT_UPDATE table1;order-id[unique=true]; to process order-lines.csv it guarantees INSERT_UPDATE table2;order-id[unique=true];order-line-id[unique=true]; and to process order-lines-scheduling.csv it guarantees INSERT_UPDATE table3;order-id[unique=true];order-line-id[unique=true];order-line-scheduling-id[unique=true]. This is because we want to keep a single state per order, order line and order line scheduling in the corresponding table, always reflecting the last update / last status of that item.
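A minimal sketch of those three headers in ImpEx form, using the same placeholder type names (table1/table2/table3) as above rather than the real item types, with the remaining columns left out:

    INSERT_UPDATE table1; order-id[unique=true]; ...
    INSERT_UPDATE table2; order-id[unique=true]; order-line-id[unique=true]; ...
    INSERT_UPDATE table3; order-id[unique=true]; order-line-id[unique=true]; order-line-scheduling-id[unique=true]; ...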

Well, this had been working fine until we started our tests in a clustered environment with two nodes, app-001 and app-002. The load balancer guarantees that the rename cronjob (step 2) executes each time on a different node, converting the TXT files to CSV, but the hot folder is mapped across both nodes as well, and in a very rare moment something like the following happens:
Hour 0: The middleware delivers one set of orders containing several orders plus order 123.
Hour 1: The middleware delivers another set of orders containing several orders plus order 123.
Hour 2: Hybris renames the TXT files and the hot folder starts processing them. The two nodes compete for the files:

  • Node 1 picks the header file from the first middleware delivery;

  • Node 2 picks the header file from the second middleware delivery. By coincidence, both nodes process order 123 at exactly the same time (down to the millisecond).

At this moment the order gets duplicated in the table. The next time order 123 arrives in a middleware delivery, Hybris cannot update the item (the unique key now matches two rows), the ImpEx is aborted, and all orders below order 123 in the TXT file are left unprocessed. This can occur at header, line and/or line scheduling level.

So, what we need to do in this case is isolate hot folder processing on a single node.

In our project we have only two nodes and we don't want to split responsibilities, for example dedicating one node to front-end processing and the other to batch processing. Both nodes must do the same things, except for the hot folder processing of the /orders folder, which we need to find a way to isolate.

We have looked at infrastructure alternatives, for example unmounting this folder on app-002 and creating a private folder under NFS_DATA for that node, so that it never sees the files, but this option does not seem feasible. Another option could be to force the hot folder to execute on a single node, but that also does not seem feasible.

Does anyone have any insights?

Thanks,

Accepted Solutions (0)

Answers (4)

Former Member
0 Kudos

Hot folder import can be configured to run only on the back office server.

Former Member
0 Kudos

Hi, thanks. What do you mean by "disable file scanning"? I am not sure what you are referring to. Please remember that I want both nodes to keep processing other files; the only folder I use with the hot folder is this /orders folder, and I want only one of the nodes to pick up files at the hot folder level.

former_member537989
Contributor
0 Kudos

Check the updated answer.

former_member537989
Contributor
0 Kudos

Another option could be to force the hot folder to execute on a single node, but that also does not seem feasible

Why? You can disable file scanning on one of the nodes.

Update: You can set different regex patterns for the inbound channel adapters. Both patterns would be read from configuration properties (local.properties), each node would have a different pattern, and only one node would be allowed to read a specific file name (see the sketch below).
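For illustration, assuming the filename-regex of the /orders inbound channel adapter is externalized to a property (the property key and patterns below are made up), the per-node configuration could look like:

    # app-001 local.properties -- this node is allowed to pick up the order CSVs
    orders.hotfolder.filename.regex=^order-.*[.]csv$

    # app-002 local.properties -- a pattern that matches nothing, so this node ignores /orders
    orders.hotfolder.filename.regex=^$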

Former Member
0 Kudos

Thanks for sharing the knowledge, but it is still not clear to me how to achieve it. I want to keep my cronjobs running on both nodes for batch processing, but the one folder, /NFS_DATA/transfert/transfer/folder-x, that uses the hot folder approach to process CSVs must be handled by a single node. Could you please detail your suggestion on how to achieve that?

former_member537989
Contributor
0 Kudos

Look into hot-folder-store--spring.xml, at the inbound adapter settings: filename-regex="^(.*)-(\d+).csv" comparator="fileOrderComparator">

So, as one approach, you can provide that regex from local.properties instead: on node 1 the regex value would accept the specific filename, while on node 2 the value would exclude that specific filename / file format / etc. (see the sketch below).
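A rough sketch of what the adapter could look like with the regex externalized (the bean id, channel name and property key are illustrative; namespace prefixes follow the stock accelerator hot folder config):

    <file:inbound-channel-adapter id="ordersInboundAdapter"
            directory="/NFS_DATA/transfert/transfer/incoming/orders"
            channel="ordersFileChannel"
            comparator="fileOrderComparator"
            filename-regex="${orders.hotfolder.filename.regex}">
        <int:poller fixed-rate="1000"/>
    </file:inbound-channel-adapter>

With per-node property values like the ones sketched earlier, only app-001 would ever match the order files.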

Another approach is to specify your own custom file scanner bean, again driven by a different property value per node.
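As one possible reading of that idea (not necessarily what was meant here), a small Spring Integration FileListFilter driven by a per-node boolean property could be wired into the /orders adapter via its filter attribute; the class, property and wiring names below are hypothetical:

    package com.example.hotfolder;

    import java.io.File;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    import org.springframework.integration.file.filters.FileListFilter;

    // Returns no files on nodes where the (hypothetical) per-node property
    // orders.hotfolder.scan.enabled is false, so that node's /orders adapter
    // never picks anything up; all other hot folders are unaffected.
    public class NodeAwareOrderFileFilter implements FileListFilter<File>
    {
        private final boolean scanningEnabled;

        public NodeAwareOrderFileFilter(final boolean scanningEnabled)
        {
            this.scanningEnabled = scanningEnabled;
        }

        @Override
        public List<File> filterFiles(final File[] files)
        {
            if (!scanningEnabled || files == null)
            {
                return Collections.emptyList();
            }
            return Arrays.asList(files);
        }
    }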