Launch a pipeline when a file arrives in a specific folder

mnduwayo
New Contributor II

Hello everyone,

Is there any way SnapLogic can detect that a new file has arrived in a folder and launch a pipeline? I thought Triggered Tasks could do that, but I don't know how.

Thanks in advance for your help.

1 ACCEPTED SOLUTION

endor_force
New Contributor III

You can fulfill the business need with a pipeline that has a File Poller Snap with the settings polling interval = 30 seconds and polling timeout = -1 minutes, and with the option "Only Output On Change" enabled.

Then create a scheduled task for this pipeline that starts every 1 minute, and make sure it has the option "Do not start a new execution if one is already active" enabled.

This gives you a constantly running pipeline that polls the directory for new files every 30 seconds and processes anything that shows up.

You should also add a step to your pipeline that moves the original file to an archive location, or even deletes the source file, once the sub pipeline that does the processing has completed successfully.
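
Outside SnapLogic, the same pattern (poll a folder, emit only files that are new since the last poll, archive each file after it has been processed) can be sketched in a few lines of Python. This is only an illustration of the logic, not SnapLogic configuration or API: the names `poll_once`, `archive`, `watch`, and the `max_polls` parameter are made up for the example, and `process` stands in for the sub pipeline.

```python
import shutil
import time
from pathlib import Path
from typing import Callable, List, Optional, Set


def poll_once(inbox: Path, seen: Set[str]) -> List[Path]:
    """Return files in `inbox` that were not present on the previous poll."""
    current = {p.name for p in inbox.iterdir() if p.is_file()}
    new_names = sorted(current - seen)
    # Remember only what is there right now, so a file that was archived and
    # is later delivered again under the same name counts as new.
    seen.clear()
    seen.update(current)
    return [inbox / name for name in new_names]


def archive(path: Path, archive_dir: Path) -> Path:
    """Move a processed file out of the watched folder."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / path.name
    shutil.move(str(path), str(target))
    return target


def watch(inbox: Path, archive_dir: Path, process: Callable[[Path], None],
          interval_s: float = 30.0, max_polls: Optional[int] = None) -> None:
    """Poll `inbox` every `interval_s` seconds; max_polls=None runs forever."""
    seen: Set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in poll_once(inbox, seen):
            process(path)               # hypothetical stand-in for the sub pipeline
            archive(path, archive_dir)  # reached only if process() did not raise
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
```

Archiving only after `process` returns means a file whose processing fails stays in the watched folder rather than being moved away unprocessed.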


3 REPLIES

endor_force
New Contributor III

We have a similar need, polling file systems as well as Kafka and JMS topics, and have tried a few different solutions.

Currently we are dependent on constantly running file poller pipelines.
We have separated the file pollers from the processing part. Their only job is to monitor a specific folder or topic and run Pipeline Execute on the appropriate sub pipeline(s) for that data, based on content or file name.
We call them input detection pipelines.
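
The routing idea behind an input detection pipeline can be illustrated with a small Python sketch: match the incoming file name against a list of patterns and hand it to the first handler that fits. The patterns and handler names below are hypothetical and only stand in for sub pipelines; nothing here is SnapLogic API.

```python
from fnmatch import fnmatchcase
from typing import Callable, List, Optional, Tuple

Handler = Callable[[str], str]


def process_orders(file_name: str) -> str:
    # Hypothetical stand-in for an "orders" sub pipeline.
    return f"orders:{file_name}"


def process_invoices(file_name: str) -> str:
    # Hypothetical stand-in for an "invoices" sub pipeline.
    return f"invoices:{file_name}"


# Pattern -> handler table; the first matching pattern wins.
ROUTES: List[Tuple[str, Handler]] = [
    ("orders_*.csv", process_orders),
    ("invoice_*.xml", process_invoices),
]


def route(file_name: str,
          routes: List[Tuple[str, Handler]] = ROUTES) -> Optional[Handler]:
    """Return the handler for `file_name`, or None if no pattern matches."""
    for pattern, handler in routes:
        if fnmatchcase(file_name, pattern):
            return handler
    return None
```

Keeping detection and routing apart from the handlers mirrors the split described above: the poller only decides where a file goes, and the processing logic lives elsewhere.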

Consider whether you really need on-demand, instant processing. Otherwise you are better off running this file poller pipeline as a scheduled task, for example every 5 or 10 minutes.

Pros of constant polling:
Polling is frequent (the file poller checks every 30 seconds).
There is no lead time for preparing or caching a triggered task, since we use Pipeline Execute for processing/transforming the file input.
It handles processing of multiple files well.
It works reliably all the time and has its dedicated slots.

Pro/Con:
Any sub pipeline used within a constantly running pipeline is held in memory. If you change such a sub pipeline, you need to stop the input detection pipeline and let it restart so that it reloads the change. If you are too quick, the previous version may still be loaded, and you will need to stop and start the input detection pipeline again.

Cons:
These constantly running pipelines allocate slots and resources, since they are always active.
Since we try to keep input detection separate per flow and project, the number of file polling pipelines adds up, and we see the available slots decreasing very rapidly.
The tasks that keep the input detection pipelines running need to be scheduled frequently to prevent them from stopping. We have ours set tightly, to run every 1 minute (and not to execute again if already running).
The scheduler will try to start the task every minute, conclude that it is already running, log a failed start, and move on, which also takes some processing power.
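
The "do not start again if already running" behaviour can be pictured as a non-blocking lock: every scheduled attempt tries to take the lock, and if a run already holds it, the attempt is counted as skipped and returns immediately. The sketch below is a generic Python illustration under that assumption; `SingleRunGuard` is a made-up name and this is not how the SnapLogic scheduler is implemented.

```python
import threading
from typing import Callable


class SingleRunGuard:
    """Let at most one run be active; skip start attempts made meanwhile."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.skipped = 0  # number of start attempts rejected so far

    def try_start(self, job: Callable[[], None]) -> bool:
        """Run `job` unless a run is already active; return True if it ran."""
        # Non-blocking acquire: returns False at once if the lock is held.
        if not self._lock.acquire(blocking=False):
            self.skipped += 1
            return False
        try:
            job()
            return True
        finally:
            self._lock.release()
```

Each skipped attempt is cheap, but it is still work the scheduler has to do on every tick, which is the overhead described above.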

We have tried triggered tasks for some flows, but the preparation time is slow, and if there are too many files or calls, the slots quickly fill up, we run out of capacity, and any new executions are discarded with an HTTP error.
An Ultra pipeline would probably solve this, but then you would still have a constantly running pipeline to receive and process the input.

The ideal solution, as I have seen in some other integration platforms, would be for SnapLogic to introduce separate lightweight functions that act as constant input detection for different sources (file systems, Kafka topics, JMS queues, etc.), are decoupled from regular pipelines and slot allocation, and are smart about keeping the poller alive.

Thank you @endor_force for your reply. I have tried a triggered task which runs every ten minutes, but this doesn't meet the business need. I am wondering whether we could use RabbitMQ to detect when a file arrives and then use the Execute Task snap to execute the triggered task, but I don't have any knowledge of how to set up RabbitMQ and use it in SnapLogic. I think it would be a lot of work for me.
