11-27-2018 02:50 PM
I have several legacy file-based interfaces I’m migrating to SnapLogic. Many of them involve polling for new input files on a schedule, usually every 30/60/120 seconds. The ones I’ve done so far use a scheduled task that calls a pipeline containing a File Poller. I’ve tried having the task do all of the iteration while the File Poller runs only once, and I’ve also tried setting the task to run every 30/60/90 minutes and having the File Poller run every 30/60/120 seconds with a timeout equal to the task frequency. I prefer the latter since I can iterate at a higher frequency, but because task scheduling is somewhat approximate, keeping the two schedules in sync is more of a problem.
Is there any general guidance as to how to set up the two schedules, i.e. is it better to schedule a task to run once an hour/day/week and then have the poller do all of the smaller iterations in that larger time-slice? If so, what should the task frequency generally be?
Thanks in advance.
08-19-2022 02:26 PM
Hi,
I am also interested in this question. I hope we can see a response.
08-20-2022 01:21 PM
@marenas - Unless you are running the File Poller snap in an Ultra Task, which will automatically restart the task as soon as it closes, I would recommend that you schedule the task at an interval equal to the maximum “downtime” you can accept if the File Poller terminates abnormally. As of the 4.26 release (Sep '21), SnapLogic enabled the Snaplex-based Scheduler, which means that scheduled task timing and reliability depend on your local Snaplex, and executions should start very close to the selected time.
With that stated, you could set your File Poller to check for file existence as often as you wish, with the timeout set to 59 minutes (for example), then create the Scheduled Task to execute every 5 minutes with the “Do not start a new execution if one is already active” option enabled. This means that the polling would be “down” for at most 5 minutes every hour. Since the scheduler uses local Snaplex resources, you could even schedule the task every minute if you wish.
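To make the timing above concrete, here is a rough Python sketch (not SnapLogic code) that models a scheduler firing every N minutes and skipping a run while one is still active. The function name `simulate_gaps` and the `crash_at` parameter are made up for illustration, and the model assumes the scheduler fires exactly on time.

```python
def simulate_gaps(interval_min, timeout_min, horizon_min, crash_at=None):
    """Return (start, end) minute ranges during which no poller run is active.

    interval_min: how often the scheduler fires (hypothetical model).
    timeout_min:  how long one poller run lasts before it times out.
    crash_at:     optional minute at which the active run dies early.
    """
    gaps = []
    active_until = 0  # minute at which the current run ends
    t = 0
    while t < horizon_min:
        # "Do not start a new execution if one is already active":
        # only start a run when the previous one has ended.
        if t >= active_until:
            if t > active_until:
                gaps.append((active_until, t))
            end = t + timeout_min
            if crash_at is not None and t <= crash_at < end:
                end = crash_at  # abnormal termination cuts the run short
            active_until = end
        t += interval_min
    return gaps


if __name__ == "__main__":
    # 59-minute timeout, task every 5 minutes, three hours simulated.
    print(simulate_gaps(5, 59, 180))
    # Same settings, but the first run dies at minute 21.
    print(simulate_gaps(5, 59, 60, crash_at=21))
```

In this model a normal timeout leaves a one-minute gap each hour, and a crash leaves a gap no longer than the task interval, which is why a shorter interval shrinks the worst case.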
08-22-2022 07:07 AM
@koryknick thank you for your response.
In my sample pipeline I set up a file poller with these properties below:
A scheduled task is set to run daily at 12 midnight (“Do not start a new execution if one is already active” option enabled). I expect this pipeline to keep polling for the matching file and to output only when the contents of the polled directory change. Is that correct? The file has fewer than 100 rows, but should I increase the polling interval to make sure enough time is allocated to process each file?
08-22-2022 09:00 AM
I won’t pretend to be an expert on the File Poller snap - I usually need to play with the settings a bit to get it to work the way I want. I do believe that with “Only Output on Change” enabled, the snap will retrieve an initial set of files in the directory but then won’t output anything else until a file is updated or added to the directory being polled.
With the Polling Timeout of -1, the File Poller snap will not end unless the pipeline is stopped or fails. With Only Output On Change enabled, the Polling Interval is only important if you expect a file to receive further updates while that same file is being processed. Those are things you need to consider when designing how files land and how you process them. You may wish to move files to another location before processing them, to prevent re-capturing files that are in flight.
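For anyone who wants to reason about this outside of SnapLogic, here is a minimal Python sketch of the two ideas above: report only files that are new or changed since the last poll, and move a file to a separate location before processing it. This is a generic analogy, not how the File Poller snap is implemented; the helper names `snapshot`, `changed_files` and `claim` are hypothetical.

```python
import os
import shutil


def snapshot(directory):
    """Map each file name in the directory to (modified time, size)."""
    state = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            st = entry.stat()
            state[entry.name] = (st.st_mtime_ns, st.st_size)
    return state


def changed_files(previous, current):
    """Names that are new, or whose signature differs from the last poll."""
    return sorted(
        name for name, sig in current.items() if previous.get(name) != sig
    )


def claim(directory, processing_dir, name):
    """Move a file out of the polled directory so it is not picked up again."""
    os.makedirs(processing_dir, exist_ok=True)
    target = os.path.join(processing_dir, name)
    shutil.move(os.path.join(directory, name), target)
    return target
```

A polling loop would call `snapshot` once per interval, pass the previous and current results to `changed_files`, and `claim` each reported file before handing it to the processing step. The first poll, compared against an empty dict, reports every file already present, which matches the "initial set of files" behaviour described above.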