
hrender
New Contributor
7 years ago

Scheduled Task + File Poller guidelines

I have several legacy file-based interfaces I’m migrating to SnapLogic. Many of them involve polling for new input files on a schedule, usually every 30/60/120 seconds. So far I’ve built them as a scheduled task that calls a pipeline containing a File Poller. I’ve tried having the task do all of the iteration with the File Poller running only once, and I’ve also tried setting the task to run every 30/60/90 minutes with the File Poller polling every 30/60/120 seconds and a timeout equal to the task frequency. I prefer the latter since I can iterate at a higher frequency, but because task scheduling is somewhat approximate, keeping the two schedules in sync is more of a problem.

Is there any general guidance as to how to set up the two schedules, i.e. is it better to schedule a task to run once an hour/day/week and then have the poller do all of the smaller iterations in that larger time-slice? If so, what should the task frequency generally be?

Thanks in advance.

7 Replies

  • Hi,

    I am also interested in this question. I hope we can see a response.

  • @marenas - Unless you are running the File Poller snap in an Ultra Task, which will automatically restart the task as soon as it closes, I would recommend that you schedule the task for the minimum acceptable “downtime” in the event the File Poller terminates abnormally. As of the 4.26 release (Sep '21), SnapLogic enabled the Snaplex-based Scheduler, which means that scheduled task timing and reliability depend on your local Snaplex and should be very close to the selected timing.

    With that stated, you could set your File Poller to check for file existence as often as you wish, with the timeout set for 59 minutes (for example), then create the Scheduled Task to execute every 5 minutes with the “Do not start a new execution if one is already active” option enabled. This means that the polling would only be “down” at most 5 minutes every hour. Since the scheduler is using local snaplex resources, you could even have the task scheduled every minute if you desire.
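To make the timing arithmetic above concrete, here is a minimal sketch in Python. It is only an idealized model, not SnapLogic code: it assumes the scheduler fires exactly on multiples of the schedule interval, that each execution runs for exactly the polling timeout, and that a tick is skipped while an execution is still active. The function name `simulate_gaps` and its parameters are made up for illustration.

```python
def simulate_gaps(timeout_min: int, schedule_every_min: int, horizon_min: int) -> list[int]:
    """Return the length in minutes of each gap during which no poller is running.

    Assumed model (hypothetical, not SnapLogic's scheduler): the task fires at
    minute 0 and then every `schedule_every_min` minutes, and a tick is skipped
    if the previous execution has not finished yet.
    """
    gaps = []
    active_until = None  # minute at which the current execution ends
    for tick in range(0, horizon_min + 1, schedule_every_min):
        if active_until is not None and tick < active_until:
            continue  # "Do not start a new execution if one is already active"
        if active_until is not None:
            gaps.append(tick - active_until)  # time with no poller running
        active_until = tick + timeout_min
    return gaps


# 59-minute polling timeout, task scheduled every 5 minutes, 4 hours simulated
print(simulate_gaps(59, 5, 240))  # [1, 1, 1, 1]
```

Under these assumptions the gap after a normal timeout is only as long as the wait for the next scheduler tick. The 5-minute figure is the worst case, for example when the poller terminates abnormally just after a tick has been skipped.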

    • marenas
      Contributor

      @koryknick thank you for your response.

      In my sample pipeline I set up a file poller with these properties below:

      A scheduled task is set to run daily at midnight (with the “Do not start a new execution if one is already active” option enabled). I expect the pipeline to keep polling for the matching file and to output only when the contents of the polled directory change. Is that correct? The file has fewer than 100 rows, but should I increase the polling interval to make sure enough time is allocated to process each file?

  • I won’t pretend to be an expert on the File Poller snap - I usually need to play with the settings a bit to get it to work the way I want. I do believe that “Only Output on Change” will retrieve an initial set of files in the directory but then won’t output anything else until a file is updated or added to the directory being polled.
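The behavior described above can be modeled roughly as follows. This is a conceptual sketch in plain Python, not how the File Poller snap is actually implemented: it assumes a change is detected by comparing each matching file's modification time and size between polls, and the names `snapshot` and `changed` are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import Path


def snapshot(directory: str, pattern: str) -> dict[str, tuple[int, int]]:
    """Map each file name matching `pattern` to its (mtime in ns, size) signature."""
    snap = {}
    for path in Path(directory).iterdir():
        if path.is_file() and fnmatch(path.name, pattern):
            stat = path.stat()
            snap[path.name] = (stat.st_mtime_ns, stat.st_size)
    return snap


def changed(previous: dict, current: dict) -> list[str]:
    """Return the names that are new or whose signature differs from the last poll."""
    return sorted(name for name, sig in current.items() if previous.get(name) != sig)
```

On the first poll `previous` is empty, so every matching file is reported. On later polls only added or updated files are reported, and files that do not match the filter pattern never appear in the output.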

    With a Polling Timeout of -1, the File Poller snap will not end unless the pipeline is stopped or fails. With Only Output On Change enabled, the Polling Interval only matters if you expect a file to receive further updates while it is still being processed. Those are things you need to consider in your design of how files are landing and how you are processing them. You may wish to move files to another location to be processed to prevent re-capturing files that are in-flight.
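As an illustration of the "move it before you process it" idea, here is a small Python sketch. In a pipeline this would be done with file snaps rather than Python; the helper name `claim_file` and the inbox/processing directory layout are assumptions made up for the example.

```python
import shutil
from pathlib import Path


def claim_file(source: Path, processing_dir: Path) -> Path:
    """Move `source` into `processing_dir` and return its new path.

    Once the file has left the polled directory, a later poll can no
    longer pick it up again while it is still being processed.
    """
    processing_dir.mkdir(parents=True, exist_ok=True)
    target = processing_dir / source.name
    shutil.move(str(source), str(target))
    return target
```

A caller would poll the inbox, call `claim_file` on each match, and then process the returned path. When both directories are on the same file system the move is typically a rename rather than a copy, so the file leaves the polled directory in one step.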

    • marenas
      Contributor

      @koryknick thank you for the response. I too am not an expert and I really appreciate all your thoughts on this.

      The above settings turned out to be prone to data errors in my case. The directory has over 52k records. When I enable Only Output On Change, the pipeline kicks in every time the contents of the directory change, even when the changes do not relate to the file I am looking for, and then polls continually (due to the polling timeout of -1). It eventually finds the matching file, but the file is not the most recent one. I am trying the configuration below and will adjust it depending on the results. By the way, the scheduled task is set to run every 5 minutes for this.

      I will take note of your recommendation to move files to another location for processing.

  • Do you have the File Filter in the snap configured with a specific filename? You should not get results unrelated to the file you specify.

    • marenas
      Contributor

      Yes, I have a file filter. I am re-enabling “Only Output On Change”, setting the polling timeout to 0, and enabling the exit on first match property. I set up a scheduled task yesterday that runs the pipeline every 5 minutes. So far I don’t see any unexpected results, but I will keep monitoring this.

      Thank you for the follow-up, @koryknick.