Forum Discussion
- ptaylor, 5 years ago, Employee
Directory Browser doesn’t actually read the files. It just returns the list of files in a directory. So you’d have to use a File Reader for each of the files output by the Directory Browser.
I think what you want is the Multi File Reader instead. It combines the Directory Browser and File Reader functionality into one snap, and it’s far more efficient. Be aware that it has limited functionality during Validation: I think it only reads one file. If you Execute the pipeline, it will read them all.
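To make the split between listing and reading concrete, here is a rough Python analogy. It is not SnapLogic code: `browse_directory`, `read_files`, and `read_matching_files` are hypothetical names that only mirror the roles of the Directory Browser, the File Reader, and the combined snap.

```python
from pathlib import Path


def browse_directory(directory, pattern="*"):
    """List matching files without opening them (the Directory Browser role)."""
    return sorted(p for p in Path(directory).glob(pattern) if p.is_file())


def read_files(paths):
    """Open and read each listed file in turn (the File Reader role)."""
    for path in paths:
        yield path.name, path.read_bytes()


def read_matching_files(directory, pattern="*"):
    """Both steps in one call (the role a combined reader plays)."""
    yield from read_files(browse_directory(directory, pattern))
```

Calling `browse_directory` alone gives you only paths; the file contents appear only once something downstream reads each path.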
- koryknick, 5 years ago, Employee
What are your intervals? If less than 5 minutes, I recommend the File Poller approach and move files to a “working” directory to ensure the same files aren’t polled for multiple runs. Keep in mind that the file poll will run continuously until the timeout is reached. So even if it finds files to process, it will send those on to the next snap and continue polling. Depending on how long it takes to process your files, it is possible to poll the same files before you finish, so this may be challenging for you to implement if you aren’t familiar with the usage.
If your interval is more than 5 minutes, you can use a Directory Browser in your pipeline. Create the task as scheduled and enable the “Do not start a new execution if one is already active” option to prevent multiple instances. This is a simpler pattern and easier to implement.
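The "working" directory idea can be sketched in plain Python as follows. This is only an illustration of the pattern, not how the File Poller snap is implemented: `claim_files` and `poll` are hypothetical helpers, and the sketch assumes file names are unique so that a move never overwrites an earlier file.

```python
import shutil
import time
from pathlib import Path


def claim_files(inbox, working, pattern="*"):
    """Move matching files out of the polled directory and return their new paths."""
    working = Path(working)
    working.mkdir(parents=True, exist_ok=True)
    claimed = []
    for path in sorted(Path(inbox).glob(pattern)):
        if path.is_file():
            target = working / path.name  # assumes names do not collide
            shutil.move(str(path), str(target))
            claimed.append(target)
    return claimed


def poll(inbox, working, pattern="*", interval=1.0, timeout=10.0):
    """Yield claimed files as they are found and keep polling until the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        for path in claim_files(inbox, working, pattern):
            yield path  # passed downstream immediately; polling continues
        if time.monotonic() >= deadline:
            break
        time.sleep(interval)
```

Because each file is moved before it is handed downstream, a later poll of the same directory cannot pick it up again, even if processing is still in progress.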
- skatpally, 5 years ago, Former Employee
Kory, the File Poller Snap has an Only Output on Change check box. With it selected, the Snap produces output only when there is a change.
From Documentation
Only Output on Change
Select this check box to instruct the Snap to provide an output only when there is a change in the contents of the polled directory. When selected, the Snap provides an output during its initial run if it finds matching documents. However, it provides polling results in the next run only if the polled directory has newer files that match the pattern specified.
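One way to read that description, sketched in Python. This is an interpretation of the quoted text, not the Snap's actual logic: `poll_results` is a hypothetical function, each poll is modeled as a dict mapping file name to modification time, and "newer files" is taken to mean a modification time later than any seen so far.

```python
def poll_results(snapshots, only_output_on_change=True):
    """Yield the file listing for each poll that should produce output.

    snapshots: one dict per poll, mapping file name -> modification time.
    """
    newest_seen = None
    for snapshot in snapshots:
        if not snapshot:
            continue  # nothing matched on this poll, so nothing to output
        newest = max(snapshot.values())
        unchanged = newest_seen is not None and newest <= newest_seen
        if only_output_on_change and unchanged:
            continue  # no newer files since the last output
        newest_seen = newest
        yield sorted(snapshot)


polls = [{"a.csv": 1}, {"a.csv": 1}, {"a.csv": 1, "b.csv": 2}]
changes_only = list(poll_results(polls))
every_poll = list(poll_results(polls, only_output_on_change=False))
```

Under this reading, `changes_only` holds two listings (the initial run and the poll where `b.csv` appeared), while `every_poll` holds all three.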
- Thom, 3 years ago, New Contributor II
This post is a few years old but I wanted to get clarity on this statement. You seemed to indicate that at the end of each polling interval, the output is passed downstream and polling will continue until the time out. That has not been my experience. The output for me was only passed on once the timeout was hit. Items were added to the output at each polling event if they were added to the file location being polled but not until the timeout is reached does the output move on.
Is there a configuration for the snap that will achieve the behavior you experienced?
I have experimented with different combinations of “Only Output on Change” and “Exit on first matches” but have not achieved the behavior you describe.
Has your experience changed with this File Poller snap?
Thanks,
- koryknick, 5 years ago, Employee
Thanks @skatpally - I should have double-checked the documentation on the File Poller snap. It’s been a while since I’ve used it.
@mramaswamy - back to your original question. I think the Multi-File Reader snap is probably your simplest option to do what it sounds like you are doing.
The File Poller is typically used in an “always on” situation to poll continuously during a pipeline execution, for example in an Ultra Pipeline where it might poll the directory many times a minute, 24 hours a day.
- mramaswamy, 5 years ago, New Contributor II
Thanks for all your help. This is really helpful. I have not tried the Multi-File Reader. At this point we are comfortable with Directory Browser + File Reader + Task scheduler. I will keep you posted after we try the Multi-File Reader.
- koryknick, 3 years ago, Employee
If you are trying to view the behavior in Pipeline Validation, it gets a bit confusing because you don’t see the data preview until it either times out or outputs enough documents to satisfy your Document Preview Count in your User Settings.
You should be able to see the correct behavior in the Execution Statistics in Dashboard… i.e. each time it picks up a file, the output document count on the File Poller snap will increase.
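A small Python analogy for why the preview can look like everything arrives at once. This is not SnapLogic code: `file_poller` and `validation_preview` are hypothetical stand-ins, and each inner list represents the files found on one poll.

```python
from itertools import islice


def file_poller(polls):
    """Emit each file as soon as the poll that found it completes."""
    for found in polls:
        for name in found:
            yield name


def validation_preview(documents, preview_count):
    """Collect documents until the count is met or the stream ends (the timeout)."""
    return list(islice(documents, preview_count))


polls = [["a.csv"], [], ["b.csv", "c.csv"]]
early = validation_preview(file_poller(polls), 2)   # stops once 2 documents arrive
late = validation_preview(file_poller(polls), 50)   # only returns when polling ends
```

The poller emits documents one at a time in both calls; the difference is only in when the collecting side has enough to show something.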