Forum Discussion
It’s almost the same.
Take a look at this pipeline. In both cases, I’m using an extra XML Formatter and XML Parser for the purpose of XSD validation.
Convert XML to CSV_2020_11_15.slp (20.7 KB)
/Igor
Thanks. I tried it with the attached example XML and XSD file, but the CSV Formatter failed, reporting that the data needs to be flattened (it is receiving the output of the JSON Parser). The CSV requirement is the same.
The XML files are read from a folder, and some elements may or may not be present in a given file. If an element is present, the CSV file should contain its header and value; otherwise, the CSV file should still contain the element header, with a blank value.
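To make the requirement concrete outside of SnapLogic, here is a minimal Python sketch of the same idea: flatten each XML document against a fixed column list so that absent elements become blank values under a constant header. The element names (OrderId, Title, Quantity) are hypothetical placeholders, not taken from the attached XSD, and the sketch assumes the elements are direct children of the root.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column list; the real names would come from the XSD.
COLUMNS = ["OrderId", "Title", "Quantity"]


def xml_to_row(xml_text, columns=COLUMNS):
    """Flatten one XML document into a dict, using "" for absent elements."""
    root = ET.fromstring(xml_text)
    row = {}
    for name in columns:
        node = root.find(name)  # direct child of the root, or None
        # A missing element and an empty element both become a blank value.
        row[name] = (node.text or "").strip() if node is not None else ""
    return row


def rows_to_csv(rows, columns=COLUMNS):
    """Write rows as CSV with a fixed header, whichever elements appeared."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In a pipeline, the equivalent step would be a mapping that lists every expected column explicitly before the CSV Formatter, so that the header does not depend on which elements happen to be in a given file.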
BOOKORDER_XSD and XML.txt (4.1 KB)
Directory Browser doesn’t actually read the files. It just returns the list of files in a directory. So you’d have to use a File Reader for each of the files output by the Directory Browser.
I think what you want is the Multi File Reader instead. It basically combines the Directory Browser + File Reader functionality into one snap and it’s far more efficient. But be aware that it has limited functionality during Validation; I think it only reads one file. If you Execute the pipeline, it will read them all.
What are your intervals? If less than 5 minutes, I recommend the File Poller approach and move files to a “working” directory to ensure the same files aren’t polled for multiple runs. Keep in mind that the file poll will run continuously until the timeout is reached. So even if it finds files to process, it will send those on to the next snap and continue polling. Depending on how long it takes to process your files, it is possible to poll the same files before you finish, so this may be challenging for you to implement if you aren’t familiar with the usage.
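The "working" directory idea can be sketched in plain Python as follows. This is only an illustration of the pattern, not how the File Poller snap works internally; the function name, directory arguments, and the `*.xml` pattern are assumptions made for the example.

```python
import shutil
from pathlib import Path


def claim_files(inbox, working, pattern="*.xml"):
    """Move matching files from inbox to working and return their new paths.

    Once a file has been moved, a later poll of the inbox can no longer
    pick it up, so each file is handed downstream at most once.
    """
    inbox, working = Path(inbox), Path(working)
    working.mkdir(parents=True, exist_ok=True)
    claimed = []
    for path in sorted(inbox.glob(pattern)):
        if not path.is_file():
            continue  # skip sub-directories that happen to match
        target = working / path.name
        shutil.move(str(path), str(target))
        claimed.append(target)
    return claimed
```

Calling `claim_files` a second time before new files arrive returns an empty list, which is the property you want when polls can overlap with processing.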
If your interval is more than 5 minutes, you can use a Directory Browser in your pipeline. Create the task as scheduled and enable the “Do not start a new execution if one is already active” option to prevent multiple instances. This is a simpler pattern and easier to implement.
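For anyone running a similar job outside of a scheduled task, the "do not start a new execution if one is already active" behavior can be approximated with a lock file. This is a rough sketch under that assumption, not a description of how the scheduler implements the option; `single_run` and the lock path are hypothetical names.

```python
import os
from contextlib import contextmanager


@contextmanager
def single_run(lock_path):
    """Yield True if this run acquired the lock, False if another run holds it."""
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        yield False
        return
    try:
        yield True
    finally:
        # Release the lock so the next scheduled run can start.
        os.close(fd)
        os.remove(lock_path)
```

A run would wrap its work in `with single_run(path) as acquired:` and simply exit when `acquired` is false. Note that a crashed run can leave a stale lock file behind, which a real setup would need to handle.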
Kory, the File Poller Snap has an Only Output on Change check box. When it is selected, the Snap produces output only when there is a change.
From the documentation:
Only Output on Change
Select this check box to instruct the Snap to provide an output only when there is a change in the contents of the polled directory. When selected, the Snap provides an output during its initial run if it finds matching documents. However, it provides polling results in the next run only if the polled directory has newer files that match the pattern specified.
This post is a few years old, but I wanted to get clarity on this statement. You seemed to indicate that at the end of each polling interval, the output is passed downstream and polling continues until the timeout. That has not been my experience. For me, the output was only passed on once the timeout was hit. Items were added to the output at each polling event if they had been added to the file location being polled, but the output did not move on until the timeout was reached.
Is there a configuration for the snap that will achieve the behavior you experienced?
I have experimented with different combinations of “Only Output on Change” and “Exit on first matches” but have not achieved the behavior you describe.
Has your experience changed with this File Poller snap?
Thanks,
Thanks @skatpally - I should have double-checked the documentation on the File Poller snap. It’s been a while since I’ve used it.
@mramaswamy - back to your original question. I think the Multi-File Reader snap is probably your simplest option to do what it sounds like you are doing.
The File Poller is typically used in an “always on” situation to poll continuously during a pipeline execution, for example in an Ultra Pipeline where it might poll the directory many times a minute, 24 hours a day.
- mramaswamy, 5 years ago, New Contributor II
Thanks for all your help. This is really helpful. I have not tried the Multi File Reader. At this point we are comfortable with Directory Browser + File Reader + a scheduled task. I will keep you posted after we try the Multi File Reader.
If you are trying to view the behavior in Pipeline Validation, it gets a bit confusing because you don’t see the data preview until it either times out or outputs enough documents to satisfy your Document Preview Count in your User Settings.
You should be able to see the correct behavior in the Execution Statistics in Dashboard… i.e. each time it picks up a file, the output document count on the File Poller snap will increase.