01-08-2018 12:16 PM
Hi All,
I am trying to connect to an SFTP server and fetch multiple files from a remote source directory. The File Reader Snap has an option to set an SFTP account, but I can't find how to provide the remote directory path.
In the remote directory, we receive different types of files, such as *.txt, and I need to filter on the filenames to tell them apart and invoke a different process for each type to load it into a SQL DB.
Once I have fetched the files from the remote location, how do I delete them?
I would appreciate your input and any relevant links with more details. Let me know if you need any further information.
Thanks in advance,
Amar
01-09-2018 09:41 PM
Hi Amar,
Let's start with the Multi File Reader Snap. Create a new one, then create a new account in the Account tab. For one of our internal test accounts, I chose basic auth, and I was prompted for a user name and password, which I provided. Next, in the main Settings tab of the Snap, I entered the following format under file/folder: sftp://hostName/DirectoryStructure/TargetFolder/
Then in the wildcard field I put ".csv", and I was able to run successfully. If I look at my output as JSON, I can see "content-location", which includes the absolute path to the file (including the file name). From this point, you should be able to use a BinaryToDocument Snap to operate on the document data. But let's see how things go up to this point first.
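To make the folder-plus-wildcard idea concrete, here is a minimal Python sketch of how a folder URL and a glob-style wildcard might select files. It is not how the Snap is implemented; the file names are made up, and the pattern "*.csv" is an assumption about the intended wildcard.

```python
from fnmatch import fnmatchcase

# Folder URL in the shape described above; host and folders are placeholders.
FOLDER_URL = "sftp://hostName/DirectoryStructure/TargetFolder/"

# Assumed glob pattern (hypothetical, chosen for illustration).
WILDCARD = "*.csv"

# Hypothetical directory listing; a real run would get this from the server.
remote_names = ["orders.csv", "notes.txt", "returns.csv"]


def select_files(folder_url, wildcard, names):
    """Return full URLs for the names that match the wildcard."""
    base = folder_url.rstrip("/")
    return [f"{base}/{name}" for name in names if fnmatchcase(name, wildcard)]


matches = select_files(FOLDER_URL, WILDCARD, remote_names)
for url in matches:
    print(url)
```

Each selected URL plays the role of the "content-location" value: the folder path with the file name appended.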
Thanks, and apologies for the delay.
-Charlie
01-09-2018 11:05 PM
Thank you, Charlie. I followed your steps for the Multi File Reader Snap and was able to fetch files from the remote directory.
Attached is a screenshot for your reference.
I also used the BinaryToDocument Snap and tried all the options under the encode or decode parameter, but I am getting the message below in the dialog window:
"Preview data is not found or too large for browser to decrypt"
I would appreciate your next steps for getting the document data.
Thanks,
Amar
01-09-2018 11:57 PM
Hi Amar, glad you've had success so far! I experienced the same thing when I connected the BinaryToDocument Snap to a Multi File Reader. However, if I connect it to a File Reader and specify one specific file, I get preview data. This may be a validation mode limitation.
The output of the Multi File Reader shows me JSON data with attributes about the files, including the file handle. The handle is in the "content-location" field.
If you've hooked up your BinaryToDocument Snap…
Depending on what you want to do with the documents, you'll want to encode/decode differently. If you want to pass around raw bytes, you can choose "BYTE_ARRAY", but if you want to parse your txt files, you will probably want to use "NONE" encoding. What this actually does behind the scenes is push the data into a String format with the default charset of your node (likely UTF-8 depending on the file, but I digress).
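As a rough illustration of the difference between keeping raw bytes and decoding to a string, here is a small Python sketch. It only mirrors the idea; it is not the Snap's internal code, and the sample content and the UTF-8 charset are assumptions.

```python
# Hypothetical content of a small tab-delimited text file, as raw bytes.
raw = "id\tregion\n1\tEMEA\n".encode("utf-8")

# Keeping the payload as bytes: fine for passing data around untouched,
# but awkward to parse.
first_bytes = list(raw[:4])

# Decoding with an assumed charset (UTF-8 here) gives text you can parse.
text = raw.decode("utf-8")
rows = [line.split("\t") for line in text.splitlines()]

print(first_bytes)
print(rows)
```

If the files use a different charset than the one assumed for decoding, the decoded text can be wrong or the decode can fail, which is why the node's default charset matters.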
The next move is to hook up a Mapper Snap. In the expression field, enter a dollar sign (make sure the equals sign is checked). Do the same for the target path field.
Now validate the pipeline once. It might take a moment to run. When you open the Mapper Snap (not the preview data), you can see the schema on the left-hand side. Using this, you could map "content-location" to some other "name" field that you should be able to grab from the output of the Mapper during the actual execution. I believe this field could be used as the input to the File Delete Snap.
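In plain Python terms, the mapping described above might look like the sketch below. The document shape and the sample path are assumptions for illustration; only the "content-location" and "name" field names come from this thread.

```python
def map_document(doc):
    """Pass the document through and copy "content-location" into "name"."""
    mapped = dict(doc)  # equivalent in spirit to mapping $ to $
    mapped["name"] = doc["content-location"]
    return mapped


# Hypothetical incoming document; real documents will carry more fields.
incoming = {
    "content-location": "sftp://hostName/DirectoryStructure/TargetFolder/sample.csv",
    "content": "id\tregion\n1\tEMEA\n",
}

outgoing = map_document(incoming)
print(outgoing["name"])
```

A downstream delete step would then read the "name" field to know which file to remove.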
Let me know how this works out, and we can go further.
Thanks,
-Charlie
01-10-2018 05:04 PM
Thank you, Charlie. I was able to succeed with the three Snaps. Attached are two screenshots for your reference.
Can you please guide me on how to extract data from the source file (it is a tab-delimited file) and map it to a SQL table?
Also, how do I extract the filename from the file path? I want to extract a region from the filename as well.
I appreciate your support.
01-11-2018 11:10 AM
Hi Amar,
Sorry for the delay. After reading about the two things you want to do (parse data from the filename and parse the file data), I had a conversation with a colleague to verify some of my thoughts and get more advice. We may need to create two pipelines, "A" and "B". In Pipeline "A", we switch from the Multi File Reader to the Directory Browser, because the Directory Browser gives non-binary data. The resulting document will contain fields like name, type, size, and path. You could hook this into a Pipeline Execute Snap, which will execute Pipeline "B". You can pass this Snap the "path" field.
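For the filename part, here is a hedged Python sketch of pulling the filename out of a "path" value and a region out of the filename. The naming convention (prefix_REGION_yyyymmdd.txt) and the sample path are invented for illustration, since the thread does not say how the files are named.

```python
import posixpath
import re

# Hypothetical naming convention: <prefix>_<REGION>_<yyyymmdd>.txt
REGION_PATTERN = re.compile(r"^[^_]+_(?P<region>[A-Za-z]+)_\d{8}\.txt$")


def describe(path):
    """Return the file name and region (or None) for a given path."""
    name = posixpath.basename(path)
    match = REGION_PATTERN.match(name)
    region = match.group("region") if match else None
    return {"path": path, "name": name, "region": region}


# Made-up example path in the same shape as the folder URL used earlier.
info = describe("sftp://hostName/DirectoryStructure/TargetFolder/sales_EMEA_20180110.txt")
print(info["name"], info["region"])
```

Files that do not follow the assumed convention come back with a region of None, which gives a natural place to route them to a separate process.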
In Pipeline "B", you can use a File Reader Snap to read the file at the "path" field. You said the returned file is a .txt file, but since it is delimited with tabs, it can be treated like a CSV file. So you should be able to hook up a CSV Parser Snap to this File Reader and then process the data in Pipeline "B".
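The idea that a tab-delimited .txt file is just CSV with a different delimiter can be sketched in Python as follows. The column names and values are made up; this illustrates the parsing step only, not the Snap itself.

```python
import csv
import io

# Hypothetical tab-delimited content, as it might arrive from the file read.
sample_text = "id\tregion\tamount\n1\tEMEA\t10.50\n2\tAPAC\t7.25\n"


def parse_tab_delimited(text):
    """Parse tab-delimited text into one dict per row, keyed by the header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


records = parse_tab_delimited(sample_text)
for record in records:
    print(record)
```

Each resulting record corresponds to one row that could then be mapped to the columns of the SQL table.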
You could also increase the pool size and let the Pipeline Execute Snap run multiple executions in parallel. But for starters, let's see how it runs with one thread.
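As a loose analogy for the pool size setting, the Python sketch below runs a stand-in for Pipeline "B" once per path using a thread pool. The function names and paths are hypothetical, and this is not how the Pipeline Execute Snap is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Start with one worker, as suggested above; raise it to run in parallel.
POOL_SIZE = 1


def run_pipeline_b(path):
    """Stand-in for one execution of Pipeline "B" on a single file."""
    return {"path": path, "status": "processed"}


def run_all(paths, pool_size=POOL_SIZE):
    """Run the stand-in once per path with at most pool_size workers."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() returns results in the same order as the input paths.
        return list(pool.map(run_pipeline_b, paths))


# Made-up paths for illustration.
paths = [
    "sftp://hostName/TargetFolder/a.txt",
    "sftp://hostName/TargetFolder/b.txt",
]
results = run_all(paths)
print(results)
```

Starting with a pool size of one makes failures easier to trace before adding parallelism.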
Thanks,
-Charlie