01-08-2018 12:16 PM
Hi All,
I am trying to connect to an SFTP server and fetch multiple files from a remote source directory. The File Reader Snap has an option to set an SFTP account, but I can't find where to provide the remote directory path.
In the remote directory we receive different types of files, like *.txt, and I need to apply filters based on the file names to tell them apart and invoke a different process for each to load the data into a SQL DB.
Once I have fetched the files from the remote location, how do I delete them?
I'd appreciate your input and any relevant links with more details. Let me know if you need any further details.
Thanks in advance,
Amar
01-09-2018 09:41 PM
Hi Amar,
Let’s start with the Multi File Reader Snap. Create a new one, and then create a new account in the account tab. For one of our internal test accounts, I chose basic auth, which prompts for a user name and password. Next, in the main settings tab of the Snap, I enter the file/folder in the following format: sftp://hostName/DirectoryStructure/TargetFolder/
Then in the wildcard field I put “*.csv”, and I am able to run successfully. If I look at my output as JSON, I can see a “content-location” field, which holds the absolute path to the file, including the file name. From this point, you should be able to use a BinaryToDocument Snap to operate on the document data. But let’s see how things go up to this point first.
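To make the folder-plus-wildcard idea concrete, here is a minimal Python sketch of how a glob-style pattern selects file names under a folder URL. It only illustrates the matching logic and is not how the Snap is implemented; the file names in the listing are made up, and `select_files` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Stand-ins for the Snap's file/folder and wildcard settings.
FOLDER_URL = "sftp://hostName/DirectoryStructure/TargetFolder/"
WILDCARD = "*.csv"


def select_files(names, pattern):
    """Return the names that match a glob-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Made-up directory listing, for illustration only.
listing = ["sales_east.csv", "sales_west.csv", "notes.txt"]
matched = [FOLDER_URL + name for name in select_files(listing, WILDCARD)]
print(matched)
```

The same helper could be called with different patterns to route different file types to different processes.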
Thanks, and apologies for the delay.
-Charlie
01-09-2018 11:05 PM
Thank you, Charlie. I followed your steps for the Multi File Reader Snap and was able to fetch files from the remote directory.
Screenshot attached for your reference.
I also used the BinaryToDocument Snap and tried all the options under the encode or decode parameter, but I am getting the message below in the dialog window.
“Preview data is not found or too large for browser to decrypt”
I'd appreciate your guidance on the next steps to get the document data.
Thanks,
Amar
01-09-2018 11:57 PM
Hi Amar, glad you’ve had success so far! I experienced the same thing when I connected the BinaryToDocument Snap to a Multi File Reader. However, if I connect it to a File Reader and specify one specific file, I get preview data. This may be a limitation of validation mode.
The output of the Multi File Reader shows me JSON data with attributes about the files, including the file handle. The handle is in the “content-location” field.
If you’ve hooked up your BinaryToDocument Snap…
Depending on what you want to do with the documents, you’ll want to encode/decode differently. If you want to pass around raw bytes, you can choose “BYTE_ARRAY”, but if you want to parse your txt files, you will probably want the “NONE” encoding. Behind the scenes, this pushes the data into a String using the default charset of your node (likely UTF-8, depending on the file, but I digress).
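As a conceptual analogy only (this is plain Python, not the Snap's code), the difference between keeping raw bytes and decoding to a string looks roughly like this. The sample content and the UTF-8 charset are assumptions for illustration.

```python
# Made-up file content: one header line and one tab-delimited record.
raw = "region\tamount\nEMEA\t42\n".encode("utf-8")

# Raw-bytes handling: the payload is passed along untouched.
as_bytes = raw

# Text handling: the bytes are decoded into a string with a charset,
# which is what makes line- and field-level parsing possible.
as_text = raw.decode("utf-8")

print(type(as_bytes).__name__, len(as_bytes))
print(as_text.splitlines())
```

Once the content is a string, it can be split into lines and fields; raw bytes are better suited to moving the file around unchanged.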
The next move is to hook up a Mapper Snap. In the expression field, enter a dollar sign (make sure the equals sign is checked). Do the same for the target path field.
Now validate the pipeline once; it might take a moment to run. When you open the Mapper Snap itself (not the preview data), you can see the schema on the left-hand side. Using this, you can map “content-location” to some other “name” field that you should be able to grab from the output of the Mapper during the actual execution. I believe this field could then be used as the input to the File Delete Snap.
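In plain Python terms, the mapping described above amounts to passing the document through and copying one field under a new key. This is only a sketch: the flat document layout, the sample path, and the `map_document` helper are assumptions, and the real document may nest “content-location” differently.

```python
def map_document(doc):
    """Pass the document through unchanged and add a 'name' field
    copied from 'content-location'."""
    mapped = dict(doc)  # analogous to mapping the whole input to the output
    mapped["name"] = doc["content-location"]
    return mapped


# Made-up input document, for illustration only.
sample = {
    "content-location": "sftp://hostName/DirectoryStructure/TargetFolder/sales_east.txt"
}
result = map_document(sample)
print(result["name"])
```

A downstream step, such as a delete, would then read the “name” field instead of the original one.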
Let me know how this works out, and we can go further.
Thanks,
-Charlie
01-10-2018 05:04 PM
Thank you, Charlie. I was able to succeed with three Snaps. Two screenshots are attached for your reference.
Can you please guide me on how to extract data from the source file (it is a tab-delimited file) and map it to a SQL table?
Also, how do I extract the file name from the file path? I want to extract a region from the file name as well.
Appreciate your support.
01-11-2018 11:10 AM
Hi Amar,
Sorry for the delay. After reading about the two things you want to do (parse data from the file name and parse the data in the file), I had a conversation with a colleague to verify some of my thoughts and get more advice. We may need to create two pipelines, “A” and “B”. In Pipeline “A”, we switch from the Multi File Reader to the Directory Browser, because the Directory Browser gives non-binary data. The resulting document will contain fields like name, type, size, and path. You could hook this into a Pipeline Execute Snap, which will execute Pipeline “B”, and pass the “path” field to that Snap.
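On pulling the file name and a region out of the “path” field, the logic can be sketched in Python as below. The naming convention (`<prefix>_<region>.txt`) is purely an assumption, since the actual file names were not shared, and both helper functions are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def file_name_from_path(path):
    """Return the last path segment, e.g. 'sales_EMEA.txt'."""
    return posixpath.basename(urlparse(path).path)


def region_from_name(name):
    """Return the text after the last underscore, without the extension.
    Assumes a '<prefix>_<region>.<ext>' naming convention; if there is
    no underscore, the whole stem is returned."""
    stem = posixpath.splitext(name)[0]
    return stem.rsplit("_", 1)[-1]


# Made-up path, for illustration only.
path = "sftp://hostName/DirectoryStructure/TargetFolder/sales_EMEA.txt"
name = file_name_from_path(path)
print(name, region_from_name(name))
```

If the real file names follow a different convention, only `region_from_name` would need to change.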
In Pipeline “B”, you can use a File Reader Snap to read the file at “path”. You said the returned file is a .txt file, but since it is delimited with tabs, it can be treated like a CSV file. So you should be able to hook up a CSV Parser Snap to this File Reader, with the delimiter set to a tab. Then you should be able to process the data in Pipeline “B”.
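Treating a tab-delimited .txt file like CSV simply means reading it with a tab as the field delimiter. A minimal Python sketch of that idea follows; the column names and values are made up, and `parse_tab_delimited` is a hypothetical helper rather than anything from the product.

```python
import csv
import io


def parse_tab_delimited(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


# Made-up file content, for illustration only.
sample = "region\tamount\nEMEA\t42\nAPAC\t17\n"
rows = parse_tab_delimited(sample)
for row in rows:
    print(row["region"], row["amount"])
```

Each resulting row is keyed by column name, which is the shape you would want before mapping fields to SQL table columns.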
You could also ramp up the pool size, and let the Pipeline Execute Snap run multiple executions in parallel. But for starters, let’s see how it runs with one thread.
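The effect of the pool size can be pictured with a small Python sketch: the same per-file work runs either one at a time or several at once. This is an analogy using a thread pool, not a description of how Pipeline Execute works internally; `run_pipeline_b` is a hypothetical stand-in for one execution of Pipeline “B”.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline_b(path):
    """Hypothetical stand-in for processing a single file."""
    return f"processed {path}"


def run_all(paths, pool_size=1):
    """Run the per-file work with at most pool_size executions in parallel.
    Results come back in the same order as the input paths."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(run_pipeline_b, paths))


# Made-up paths, for illustration only.
paths = ["/in/sales_EMEA.txt", "/in/sales_APAC.txt"]
print(run_all(paths, pool_size=1))  # one at a time
print(run_all(paths, pool_size=4))  # up to four in parallel
```

Starting with a pool size of 1 makes problems easier to trace before adding parallelism.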
Thanks,
-Charlie