Connect SFTP server to fetch multiple files

Hi All,

I am trying to connect SFTP server and fetch multiple files from the source remote directory. File Reader snap has an option to get SFTP account but doesn’t find how to provide remote directory path.

In the remote directory, we are getting different types of files like *.txt where I need to apply filters based on the filenames to differentiate and invoke the different process to load into SQL DB.

Once I fetch the files from remote location how to delete the files.

Appreciate your inputs and any relevant links to look into more details. Let me know if you need any further details.

Thanks in advance,
Amar

Hi Amar,

I believe that the first challenge could be solved by using the FilePoller, and use the file filter option. This Snap will produce output containing the paths of the *.txt files, but not the actual files. After that, you could potentially use a ForEach Snap to process the path to the *.txt file. I need to confer with my teammates to see how one would go about deleting the files.

Hope this gets you off to a good start!
Thanks,
-Charlie

Thank you, Charlie for the quick response. I’ve tried FilePoller and fetch four files from the remote SFTP directory. FilePoller output showing four different paths for each individual file.

As a second snap, I have tried ForEach Snap but could not able to process the path. I might be giving wrong parameters. Attached ForEach Snap. Could you please guide how to process each individual file further.

Thanks again
Amar

Hi Amar,
Sorry for the delay. I was thinking more about your scenario, and I think I have a better suggestion. Since you may not need to “poll” for the files, I am thinking a different Snap would be better.

Say I want to boil the problem down to grabbing the files, and then deleting them…You could use a MultiFileReader Snap, with a file filter to get the desired types of file. The output will be a binary data stream.

You could connect this to a BinaryToDocument Snap and then perform operations on that document. At some point you would need to use a Mapper to key on the filename and pass that to a FileDelete Snap.

For the simpler case of just say…reading files and deleting them for proof of concept, you could use a DirectoryBrowser instead of the MultiFileReader. The DirectoryBrowser would return you attributes about all of the files that match the type filter (essentially a List in document format versus binary). Then you’d plug this into a Mapper Snap, to map the filename. Then you’d plug this into a Delete Snap where you configure it to read the file name from the mapped variable.

I hope this helps you get further towards your goal!

Thanks,
-Charlie

Hi Charlie,

Thank you, for the giving other options. I am still struggling to process the files from the first snap.
I am providing my findings for your reference. Appreciate your inputs further.

MultiFileReader Snap is not taking sftp path under folder/file section. I am getting below exception:
The same path is working other Snaps but not in MultiFileReader Snap. Provide your suggestions based on the error.

Multi File Reader( Multi File Reader )
Reads binary data from an endpoint (e.g. file, sldb, s3, sftp, ftp, adl, etc).

Snap has the following errostrong textrs. Please open the info box and edit the highlighted fields.
Could not compile expression: _sftp://ftp.ext.hpe.com:2222/_SC … (Reason: Encountered extraneous input ‘:’ at line 1:4; expecting={, ‘||’, ‘&&’, ‘^’, ‘==’, ‘!=’, ‘>’, ‘<’, ‘>=’, ‘<=’, ‘+’, ‘-’, ‘*’, ‘/’, ‘%’, ‘?’, ‘instanceof’, ‘in’, ‘[’, ‘(’, ‘.’, PropertyRef}; Resolution: Please check expression syntax)

I’ve tried DirectoryBrowser and able to fetch files from the SFTP server and used mapper snap to take filename into the variable. If I use file delete snap again it is throwing same error as multi-file reader snap exception. The file parameter is not taking SFTP path with variable filename.

Appreciate your further support and also let me know how to extract data from the original file. The attributes from the directory browser snap showing bytes and name only.

Thanks in advance,
Amar

Hi Amar,
Let’s first start with the Multi File Reader Snap. If you create a new one, and then in the account tab, create a new account. For one of our internal test accounts, I chose basic auth. I am prompted for user name password, which I give. Next in the main settings tab of the Snap, I input the following under file/folder the format: sftp://hostName/DirectoryStructure/TargetFolder/
Then in the wildcard I put “.csv” And I am able to run successfully. If I look at my output as JSON, I can see “content-location” which includes the absolute path to the file name (including the file name). From this point, you should be able to use a BinaryToDocument Snap in order to operate on the document data. But let’s see how things go up to this point first.

Thanks, and apologies for the delay.
-Charlie

1 Like

Thank you, Charlie. I’ve followed your steps for the Multi File Reader Snap and able to succeed to fetch files from the remote directory.
Attached screenshot for your ref.

I also used BinaryToDocument snap and tried all options under encode or decode parameters. But I am getting below message in the dialogue window.

“Preview data is not found or too large for browser to decrypt”

Appreciate your next steps to get the document data.

Thanks,
Amar

Hi Amar, glad you’ve had success so far! I experienced the same when I connect the BinaryToDocument to a Multi File Reader. However, if I connect it to a FileReader and specify one specific file, I get preview data. This may be a validation mode limitation.

So the output of the MultiFile Reader is showing me JSON data with attributes about the files, including the file handle. The handle is in the “content-location” field.

If you’ve hooked up your BinaryToDocument Snap…
Depending on what you want to do with the documents, you’ll want to encode/decode differently. If you want to pass around raw bytes, then you can choose “BYTE_ARRAY” but if you want to do some parsing of your txt files, then you probably will want to use “NONE” encoding. What this is actually doing behind the scenes is pushing the data into a String format with the default charset of your node (likely UTF-8 depending on the file, but I digress).

The next move is to hook up a Mapper Snap. In the expression field, enter a dollar sign (make sure the equals sign is checked). Same with the target path field.

Now validate the pipeline one time. Might take a moment to run. When you open up the Mapper Snap (not the preview data) You can see the schema in the left hand side. Using this, you could map the “content-location” to some other “name” field that you should be able to grab from the output of the Mapper during the actual execution. This field could be keyed on as the input to the File Delete I believe.

Let me know how this works out, and we can go further.
Thanks,
-Charlie

1 Like

Thank you, Charlie. I able to succeed with three snaps. Attached two screenshots for your ref.

Can you please guide me how to extract data from the source file (it is the tab delimited file) to map to SQL table.
How to extract the filename from the filepath. I want to extract a region from filename as well.

Appreciate your support.

Hi Amar,

Sorry for the delay. After reading about the two things that you want to (parse data from the filename and parsing the data) I had a conversation with a colleague to verify some of my thoughts, and get more advice. We may need to create two pipelines “A” and “B” in pipeline A We switch from MultiFileReader to DirectoryBrowser. This is because Directory Browser will give non binary data. The resulting document will contain fields like name, type, size, path. You could hook this into a PipelineExecute Snap. Which will execute Pipe “B” You can pass this snap the “path” field.

In Pipeline “B” You can use a FileReader Snap to read the “path” field. Your file that gets returned, you said was a .txt file, although it is delimited with tabs and so should be able to be treated like a CSV file. So you should be able to hook up a CSV Formatter Snap to this File Reader. Then you should be able to process the data in this Pipeline “B”.

You could also ramp up the pool size, and let the Pipeline Execute Snap run multiple executions in parallel. But for starters, let’s see how it runs with one thread.
Thanks,
-Charlie

Hi Charlie,

I’ve tried Directory Browser snap to fetch files. When I hook pipelineExecute Snap I am getting the error.

Failure: Pipeline did not complete successfully, Reason: Snap errors: {ruuid=830fa0d3-2253-40b5-b419-2a4fd53dfa97, reason=Value referenced in the sub-expression ‘$Path’ is null, label=File Reader, resolution=Please check expression syntax and data types.}, Resolution: Fix the child pipeline errors and try again

I might be giving the wrong expression in the parameter section. Can you please look into pipeline Execute and File reader snap screenshots and suggest the correction.

Attached screenshots.

FileReader_Snap

Thanks,
Amar

@amarb, the “Execution Label” setting is not expression-enabled. Please enable it by selecting the “=” button beside the field. As it is not enabled, it is treating the $Path value as literal.

Also, the Pipeline Parameters section doesn’t look right to me. Typically, the Parameter Name won’t be prefixed with a ‘$’ (although yours might). As well, the Parameter Value should be enabled for expressions by toggling the “=” button.

Thank you Robin and Del. I’ve tried both ways and still getting an error. Attached screenshots for your ref.
Is there anything I need to provide in the child pipeline. File Reader snap I am using as shown in the previous screenshot.

Changed pipeline parameters

The error is due to the child pipeline failing. I would suggest debugging the child pipeline on its own before trying to run it from the parent. If you haven’t already, add a ‘Path’ pipeline parameter to the child. Note that referencing a pipeline parameter is referenced in an expression using an underscore instead of a dollar sign. So, your FileReader snap should use ‘_Path’ for the File property. When defining the pipeline parameter, you can give a default value that is a test file path so that the child pipeline will run correctly. Then, when the child is executed from the parent pipeline, the default value will be replaced with the ‘Path’ passed in via the PipelineExecute snap.

Hi tstack,

I’ve tried and able to succeded below three snaps to fetch files from SFTP server

Directory Browser
File Poller
Multi File Reader

Based on suggestions tried child pipeline to get filename and file content. I tried using underscore in the File Reader but still getting same error. My goal is to extract filename to find out region from the name and extract file content to map SQL table and load into SQL server.

Appreciate your help!!

Thanks,

I’m attaching a project that, I think, is similar to what you want to do. You should be able to import it in the Manager and then play around with both of the pipelines it contains. The ProcessAllFiles pipeline is the parent that uses a DirectoryBrowser to look for “*.csv” files in the project. The parent pipeline then invokes the ProcessOneFile child pipeline to read it in, parse the CSV data and then format it as a JSON file that is written to the same project.

FileProcessingExample.zip (4.3 KB)

Are you sure the File property was marked as an expression? Based on your previous screenshot, the error mentions that it cannot find the file “…/projects-AmarnathBolleddula/$Path” which looks like it’s searching for a file with the literal name ‘$Path’ and not evaluating the expression.