Forum Discussion

amarb's avatar
amarb
New Contributor II
8 years ago

Connect SFTP server to fetch multiple files

Hi All,

I am trying to connect SFTP server and fetch multiple files from the source remote directory. File Reader snap has an option to get SFTP account but doesn’t find how to provide remote directory path.

In the remote directory, we are getting different types of files like *.txt where I need to apply filters based on the filenames to differentiate and invoke the different process to load into SQL DB.

Once I fetch the files from remote location how to delete the files.

Appreciate your inputs and any relevant links to look into more details. Let me know if you need any further details.

Thanks in advance,
Amar

16 Replies

  • chenry's avatar
    chenry
    Former Employee

    Hi Amar, glad you’ve had success so far! I experienced the same when I connect the BinaryToDocument to a Multi File Reader. However, if I connect it to a FileReader and specify one specific file, I get preview data. This may be a validation mode limitation.

    So the output of the MultiFile Reader is showing me JSON data with attributes about the files, including the file handle. The handle is in the “content-location” field.

    If you’ve hooked up your BinaryToDocument Snap…
    Depending on what you want to do with the documents, you’ll want to encode/decode differently. If you want to pass around raw bytes, then you can choose “BYTE_ARRAY” but if you want to do some parsing of your txt files, then you probably will want to use “NONE” encoding. What this is actually doing behind the scenes is pushing the data into a String format with the default charset of your node (likely UTF-8 depending on the file, but I digress).

    The next move is to hook up a Mapper Snap. In the expression field, enter a dollar sign (make sure the equals sign is checked). Same with the target path field.

    Now validate the pipeline one time. Might take a moment to run. When you open up the Mapper Snap (not the preview data) You can see the schema in the left hand side. Using this, you could map the “content-location” to some other “name” field that you should be able to grab from the output of the Mapper during the actual execution. This field could be keyed on as the input to the File Delete I believe.

    Let me know how this works out, and we can go further.
    Thanks,
    -Charlie

    • amarb's avatar
      amarb
      New Contributor II

      Thank you, Charlie. I able to succeed with three snaps. Attached two screenshots for your ref.

      Can you please guide me how to extract data from the source file (it is the tab delimited file) to map to SQL table.
      How to extract the filename from the filepath. I want to extract a region from filename as well.

      Appreciate your support.

  • chenry's avatar
    chenry
    Former Employee

    Hi Amar,

    I believe that the first challenge could be solved by using the FilePoller, and use the file filter option. This Snap will produce output containing the paths of the *.txt files, but not the actual files. After that, you could potentially use a ForEach Snap to process the path to the *.txt file. I need to confer with my teammates to see how one would go about deleting the files.

    Hope this gets you off to a good start!
    Thanks,
    -Charlie

    • amarb's avatar
      amarb
      New Contributor II

      Thank you, Charlie for the quick response. I’ve tried FilePoller and fetch four files from the remote SFTP directory. FilePoller output showing four different paths for each individual file.

      As a second snap, I have tried ForEach Snap but could not able to process the path. I might be giving wrong parameters. Attached ForEach Snap. Could you please guide how to process each individual file further.

      Thanks again
      Amar

  • chenry's avatar
    chenry
    Former Employee

    Hi Amar,
    Sorry for the delay. I was thinking more about your scenario, and I think I have a better suggestion. Since you may not need to “poll” for the files, I am thinking a different Snap would be better.

    Say I want to boil the problem down to grabbing the files, and then deleting them…You could use a MultiFileReader Snap, with a file filter to get the desired types of file. The output will be a binary data stream.

    You could connect this to a BinaryToDocument Snap and then perform operations on that document. At some point you would need to use a Mapper to key on the filename and pass that to a FileDelete Snap.

    For the simpler case of just say…reading files and deleting them for proof of concept, you could use a DirectoryBrowser instead of the MultiFileReader. The DirectoryBrowser would return you attributes about all of the files that match the type filter (essentially a List in document format versus binary). Then you’d plug this into a Mapper Snap, to map the filename. Then you’d plug this into a Delete Snap where you configure it to read the file name from the mapped variable.

    I hope this helps you get further towards your goal!

    Thanks,
    -Charlie

    • amarb's avatar
      amarb
      New Contributor II

      Hi Charlie,

      Thank you, for the giving other options. I am still struggling to process the files from the first snap.
      I am providing my findings for your reference. Appreciate your inputs further.

      MultiFileReader Snap is not taking sftp path under folder/file section. I am getting below exception:
      The same path is working other Snaps but not in MultiFileReader Snap. Provide your suggestions based on the error.

      Multi File Reader( Multi File Reader )
      Reads binary data from an endpoint (e.g. file, sldb, s3, sftp, ftp, adl, etc).

      Snap has the following errostrong textrs. Please open the info box and edit the highlighted fields.
      Could not compile expression: _sftp://ftp.ext.hpe.com:2222/_SC … (Reason: Encountered extraneous input ‘:’ at line 1:4; expecting={, ‘||’, ‘&&’, ‘^’, ‘==’, ‘!=’, ‘>’, ‘<’, ‘>=’, ‘<=’, ‘+’, ‘-’, ‘*’, ‘/’, ‘%’, ‘?’, ‘instanceof’, ‘in’, ‘[’, ‘(’, ‘.’, PropertyRef}; Resolution: Please check expression syntax)

      I’ve tried DirectoryBrowser and able to fetch files from the SFTP server and used mapper snap to take filename into the variable. If I use file delete snap again it is throwing same error as multi-file reader snap exception. The file parameter is not taking SFTP path with variable filename.

      Appreciate your further support and also let me know how to extract data from the original file. The attributes from the directory browser snap showing bytes and name only.

      Thanks in advance,
      Amar

      • chenry's avatar
        chenry
        Former Employee

        Hi Amar,
        Let’s first start with the Multi File Reader Snap. If you create a new one, and then in the account tab, create a new account. For one of our internal test accounts, I chose basic auth. I am prompted for user name password, which I give. Next in the main settings tab of the Snap, I input the following under file/folder the format: sftp://hostName/DirectoryStructure/TargetFolder/
        Then in the wildcard I put “.csv” And I am able to run successfully. If I look at my output as JSON, I can see “content-location” which includes the absolute path to the file name (including the file name). From this point, you should be able to use a BinaryToDocument Snap in order to operate on the document data. But let’s see how things go up to this point first.

        Thanks, and apologies for the delay.
        -Charlie

  • chenry's avatar
    chenry
    Former Employee

    Hi Amar,

    Sorry for the delay. After reading about the two things that you want to (parse data from the filename and parsing the data) I had a conversation with a colleague to verify some of my thoughts, and get more advice. We may need to create two pipelines “A” and “B” in pipeline A We switch from MultiFileReader to DirectoryBrowser. This is because Directory Browser will give non binary data. The resulting document will contain fields like name, type, size, path. You could hook this into a PipelineExecute Snap. Which will execute Pipe “B” You can pass this snap the “path” field.

    In Pipeline “B” You can use a FileReader Snap to read the “path” field. Your file that gets returned, you said was a .txt file, although it is delimited with tabs and so should be able to be treated like a CSV file. So you should be able to hook up a CSV Formatter Snap to this File Reader. Then you should be able to process the data in this Pipeline “B”.

    You could also ramp up the pool size, and let the Pipeline Execute Snap run multiple executions in parallel. But for starters, let’s see how it runs with one thread.
    Thanks,
    -Charlie

  • amarb's avatar
    amarb
    New Contributor II

    Hi Charlie,

    I’ve tried Directory Browser snap to fetch files. When I hook pipelineExecute Snap I am getting the error.

    Failure: Pipeline did not complete successfully, Reason: Snap errors: {ruuid=830fa0d3-2253-40b5-b419-2a4fd53dfa97, reason=Value referenced in the sub-expression ‘$Path’ is null, label=File Reader, resolution=Please check expression syntax and data types.}, Resolution: Fix the child pipeline errors and try again

    I might be giving the wrong expression in the parameter section. Can you please look into pipeline Execute and File reader snap screenshots and suggest the correction.

    Attached screenshots.

    Thanks,
    Amar

    • robin's avatar
      robin
      Former Employee

      @amarb, the “Execution Label” setting is not expression-enabled. Please enable it by selecting the “=” button beside the field. As it is not enabled, it is treating the $Path value as literal.

    • del's avatar
      del
      Contributor III

      Also, the Pipeline Parameters section doesn’t look right to me. Typically, the Parameter Name won’t be prefixed with a ‘$’ (although yours might). As well, the Parameter Value should be enabled for expressions by toggling the “=” button.

  • amarb's avatar
    amarb
    New Contributor II

    Thank you Robin and Del. I’ve tried both ways and still getting an error. Attached screenshots for your ref.
    Is there anything I need to provide in the child pipeline. File Reader snap I am using as shown in the previous screenshot.

    • tstack's avatar
      tstack
      Former Employee

      The error is due to the child pipeline failing. I would suggest debugging the child pipeline on its own before trying to run it from the parent. If you haven’t already, add a ‘Path’ pipeline parameter to the child. Note that referencing a pipeline parameter is referenced in an expression using an underscore instead of a dollar sign. So, your FileReader snap should use ‘_Path’ for the File property. When defining the pipeline parameter, you can give a default value that is a test file path so that the child pipeline will run correctly. Then, when the child is executed from the parent pipeline, the default value will be replaced with the ‘Path’ passed in via the PipelineExecute snap.

      • amarb's avatar
        amarb
        New Contributor II

        Hi tstack,

        I’ve tried and able to succeded below three snaps to fetch files from SFTP server

        Directory Browser
        File Poller
        Multi File Reader

        Based on suggestions tried child pipeline to get filename and file content. I tried using underscore in the File Reader but still getting same error. My goal is to extract filename to find out region from the name and extract file content to map SQL table and load into SQL server.

        Appreciate your help!!

        Thanks,