How to get the file name(s) from a Multi File Reader

mxpeskir
New Contributor

I have a Multi File Reader reading a series of S3 files using a wildcard, and writing the data to Snowflake. There is a Mapper in between. Functionally, everything is working as expected.
I’d like to get the name of the file from which each record was read and write it to a Meta_FileName column. How do I retrieve the file name from the Multi File Reader? I’m assuming it’s an expression to be added in the Mapper, but I’m not sure. TIA!

14 REPLIES

@sghneim

Hmm, the same configuration works on my side, so I am wondering whether the file data is having some impact.

Could you try re-adding the Join Snap (first delete the existing Join, then drag a new one from the Snap palette) and apply the same configuration?

Inner Join with “1” for Left and Right Join Paths.

Regards,
Spiro Taleski

I deleted the Join and dragged a new one onto the canvas. The setup is below, but still no luck.
[Screenshot: InnerJoin]

@sghneim

Strange.

As a workaround, you can achieve the same result by using the Directory Browser Snap and a parent-child pipeline configuration.

  1. Parent Pipeline

    • Use the Directory Browser Snap to list the files in the S3 location
    • Use the Pipeline Execute Snap to call the child pipeline and send the file name as a pipeline parameter
  2. Child Pipeline

    • Create a pipeline parameter to receive the file name from the parent pipeline
    • Read the file using the File Reader Snap
    • Map the content and the file name from the parameter using the Mapper Snap
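
To make the data flow concrete, here is a minimal Python sketch of the pattern above. It is not SnapLogic code, only an illustration: `parent_pipeline`, `child_pipeline`, and the in-memory `files` dictionary are hypothetical stand-ins for the Directory Browser, Pipeline Execute, File Reader, and Mapper Snaps. The key point is that each child call receives exactly one file name and stamps it on every row it produces.

```python
import csv
import io
from typing import Dict, List


def child_pipeline(file_name: str, content: str) -> List[Dict[str, str]]:
    """Hypothetical stand-in for the child pipeline.

    `file_name` plays the role of the pipeline parameter; the CSV parsing
    stands in for File Reader + parser, and the dict merge for the Mapper.
    """
    rows = csv.DictReader(io.StringIO(content))
    # Add the file name to every record, as the Mapper would.
    return [{**row, "Meta_FileName": file_name} for row in rows]


def parent_pipeline(files: Dict[str, str]) -> List[Dict[str, str]]:
    """Hypothetical stand-in for the parent pipeline.

    `files` maps file name to file content and stands in for the listing
    produced by the Directory Browser. One child call is made per file.
    """
    output: List[Dict[str, str]] = []
    for file_name, content in files.items():
        output.extend(child_pipeline(file_name, content))
    return output


if __name__ == "__main__":
    sample = {
        "a.csv": "id,val\n1,x\n",
        "b.csv": "id,val\n2,y\n3,z\n",
    }
    for record in parent_pipeline(sample):
        print(record)
```

Because the child is invoked once per file, it never has to deal with several file names at the same time.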

This is actually what was proposed above by tstack.

It’s currently not possible to pass the binary header that contains the file name through the CSVParser. Instead, you can use a DirectoryBrowser snap to get the file names and then kick off child pipelines to read the files and do the SnowflakeUpserts. In that case, you’ll be passing the filename as a pipeline parameter to the child pipeline, so you can use a Mapper to add the parameter into the documents that are going into the Upsert. A side-benefit of this is that you can process multiple files in parallel.
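
The parallelism mentioned in that quote can be pictured with a small Python sketch. Again, this is only an analogy and not SnapLogic code: `load_file`, `run_children`, and `pool_size` are hypothetical names, and the thread pool merely stands in for running several child pipeline executions at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def load_file(file_name: str, lines: List[str]) -> List[Dict[str, str]]:
    """Hypothetical stand-in for one child pipeline execution."""
    # Each record carries the name of the file it came from.
    return [{"line": line, "Meta_FileName": file_name} for line in lines]


def run_children(files: Dict[str, List[str]], pool_size: int = 4) -> List[Dict[str, str]]:
    """Process every file in its own task, up to `pool_size` at a time."""
    results: List[Dict[str, str]] = []
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # One task per file; each task gets exactly one file name.
        futures = [pool.submit(load_file, name, lines) for name, lines in files.items()]
        # Collect in submission order so the output order is deterministic.
        for future in futures:
            results.extend(future.result())
    return results


if __name__ == "__main__":
    sample = {"a.csv": ["1,x"], "b.csv": ["2,y", "3,z"]}
    for record in run_children(sample, pool_size=2):
        print(record)
```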

Regards,
Spiro Taleski

Thank you for responding. Do you think this will work if I have subfolders in my source? I tried the recommended method and I am getting an error. I believe the issue is that I am passing the child pipeline multiple file names at once, because I have multiple files in a folder.
[Screenshot: error child]

@sghneim

Directory Browser Snap should work with subfolders. Please check the documentation: https://docs-snaplogic.atlassian.net/wiki/spaces/SD/pages/1438716/Directory+Browser

Regarding the error: from what I can see, the child pipeline has more than one unlinked output view. Please check the child pipeline and make sure that it has one unlinked output view or none.

Regards,
Spiro Taleski