How to get the file name(s) from a Multi File Reader

mxpeskir
New Contributor

I have a Multi File Reader reading a series of S3 files using a wildcard, and writing the data to Snowflake. There is a Mapper in between. Functionally, everything is working as expected.
I’d like to get the name of the file from which the data was read and write it to a Meta_FileName column. How do I retrieve the file name from the Multi File Reader? I’m assuming it’s an expression to be added in the Mapper, but I’m not sure. TIA!

14 REPLIES

@sghneim

Hmm, the same code is working on my side, so I am wondering whether the file data is having some impact.

Could you try re-dragging the Join Snap (first delete the existing Join, then drag a new one from the Snap palette) and apply the same configuration?

Inner Join with “1” for the Left and Right Join Paths.

Regards,
Spiro Taleski

I deleted the Join and dragged a new one onto the canvas. The setup is below, but still no luck.
[Screenshot: InnerJoin]

@sghneim

Strange.

As a workaround, you can achieve the same result by using the Directory Browser Snap and a parent-child pipeline configuration.

  1. Parent Pipeline

    • Use the Directory Browser Snap to list the files in the S3 location
    • Use Pipeline Execute Snap to call the child pipeline and send the file name as a pipeline parameter
  2. Child Pipeline

    • Create pipeline parameter to receive the file names from the Parent pipeline.
    • Read the files using File Reader Snap
    • Map the content and file name from parameter using Mapper Snap

This is actually what was proposed above by tstack.

It’s currently not possible to pass the binary header that contains the file name through the CSVParser. Instead, you can use a DirectoryBrowser snap to get the file names and then kick off child pipelines to read the files and do the SnowflakeUpserts. In that case, you’ll be passing the filename as a pipeline parameter to the child pipeline, so you can use a Mapper to add the parameter into the documents that are going into the Upsert. A side benefit of this is that you can process multiple files in parallel.
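For readers who find the data flow easier to follow in code, here is a rough sketch of the same parent-child pattern in plain Python. It is not SnapLogic code and does not use any SnapLogic API: the function names (`list_files`, `child_pipeline`, `parent_pipeline`) are made up for illustration, a local folder of CSV files stands in for the S3 location, and only the `Meta_FileName` column name comes from the original question.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def list_files(root, pattern="*.csv"):
    """Parent step (Directory Browser role): list matching files, subfolders included."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


def child_pipeline(file_path):
    """Child step: read ONE file and tag every row with the file it came from."""
    with open(file_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    # Mapper role: copy the file name (the "pipeline parameter") into each document.
    return [{**row, "Meta_FileName": Path(file_path).name} for row in rows]


def parent_pipeline(root, workers=4):
    """Pipeline Execute role: call the child once per file, several files in parallel."""
    files = list_files(root)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each child exactly one file name and keeps the input order.
        results = pool.map(child_pipeline, files)
    return [row for rows in results for row in rows]
```

The point to notice is that each child call receives exactly one file name, so every row it emits can be tagged unambiguously; the rows would then go on to the Snowflake write step.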

Regards,
Spiro Taleski

Thank you for responding. Do you think this will work if I have subfolders in my source? I tried the recommended method and I am getting an error. I believe the issue is that I am passing the child pipeline multiple file names at once, because I have multiple files in a folder.
[Screenshot: error child]

@sghneim

Directory Browser Snap should work with subfolders. Please check the documentation: https://docs-snaplogic.atlassian.net/wiki/spaces/SD/pages/1438716/Directory+Browser

Regarding the error: from what I can see, the child pipeline has more than one unlinked output view. Please check the child pipeline and make sure it has at most one unlinked output view.

Regards,
Spiro Taleski