12-17-2021 02:13 AM
I’ve got a pipeline which is reading multiple files.
A simplified version is below:
I’d need the output to contain a column with the filename of the original file.
As the file content has nothing to do with the filename, I can’t use the normal “copy / join” pattern: the join would have to be one without a condition, which would multiply my data per input file.
The CSV Parser does not seem to allow me to pass the filename through - which would really be the simplest and best solution.
Has anyone built something similar - or has some ideas on how to accomplish this?
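For illustration, this is the shape I’m after (column and file names are made up):

```python
# Made-up columns and file names, just to show the desired shape:
# one document per CSV line, each carrying the name of its source file.
desired_output = [
    {"id": 1, "value": "a", "filename": "fileA.csv"},
    {"id": 2, "value": "b", "filename": "fileA.csv"},
    {"id": 3, "value": "c", "filename": "fileB.csv"},
    # ...
]
```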
Solved! Go to Solution.
12-17-2021 02:41 AM
Hey @matthias.voppichler,
You can use a Pipeline Execute snap. Move the CSV parsing into a child pipeline and pass the filename in as a pipeline parameter; in the child, read that parameter with a Mapper and add it to every line coming out of the CSV Parser. Files will be read one at a time, parsed, and the filename will be attached to their lines accordingly.
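Roughly, the parent/child split does the equivalent of this plain Python sketch (not SnapLogic syntax - parameter and column names are just examples; in the child Mapper you’d reference the parameter as _filename if you name it “filename”):

```python
import csv

def child_pipeline(file_handle, filename):
    """Child: CSV Parser, then a Mapper that adds the pipeline
    parameter (the filename) to every parsed line."""
    for row in csv.DictReader(file_handle):
        row["filename"] = filename   # Mapper: parameter -> new field on each row
        yield row

def parent_pipeline(paths):
    """Parent: one Pipeline Execute per file, passing the filename
    as a pipeline parameter."""
    for path in paths:
        with open(path, newline="") as f:
            yield from child_pipeline(f, path)

# rows = list(parent_pipeline(["fileA.csv", "fileB.csv", "fileC.csv"]))
```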
You can even combine Marjan’s solution with Pipeline Execute, and again you’ll have the same desired result.
Regards,
Bojan
12-17-2021 02:30 AM
Hi @matthias.voppichler,
You can use the following example:
In the Binary Router you can set the following:
In the Mapper you can use this:
And in the Join you can set this:
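In other words (a plain Python sketch of the idea, not the actual snap settings): one branch carries the filename, the other the parsed rows, and the unconditioned join (true = true) attaches the filename to every row.

```python
from itertools import product

filenames = ["fileA.csv"]                 # branch with the single filename document
rows = [{"id": 1}, {"id": 2}]             # branch with the parsed CSV rows
# A join with no real condition behaves like a cross product:
joined = [{**row, "filename": name} for row, name in product(rows, filenames)]
# With exactly one filename document, every row gets the filename exactly once.
```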
Try it and let me know the outcome.
Regards
12-17-2021 02:39 AM
Unfortunately, this solution only works as long as there is a single filename.
The moment I get multiple file documents (which is the case for me), multiple documents arrive at the “mapper1” step - so the join with “true = true” multiplies my rows per file.
Assuming I have 3 files with 2 lines each (6 distinct lines expected in the output), I would receive 6 x 3 = 18 lines, as each content line gets combined with every filename.
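Sketched the same way, the unconditioned join is a cross product, so the counts explode as soon as there is more than one filename document (file and column names are placeholders):

```python
from itertools import product

filenames = ["fileA.csv", "fileB.csv", "fileC.csv"]   # 3 file documents
rows = [{"id": i} for i in range(1, 7)]               # 6 content lines in total
joined = [{**row, "filename": name} for row, name in product(rows, filenames)]
print(len(joined))  # 18 - every line paired with every filename
```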
12-17-2021 02:56 AM
As @bojanvelevski said, you can combine this solution with a Pipeline Execute snap in order to process one document at a time.
Regards,
Marjan