a week ago
I tried using a Join with the Merge join type, but the output looks like the one below, which is not what I am trying to achieve.
What I expect the result to look like:
a week ago
@marenas - I would not recommend using Gate to combine the data files. Gate has to store the data completely in memory, so it can cause excessive memory consumption for very large files. I recommend the attached approach.
The trick here is in the Mapper on the bottom path and the second input view added to the CSV Parser. If you review the documentation, you will see that the second input view allows you to specify a header and also datatypes, if you choose. I simply added the header in the Mapper.
Then in the Union, it combines the data in the way you are looking for.
One thing to note is that Union takes documents as they arrive from each input view. In this case, if both CSV Parsers are sending a large volume of records, you will see them intermixed: Union does not wait for all of the documents on the first path before consuming the documents from the second path. There are easy fixes for this, but I thought I would mention it in case the data ordering between the input files must be preserved.
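For readers who think more easily in code, here is a rough Python analogy of the header trick described above. It is only a sketch, not SnapLogic code: the `parse_csv` helper, the column names, and the sample rows are all made up for illustration. The idea is that the file without a header row is given its column names separately, so both files parse into the same document shape before being combined.

```python
import csv
import io


def parse_csv(text, header=None):
    """Parse CSV text into a list of dicts.

    If `header` is None, the first row of `text` is used as the header.
    If `header` is given, every row of `text` is treated as data
    (loosely analogous to supplying the header on a second input view).
    """
    reader = csv.DictReader(io.StringIO(text), fieldnames=header)
    return [dict(row) for row in reader]


# Hypothetical sample data: the first file has a header row, the second does not.
file_with_header = "id,name\n1,Ann\n2,Bob\n"
file_without_header = "3,Cy\n4,Dee\n"

first = parse_csv(file_with_header)
second = parse_csv(file_without_header, header=["id", "name"])

# Both lists now share the same keys, so they can be combined directly.
combined = first + second
```

Note that plain list concatenation keeps the file order, which is exactly the guarantee Union does not give you.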
Hope this helps!
Tuesday
@marenas - Here is an updated version of the pipeline that preserves the order of the files. Basically, I've added a Mapper to each path that adds a file number and a record number to the documents of both paths before the Union, then sorted the data to ensure proper record ordering, and finally removed the sorting fields from the documents.
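As a rough Python analogy of that tag, union, sort, and strip sequence (again only a sketch, not the pipeline itself; the field names `_file` and `_rec` and the helper functions are hypothetical):

```python
from itertools import zip_longest

SORT_FIELDS = ("_file", "_rec")  # hypothetical names for the temporary sort keys


def tag(records, file_number):
    """Add a file number and a 1-based record number to each document."""
    return [
        {**rec, "_file": file_number, "_rec": i}
        for i, rec in enumerate(records, start=1)
    ]


def interleave(a, b):
    """Simulate a union that emits documents in mixed arrival order."""
    out = []
    for x, y in zip_longest(a, b):
        if x is not None:
            out.append(x)
        if y is not None:
            out.append(y)
    return out


def restore_order(mixed):
    """Sort by (file number, record number), then drop the sort fields."""
    ordered = sorted(mixed, key=lambda d: (d["_file"], d["_rec"]))
    return [
        {k: v for k, v in d.items() if k not in SORT_FIELDS}
        for d in ordered
    ]


# Hypothetical sample documents from two files.
first = [{"id": "1"}, {"id": "2"}]
second = [{"id": "3"}]

mixed = interleave(tag(first, 1), tag(second, 2))
result = restore_order(mixed)
```

Sorting on the pair (file number, record number) puts every record of the first file ahead of the second file while keeping the original order within each file.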
So a couple new concepts here:
Hope this helps!
a week ago
Hi Kory,
Thank you for taking the time to review my post and suggest an alternative solution.
I tested the initial pipeline design using the Gate Snap with both input files now including headers, and it produced the expected output. However, based on your recommendation, I’ll revise the design to use the Union Snap instead. There is indeed a requirement to preserve the data order between the input files: the records from the first file should appear first, followed by those from the second file, in order.
Thanks again for your support!
Regards,
Marrah
Tuesday
Hi Kory,
Thank you so much for the detailed explanation. I really appreciate your help! Your approach introduced me to new capabilities I wasn’t previously aware of. There's clearly so much more to explore in SnapLogic, and I'm learning more each time I work through these use cases.
Thanks again for sharing your knowledge!
Regards,
Marrah