
Combine CSV document into a new file

marenas
Contributor
Hello,

I’m reaching out regarding a pipeline that I am building. Here are the details:
Requirement: Combine two CSV files (one has a header row and the other does not)
Steps Taken:
1. Read both csv files at the start of the pipeline
2. Use a 'Gate' snap to combine both inputs
3. Concatenate both inputs using a Mapper with the following expression: [].concat($['doc1'],$['doc2'])
Target path: $data
4. Split the data using JSON splitter
JSON Path to Split: jsonPath($, "data[*]")
5. Write the csv file
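
For readers outside SnapLogic, steps 2-4 amount to the following rough Python sketch (the sample data and column names are made up for illustration):

```python
import csv
import io

# Illustrative stand-ins for the two input files (contents are assumptions)
with_header = "id,name\n1,Alice\n2,Bob\n"
without_header = "3,Carol\n4,Dave\n"

# doc1: the file that carries its own header row
doc1 = list(csv.DictReader(io.StringIO(with_header)))

# doc2: the headerless file, reusing the first file's column names
fieldnames = list(doc1[0].keys())
doc2 = list(csv.DictReader(io.StringIO(without_header), fieldnames=fieldnames))

# Equivalent of the Mapper expression [].concat($['doc1'], $['doc2']) ...
data = [] + doc1 + doc2

# ... and of the JSON Splitter on $data: one output document per element
for row in data:
    print(row)
```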
 
However, the result of my pipeline looks like this:
[screenshot: marenas_0-1746651536779.png]

I have also tried a Join snap with join type Merge, but the output looks like the below, which is not what I am trying to achieve.

[screenshot: marenas_3-1746651727217.png]

What I expect the result to look like:

 
[screenshot: marenas_2-1746651570793.png]
 
Appreciate your help on this.
 
-- Marrah
2 ACCEPTED SOLUTIONS

koryknick
Employee

@marenas - I would not recommend using Gate to combine the data files - it can cause excessive memory consumption for very large files since the data has to be stored completely in memory.  I recommend the attached approach.

[screenshot: koryknick_0-1746706858793.png]

The trick here is in the Mapper on the bottom path and the second input view added to the CSV Parser.  If you review the documentation, you will see that the second input view allows you to specify a header and also datatypes, if you choose.  I simply added the header in the Mapper.

Then in the Union, it combines the data in the way you are looking for.  

One thing to note is that Union will take documents as they come from each input view - meaning in this case that if both CSV Parsers are sending a large volume of records, you will see them intermixed - it does not wait for all of the documents on the first path before consuming the documents from the second path.  There are easy fixes for this, but thought I would mention it in case it is a requirement that the data ordering be preserved between the input files.
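
As a rough Python analogue of this approach (sample data and column names are my own, not from the pipeline):

```python
import csv
import io

# Illustrative stand-ins for the two input files
top_file = "id,name\n1,Alice\n2,Bob\n"    # has a header row
bottom_file = "3,Carol\n4,Dave\n"         # headerless

# Top path: the CSV Parser infers columns from the header row
top = list(csv.DictReader(io.StringIO(top_file)))

# Bottom path: the header arrives separately, playing the role of the
# CSV Parser's second input view fed by the Mapper
header = ["id", "name"]
bottom = list(csv.DictReader(io.StringIO(bottom_file), fieldnames=header))

# Union: one combined stream. This sequential sketch preserves order;
# the real snap may interleave documents as they arrive from each path.
merged = top + bottom
```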

Hope this helps!

koryknick
Employee

@marenas - Here is an updated version of the pipeline to preserve the order of the files.  Basically, I've simply added a Mapper to each path and placed a file-number and record-number to the documents of both paths before the Union, then sorted the data to ensure proper record ordering, and finally removed the sorting fields from the document.

So a couple new concepts here:

  • snap.in.totalCount is a built-in value that gives us the number of records seen by the input view of the snap.
  • I enabled Pass-Through in the Mapper settings for the "Add fileN sortKey" snaps to allow the rest of the data to flow through the snap without having to specify it.  In the "Remove sortKey" Mapper, I use Pass-Through again to allow all the fields to pass through except the fields that I have in the Expression without specifying the Target-Path - this effectively deletes just those 2 fields.
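
Conceptually, the tag-sort-strip sequence looks like this in Python (the field names fileN/recN and the sample documents are assumptions for illustration):

```python
# Illustrative documents from the two parsed files
file1_docs = [{"id": "1"}, {"id": "2"}]
file2_docs = [{"id": "3"}, {"id": "4"}]

# "Add fileN sortKey" Mappers: tag each document with its file number
# and a running record count (the role snap.in.totalCount plays)
tagged = []
for file_n, docs in enumerate([file1_docs, file2_docs], start=1):
    for rec_n, doc in enumerate(docs, start=1):
        tagged.append({**doc, "fileN": file_n, "recN": rec_n})

# Sort snap: restore file order even if the Union interleaved the streams
tagged.sort(key=lambda d: (d["fileN"], d["recN"]))

# "Remove sortKey" Mapper: pass everything through except the two keys
result = [{k: v for k, v in d.items() if k not in ("fileN", "recN")}
          for d in tagged]
```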

Hope this helps!

4 REPLIES

marenas
Contributor

Hi Kory,

Thank you for taking the time to review my post and suggest an alternative solution.

I tested the initial pipeline design using the Gate snap with both input files now including headers, and it produced the expected output. However, based on your recommendation, I'll revise the design to use the Union snap instead. There is indeed a requirement to preserve the data order between the input files: the records from the first file should appear first, followed by those from the second file, in order.

Thanks again for your support!

Regards,

Marrah

marenas
Contributor

Hi Kory, 

Thank you so much for the detailed explanation. I really appreciate your help! Your approach introduced me to new capabilities I wasn’t previously aware of. There's clearly so much more to explore in SnapLogic, and I'm learning more each time I work through these use cases.

Thanks again for sharing your knowledge!

Regards,

Marrah