
Split target CSV file into multiple smaller CSVs

SL12345
New Contributor III

Hi SnapLogic experts,

My pipeline creates one big CSV file (50k rows) as its result. Colleagues asked me whether it is possible to split this big CSV file into smaller ones (CSV files with a thousand rows each).

My question is: how can I produce multiple CSV files as output?

My idea is:
get the total number of rows in the output, use the Math.floor function to work out how many iterations are needed, and then loop to split the rows into CSV files: first file rows 1-1000, second file rows 1001-2000, and so on...
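In rough Python terms, that slicing plan would look something like the sketch below (the 50k and 1,000 figures are just the numbers from above). One detail worth noting: ceiling division, not Math.floor alone, gives the file count when the row count is not an exact multiple of the chunk size.

```python
import math

TOTAL_ROWS = 50_000   # size of the big CSV from the pipeline
CHUNK_SIZE = 1_000    # desired rows per smaller file

# Ceiling division: a trailing partial chunk still needs its own file,
# so Math.floor alone would miss it when there is a remainder.
num_files = math.ceil(TOTAL_ROWS / CHUNK_SIZE)

for i in range(num_files):
    first = i * CHUNK_SIZE + 1                    # 1-based first row of this file
    last = min((i + 1) * CHUNK_SIZE, TOTAL_ROWS)  # 1-based last row of this file
    print(f"file {i + 1}: rows {first}-{last}")
```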

 

Or is there a better approach?

Thank you

1 ACCEPTED SOLUTION

koryknick
Employee

@SL12345 - You can also use the Pipeline Execute snap and set the Batch Size value to your desired number of records in the target.  Create a child pipeline that is just a CSV Formatter and File Writer snap and call that child in your Pipeline Execute.  

Basically, this passes the number of records specified in the Batch Size property to an instance of the child pipeline which can create a file of those records and finish, then the parent will start a new child with the next "batch" of records, and will keep iterating until all input records are consumed.  Simple data chunking of your original file.

Hope this helps!
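For readers outside SnapLogic, here is a minimal Python sketch of what that parent/child batching amounts to. The input file name and the child_pipeline function standing in for the CSV Formatter + File Writer child are assumptions for illustration, not SnapLogic APIs.

```python
import csv
from itertools import islice

BATCH_SIZE = 1_000  # analogous to the Batch Size property on Pipeline Execute

def child_pipeline(header, batch, n):
    """Stand-in for the child pipeline: CSV Formatter + File Writer."""
    with open(f"part_{n:04d}.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(batch)

# "Parent": stream the big file and hand one batch at a time to a child.
with open("big_input.csv", newline="") as src:
    reader = csv.reader(src)
    header = next(reader)
    n = 0
    while True:
        batch = list(islice(reader, BATCH_SIZE))  # next chunk of records
        if not batch:
            break  # all input records consumed
        n += 1
        child_pipeline(header, batch, n)
```

Because islice consumes the reader lazily, only one batch is in memory at a time, which mirrors how Pipeline Execute feeds each child instance.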


5 REPLIES

alchemiz
Contributor III

Hi @SL12345 

See attached, hope this helps.

Assumption: the CSV file exists and is row-delimited by CR+LF.

[attached screenshot: alchemiz_0-1741084862453.png]
~alchemiz
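For readers who cannot see the attachment, a rough plain-Python equivalent of splitting a CR+LF-delimited file might look like the sketch below; the file names, and the exact steps in the screenshot, are assumptions.

```python
CHUNK_SIZE = 1_000  # rows per output file

# Read the whole file at once and split rows on CR+LF, per the
# assumption above. Note that everything is held in memory at once,
# which is the concern raised in the next reply.
with open("big_input.csv", newline="") as src:
    rows = src.read().split("\r\n")

header, body = rows[0], [r for r in rows[1:] if r]

for i in range(0, len(body), CHUNK_SIZE):
    part = i // CHUNK_SIZE + 1
    with open(f"split_{part}.csv", "w", newline="") as out:
        out.write("\r\n".join([header] + body[i:i + CHUNK_SIZE]) + "\r\n")
```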

koryknick
Employee

@alchemiz - one warning I have with this approach is that very large files would be retained in memory through each Snap. Since the point is to split a very large file, you may find your Snaplex node resources drained while trying to run this splitter.

alchemiz
Contributor III

CSV is row delimited by CR+LF 
