a week ago
Hi Snaplogic experts,
my pipeline is creating one big CSV file (50k rows) as the result. Colleagues asked me if it is possible to split this big CSV file into smaller ones (CSV files with a thousand rows each).
My question is: how can I produce multiple CSV files as the output?
my idea is:
get the number of all rows in the output, use the Math.floor function to work out how many iterations are needed, and then loop to split the rows across CSV files: first file rows 1 - 1000, second file rows 1001 - 2000, and so on ...
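Outside of SnapLogic, what I have in mind would look roughly like this (plain Python, purely as an illustration; the input/output file names and the 1000-row chunk size are just placeholders):

```python
import csv

CHUNK_SIZE = 1000  # rows per output file (placeholder value)

with open("input.csv", newline="") as src:
    reader = csv.reader(src)
    header = next(reader)          # keep the header so each chunk gets a copy
    chunk, chunk_index = [], 0

    for row in reader:
        chunk.append(row)
        if len(chunk) == CHUNK_SIZE:
            chunk_index += 1
            with open(f"output_{chunk_index}.csv", "w", newline="") as out:
                w = csv.writer(out)
                w.writerow(header)
                w.writerows(chunk)
            chunk = []

    if chunk:  # write the last, possibly smaller, chunk
        chunk_index += 1
        with open(f"output_{chunk_index}.csv", "w", newline="") as out:
            w = csv.writer(out)
            w.writerow(header)
            w.writerows(chunk)
```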
or is there another better approach?
Thank you
a week ago
@SL12345 - You can also use the Pipeline Execute snap and set the Batch Size value to your desired number of records in the target. Create a child pipeline that is just a CSV Formatter and File Writer snap and call that child in your Pipeline Execute.
Basically, this passes the number of records specified in the Batch Size property to an instance of the child pipeline which can create a file of those records and finish, then the parent will start a new child with the next "batch" of records, and will keep iterating until all input records are consumed. Simple data chunking of your original file.
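If it helps to see the batch mechanics outside SnapLogic, the parent/child interaction works roughly like the loop below (Python, purely illustrative; write_chunk stands in for your child pipeline of CSV Formatter + File Writer, and batch_size for the Batch Size property):

```python
from itertools import islice

def write_chunk(batch, index):
    # Stand-in for the child pipeline: format the batch and write one file.
    with open(f"chunk_{index}.csv", "w") as f:
        for record in batch:
            f.write(",".join(str(v) for v in record.values()) + "\n")

def run_parent(records, batch_size=1000):
    it = iter(records)
    index = 0
    while True:
        batch = list(islice(it, batch_size))  # next "Batch Size" records
        if not batch:
            break                             # all input records consumed
        index += 1
        write_chunk(batch, index)             # one child instance per batch

# Example: 50k dummy records split into files of 1000
run_parent(({"id": i, "value": i * 2} for i in range(50_000)), batch_size=1000)
```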
Hope this helps!
a week ago
Hi @SL12345
See attached, hope this helps.
Assumption: the CSV file exists and is row-delimited by CR+LF
~alchemiz
a week ago
@alchemiz - one warning I have with this approach is that very large files would be retained in memory through each snap. Since the point is to split a very large file, you may find your snaplex node resources drained while trying to run this splitter.
a week ago
CSV is row delimited by CR+LF