We are creating a pipeline that uses a Pipeline Execute (PE) snap with a batch size defined in PE. The child pipeline triggered by PE writes a file. Is there a way to get the number of the batch being processed? For example, say I have 650 records in the source and the PE batch size is 100. A file is created for every 100 records, and I want the files named filename_batch_1.csv, filename_batch_2.csv, and so on, so that the 7th file is filename_batch_7.csv. Is this possible?
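The arithmetic behind the requested naming can be sketched in Python, outside SnapLogic and purely as an illustration. With 650 records and a batch size of 100, there are six full batches plus one partial batch of 50, so seven files. The function name batch_file_names is hypothetical.

```python
import math


def batch_file_names(record_count: int, batch_size: int, prefix: str = "filename") -> list[str]:
    """Return one file name per batch, numbered from 1 (hypothetical helper)."""
    # A partial last batch still produces its own file, hence ceil().
    batch_count = math.ceil(record_count / batch_size)
    return [f"{prefix}_batch_{n}.csv" for n in range(1, batch_count + 1)]


names = batch_file_names(650, 100)
print(len(names))  # 7
print(names[-1])   # filename_batch_7.csv
```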
Please find the attached sample pipelines:
pipe_1_2022_03_03.slp (6.7 KB) - parent
pipe_2_2022_03_03.slp (5.2 KB) - child
In the parent pipeline, you can use the Group By N snap to specify the size of each group (batch) sent to the child pipeline.
Each group triggers a separate child pipeline execution, and the batch number is included in the file name.
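The group-then-invoke pattern can be sketched in Python as a simulation only; it does not reflect how the Group By N or Pipeline Execute snaps work internally. group_by_n and run_child are hypothetical names: the first collects consecutive records into groups of at most n, the second stands in for one child execution that receives its batch number.

```python
from typing import Iterable, Iterator


def group_by_n(records: Iterable[dict], n: int) -> Iterator[list[dict]]:
    """Yield consecutive groups of at most n records (hypothetical helper)."""
    group: list[dict] = []
    for record in records:
        group.append(record)  # the whole group is held in memory until emitted
        if len(group) == n:
            yield group
            group = []
    if group:  # emit the final, possibly smaller, group
        yield group


def run_child(batch_number: int, group: list[dict]) -> str:
    """Stand-in for one child pipeline execution: returns the file name it would write."""
    return f"filename_batch_{batch_number}.csv"


records = [{"id": i} for i in range(650)]
sizes = [len(g) for g in group_by_n(records, 100)]
files = [run_child(k, g) for k, g in enumerate(group_by_n(records, 100), start=1)]
print(sizes)     # [100, 100, 100, 100, 100, 100, 50]
print(files[-1])  # filename_batch_7.csv
```

Because each group is accumulated in full before it is passed on, memory use grows with the group size, which is the trade-off discussed further down in the thread.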
We tried implementing this, but it causes an issue when we process more than 3 million records.
Using the Group By N snap works fine for smaller document sizes and counts. Because grouping combines multiple documents into larger documents held in memory, this approach is not recommended when document sizes or counts are large.
The batching option in the PipeExec snap does not automatically pass a batch number to the child pipeline. Instead, the parent pipeline can use an expression like
((snap.in.totalCount + 1000) / 1000).toFixed()
(here with a batch size of 1000) to generate a batch number and pass it to the child pipeline, which can then use it to build the file name. See the attached parent and child pipelines.
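To illustrate how the expression maps documents to batch numbers, here is a Python re-expression of the same arithmetic. It assumes snap.in.totalCount is 0 for the first document and uses floor division instead of toFixed(). The function name batch_number is hypothetical. Note that toFixed() rounds to the nearest integer, so it agrees with this rounding-down version on the first document of each batch (where the fraction is zero) but can differ mid-batch, e.g. a count of 500 with a batch size of 1000 gives 1.5, which toFixed() rounds to 2.

```python
def batch_number(position: int, batch_size: int = 1000) -> int:
    """1-based batch number for a document at 0-based position `position`.

    Rounding-down analogue of ((snap.in.totalCount + 1000) / 1000),
    assuming the counter starts at 0 (hypothetical helper).
    """
    return (position + batch_size) // batch_size


# 650 documents with a batch size of 100 -> batch numbers 1..7
numbers = [batch_number(i, 100) for i in range(650)]
first_of_each_batch = numbers[::100]
print(first_of_each_batch)  # [1, 2, 3, 4, 5, 6, 7]
```

With floor division, every document in the same batch gets the same number, so the child could build filename_batch_{n}.csv from whichever document it reads the value from.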
A couple of releases ago we added a Memory Sensitivity property to the Group By N snap. Set it to Dynamic to reduce the group size dynamically according to available memory.
Thank you. This worked well without impacting performance.
I’m glad to hear that! Thanks for the update.