What is the difference between “Group By N”, “Batch Size”, and “Pool Size” in terms of how documents are processed?
The “Group Size” or “Batch Size” properties in snaps generally control how many input documents a snap collects before it performs an operation. For example, a database Insert snap with a batch size of 50 waits until it has received 50 documents before it calls the database to insert them. The GroupByN snap is similar: it waits until N documents have been received before it writes an output document.
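To make the collect-then-act behavior concrete, here is a minimal Python sketch of the idea. It is not the snap's actual implementation: `group_by_n` is a hypothetical helper, and flushing a final partial group when the input ends is an assumption made for the illustration.

```python
from typing import Iterable, Iterator, List


def group_by_n(documents: Iterable[dict], n: int) -> Iterator[List[dict]]:
    """Collect incoming documents and emit them in groups of n.

    Hypothetical helper for illustration only; it is not SnapLogic code.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    batch: List[dict] = []
    for doc in documents:
        batch.append(doc)
        # Nothing is emitted until n documents have been collected.
        if len(batch) == n:
            yield batch
            batch = []
    # Assumption: leftover documents are flushed when the input ends.
    if batch:
        yield batch


incoming = ({"id": i} for i in range(7))
groups = list(group_by_n(incoming, 3))
group_sizes = [len(g) for g in groups]
```

With 7 input documents and N = 3, the first output appears only after the third document arrives, which is the waiting behavior described above.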
The “Pool Size” property in the PipelineExecute snap refers to the maximum number of child executions that it will run in parallel. For example, with a pool size of 5, the snap will run anywhere from 1 to 5 executions in parallel over its lifetime. Note that the snap does not wait for 5 documents to be received before it does anything. As soon as a document is received, PipelineExecute will try to start the child pipeline.
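By contrast, a pool size caps concurrency rather than setting a threshold to wait for. The sketch below uses Python's standard `ThreadPoolExecutor` as a stand-in for that behavior. `run_children` and `child_pipeline` are hypothetical names, and the way excess work is queued here is an assumption of this sketch, not a statement about how PipelineExecute handles a full pool.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_children(documents: Iterable[dict], pool_size: int,
                 child: Callable[[dict], int]) -> List[int]:
    """Start one child execution per document, at most pool_size at a time.

    Hypothetical stand-in for illustration; it is not SnapLogic code.
    """
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Each document is submitted as soon as it is read; there is no
        # waiting for pool_size documents to accumulate first.
        futures = [pool.submit(child, doc) for doc in documents]
        return [f.result() for f in futures]


lock = threading.Lock()
stats = {"active": 0, "peak": 0}


def child_pipeline(doc: dict) -> int:
    """Pretend child pipeline that records how many run at once."""
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # simulate work
    with lock:
        stats["active"] -= 1
    return doc["id"] * 2


results = run_children(({"id": i} for i in range(12)), pool_size=5,
                       child=child_pipeline)
```

After the run, `stats["peak"]` holds the highest number of simultaneous child executions observed, which never exceeds the pool size of 5 even though 12 documents were processed.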
Thank You @tstack