Parallel Reused Pipeline Executes Receive an Uneven Load
I’ve been trying to build a pipeline that reuses pipeline executes across a pool of size 3, spreading the input documents evenly between them so that similar work on different input documents can take advantage of multiple cores.
In practice, however, the work is being distributed unevenly. In one example I’m looking at right now, two of the pipeline executes each received 8 documents, while the third received 25.
My input data is very simple: each document just passes in a day to run over, which the internal logic uses to do a bunch of self-contained work before completing. The actual work is not simple at all, but I would expect the documents to be distributed across the executes as evenly as possible.
Is there a way to guarantee that the work gets distributed evenly without pre-aggregating the data passed into each pipeline execute thread? Pre-aggregating does distribute the work evenly, but it’s an annoying pattern, and it doesn’t take advantage of the fact that some days might run significantly faster than others.
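For context, the pre-aggregation workaround I mean is roughly the sketch below (the `chunk_days` helper and the placeholder day identifiers are illustrative, not my actual pipeline code): it round-robins the days into one fixed batch per pipeline execute up front, which guarantees even document counts but locks each execute into its batch even if its days happen to finish early.

```python
def chunk_days(days, pool_size=3):
    """Round-robin the days into one pre-built batch per pipeline execute.

    Illustrative only: this guarantees even counts per execute, but an execute
    whose days finish early cannot steal work from the slower batches.
    """
    batches = [[] for _ in range(pool_size)]
    for i, day in enumerate(days):
        batches[i % pool_size].append(day)
    return batches


days = [f"day-{n}" for n in range(1, 42)]  # 41 placeholder day identifiers
batches = chunk_days(days)
# -> three batches of 14, 14, and 13 days, fixed before any work starts
```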