Hi,
We don’t set the snapplex properly as we expect snaplogic to workload balance the nodes within the production snapplex. In our testing, it appears that Snaplogic will eventually chose a different node when a node is busy, but it takes some time for snaplogic to realize that a node is busy.
A key use case of the batches functionality is to take a smaller number of records to a sub-pipeline. For example, in our case, we are pulling massive amounts of weather data for a list of 5-7k locations. If I send all 7k locations at once, the pull overwhelms node as the data pull side is about 260mm records. therefore, we use a pipleline/sub pipeline structure to pull 1000 locations at a time. However, this data must all be pulled in within an hour, so I need to have multiple pulls going on at the same time. If I do it all in a serial pipeline it takes 3 hours. However, by splitting the locations into 2 groups, I can run 2 nodes at once and get done in 1.5 hours…
The snapplex property doesn’t help becuase you can only specific a snapplex. We need better load balancing on our nodes so that it looks quickly at both nodes and decides which is the least used. Support is taking a look at this and they pointed me to documentation which shows that load balancing is done by thread count. Which is fine, but clearly from our testing it only is looking in like 5+ minute resolution, instead of real time when the job is being prepared