What is the purpose of the "Max in-flight" parameter in an Ultra Task?

The documentation description “to the maximum number of documents that can be processed by an instance at any one time” does not say a lot.

The default value is 200, but it is not clear whether the task will shut down after 200 documents, or whether 200 is the maximum number of requests that can be held in the request queue.

With an Ultra Task, there are multiple pipeline executions (i.e. instances) that are used to process requests sent to the task. Each request is turned into a document that is fed into one of the executions for processing. Since all of the snaps in a pipeline run in parallel, a pipeline can process many documents simultaneously. Documents that are being processed by a pipeline are “in-flight”. So, the ‘Max in-flight’ parameter restricts how many documents/requests an execution is allowed to be processing at any one time; once it reaches that number, it stops accepting more until some of them finish.
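To make the distinction concrete, here is a minimal sketch of an in-flight limit as a simple counter. The `Execution` class and its method names are hypothetical and are not part of SnapLogic; they only illustrate that the limit applies to documents being processed at the same moment, not to a lifetime total.

```python
class Execution:
    """Hypothetical stand-in for one pipeline execution of an Ultra Task."""

    def __init__(self, max_in_flight=200):
        self.max_in_flight = max_in_flight
        self.in_flight = 0  # documents currently being processed

    def try_accept(self):
        """Take one more document unless the in-flight limit is reached."""
        if self.in_flight >= self.max_in_flight:
            return False  # full: the request has to go elsewhere or wait
        self.in_flight += 1
        return True

    def complete(self):
        """A document finished processing, so a slot frees up."""
        self.in_flight -= 1


execution = Execution(max_in_flight=2)
accepted = [execution.try_accept() for _ in range(3)]  # third one is refused
execution.complete()                                   # one document finishes
accepted_after_complete = execution.try_accept()       # a slot is free again
```

In this sketch the execution keeps running after it hits the limit; it only refuses new documents until an earlier one completes.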

Why does this matter? If a pipeline calls out to external services, one of the executions may be slower than the others and that delay will compound for subsequent requests. If this is possible, lowering the ‘Max in-flight’ allows subsequent requests to be processed by other executions that may be faster. For example, if an execution is experiencing a 10 second delay when communicating with a service and 4 requests are fed into this execution at the same time, the first request will get a response after 10 seconds, the second after 20 seconds, and so on. However, if the ‘Max in-flight’ were lowered to 1, the second and third requests could be sent to other executions that may be performing better.
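The arithmetic in that example can be reproduced with a toy simulation. This is only a sketch under stated assumptions, not how the feed-master actually schedules work: all requests arrive at time 0, each execution handles its documents one after another with a fixed per-document delay, and the dispatcher hands each request to the first execution that still has room. The function name `simulate` and the 1 second delay for the healthy executions are made up for illustration.

```python
def simulate(service_delays, num_requests, max_in_flight):
    """Return the response time of each request, in arrival order.

    service_delays[i] is the assumed seconds per document for execution i.
    """
    n = len(service_delays)
    busy_until = [0.0] * n            # when each execution finishes its queued work
    pending = [[] for _ in range(n)]  # completion times of in-flight documents
    now = 0.0
    response_times = []
    for _ in range(num_requests):
        while True:
            # Forget documents that have already completed by `now`.
            for queue in pending:
                queue[:] = [t for t in queue if t > now]
            # First execution that is still under its in-flight limit.
            target = next(
                (i for i in range(n) if len(pending[i]) < max_in_flight), None
            )
            if target is not None:
                break
            # Every execution is full: wait for the next completion.
            now = min(t for queue in pending for t in queue)
        start = max(now, busy_until[target])
        done = start + service_delays[target]
        busy_until[target] = done
        pending[target].append(done)
        response_times.append(done)
    return response_times


# One slow execution (10 s per document) and two healthy ones (assumed 1 s).
piled_up = simulate([10, 1, 1], num_requests=4, max_in_flight=4)
spread_out = simulate([10, 1, 1], num_requests=4, max_in_flight=1)
```

With a limit of 4, all four requests land on the slow execution and finish at 10, 20, 30 and 40 seconds. With a limit of 1, only the first request is stuck behind the slow service, and the rest are picked up by the healthy executions.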

What’s the downside to lowering this parameter? It will slow down the performance of the happy path since it disables prefetching requests from the feed-master.