Increase Pipeline Throughput By Parallelizing Service Calls

Question

When designing a pipeline, one often needs to call an external service (REST/SOAP). In SnapLogic, REST and SOAP snaps operate serially, meaning they process only one document at a time. Some service calls take a long time to return, which blocks pipeline execution until a response is received. This may be acceptable for a low-volume pipeline, but if one is processing thousands of documents or more and each call takes a few seconds to complete, the runtime adds up quickly.
To get around this, one can distribute the incoming documents over a number of outputs and add more snaps to make service calls.
To evenly distribute incoming documents across multiple outputs, one can use the Router snap in “autorouting” mode, which round-robins incoming documents over the given outputs. To set this up, drag a Router snap onto the pipeline, add the desired number of output views, and leave the expressions for those output views blank.
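As an illustration of what round-robin distribution means, here is a minimal Python sketch. It is not SnapLogic code and does not claim to mirror the Router snap's internals; the function name `round_robin` and the sample documents are made up for the example.

```python
from itertools import cycle


def round_robin(documents, num_outputs):
    """Spread documents over num_outputs lists, one at a time, in turn."""
    outputs = [[] for _ in range(num_outputs)]
    # cycle() yields 0, 1, ..., num_outputs - 1, 0, 1, ... indefinitely
    for doc, target in zip(documents, cycle(range(num_outputs))):
        outputs[target].append(doc)
    return outputs


# Seven documents over three outputs
print(round_robin([1, 2, 3, 4, 5, 6, 7], 3))
# → [[1, 4, 7], [2, 5], [3, 6]]
```

Each output receives roughly the same number of documents, which is what keeps the parallel branches evenly loaded.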

After that is set up, copy the snap that makes the service call and attach one copy to each of the Router outputs.

Union the REST outputs to recombine the streams, and then reconnect the remaining pipeline logic.

That's it!
If your service calls block for even a few seconds and you are making a large number of calls, this should decrease the runtime. The right amount of parallelism depends on the use case and the service endpoint. I’ve found that two parallel calls work best most of the time; only at extreme volume was there an advantage in going to three or more.
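The overall pattern (route, call in parallel, union) can be sketched outside SnapLogic in plain Python. This is only an analogy under stated assumptions: `fake_service_call` is a hypothetical stand-in for a slow REST/SOAP call, and threads play the role of the duplicated snaps.

```python
import threading
import time
from queue import Queue


def fake_service_call(doc):
    """Hypothetical stand-in for a slow external service call."""
    time.sleep(0.01)  # simulated latency
    return {"id": doc, "status": "ok"}


def run_branch(docs, union_queue):
    # Each branch is serial, like a single REST snap
    for doc in docs:
        union_queue.put(fake_service_call(doc))


def parallel_calls(documents, num_branches):
    # "Router": branch i takes every num_branches-th document (round-robin)
    branches = [documents[i::num_branches] for i in range(num_branches)]
    union_queue = Queue()  # "Union": all branches write to one stream
    threads = [
        threading.Thread(target=run_branch, args=(branch, union_queue))
        for branch in branches
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # All branches have finished, so the queue can be drained safely
    results = []
    while not union_queue.empty():
        results.append(union_queue.get())
    return results


results = parallel_calls(list(range(10)), 2)
print(len(results))  # → 10
```

Note that the recombined stream is not guaranteed to preserve the original document order, since the branches finish their calls independently.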

tstack · Answer

Another approach would be to put the parts of the pipeline you wish to parallelize in a child pipeline that is called with the Pipeline Execute snap.  Using a child pipeline will save you the trouble of having to maintain copies of the snaps and make it easier to test different concurrency levels since it’s just a matter of changing the ‘Pool Size’ property.
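As a rough analogy in Python (not SnapLogic code), a worker pool gives the same convenience: the work is defined once and concurrency is a single number. Here `child_pipeline` is a hypothetical placeholder for the child pipeline's logic and `pool_size` stands in for the ‘Pool Size’ property.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def child_pipeline(doc):
    """Hypothetical placeholder for the work done by the child pipeline."""
    time.sleep(0.01)  # simulated service latency
    return doc * 2


def run_with_pool(documents, pool_size):
    # Changing pool_size is the only edit needed to test another concurrency level
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() returns results in input order
        return list(pool.map(child_pipeline, documents))


print(run_with_pool([1, 2, 3], pool_size=2))
# → [2, 4, 6]
```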

walkerline117 · Answer

So is this the same as having a sub-pipeline and tuning the Pool Size of the Pipeline Execute snap?

nganapathiraju · Answer

This is in addition to that.
If you have 100 documents and only one Pipeline Execute with a pool size of 10, then 10 documents are distributed to each child execution.
In this case there are only 10 instantiations in total.
Now, with the same number of documents, if you have 2 routes each calling a Pipeline Execute with a pool size of 10, then 50 documents go to each route and 5 go to each child execution.
In this case there are 20 instantiations in total.
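The arithmetic can be checked with a few lines of Python. The helper `fan_out` is made up for this illustration and assumes documents are spread perfectly evenly across routes and child executions.

```python
def fan_out(total_docs, routes, pool_size):
    """Return (total instantiations, docs per route, docs per instantiation).

    Assumes a perfectly even split, hence the integer division.
    """
    instantiations = routes * pool_size
    docs_per_route = total_docs // routes
    docs_per_instantiation = docs_per_route // pool_size
    return instantiations, docs_per_route, docs_per_instantiation


print(fan_out(100, routes=1, pool_size=10))  # → (10, 100, 10)
print(fan_out(100, routes=2, pool_size=10))  # → (20, 50, 5)
```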
Hope that makes sense.

walkerline117 · Answer

So then why duplicate the snaps? Why not just use one Pipeline Execute with a pool size of 20?

nganapathiraju · Answer

It is basically about distributing load to achieve greater performance.
You have to take the Snaplexes and nodes into account.
Suppose you have a large SnapLogic environment with multiple Snaplexes and multiple nodes: you can route the documents to different Snaplexes and achieve greater performance.
With just one Pipeline Execute you cannot do that. The executions will be distributed to only one Snaplex, even if it has multiple nodes.
