Increase Pipeline Throughput By Parallelizing Service Calls

dwhite
Employee

When designing a pipeline, one often needs to call an external service (REST/SOAP). In SnapLogic, REST and SOAP snaps operate serially, meaning they process only one document at a time. Some service calls take a long time to return, which blocks pipeline execution until the response is received. This may be acceptable for a low-volume pipeline, but if one is processing thousands of documents or more and each call takes a few seconds to complete, the runtime adds up quickly.

To get around this, one can distribute the incoming documents over a number of outputs and add more snaps to make service calls.

To evenly distribute incoming documents across multiple outputs, one can use the router snap in “autorouting” mode, which round-robins incoming documents over the given outputs. To set this up, drag a router snap onto one’s pipeline, add the desired number of output views, and leave the expressions corresponding to those output views blank.

3272be1819962ebf25201c0c1090ce930818769f.PNG
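
For readers who think in code, the round-robin behaviour described above can be sketched in a few lines of Python. This is only an illustration of the distribution pattern, not how the router snap is implemented; `round_robin` and the sample documents are made-up names.

```python
from itertools import cycle


def round_robin(documents, num_outputs):
    """Distribute documents across num_outputs lists in round-robin order."""
    outputs = [[] for _ in range(num_outputs)]
    # cycle() yields 0, 1, ..., num_outputs - 1 and then starts over.
    for doc, idx in zip(documents, cycle(range(num_outputs))):
        outputs[idx].append(doc)
    return outputs


# Hypothetical sample input: seven documents spread over three outputs.
docs = [{"id": i} for i in range(7)]
branches = round_robin(docs, 3)
for n, branch in enumerate(branches):
    print(f"output{n}:", [d["id"] for d in branch])
# output0: [0, 3, 6]
# output1: [1, 4]
# output2: [2, 5]
```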

Once that is set up, copy the snap that makes the service call and attach one copy to each of the router outputs.

e63b3bd4aaf66a5621399534cab69d22ff6da1bd.PNG

Union the REST outputs to recombine the streams, and then reconnect the remaining pipeline logic.

6364fbedad1cc8b66e148d91615e68132cc7f1b5.PNG
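
The same router / parallel calls / union shape can be mimicked outside SnapLogic to see why it helps. The sketch below is a rough analogy under stated assumptions: `call_service` is a hypothetical stand-in that just sleeps to simulate a slow endpoint, each thread plays the role of one serial REST snap, and the shared queue plays the role of the Union.

```python
import queue
import threading
import time


def call_service(doc):
    """Hypothetical stand-in for a blocking REST/SOAP call."""
    time.sleep(0.02)  # simulated service latency (assumed value)
    return {"id": doc["id"], "status": "ok"}


def branch_worker(branch_docs, results):
    # Each branch is serial, like a single REST snap: one document at a time.
    for doc in branch_docs:
        results.put(call_service(doc))


def run_parallel(documents, num_branches):
    # Round-robin split: branch i gets documents i, i + n, i + 2n, ...
    branches = [documents[i::num_branches] for i in range(num_branches)]
    results = queue.Queue()  # recombines the streams, like the Union
    threads = [
        threading.Thread(target=branch_worker, args=(branch, results))
        for branch in branches
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # All workers have finished, so every response is already in the queue.
    return [results.get() for _ in range(len(documents))]


docs = [{"id": i} for i in range(20)]
for n in (1, 2, 4):
    start = time.perf_counter()
    out = run_parallel(docs, n)
    elapsed = time.perf_counter() - start
    print(f"{n} branch(es): {len(out)} responses in {elapsed:.2f}s")
```

As with the union in the pipeline, the recombined responses are not guaranteed to come back in the original document order.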

That’s it!

If one’s service calls block for even a few seconds and one is making a large number of calls, this should decrease the runtime. The right amount of parallelism depends on the use case and the service endpoint. I’ve found that two parallel calls have worked best most of the time; only at extreme volume was there an advantage in going to 3 or more.
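
To get a feel for the numbers, here is a back-of-the-envelope estimate. It assumes an idealised model in which every call takes the same time and the endpoint handles parallel calls without slowing down; the 5,000 documents and 2 seconds per call are made-up figures, and `estimated_runtime` is a hypothetical helper.

```python
import math


def estimated_runtime(num_docs, seconds_per_call, branches):
    """Idealised runtime: the busiest branch handles ceil(num_docs / branches) calls."""
    return math.ceil(num_docs / branches) * seconds_per_call


# Assumed workload: 5,000 documents at 2 seconds per call.
for k in (1, 2, 3, 4):
    print(f"{k} branch(es): {estimated_runtime(5000, 2.0, k):.0f} s")
# 1 branch(es): 10000 s
# 2 branch(es): 5000 s
# 3 branch(es): 3334 s
# 4 branch(es): 2500 s
```

This is a best case. A real endpoint may throttle or slow down under concurrent load, which fits the observation above that going past two parallel calls rarely paid off.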

6 REPLIES

It is basically a way to distribute the load and achieve greater performance.

You have to take the Snaplexes and nodes into account.

Suppose you have a large SnapLogic environment with multiple Snaplexes and multiple nodes. You can route the work to different Snaplexes and achieve greater performance.

With just one Pipeline Execute you cannot do that. The executions will be distributed to only one Snaplex, even if it has multiple nodes in it.

Ok, that makes sense then. Thanks!