Thoughts on Decoupled Batch Process Involving Ultra Pipelines

I have a “batch” process today, where a parent pipeline pulls a list of items (thousands) from an application’s REST API and uses Pipeline Execute to run a child pipeline for each item. Each child downloads audio recordings from an internal application and then uploads them to an external vendor. So not super speedy… It works okay, but there is overhead in spinning up the child pipelines, and it doesn’t effectively balance the work across all my Groundplex nodes, so working through the list takes longer than I think it should.

We recently purchased Ultra, and I was thinking about making a new decoupled process, where the parent process just puts list items on a Kafka topic, and each node has Ultra pipelines running and listening to the topic, pulling off messages and doing work at its own pace. Unfortunately, our organization has mandated Kerberos authentication for our Kafka infrastructure, and I don’t have a decent way of doing that with SnapLogic… Does anyone have experience with, or examples of, Kafka producers/consumers in SnapLogic using Kerberos authentication?

Then I was thinking that, instead of using Kafka, I could rely on the queuing that is already built into the FeedMaster, and have my parent process fire off a REST request to an Ultra triggered task for each item. I don’t really need any kind of response back to the parent job for each item. What I’m not sure about is how much queuing/waiting would be tolerated that way, for example if I just tried to fire off thousands of requests at once. Any thoughts/suggestions?

Maybe open a support case for this if you haven’t already, so we can take a deeper look. There might be some improvements that could be made.

The requests are stored on disk on the FeedMasters and won’t be cleared until the request is fully processed by the Ultra pipeline. So, you’ll need enough disk space to store the queued requests. You’ll also want to reply to the client right away rather than waiting for processing to finish, so that connections are freed up (in other words, use a Copy Snap, with one branch sending the response while the other does the actual work).
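
To make the two points above concrete, here is a rough sketch of what the calling side could look like if the parent were a small script rather than a pipeline. It is only an illustration under assumptions: the task URL, the bearer token, the `overhead_factor` used in the disk estimate, and the `max_in_flight` limit are all hypothetical placeholders, not SnapLogic defaults or documented figures. The sketch caps the number of concurrent requests so the client does not open thousands of connections at once, and it assumes the Ultra pipeline replies immediately as described above.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoint and token; substitute the real Ultra task URL and credentials.
ULTRA_TASK_URL = "https://example.invalid/ultra/process-recording"
BEARER_TOKEN = "replace-me"


def estimate_queue_bytes(item_count, avg_request_bytes, overhead_factor=1.5):
    """Rough disk space needed if every request is queued at the same time.

    overhead_factor is a guess to cover per-request metadata, not a documented value.
    """
    return int(item_count * avg_request_bytes * overhead_factor)


def post_item(item, url=ULTRA_TASK_URL, token=BEARER_TOKEN, timeout=30):
    """POST one item as JSON and return the HTTP status code."""
    body = json.dumps(item).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def dispatch(items, send=post_item, max_in_flight=8):
    """Send every item with at most max_in_flight concurrent requests.

    Returns (item, result_or_exception) pairs, in input order, so that
    failed items can be collected and retried by the caller.
    """
    def attempt(item):
        try:
            return item, send(item)
        except Exception as error:  # keep going; record the failure for a retry
            return item, error

    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(attempt, items))
```

For sizing, `estimate_queue_bytes(5000, 2048)` gives a ballpark figure in bytes for 5,000 queued requests of about 2 KB each; the real footprint depends on how the requests are stored, so treat it as a starting point to check against the free space on the FeedMaster nodes. Passing a different `send` function to `dispatch` makes it easy to dry-run the fan-out without touching the real task.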