SnapLogic - Integration Nation

tfan · 10-31-2024

We all love the Pipeline Execute Snap: it greatly simplifies a complex pipeline by extracting sections into a sub-pipeline. But sometimes we want to run a pipeline multiple times to perform an operation, such as polling an endpoint or performing LLM tool calls. In this article, we introduce the PipeLoop Snap, which adds iteration to the SnapLogic programming model. With PipeLoop, we can create workflows that were previously hard to manage or even impossible.

What is PipeLoop

PipeLoop is a new Snap for iterative execution of a pipeline. For people who are familiar with iteration in programming languages, PipeLoop is essentially a do-while loop for pipelines. The user must provide an iteration limit as a hard cutoff to avoid resource depletion or an infinite loop, and can optionally provide a stop condition to control the execution.

Just as we can pass input documents to PipeExec, we can pass input documents to PipeLoop. The difference is that the output document of the pipeline executed by PipeLoop is used as the input for the next round, and execution continues until the stop condition is met or the iteration limit is reached. Because of this mechanism, the pipeline run by PipeLoop must have one unlinked input and one unlinked output to work properly. Put simply, PipeLoop can be thought of as a variable-length chain of PipeExec Snaps that all run the same pipeline, with a condition to exit early.

PipeLoop execution flow

1. Input documents to PipeLoop are passed to the child pipeline for execution.
2. The child pipeline executes.
3. The child output is collected.
4. The stop condition is evaluated against the output document. If it is true, exit and pass the output document to the PipeLoop output; otherwise continue.
5. The iteration limit is checked. If it has been reached, exit and pass the output document to the PipeLoop output; otherwise continue.
6. The output document is used as the next round of input, and execution returns to step 1.
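The flow above can be sketched in a few lines of Python. This is only an illustration of the do-while semantics, not how the Snap is implemented: `pipe_loop`, `poll_once`, and the `attempts` and `ready` fields are hypothetical names, and the child pipeline is modelled as a plain function from one document to one document.

```python
from typing import Any, Callable, Dict

Document = Dict[str, Any]


def pipe_loop(
    doc: Document,
    child: Callable[[Document], Document],
    stop_condition: Callable[[Document], bool],
    iteration_limit: int,
) -> Document:
    """Run `child` repeatedly, feeding each output back in as the next input."""
    if iteration_limit < 1:
        raise ValueError("iteration_limit must be at least 1")
    iterations = 0
    while True:
        doc = child(doc)  # steps 1-3: run the child and collect its output
        iterations += 1
        if stop_condition(doc):  # step 4: checked only after a run (do-while)
            return doc
        if iterations >= iteration_limit:  # step 5: hard cutoff
            return doc
        # step 6: loop again with the output as the next input


def poll_once(doc: Document) -> Document:
    """Hypothetical child: pretends an endpoint becomes ready on the 3rd poll."""
    attempts = doc["attempts"] + 1
    return {**doc, "attempts": attempts, "ready": attempts >= 3}


result = pipe_loop(
    {"attempts": 0, "ready": False},
    poll_once,
    stop_condition=lambda d: d["ready"],
    iteration_limit=10,
)
print(result)  # {'attempts': 3, 'ready': True}
```

Because the stop condition is checked after the child runs, the child always executes at least once, even when the input document already satisfies the condition.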

PipeLoop execution walkthrough

Let’s start with a very simple example. We’ll create a workflow using PipeLoop that increments a number from 1 to 3. For simplicity, we will refer to the pipeline with PipeLoop as the “Parent pipeline”, and the pipeline that is executed by PipeLoop as the “Child pipeline”.

Parent pipeline setup

The parent pipeline consists of one JSON Generator Snap with one document as input, and one PipeLoop Snap running the pipeline “child” with stop condition “$num >= 3”. We’ll also enable “Debug iteration outputs” to see the output of each round in this walkthrough.

Child pipeline setup

The child pipeline consists of a single Mapper Snap that increments “$num” by 1. This satisfies the requirement that a pipeline run by PipeLoop have one unlinked input and one unlinked output.

Output

The output of PipeLoop consists of two major sections when Debug mode is enabled: the output fields, and _iteration_documents. We can see the final output is “num”: 3, which means PipeLoop has successfully carried out the task.
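To make the shape of this output concrete, here is a rough Python simulation of the walkthrough. Only the top-level `_iteration_documents` field name comes from the Snap output described above; the keys inside each entry (`output`, `stop_condition`) and the function names are assumptions made for illustration, so the real layout may differ.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def child(doc: Document) -> Document:
    """Stand-in for the child pipeline: a single Mapper that adds 1 to num."""
    return {**doc, "num": doc["num"] + 1}


def run_with_debug(doc: Document, iteration_limit: int) -> Document:
    """Simulate PipeLoop with stop condition `$num >= 3` and debug output on."""
    if iteration_limit < 1:
        raise ValueError("iteration_limit must be at least 1")
    history: List[Document] = []
    for _ in range(iteration_limit):
        doc = child(doc)
        stop = doc["num"] >= 3
        # Entry layout is hypothetical; it records one round of child output
        # together with the stop condition evaluation for that round.
        history.append({"output": dict(doc), "stop_condition": stop})
        if stop:
            break
    return {**doc, "_iteration_documents": history}


output = run_with_debug({"num": 1}, iteration_limit=10)
print(output["num"])                        # 3
print(len(output["_iteration_documents"]))  # 2
```

Starting from “num”: 1, the child runs twice (1 to 2, then 2 to 3), so two rounds are recorded before the stop condition becomes true.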

PipeLoop features

PipeLoop has several features that are helpful when building iterative pipelines. We’ll group them by where they are located.

Properties

There are four main sections in the properties of the PipeLoop Snap.

Pipeline
Pipeline Parameters
Loop options
Execution options

Pipeline

The pipeline to be run.

Pipeline Parameters

We’ll take a deeper dive into this in the Pipeline Parameters section.

Loop options

Loop options are the property settings related to the iterations of this Snap.

Stop condition

The Stop condition field lets the user set an expression that is evaluated after each execution of the child pipeline, starting with the first. If the expression evaluates to true, the iteration stops. The Stop condition can also be set to false if the user wishes to use PipeLoop as a traditional for loop.

There are cases where the user might pass an unintended value into the Stop condition field. When the Stop condition is a non-boolean String, PipeLoop generates a warning and treats the stop condition as false.

Non-boolean Stop condition warning
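The fallback behaviour can be pictured with a small Python sketch. `coerce_stop_condition` is a hypothetical helper, not part of the Snap. Accepting the Strings "true" and "false" as booleans, and warning on non-String, non-boolean values as well, are assumptions; the article only states that a non-boolean String produces a warning and is treated as false.

```python
import warnings
from typing import Any


def coerce_stop_condition(value: Any) -> bool:
    """Return the stop decision, falling back to False for unusable values."""
    if isinstance(value, bool):
        return value
    # Assumption: the Strings "true"/"false" count as boolean Strings.
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower() == "true"
    warnings.warn(
        f"Stop condition evaluated to non-boolean value {value!r}; "
        "treating it as false"
    )
    return False


# Capture the warning instead of printing it, to show both effects at once.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    stop = coerce_stop_condition("not a boolean")

print(stop)         # False
print(len(caught))  # 1
```

With this fallback, a mistyped condition never stops the loop early, so the Iteration limit becomes the only exit.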

Iteration limit

The Iteration limit field sets the maximum number of iterations that can occur. It can also be used to fix the total number of executions when the Stop condition is set to false.

Setting a large value for the Iteration limit with debug mode on could be dangerous: the accumulated documents could quickly deplete CPU and RAM resources. To guard against this, PipeLoop generates a warning in the Pipeline Validation Statistics tab when the Iteration limit is greater than or equal to 1000 and Debug iteration outputs is enabled.

Large iteration limit with debug mode enabled warning
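The rule behind this warning is simple enough to write down. The sketch below is a hypothetical validation helper: the threshold of 1000 comes from the article, while the function name and message text are made up for illustration.

```python
from typing import List

DEBUG_LIMIT_WARNING_THRESHOLD = 1000  # threshold stated in the article


def validate_loop_options(
    iteration_limit: int, debug_iteration_outputs: bool
) -> List[str]:
    """Return validation warnings for a combination of loop options."""
    found: List[str] = []
    if debug_iteration_outputs and iteration_limit >= DEBUG_LIMIT_WARNING_THRESHOLD:
        # Message wording is hypothetical.
        found.append(
            f"Iteration limit {iteration_limit} with debug outputs enabled "
            "may accumulate a large number of documents"
        )
    return found


print(validate_loop_options(1000, True))   # one warning
print(validate_loop_options(999, True))    # []
print(validate_loop_options(5000, False))  # []
```

Both conditions must hold: a large limit alone does not trigger the warning, because without debug outputs the intermediate documents are not accumulated in the final output.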

Debug iteration outputs

This toggle adds the child pipeline output and the stop condition evaluation for each iteration to the final output as a separate field.

Output example with Debug iteration outputs enabled

Execution options

Execute On

Specifies where the pipeline execution should take place. Currently, only local executions (local Snaplex, local node) are supported.

Execution Label

We’ll take a deeper dive into this in the Monitoring section.

Pipeline Parameters

For users that are familiar with Pipeline Parameters in PipeExec, feel free to skip to the next section as the instructions are identical.

Introduction to Pipeline Parameters

Before we take a look at the Pipeline Parameters support in the PipeLoop Snap, let’s take a step back and see what pipeline parameters are and how pipeline parameters can be leveraged.

Pipeline parameters are String constants that can be defined in the Edit Pipeline Configuration settings. Users can use the parameters as constants anywhere in the pipeline. One major difference between Pipeline parameters and Pipeline variables is that Pipeline parameters are referenced with an underscore prefix, whereas Pipeline variables are referenced with a dollar sign prefix.

Pipeline Parameters in Edit Pipeline Configuration

Accessing Pipeline Parameters in an expression field

Example

Let’s take a look at Pipeline Parameters in action with PipeLoop. Our target here is to print out “Hello PipeLoop!” n times where n is the value of “num”.

We’ll add two parameters in the child pipeline, param1 and param2. To demonstrate, we assign “value1” to param1 and leave param2 empty. We’ll then add a message field with the value “Hello PipeLoop!” in the JSON Generator so that we can assign the String value to param2. Now we’re able to use param2 as a constant in the child pipeline. PipeLoop also has field name suggestions built into the Parameter name fields for ease of use.

PipeLoop Pipeline Parameters in action

For our child pipeline, we’ll add a new row in the Mapping table to print out “Hello PipeLoop!” repeatedly (followed by a newline character). One thing to bear in mind is that the order of the rows in the Mapping table does not affect the output (the number of “Hello PipeLoop!” printed in this case), as the output fields are updated only after the current iteration has finished executing.

Child Pipeline configuration for our task
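The point about row order can be illustrated with a short Python sketch. It assumes every Mapping table row reads from the incoming document and that results are written out only after all rows are evaluated. `apply_mapping` and the two rows are hypothetical stand-ins, not the actual child pipeline configuration from this example.

```python
from typing import Any, Callable, Dict, List, Tuple

Document = Dict[str, Any]
Row = Tuple[str, Callable[[Document], Any]]  # (target field, expression)

MESSAGE = "Hello PipeLoop!"  # plays the role of the param2 constant


def apply_mapping(doc: Document, rows: List[Row]) -> Document:
    """Evaluate every row against the input document, then build the output."""
    output = dict(doc)
    for target, expression in rows:
        # Each expression sees `doc` (the input), never the partial `output`.
        output[target] = expression(doc)
    return output


# Hypothetical rows: increment num, and repeat the message (num + 1) times.
rows: List[Row] = [
    ("num", lambda d: d["num"] + 1),
    ("text", lambda d: (MESSAGE + "\n") * (d["num"] + 1)),
]

forward = apply_mapping({"num": 1}, rows)
backward = apply_mapping({"num": 1}, list(reversed(rows)))
print(forward == backward)  # True
print(forward["text"])      # "Hello PipeLoop!" on two lines
```

If the rows were instead applied one after another to a partially updated document, putting the num row first would change how many times the message is repeated.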

Here’s the final result: we can see that “Hello PipeLoop!” is printed twice. Mission complete.

Remarks

Pipeline Parameters are String constants that can be set in Edit Pipeline Configuration.
Users can pass a String to Pipeline Parameters defined in the Child pipeline in PipeLoop.
Pipeline Parameters in PipeLoop will override previous pipeline parameter values defined in the Child pipeline if the parameters share the same name.
Pipeline Parameters are constants, which means the values will not be modified during iterations even if the user tries to change them.
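These remarks can be summarized in a small Python sketch. `resolve_parameters` is a hypothetical helper: it merges the child pipeline's defaults with the values passed from PipeLoop, lets the passed values win on a name clash, and returns a read-only view to mirror the "constants" behaviour. Converting values with `str` is an assumption based on parameters being Strings.

```python
from types import MappingProxyType
from typing import Any, Mapping


def resolve_parameters(
    child_defaults: Mapping[str, str], passed: Mapping[str, Any]
) -> Mapping[str, str]:
    """Merge parameters; values passed from PipeLoop override child defaults."""
    merged = {**child_defaults, **{name: str(value) for name, value in passed.items()}}
    return MappingProxyType(merged)  # read-only: fixed for every iteration


params = resolve_parameters(
    {"param1": "value1", "param2": ""},  # defined in the child pipeline
    {"param2": "Hello PipeLoop!"},       # passed in from PipeLoop
)
print(params["param1"])  # value1
print(params["param2"])  # Hello PipeLoop!

try:
    params["param2"] = "changed"  # type: ignore[index]
except TypeError:
    print("parameters are read-only")
```

The same resolved mapping would be handed to every iteration, which is why a value set from the first input document stays fixed for the rest of the loop.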

Monitoring

When a Snap in a pipeline is executed, there is no output until the execution finishes. Because PipeLoop runs an iterating pipeline execution as a single Snap, it is somewhat difficult to know where the execution currently is, or which pipeline execution corresponds to which input document. To deal with this, we have two extra features that add visibility into the PipeLoop execution.

Pipeline Statistics progress bar

During the execution of PipeLoop, a progress bar will be available in the Pipeline Validation Statistics tab, so that the user can get an idea of which iteration the PipeLoop is currently at. Note that the progress bar might not reflect the actual iteration index if the child pipeline executions are short, due to polling intervals.

PipeLoop iteration progress bar

Execution Label

When a PipeLoop with multiple input documents is executed, the user cannot tell which pipeline execution is linked to which input document in the SnapLogic Monitor. The Execution label is the answer to this problem. The user can pass a value in the Execution label field that differentiates the input documents, so that each input document has its own label in the SnapLogic Monitor during execution.

Here’s an example of two input documents running on the child pipeline. We set the Execution label with the expression “child_label” + $num, so the execution for the first document will have the label “child_label0” and the second execution will have the label “child_label1”.

Execution label settings

SnapLogic Monitor View
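As a quick check of the label expression, the Python sketch below mirrors “child_label” + $num for the two input documents. `execution_label` is a hypothetical function, and the explicit `str` call stands in for the string concatenation that the expression performs.

```python
from typing import Any, Dict, List


def execution_label(doc: Dict[str, Any]) -> str:
    """Mirror the expression "child_label" + $num for one input document."""
    return "child_label" + str(doc["num"])


input_documents: List[Dict[str, Any]] = [{"num": 0}, {"num": 1}]
labels = [execution_label(doc) for doc in input_documents]
print(labels)  # ['child_label0', 'child_label1']
```

Any field that differs between input documents works here; the label only needs to be distinct enough to tell the executions apart in the Monitor.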

Summary

In this article, we introduced PipeLoop, a new Snap for iterative execution workflows. The pipeline run by PipeLoop must have one unlinked input and one unlinked output.

PipeLoop has the following features:

Pipeline Parameters support
Stop condition to exit early with warnings
Iteration limit to avoid infinite loops with warnings
Debug mode
Execution label to differentiate runs in Monitor
Progress bar for status tracking

Happy Building!