This document serves as a comprehensive best practice guide for developing efficient and robust Pipelines within the SnapLogic Platform.
It offers guidelines that aim to optimize performance, enhance maintainability and reusability, and provide a basis for understanding common integration scenarios and how best to approach them. The best practices encompass various aspects of Pipeline design, including Pipeline behavior, performance optimization, and governance guidelines.
By adhering to these best practices, SnapLogic developers can create high-quality Pipelines that yield optimal results while promoting maintainability and reuse.
The content within this document is intended for SnapLogic Developers and Architects, as well as any individuals who may influence the design, development, or deployment of Pipelines.
Authors: SnapLogic Enterprise Architecture team
The SnapLogic Pipeline serves as the foundation for orchestrating data across business systems, both within and outside of an organization. One of its key benefits is its flexibility and the broad range of "Snaps" that aim to reduce the complexity involved in performing specific technical operations. The "SnapLogic Designer", a graphical low-code environment for building an integration use case with Snaps, provides a canvas enabling users with little technical knowledge to construct integration Pipelines. As with any user-driven environment, users must take care to ensure they not only achieve their desired business goals but also follow an approach that aligns with industry and platform best practices. When dealing with a SnapLogic Pipeline, these best practices may encompass various considerations:
Not considering these factors may cause undesirable consequences for the business and users concerned. Relative to the considerations stated above, these consequences could be as follows:
Therefore, it is essential that users of the Platform consider best practice recommendations and also contemplate how they can adopt and govern the process to ensure successful business outcomes.
To better understand how Pipelines can be built effectively within SnapLogic, it is essential to have an understanding of the Pipeline’s internal characteristics and behaviors. This section aims to provide foundational knowledge about the internal behavior of Pipelines, enabling you to develop a solid understanding of how they operate and help influence better design decisions.
The execution of a SnapLogic Pipeline can be initiated via a Triggered, Ultra, or Scheduled Task. In each case, the Pipeline transitions through a number of different 'states', with each state reflecting a distinct stage in the lifecycle of the Pipeline, from invocation and preparation through execution to completion. The following section highlights this process in more detail and explains some of the internal behaviors.
The typical Pipeline execution flow is as follows:
The following section describes the different Pipeline state transitions & respective behavior in sequential order.
| State | Purpose |
|---|---|
| NoUpdate | A pre-preparing state. Indicates that a request to invoke a Pipeline has been received but the leader node or control plane is still establishing which Snaplex node it should run on. (This state is only relevant if the Pipeline is executed on the leader node.) |
| Preparing | Indicates the retrieval of relevant asset metadata, including dependencies, from the control plane for the invoked Pipeline. This process also pre-validates the Snap configuration, alerting the user to any missing mandatory Snap attributes. |
| Prepared | The Pipeline is prepared and ready to be executed. |
| Executing | The Pipeline executes and processes data, connecting to any Snap endpoints using the specified protocols. |
| Completed | Pipeline execution is complete and teardown takes place, releasing compute resources on the Snaplex node. Final Pipeline execution metrics are sent to the Control Plane. |
Table 1.0 Pipeline state transitions
Pipeline execution flow
The following decision tree can be used to establish the best Pipeline Design approach for a given use case.
Snaps can be generally categorized into these types:
Connected Snaps within a Pipeline communicate with one another using Input and Output views. An Input view accepts data passed from an upstream Snap; the Snap operates on the data and then passes it to its Output view. Each view implements a separate in-memory ring buffer at runtime. Given the following example, the Pipeline will have three separate ring buffers, represented by the circular connections between each Snap (diamond-shaped connections for binary Snaps).
com.snaplogic.cc.jstream.view.publisher.AbstractPublisher.DOC_RING_BUFFER_SIZE=1024
com.snaplogic.cc.jstream.view.publisher.AbstractPublisher.BINARY_RING_BUFFER_SIZE=128
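The two properties above hold the default ring buffer sizes: 1024 documents for document views and 128 for binary views. As a purely illustrative sketch (the appropriate value, and whether it should be changed at all, is something to confirm with SnapLogic support), a larger document buffer would be expressed by overriding the same key, for example:
com.snaplogic.cc.jstream.view.publisher.AbstractPublisher.DOC_RING_BUFFER_SIZE=2048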
The following example Pipeline demonstrates how the usage and composition of Snaps within a Pipeline change how the Pipeline operates once it is executed.
Segment 1
Segment 2
| Property / Threshold | Where configured | Default value | Comments |
|---|---|---|---|
| Maximum memory % | Node properties tab of the Snaplex | 85 (%) | Threshold at which no more Pipelines will be assigned to a node. |
| Pipeline termination threshold | Internal (can be configured by setting the feature flag at the Org level: com.snaplogic.cc.snap.common.) | 95 (%) | Threshold at which the active Pipeline management feature kicks in and terminates Pipelines when node memory consumption exceeds the threshold. Ideal range: 75-99. |
| Pipeline restart delay interval | Internal (can be configured by setting the feature flag at the Org level: com.snaplogic.cc.snap.common.) | 30 (seconds) | One Pipeline is terminated every 30 seconds until the node memory goes below the threshold (i.e., below 95%). |
Table 2.0 Snaplex node memory configurations
The above thresholds can be optimized to minimize Pipeline terminations due to Out-of-Memory exceptions. Note that the memory thresholds are based on the physical memory on the node, not the virtual / swap memory.
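As a minimal illustration of what these percentages mean in absolute terms, the short Python sketch below applies the default thresholds to a hypothetical node with 16 GB of physical memory (the node size is an assumption for this example, not a value from this guide):

```python
# Illustrative only: translate the default Snaplex memory thresholds
# into absolute values for a hypothetical 16 GB node.
physical_memory_gb = 16      # assumed node size for this example
max_memory_pct = 85          # default "Maximum memory %" (no new Pipelines assigned above this)
termination_pct = 95         # default Pipeline termination threshold

assignment_cutoff_gb = physical_memory_gb * max_memory_pct / 100    # 13.6 GB
termination_cutoff_gb = physical_memory_gb * termination_pct / 100  # 15.2 GB

print(f"No new Pipelines assigned above: {assignment_cutoff_gb:.1f} GB")
print(f"Pipelines terminated above:      {termination_cutoff_gb:.1f} GB")
```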
Additional Reference: Optimizations for Swap Memory
Add 16 GB of swap memory to a Snaplex node with 8 GB of physical memory.
| Property | Comments |
|---|---|
| Swap space on the server | Add 16 GB of swap / virtual memory to the node. |
| Total Memory | Total memory is now 24 GB (8 GB physical + 16 GB virtual). |
| Maximum Heap Size | Set to 90% (of 24 GB) = 22 GB. |
| Maximum Memory | Set to 31%, rounded (of 22 GB) = 7 GB. |
The intent of the above calculation is to ensure that the JCC utilizes 7 GB of the available 8 GB of memory for normal workloads. Beyond that, the load balancer can queue up additional Pipelines or send them to other nodes for processing. If the running Pipelines collectively start using more than 7 GB of memory, the JCC can utilize up to 22 GB of total heap memory by using the OS swap space, per the above configuration.
Table 3.0 Snaplex node memory configurations
By updating the memory configurations as in the above example, the JCC utilizes 7 GB of the available 8 GB memory. Beyond that value, the load balancer would queue up additional Pipelines or distribute them across other nodes.
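The sizing arithmetic behind this example can be summarized in a few lines. The following minimal sketch simply restates Table 3.0 in code form (the 90% and 31% factors come from the example above; your own values will depend on the node sizing):

```python
# Illustrative only: the swap-memory sizing arithmetic from the example above.
physical_gb = 8                              # physical memory on the node
swap_gb = 16                                 # swap / virtual memory added to the node

total_gb = physical_gb + swap_gb             # 24 GB total memory
max_heap_gb = round(total_gb * 0.90)         # Maximum Heap Size: 90% of 24 GB ~= 22 GB
max_memory_gb = round(max_heap_gb * 0.31)    # Maximum Memory: 31% of 22 GB ~= 7 GB

print(total_gb, max_heap_gb, max_memory_gb)  # 24 22 7
```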
Modularization can be implemented in SnapLogic Pipelines by making use of the Pipeline Execute Snap. This approach enables you to:
Modularization best practices:
Detailed documentation with examples can be found in the SnapLogic documentation for Pipeline Execute.
Use Pipeline Execute when:
Avoid when:
Additional recommendations and best practices for the Pipeline Execute Snap:
This section lists some recommendations to improve Pipeline efficiency.
Note:
SLDB should not be used as a file source or destination in any SnapLogic Org (Production or Non-Production). Use your own Cloud storage provider for this purpose. Using SLDB instead of separate Cloud storage as the file store can lead to issues such as file corruption, Pipeline failures, inconsistent behavior, SLA violations, and platform latency.
This applies to all File Reader / Writer Snaps and the SnapLogic API.
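For example, rather than writing to a project-relative SLDB path, a File Writer could target an external object store instead, such as an S3 location of the form s3:///your-bucket/landing/output.csv with the appropriate cloud storage account configured on the Snap. The bucket and path here are placeholders, and the exact protocol syntax supported in your environment should be confirmed in the File Reader / Writer Snap documentation.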
Use your own Cloud storage instead of SLDB for the following (or any other) File Read / Write use-cases:
| Scenario | Recommendation | Feature(s) |
|---|---|---|
| Multiple Pipelines with similar structure | Use parameterization with Pipeline Execute to reuse Pipelines. | Pipeline Execute, Pipeline parameters |
| Bulk loading to a target data source | Use Bulk Load Snaps where available. | Bulk Loading |
| Mapper Snap contains a large number of mappings where the source and target field names are consistent | Enable the "Pass through" setting on the Mapper. | Mapper - Pass Through |
| Processing large data loads | Perform the target load operation within a child Pipeline using the Pipeline Execute Snap with "Execute On" set to "LOCAL_SNAPLEX". | Pipeline Execute |
| Performing complex transformations and/or JOIN/SORT operations across multiple tables | Perform transformations and operations within the SQL query. | SQL Query Snaps |
| High-throughput message queue to database ingestion | Batch polling and ingestion of messages. | Consumer Snaps, Database Load Snaps |
Table 4.0 Optimization recommendations
An Ultra Task is a type of task which can be used to execute Ultra Pipelines. Ultra Tasks are well-suited for scenarios where there is a need to process large volumes of data with low latency, high throughput, and persistent execution.
While the performance of an Ultra Pipeline largely depends on the response times of the external applications to which the Pipeline connects, there are a number of best practice recommendations that can be followed to ensure optimal performance and availability.
There are two modes of Ultra Tasks: Headless Ultra and Low Latency API Ultra. Each mode is characterized by the design of the Pipeline invoked by the Ultra Task; both modes are described in more detail below.
A Headless Ultra Pipeline is an Ultra Pipeline which does not require a Feedmaster, and where the data source is a Listener or Consumer type construct, for example Kafka Consumer, File Poller, SAP IDOC Listener (For a detailed list of supported Snaps, please click here).
The Headless Ultra Pipeline executes continuously and polls the data source at the frequency configured within the Snap, passing documents from the source to downstream Snaps.
Low Latency API Ultra is a high-performance API execution mode designed for real-time, low-latency data integration and processing. The Pipeline invoked by the Ultra Task is characterized by an open input view on the first Snap in the Pipeline (typically an HTTP Router or Mapper Snap). Requests made to the API are brokered through a FeedMaster node, guaranteeing at-least-once message delivery.
Triggered Tasks offer a method of invoking a Pipeline via an API endpoint when the consumption pattern of the API is infrequent and/or does not require low-latency response times.
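For reference, a Triggered Task is typically invoked with a simple HTTPS request. The following minimal Python sketch assumes a cloud-invoked task; the Org, project space, project, task name, bearer token, and payload are all placeholders to be replaced with the values shown in the task's details in SnapLogic Manager:

```python
# Minimal sketch: invoking a Triggered Task over HTTPS.
# All names and the token below are placeholders; copy the real endpoint and
# bearer token from the Triggered Task's details in SnapLogic Manager.
import requests

TASK_URL = (
    "https://elastic.snaplogic.com/api/1/rest/slsched/feed/"
    "MyOrg/MyProjectSpace/MyProject/MyTriggeredTask"
)
headers = {"Authorization": "Bearer <task-bearer-token>"}

# POST a JSON document into the Pipeline's open input view (a plain GET also
# works for Pipelines that do not take an input document).
response = requests.post(TASK_URL, json={"id": 123}, headers=headers, timeout=60)
response.raise_for_status()
print(response.status_code, response.text)
```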