Snaplex Capacity Tuning Guide

ramaonline
Employee

Introduction

This document serves as a comprehensive best practice guide for developing efficient and robust Pipelines within the SnapLogic Platform.

It offers guidelines that aim to optimize performance, enhance maintainability and reusability, and provide a basis for understanding common integration scenarios and how best to approach them. The best practices cover various aspects of Pipeline design, including Pipeline behavior, performance optimization, and governance.

By adhering to these best practices, SnapLogic developers can create high-quality Pipelines that yield optimal results while promoting maintainability and reuse.

The content within this document is intended for the SnapLogic Developer Community or an Architect, in addition to any individuals who may have an influence on the design, development or deployment of Pipelines within the SnapLogic platform.

Authors: SnapLogic Enterprise Architecture team

Snaplex Planning

A Snaplex is a group of co-located nodes that is treated as a single logical entity for the purpose of Pipeline execution. The SnapLogic Control plane automatically load-balances the Pipeline workload within a Snaplex. Nodes in a Snaplex should be homogeneous within each node type (JCC or FeedMaster), with the same CPU, memory, disk sizing, and network configuration. The JCC nodes and the FeedMaster nodes in a Snaplex can, however, be sized differently from each other.

Examples of recommended configurations:

Snaplex configurations

| Example | JCC node count | JCC node size (each node) | FeedMaster node count | FeedMaster node size (each node) |
| --- | --- | --- | --- | --- |
| 1 | 4 | Large | 2 | Medium |
| 2 | 4 | X-Large | 2 | Large |

| Object | Definition |
| --- | --- |
| Node | A Node is a JVM (Java Virtual Machine) process installed on a server running an operating system such as Windows or Linux. |
| JCC Node | The JCC node is responsible for: (1) preparation, validation, and execution of Pipelines; (2) sending heartbeats to the SnapLogic Control plane to indicate the health of the node. |
| FeedMaster Node | The FeedMaster node acts as an interface between the JCC nodes and the client. Its main functions are: (1) managing message queues; (2) sending heartbeats to the SnapLogic Control plane to indicate the health of the node. |

When setting up Snaplexes, it is recommended to plan out the number of Snaplexes to configure along with the usage criteria to achieve isolation across workloads. Snaplexes can be organized in various ways such as:

  • Pipeline Workload - Organize Snaplexes by workload type: Batch, Low latency, and On-demand.
  • Business Unit - Organize Snaplexes by business units.
  • Geographical location - Organize Snaplexes by data center or geographic location.

The recommendation is to use a combination of the above to optimize resource usage and achieve workload isolation.

Snaplex Network Requirements

Snaplexes should have the below network characteristics:

Within a Snaplex:

  • Less than 10 ms round trip latency between Snaplex nodes.
  • Greater than 40 MB/sec throughput between Snaplex nodes.

Snaplex to Control Plane:

  • Less than 50 ms round trip latency to the SnapLogic Control plane.
  • Greater than 20 MB/sec throughput to the SnapLogic Control plane.
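The thresholds above can be turned into a simple acceptance check. The following is a minimal sketch that assumes latency and throughput have already been measured with a separate tool; the names NETWORK_REQUIREMENTS and check_link are hypothetical and are not part of any SnapLogic API.

```python
# Thresholds copied from the network requirements listed above.
NETWORK_REQUIREMENTS = {
    "intra_snaplex": {"max_latency_ms": 10, "min_throughput_mb_s": 40},
    "control_plane": {"max_latency_ms": 50, "min_throughput_mb_s": 20},
}


def check_link(kind, latency_ms, throughput_mb_s):
    """Return a list of violations for a measured link (empty if it passes).

    kind is "intra_snaplex" (node to node) or "control_plane"
    (node to SnapLogic Control plane). Both names are illustrative.
    """
    req = NETWORK_REQUIREMENTS[kind]
    problems = []
    # The requirements are strict: latency must be *less than* the limit
    # and throughput must be *greater than* the limit.
    if latency_ms >= req["max_latency_ms"]:
        problems.append(
            f"latency {latency_ms} ms is not below {req['max_latency_ms']} ms"
        )
    if throughput_mb_s <= req["min_throughput_mb_s"]:
        problems.append(
            f"throughput {throughput_mb_s} MB/sec is not above "
            f"{req['min_throughput_mb_s']} MB/sec"
        )
    return problems


if __name__ == "__main__":
    for issue in check_link("control_plane", latency_ms=60, throughput_mb_s=25):
        print(issue)
```

An empty result means the measured link satisfies both requirements for that link type.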

Pipeline Execute

When Pipelines are executed using the Pipeline Execute Snap, nodes communicate with each other over HTTPS on port 8081. There is some resiliency to network failures, because failed HTTPS requests are retried. Even so, high network latency and dropped connections can still result in Pipeline execution failures.

Regular Pipeline executions run within a node, requiring no communication with other nodes in the Snaplex. When a Pipeline Execute Snap is used to run child Pipelines, there are three options:

| Option | Comments |
| --- | --- |
| LOCAL_NODE | This option is recommended when the child Pipeline is being used for Pipeline structuring and reuse rather than Pipeline workload distribution. Use this option for most regular child Pipeline executions. |
| LOCAL_SNAPLEX | The network communication is optimized for streaming data processing since the child Pipeline is on the local Snaplex. Use this option only when workload distribution within the Snaplex is required. |
| SNAPLEX_WITH_PATH | This has high dependency on the network. The network communication is optimized for batch data processing since the child Pipeline is on a remote Snaplex. Use this option only when the child Pipeline has to run on a different Snaplex, either because of endpoint connectivity restrictions or for workload distribution. |

Ultra Pipelines

The JCC nodes communicate with the FeedMaster nodes over TCP with SSL on port 8084 when executing Ultra Pipelines. The communication between nodes is based on a message queue.

This communication is not resilient to network failure, so a reliable network is required between the Snaplex nodes for Ultra Pipeline processing. In case of any network failures, the currently processing Ultra requests will be retried or in some instances fail with errors.

If there is a communication failure between the JCC and FeedMaster nodes, the request is retried up to five times. This limit is controlled by the ultra_max_redelivery_count Snaplex configuration. There is also an overall 15-minute timeout for an Ultra request to the FeedMaster, which is configurable at the request level using the X-SL-RequestTimeout HTTP request header, or at the Snaplex level using the llfeed.request_timeout configuration setting.

Note that both ultra_max_redelivery_count and llfeed.request_timeout are configured under Node Properties -> Global Properties for GroundPlexes. You can submit a support request to configure these properties for your Cloudplexes.
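As an illustration of the request-level override, the sketch below builds the header set for an Ultra request. It is only a sketch under stated assumptions: the helper name ultra_request_headers is hypothetical, nothing is sent over the network, and the unit expected by X-SL-RequestTimeout is not stated in this guide, so the value is passed through unchanged and should be confirmed against the SnapLogic documentation.

```python
def ultra_request_headers(request_timeout, extra_headers=None):
    """Build HTTP headers for an Ultra request with a per-request timeout.

    request_timeout is copied into the X-SL-RequestTimeout header as-is.
    The expected unit is an open question here; verify it before use.
    """
    # Copy so the caller's dictionary is never modified.
    headers = dict(extra_headers or {})
    headers["X-SL-RequestTimeout"] = str(request_timeout)
    return headers


if __name__ == "__main__":
    # 300 is an arbitrary example value, not a recommendation.
    print(ultra_request_headers(300, {"Content-Type": "application/json"}))
```

The resulting dictionary can be passed to whichever HTTP client is used to call the Ultra task endpoint.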

Pipeline Load Balancing

The Control plane performs load balancing for Pipeline execution requests on a Snaplex. The following table lists the configurations that are involved:

| Property / Threshold | Where configured | Default value | Comments |
| --- | --- | --- | --- |
| Maximum Slots | Node properties tab of the Snaplex | 4000 | One slot = one Snap = one active thread on the node. A percentage of slots (configurable with the Reserved slot % property) is reserved for interactive Pipeline executions and validations through the Designer tool. Pipelines are queued if the threshold is reached. Some Snaps, such as Pipeline Execute, bulk loaders, and Snaps performing input/output, can use more threads than other Snaps. |
| Maximum memory % | Node properties tab of the Snaplex | 85 (%) | Threshold at which no more Pipelines will be assigned to a node. |
| Snaplex node resources (CPU, file descriptors, memory) | Node server configurations | Configurable | If the Control plane detects that there are not enough resources available on the Snaplex, the Pipeline execution requests are queued on the Control plane and resume when resources are available. |

The Control plane dispatches the Pipeline to the node which has the most available capacity in terms of CPU/memory and file descriptors. For child Pipeline executions using the Pipeline Execute Snap, there is a preference given for running the child on the local node to avoid the network transfer penalty.

Table 1.0 Configurations for Pipeline load balancing
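The dispatch behavior described in this section can be pictured with a simplified model. This is an illustrative sketch only and not the Control plane's actual algorithm: the Node fields, the scoring rule (most free memory first, then most free slots), and the use of None to signal queueing are all assumptions, and the Reserved slot % property is ignored.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    memory_used_pct: float         # current memory usage, 0-100
    slots_used: int                # slots currently in use
    max_slots: int = 4000          # default Maximum Slots
    max_memory_pct: float = 85.0   # default Maximum memory %

    def can_accept(self, slots_needed):
        # Eligible only while below the memory threshold and with enough
        # free slots for the Snaps of the incoming Pipeline.
        return (
            self.memory_used_pct < self.max_memory_pct
            and self.slots_used + slots_needed <= self.max_slots
        )


def pick_node(nodes, slots_needed):
    """Return the eligible node with the most headroom, or None to queue."""
    eligible = [n for n in nodes if n.can_accept(slots_needed)]
    if not eligible:
        return None  # request stays queued until resources free up
    # Assumed scoring: most free memory first, then most free slots.
    return max(
        eligible,
        key=lambda n: (n.max_memory_pct - n.memory_used_pct,
                       n.max_slots - n.slots_used),
    )


if __name__ == "__main__":
    nodes = [Node("jcc-1", 90, 100), Node("jcc-2", 60, 3000), Node("jcc-3", 40, 3990)]
    chosen = pick_node(nodes, slots_needed=20)
    print(chosen.name if chosen else "queued")
```

In this toy run, jcc-1 is skipped because it is above the memory threshold and jcc-3 is skipped because it lacks free slots, so the request goes to jcc-2.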

Snaplex Resource Management

Capacity Planning

This section provides some guidelines for Snaplex capacity planning and tuning.

| Configuration / Use-case | Comments |
| --- | --- |
| Workload isolation | Isolate workloads across Snaplexes based on workload type, geographic location, and business unit. |
| Node sizing | Size the nodes (CPU, RAM, disk space) in a Snaplex based on Pipeline workload type. Batch data processing needs larger nodes, while streaming/API processing can use smaller nodes. |
| Maximum Slots | One slot = one Snap = one active thread on the node. A percentage of slots (configurable with the Reserved slot % property) is reserved for interactive Pipeline executions and validations through the Designer tool. Pipelines are queued if the threshold is reached. Some Snaps, such as Pipeline Execute, bulk loaders, and Snaps performing input/output, can use more threads than other Snaps. The general recommendation is to configure this property based on the node memory configuration, for example: 8 GB: 2000 slots; 16 GB: 4000 slots. |
| API Workloads | For API workloads, the rule of thumb is 100 active Ultra API calls per 8 GB of RAM, or 20 active triggered API calls per 8 GB of RAM. A 16 GB node can therefore handle 200 active Ultra API calls or 40 active triggered API calls. |
| Node count | The number of nodes in a Snaplex can be estimated based on the count of batch and streaming Pipelines. The number of FeedMaster nodes can be half of the JCC node count, with a minimum of two recommended for high availability. For active Pipeline count estimates, error Pipelines can be excluded from the count since they do not consume resources under the normal workload. |

Table 1.1 Configurations for Snaplex capacity planning
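The rules of thumb in Table 1.1 reduce to simple arithmetic. The sketch below is one way to express them; the function names are hypothetical, linear scaling to RAM sizes other than 8 GB and 16 GB is an assumption, and so is rounding the FeedMaster count up.

```python
import math


def recommended_slots(node_ram_gb):
    # Guide values: 8 GB -> 2000 slots, 16 GB -> 4000 slots.
    # Scaling linearly to other sizes is an assumption.
    return int(node_ram_gb / 8 * 2000)


def max_active_api_calls(node_ram_gb, kind):
    # Guide values: 100 active Ultra calls or 20 active triggered calls
    # per 8 GB of RAM.
    per_8_gb = {"ultra": 100, "triggered": 20}[kind]
    return int(node_ram_gb / 8 * per_8_gb)


def feedmaster_count(jcc_count):
    # Half of the JCC node count (rounded up, an assumption),
    # never fewer than two for high availability.
    return max(2, math.ceil(jcc_count / 2))


if __name__ == "__main__":
    print(recommended_slots(16))                  # slots for a 16 GB node
    print(max_active_api_calls(16, "ultra"))      # active Ultra calls
    print(max_active_api_calls(16, "triggered"))  # active triggered calls
    print(feedmaster_count(4))                    # FeedMaster nodes for 4 JCC nodes
```

For a 16 GB node these functions reproduce the figures quoted in the table: 4000 slots, 200 active Ultra calls, and 40 active triggered calls.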

Capacity Tuning

Below are some best practices for Snaplex capacity tuning:

| Configuration / Use-case | Comments |
| --- | --- |
| Slot counts | The Maximum Slots count can be tuned based on alerts and dashboard events. The nodes do not need to be restarted for this setting to take effect. Queued Pipelines: increase the slot count by 25%. Busy nodes: reduce the slot count by 25%. The slot count should not be set more than 50% above the recommended value for the node configuration. For example, the recommended slot count on a node with 16 GB of RAM is 4000, so setting it higher than 6000 is not advisable. If you observe high CPU or memory consumption on the node despite lowering the slot count by 25%, consider allocating additional resources to the Snaplex nodes. |
| Workloads | Batch workloads: expand the node memory up to 64 GB, and deploy additional nodes for increased capacity. API workloads: deploy additional nodes instead of expanding the memory on the current node. |
| Active Pipelines | As a general rule, maintain fewer than 500 active Pipeline instances on a single node. Exceeding this threshold can lead to communication bottlenecks with the Control plane. If the number of active Pipeline instances exceeds 500, consider adding more nodes. |
| CPU | CPU consumption can be optimized by setting the Pool size and Batch size options on Pipeline Execute Snaps. |
| Memory | See Table 3.0 below. Additional reference: Optimizations for Swap Memory. |

Table 2.0 Configurations for Snaplex capacity tuning
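The slot count adjustments in Table 2.0 can be sketched as a small helper. This is a hypothetical illustration, not a SnapLogic utility: the function name and the "queued" / "busy" signal values are assumptions, and the guide does not say what to do when both conditions occur at once, so the helper accepts only one signal at a time.

```python
def tune_slot_count(current, recommended, signal):
    """Suggest a new Maximum Slots value.

    signal: "queued" (Pipelines are queuing) raises the count by 25%;
            "busy" (nodes are overloaded) lowers it by 25%.
    The result is capped at 150% of the recommended value for the node.
    """
    if signal == "queued":
        proposed = current * 1.25
    elif signal == "busy":
        proposed = current * 0.75
    else:
        raise ValueError("signal must be 'queued' or 'busy'")
    ceiling = recommended * 1.5  # never more than 50% above recommended
    return int(min(proposed, ceiling))


if __name__ == "__main__":
    # 16 GB node: the recommended value is 4000, so the ceiling is 6000.
    print(tune_slot_count(4000, 4000, "queued"))
    print(tune_slot_count(5000, 4000, "queued"))
    print(tune_slot_count(4000, 4000, "busy"))
```

Starting from 4000 slots on a 16 GB node, a second 25% increase would exceed the ceiling, so the helper stops at 6000.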

Memory Configuration thresholds

| Property / Threshold | Where configured | Default value | Comments |
| --- | --- | --- | --- |
| Maximum memory % | Node properties tab of the Snaplex | 85 (%) | Threshold at which no more Pipelines will be assigned to a node. |
| Pipeline termination threshold | Internal (can be configured by setting the feature flag com.snaplogic.cc.snap.common.SnapThreadStatsPoller.MEMORY_HIGH_WATERMARK_PERCENT at the org level) | 95 (%) | Threshold at which the active Pipeline management feature kicks in and terminates Pipelines when the node memory consumption exceeds the threshold. Ideal range: 75-99. |
| Pipeline restart delay interval | Internal (can be configured by setting the feature flag com.snaplogic.cc.snap.common.SnapThreadStatsPoller.PIPELINE_RESTART_DELAY_SECS at the org level) | 30 (seconds) | One Pipeline is terminated every 30 seconds until the node memory goes below the threshold (i.e. goes below 95%). |

Table 3.0 Snaplex node memory configurations

The above thresholds can be tuned to minimize Pipeline terminations due to Out-of-Memory exceptions. Note that the memory thresholds are based on the physical memory on the node, not the virtual or swap memory.
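To see how the termination threshold and the restart delay interact, the following toy model lists the times at which Pipelines would be terminated. It is a rough sketch, not SnapLogic's implementation: it assumes each terminated Pipeline frees a fixed share of memory (freed_pct_per_pipeline, a made-up parameter) and that nothing else changes memory usage in the meantime.

```python
def termination_schedule(memory_pct, freed_pct_per_pipeline,
                         watermark_pct=95.0, restart_delay_secs=30):
    """Return the times (in seconds) at which Pipelines are terminated.

    One Pipeline is terminated per interval while memory usage is above
    the watermark. The defaults mirror the values in Table 3.0.
    """
    if freed_pct_per_pipeline <= 0:
        raise ValueError("freed_pct_per_pipeline must be positive")
    times = []
    elapsed = 0
    while memory_pct > watermark_pct:
        times.append(elapsed)                  # terminate one Pipeline now
        memory_pct -= freed_pct_per_pipeline   # assumed memory released
        elapsed += restart_delay_secs          # wait before the next check
    return times


if __name__ == "__main__":
    # Node at 98% memory, each terminated Pipeline frees about 2%.
    print(termination_schedule(98, 2))
```

Lowering the watermark makes termination start earlier, while a shorter delay reclaims memory faster at the cost of terminating Pipelines more aggressively.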

Snaplex Alerts

SnapLogic supports alerts and notifications through email and Slack channels. These can be configured in the Manager interface under Settings. The recommended alerts are listed in the table below.

| Alert type | Comments |
| --- | --- |
| Snaplex status alerts | Status alerts can be created at the org level or the Snaplex level (in the Snaplex properties). These allow notifications to be sent when the Snaplex node is unable to communicate with the SnapLogic Control plane or there are other issues detected with the Snaplex. |
| Snaplex resource usage alerts | Set up alerts for these event types: Snaplex congestion, Snaplex load average, Snaplex node memory usage, Snaplex node disk usage. |

Table 4.0 Recommended Snaplex Alerts


5 Replies

Jocelyn
Employee

An example of sizing would be great.

Sure @Jocelyn - Will review and include in the next version.

Jocelyn
Employee

The batch workload guidance is lacking a means to estimate the execution time of a batch process, which is really needed. The number of 5 MB per second per 8 GB of RAM has been floating around; if it's accurate, we could use it.

Batch Workloads

The general guideline is that a maximum of 5 concurrent batch Pipelines can execute per 8 GB of RAM. Therefore, a 16 GB RAM machine should run at most 10 concurrent batch Pipelines.

I will verify a sample batch use-case and include that example as a reference for calculation.