Platform Administration Reference guide v3
Introduction

This document is a reference manual for common administrative and management tasks on the SnapLogic platform. It has been revised to include the new Admin Manager and Monitor functionality, which replace the Classic Manager and Dashboard interfaces respectively. It is intended for SnapLogic Environment Administrators (Org Administrators) and users involved in supporting or managing the platform components.

Author: Ram Bysani, SnapLogic Enterprise Architecture team

Environment Administrator (known as Org Admin in the Classic Manager) permissions

There are two reserved groups in SnapLogic:
- admins: Users in this group have full access to all projects in the Org.
- members: Users in this group have access to projects that they create, or to which they are granted access. Users are automatically added to this group when you create them, and they must be a part of the members group to have any privileges within that Org.

There are two user roles:
- Environment admins: Org users who can manage the Org. Environment admins are part of the admins group; this role is named "Org Admin" in the Classic Manager.
- Basic users: All non-admin users. Within an Org, basic users can create projects and work with assets in the Project spaces to which they have been granted permission. To gain Org administrator privileges, a Basic user can be added to the admins group.

The table below lists the tasks, by category, that an Environment admin user can perform:

USER MANAGEMENT
- Create and delete users. Update user profiles.
- Create and delete groups. Add users to a group.
- Configure password expiration policies.
- Enable users' access to applications (AutoSync, IIP).
- When a user is removed from an Org, the administrator who removes the user becomes the owner of that user's assets.
Reference: User Management

MANAGER
- Create and manage Project Spaces.
- Update permissions (R, W, X) on an individual Project space and projects.
- Delete a Project space.
- Restore Project spaces, projects, and assets from the Recycle bin.
- Permanently delete Project spaces, projects, and assets from the Recycle bin.
- Configure Git integration and integration with tools such as Azure Repos, GitLab, and GHES.
- View Account Statistics, and generate reports for accounts, projects, and pipelines within the project that use an account.
- Upgrade or downgrade Snap Pack versions.

ALERTS and NOTIFICATIONS
- Set up alerts and notifications.
- Set up Slack channels and recipients for notifications.
Reference: Alerts

SNAPLEX and ORG
- Create Groundplexes.
- Manage Snaplex versions. Update Snaplex settings.
- Update or revert a Snaplex version.

APIM
- Publish, unpublish, and deprecate APIs on the Developer portal.
- Configure the Developer portal.
- Approve API subscriptions and manage/approve user accounts.
Reference: API Management

AutoSync
- Configure AutoSync user permissions.
- Configure connections for data pipeline endpoints.
- Create user groups to share connection configuration.
- View information on all data pipelines in the Org.
Reference: AutoSync Administration

Table 1.0 Org Admin Tasks

SnapLogic Monitoring Dashboards

The enhanced Monitor interface can be launched from the Apps (Waffle) menu located in the top right corner of the page. It enables you to observe integration executions, activities, events, and infrastructure health in your SnapLogic environment. The Monitor pages are categorized under three main groups: Analyze, Observe, and Review.
Reference: Move_from_Dashboard_to_Monitor

The following table lists some common administrative and monitoring tasks for which the Monitor interface can be used:

- Integration Catalog to fetch and display metadata for all integrations in the environment. Monitor -> Analyze -> Integration Catalog. Reference: Integration Catalog
- View of the environment over a time period.
Monitor -> Analyze -> Insights. Reference: Insights
- View pipeline and task executions along with statistics, logs, and other details. Stop executions. Download execution details. Monitor -> Analyze -> Execution. Reference: Execution
- Monitor and manage Snaplex services and nodes with graph views for a time period. Monitor -> Analyze -> Infrastructure. Reference: Infrastructure
- View and download metrics for Snaplex nodes for a time period. Monitor -> Analyze -> Metrics; Monitor -> Observe -> API Metrics. Reference: Metrics, API-Metrics
- Review Alert history and Activity logs. Monitor -> Review. Reference: Alert History, Activity Log
- Troubleshooting Snaplex / Node / Pipeline issues. Reference: Troubleshooting

Table 2.0 Monitor App features

Metrics for monitoring

CPU Consumption

CPU consumption can be high (and exceed 90% at times) when pipelines are executing. High CPU consumption when no pipelines are executing could indicate heavy CPU usage by other processes on the Snaplex node. Review CPU Metrics under the Monitor -> Metrics and Monitor -> Infrastructure tabs.
Reference: CPU utilization metrics

System load average (for Unix-based systems)

Load average is a measure of the number of processes that are either actively running on the CPU or waiting in line to be processed by the CPU. For example, on a system with 4 virtual CPUs:
- A load average of 4.0 means full use of all CPUs, with no idle time and no queue.
- A load average above 4.0 suggests that processes are waiting for CPU time.
- A load average below 4.0 indicates underutilization.
System load: Monitor -> Metrics tab.

Heap Memory

Heap memory is used by the SnapLogic application to dynamically allocate memory at runtime for memory-intensive operations. The JVM can crash with an Out-of-Memory exception if the heap memory limit is reached. High heap memory usage can also impact other application functions such as pipeline execution and metrics collection.
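As a rough illustration of watching for the Out-of-Memory condition described above, the sketch below flags a node whose used heap approaches its maximum. The helper name and the 90% threshold are our own illustrative choices, not platform defaults:

```python
def heap_pressure(used_gib: float, max_gib: float, threshold: float = 0.9) -> bool:
    """Return True when used heap reaches `threshold` of the max heap size."""
    return used_gib / max_gib >= threshold

# Example: a node configured with an 8 GiB max heap
print(heap_pressure(7.5, 8.0))  # -> True (investigate before an OOM occurs)
print(heap_pressure(4.0, 8.0))  # -> False
```

In practice the used and max heap values would come from the Monitor -> Metrics graphs described below.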
The key heap metrics are listed below:

- Heap Size: The amount of heap memory reserved by the OS. This value can grow or shrink depending on usage.
- Used heap: The portion of heap memory in use by the application's Java objects. This value changes constantly with usage.
- Max heap size: The upper heap memory limit. This value is constant and does not change. It can be configured by setting the jcc.heap.max_size property in the global.properties file or as a node property.
Heap memory: Monitor -> Metrics tab.

Non-heap memory consumption

The JVM reserves additional native memory that is not part of the heap. This memory area is called Metaspace, and is used to store class metadata. Metaspace can grow dynamically based on the application's needs. Non-heap memory metrics are similar to heap memory metrics; however, there is no limit on the size of the non-heap memory. In a Snaplex, non-heap size tends to stay somewhat flat or grow slowly over longer periods of time. Non-heap size values larger than 1 GiB should be investigated with help from SnapLogic support. Note that all memory values are displayed in GiB (gibibytes).
Non-heap memory: Monitor -> Analyze -> Metrics (Node)

Swap memory

Swap memory, or swap space, is a portion of disk used by the operating system to extend virtual memory beyond the physical RAM. This allows multiple processes to share the computer's memory by "swapping out" some of the RAM used by less active processes to disk, making more RAM available for the more active processes. Swap space is entirely managed by the operating system, not by individual processes such as the SnapLogic Snaplex. Note that swap space is not "extra" memory that can compensate for low heap memory. Refer to this document for information about auto and custom heap settings.
Reference: Custom heap setting

High swap utilization is an indicator of contention between processes, and may suggest a need for more RAM.
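The swap contention signal above can be expressed as a simple ratio check. This is an illustrative sketch only; the function names and the 50% threshold are our own assumptions, not SnapLogic recommendations:

```python
def swap_utilization(used_gib: float, total_gib: float) -> float:
    """Swap in use as a percentage of configured swap space."""
    if total_gib == 0:
        return 0.0
    return used_gib / total_gib * 100

def ram_pressure_suspected(swap_pct: float, threshold: float = 50.0) -> bool:
    """Sustained swap use above the chosen threshold hints at RAM contention."""
    return swap_pct >= threshold

print(ram_pressure_suspected(swap_utilization(3.0, 4.0)))  # -> True
print(ram_pressure_suspected(swap_utilization(0.2, 4.0)))  # -> False
```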
Additional Metrics

Select the node from Monitor -> Analyze, and navigate to the Metrics tab. Review the following metrics.

Active Pipelines

Monitor the Average and Max active pipeline counts for specific time periods. Consider adding nodes for load balancing and platform stability if these counts are consistently high.
Active Pipelines: Monitor -> Analyze -> Metrics (Node)

Active Threads

Active threads: Monitor -> Analyze -> Metrics (Node)
Every Snap in an active pipeline consumes at least one thread. Some Snaps, such as Pipeline Execute, bulk loaders, and Snaps performing input/output, can use more threads than other Snaps. Refer to this Sigma document on community.snaplogic.com for additional configuration details: Snaplex Capacity Tuning Guide.

Disk Utilization

It is important to monitor disk utilization, as a lack of free disk space can lead to blocked threads and can potentially impact essential Snaplex functions such as heartbeats to the Control Plane.
Disk utilization: Monitor -> Analyze -> Metrics (Node)
Additional reference: Analyze Metrics. You can download data in CSV format for the individual Metrics graphs.

Enabling Notifications for Snaplex node events

Event notifications can be created in the Manager (currently in the Classic Manager) under Settings -> Notifications. A notification rule can be set up to send an alert about a tracked event to multiple email addresses. The alerts can also be viewed in the Manager under the Alerts tab.
Reference: Notification Events, Snaplex Node notifications

Telemetry Integration with third-party observability tools using OpenTelemetry (OTEL)

The SnapLogic platform uses OpenTelemetry (OTEL) to support telemetry data integration with third-party observability tools. Contact your CSM to enable the OpenTelemetry feature.
Reference: Open Telemetry Integration

Node diagnostics details

The Node diagnostics table includes diagnostic data that can be useful for troubleshooting.
For configurable settings, the table displays the Maximum, Minimum, Recommended, and Current values in GiB (gibibytes) where applicable. Values in red indicate settings outside of the recommended range. Navigate to the Monitor -> Infrastructure -> (Node) -> Additional Details tab.
Example: Node diagnostics table

Identifying pipelines that contribute to a node crash / termination

- Monitor -> Activity logs: Filter by category = Snaplex. Make note of the node crash events for a specific time period. Event name text: "Node crash event is reported". Reference: Activity Logs
- Monitor -> Execution: Select the execution window in the Calendar. Filter executions by setting these conditions: Status: Failed; Node name: <Enter node name from the crash event>. Reference: Execution. Sort on the Documents column to identify the pipeline executions processing the largest number of documents. Click anywhere on a row to view its execution statistics. You can also view the active pipelines for that time period from the Monitor -> Metrics -> Active pipelines view.

Table 3.0 Pipeline execution review

Additional configurations to mitigate pipeline terminations

The thresholds below can be tuned to minimize pipeline terminations due to Out-of-Memory exceptions. Note that the memory thresholds are based on the physical memory on the node, not the virtual / swap memory.
- Maximum Memory %
- Pipeline termination threshold
- Pipeline restart delay interval
Refer to the table "Table 3.0 Snaplex node memory configurations" in this Sigma document for additional details and recommended values: Snaplex Capacity Tuning

Pipeline Quality Check API

The Linter public API for pipeline quality provides rules and complete reports for all standard checks, including message levels (Critical / Warning / Info) with actionable message descriptions.
Reference: Pipeline Quality Check

By applying the quality checks, you can optimize pipelines and improve maintainability. You can also use SnapGPT to analyze pipelines, identify issues, and suggest best practices to improve your pipelines. (SnapGPT_Analyze_Pipelines)

Other third-party profiling tools

Third-party profiling tools such as VisualVM can be used to monitor local memory, CPU, and other metrics. This document will be updated in a later version to include the VisualVM configurations for the SnapLogic application running on a Groundplex.

Java Component Container (jcc) command line utility (for Groundplexes)

The jcc script is a command-line tool that provides a set of commands to manage Snaplex nodes. It is installed in the /opt/snaplogic/bin directory of the Groundplex node. The table below lists the commonly used arguments for the jcc script (jcc.sh on Linux and jcc.bat on Windows). Note that the command lists other arguments (for example, try-restart); however, those are mainly included for backward compatibility and are not frequently used. $SNAPLOGIC refers to the /opt/snaplogic directory on Linux or the <Windows drive>:\opt\snaplogic directory on Windows servers. Run these commands as the root user on Linux and as an Administrator on Windows.
Example: sudo /opt/snaplogic/bin/jcc.sh restart or c:\opt\snaplogic\bin\jcc.bat restart

- status: Returns the Snaplex status. The response string indicates whether the Snaplex Java process is running.
- start: Starts the Snaplex process on the node.
- stop: Stops the Snaplex process on the node.
- restart: Stops and restarts the Snaplex process on the node. Restarts both the monitor and the Snaplex processes.
- diagnostic: Generates the diagnostic report for the Snaplex node. The HTML output file is generated in the $SNAPLOGIC/run/log directory. Resolve any warnings from the report to ensure normal operations.
- clearcache: Clears the cache files from the node.
This command must be executed when the JCC is stopped.
- addDataKey: Generates a new key pair and appends it to the keystore in the /etc/snaplogic folder with the specified alias. This command is used to rotate the private keys for Enhanced Account Encryption. Doc reference: Enhanced Account Encryption

The following options are available for a Groundplex on a Windows server:
- install_service: The jcc.bat install_service command installs the Snaplex as a Windows service.
- remove_service: The jcc.bat remove_service command removes the installed Windows service.
Run these commands as an Administrator user.

Table 4.0 jcc script arguments

Example of custom log configuration for a Snaplex node (Groundplex)

Custom log file configuration is occasionally required due to internal logging specifications or to troubleshoot problems with specific Snaps. The following example illustrates the steps to configure the 'Debug' log level for the Azure SQL Snap pack. The log level can be customized for each node of the Groundplex where the related pipelines are executed, and is effective for all pipelines that use any of the Azure SQL Snaps (for example, Azure SQL - Execute, Azure SQL - Update, etc.). Note that Debug logging can affect pipeline performance, so this configuration should only be used for debugging purposes.

Configuration Steps

a. Follow steps 1 and 2 from this document: Custom log configuration
Note: You can perform Step 2 by adding the property key and value under the Global Properties section.
Example:
Key: jcc.jvm_options
Value: -Dlog4j.configurationFile=/opt/snaplogic/logconfig/log4j2-jcc.xml
The Snaplex node must be restarted for the change to take effect. Refer to the commands in Table 4.0.
b. Edit the log4j2-jcc.xml file configured in step a.
c. Add a new RollingRandomAccessFile element under <Appenders>. In this example, the element is referenced with the unique name JCC_AZURE. It also has a log size and rollover policy defined.
The policy enables generation of up to 10 log files of 1 MB each. These values can be adjusted depending on your requirements.

<RollingRandomAccessFile name="JCC_AZURE"
    fileName="${env:SL_ROOT}/run/log/${sys:log.file_prefix}jcc_azure.json"
    immediateFlush="true" append="true"
    filePattern="${env:SL_ROOT}/run/log/jcc_azure-log-%d{yyyy-MM-dd-HH-mm}.json"
    ignoreExceptions="false">
  <JsonLogLayout properties="true"/>
  <Policies>
    <SizeBasedTriggeringPolicy size="1 MB"/>
  </Policies>
  <DefaultRolloverStrategy max="10"/>
</RollingRandomAccessFile>
…
</Appenders>

d. The next step is to configure a Logger that references the Appender defined in step c. This is done by adding a new <Logger> element. In this example, the Logger is defined with log level = Debug.

<Logger name="com.snaplogic.snaps.azuresql" level="debug"
    includeLocation="true" additivity="false">
  <AppenderRef ref="JCC_AZURE"/>
</Logger>
…
<Root>
…
</Root>
</Loggers>
</Configuration>

The value for the name attribute is derived from the Class FQID value of the associated Snap. The changes to log4j2-jcc.xml are marked by the highlighted text in steps c and d. The complete XML file is also attached for reference. You can refer to the Log4j documentation for more details on the attributes or for additional customization: Log4j reference

Debug log messages and log files

Additional debug log messages will be printed to the pipeline execution logs for any pipeline with Azure SQL Snaps. These logs can be retrieved from the Dashboard.
Example:
{"ts": "2023-11-30T20:21:33.490Z", "lvl": "DEBUG", "fi": "JdbcDataSourceRegistryImpl.java:369", "msg": "JDBC URL: jdbc:sqlserver://sltapdb.database.windows.net:1433;database=SL.TAP;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;authentication=sqlPassword;loginTimeout=30;connectRetryCount=3;connectRetryInterval=5;applicationName=SnapLogic (main23721) - pid-113e3955-1969-4541-9c9c-e3e0c897cccd, database server: Microsoft SQL Server(12.00.2531), driver: Microsoft JDBC Driver 11.2 for SQL Server(11.2.0.0)", "snlb": "Azure+SQL+-+Update", "snrd": "5c06e157-81c7-497f-babb-edc7274fa4f6", "plrd": "5410a1bdc8c71346894494a2_f319696c-6053-46af-9251-b50a8a874ff9", "prc": "Azure SQL - …

The updated log configuration also writes the custom JCC logs (for all pipelines that have executed the Azure SQL Snaps) to disk under the /opt/snaplogic/run/log directory. The file size for each log file and the number of files depend on the configuration in the log4j2-jcc.xml file. The changes to log4j2-jcc.xml can be reverted if the additional custom logging is no longer required.

Log level configuration for a Snaplex in Production Orgs

The default log level for a new Snaplex is 'Debug'. As a best practice, this value can be updated to 'Info' in Production Orgs. The available values are:
- Trace: Records details of all events associated with the Snaplex.
- Debug: Records all events associated with the Snaplex.
- Info: Records messages that outline the status of the Snaplex and the completed Tasks.
- Warning: Records all warning messages associated with the Snaplex.
- Error: Records all error messages associated with the Snaplex.
Reference: Snaplex logging

PlexFS File Storage considerations

PlexFS, also known as suggest space, is a storage location on the local disk of the JCC node. The /opt/snaplogic/run/fs folder is commonly designated for this purpose.
It is used as a data store to temporarily hold preview data during pipeline validation, as well as to maintain the state data for Resumable pipelines.

Disk volumes

To address issues that cause disk-full errors, and to ensure smoother operation of the systems that affect the stability of the Groundplex, you should have separate mounts on Groundplex nodes. Follow the steps suggested below to create two separate disk volumes on the JCC nodes.
Reference: Disk Volumes

The /opt/snaplogic/run/fs folder location is used for the PlexFS operations.
mount --bind /workspace/fs /opt/snaplogic/run/fs

Folder Structure

The folders under PlexFS are created with this path structure:
/opt/snaplogic/run/fs/<Environment>/<ProjectSpace>/<Project>/__suggest__/<Asset_ID>
Example: /opt/snaplogic/run/fs/Org1/Proj_Space_1/Project1/__suggest__/aaa5010bc
The files in the sub-folders are created with these extensions: *.jsonl, *.dat

PlexFS File Creation

The files in /opt/snaplogic/run/fs are generated when a user performs pipeline validation. The amount of data in a .dat file is based on the "Preview Document Count" user setting. For Snaps with binary output (such as File Reader), the Snap stops writing to PlexFS when the next downstream Snap has generated its limit of preview data.

PlexFS File Deletion

- The files for a specific pipeline are deleted when the user clicks 'Retry' to perform validation; new data files are generated.
- Files for a specific user session are deleted when the user logs out of SnapLogic.
- All PlexFS files are deleted when the Snaplex is restarted.
- Files in PlexFS are generated with an expiration date. The default expiration is two days, and the files are cleaned up periodically based on the expiration date. It is possible to set a feature flag to override the expiration time and delete the files sooner.
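The two-day expiration behavior described above can be modeled with a small age check. This is an illustrative sketch of the concept, not the platform's actual cleanup code; the helper name is our own:

```python
import time

TTL_SECONDS = 2 * 24 * 60 * 60  # default PlexFS expiration of two days

def is_expired(mtime: float, now: float, ttl: int = TTL_SECONDS) -> bool:
    """True when a PlexFS temp file is older than its time-to-live."""
    return now - mtime > ttl

now = time.time()
print(is_expired(now - 3 * 24 * 3600, now))  # -> True (three days old)
print(is_expired(now - 3600, now))           # -> False (one hour old)

# A sweep over the PlexFS root could then look like:
# for p in pathlib.Path("/opt/snaplogic/run/fs").rglob("*.dat"):
#     if is_expired(p.stat().st_mtime, now):
#         p.unlink()
```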
Recommendations

The temp files are cleaned up periodically based on the default expiration date; however, you might occasionally encounter disk space availability issues due to excessive preview data being written to the PlexFS file storage. The mount directory location can be configured with additional disk space or shared file storage (e.g. Amazon EFS). Contact SnapLogic support for details on the feature flag configuration to update the expiration time to a shorter duration for faster file clean-up. The value for this feature flag is set in seconds.
Best Practices for Adopting AI Solutions in the Enterprise with SnapLogic AgentCreator

Version: 1.2
Authors: Dominic Wellington, Guy Murphy, Pat Traynor, Bash Badawi, Ram Bysani, David Dellsperger, Aaron Kesler

Introduction: AI in the Modern Enterprise

AI is fast becoming a cornerstone of modern enterprises, transforming how businesses operate, make decisions, and interact with customers. Its capabilities, such as automation, predictive analytics, and natural language processing, allow companies to streamline processes, gain deeper insights from data, and enhance customer experiences. From optimizing supply chains to personalizing marketing strategies, AI enables enterprises to innovate, drive efficiency, and stay competitive in an increasingly data-driven world. As AI continues to evolve, its role in shaping business strategy and operations will only grow.

Precisely because of its novelty and importance, leaders will need to think carefully about how these powerful new capabilities can be deployed in a manner that is compliant with existing legislation and regulation, and how best to integrate them with existing systems and processes. There are no generally accepted best practices in this field yet, but there are lessons that we can learn from past waves of technological change and adoption. In this document we set out some suggestions for how to think about these topics in order to ensure a positive outcome.

Data

Data is the lifeblood of IT, arguably the reason for the field's entire existence, but its importance is only magnified when it comes to AI. Securing access to data is a requirement for an AI project to get off the ground in the first place, but that access must also be managed over time, especially as both the data and the policies that apply to it change and evolve.

Data Security and Management

Data security has always been a complex issue for IT organizations.
When considered from the perspective of AI adoption, two main areas need attention: external and internal usage. Externally hosted Large Language Models (LLMs) offer powerful and rapidly evolving capabilities, but also carry inherent risks, as they are operated by third parties. The second area of focus is how and what internal data should be used with AI models, whether self-managed or externally operated.

External Security

Organizations have reasonable concerns about their proprietary, regulated, or otherwise sensitive information "leaking" beyond the organizational boundaries. For this reason, simply sending internal information to a public LLM without any controls in place is not considered a viable solution.

An approach to this problem that was previously considered promising is a technique called Retrieval-Augmented Generation, or RAG. In this approach, rather than passing user queries directly to an LLM for answers, a specialized data store, called a vector database, is deployed. When a user query is received, the vector data store is consulted first to identify relevant chunks of information with which to answer the query, and only after this step is the LLM used to provide the conversational response back to the user.

However, while RAG does limit the potential for information leakage, it does not reduce it to zero. The vector database can be operated according to the organization's own risk profile: fully in-house, as a private cloud instance, or leveraging a shared platform, depending on the information it contains and the policies or regulations that apply to that information. However, a chunk of information is sent from the vector store to the LLM to answer each query, and over time this process can be expected to expose a substantial part of the knowledge base to the LLM. It is also important to be aware that the chunking process itself still uses an LLM.
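The retrieve-then-generate flow just described can be sketched in a few lines. The toy three-dimensional "embeddings" and the `llm.complete` call are illustrative assumptions; a real system would use an embedding model and a vector database:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, store, top_k=1):
    """Return the text of the top_k most similar chunks in the store."""
    ranked = sorted(store, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in ranked[:top_k]]

store = [
    {"text": "Refund policy: 30 days.",   "vec": [0.9, 0.1, 0.0]},
    {"text": "Office hours: 9am to 5pm.", "vec": [0.0, 0.2, 0.9]},
]
context = retrieve([0.8, 0.2, 0.1], store)
print(context)  # -> ['Refund policy: 30 days.']

# Only the retrieved chunk, not the whole knowledge base, is then sent to
# the LLM -- which is exactly the leakage path discussed above:
# response = llm.complete(f"Answer using only this context: {context}\n...")
```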
More security-sensitive organizations, or those operating in regulated industries, may choose to leverage a more restricted deployment model for the LLM as well, much as discussed for the vector database itself, in order to avoid this leakage. However, it is worth noting that while an "open-source" language model can be prevented from contributing training data back to its developers, its own pre-existing training data may still leak out into the answers.

The ultimate risk here is "model poisoning" from open-source models: injection of data from outside the user's domain, which may lead to inconsistent or undesirable responses. One example of this phenomenon is "context collapse", which may occur in the case of overloaded acronyms, where the same acronym can represent vastly different concepts in different domains. A generalist model may misunderstand or misrepresent the acronym, or worse, may do so inconsistently.

The only way to be entirely certain of data security and hygiene is to train the model from scratch, an undertaking that, due to its cost in both time and resources, is practical only for the largest organizations, and is in any case required only for the most sensitive data sets. A halfway house, suitable for organizations that have concerns in this domain but do not want to engineer everything from the ground up, is fine-tuning. In this approach, a pre-trained model is further trained on a specific data set. This is a form of transfer learning, where a model pre-trained on a large dataset is adapted to work for a specific task. The dataset required for this sort of fine-tuning is very small compared to the dataset required for full model training, bringing this approach within reach of far more organizations.
Internal Data Access Controls

The data consumed by the AI model also needs to be secured inside the organization, ensuring that access controls on that data follow the data through the system at all levels. It is all too easy to focus on ingesting the data and forget about the metadata, such as role-based access controls. Instead, these controls should be maintained throughout the AI-enabled system. Any role-based access controls (RBAC) placed on the input data should also be reflected in the output data. Agentic approaches are useful here, as they give the opportunity to enforce such controls at various points. The baseline should be that if a user ought not to be able to access certain information through traditional means, such as database queries or direct filesystem access, they also must not be able to access it by querying an AI overlay over those systems, and vice versa.

Prompt Logging and Observability

An emerging area of concern is the security of prompts used with AI models. Especially when using public or unmodified open-source models, the primary input to the models is the prompt that is passed to them. Even minor changes to that prompt can cause major differences in what is returned by the model. For this reason, baseline best practice is to ensure that prompts are backed up and versioned, just as would be done for more traditional program code. In addition, both prompts and their corresponding responses should be logged in order to be able to identify and troubleshoot issues such as performance changes or impact to pricing models of public LLMs. Some more detailed suggestions are available here. Prompts should also be secured against unauthorized modification, or "prompt injection".
Similarly to the analogous "SQL injection", attackers may attempt to modify or replace the prompt before it is passed to the AI model, in order to produce outputs different from those expected and desired by the users and operators of the system. The potential for damage increases further in the case of agentic systems that may chain multiple model prompts together, and potentially even take actions in response to those prompts. Again, logging for both in-the-moment observability and later audit is important here, including the actual final prompt that was sent to the model, especially when that prompt has been assembled across multiple steps. These logs are useful for troubleshooting, but may also be formally required to demonstrate compliance with regulation or legislation.

Example Prompt Injection Scenarios

- Direct Injection: An attacker injects a prompt into a customer support chatbot, instructing it to ignore previous guidelines, query private data stores, and send emails, leading to unauthorized access and privilege escalation.
- Indirect Injection: A user employs an LLM to summarize a webpage containing hidden instructions that cause the LLM to insert an image linking to a URL, leading to exfiltration of the private conversation.
- Unintentional Injection: A company includes an instruction in a job description to identify AI-generated applications. An applicant, unaware of this instruction, uses an LLM to optimize their resume, inadvertently triggering the AI detection.
- Intentional Model Influence: An attacker modifies a document in a repository used by a Retrieval-Augmented Generation (RAG) application. When a user's query returns the modified content, the malicious instructions alter the LLM's output, generating misleading results.
- Code Injection: An attacker exploits a vulnerability in an LLM-powered email assistant to inject malicious commands, allowing access to sensitive information and manipulation of email content.
- Payload Splitting: An attacker uploads a resume with split malicious prompts. When an LLM is used to evaluate the candidate, the combined prompts manipulate the model's response, resulting in a positive recommendation regardless of the resume's actual contents.
- Multimodal Injection: An attacker embeds a malicious prompt within an image that accompanies benign text. When a multimodal AI processes the image and text concurrently, the hidden prompt alters the model's behavior, potentially leading to unauthorized actions or disclosure of sensitive information.
- Adversarial Suffix: An attacker appends a seemingly meaningless string of characters to a prompt, which influences the LLM's output in a malicious way, bypassing safety measures.
- Multilingual/Obfuscated Attack: An attacker uses multiple languages or encodes malicious instructions (e.g., using Base64 or emojis) to evade filters and manipulate the LLM's behavior.
Reference: https://genai.owasp.org/llmrisk/llm01-prompt-injection/

As these examples show, there are many patterns and return sets from LLMs that will need to be managed and observed, comparing prompts, responses, and data sets with certified sets and expected structures. Hopefully, over time and as commercial LLMs mature, many of these issues will be handled by the LLMs themselves, but today these concerns have to be part of the enterprise's own governance framework for AI adoption.

Data Ownership and Observability

Much of the value of most Generative AI (GenAI) applications is based on the quantity, freshness, and reliability of the source data that is provided. An otherwise fully functional GenAI tool that provides responses based on incomplete or out-of-date data will not be useful or valuable to its users. The first question is simply how to gain access to useful source data, and how to maintain that access in the future. This work spans both technical and policy aspects.
SnapLogic of course makes technical connectivity easy, but there may still be questions of ownership and compliance, not to mention identifying where necessary data even resides. Beyond the initial setup of the AI-enabled system, it will be important to maintain ongoing access to up-to-date data. For instance, if a RAG approach is used, the vector data store will need to be refreshed periodically from the transactional data platform. The frequency of such updates will vary between use cases, depending on the nature of the data and its natural rate of change. For instance, a list of frequently asked questions, or FAQs, can be updated whenever a new entry is added to the list. Meanwhile, a data set that is updated in real time, such as airline operations, will need much more frequent synchronization if it is to remain useful.

Recommendations

Data is key to the success of AI-enabled systems – and not just one-time access to a dataset that is a point-in-time snapshot, but ongoing access to real-time data. Fortunately, these are not new concerns, and existing tools and techniques can be applied readily to securing and managing that flow of data. In fact, the prominence and urgency of AI projects can even facilitate the broad deployment of such tools and techniques, where they had previously been relegated to specialised domains of data and analytics. It is important to note that as the SnapLogic platform facilitates connectivity and movement of data, none of the patterns of such movement are used to enrich the learnings of an LLM. In other words, pipelines act to transport encrypted data from source to destination without any discernment of the actual payload. No payload data and no business logic governing the movement of data are ever gleaned from such movement or used to train any models.
In fact, the SnapLogic platform can be used to enhance data security at source and destination as highlighted above, adding guardrails to an AI system to enforce policies against publication of sensitive or otherwise restricted data. In general it is recommended for domain experts, technical practitioners, and other stakeholders to work together and analyze each proposed use case for AI, avoiding both reflexive refusals and blind enthusiasm, focusing instead on business benefit and how to achieve that in a specific regulatory or policy context.

Auditability and Data Lineage

The ability to audit the output of AI models is a critical requirement, whether for routine debugging, or in response to incoming regulation (e.g. the EU AI Act) that may require auditability for the deployment of AI technology in certain sectors or for particular use cases. For instance, use of AI models for decision support in legal cases or regulated industries, especially concerning health and welfare, may be subject to legal challenges, leading to requests to audit particular responses that were generated by the model. Commercial and legal concerns may also apply when it comes to use cases that may impinge on IP protection law. Complete forensic auditability of the sort that is provided by traditional software is not possible for LLMs, due to their non-deterministic nature. For this reason, deterministic systems may still be preferable in certain highly-regulated spaces, purely to satisfy this demand. However, a weaker definition of auditability is becoming accepted when it comes to LLMs, where both inputs and outputs are preserved, and the model is required to provide the source information used to generate that output. The source data is considered important both to evaluate the factual correctness of the answer, and also to identify any bias which may make its way into the model from its source data.
These factors make auditability and data lineage a critical part of the overall AI strategy, which will have to be applied at various stages of the solution lifecycle:

Model creation and training - This aspect relates to how the model was created, and whether the data sets used to train the model carry a risk of either skewing over time, or exposing proprietary information used during model development.

Model selection - The precise version of the AI model that was used to generate a response will need to be tracked, as even different versions of the same model may produce different responses to the same prompt. For this reason it is important to document the moment of any change in order to be able to track and debug any drift in response or behaviour. For external third-party AI services, these models may need to be tested and profiled as part of both an initial selection process and ongoing validation. The reality is that there is no single AI model that is best for all use cases. Experience from real-world deployment of AI projects shows that some models are noticeably better at some functions than others, as well as having sometimes radically different cost profiles. These factors mean that, in all likelihood, several different AI models will be used across an enterprise, and even combined (as agents) to satisfy a single use case.

Prompt engineering - Unlike traditional software development, prompt engineering by its very nature includes the data sets and structures in the development cycle, in a way that traditional functional coding practices do not. The models’ responses are less predictable, so understanding how the data will be processed is an integral part of the prompt engineering lifecycle. How and why a set of prompts is put into production is driven by both the desired functionality and the data that will be provided; both must be recorded so that these inputs can be reviewed if issues arise in the production environment.
Prompt evaluation in production - All critical systems today should have robust logging and auditing processes. In reality, however, many enterprises rarely achieve universal deployment, and coverage is often inconsistent. Due to the nature of AI systems, and notably LLMs, there will be a critical need to audit the data inputs and outputs. The model is effectively a black box that does not allow operators to reconstruct exactly why a given response was provided. This issue is especially critical when multiple AI models, agents, and systems are chained together or networked within a wider process. Logging the precise inputs that are sent to the model will be key for all these purposes. For custom-trained models, these requirements may also extend to the training data — although this case is presumed to remain relatively rare for the foreseeable future, given the prohibitive costs of performing such training. Where more common approaches (RAG, fine-tuning) are used that do not require an entire model to be trained from scratch, the audit would naturally focus on the inputs to the model and how those are managed. In both these cases, good information hygiene should be maintained, including preservation of historical data for point-in-time auditability. Backing up the data inputs is necessary but not sufficient: after all, a different LLM (or a subsequent version of the same LLM) may provide different responses based on the same prompt and data set. Therefore, if a self-trained LLM is employed, that model should also be backed up in the same way as the data that feeds it. If a public LLM is used, rigorous documentation should be maintained identifying any changes or version upgrades to the external model. All of this work is in addition to the tracking of the prompts and data inputs themselves, as described previously. All of these backups will in turn need to be preserved according to whatever evidentiary concerns are expected to apply.
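The logging described above can be sketched as a small audit-record builder. This is a hedged illustration only: the function and field names are assumptions made for the example, not a SnapLogic or industry-standard schema. It captures the final assembled prompt (post-RAG, post-templating), the response, the exact model version, and content hashes that a later audit can use to detect tampering without re-indexing full payloads.

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(model_id: str, model_version: str,
                 final_prompt: str, response: str) -> dict:
    """Build one audit entry for a single model interaction.

    All field names are illustrative. Storing the final assembled
    prompt and the exact model version is what makes point-in-time
    audits possible; the hashes let an auditor detect tampering.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        "model_version": model_version,
        "final_prompt": final_prompt,
        "response": response,
        "prompt_sha256": hashlib.sha256(final_prompt.encode("utf-8")).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
    }

record = audit_record("example-model", "2024-06-01",
                      "Summarize order 42 using the retrieved records.",
                      "Order 42 shipped on Monday.")
print(json.dumps(record, indent=2))
```

In practice such records would be written to an append-only store, and the retention window chosen to match the evidentiary requirements discussed below.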
In the case of simple technical audits to ensure continuous improvements and avoid downward pressure on the quality of responses provided, organizations can make their own determination on the level of detail, the width of the time window to be preserved, and the granularity of the data. In more highly regulated scenarios, some or all of these elements may be mandated by outside parties. In those situations, the recommendation would also generally be to specify the backup policy defensively, to avoid any negative impacts in the case of future challenges.

Development and Architecture Best Practices

While AI systems have notable differences from earlier systems, they are still founded in large part on pre-existing components and techniques, and many existing best practices will still apply, if suitably modified and updated.

CI/CD

Continuous Integration and Continuous Deployment is of course not specific to Generative AI. However, as GenAI projects move from demo to production, and then evolve over subsequent releases, it becomes necessary to consider them as part of that process. Many components of a GenAI application are stateful, and the relationship between them can also be complex. A roll-back of a vector data store used to support a RAG application may have unforeseen effects if the LLM powering that RAG application remains at a different point of the configuration timeline. Therefore the different components of an AI-enabled system should be considered as tightly coupled for development purposes, as otherwise the GenAI component risks never becoming a fully-fledged part of the wider application environment. In particular, all of the traditional CI/CD concepts should apply also to the GenAI component:
Continuous Development
Continuous Testing
Continuous Integration
Continuous Deployment
Continuous Monitoring
Ensuring the inclusion of development teams in the process is unlikely to be a problem, as the field of AI is still evolving at breakneck pace.
However, some of the later stages of an application’s lifecycle are often not part of the worldview of the developers of early demo AI applications, and so may be overlooked in the initial phases of productization of GenAI functionality. All of these phases also have specific aspects that should be considered when it comes to their application to GenAI, so they cannot simply be integrated into existing processes, systems, or modes of thought. New aspects of development and DevOps are needed to support this: notably, prompts will have to be treated as code artifacts, but will also have to be associated with metadata such as model version, test data sets, and samples of return data, so that consistent functional management of the combined set of capabilities can be understood and tracked over time.

QA and Testing

Quality Assurance (QA) and testing strategies for AI projects, particularly GenAI, must address challenges that differ significantly from traditional IT projects. Unlike traditional systems where output is deterministic and follows predefined rules, GenAI systems are probabilistic and rely on complex models trained on vast datasets. A robust QA strategy for GenAI must incorporate dynamic testing of outputs for quality, coherence, and appropriateness across a variety of scenarios. This involves employing both automated testing frameworks and human evaluators to assess the AI's ability to understand prompts and generate contextually accurate responses, while also mitigating risks such as bias, misinformation, or harmful outputs. A GenAI testing framework should include unique approaches like model evaluation using synthetic and real-world data, stress testing for edge cases, and adversarial testing to uncover vulnerabilities such as the attack scenarios listed above. Frameworks such as CI/CD are essential but need to be adapted to accommodate iterative model training and retraining processes.
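One way to picture the idea of treating prompts as versioned code artifacts, noted earlier in this section, is a manifest committed alongside the application source. The schema below is purely illustrative; no standard format exists, and every field name here is an assumption made for the example.

```yaml
# Hypothetical prompt artifact manifest, versioned alongside code.
# All field names and paths are illustrative assumptions.
prompt_id: order-status-summary
prompt_version: 1.4.0
model:
  provider: example-provider
  name: example-model
  version: "2024-06-01"      # pin the exact model version that was tested
template: |
  Summarize the status of order {{order_id}} for a customer,
  using only the retrieved records below.
test_data_sets:
  - tests/prompts/order-status/cases.jsonl
sample_outputs:
  - tests/prompts/order-status/samples/
expected_output_schema: schemas/order-status-response.json
```

Pinning the model version in the same artifact as the prompt text and test data is what allows a CI pipeline to detect that a change in any one of them requires the others to be re-validated.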
Tools like Explainable AI (XAI) help provide transparency into model decisions, aiding in debugging and improving user trust. Additionally, feedback loops from production environments become vital in fine-tuning the model, enabling ongoing improvement based on real-world performance metrics rather than static, pre-defined test cases. However, depending on the use case and the data provided, such fine-tuning based on user behaviour may itself be sensitive and need to be managed with care. The QA process for GenAI also emphasizes ethical considerations and regulatory compliance more prominently than traditional IT projects. Testing needs to go beyond technical correctness to assess social impact, ensuring that the system avoids perpetuating harmful bias or misinformation. Continuous monitoring after deployment is crucial, as model performance can degrade over time due to shifting data distributions. This contrasts with traditional IT projects, where testing is often a finite phase before deployment. In GenAI, QA is an evolving, lifecycle-long endeavor requiring multidisciplinary collaboration among data scientists, ethicists, domain experts, and software engineers to address the complex, dynamic nature of generative models. Grounding, as an example, is a technique that can be used to help produce model responses that are more trustworthy, helpful, and factual. Grounding generative AI model responses means connecting them to verifiable sources of information. Implementing grounding usually means retrieving relevant source data; the recommended best practice is to use the retrieval-augmented generation (RAG) technique. Other test concepts include:
Human-in-the-Loop Testing: Involves human evaluators judging the quality, relevance, and appropriateness of the model's outputs, including the accuracy of the cited source data and supporting details.
Adversarial Testing: Actively trying to "break" the model by feeding it carefully crafted inputs designed to expose weaknesses and vulnerabilities.

Deployment

This aspect might superficially be considered among the easiest to cover, but that may well not be the case. Most CI/CD pipelines are heavily automated; can the GenAI aspects be integrated easily into that flow? Some of the processes involved have long durations, e.g. chunking a new batch of information; can they be executed as part of a deployment, or do they need to be pre-staged so that the result can simply be copied into the production environment during a wider deployment action?

Monitoring

Ongoing monitoring of the performance of the system will also need to be considered. For some metrics, such as query performance or resource utilization, it is simply a question of ensuring that coverage also extends to the new GenAI experience. Other new metrics may also be required that are specific to GenAI, such as users’ satisfaction with the results they receive. Any sudden change in that metric, especially if correlated with a previous deployment of a change to the GenAI components, is grounds for investigation. While extensive best practices exist for the identification of technical metrics to monitor, these new metrics are still very much emergent, and each organization should consider carefully what information is required — or is likely to be required in the event of a future investigation or incident response scenario.

Integration of AI with other systems and applications

AI strategies are already moving beyond pure analytic or chatbot use cases, as the agentic trend continues to develop. These services, whether home-grown or hosted by third parties, will need to interface with other IT systems, most notably business processes, and this integration will need to be well considered to be successful.
Today LLMs are producing return sets in seconds, and though the models are getting quicker, there is a trend to trade response time for greater quality and resilience. How this trade-off is integrated into high-performance business systems that operate many orders of magnitude faster will need to be considered and managed with care. Finally, as stated throughout this paper, AI’s non-deterministic nature will mandate a focus on compensating patterns across the blend of AI and process systems.

Recommendation

While it is true that the specifics of AI-enabled systems differ from previous application architectures, general themes should still be carried over, whether by analogy, applying the spirit of the techniques to a new domain, or by ensuring that the more traditional infrastructural components of the AI application are managed with the same rigour as they would be in other contexts.

Service / Tool Catalog

The shift from content and chatbot experiences to agentic approaches implies a new fundamental architectural consideration of which functions and services should be accessible for use by the model. In the public domain there are simple patterns, and models will mainly be operating with other public services and sources — but in the enterprise context, the environment will be more complex. Some examples of questions that a mature enterprise will need to address to maximize the potential of an agentic capability: What are the “right” services? Large enterprises have hundreds, if not thousands, of services today, all of which are (or should be) managed according to their business context. A service management catalog will be key to managing many of these issues, as this will give a consistent point of entry to the service plane. Here again, pre-existing API management capabilities can ensure that the right access and control policies can be applied to support the adoption of composable AI-enabled applications and agents.
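To make the service catalog idea concrete, a catalog entry exposing a function to AI agents might carry metadata like the following. This is a purely hypothetical schema; no specific catalog product or SnapLogic feature is implied, and every field name is an assumption made for the example.

```yaml
# Hypothetical service catalog entry for agent consumption.
# All field names are illustrative; real catalogs will differ.
service: customer-orders-api
owner: order-management-team
business_context: order lifecycle, customer PII
agent_access:
  allowed: true
  operations: [read]                 # e.g. deny record mutation by default
  policy: propagate-caller-identity  # agent calls carry the end user's role
audit:
  log_requests: true
  retention_days: 365
```

Encoding access scope and identity-propagation policy in the catalog entry gives API management a single enforcement point for agent traffic, rather than scattering those decisions across individual integrations.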
When it comes to security profiling of a consuming LLM, the requester of the LLM service will have a certain level of access based on a combination of user role and security policy that is enforced. The model will have to pass this context on to core systems at run time so that there are no internal data breaches. When it comes to agentic systems, new questions arise, beyond the simpler ones that apply to generative or conversational applications. For instance, should an agent be able to change a record? How much change should be allowed and how will this be tracked?

Regulatory Compliance

While the field of GenAI-enabled applications is still extremely new, best practices are beginning to emerge, such as those provided by the Open Web Application Security Project (OWASP). These cybersecurity recommendations are of course not guaranteed to cover any particular emerging regulation, but should be considered a good baseline which is almost certain to give a solid foundation from which to work to achieve compliance with national or sector-specific regulation and legislation as it is formalised. In general, it is recommended to ensure that any existing controls on systems and data sets, including RBAC and audit logs, are extended to new GenAI systems as well. Any changes to the new components — model version upgrades, changes to prompts, updates to training data sets, and more — will need to be documented and tracked with the same rigour as established approaches would mandate for traditional infrastructure changes. The points made previously about observability and auditability all contribute to achieving that foundational level of best-practice compliance. It is worth reiterating here that full coverage is expected to be an important difference between GenAI and previous domains.
Compliance is likely to go far beyond the technical systems and their configurations, which were previously sufficient, and to require tracking of final prompts as supplied to models, including user input and runtime data.

Conclusion

Planning and managing the deployment and adoption of novel AI-enabled applications will require new policies and expertise to be developed. New regulation is already being created in various jurisdictions to apply to this new domain, and more is sure to be added in coming months and years. However, much as AI systems require access to existing data and integration with existing systems to deliver value at scale, existing policies, experience, and best practices can be leveraged to ensure success. For this reason, it is important to treat AI as an integral part of strategy, and not its own isolated domain, or worse, delegated to individual groups or departments without central IT oversight or support. By engaging proactively with users’ needs and business cases, IT leaders will have a much better chance of achieving measurable success and true competitive advantage with these new technologies — and avoiding the potential downsides: legal consequences of non-compliance, embarrassing public failures of the system, or simply incorrect responses being generated and acted upon by employees or customers.

SnapLogic deployment on Kubernetes - A reference guide
Overview

SnapLogic supports the deployment of Groundplexes on Kubernetes platforms, thus enabling the application to leverage the various capabilities of Kubernetes. This document explains a few best practice recommendations for the deployment of SnapLogic on Kubernetes, along with a sample deployment example using GKE. The examples in this document are specific to the GKE platform; however, the concepts can be applied to other Kubernetes platforms such as Amazon EKS and Azure AKS.

Author: Ram Bysani SnapLogic Enterprise Architecture team

Helm Chart

A Helm chart is used to define the various deployment configurations for an application on Kubernetes. Additional information about Helm charts can be found here. The Helm chart package for a SnapLogic deployment can be downloaded from the Downloads section. It contains the following files:

Artifact Comments

values.yaml
This file defines the default configuration for the SnapLogic Snaplex deployment. It includes variables like the number of JCC nodes, container image details, resource limits, and settings for Horizontal Pod Autoscaling (HPA). Reference: values.yaml

Chart.yaml
This file defines the metadata and version information for the Helm chart.

templates folder
This directory contains the Kubernetes manifest templates which define the resources to be deployed into the cluster. These templates are YAML files that specify Kubernetes resources with templating capabilities that allow for parameterization, flexibility, and reuse.

templates/deployment.yaml
This file defines a Kubernetes Deployment resource for managing the deployment of JCC instances in a cluster. The deployment is created only if the value of jccCount is greater than 0, as specified in the Helm chart's values.yaml file.

templates/deployment-feed.yaml
This file defines a Kubernetes Deployment resource for managing the deployment of Feedmaster instances.
The deployment is conditionally created if the feedmasterCount value in the Helm chart's values.yaml file is greater than 0.

templates/hpa.yaml
The hpa.yaml file defines a Horizontal Pod Autoscaler (HPA) resource for a Kubernetes application. The HPA automatically scales the number of pod replicas in a deployment or replica set based on observed metrics such as CPU utilization or custom metrics.

templates/service.yaml
The service.yaml file describes a Kubernetes service that exposes the JCC component of your Snaplex. It creates a LoadBalancer type service, which allows external access to the JCC components through a public IP address. The service targets only pods labeled as 'jcc' within the specified Snaplex and Helm release, ensuring proper communication and management.

templates/service-feed.yaml
The service-feed.yaml file describes a Kubernetes service that exposes the Feedmaster components. The service is only created if the value of feedmasterCount in the Helm chart’s values.yaml file is > 0. It creates a LoadBalancer type service, which allows external access to the Feedmaster components through a public IP address.

templates/service-headless.yaml
The service-headless.yaml file describes a Kubernetes service for IPv6 communication. The service is only created if the value of enableIPv6 in the Helm chart’s values.yaml file is set to true.

Table 1.0 Helm Chart configurations

Desired State vs Current State

The configurations in the various yaml files (e.g. Deployment, HPA, values, etc.) represent the “Desired” state of a Kubernetes deployment. The Kubernetes controllers constantly monitor the Current state of the deployment to bring it in alignment with the Desired state.

Horizontal Pod Autoscaling (HPA)

Horizontal Pod Autoscaling (HPA) is a feature in Kubernetes that automatically adjusts the number of replicas (pods) for your deployments based on resource metrics like CPU utilization and memory usage.
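For orientation, the resource that the chart renders from templates/hpa.yaml is a standard Kubernetes HorizontalPodAutoscaler. The manifest below is an illustrative, hand-written equivalent using the minReplicas/maxReplicas, CPU target, and scale-down window values from the example later in this article; it is not the chart's actual output, and the real field layout may differ.

```yaml
# Illustrative HPA manifest (autoscaling/v2); the SnapLogic Helm chart
# generates an equivalent resource from templates/hpa.yaml and values.yaml.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: snaplogic-snaplex-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: snaplogic-snaplex-jcc
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600
```

Seeing the rendered shape makes it easier to interpret the kubectl describe hpa output shown later in this article.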
SnapLogic supports HPA for deployments in a Kubernetes environment. The add-on Metrics Server must be installed. Reference: Metrics-Server. Metrics collection is enabled by default in GKE as part of Cloud Monitoring. Note that Custom Metrics, External Metrics, and Vertical Pod Autoscaling (VPA) are not supported for SnapLogic deployments on Kubernetes.

Groundplex deployment in a GKE environment - Example

In this section, we will go over the various steps for a SnapLogic Groundplex deployment in a GKE environment.

Groundplex creation
Create a new Groundplex from the Admin Manager interface. Reference: Snaplex_creation. The nodes for this Snaplex will be updated when the application is deployed to the GKE environment. New Snaplex creation

GKE Cluster creation
Next, we create the GKE cluster on the Google Cloud console. We have created our cluster in Autopilot mode. In this mode, GKE manages the cluster and node configurations including scaling, load balancing, monitoring, metrics, and workload optimization. Reference: GKE Cluster GKE cluster

Configure the SnapLogic platform Allowlist
Add the SnapLogic platform IP addresses to the Allowlist. See Platform Allowlist. In GKE, this is usually done by configuring an Egress Firewall rule on the GKE cluster. Please refer to the GKE documentation for additional details. Firewall rule - Egress

Helm configurations
values.yaml
The below table explains the configurations for some of the sections from the values.yaml file which we have used in our setup. The modified files are attached to this article for reference. Reference: Helm chart configuration

Section Comments

# Regular nodes count
jccCount: 3
# Feedmaster nodes count
feedmasterCount: 0
This defines the number of JCC pods. We have enabled HPA for our test scenario, so the jccCount will be picked from the HPA section (i.e. minReplicas and maxReplicas). The pod count is the number of pods across all nodes of the cluster.
No Feedmaster pods are configured in this example. Feedmaster count can be half of the JCC pod count. Feedmaster is used to distribute Ultra task requests to the JCC pods. HPA configuration is only applicable to the JCC pods and not to the Feedmaster pods.

# Docker image of SnapLogic snaplex
image:
  repository: snaplogic/snaplex
  tag: latest
This specifies the most recent release version of the repository image. You can specify a different tag if you need to pin the version to a previous release for testing, etc.

# SnapLogic configuration link
snaplogic_config_link: https://uat.elastic.snaplogic.com/api/1/rest/plex/config/org/proj_space/shared/project
Retrieve the configuration link for the Snaplex by executing the Public API. The config link string is the portion before ?expires in the output value of the API. Example:
snaplogic_config_link: https://uat.elastic.snaplogic.com/api/1/rest/plex/config/QA/RB_Temp_Space/shared/RBGKE_node1

# SnapLogic Org admin credential
snaplogic_secret: secret/mysecret
Execute the kubectl command: kubectl apply -f snapSecret.yaml
Please see the section To create the SnapLogic secret in this document: Org configurations.

# CPU and memory limits/requests for the nodes
limits:
  memory: 8Gi
  cpu: 2000m
requests:
  memory: 8Gi
  cpu: 2000m
Set requests and limits to the same values to ensure resource availability for the container processes. Avoid running other processes in the same container as the JCC so that the JCC can have the maximum amount of memory.

# Default file ulimit and process ulimit
sl_file_ulimit: 8192
sl_process_ulimit: 4096
The value should be more than the number of slots configured for the node (Maximum Slots under Node properties of the Snaplex). If not set, then the node defaults will be used (/etc/security/limits.conf). The JCC process is initialized with these values.

# JCC HPA
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 3
minReplicas defines the minimum number of Pods that must be running.
maxReplicas defines the maximum number of Pods that can be scheduled on the node(s). The general guideline is to start with 1:2 or 1:3 Pods per node. The replica Pods are across all nodes of a deployment and not per node.

targetAvgCPUUtilization: 60
targetAvgMemoryUtilization: 60
To enable these metrics, the Kubernetes Metrics Server installation is required. Metrics collection is enabled by default in GKE as part of Cloud Monitoring.
targetAvgCPUUtilization: Average CPU utilization percentage (i.e. 60 = 60%). This is the average CPU utilization across all Pods. HPA will scale Pods up or down to maintain this average.
targetAvgMemoryUtilization: Average memory utilization percentage. This parameter specifies the average memory utilization (as a percentage of the requested memory) that the HPA should maintain across all the replicas of a particular deployment or stateful set.

scaleDownStabilizationWindowSeconds: 600
terminationGracePeriodSeconds: 900
# Enable IPv6 service for DNS routing to pods
enableIPv6: false
scaleDownStabilizationWindowSeconds is a parameter used in the Kubernetes Horizontal Pod Autoscaler (HPA). It controls the amount of time the HPA waits (like a cool-down period) before scaling down the number of pods after a decrease in resource utilization.
terminationGracePeriodSeconds defines the amount of time Kubernetes gives a pod to terminate before killing it. If the containers have not exited after terminationGracePeriodSeconds, then Kubernetes sends a SIGKILL signal to forcibly terminate the containers and removes the pod from the cluster.

Table 2.0 - values.yaml

Load balancer configuration
The service.yaml file contains a section for the Load balancer configuration. Autopilot mode in GKE supports the creation of a Load balancer service.

Section Comments

type: LoadBalancer
ports:
  - port: 8081
    protocol: TCP
    name: jcc
selector:
A Load balancer service will be created by GKE to route traffic to the application’s pods.
The external IP address and port details must be configured on the Settings tab of the Snaplex. An example is included in the next section of this document.

Table 3.0 service.yaml

Deployment using Helm

Upload the helm zip file package to the Cloud Shell instance by selecting the Upload option. The default Helm package for SnapLogic can be downloaded from here. It is recommended to download the latest package from the SnapLogic documentation link. The values.yaml file with additional custom configurations (as described in Tables 2.0 / 3.0 above) is attached to this article. Execute the command on the terminal to install and deploy the Snaplex release with a unique name such as snaplogic-snaplex, using the configurations from the values.yaml file. The release name is a unique identifier, and can be different for multiple deployments such as Dev / Prod, etc.

helm install snaplogic-snaplex . -f values.yaml

<<Output>>
NAME: snaplogic-snaplex
NAMESPACE: default
STATUS: deployed
REVISION: 5
TEST SUITE: None
NOTES:

You can run this command to update an existing deployment with any new or updated Helm configurations.

helm upgrade snaplogic-snaplex . -f values.yaml

View the deployed application under the Workloads tab on the Google Cloud Console. Workloads

This command returns the HPA details.

$ kubectl describe hpa
Name: snaplogic-snaplex-hpa
Namespace: default
Labels: app.kubernetes.io/instance=snaplogic-snaplex
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=snaplogic-snaplex
app.kubernetes.io/version=1.0
helm.sh/chart=snaplogic-snaplex-0.2.0
Annotations: meta.helm.sh/release-name: snaplogic-snaplex
meta.helm.sh/release-namespace: default
Deployment/snaplogic-snaplex-jcc
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 8% (153m) / 60%
resource memory on pods (as a percentage of request): 28% (1243540138666m) / 60%
Min replicas: 1
Max replicas: 3

Run the kubectl command to list the services.
You can see the external IP addresses for the Load balancer service.

kubectl get services
NAME                        TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)          AGE
kubernetes                  ClusterIP      34.118.224.1     <none>          443/TCP          16d
snaplogic-snaplex-regular   LoadBalancer   34.118.227.164   34.45.230.213   8081:32526/TCP   25m

Update the Load balancer URL on the Snaplex

Note the external IP address for the LoadBalancer service, and update the host and port in the Load balancer field of the Snaplex. Example: http://1.3.4.5:8081

Load balancer

Listing pods in GKE

The following commands can be executed to view the pod statuses. Pod creation and maintenance is fully managed by GKE.

$ kubectl top pods
$ kubectl get pods

kubectl get pods --field-selector=status.phase=Running
NAME                                    READY   STATUS    RESTARTS   AGE
snaplogic-snaplex-jcc-687d87994-crzw9   0/1     Running   0          2m
snaplogic-snaplex-jcc-687d87994-kks7l   1/1     Running   0          2m38s
snaplogic-snaplex-jcc-687d87994-pcfvp   1/1     Running   0          2m24s

View node details in the SnapLogic Monitor application

Each pod represents a JCC node. The maxReplicas value is set to 3, so you would see a maximum of 3 nodes (pods) deployed (Analyze -> Infrastructure tab).

Snaplex nodes

The following command uninstalls and deletes the deployment from the cluster. All deployed services, metadata, and associated resources are also removed.

helm uninstall <deployment_name>

Pod registration with the SnapLogic Control Plane

Scenario Comments

How are the Pod neighbors resolved and maintained by the SnapLogic Control Plane?
When a JCC/FeedMaster node (Pod) starts, it registers with the SnapLogic Control Plane, and the Control Plane maintains the list of Pod neighbors. When a JCC/FeedMaster node (Pod) registers, it also publishes its IP address to the Control Plane. An internal list of Pod IP addresses is updated dynamically for neighbor-to-neighbor communication. DNS resolution is not used.
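Rather than copying the address by hand, the external IP and port can also be read from the Service object returned by kubectl get service <name> -o json and assembled into the Load balancer URL. The helper and sample data below are an illustrative sketch; the field paths follow the standard Kubernetes Service schema, and the IP/port values mirror the listing above:

```python
def lb_url(svc: dict, scheme: str = "http") -> str:
    """Build the Snaplex Load balancer URL from a Kubernetes Service object
    (as returned by `kubectl get service <name> -o json`)."""
    ip = svc["status"]["loadBalancer"]["ingress"][0]["ip"]
    port = svc["spec"]["ports"][0]["port"]
    return f"{scheme}://{ip}:{port}"

# Sample shaped like the snaplogic-snaplex-regular service shown earlier
svc = {
    "spec": {"ports": [{"port": 8081, "protocol": "TCP", "name": "jcc"}]},
    "status": {"loadBalancer": {"ingress": [{"ip": "34.45.230.213"}]}},
}
print(lb_url(svc))  # http://34.45.230.213:8081
```

With cluster access, the same dictionary can be obtained by piping `kubectl get service snaplogic-snaplex-regular -o json` through `json.loads` before calling the helper.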
How are the container repository versions updated?
The latest Snaplex release build is published to the Docker repository under the version tag 'latest'. The pods will be deployed with this version on startup by referencing the tags from the values.yaml file. If the Snaplex version is updated on the Control Plane to a different version (e.g. main-2872), then the JCC nodes (pods) will be updated to match that version (i.e. main-2872).

Reference

Groundplex Deployment on Kubernetes
https://kubernetes.io/
GKE HPA
Securing SnapLogic APIs in Hybrid Deployments: The Role of WAF

APIs play a vital role in integrating on-premises, cloud-based, and third-party applications for SnapLogic integration workloads. As API connectivity scales over time, so does the need for robust security measures to protect these integration points from potential threats. This is where a Web Application Firewall (WAF) can be leveraged by organizations to ensure API security.

A WAF, positioned between client applications and SnapLogic's Groundplex clusters (as seen in the diagrams), helps by inspecting and filtering traffic to and from SnapLogic's API endpoints. The WAF provides defense against a wide range of common web threats, including:

SQL Injection
Cross-site Scripting (XSS)
Distributed Denial of Service (DDoS) attacks
Brute-force attacks

Organizations can implement a WAF in front of their SnapLogic Groundplex clusters, whether in cloud environments like AWS, Azure, or on-premise data centers, to monitor and control API traffic. This ensures that only legitimate requests reach the integration layers, helping to prevent malicious traffic from compromising your critical data and services. The WAF inspects incoming API requests for common security threats, such as SQL injections, cross-site scripting (XSS), and other vulnerabilities, ensuring that integrations running in SnapLogic operate within a secure framework. This added layer of protection not only shields your infrastructure from external attacks but also helps maintain the integrity and performance of your API-driven workloads.

Key Benefits of Deploying a WAF

Enhanced API Protection: A WAF scrutinizes incoming requests, identifying and blocking malicious payloads, ensuring the APIs that connect your cloud apps and on-premise systems remain secure.
Scalability and High Availability: In SnapLogic's hybrid environments, including on-premise and cloud (Azure/AWS), a WAF helps ensure traffic is balanced and high availability is maintained, even during periods of peak demand.

Compliance Support: Many industries require stringent security standards (e.g., HIPAA, GDPR). A WAF helps ensure SnapLogic's API traffic meets these regulatory requirements by preventing unauthorized data leakage and access.

Traffic Filtering and Logging: WAFs can analyze traffic patterns and provide detailed logs of API interactions. This is valuable for detecting anomalies and improving incident response times.

SnapLogic supports multiple deployment models, including on-premise and cloud configurations. Below are two typical deployment scenarios showing where a WAF integrates into the SnapLogic runtime infrastructure (Snaplex).

Single Region - Cloud-Native SnapLogic Deployment (Azure/AWS/GCP)

In cloud-based deployments, organizations leverage platforms like Azure and AWS to scale SnapLogic integration workloads. A WAF (such as Azure Application Gateway) can be deployed in front of the API Gateway to add an additional security layer for all API interactions. This setup helps ensure that integrations can securely connect to a wide range of cloud apps and data sources, protecting them from external threats.

On Premise - Multi Cluster Configuration

In this example of an on-premise setup, an organization deploys a WAF (such as Akamai) in the network's DMZ (Demilitarized Zone) to protect SnapLogic's Groundplex clusters. The WAF inspects all incoming traffic from external clients and forwards only secure and legitimate API requests to the internal SnapLogic Groundplex nodes. This approach helps ensure that sensitive integration workflows, databases, and applications remain isolated from external threats.
Traffic Flow

Here is a description of the flow of an API request as it passes through a Web Application Firewall (WAF) to the SnapLogic Snaplex infrastructure.

1. API Request from the Client Application

Originating from the client (either a web application, mobile app, or another API client), the API request is sent over the internet to an endpoint. This request is typically directed at the API Gateway, which acts as the initial point of contact for all external API calls. The request contains various headers, data payloads, and parameters that specify what kind of operation (GET, POST, PUT, DELETE, etc.) the client wants to perform on the API.

2. Traffic Hits the Web Application Firewall (WAF)

Before reaching the Snaplex infrastructure, the API request first passes through the WAF. The WAF is typically deployed between the public internet and the organization's internal network (cloud or on-premises).

Inspection and Filtering: The WAF inspects the API request for any malicious content or behaviors that could indicate a security threat. This might include:

SQL Injections
Cross-Site Scripting (XSS)
Distributed Denial of Service (DDoS) attacks
Brute-force attacks
Any other patterns that could compromise the API or application.

Traffic Policies: Based on predefined security policies and rule sets (specific to the organization's needs), the WAF determines if the request is safe to proceed or needs to be blocked. Requests that violate any of the rules (e.g., malformed headers, suspicious payloads, unexpected request methods) are blocked or redirected.

3. API Gateway or Load Balancer

If the request passes through the WAF without being flagged as a security threat, it is forwarded to the organization's API Gateway or load balancer. In a cloud-based architecture, this could be services like AWS Elastic Load Balancer or Azure Application Gateway, which manage API traffic and distribute it across backend resources.
In an on-premise architecture, similar load balancing and routing components manage the flow. The API Gateway ensures that traffic is efficiently routed to the appropriate Snaplex nodes and that only valid, secure API requests proceed.

4. Reaching SnapLogic Groundplex Clusters

After passing through the WAF and load balancer, the API request reaches SnapLogic's Groundplex clusters. Depending on the deployment (on-premise, AWS, Azure), the clusters can be distributed across different regions and environments. Within the Groundplex clusters, the request is processed by SnapLogic's integration pipelines. The Groundplex cluster executes SnapLogic tasks, which involve data integration, orchestration, transformation, or connection to third-party applications, databases, or APIs. The request might trigger various integration workflows, such as:

Connecting to an on-premise database (e.g., Oracle, MySQL) to retrieve or update data.
Calling an external cloud-based service (e.g., Salesforce, Workday, etc.).
Processing data transformations (ETL/ELT) in a data pipeline.

Automated Deployment (CICD) of SnapLogic assets with GitHub
Introduction This guide is a reference document for the deployment of SnapLogic assets to a GitHub repository. It also includes sample YAML code for a GitHub Actions workflow which can be used to automate the deployment of assets across orgs (Dev -> Stg / Stg -> Prod, etc.) This guide is targeted towards SnapLogic Environment Administrators (Org Administrators) and users who are responsible for the deployment of SnapLogic assets / Release management operations. Section B covers automated deployment with GitHub Actions, and Section A illustrates a manual deployment flow using the Manager interface. Author: Ram Bysani SnapLogic Enterprise Architecture team SnapLogic Git Integration Git Integration allows you to track, update, and manage versions of SnapLogic assets using the graphical interface or the public APIs. The following asset types can be tracked in a GitHub repository: Accounts Files Pipelines Tasks Git model A) Asset deployment across environments - an example The example in this document illustrates a sample deployment of SnapLogic assets from the Dev environment (org) to the Prod environment. A similar methodology can be adopted to deploy assets from Dev -> Stg -> Prod environments. The environments should be configured for Git integration with GitHub. Please refer to the steps in the documentation. Git Integration Git operations The assets in this example are tracked at a project space level, i.e. one Project Space in Dev is associated with a single branch in the GitHub repository. A single GitHub repository is used to maintain the branches for Dev, Stg, Prod, etc. Repository branches can also be deleted and re-created for specific deployment needs. New / Modified Assets in the Dev Org Project Space: Dev_Integration_Space with the below project folders having SnapLogic assets. Integration_Project_1, Integration_Project_2, share Prod Environment We have already defined an empty project space named Prod_GH_Integration in the Prod org. 
This step can also be done by using the SnapLogic public API. Project APIs. Define branches in the GitHub repository Create individual branches in the GitHub repository for the Dev and Prod project space assets. You can choose the main branch as the default branch while creating Dev_GH_Space. Choose the Dev_GH_Space branch as the source when creating the Prod_GH_Space branch. Each branch in the GitHub repository corresponds to a Project Space in SnapLogic. e.g.: Dev_GH_Space, Prod_GH_Space Commit Dev assets to GitHub Connect to the Dev (source) environment in the SnapLogic Manager interface, and navigate to the project space named Dev_GH_Integration_Space. Right click and select Git Repository Checkout. Choose the Git repository branch Dev_GH_Space. You can see that the Git status has changed to Tracked for all assets under the child projects. Note that some assets appear with status Untracked as these were already existing in the main branch. These assets would not be committed to the Git repository. Notice the tracking message with the branch name and commit id next to the project space name: Tracked with Git repository: byaniram/RB_Snaprepo/heads/Dev_GH_Space, commit: 9a22ac8 Connect to the GitHub repository and verify the commit status for the branch Dev_GH_Space. Create Pull Request in GitHub At this step, you would need to create a Pull Request in GitHub. Choose Prod_GH_Space as the base branch, and Dev_GH_Space as the compare branch, and create the Pull request. This action would merge the assets contained in the Dev_GH_Space branch into the Prod_GH_Space branch. Connect to the GitHub repository and verify the commit status for the branch Prod_GH_Space. The assets have now been committed to the Prod environment and are tracked in the GitHub repository under the branch - Prod_GH_Space. It is also possible to merge and pull from additional branch(es) into a single Prod_GH_Space if you have a need for it. 
You would need to repeat the Pull / Merge process as above with the base branch being Prod_GH_Space, and the compare branch being one of Dev_GH_Space, Dev_GH_Space_1, or Dev_GH_Space_2. Pulling / Committing assets into the Prod Org Connect to the Prod (target) environment in the SnapLogic Manager interface, and navigate to the project space named Prod_GH_Integration_Space. Right click and select Git Repository Checkout. Choose the Git repository branch Prod_GH_Space. Choose Git Pull to pull the assets into the Project space. The assets from the Dev_Integration_Space project space of the Dev environment are deployed to the Prod_Integration_Space project space of the Prod environment. Notice the tracking message with the branch name and commit id next to the project space name: Tracked with Git repository: byaniram/RB_Snaprepo/heads/Prod_GH_Space, commit: ce0c368 For subsequent deployments of changed assets, you would first do a Commit to Git for the project space in the SnapLogic Dev environment, followed by the above steps. Changed assets would be visible with a Git status of ‘Tracked, Modified locally’ in the SnapLogic Manager. B) Deployment Automation using a GitHub Actions Workflow Actions workflow YAML sample A GitHub Actions workflow can be used to automate the deployment of assets across SnapLogic environments (such as Dev to Stg, Stg to Prod, etc.). A workflow is a configurable automated process made up of one or more jobs. You must create a YAML file to define your workflow configuration. Here’s a complete YAML file for the Dev -> Prod deployment example described in Section A above. The complete YAML file is attached for your reference. Please create a new Workflow from the Actions tab, and paste the contents of the file in the workflow editor and commit changes. 
# Actions workflow for automated deployment of SnapLogic assets
name: SnapLogic CICD Sample
on:
  push:
    branches:
      - Dev_GH_Space
  # Uncomment the below line if you need to execute the workflow manually.
  # workflow_dispatch:
jobs:
  pull_merge_branches:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Merge Dev to Prod
        uses: devmasx/merge-branch@master
        with:
          type: now
          from_branch: Dev_GH_Space
          target_branch: Prod_GH_Space
          github_token: ${{ secrets.ACTIONS_TOKEN }}
      - name: Checkout project assets to Prod project space
        run: |
          curl -s -X POST ${{vars.SNAP_URL}}/api/1/rest/public/project/pull/${{vars.SNAP_ORG}}/${{vars.PROJECT_SPACE}} \
            -H "Content-Type:application/json" -H "Authorization:Basic ${{secrets.BASE64_TOKEN}}" \
            -d '{"use_theirs":"true"}'

Please refer to the GitHub documentation for information related to Workflow usage and syntax: GitHub Workflows, Workflow syntax

The following table provides clarification on certain aspects of the sample workflow for better understanding.

Section Comments

runs-on: ubuntu-latest
runs-on defines the runner (type of machine) to use to run the job. ubuntu-latest specifies a GitHub hosted runner image. GitHub hosted runners

uses: actions/checkout@v4
checkout is an action which is available in the GitHub marketplace. This action checks out the repository for use. v4 is the version number of the action. https://github.com/marketplace/actions/checkout

uses: devmasx/merge-branch@master
merge-branch is an action from the GitHub marketplace. This action runs a Git merge operation. https://github.com/marketplace/actions/merge-branch It also requires you to define a personal access token (classic) under Developer Settings -> Personal access tokens. Select both the repo and workflow checkboxes.
curl -s -X POST ${{vars.SNAP_URL}}/api/1/rest/public/project/pull/${{vars.SNAP_ORG}}/${{vars.PROJECT_SPACE}} \
  -H "Content-Type:application/json" -H "Authorization:Basic ${{secrets.BASE64_TOKEN}}" \
  -d '{"use_theirs":"true"}'

This is a CURL command that executes the SnapLogic public API to pull the latest project files from Git. See Pull the latest project files from Git. The referenced variables are defined on the GitHub repository under Settings -> Secrets and variables -> Actions. The vars context is used to reference those variables (e.g. SNAP_ORG, PROJECT_SPACE). You can also define encrypted Secrets for sensitive data and reference them using the secrets context as in the example (e.g. BASE64_TOKEN has the base64 encoded string for username and password). Workflow Variables

Table 1.0 - Workflow Actions

Workflow execution

The above Actions workflow will be automatically executed whenever there is a "Push" / "Git Commit" operation to the Dev_GH_Space branch, i.e. whenever a commit is done from the Dev SnapLogic environment project space. The workflow will execute the pull-merge operation to the Prod_GH_Space branch, and pull the latest project assets into the Prod SnapLogic environment. The YAML file must be created under the .github/workflows folder of the Dev_GH_Space branch in the GitHub repository. The workflow run status will be visible under the Actions tab.

Note: If you wish to manually execute the pull-merge post code review, then you can uncomment the two lines in the script to enable workflow_dispatch, and execute the Actions workflow manually from the Actions tab on GitHub.

# Uncomment the below line if you need to execute the workflow manually.
# workflow_dispatch:

You can edit and modify the YAML file as per your requirements. Subsequent commits and deployments from Dev->Prod can be automated similarly.
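The BASE64_TOKEN secret is simply the Base64 encoding of the string username:password, as used by HTTP Basic authentication. It can be generated locally before storing it as a repository secret; a minimal sketch with placeholder credentials (the account values shown are illustrative, not real):

```python
import base64

def basic_auth_token(username: str, password: str) -> str:
    """Return the Base64 string for an HTTP Basic Authorization header."""
    return base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")

# Placeholder credentials for illustration only
token = basic_auth_token("admin@example.com", "s3cret")
print(f"Authorization: Basic {token}")
```

The printed value is what you would store as the BASE64_TOKEN secret so that the workflow's curl step can send it in the Authorization header.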
Action Comments

Developer commits new code or updates assets in the Dev org to the GitHub repository
SnapLogic Dev org Manager Interface: Asset -> Add to repository. Ensure status shows Tracked. Project Space -> Commit to Git

Create and merge Pull Request
Create a new Pull Request on GitHub, and merge the newly committed assets by choosing the Prod branch as the base, and the Dev branch as the compare branch.

Pull the updated assets into the Prod org
SnapLogic Prod org Manager Interface: Project Space -> Git Pull

Table 2.0 - Steps for subsequent / future asset deployment

Deployment flow (Dev->Test->Prod)

Note: Future versions of this document will cover additional deployment scenarios. Please post your comments on the article.

Snaplex Capacity Tuning Guide
Introduction This document serves as a comprehensive best practice guide for developing efficient and robust Pipelines within the SnapLogic Platform. It offers guidelines that aim to optimize performance, enhance maintainability, reusability, and provide a basis for understanding common integration scenarios and how best to approach them. The best practices encompass various aspects of Pipeline design, including Pipeline behavior, performance optimization and governance guidelines. By adhering to these best practices, SnapLogic developers can create high-quality Pipelines that yield optimal results while promoting maintainability and reuse. The content within this document is intended for the SnapLogic Developer Community or an Architect, in addition to any individuals who may have an influence on the design, development or deployment of Pipelines within the SnapLogic platform. Authors: SnapLogic Enterprise Architecture team Snaplex Planning Snaplexes are a grouping of co-located nodes which are treated as a single logical entity for the purpose of Pipeline execution. The SnapLogic Control plane automatically performs load balancing of Pipeline workload within a Snaplex. Nodes in Snaplexes should be homogeneous, with the same CPU/memory/disk sizing and network configurations per node type (i.e. JCC / FeedMaster). The JCC and Feedmaster nodes in a Snaplex can be of different sizes. Examples of recommended configurations: Snaplex configurations JCC node count - 4 JCC node size for each node - Large Feedmaster node count - 2 Feedmaster node size for each node - Medium JCC node count - 4 JCC node size for each node - X-Large Feedmaster node count - 2 Feedmaster node size for each node - Large Object Definition Node A Node is a JVM (Java Virtual Machine) process which is installed on a server such as Windows or Linux. JCC Node The JCC node is responsible for: Preparation, validation, and execution of Pipelines. 
Send heartbeat to the SnapLogic Control plane indicating the health of the node.

FeedMaster Node

The FeedMaster node acts as an interface between the JCC nodes and the client. The main functions of a FeedMaster node are:

Manage message queues.
Send heartbeat to the SnapLogic Control plane indicating the health of the node.

When setting up Snaplexes, it is recommended to plan out the number of Snaplexes to configure along with the usage criteria to achieve isolation across workloads. Snaplexes can be organized in various ways such as:

Pipeline Workload - Organize Snaplexes by workload type: Batch, Low latency, and On-demand.
Business Unit - Organize Snaplexes by business units.
Geographical location - Organize Snaplexes by data center or geographic location.

The recommendation is to use a combination of the above to optimize resource usage and achieve workload isolation.

Snaplex Network Requirements

Snaplexes should have the below network characteristics:

Within a Snaplex:
Less than 10 ms round trip latency between Snaplex nodes.
Greater than 40 MB/sec throughput between Snaplex nodes.

Snaplex to Control Plane:
Less than 50 ms round trip latency to the SnapLogic Control plane.
Greater than 20 MB/sec throughput to the SnapLogic Control plane.

Pipeline Execute

For Pipeline executions using the Pipeline Execute Snap, nodes communicate with each other using HTTPS on port 8081. There is some resiliency to network failures, and HTTPS requests are retried in the case of failures. Even though requests are retried, high network latency and dropped connections can result in Pipeline execution failures. Regular Pipeline executions run within a node, requiring no communication with other nodes in the Snaplex. When a Pipeline Execute Snap is used to run child Pipelines, there are three options:

Option Comments

LOCAL_NODE
This option is recommended when the child Pipeline is being used for Pipeline structuring and reuse rather than Pipeline workload distribution.
Use this option for most regular child Pipeline executions.

LOCAL_SNAPLEX
The network communication is optimized for streaming data processing since the child Pipeline is on the local Snaplex. Use this option only when workload distribution within the Snaplex is required.

SNAPLEX_WITH_PATH
This has a high dependency on the network. The network communication is optimized for batch data processing since the child Pipeline is on a remote Snaplex. Use this option only when the child Pipeline has to run on a different Snaplex, either because of endpoint connectivity restrictions or for workload distribution.

Ultra Pipelines

The JCC nodes communicate with the FeedMaster nodes over TCP with SSL on port 8084 when executing Ultra Pipelines. The communication between nodes is based on a message queue. This communication is not resilient to network failure, so a reliable network is required between the Snaplex nodes for Ultra Pipeline processing. In case of any network failures, the currently processing Ultra requests will be retried or, in some instances, fail with errors. If there is a communication failure between the JCC and FeedMaster nodes, then the request will be retried up to five times. This is controlled by the ultra_max_redelivery_count Snaplex configuration. There is an overall 15-minute timeout for an Ultra request to the FeedMaster, configurable at the request level using the X-SL-RequestTimeout HTTP request header, or at the Snaplex level by using the llfeed.request_timeout config setting. Note that both ultra_max_redelivery_count and llfeed.request_timeout are configured under Node Properties -> Global Properties for Groundplexes. You can submit a support request to configure these properties for your Cloudplexes.

Pipeline Load Balancing

The Control plane performs load balancing for Pipeline execution requests on a Snaplex.
The following table lists the configurations that are involved:

Property / Threshold, Where configured, Default value, Comments

Maximum Slots
Where configured: Node properties tab of the Snaplex
Default value: 4000
Comments: One slot = One Snap = One active thread on the node. A percentage of slots (configurable with the Reserved slot % property) are reserved for interactive Pipeline executions and validations through the Designer tool. Pipelines will be queued if the threshold is reached. Some Snaps, such as Pipeline Execute, Bulk loaders, and Snaps performing input/output, can use a higher number of threads compared to other Snaps.

Maximum memory %
Where configured: Node properties tab of the Snaplex
Default value: 85 (%)
Comments: Threshold at which no more Pipelines will be assigned to a node.

Snaplex node resources (CPU, FDs, Memory)
Where configured: Node server configurations
Default value: Configurable
Comments: If the Control plane detects that there are not enough resources available on the Snaplex, then the Pipeline execution requests will be queued up on the control plane, and resume when resources are available.

The Control plane dispatches the Pipeline to the node which has the most available capacity in terms of CPU/memory and file descriptors. For child Pipeline executions using the Pipeline Execute Snap, there is a preference given for running the child on the local node to avoid the network transfer penalty.

Table 1.0 - Configurations for Pipeline load balancing

Snaplex Resource Management

Capacity Planning

This section provides some guidelines for Snaplex capacity planning and tuning.

Configuration / Use-case, Comments

Workload isolation
Isolate workloads across Snaplexes based on workload type, geographic location, and business unit.

Node sizing
Size the node (CPU, RAM, disk space) in a Snaplex based on Pipeline workload type. Batch data processing needs larger nodes, while Streaming/API processing can use smaller nodes.
Maximum Slots
One slot = One Snap = One active thread on the node. A percentage of slots (configurable with the Reserved slot % property) are reserved for interactive Pipeline executions and validations through the Designer tool. Pipelines will be queued if the threshold is reached. Some Snaps, such as Pipeline Execute, Bulk loaders, and Snaps performing input/output, can use a higher number of threads compared to other Snaps. The general recommendation is to configure this property based on the node memory configuration. Example: 8 GB - 2000 Slots; 16 GB - 4000 Slots.

API Workloads
For API workloads, the rule of thumb is to have 100 active Ultra API calls per 8 GB of RAM, or 20 active triggered API calls per 8 GB of RAM. So a 16 GB node can have 200 active Ultra API calls or 40 active triggered API calls.

Node count
The number of nodes in a Snaplex can be estimated based on the count of batch and streaming Pipelines. The number of FeedMaster nodes can be half of the JCC node count, with a minimum of two recommended for high availability. For active Pipeline count estimates, error Pipelines can be excluded from the count since they do not consume resources under the normal workload.

Table 1.1 - Configurations for Snaplex capacity planning

Capacity Tuning

Below are some best practices for Snaplex capacity tuning:

Configuration / Use-case, Comments

Slot counts
The Maximum slot count can be tuned based on the alerts and dashboard events. It is not required to restart the nodes for this configuration to take effect. Queued Pipelines - Increase slot count by 25%. Busy nodes - Reduce slot count by 25%. The slot count should not be set to more than 50% above the recommended value for the node configuration. e.g. The recommended slot count on a node with 16 GB RAM is 4000; setting it higher than 6000 is not advisable. If you observe high CPU / memory consumption on the node despite lowering the slot count by 25%, then consider allocating additional resources to the Snaplex nodes.
Workloads
Batch Workloads: Expand the node memory up to 64 GB, and deploy additional nodes for increased capacity. API Workloads: Deploy additional nodes instead of expanding the memory on the current node.

Active Pipelines
As a general rule, it is suggested to maintain fewer than 500 active Pipeline instances on a single node. Exceeding this threshold can lead to communication bottlenecks with the Control plane. If the number of active Pipeline instances exceeds 500, then the advisable course of action is to consider the addition of more nodes.

CPU
CPU consumption can be optimized by setting the Pool size and Batch size options on Pipeline Execute Snaps.

Memory
See Table 3.0 below. Additional Reference: Optimizations for Swap Memory

Table 2.0 - Configurations for Snaplex capacity tuning

Memory Configuration thresholds

Maximum memory %
Where configured: Node properties tab of the Snaplex
Default value: 85 (%)
Comments: Threshold at which no more Pipelines will be assigned to a node.

Pipeline termination threshold
Where configured: Internal (can be configured by setting the feature flag at the org level: com.snaplogic.cc.snap.common.SnapThreadStatsPoller.MEMORY_HIGH_WATERMARK_PERCENT)
Default value: 95 (%)
Comments: Threshold at which the active Pipeline management feature kicks in and terminates Pipelines when the node memory consumption exceeds the threshold. Ideal range: 75-99.

Pipeline restart delay interval
Where configured: Internal (can be configured by setting the feature flag at the org level: com.snaplogic.cc.snap.common.SnapThreadStatsPoller.PIPELINE_RESTART_DELAY_SECS)
Default value: 30 (seconds)
Comments: One Pipeline is terminated every 30 seconds until the node memory goes below the threshold (i.e. goes below 95%).

Table 3.0 - Snaplex node memory configurations

The above thresholds can be optimized to minimize Pipeline terminations due to Out-of-Memory exceptions. Note that the memory thresholds are based on the Physical memory on the node, and not the Virtual / Swap memory.
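The sizing rules of thumb from the capacity tables above (roughly 250 slots per GB of RAM, at most 50% above the recommendation, and 100 active Ultra or 20 active triggered API calls per 8 GB) can be expressed as a small helper for quick what-if estimates. This is an illustrative sketch of the published guidance, not a SnapLogic API; treat the outputs as starting points for tuning:

```python
def recommended_slots(ram_gb: int) -> int:
    """Rule of thumb from Table 1.1: ~250 slots per GB of RAM
    (8 GB -> 2000 slots, 16 GB -> 4000 slots)."""
    return ram_gb * 250

def max_advisable_slots(ram_gb: int) -> int:
    """Slot count should not exceed the recommended value by more than 50%."""
    return int(recommended_slots(ram_gb) * 1.5)

def api_capacity(ram_gb: int) -> dict:
    """Rule of thumb: 100 active Ultra or 20 active triggered calls per 8 GB."""
    return {"ultra": ram_gb // 8 * 100, "triggered": ram_gb // 8 * 20}

print(recommended_slots(16))    # 4000
print(max_advisable_slots(16))  # 6000
print(api_capacity(16))         # {'ultra': 200, 'triggered': 40}
```

For a 16 GB node this reproduces the figures in the text: 4000 recommended slots, an advisable ceiling of 6000, and 200 active Ultra or 40 active triggered API calls.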
Snaplex Alerts

SnapLogic supports alerts and notifications through email and Slack channels. These can be configured in the Manager interface under Settings. The recommended alerts are listed in the table below.

Alert type Comments

Snaplex status alerts
Status alerts can be created at the org level or the Snaplex level (in the Snaplex properties). These allow notifications to be sent when the Snaplex node is unable to communicate with the SnapLogic control plane or there are other issues detected with the Snaplex.

Snaplex Resource usage alerts
Set up alerts for these event types: Snaplex congestion, Snaplex load average, Snaplex node memory usage, Snaplex node disk usage.

Table 4.0 - Recommended Snaplex Alerts

Reference: Alerts, Slack notifications

API Measurement and KPIs
When measuring the effectiveness and success of an API strategy, several Key Performance Indicators (KPIs) can be used to provide valuable insights into its performance. Historically, integration KPIs have been overwhelmingly technical in nature; for API strategies there needs to be a wider set of measurements that tie business impact, API usage, and adoption metrics to broader business goals. This document gives examples of how to measure success and includes example KPIs.

Author: Guy Murphy

How to measure the success of an API strategy?

For example, you could assess how API usage affects customer acquisition, retention, or revenue generation. For many enterprises, API strategies are aspects of a wider strategy, such as new digital products and services, or internal IT strategies such as Composable Enterprise, Data Mesh, and/or Data as a Service. Another important concept is that of the Community: the developers that use an API. The specific KPIs you choose may vary depending on your organization's strategy and the types of services that the API strategy supports.

Technical KPIs

API Usage Metrics: Keep track of the number of API calls made over a specific period. This metric can give you an overall idea of how much your APIs are being utilized by developers or consumers.

API Re-Use: Measure how many APIs are supporting multiple use cases; this can correlate with a reduction of point-to-point (p2p) patterns.

User Engagement: Measure the number of unique users or applications accessing your APIs. Understanding who is using your APIs can help you target your efforts better.
Response Time and Latency: Monitor the average response time and latency of your APIs. Faster response times usually lead to higher user satisfaction.

API SLA: While one of the most basic metrics, measuring against API Service Level Agreements (SLAs) notably captures when SLAs are not achieved.

Total pass and error rates: Measuring how often APIs trigger HTTP error (non-200) status codes helps you understand how error-prone your APIs may be. This aggregate measure provides information to help judge the overall quality of the APIs.

API Traffic by Source: Analyse the sources of API traffic, such as mobile apps, web applications, or partner integrations. This information can help you prioritize support and improvements for integrations.

API usage growth: This metric also measures API adoption and is often the preferred metric for doing so. Ideally, API traffic grows monthly as the number of applications and developers using them increases.

API Version Adoption: Keep an eye on the adoption rate of new API versions. Encouraging developers to use the latest version can help manage technical debt and ensure better functionality.

Security and Compliance Metrics: Keep track of security-related metrics, such as the number of security incidents, security vulnerabilities found, and API access logs, to ensure data privacy and compliance.

Cost Efficiency: Analyse the cost of maintaining and operating your APIs in relation to the value they bring to your organization. Assess whether the investment in API development aligns with the expected returns.

User/Community Support

One of the critical aspects of an API strategy is the ability for projects and developers to self-serve access to APIs in the appropriate manner.

Developer Onboarding Time: Measure how long it takes for developers to start using your APIs after registration.
A shorter onboarding time implies developer-friendly documentation and an intuitive API design.

API Documentation Quality: Evaluate the clarity, completeness, and ease of understanding of your API documentation. High-quality documentation can enhance the developer experience and attract more users.

Community Feedback: This indicates active support and interest in using the APIs.

Community Response Rate: How quickly are community questions resolved?

Product Metrics

For APIs that are digital services that are partner- or customer-facing, a different set of KPIs should be considered from those previously described.

Direct and indirect revenue: These metrics target the different ways APIs contribute to revenue. While some APIs are directly monetized, others support integrations with business partners or are third-party integrations valued by customers. As with the adoption rate for your APIs, tracking indirect revenue helps show how developers build revenue-generating apps for partners.

Applications per API: APIs need to be reusable. This metric measures how many applications integrate with an API to see which APIs provide the most value.

Number of partners: APIs often enable business relationships. Tracking the number of partner API integrations helps drive adoption and demonstrate value to other business units.

Partner-Developer churn: This can be an indicator that the service, support, or business case of an API service should be reviewed in depth.

Service Failure Rate: Track the rate of errors and failures occurring in API calls; this should include incorrect or poor-quality data returned by the service.
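Several of the technical KPIs above (call volume, error rates, average latency) can be derived from plain request logs. A minimal, illustrative Python sketch follows; the record fields (`status`, `latency`) and the sample log are assumptions for illustration, not a SnapLogic API:

```python
from collections import Counter

def api_kpis(requests):
    """Compute simple API KPIs from a list of request records.

    Assumed record fields (illustrative only):
      'status'  - HTTP response status code
      'latency' - response time in milliseconds
    """
    total = len(requests)
    errors = sum(1 for r in requests if r["status"] >= 400)
    return {
        "total_calls": total,
        "error_rate": errors / total,  # share of 4xx/5xx calls
        "avg_latency_ms": sum(r["latency"] for r in requests) / total,
        "calls_by_status": dict(Counter(r["status"] for r in requests)),
    }

# Hypothetical request log.
log = [
    {"status": 200, "latency": 120},
    {"status": 200, "latency": 80},
    {"status": 500, "latency": 300},
    {"status": 404, "latency": 40},
]
kpis = api_kpis(log)
print(kpis["error_rate"], kpis["avg_latency_ms"])  # 0.5 135.0
```

In practice these aggregates would be computed by an API management or observability layer; the sketch only shows how raw access logs map onto the KPI definitions above.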
Conclusion

There are several different dimensions to measuring the success of an API strategy, and these will often be aimed at different personas within the organization. The examples above are common measurements but are not exhaustive, especially as the measurement of a product is, and should be, highly reflective of the service exposed by the API(s) and its impact on the process or business.

Pipeline Design and Performance Optimization Guide
Introduction

This document serves as a comprehensive best practice guide for developing efficient and robust Pipelines within the SnapLogic Platform. It offers guidelines that aim to optimize performance, enhance maintainability and reusability, and provide a basis for understanding common integration scenarios and how best to approach them. The best practices encompass various aspects of Pipeline design, including Pipeline behavior, performance optimization, and governance guidelines. By adhering to these best practices, SnapLogic developers can create high-quality Pipelines that yield optimal results while promoting maintainability and reuse. The content within this document is intended for the SnapLogic Developer Community and Architects, in addition to any individuals who may influence the design, development, or deployment of Pipelines.

Authors: SnapLogic Enterprise Architecture team

Why good Pipeline Design is important

The SnapLogic Pipeline serves as the foundation for orchestrating data across business systems, both within and outside of an organization. One of its key benefits is its flexibility and the broad range of "Snaps" that aim to reduce the complexity involved in performing specific technical operations. The "SnapLogic Designer", a graphical low-code environment for building an integration use case with Snaps, provides a canvas that enables users with little technical knowledge to construct integration Pipelines. As with any user-driven environment, users must take care to ensure they not only achieve their desired business goals but also follow an approach that aligns with industry and platform best practices. For a SnapLogic Pipeline, these best practices may encompass various considerations: Is my Pipeline optimized to perform efficiently? Will the Pipeline scale effectively when there's an increase in data demand or volume?
If another developer were to review the Pipeline, would they easily comprehend its functionality and intended outcome? Does my Pipeline conform to my company's internal conventions and best practices? Not considering these factors may cause undesirable consequences for the business and users concerned. Relative to the considerations stated above, these consequences could be as follows: If data is not delivered to the target system, there may be financial consequences for the business. The business may experience data loss or inconsistency when unexpected demand occurs. Development and project teams are impacted if they are unable to deliver projects in a timely fashion. Lack of internal standardization limits a company's ability to govern usage across the whole business, making it less agile. Therefore, it is essential that users of the Platform consider best practice recommendations and also contemplate how they can adopt and govern the process to ensure successful business outcomes.

Understanding Pipeline Behavior

To better understand how Pipelines can be built effectively within SnapLogic, it is essential to understand the Pipeline's internal characteristics and behaviors. This section aims to provide foundational knowledge about the internal behavior of Pipelines, enabling you to develop a solid understanding of how they operate and to make better design decisions.

Pipeline Execution States

The execution of a SnapLogic Pipeline can be initiated via a Triggered, Ultra, or Scheduled task. In each case, the Pipeline transitions through a number of different 'states', with each state reflecting a distinct stage in the processing lifecycle of the Pipeline, from invocation and preparation through execution to completion. The following section highlights this process in more detail and explains some of the internal behaviors. The typical Pipeline execution flow is as follows: Initialize Pipeline. Send metadata to Snaplex.
Prepare Pipeline, fetch & decrypt account credentials. Connect to endpoint security. Send execution metrics. Pipeline completes, and resources are released.

The following section describes the different Pipeline state transitions and their respective behavior, in sequential order.

State: NoUpdate
Purpose: A pre-preparing state. This indicates that a request to invoke a Pipeline has been received, but the leader node or control plane is still establishing which Snaplex node it should run on. (This state is only relevant if the Pipeline is executed on the leader node.)

State: Preparing
Purpose: Indicates the retrieval of relevant asset metadata, including dependencies, from the control plane for the invoked Pipeline. This process also carries out pre-validation of Snap configuration, alerting the user to any missing mandatory Snap attributes.

State: Prepared
Purpose: The Pipeline is prepared and ready to be executed.

State: Executing
Purpose: The Pipeline executes and processes data, connecting to any Snap endpoints using the specified protocols.

State: Completed
Purpose: Pipeline execution is complete, and teardown releases the compute resources on the Snaplex node.

State: Final
Purpose: Pipeline execution metrics are sent to the Control Plane.

Table 1.0 Pipeline state transitions

Pipeline execution flow

Pipeline Design Decision Flow

The following decision tree can be used to establish the best Pipeline design approach for a given use case.

Snap Execution Model

Snaps can generally be categorized into these types:

Fully streaming: Most Snaps follow a fully streaming model, i.e. read one document from the Input view (or from the source endpoint for Read Snaps), and write one document to the Output view or to the target endpoint.

Streaming with batching: Some Snaps stream with batching behavior. For example, the DB Insert Snap reads N documents and then makes one call to the database (where N is the batch size set in the database account).

Aggregating: Aggregating Snaps (e.g. Aggregate, Group By, Join, Sort, Unique, etc.)
read all input documents before any output is written to the Output view. Aggregating Snaps can change the Pipeline execution characteristics significantly, as these Snaps must receive all upstream documents before processing and sending the documents to the downstream Snaps.

Pipeline Data Buffering

Connected Snaps within a Pipeline communicate with one another using Input and Output views. An Input view accepts data passed from an upstream Snap; the Snap operates on the data and then passes it to its Output view. Each view implements a separate in-memory ring buffer at runtime. Given the following example, the Pipeline will have three separate ring buffers, represented by the circular connections between each Snap (diamond-shaped connections for binary Snaps). The size of each ring buffer can be configured by setting the below feature flags on the Org. The default values are 1024 and 128 for DOCUMENT and BINARY data formats respectively.

com.snaplogic.cc.jstream.view.publisher.AbstractPublisher.DOC_RING_BUFFER_SIZE=1024
com.snaplogic.cc.jstream.view.publisher.AbstractPublisher.BINARY_RING_BUFFER_SIZE=128

The values must be set as powers of two. The source Snap reads data from the endpoint and writes to the Output view. If the buffer is full (i.e. if the consumer Snap is slow), then the producer Snap will block on the write operation for the 1025th document. Pipeline branches execute independently. However, in some cases the data flow of a branch in a Pipeline can be blocked until another branch completes streaming the document. Example: a Join Snap might hang if one of its upstream Snaps (e.g. Copy, Router, Aggregator, or similar) has a blocked branch. This can be alleviated by setting Sorted streams to Unsorted in the Join Snap to buffer all documents in input views internally. The actual number of threads that a Pipeline consumes can be higher than the number of Snaps in the Pipeline.
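The producer-blocking behavior of a full ring buffer can be illustrated with a bounded queue. This is an analogy in plain Python, not SnapLogic's implementation; the buffer size is shrunk to 4 so the backpressure is easy to see:

```python
import queue
import threading

BUFFER_SIZE = 4  # stands in for DOC_RING_BUFFER_SIZE (default 1024)
buf = queue.Queue(maxsize=BUFFER_SIZE)
consumed = []

def producer():
    # put() blocks once the buffer holds BUFFER_SIZE documents, just as a
    # Snap blocks writing the 1025th document to a full 1024-slot ring buffer.
    for doc in range(10):
        buf.put(doc)
    buf.put(None)  # sentinel: end of stream

def consumer():
    while (doc := buf.get()) is not None:
        consumed.append(doc)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(consumed)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

All ten documents flow through despite the small buffer; the producer simply pauses whenever the consumer falls behind, which is the same backpressure mechanism that slows an upstream Snap when a downstream branch is slow.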
Some Snaps, such as Pipeline Execute, Bulk Load Snaps, and Snaps performing input/output, can use a higher number of threads compared to other Snaps.

Sample Pipeline illustration for threads and buffers

The following example Pipeline (comprising two segments, Segment 1 and Segment 2) demonstrates how the usage and composition of Snaps within a Pipeline change the characteristics of how the Pipeline operates once it is executed. Six threads are initialized at Pipeline startup. There are a total of seven ring buffers: the Copy Snap has two output buffers, and all other Snaps have one output buffer each. The two segments run in parallel and are isolated (other than the fact that they run on the same node, sharing CPU, memory, and IO bandwidth). The first segment has two branches, and the performance of one branch can impact the other. For example, if the SOAP branch is slow, then the Copy Snap's buffer for the SOAP branch will fill up. At this point, the Copy Snap will stop processing documents until space is available in the SOAP branch's buffer. Placing an aggregating Snap like the Sort Snap in the slow branch changes the performance characteristics significantly, as the Snap must receive all upstream documents before processing and sending the documents to the downstream Snaps.

Memory Configuration thresholds

Property / Threshold: Maximum memory %
Where configured: Node properties tab of the Snaplex
Default value: 85 (%)
Comments: Threshold at which no more Pipelines will be assigned to a node.

Property / Threshold: Pipeline termination threshold
Where configured: Internal (can be configured by setting the feature flag com.snaplogic.cc.snap.common.SnapThreadStatsPoller.MEMORY_HIGH_WATERMARK_PERCENT at the Org level)
Default value: 95 (%)
Comments: Threshold at which the active Pipeline management feature kicks in and terminates Pipelines when node memory consumption exceeds the threshold. Ideal range: 75-99.

Property / Threshold: Pipeline restart delay interval
Where configured: Internal (can be configured by setting the feature flag com.snaplogic.cc.snap.common.SnapThreadStatsPoller.PIPELINE_RESTART_DELAY_SECS at the Org level)
Default value: 30 (seconds)
Comments: One Pipeline is terminated every 30 seconds until the node memory goes below the threshold (i.e. below 95%).

Table 2.0 Snaplex node memory configurations

The above thresholds can be optimized to minimize Pipeline terminations due to Out-of-Memory exceptions. Note that the memory thresholds are based on the physical memory on the node, not the virtual / swap memory. Additional reference: Optimizations for Swap Memory

Hypothetical scenario: add 16 GB of swap memory to a Snaplex node with 8 GB of physical memory.

Property: Swap space on the server
Comments: Add 16 GB of swap / virtual memory to the node.

Property: Total Memory
Comments: Total memory is now 24 GB (8 GB physical plus 16 GB virtual).

Property: Maximum Heap Size
Comments: Set to 90% (of 24 GB) = 22 GB (rounded).

Property: Maximum Memory
Comments: Set to 31% (of 22 GB) = 7 GB (rounded).

Table 3.0 Snaplex node memory configurations

The intent of the above calculation is to ensure that the JCC utilizes 7 GB of the available 8 GB of memory for normal workloads. Beyond that, the load balancer can queue up additional Pipelines or send them to other nodes for processing. If the running Pipelines collectively start using more than 7 GB of memory, then the JCC can utilize up to 22 GB of total heap memory by using the OS swap space per the above configuration.

Use the default configurations for normal workloads, and use the swap-enabled configuration for dynamic workloads. When your workload exceeds the available physical memory and the swap is utilized, the JCC can become slower due to the additional IO overhead caused by swapping. Hence, configure a higher timeout for jcc.status_timeout_seconds and jcc.jcc_poll_timeout_seconds for the JCC health checks.
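The sizing arithmetic in the hypothetical scenario above can be sanity-checked in a few lines (percentages and rounding taken from Table 3.0):

```python
# Values from the hypothetical scenario: 8 GB physical + 16 GB swap.
physical_gb = 8
swap_gb = 16

total_gb = physical_gb + swap_gb           # 8 + 16 = 24 GB
max_heap_gb = round(0.90 * total_gb)       # 90% of 24 GB -> 22 GB (rounded)
max_memory_gb = round(0.31 * max_heap_gb)  # 31% of 22 GB -> 7 GB (rounded)

print(total_gb, max_heap_gb, max_memory_gb)  # 24 22 7
```

The result matches the table: the JCC targets 7 GB of the 8 GB of physical memory for normal workloads, with the 22 GB heap as headroom that spills into swap.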
We recommend limiting the maximum swap used by the JCC to 16 GB. A larger swap configuration causes performance degradation during JRE garbage collection operations.

Modularization

Modularization can be implemented in SnapLogic Pipelines by making use of the Pipeline Execute Snap. This approach enables you to:

Structure complex Pipelines into smaller segments through child Pipelines.
Initiate parallel data processing using the Pooling option.
Reuse child Pipelines.
Orchestrate data processing across nodes, within the Snaplex or across Snaplexes.
Distribute global values through Pipeline parameters across a set of child Pipeline Snaps.

Modularization best practices:

Modularize by business or technical functions.
Modularize based on functionality and avoid deep nesting or nesting without a purpose.
Modularize to simplify overly complex Pipelines and reduce in-page references.
Use the Pipeline Execute Snap over other Snaps such as Task Execute, ForEach, Auto-router (i.e. a Router Snap with no routes defined with expressions), or Nested Pipelines.

Pipeline Reuse with Pipeline Execute

Detailed documentation with examples can be found in the SnapLogic documentation for Pipeline Execute.

Use Pipeline Execute when: the child Pipeline is CPU- or memory-heavy and parallel processing can help increase throughput.
Avoid when: the child Pipeline is lightweight and the distribution overhead can be higher than the benefit.

Additional recommendations and best practices for the Pipeline Execute Snap:

Use Reuse mode to reduce child runtime creation overhead. Reuse mode allows each child Pipeline instance to process multiple input documents. Note that the child Pipeline must be a streaming Pipeline for Reuse mode.
Use the batching (Batch size) option to batch data (avoid grouping records in the parent).
Use the Pool size (parallelism) option to add concurrency.
If the document count is low, use the Pipeline Execute Snap for structuring Pipelines; otherwise, embed the child segment within the parent Pipeline instead of using Pipeline Execute.
Set the Pool size to > 1 to enable concurrent executions up to the specified pool size.
Set Batch size = N (where N > 1) to send N documents to the child Pipeline input view.
Use Execute On to specify the target Snaplex for the child Pipeline. Execute On can be set to one of the following values:

LOCAL_NODE: Runs the child Pipeline on the same node as the parent Pipeline. This is recommended when the child Pipeline is used for Pipeline structuring and reuse rather than workload distribution, and is the option used for most child Pipeline executions.

LOCAL_SNAPLEX: Runs the child Pipeline on one of the available nodes in the same Snaplex as the parent Pipeline. The least-utilized-node principle is applied to determine the node on which the child Pipeline will run. This has a dependency on the network, and should be used when workload distribution within the Snaplex is required.

SNAPLEX_WITH_PATH: Runs the child Pipeline on a user-specified Snaplex. This allows broad workload distribution, and should be used when the child Pipeline has to run on a different Snaplex due to endpoint connectivity restrictions or for effective workload distribution. This option also allows you to use Pipeline parameters to define relative paths for the Snaplex name.

Additional Pipeline design recommendations

This section lists some recommendations to improve Pipeline efficiency.

SLDB

Note: SLDB should not be used as a file source or destination in any SnapLogic Orgs (Prod / Non-Prod). Use your own Cloud storage provider for this purpose. You may encounter issues such as file corruption, Pipeline failures, inconsistent behavior, SLA violations, and platform latency if using SLDB instead of separate Cloud storage for the file store.
This applies to all File Reader / Writer Snaps and the SnapLogic API:

File Read operations from an SLDB file source.
File Write operations to SLDB as a destination.

Use your own Cloud storage instead of SLDB for the following (or any other) File Read / Write use cases:

Store last-run timestamps or other tracking information for processed documents.
Store log files.
Store other sensitive information.
Read files from the SLDB store.

Avoid using the Record Replay Snap in Production environments, as the recorded documents are stored in an SLDB path, making them visible to users with Read access.

Snaps

Enable Pagination for Snaps that support it (e.g. REST Snaps, HTTP Client, GraphQL, Marketo, etc.). There should also always be a pagination interval, to ensure that too many requests are not made in a short time.
Use the Group By N Snap where there is a requirement to limit request sizes, e.g. Marketo API requests.
The Group By Fields Snap creates a new group every time a record with a different group-field value is received. Place a Sort Snap before Group By Fields to avoid multiple sets of documents with the same group value.
Use the XML Parser Snap with a Splitter expression to reduce memory overhead when reading large XML files.
Use an Email Sender Snap with a Group By Snap to minimize the number of emails that get sent out.

Pipelines

Batch size (only available if the Reuse executions option is not enabled) controls the number of records passed into a child Pipeline. Setting this value to 1 passes a single record to each instance of the child Pipeline; avoid this approach when processing large volumes of documents.
Do not schedule a chain reaction. When possible, separate a large Pipeline into smaller pieces and schedule the individual Pipelines independently. Distribute the execution of resources across the timeline and avoid a chain reaction.
Integration API limits must not be exceeded across all integrations running at the same time.
Group By Snaps or Pipeline Execute can be used to achieve this.

Optimization recommendations for common scenarios

Scenario: Multiple Pipelines with a similar structure
Recommendation: Use parameterization with Pipeline Execute to reuse Pipelines.
Feature(s): Pipeline Execute, Pipeline parameters

Scenario: Bulk loading to a target data source
Recommendation: Use Bulk Load Snaps where available (e.g. Azure SQL - Bulk Load, Snowflake - Bulk Load).
Feature(s): Bulk Loading

Scenario: A Mapper Snap contains a large number of mappings where the source and target field names are consistent
Recommendation: Enable the "Pass through" setting on the Mapper.
Feature(s): Mapper - Pass Through

Scenario: Processing large data loads
Recommendation: Perform the target load operation within a child Pipeline using the Pipeline Execute Snap with "Execute On" set to LOCAL_SNAPLEX.
Feature(s): Pipeline Execute

Scenario: Performing complex transformations and/or JOIN/SORT operations across multiple tables
Recommendation: Perform transformations and operations within the SQL query.
Feature(s): SQL Query Snaps

Scenario: High-throughput message queue to database ingestion
Recommendation: Batch polling and ingestion of messages by specifying matching values for Max Poll Record (Consumer Snap) and Batch Size (database account setting), and performing database ingestion within a child Pipeline with Reuse enabled on the Pipeline Execute Snap.
Feature(s): Consumer Snaps, Database Load Snaps

Table 4.0 Optimization recommendations

Configuring Triggered and Ultra Tasks for Optimal Performance

Ultra Tasks

Definition and Characteristics

An Ultra Task is a type of task used to execute Ultra Pipelines. Ultra Tasks are well suited to scenarios that require processing large volumes of data with low latency, high throughput, and persistent execution. While the performance of an Ultra Pipeline largely depends on the response times of the external applications to which the Pipeline connects, there are a number of best practice recommendations that can be followed to ensure optimal performance and availability.
General Ultra Best Practices

Before building an Ultra Pipeline, consult the "Snap Support for Ultra Pipelines" documentation to check whether the desired Snaps are supported. For optimal Ultra performance, deploy a dedicated Snaplex to support Ultra workloads. There are two modes of Ultra Tasks, Headless Ultra and Low Latency API Ultra, with each mode being characterized by the design of the Pipeline invoked by the Ultra Task. The modes are described in more detail below.

Headless Ultra

A Headless Ultra Pipeline is an Ultra Pipeline that does not require a FeedMaster, and where the data source is a Listener- or Consumer-type construct, for example Kafka Consumer, File Poller, or SAP IDoc Listener (for a detailed list of supported Snaps, please click here). The Headless Ultra Pipeline executes continuously and polls the data source at the frequency configured within the Snap, passing documents from the source to downstream Snaps.

Use Cases

Processing real-time data streams such as message queues.
High-volume message or file processing patterns with concurrency.
Publish/Subscribe messaging patterns.

Best Practices

Deploy multiple instances of the Ultra Task for High Availability.
Decompose complex Pipelines into independent Pipelines using a Publish-Subscribe pattern.
Lower the dependency on the Control Plane by avoiding the use of expressions to declare queue names, account paths, etc.
Set the 'Maximum Failures' Ultra Task configuration threshold according to the desired tolerance for failure.
For long-running Ultra Pipelines, set the 'Max In-Flight' option to a higher value within the Ultra Task configuration.
When slow-performing endpoints are observed within the Pipeline, use the Pipeline Execute Snap with Reuse mode enabled and the Pool Size field set to > 1 to create concurrency across multiple requests to the endpoint.
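The benefit of a pool size greater than 1 against a slow endpoint can be sketched with a generic worker pool. This is plain Python illustrating the concurrency principle, not the Snap's internals; the 50 ms sleep is a hypothetical stand-in for a slow endpoint call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_slow_endpoint(doc):
    # Hypothetical stand-in for a request to a slow external endpoint.
    time.sleep(0.05)  # ~50 ms per call
    return {"doc": doc, "status": "ok"}

docs = list(range(8))

def run_with_pool(pool_size):
    """Process all documents with up to pool_size in-flight requests."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(call_slow_endpoint, docs))
    return results, time.perf_counter() - start

serial, serial_time = run_with_pool(1)      # one at a time: ~8 x 50 ms
parallel, parallel_time = run_with_pool(4)  # four in flight: ~2 x 50 ms rounds... far less wall time
print(f"pool of 1: {serial_time:.2f}s, pool of 4: {parallel_time:.2f}s")
```

Because each call mostly waits on the endpoint, four concurrent workers cut the wall-clock time roughly fourfold, which is the same effect the Pool Size setting targets when the bottleneck is endpoint latency rather than CPU.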
Additional reference: Ultra Tasks

Low Latency API Ultra

Low Latency API Ultra is a high-performance API execution mode designed for real-time, low-latency data integration and processing. The Pipeline invoked by the Ultra Task is characterized by having an open input view on the first Snap in the Pipeline (typically an HTTP Router or Mapper Snap). Requests made to the API are brokered through a FeedMaster node, guaranteeing at-least-once message delivery.

Use Cases

High-frequency, high-throughput request-response use cases.
Requirements for sub-second response times.

Best Practices

Deploy multiple FeedMasters for High Availability.
Deploy multiple instances of the Ultra Task for High Availability, running within the same Snaplex.
Leverage the 'Alias' setting within the Ultra Task configuration to support multi-Snaplex High Availability.
To support unpredictable, high-volume API workloads, leverage the 'Autoscale based on Feedmaster queue' instance setting in the Ultra Task configuration.
When slow-performing endpoints are observed within the Pipeline, use the Pipeline Execute Snap with Reuse mode enabled and the Pool Size field set to > 1 to create concurrency across multiple requests to the endpoint.
Use the HTTP Router Snap to handle supported and unsupported HTTP methods implemented by the Pipeline.
Handle errors that may occur during the execution of the Pipeline and return the appropriate HTTP status code within the API response. This can be done using the Mapper, JSON Formatter, or XML Formatter Snap.
Reference request query parameters using the $query object.
Set the 'Maximum Failures' Ultra Task configuration setting according to the desired tolerance for failure.
For long-running Ultra Pipelines, set the 'Max In-Flight' setting to a higher value within the Ultra Task configuration.
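The recommendation to return appropriate HTTP status codes can be sketched as a simple outcome-to-status mapping. This is a generic Python illustration; the error categories and field names are assumptions, not SnapLogic-defined values:

```python
def to_http_response(result):
    """Map a pipeline outcome document to an (HTTP status, body) pair.

    The 'error_type' values here are illustrative assumptions.
    """
    if result.get("error_type") is None:
        return 200, {"data": result.get("data")}
    if result["error_type"] == "validation":
        return 400, {"error": result.get("message", "Bad request")}
    if result["error_type"] == "not_found":
        return 404, {"error": result.get("message", "Not found")}
    # Anything unrecognized is reported as a server-side failure.
    return 500, {"error": result.get("message", "Internal error")}

status, body = to_http_response({"error_type": "validation", "message": "missing field"})
print(status, body)  # 400 {'error': 'missing field'}
```

In a Pipeline, the equivalent logic would live in the error-handling branch (e.g. a Mapper setting the status and body before the formatter Snap), so that callers receive 4xx for client errors and 5xx for pipeline failures rather than a generic response.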
Triggered Tasks

Definition and Characteristics

Triggered Tasks offer a method of invoking a Pipeline through an API endpoint when the consumption pattern of the API is infrequent and/or does not require low-latency response times.

Use Cases

When a batch operation is required within the Pipeline, e.g. Join, Group By, Sort, etc.
Integrations that need to be initiated on demand.
Non-real-time data ingestion.
File ingestion and processing.
Bulk data export APIs.

Best Practices

Avoid deep nesting of large child Pipelines.
Use the Snaplex URL to execute Triggered Tasks for reduced-latency response times.
Handle errors that may occur during the execution of the Pipeline and return the appropriate HTTP status code within the API response. This can be done using the Mapper, JSON Formatter, or XML Formatter Snap.
Use the HTTP Router Snap to handle supported and unsupported HTTP methods implemented by the Pipeline.
Parallelize large data loads using the Pipeline Execute Snap with Pool Size > 1.

SnapLogic REST API Design - Best Practices
This document provides a guide to best practices for REST API design within the SnapLogic Platform. The REST API design guidelines are a collection of API design patterns and principles that all API teams within an organisation should adhere to when developing APIs. The definition and implementation of API design guidelines are among the most influential drivers of an API strategy, fostering a consistent approach to the creation of an API platform across the enterprise.

Author and contributors: Roberto Oliva and Chris Ward