Sigma Framework for Operational Excellence

Review reference architecture documents, videos, and other assets

Recent Content

SIGMA 2.0 FRAMEWORK FOR OPERATIONAL EXCELLENCE
Sigma Framework for Operational Excellence Modern Digital Platforms such as SnapLogic empower enterprises to unlock the value held in existing systems and data, allowing new services and strategies to be achieved. To support these best practices, governance and a culture of innovation are needed. Enterprises are adopting or evolving the concept of a Centre of Excellence to foster and support Digital Transformation. This concept is far from new, but trends such as Agile, DevOps, API strategy, AI, and the rise of the Citizen Developer have also forced these Operating Models to change, or risk being considered outdated and relegated as not fit for purpose to support Digital Transformation. The Sigma Framework for Operational Excellence is a set of best practices distilled from the experience of SnapLogic architects — customers, partners, and employees. SnapLogic is dedicated to continuously increasing the value our technology delivers to customers over time. Meanwhile, SnapLogic customer architects require a formalized operational framework to guide them from an initial tactical project to true strategic value realization through adoption across multiple roles and departments. The Sigma Framework delivers a standardized, integrated, holistic, and opinionated set of cross-disciplinary best practices, ready for adoption by customers to support sophisticated enterprise-scale deployments. Different assets in this section are designed for the various phases of a customer journey, addressing different personas and stages of maturity. By adopting the Sigma Framework for Operational Excellence, users will be able to extract maximum value from their investment in SnapLogic — both the technology, and the time commitment to develop required skills and competence. Impact of Adopting SnapLogic Best Practices Enterprises that embrace SnapLogic best practices can accelerate their digital transformation journey. By adopting standardized governance, automation patterns, and integration approaches, organizations reduce technical debt and increase operational agility. This enables faster onboarding of new systems, reduced integration errors, and more efficient use of data across the enterprise. The result is not only improved return on investment but also stronger alignment between IT and business strategy, empowering teams to deliver innovation at scale. Scope of the Framework The Sigma 2.0 Framework for Operational Excellence is structured to reflect the way your enterprise works. We coordinate, collaborate, and consolidate expertise between lines of business, development teams, and your enterprise architecture group. Let us help you operationalize and scale data-driven, 360-degree innovation throughout your organization. The Sigma Framework for Operational Excellence encompasses several domains, including strategy, technology, process, and the culture of the users of the SnapLogic Platform. The aim of the Sigma Frameworks to facilitate the adoption of SnapLogic at scale across the customer organization to achieve the required positive business outcomes. API Strategy at the Heart of Digital Transformation An API-first approach is central to enabling modern digital strategies. APIs serve as the connective tissue between systems, applications, and partners, creating an ecosystem where data and capabilities can be shared seamlessly. By putting an API strategy at the core of their digital vision, enterprises can enable modular architectures that support agility, scalability, and reusability. Moreover, APIs play a foundational role in enabling AI adoption, as they provide structured and secure access to the vast datasets that fuel AI-driven insights and automation. With APIs as the backbone, enterprises can ensure their platforms are future-ready to support AI innovations and evolving customer expectations. Emergence of AI Best Practices As enterprises expand their use of AI, the need for clear best practices has become critical. AI adoption must balance innovation with responsibility, ensuring that models are accurate, ethical, and compliant with regulations. Best practices now emphasize transparency in AI decision-making, robust governance frameworks, and rigorous security controls to protect both enterprise and customer data. These guardrails not only reduce risks but also foster trust, creating an environment where AI can be scaled safely to unlock innovation. By embedding AI best practices alongside SnapLogic and API strategies, enterprises position themselves to innovate with confidence while safeguarding their most valuable assets. This will be a critical aspect of the Sigma framework. The Sigma Framework As a set of public best practices snapLogic, snapLogic is committed to accelerating the successful adoption of the snapLogic platform, go to https://community.snaplogic.com/category/architecture_center
GuyM
3 months ago Place Sigma Framework Library
83Views
0likes
0Comments
Automated Deployment (CICD) of SnapLogic assets with GitHub
Introduction This guide is a reference document for the deployment of SnapLogic assets to a GitHub repository. It also includes sample YAML code for a GitHub Actions workflow which can be used to automate the deployment of assets across orgs (Dev -> Stg / Stg -> Prod, etc.) This guide is targeted towards SnapLogic Environment Administrators (Org Administrators) and users who are responsible for the deployment of SnapLogic assets / Release management operations. Section B covers automated deployment with GitHub Actions, and Section A illustrates a manual deployment flow using the Manager interface. Author: Ram Bysani SnapLogic Enterprise Architecture team SnapLogic Git Integration Git Integration allows you to track, update, and manage versions of SnapLogic assets using the graphical interface or the public APIs. The following asset types can be tracked in a GitHub repository: Accounts Files Pipelines Tasks Git model A) Asset deployment across environments - an example The example in this document illustrates a sample deployment of SnapLogic assets from the Dev environment (org) to the Prod environment. A similar methodology can be adopted to deploy assets from Dev -> Stg -> Prod environments. The environments should be configured for Git integration with GitHub. Please refer to the steps in the documentation. Git Integration Git operations The assets in this example are tracked at a project space level, i.e. one Project Space in Dev is associated with a single branch in the GitHub repository. A single GitHub repository is used to maintain the branches for Dev, Stg, Prod, etc. Repository branches can also be deleted and re-created for specific deployment needs. New / Modified Assets in the Dev Org Project Space: Dev_Integration_Space with the below project folders having SnapLogic assets. Integration_Project_1, Integration_Project_2, share Prod Environment We have already defined an empty project space named Prod_GH_Integration in the Prod org. This step can also be done by using the SnapLogic public API. Project APIs. Define branches in the GitHub repository Create individual branches in the GitHub repository for the Dev and Prod project space assets. You can choose the main branch as the default branch while creating Dev_GH_Space. Choose the Dev_GH_Space branch as the source when creating the Prod_GH_Space branch. Each branch in the GitHub repository corresponds to a Project Space in SnapLogic. e.g.: Dev_GH_Space, Prod_GH_Space Commit Dev assets to GitHub Connect to the Dev (source) environment in the SnapLogic Manager interface, and navigate to the project space named Dev_GH_Integration_Space. Right click and select Git Repository Checkout. Choose the Git repository branch Dev_GH_Space. You can see that the Git status has changed to Tracked for all assets under the child projects. Note that some assets appear with status Untracked as these were already existing in the main branch. These assets would not be committed to the Git repository. Notice the tracking message with the branch name and commit id next to the project space name: Tracked with Git repository: byaniram/RB_Snaprepo/heads/Dev_GH_Space, commit: 9a22ac8 Connect to the GitHub repository and verify the commit status for the branch Dev_GH_Space. Create Pull Request in GitHub At this step, you would need to create a Pull Request in GitHub. Choose Prod_GH_Space as the base branch, and Dev_GH_Space as the compare branch, and create the Pull request. This action would merge the assets contained in the Dev_GH_Space branch into the Prod_GH_Space branch. Connect to the GitHub repository and verify the commit status for the branch Prod_GH_Space. The assets have now been committed to the Prod environment and are tracked in the GitHub repository under the branch - Prod_GH_Space. It is also possible to merge and pull from additional branch(es) into a single Prod_GH_Space if you have a need for it. You would need to repeat the Pull / Merge process as above with the base branch being Prod_GH_Space, and the compare branch being one of Dev_GH_Space, Dev_GH_Space_1, or Dev_GH_Space_2. Pulling / Committing assets into the Prod Org Connect to the Prod (target) environment in the SnapLogic Manager interface, and navigate to the project space named Prod_GH_Integration_Space. Right click and select Git Repository Checkout. Choose the Git repository branch Prod_GH_Space. Choose Git Pull to pull the assets into the Project space. The assets from the Dev_Integration_Space project space of the Dev environment are deployed to the Prod_Integration_Space project space of the Prod environment. Notice the tracking message with the branch name and commit id next to the project space name: Tracked with Git repository: byaniram/RB_Snaprepo/heads/Prod_GH_Space, commit: ce0c368 For subsequent deployments of changed assets, you would first do a Commit to Git for the project space in the SnapLogic Dev environment, followed by the above steps. Changed assets would be visible with a Git status of ‘Tracked, Modified locally’ in the SnapLogic Manager. B) Deployment Automation using a GitHub Actions Workflow Actions workflow YAML sample A GitHub Actions workflow can be used to automate the deployment of assets across SnapLogic environments (such as Dev to Stg, Stg to Prod, etc.). A workflow is a configurable automated process made up of one or more jobs. You must create a YAML file to define your workflow configuration. Here’s a complete YAML file for the Dev -> Prod deployment example described in Section A above. The complete YAML file is attached for your reference. Please create a new Workflow from the Actions tab, and paste the contents of the file in the workflow editor and commit changes. # Actions workflow for automated deployment of SnapLogic assets name: SnapLogic CICD Sample on: push: branches: - Dev_GH_Space # Uncomment the below line if you need to execute the workflow manually. # workflow_dispatch: jobs: pull_merge_branches: runs-on: ubuntu-latest steps: - name: Checkout repository uses: actions/checkout@v4 - name: Merge Dev to Prod uses: devmasx/merge-branch@master with: type: now from_branch: Dev_GH_Space target_branch: Prod_GH_Space github_token: ${{ secrets.ACTIONS_TOKEN }} - name: Checkout project assets to Prod project space run: | curl -s -X POST \${{vars.SNAP_URL}}/api/1/rest/public/project/pull/${{vars.SNAP_ORG}}/${{vars.PROJECT_SPACE}} \ -H "Content-Type:application/json" -H "Authorization:Basic ${{secrets.BASE64_TOKEN}}" \ -d '{"use_theirs":"true"}' Please refer to the GitHub documentation for information related to Workflow usage and syntax: GitHub Workflows Workflow syntax The following table provides clarification on certain aspects of the sample workflow for better understanding. Section Comments runs-on: ubuntu-latest runs-on defines the runner (type of machine) to use to run the job. ubuntu-latest specifies a GitHub hosted runner image. GitHub hosted runners uses: actions/checkout@v4 checkout is an action which is available in the GitHub marketplace. This action checks out the repository for use. v4 is the version number of the action. https://github.com/marketplace/actions/checkout uses: devmasx/merge-branch@master merge-branch is an action from the GitHub marketplace. This action runs a Git merge operation. https://github.com/marketplace/actions/merge-branch It also requires you to define a personal access token (classic) under Developer Settings -> Personal access tokens. Select both the repo and workflow checkboxes. curl -s -X POST \ ${{vars.SNAP_URL}}/api/1/rest/public/project/pull/${{vars.SNAP_ORG}}/${{vars.PROJECT_SPACE}} \ -H "Content-Type:application/json" -H "Authorization:Basic ${{secrets.BASE64_TOKEN}}" \ -d '{"use_theirs":"true"}' This is a CURL command that executes the SnapLogic public API to pull the latest project files from Git. See Pull the latest project files from Git. The referenced variables are defined on the GitHub repository under Settings -> Secrets and variables -> Actions. The vars context is used to reference those variables. (e.g. SNAP_ORG, PROJECT_SPACE) You can also define encrypted Secrets for sensitive data and reference them using the secrets context as in the example. (e.g. BASE64_TOKEN has the base64 encoded string for username and password). Workflow Variables Table 1.0 - Workflow Actions Workflow execution The above Actions workflow will be automatically executed whenever there is a “Push” / “Git Commit” operation to the Dev_GH_Space branch. i.e. whenever a commit is done from the Dev SnapLogic environment project space. The workflow will execute the pull-merge operation to the Prod_GH_Space branch, and pull the latest project assets into the Prod SnapLogic environment. The YAML file must be created under the .github/workflows folder of the Dev_GH_Space branch in the GitHub repository. The workflow run status will be visible under the Actions tab. Note: If you wish to manually execute the pull-merge post code review, then you can uncomment the two lines in the script to enable workflow_dispatch, and execute the Actions workflow manually from the Actions tab on GitHub. # Uncomment the below line if you need to execute the workflow manually. # workflow_dispatch: You can edit and modify the YAML file as per your requirements. Subsequent commits and deployments from Dev->Prod can be automated similarly. Action Comments Developer commits new code or updates assets in the Dev org to the GitHub repository SnapLogic Dev org Manager Interface Asset -> Add to repository. Ensure status shows Tracked Project Space -> Commit to Git Create and merge Pull Request Create a new Pull Request on GitHub, and merge the newly committed assets by choosing the Prod branch as the base, and the Dev branch as the compare branch. Pull the updated assets into the Prod org SnapLogic Prod org Manager Interface Project Space -> Git Pull Table 2.0 - Steps for subsequent / future asset deployment Deployment flow (Dev->Test->Prod) Note: Future versions of this document will cover additional deployment scenarios. Please post your comments on the article.
ramaonline
4 months ago Place Sigma Framework Library
Actions workflow
CICD
collaborative development
Continuous Integration
Deployment
9.3KViews
3likes
8Comments
Snaplex Capacity Tuning Guide
Introduction This document serves as a comprehensive best practice guide for developing efficient and robust Pipelines within the SnapLogic Platform. It offers guidelines that aim to optimize performance, enhance maintainability, reusability, and provide a basis for understanding common integration scenarios and how best to approach them. The best practices encompass various aspects of Pipeline design, including Pipeline behavior, performance optimization and governance guidelines. By adhering to these best practices, SnapLogic developers can create high-quality Pipelines that yield optimal results while promoting maintainability and reuse. The content within this document is intended for the SnapLogic Developer Community or an Architect, in addition to any individuals who may have an influence on the design, development or deployment of Pipelines within the SnapLogic platform. Authors: SnapLogic Enterprise Architecture team Snaplex Planning Snaplexes are a grouping of co-located nodes which are treated as a single logical entity for the purpose of Pipeline execution. The SnapLogic Control plane automatically performs load balancing of Pipeline workload within a Snaplex. Nodes in Snaplexes should be homogeneous, with the same CPU/memory/disk sizing and network configurations per node type (i.e. JCC / FeedMaster). The JCC and Feedmaster nodes in a Snaplex can be of different sizes. Examples of recommended configurations: Snaplex configurations JCC node count - 4 JCC node size for each node - Large Feedmaster node count - 2 Feedmaster node size for each node - Medium JCC node count - 4 JCC node size for each node - X-Large Feedmaster node count - 2 Feedmaster node size for each node - Large Object Definition Node A Node is a JVM (Java Virtual Machine) process which is installed on a server such as Windows or Linux. JCC Node The JCC node is responsible for: Preparation, validation, and execution of Pipelines. Send heartbeat to the Snaplogic Control plane indicating the health of the node. FeedMaster Node The FeedMaster node acts as an interface between the JCC nodes and the client. The main functions of a FeedMaster node are: Manage message queues. Send heartbeat to the SnapLogic Control plane indicating the health of the node. When setting up Snaplexes, it is recommended to plan out the number of Snaplexes to configure along with the usage criteria to achieve isolation across workloads. Snaplexes can be organized in various ways such as: Pipeline Workload - Organize Snaplexes by workload type: Batch, Low latency, and On-demand. Business Unit - Organize Snaplexes by business units. Geographical location - Organize Snaplexes by data center or geographic location. The recommendation is to use a combination of the above to optimize resource usage and achieve workload isolation. Snaplex Network Requirements Snaplexes should have the below network characteristics: Within a Snaplex: Less than 10 ms round trip latency between Snaplex nodes. Greater than 40 MB/sec throughput between Snaplex nodes. Snaplex to Control Plane: Less than 50 ms round trip latency to the SnapLogic Control plane. Greater than 20 MB/sec throughput to the SnapLogic Control plane. Pipeline Execute Pipeline execution using the Pipeline Execute Snap, nodes communicate with each other using HTTPS on port 8081. There is some resiliency to network failures and HTTPS requests are retried in the case of failures. Even though requests are retried, high network latency and dropped connections can result in Pipeline execution failures. Regular Pipeline executions run within a node, requiring no communication with other nodes in the Snaplex. When a Pipeline Execute Snap is used to run child Pipelines, there are three options: Option Comments LOCAL_NODE This option is recommended when the child Pipeline is being used for Pipeline structuring and reuse rather than Pipeline workload distribution. Use this option for most regular child Pipeline executions. LOCAL_SNAPLEX The network communication is optimized for streaming data processing since the child Pipeline is on the local Snaplex. Use this option only when workload distribution within the Snaplex is required. SNAPLEX_WITH_PATH This has high dependency on the network. The network communication is optimized for batch data processing since the child Pipeline is on a remote Snaplex. Use this option only when the child Pipeline has to run on a different Snaplex, either because of endpoint connectivity restrictions or for workload distribution. Ultra Pipelines The JCC nodes communicate with the FeedMaster nodes over TCP with SSL on port 8084 when executing Ultra Pipelines. The communication between nodes is based on a message queue. This communication is not resilient to network failure, so a reliable network is required between the Snaplex nodes for Ultra Pipeline processing. In case of any network failures, the currently processing Ultra requests will be retried or in some instances fail with errors. If there is a communication failure between the JCC and Feedmaster nodes, then the request will be retried for up to five times. This is controlled by the ultra.max_redelivery_count Snaplex configuration. There is an overall 15-minute timeout for an Ultra request to the Feedmaster that is configurable at the request level using the X-SL-RequestTimeout HTTP request header or at the Snaplex level by using the llfeed.request_timeout config setting. Note that both ultra.max_redelivery_count and llfeed.request_timeout are configured under Node Properties -> Global Properties for GroundPlexes. You can submit a support request to configure these properties for your Cloudplexes. Pipeline Load Balancing The Control plane performs load balancing for Pipeline execution requests on a Snaplex. The following table lists the configurations that are involved: Property / Threshold Where configured Default value Comments Maximum Slots Node properties tab of the Snaplex 4000 One slot = One Snap = One active thread on the node A percentage of slots (configurable with the Reserved slot % property) are reserved for interactive Pipeline executions and validations thru the Designer tool. Pipelines will be queued if the threshold is reached. Some Snaps such as Pipeline Execute, Bulk loaders, and Snaps performing input/output, can use a higher number of threads compared to other Snaps. Maximum memory % Node properties tab of the Snaplex 85 (%) Threshold at which no more Pipelines will be assigned to a node Snaplex node resources (CPU, FDs, Memory) Node server configurations Configurable If the Control plane detects that there are not enough resources available on the Snaplex, then the Pipeline execution requests will be queued up on the control plane, and resume when resources are available. The Control plane dispatches the Pipeline to the node which has the most available capacity in terms of CPU/memory and file descriptors. For child Pipeline executions using the Pipeline Execute Snap, there is a preference given for running the child on the local node to avoid the network transfer penalty. Table 1.0 Configurations for Pipeline load balancing Snaplex Resource Management Capacity Planning This section provides some guidelines for Snaplex capacity planning and tuning. Configuration / Use-case Comments Workload isolation Isolate workloads across Snaplexes based on workload type, geographic location, and business unit. Node sizing Size the node (CPU, RAM, disk space) in a Snaplex based on Pipeline workload type. Batch data processing needs larger nodes while Streaming/API processing can use smaller nodes. Maximum Slots One slot = One Snap = One active thread on the node A percentage of slots (configurable with the Reserved slot % property) are reserved for interactive Pipeline executions and validations thru the Designer tool. Pipelines will be queued if the threshold is reached. Some Snaps such as Pipeline Execute, Bulk loaders, and Snaps performing input/output, can use a higher number of threads compared to other Snaps. The general recommendation is to configure this property based on the node memory configuration. Example: 8 GB - 2000 Slots 16 GB - 4000 Slots API Workloads For API workloads, the rule of thumb is to have 100 active ultra API calls per 8 GB of RAM, or 20 active triggered API calls per 8 GB of RAM. So a 16 GB node can have 200 active ultra API calls or 40 active triggered API calls. Node sizing The number of nodes in a Snaplex can be estimated based on the count of batch and streaming Pipelines. The number of FeedMaster nodes can be half of the JCC node count, with a minimum of two recommended for high availability. For active Pipeline count estimates, error Pipelines can be excluded from the count since they do not consume resources under the normal workload. Table 1.1 Configurations for Snaplex capacity planning Capacity Tuning Below are some best practices for Snaplex capacity tuning: Configuration / Use-case Comments Slot counts The Maximum slot count can be tuned based on the alerts and dashboard events. It is not required to restart the nodes for this configuration to take effect. Queued Pipelines - Increase slot count by 25% Busy nodes - Reduce slot count by 25% The slot count should not be set to more than 50% above the recommended value for the node configuration. e.g. The recommended slot count on a node with 16 GB RAM is 4000. Setting it to higher than 6000 is not advisable.If you observe high CPU / memory consumption on the node despite lowering the slot count by 25%, then consider allocating additional resources to the Snaplex nodes. Workloads Batch Workloads: Expand the node memory up to 64 GB, and deploy additional nodes for increased capacity. API Workloads: Deploy additional nodes instead of expanding the memory on the current node. Active Pipelines As a general rule, it's suggested to maintain fewer than 500 active Pipeline instances on a single node. Exceeding this threshold can lead to communication bottlenecks with the Control plane. If the number of active Pipeline instances exceeds 500, then the advisable course of action is to consider the addition of more nodes. CPU CPU consumption can be optimized by setting the Pool size and Batch size options on Pipeline Execute Snaps. Memory See Table 3.0 below Additional Reference: Optimizations for Swap Memory Table 2.0 Configurations for Snaplex capacity tuning Memory Configuration thresholds Property / Threshold Where configured Default value Comments Maximum memory % Node properties tab of the Snaplex 85 (%) Threshold at which no more Pipelines will be assigned to a node Pipeline termination threshold Internal (Can be configured by setting the feature flag at the org level com.snaplogic.cc.snap.common.SnapThreadStatsPoller. MEMORY_HIGH_WATERMARK_PERCENT) 95 (%) Threshold at which the active Pipeline management feature kicks in and terminates pipelines when the node memory consumption exceeds the threshold. Ideal range: 75-99 Pipeline restart delay interval Internal (Can be configured by setting the feature flag at the org level com.snaplogic.cc.snap.common.SnapThreadStatsPoller. PIPELINE_RESTART_DELAY_SECS) 30 (seconds) One Pipeline is terminated every 30 seconds until the node memory goes below the threshold (i.e. goes below 95%) Table 3.0 Snaplex node memory configurations The above thresholds can be optimized to minimize Pipeline terminations due Out-of-Memory exceptions. Note that the memory thresholds are based on the Physical memory on the node, and not the Virtual / Swap memory. Snaplex Alerts SnapLogic supports alerts and notifications through email and Slack channels. These can be configured in the Manager interface under Settings. The recommended alerts are listed in the table below. Alert type Comments Snaplex status alerts Status alerts can be created at the org level or the Snaplex level (in the Snaplex properties). These allow notifications to be sent when the Snaplex node is unable to communicate with the SnapLogic control plane or there are other issues detected with the Snaplex. Snaplex Resource usage alerts Set up alerts for these event types: Snaplex congestion Snaplex load average Snaplex node memory usage Snaplex node disk usage Table 4.0 Recommended Snaplex Alerts Reference: Alerts Slack notifications
ramaonline
4 months ago Place Sigma Framework Library
Admin and Operation
Sigma
11KViews
3likes
5Comments
Platform Administration Reference guide v3
Introduction This document is a reference manual for common administrative and management tasks on the SnapLogic platform. It has been revised to include the new Admin Manager and Monitor functionality, which replace the Classic Manager and Dashboard interfaces respectively. This document is for SnapLogic Environment Administrators (Org Administrators) and users involved in supporting or managing the platform components. Author: Ram Bysani SnapLogic Enterprise Architecture team Environment Administrator (known as Org Admin in the Classic Manager) permissions There are two reserved groups in SnapLogic: admins: Users in this group have full access to all projects in the Org. members: Users in this group have access to projects that they create, or to which they are granted access. Users are automatically added to this group when you create them, and they must be a part of the members group to have any privileges within that Org. There are two user roles: Environment admins: Org users who can manage the Org. Environment admins are part of the admins group, and this role is named “Org Admin” in the classic Manager. Basic user: All non-admin users. Within an Org, basic users can create projects and work with assets in the Project spaces to which they have been granted permission. To gain Org administrator privileges, a Basic user can be added to the admins group. The below table lists the various tasks under the different categories that an Environment admin user can perform: Task Comments USER MANAGEMENT Create and delete users. Update user profiles. Create and delete groups. Add users to a group. Configure password expiration policies. Enable users’ access to applications (AutoSync, IIP) When a user is removed from an Org, the administrator that removes the user becomes the owner of that user's assets. Reference: User Management MANAGER Create and manage Project Spaces. Update permissions (R, W, X) on an individual Project space and projects. Delete a Project space. Restore Project spaces, projects, and assets from the Recycle bin. Permanently delete Project spaces, projects, and assets from the Recycle bin. Configure Git integration and integration with tools such as Azure Repos, GitLab, and GHES. View Account Statistics, and generate reports for accounts, projects, and pipelines within the project that use an account. Upgrade/downgrade Snap Pack versions. ALERTS and NOTIFICATIONS Set up alerts and notifications. Set up Slack channels and recipients for notifications. Reference: Alerts SNAPLEX and ORG Create Groundplexes. Manage Snaplex versions. Update Snaplex settings. Update or revert a Snaplex version. APIM Publish, unpublish, and deprecate APIs on the Developer portal. Configure the Developer portal. Approve API subscriptions and manage/approve user accounts. Reference: API Management AutoSync Configure AutoSync user permissions. Configure connections for data pipeline endpoints. Create user groups to share connection configuration. View information on all data pipelines in the Org. Reference: AutoSync Administration Table 1.0 Org Admin Tasks SnapLogic Monitoring Dashboards The enhanced Monitor interface can be launched from the Apps (Waffle) menu located on the top right corner of the page. The enhanced Monitor Interface enables you to observe integration executions, activities, events, and infrastructure health in your SnapLogic environment. The Monitor pages are categorized under three main groups: Analyze Observe Review Reference: Move_from_Dashboard_to_Monitor The following table lists some common administrative and monitoring tasks for which the Monitor interface can be used. Task Monitor App page Integration Catalog to fetch and display metadata for all integrations in the environment. Monitor -> Analyze -> Integration Catalog Reference: Integration Catalog View of the environment over a time period. Monitor -> Analyze -> Insights Reference: Insights View pipeline and task executions along with statistics, logs, and other details. Stop executions. Download execution details. Monitor -> Analyze -> Execution Reference: Execution Monitor and manage Snaplex services and nodes with graph views for a time period. Monitor -> Analyze -> Infrastructure Reference: Infrastructure View and download metrics for Snaplex nodes for a time period. Monitor -> Analyze -> Metrics Monitor -> Observe -> API Metrics Reference: Metrics, API-Metrics Review Alert history and Activity logs. Monitor -> Review Reference: Alert History, Activity Log Troubleshooting Snaplex / Node / Pipeline issues. Reference: Troubleshooting Table 2.0 Monitor App features Metrics for monitoring CPU Consumption CPU consumption can be high (and exceed 90% at times) when pipelines are executing. A high CPU consumption percentage when no pipelines are executing could indicate a high CPU usage by other processes on the Snaplex node. Review CPU Metrics under the Monitor -> Metrics, and Monitor -> Infrastructure tabs. Reference: CPU utilization metrics System load average (For Unix based systems) Load average is a measure of the number of processes that are either actively running on the CPU or waiting in line to be processed by the CPU. e.g. in a system with 4 virtual CPUs: A load average value of 4.0 means average full use of all CPUs without any idle time or queue. A load average value of >4.0 suggests that processes are waiting for CPU time. A load average value of <4.0 indicates underutilization. System load. Monitor -> Metrics tab. Heap Memory Heap memory is used by the SnapLogic application to dynamically allocate memory at runtime to perform memory intensive operations. The JVM can crash with an Out-of-Memory exception if the heap memory limit is reached. High heap memory usage can also impact other application functions such as pipeline execution, metrics collection, etc. The key heap metrics are listed in the table below: Metric Comments Heap Size Amount of heap memory reserved by the OS This value can grow or shrink depending on usage. Used heap Portion of heap memory in use by the application’s Java objects This value changes constantly with usage. Max heap size Upper heap memory limit This value is constant and does not change. It can be configured by setting the jcc.heap.max_size property in the global.properties file or as a node property. Heap memory. Monitor -> Metrics tab. Non-heap memory consumption The JVM reserves additional native memory that is not part of the heap memory. This memory area is called Metaspace, and is used to store class metadata. Metaspace can grow dynamically based on the application’s needs. Non-heap memory metrics are similar to heap memory metrics however there is no limit on the size of the non-heap memory. In a Snaplex, non-heap size tends to stay somewhat flat or grow slowly over longer periods of time. Non-heap size values larger than 1 GiB should be investigated with help from SnapLogic support. Note that all memory values are displayed in GiB (Gibibytes). Non-Heap memory. Monitor -> Analyze -> Metrics (Node) Swap memory Swap memory or swap space is a portion of disk used by the operating system to extend the virtual memory beyond the physical RAM. This allows multiple processes to share the computer’s memory by “swapping out” some of the RAM used by less active processes to the disk, making more RAM available for the more active processes. Swap space is entirely managed by the operating system, and not by individual processes such as the SnapLogic Snaplex. Note that swap space is not “extra” memory that can compensate for low heap memory. Refer to this document for information about auto, and custom heap settings. Reference: Custom heap setting. High swap utilization is an indicator of contention between processes, and may suggest a need for higher RAM. Additional Metrics Select the node from Monitor -> Analyze, and navigate to the Metrics tab. Review the following metrics. Active Pipelines Monitor the Average and Max active pipeline counts for specific time periods. Consider adding nodes for load balancing and platform stability if these counts are consistently high. Active Pipelines. Monitor -> Analyze -> Metrics (Node) Active Threads Active threads. Monitor -> Analyze -> Metrics (Node) Every Snap in an active pipeline consumes at least one thread. Some Snaps such as Pipeline Execute, Bulk loaders, and Snaps performing input/output can use a higher number of threads compared to other Snaps. Refer to this Sigma document on community.snaplogic.com: Snaplex Capacity Tuning Guide for additional configuration details. Disk Utilization It is important to monitor disk utilization as the lack of free disk space can lead to blocking threads, and can potentially impact essential Snaplex functions such as heartbeats to the Control Plane. Disk utilization. Monitor -> Analyze -> Metrics (Node) Additional Reference: Analyze Metrics. Download data in csv format for the individual Metrics graphs. Enabling Notifications for Snaplex node events Event Notifications can be created on the Manager (Currently in the Classic Manager) under Settings -> Notifications. The notification rule can be set up to send an alert about a tracked event to multiple email addresses. The alerts can also be viewed on the Manager under the Alerts tab. Reference: Notification Events Snaplex Node notifications Telemetry Integration with third-party observability tools using OpenTelemetry (OTEL) The SnapLogic platform uses OpenTelemetry (OTEL) to support telemetry data integration with third-party observability tools. Please contact your CSM to enable the Open Telemetry feature. Reference: Open Telemetry Integration Node diagnostics details The Node diagnostics table includes diagnostic data that can be useful for troubleshooting. For configurable settings, the table displays the Maximum, Minimum, Recommended, and Current values in GiB (Gibibytes) where applicable. The values in red indicate settings outside of the recommended range. Navigate to the Monitor -> infrastructure -> (Node) -> Additional Details tab. Example: Node diagnostics table Identifying pipelines that contribute to a node crash / termination Monitor Page Comments Monitor -> Activity logs Filter by category = Snaplex. Make note of the node crash events for a specific time period Event name text: Node crash event is reported Reference: Activity Logs Monitor -> Execution Select the execution window in the Calendar. Filter executions by setting these Filter conditions: Status: Failed Node name: <Enter node name from the crash event> Reference: Execution Sort on the Documents column to identify the pipeline executions processing the most number of documents. Click anywhere on the row to view the execution statistics. You can also view the active pipelines for that time period from the Monitor -> Metrics -> Active pipelines view. Table 3.0 Pipeline execution review Additional configurations to mitigate pipeline terminations The below thresholds can be optimized to minimize pipeline terminations due to Out-of-Memory exceptions. Note that the memory thresholds are based on the physical memory on the node, and not the Virtual / Swap memory. Maximum Memory % Pipeline termination threshold Pipeline restart delay interval Refer to the table Table 3.0 Snaplex node memory configurations in this Sigma document for additional details and recommended values: Snaplex Capacity Tuning Pipeline Quality Check API The Linter public API for pipeline quality provides additional rules to provide complete reports for all standard checks, including message levels (Critical / Warning / Info), with actionable message descriptions for pipeline quality. Reference: Pipeline Quality Check By applying the quality checks, it is possible to optimize pipelines, and improve maintainability. You can also use SnapGPT to analyze pipelines, identify issues, and suggest best practices to improve your pipelines. (SnapGPT_Analyze_Pipelines) Other third party profiling tools Third party profiling tools such as VisualVM can be used to monitor local memory, CPU, and other metrics. This document will be updated in a later version to include the VisualVM configurations for the SnapLogic application running on a Groundplex. Java Component Container (jcc) command line utility (for Groundplexes) The jcc script is a command-line tool that provides a set of commands to manage the Snaplex nodes. This utility is installed in the /opt/snaplogic/bin directory of the Groundplex node. The below table lists the commonly used arguments for the jcc script (jcc.sh on Linux and jcc.bat on Windows). Note that the command would list other arguments (for example, try-restart). However, those are mainly included for backward compatibility and not frequently used. $SNAPLOGIC refers to the /opt/snaplogic directory on Linux or the <Windows drive>:\opt\snaplogic directory on Windows servers. Run these commands as the root user on Linux and as an Administrator on Windows. Example: sudo /opt/snaplogic/bin/jcc.sh restart or c:\snaplogic\bin\jcc.bat restart Argument Description Comments status Returns the Snaplex status. The response string would indicate if the Snaplex Java process is running. start Starts the Snaplex process on the node. stop Stops the Snaplex process on the node. restart Stops and restarts the Snaplex process on the node. Restarts both the monitor and the Snaplex processes. diagnostic Generates the diagnostic report for the Snaplex node. The HTML output file is generated in the $SNAPLOGIC/run/log directory. Resolve any warnings from the report to ensure normal operations. clearcache Clears the cache files from the node. This command must be executed when the JCC is stopped. addDataKey Generates a new key pair and appends it to the keystore in the /etc/snaplogic folder with the specified alias. This command is used to rotate the private keys for Enhanced Account Encryption. Doc reference: Enhanced Account Encryption The following options are available for a Groundplex on Windows server. install_service remove_service The jcc.bat install_service command installs the Snaplex as a Windows service. The jcc.bat remove_service command removes the installed Windows service. Run these commands as an Administrator user. Table 4.0 jcc script arguments Example of custom log configuration for a Snaplex node (Groundplex) Custom log file configuration is occasionally required due to internal logging specifications or to troubleshoot problems with specific Snaps. In the following example, we illustrate the steps to configure the log level of ‘Debug’ for the Azure SQL Snap pack. The log level can be customized for each node of the Groundplex where the related pipelines are executed, and will be effective for all pipelines that use any of the Azure SQL Snaps (for example, Azure SQL - Execute, Azure SQL - Update, etc.). Note that Debug logging can affect pipeline performance so this configuration must only be used for debugging purposes. Configuration Steps Follow steps 1 and 2 from this document: Custom log configuration Note: You can perform Step 2 by adding the property key and value under the Global Properties section. Example: Key: jcc.jvm_options Value: -Dlog4j.configurationFile=/opt/snaplogic/logconfig/log4j2-jcc.xml The Snaplex node must be restarted for the change to take effect. Refer to the commands in Table 3.0. b. Edit the log4j2-jcc.xml file configured in Step a. c. Add a new RollingRandomAccessFile element under <Appenders>. In this example, the element is referenced with a unique name JCC_AZURE. It also has a log size and rollover policy defined. The policy would enable generation of up to 10 log files of 1 MB each. These values can be adjusted depending on your requirements. <RollingRandomAccessFile name="JCC_AZURE" fileName="${env:SL_ROOT}/run/log/${sys:log.file_prefix}jcc_azure.json" immediateFlush="true" append="true" filePattern="${env:SL_ROOT}/run/log/jcc_azure-log-%d{yyyy-MM-dd-HH-mm}.json” ignoreExceptions="false"> <JsonLogLayout properties="true"/> <Policies> <SizeBasedTriggeringPolicy size="1 MB"/> </Policies> <DefaultRolloverStrategy max="10"/> </RollingRandomAccessFile> … … </Appenders> d. The next step is to configure a Logger that references the Appender defined in step #c. This is done by adding a new <Logger> element. In this example, the Logger is defined with log level = Debug. <Logger name="com.snaplogic.snaps.azuresql" level="debug" includeLocation="true" additivity="false"> <AppenderRef ref="JCC_AZURE" /> </Logger> .. .. <Root> … </Root </Loggers> </Configuration> The value for the name attribute is derived from the Class FQID value of the associated Snap. The changes to log4j2-jcc.xml are marked by the highlighted text in steps c and d. The complete XML file is also attached for reference. You can refer to the Log4j documentation for more details on the attributes or for additional customization. Log4j reference Debug log messages and log files Additional debug log messages will be printed to the pipeline execution logs for any pipeline with Azure SQL Snaps. These logs can be retrieved from Dashboard. Example: {"ts": "2023-11-30T20:21:33.490Z", "lvl": "DEBUG", "fi": "JdbcDataSourceRegistryImpl.java:369", "msg": "JDBC URL: jdbc:sqlserver://sltapdb.database.windows.net:1433;database=SL.TAP;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;authentication=sqlPassword;loginTimeout=30;connectRetryCount=3;connectRetryInterval=5;applicationName=SnapLogic (main23721) - pid-113e3955-1969-4541-9c9c-e3e0c897cccd, database server: Microsoft SQL Server(12.00.2531), driver: Microsoft JDBC Driver 11.2 for SQL Server(11.2.0.0)", "snlb": "Azure+SQL+-+Update", "snrd": "5c06e157-81c7-497f-babb-edc7274fa4f6", "plrd": "5410a1bdc8c71346894494a2_f319696c-6053-46af-9251-b50a8a874ff9", "prc": "Azure SQL - The updated log configuration would also write the custom JCC logs (for all pipelines that have executed the Azure SQL Snaps) to disk under the /opt/snaplogic/run/log directory. The file size for each log file and the number of files would depend on the configuration in the log4j2-jcc.xml file. The changes to log4j2-jcc.xml can be reverted if the additional custom logging is no longer required. Log level configuration for a Snaplex in Production Orgs The default log level for a new Snaplex is ‘Debug.’ This value can be updated to ‘Info’ in Production Orgs as a best practice. The available values are: Trace: Records details of all events associated with the Snaplex. Debug: Records all events associated with the Snaplex. Info: Records messages that outline the status of the Snaplex and the completed Tasks. Warning: Records all warning messages associated with the Snaplex. Error: Records all error messages associated with the Snaplex. Reference: Snaplex logging PlexFS File Storage considerations PlexFS also known as suggest space is a storage location on the local disk of the JCC node. The /opt/snaplogic/run/fs folder is commonly designated for this purpose. It is used as a data store to temporarily store preview data during pipeline validation, as well as to maintain the state data for Resumable pipelines. Disk volumes To address issues that cause disk full errors and to ensure smoother operations of the systems that affect the stability of the Groundplex, you need to have separate mounts on Groundplex nodes. Follow the steps suggested below to create two separate disk volumes on the JCC nodes. Reference: Disk Volumes The /opt/snaplogic/run/fs folder location is used for the PlexFS operations. mount --bind /workspace/fs /opt/snaplogic/run/fs Folder Structure: The folders under PlexFS are created with this path structure: /opt/snaplogic/run/fs/<Environment>/<ProjectSpace>/<Project>/__suggest__/<Asset_ID> Example: /opt/snaplogic/run/fs/Org1/Proj_Space_1/Project1/__suggest__/aaa5010bc The files in the sub-folders are created with these extensions: *.jsonl *.dat PlexFS File Creation The files in /opt/snaplogic/run/fs are generated when a user performs pipeline validation. The amount of data in a .dat file is based on the “Preview Document Count” user setting. For Snaps with binary output (such as File Reader), the Snap will stop writing to PlexFS when the next downstream Snap has generated its limit of Preview data. PlexFS File Deletion The files for a specific pipeline are deleted when the user clicks ‘Retry’ to perform validation. New data files are generated. Files for a specific user session are deleted when the user logs out of SnapLogic. All PlexFS files are deleted when the Snaplex is restarted. Files in PlexFS are generated with an expiration date. The default expiration date is two days. The files are cleaned up periodically based on the expiration date. It is possible to set a feature flag to override the expiration time, and delete the files sooner. Recommendations The temp files are cleaned up periodically based on the default expiration date however you might occasionally encounter disk space availability issues due to excessive Preview data being written to the PlexFS file storage. The mount directory location can be configured with additional disk space or shared file storage (e.g. Amazon EFS). Contact SnapLogic support for details on the feature flag configuration to update the expiration time to a shorter duration for faster file clean up. The value for this feature flag is set in seconds.
ramaonline
6 months ago Place Sigma Framework Library
Administration
Dashboard
file system
Logging
Monitor
5.9KViews
4likes
0Comments
Recipes for Success with SnapLogic’s GenAI App Builder: From Integration to Automation
16 MIN READ For this episode of the Enterprise Alchemists podcast, Guy and Dominic invited Aaron Kesler and Roger Sramkoski to join them to discuss why SnapLogic's GenAI App Builder is the key to success with AI projects. Aaron is the Senior Product Manager for all things AI at SnapLogic, and Roger is a Senior Technical Product Marketing Manager focused on AI. We kept things concrete, discussing real-world results that early adopters have already been able to deliver by using SnapLogic's integration capabilities to power their new AI-driven experiences.
Dominic
7 months ago Place Enterprise Alchemists
AI
Enterprise Architects
GenAI
Generative AI
Podcast
2.3KViews
4likes
2Comments
Best Practices for Adopting AI Solutions in the Enterprise with SnapLogic AgentCreator
Best Practices for Adopting AI Solutions in the Enterprise with SnapLogic AgentCreator Version: 1.2 Authors: Dominic Wellington, Guy Murphy, Pat Traynor, Bash Badawi, Ram Bysani, David Dellsperger, Aaron Kesler Introduction: AI in the Modern Enterprise AI is fast becoming a cornerstone of modern enterprises, transforming how businesses operate, make decisions, and interact with customers. Its capabilities, such as automation, predictive analytics, and natural language processing, allow companies to streamline processes, gain deeper insights from data, and enhance customer experiences. From optimizing supply chains to personalizing marketing strategies, AI is enabling enterprises to innovate, drive efficiency, and be competitive in an increasingly data-driven world. As AI continues to evolve, its role in shaping business strategy and operations will only grow. Precisely because of its novelty and importance, leaders will need to think carefully about various aspects of how these powerful new capabilities can be deployed in a manner that is compliant with existing legislation and regulation, and how best to integrate them with existing systems and processes. There are no generally-accepted best practices in this field yet due to its novelty, but there are lessons that we can learn from past waves of technological change and adoption. In this document we set out some suggestions for how to think about these topics in order to ensure a positive outcome. Data Data is the lifeblood of IT — arguably the reason for the field’s entire existence — but its importance is only magnified when it comes to AI. Securing access to data is a requirement for an AI project to get off the ground in the first place, but managing that access over time, especially as both the data and the policies that apply to it change and evolve over time. Data Security and Management Data security has always been a complex issue for IT organizations. When considered from the perspective of AI adoption, two main areas need to be considered: external and internal usage. Externally-hosted Large Language Models (LLMs) offer powerful and rapidly-evolving capabilities, but also have inherent risks as they are operated by third parties. The second area of focus is how and what internal data should be used with AI models, whether self-managed or externally operated. External Security Organizations have reasonable concerns about their proprietary, regulated, or otherwise sensitive information “leaking” beyond the organizational boundaries. For this reason, simply sending internal information to a public LLM without any controls in place is not considered a viable solution. An approach to this problem that was previously considered promising was to use a technique called Retrieval Augmented Generation, or RAG. In this approach, rather than passing user queries directly to an LLM for answers, a specialized data store is deployed, called a vector database. When a user query is received, the vector data store is consulted first to identify relevant chunks of information with which to answer the query, and only after this step is the LLM used to provide the conversational response back to the user. However, while RAG does limit the potential for information leakage, it does not reduce it to zero. The vector database can be operated according to the organization’s own risk profile: fully in-house, as a private cloud instance, or leveraging a shared platform, depending on the information it contains and the policies or regulations that apply to that information. However, a chunk of information will be sent from the vector store to the LLM to answer each query, and over time, this process can be expected to expose a substantial part of the knowledge base to the LLM. It is also important to be aware that the chunking process itself still uses a LLM. More security-sensitive organizations or those operating in regulated industries may choose to leverage a more restricted deployment model for the LLM as well, much as discussed for the vector database itself, in order to avoid this leakage. However, it is worth noting that while an “open-source” language model can be prevented from contributing training data back to its developers, its own pre-existing training data may still leak out into the answers. The ultimate risk here is of “model poisoning” from open-source models. That is, injection of data from outside the user’s domain which may lead to inconsistent or undesirable responses. One example of this phenomenon is “context collapse”, which may occur in the case of overloaded acronyms, where the same acronym can represent vastly different concepts in different domains. A generalist model may mis-understand or mis-represent the acronym — or worse, may do so inconsistently. The only way to be entirely certain of data security and hygiene is to train the model from scratch — an undertaking that, due to its cost in both time and resources, is practical only for the largest organisations, and is anyway required only for the most sensitive data sets. A halfway house that is suitable for organisations that have concerns in this domain, but not to the point of being willing to engineer everything themselves from the ground up, is fine-tuning. In this approach, a pre-trained model is further trained on a specific data set. This is a form of transfer learning where a pre-trained model trained on a large dataset is adapted to work for a specific task. The dataset required for this sort of fine-tuning is very small compared to the dataset required for full model training, bringing this approach within reach of far more organisations. Internal Data Access Controls The data that is consumed by the AI model also needs to be secured inside the organization, ensuring that access controls on that data follow the data through the system at all levels. It is all too easy to focus on ingesting the data and forget about the metadata, such as role-based access controls. Instead, these controls should be maintained throughout the AI-enabled system. Any role-based access controls (RBAC) that are placed on the input data should also be reflected in the output data. Agentic approaches are useful here, as they give the opportunity to enforce such controls at various points. The baseline should be that, if a user ought not be able to access certain information through traditional means such as database queries or direct filesystem access, they also must not be able to access it by querying an AI overlay over those systems — and vice-versa, of course. Prompt Logging and Observability An emerging area of concern is the security of prompts used with AI models. Especially when using public or unmodified open-source models, the primary input to the models is the prompt that is passed to them. Even minor changes to that prompt can cause major differences in what is returned by the model. For this reason, baseline best practice is to ensure that prompts are backed up and versioned, just as would be done for more traditional program code. In addition, both prompts and their corresponding responses should be logged in order to be able to identify and troubleshoot issues such as performance changes or impact to pricing models of public LLMs. Some more detailed suggestions are available here. Prompts should also be secured against unauthorized modification, or “prompt injection”. Similarly to the analogous “SQL injection”, attackers may attempt to modify or replace the prompt before it is passed to the AI model, in order to produce outputs that are different from those expected and desired by users and operators of the system. The potential for damage increases further in the case of agentic systems that may chain multiple model prompts together, and potentially even take actions in response to those prompts. Again, logging for both in-the-moment observability and later audit is important here, including the actual final prompt that was sent to the model, especially when that has been assembled across multiple steps. These logs are useful for troubleshooting, but may also be formally required for demonstrating compliance with regulation or legislation. Example Prompt Injection Scenarios Direct Injection An attacker injects a prompt into a customer support chatbot, instructing it to ignore previous guidelines, query private data stores, and send emails, leading to unauthorized access and privilege escalation. Indirect Injection A user employs an LLM to summarize a webpage containing hidden instructions that cause the LLM to insert an image linking to a URL, leading to exfiltration of the private conversation. Unintentional Injection A company includes an instruction in a job description to identify AI-generated applications. An applicant, unaware of this instruction, uses an LLM to optimize their resume, inadvertently triggering the AI detection. Intentional Model Influence An attacker modifies a document in a repository used by a Retrieval-Augmented Generation (RAG) application. When a user’s query returns the modified content, the malicious instructions alter the LLM’s output, generating misleading results. Code Injection An attacker exploits a vulnerability in an LLM-powered email assistant to inject malicious commands, allowing access to sensitive information and manipulation of email content. Payload Splitting An attacker uploads a resume with split malicious prompts. When an LLM is used to evaluate the candidate, the combined prompts manipulate the model’s response, resulting in a positive recommendation regardless of the resume’s actual contents. Multimodal Injection An attacker embeds a malicious prompt within an image that accompanies benign text. When a multimodal AI processes the image and text concurrently, the hidden prompt alters the model’s behavior, potentially leading to unauthorized actions or disclosure of sensitive information. Adversarial Suffix An attacker appends a seemingly meaningless string of characters to a prompt, which influences the LLM’s output in a malicious way, bypassing safety measures. Multilingual/Obfuscated Attack An attacker uses multiple languages or encodes malicious instructions (e.g., using Base64 or emojis) to evade filters and manipulate the LLM’s behavior. reference: https://genai.owasp.org/llmrisk/llm01-prompt-injection/ As these examples show, there are many patterns and return sets from LLMs that will need to be managed and observed, comparing prompts, responses, and data sets with certified sets and expected structures. Hopefully over time and as commercial LLMs mature, many of these issues will be managed by the LLMs themselves, but today these concerns will have to be part of the enterprise’s own governance framework for AI adoption. Data Ownership and Observability Much of the value of most Generative AI (GenAI) applications is based on the quantity, freshness and reliability of the source data that is provided. An otherwise fully-functional GenAI tool that provides responses based on incomplete or out-of-date data will not be useful or valuable to its users. The first question is simply how to gain access to useful source data, and to maintain that access in the future. This work spans both technical and policy aspects. SnapLogic of course makes technical connectivity easy, but there may still be questions of ownership and compliance, not to mention identifying where necessary data even resides. Beyond the initial setup of the AI-enabled system, it will be important to maintain ongoing access to up-to-date data. For instance, if a RAG approach is used, the vector data store will need to be refreshed periodically from the transactional data platform. The frequency of such updates will vary between use cases, depending on the nature of the data and its natural rate of change. For instance, a list of frequently asked questions, or FAQs, can be updated whenever a new entry is added to the list. Meanwhile, a data set that is updated in real time, such as airline operations, will need much more frequent synchronization if it is to remain useful. Recommendations Data is key to the success of AI-enabled systems – and not just one-time access to a dataset that is a point-in-time snapshot, but ongoing access to real-time data. Fortunately, these are not new concerns, and existing tools and techniques can be applied readily to securing and managing that flow of data. In fact, the prominence and urgency of AI projects can even facilitate the broad deployment of such tools and techniques, where they had previously been relegated to specialised domains of data and analytics. It is important to note that as the SnapLogic platform facilitates connectivity and movement of data, none of the patterns of such movement are used to enrich the learnings of a LLM. In other words, pipelines act to transport encrypted data from source to destination without any discernment of the actual payload. No payload data and no business logic governing the movement of data are ever gleaned from such movement or used to train any models. In fact, the SnapLogic platform can be used to enhance data security at source and destination as highlighted above, adding guardrails to an AI system to enforce policies against publication of sensitive or otherwise restricted data. In general it is recommended for domain experts, technical practitioners, and other stakeholders to work together and analyze each proposed use case for AI, avoiding both reflexive refusals and blind enthusiasm, focusing instead on business benefit and how to achieve that in a specific regulatory or policy context. Auditability and Data Lineage The ability to audit the output of AI models is a critical requirement, whether for routine debugging, or in response to incoming regulation (e.g. the EU AI Act) that may require auditability for the deployment of AI technology in certain sectors or for particular use cases. For instance, use of AI models for decision support in legal cases or regulated industries, especially concerning health and welfare, may be subject to legal challenges, leading to requests to audit particular responses that were generated by the model. Commercial and legal concerns may also apply when it comes to use cases that may impinge on IP protection law. Complete forensic auditability of the sort that is provided by traditional software is not possible for LLMs, due to their non-deterministic nature. For this reason, deterministic systems may still be preferable in certain highly-regulated spaces, purely to satisfy this demand. However, a weaker definition of auditability is becoming accepted when it comes to LLMs, where both inputs and outputs are preserved, and the model is required to provide the source information used to generate that output. The source data is considered important both to evaluate the factual correctness of the answer, and also to identify any bias which may make its way into the model from its source data. These factors make auditability and data lineage critical part of the overall AI Strategy, which will have to be applied at various different stages of the solution lifecycle; Model creation and training - This aspect relates to how the model was created, and whether the data sets used to train the model have a risk of either skewing over time, or exposing proprietary information used during model development. Model selection - The precise version of AI model that was used to generate a response will need to be tracked, as even different versions of the same model may produce different responses to the same prompt. For this reason it is important to document the moment of any change in order to be able to track and debug any drift in response or behaviour. For external third-party AI services, these models may need to be tested and profiled as part of both an initial selection process and ongoing validation The reality is that there are is no single AI that is best for all use cases. Experience from real-world deployment of actual AI projects shows that some models are noticeably better for some functions than others, as well as having sometimes radically different cost profiles. Thiese factors means that most probably several different AI models will be used across an enterprise, and even (as agents) to satisfy a single use case. Prompt engineering - Unlike in traditional software development, by its very nature, prompt engineering includes the data sets and structures in the development cycle in a way that traditional functional coding practices do not. The models’ responses are less predictable, so understanding how the data will be processed is an integral part of the prompt engineering lifecycle. To understand how and why a set of prompts are put into production will be driven by both the desired functionality and the data that will be provided, in order to be able to review these inputs if issues arise in the production environment. Prompt evaluation in production - All critical systems today should have robust logging and auditing processes. In reality however many enterprises rarely achieve universal deployment, and coverage is often inconsistent. Due to the nature of AI systems, and notably LLMs, there will be a critical need to audit the data inputs and outputs. The model is effectively a black box that is not available for operators to reconstruct exactly why a given response was provided. This issue is especially critical when multiple AI models, agents, and systems are chained together or networked within a wider process. Logging the precise inputs that are sent to the model will be key for all these purposes. For custom-trained models, these requirements may also extend to the training data — although this case is presumed to remain relatively rare for the foreseeable future, given the prohibitive costs of performing such training. Where more common approaches (RAG, fine-training) are used that do not require an entire model to be trained from scratch, the audit would naturally focus on the inputs to the model and how those are managed. In both these cases, good information hygiene should be maintained, including preservation of historical data for point-in-time auditability. Backing up the data inputs is necessary but not sufficient: after all, a different LLM (or a subsequent version of the same LLM) may provide different responses based on the same prompt and data set. Therefore, if a self-trained LLM is employed, that model should also be backed up in the same way as the data that feeds it. If a public LLM is used, rigorous documentation should be maintained identifying any changes or version upgrades to the external model. All of this work is in addition to the tracking of the prompts and data inputs themselves, as described previously. All of these backups will in turn need to be preserved according to whatever evidentiary concerns are expected to apply. In the case of simple technical audits to ensure continuous improvements and avoid downward pressure on the quality of responses provided, organizations can make their own determination on the level of detail, the width of the time window to be preserved, and the granularity of the data. In more highly regulated scenarios, some or all of these elements may be mandated by outside parties. In those situations, the recommendation would also generally be to specify the backup policy defensively, to avoid any negative impacts in the case of future challenges. Development and Architecture Best Practices While AI systems have notable differences from earlier systems, they are still founded in large part on pre-existing components and techniques, and many existing best practices will still apply, if suitably modified and updated. CI/CD Continuous Integration and Continuous Deployment is of course not specific to Generative AI. However, as GenAI projects move from demo to production, and then evolve over subsequent releases, it becomes necessary to consider them as part of that process. Many components of a GenAI application are stateful, and the relationship between them can also be complex. A roll-back of a vector data store used to support a RAG application may have unforeseen effects if the LLM powering that RAG application remains at a different point of the configuration timeline. Therefore the different components of an AI-enabled system should be considered as tightly coupled for development purposes, as otherwise the GenAI component risks never becoming a fully-fledged part of the wider application environment. In particular, all of the traditional CI/CD concepts should apply also to the GenAI component: Continuous Development Continuous Testing Continuous Integration Continuous Deployment Continuous Monitoring Ensuring the inclusion of development teams in the process is unlikely to be a problem, as the field of AI is still evolving at breakneck pace. However, some of the later stages of an application’s lifecycle are often not part of the worldview of the developers of early demo AI applications, and so may be overlooked in the initial phases of productization of GenAI functionality. All of these phases also have specific aspects that should be considered when it comes to their application to GenAI, so they cannot simply be integrated into existing processes, systems, or modes of thought. New aspects of development and DevOps are needed to support, notable prompt engineering will have to be treated as code artifacts, but will also have to be associated with prompts such as model version, test data sets and samples of return data so that consistent functional management of the combined set of capabilities can be understood and tracked over time. QA and Testing Quality Assurance (QA) and testing strategies for AI projects, particularly GenAI, must address challenges that differ significantly from traditional IT projects. Unlike traditional systems where output is deterministic and follows predefined rules, GenAI systems are probabilistic and rely on complex models trained on vast datasets. A robust QA strategy for GenAI must incorporate dynamic testing of outputs for quality, coherence, and appropriateness across a variety of scenarios. This involves employing both automated testing frameworks and human evaluators to assess the AI's ability to understand prompts and generate contextually accurate responses, while also mitigating risks such as bias, misinformation, or harmful outputs. A GenAI testing framework should include unique approaches like model evaluation using synthetic and real-world data, stress testing for edge cases, and adversarial testing to uncover vulnerabilities such as the attack scenarios listed above. Frameworks such as CI/CD are essential but need to be adapted to accommodate iterative model training and retraining processes. Tools like Explainable AI (XAI) help provide transparency into model decisions, aiding in debugging and improving user trust. Additionally, feedback loops from production environments become vital in fine-tuning the model, enabling ongoing improvement based on real-world performance metrics rather than static, pre-defined test cases. However, depending on the use case and the data provided, such fine-tuning based on user behaviour may itself be sensitive and need to be managed with care. The QA process for GenAI also emphasizes ethical considerations and regulatory compliance more prominently than traditional IT projects. Testing needs to go beyond technical correctness to assess social impact, ensuring that the system avoids perpetuating harmful bias or misinformation. Continuous monitoring after deployment is crucial, as model performance can degrade over time due to shifting data distributions. This contrasts with traditional IT projects, where testing is often a finite phase before deployment. In GenAI, QA is an evolving, lifecycle-long endeavor requiring multidisciplinary collaboration among data scientists, ethicists, domain experts, and software engineers to address the complex, dynamic nature of generative models. Grounding, as an example, is a technique that can be used to help produce model responses that are more trustworthy, helpful, and factual. Grounding generative AI model responses means connecting them to verifiable sources of information. To implement groundin, usually means retrieving relevant source data. The recommended best practice is to use the retrieval-augmented generation (RAG) technique. Other test concepts include; Human-in-the-Loop Testing: Involves human evaluators judging the quality, relevance, and appropriateness of the model's outputs. Source data accuracy with details and data. Adversarial Testing: Actively trying to "break" the model by feeding it carefully crafted inputs designed to expose weaknesses and vulnerabilities Deployment This aspect might superficially be considered among the easiest to cover, but that may well not be the case. Most CI/CD pipelines are heavily automated; can the GenAI aspects be integrated easily into that flow? Some of the processes involved have long durations, e.g. chunking a new batch of information; can they be executed as part of a deployment, or do they need to be pre-staged so that the result can simply be copied into the production environment during a wider deployment action? Monitoring Ongoing monitoring of the performance of the system will also need to be considered. For some metrics, such as query performance or resource utilization, it is simply a question of ensuring that coverage also spans to the new GenAI experience. Other new metrics may also be required that are specific to GenAI, such as users’ satisfaction with the results they receive. Any sudden change in that metric, especially if correlated with a previous deployment of a change to the GenAI components, is grounds for investigation. While extensive best practices exist for the identification of technical metrics to monitor, these new metrics are still very much emergent, and each organization should consider carefully what information is required — or is likely to be required in the event of a future investigation or incident response scenario. Integration of AI with other systems and applications AI strategies are already moving beyond pure analytic or chatbot use case, as the agentic trend continues to develop. These services, whether home grown or hosted by third parties, will need to interface with other IT systems, most notably business processes, and this integration will need to be well considered to be successful. Today LLMs are producing return sets in seconds, and though the models are getting quicker, there is a trend to trade time for greater resilience of quality. How this trade-off is integrated into high performance business systems that operate many orders of magnitude faster will need to be considered and managed with care. Finally, as stated throughout this paper, AI’s non deterministic nature will mandate a focus on compensating patterns across the blend of AI and process systems. Recommendation While it is true that the specifics of AI-enabled systems differ from previous application architectures, general themes should still be carried over, whether by analogy, applying the spirit of the techniques to a new domain, or by ensuring that the more traditional infrastructural components of the AI application are managed with the same rigour as they would be in other contexts. Service / Tool Catalog The shift from content and chatbot experiences to agentic approaches imply a new fundamental architectural consideration of which functions and services should be accessible for use by the model. In the public domain there are simple patterns and models will mainly be operating with other public services and sources — but in the enterprise context, the environment will be more complex. Some examples of questions that a mature enterprise will need to address to maximize the potential of an agentic capability ; What are the “right” services? Large enterprises have hundreds, if not thousands, of services today, all of which are (or should be)managed according to what their business context is. A service management catalog will be key to manage many of these issues,as this will give a consistent point of entry to the service plane. Here again, pre-existing API management capabilities can ensure that the right access and control policies can be applied to support the adoption of composable AI-enabled applications and agents. When it comes to security profiling of a consuming LLM, the requester of the LLM service will have a certain level of access based on a combination of user role and security policy that is enforced. The model will have to pass on this to core systems at run time so that there are no internal data breaches. When it comes to agentic systems, new questions arise, beyond the simpler ones that apply to generative or conversational applications. For instance, should an agent be able to change a record? How much change should be allowed and how will this be tracked? Regulatory Compliance While the field of GenAI-enabled applications is still extremely new, best practices are beginning to emerge, such as those provided by the Open Web Application Security Project (OWASP). These cybersecurity recommendations are of course not guaranteed to cover any particular emerging regulation, but should be considered a good baseline which is almost certain to give a solid foundation from which to work to achieve compliance with national or sector-specific regulation and legislation as it is formalised. In general, it is recommended to ensure that any existing controls on systems and data sets, including RBAC and audit logs, are extended to new GenAI systems as well. Any changes to the new components — model version upgrades, changes to prompts, updates to training data sets, and more — will need to be documented and tracked with the same rigour as established approaches would mandate for traditional infrastructure changes. The points made previously about observability and auditability all contribute to achieving that foundational level of best-practice compliance. It is worth reiterating here that full coverage is expected to be an important difference between GenAI and previous domains. Compliance is likely to go far beyond the technical systems and their configurations, which were previously sufficient, and to require tracking of final prompts as supplied to models, including user input and runtime data. Conclusion Planning and managing the deployment and adoption of novel AI-enabled applications will require new policies and expertise to be developed. New regulation is already being created in various jurisdictions to apply to this new domain, and more is sure to be added in coming months and years. However, much as AI systems require access to existing data and integration with existing systems to deliver value at scale, existing policies, experience, and best practices can be leveraged to ensure success. For this reason, it is important to treat AI as an integral part of strategy, and not its own isolated domain, or worse, delegated to individual groups or departments without central IT oversight or support. By engaging proactively with users’ needs and business cases, IT leaders will have a much better chance of achieving measurable success and true competitive advantage with these new technologies — and avoiding the potential downsides: legal consequences of non-compliance, embarrassing public failures of the system, or simply incorrect responses being generated and acted upon by employees or customers.
GuyM
11 months ago Place Sigma Framework Library
AgentCreator
AI
Best Practices
GenAI
IT. Strategy
1.1KViews
3likes
0Comments
SnapLogic deployment on Kubernetes - A reference guide
Overview SnapLogic supports the deployment of Groundplexes on Kubernetes platforms, thus enabling the application to leverage the various capabilities of Kubernetes. This document explains a few best practice recommendations for the deployment of SnapLogic on Kubernetes along with a sample deployment example using GKE. The examples in this document are specific to the GKE platform however the concepts can be applied to other Kubernetes platforms such as AWS and Azure. Author: Ram Bysani SnapLogic Enterprise Architecture team Helm Chart A Helm chart is used to define the various deployment configurations for an application on Kubernetes. Additional information about Helm charts can be found here. The Helm chart package for a SnapLogic deployment can be downloaded from the Downloads section. It contains the following files: Artifact Comments values.yaml This file defines the default configuration for the SnapLogic Snaplex deployment. It includes variables like the number of JCC nodes, container image details, resource limits, and settings for Horizontal Pod Autoscaling (HPA). Reference: values.yaml Chart.yaml This file defines the metadata and version information for the Helm chart. templates folder This directory contains the Kubernetes manifest templates which define the resources to be deployed into the cluster. These templates are YAML files that specify Kubernetes resources with templating capabilities that allow for parameterization, flexibility, and reuse. templates/deployment.yaml This file defines a Kubernetes Deployment resource for managing the deployment of JCC instances in a cluster. The deployment is created only if the value of jccCount is greater than 0, as specified in the Helm chart's values.yaml file. templates/deployment-feed.yaml This file defines a Kubernetes Deployment resource for managing the deployment of Feedmaster instances. The deployment is conditionally created if the feedmasterCount value in the Helm chart's file values.yaml file is greater than 0. templates/hpa.yaml The hpa.yaml file defines a Horizontal Pod Autoscaler (HPA) resource for a Kubernetes application. The HPA automatically scales the number of pod replicas in a deployment or replica set based on observed metrics such as CPU utilization or custom metrics. templates/service.yaml The service.yaml file describes a Kubernetes service that exposes the JCC component of your Snaplex. It creates a LoadBalancer type service, which allows external access to the JCC components through a public IP address. The service targets only pods labeled as 'jcc' within the specified Snaplex and Helm release, ensuring proper communication and management. templates/service-feed.yaml The service-feed.yaml file describes a Kubernetes service that exposes the Feedmaster components. The service is only created if the value of feedmasterCount in the Helm chart’s values.xml file is > 0. It creates a LoadBalancer type service, which allows external access to the Feedmaster components through a public IP address. templates/service-headless.yaml The service-headless.yaml file describes a Kubernetes service for IPv6 communication. The service is only created if the value of enableIPv6 in the Helm chart’s values.xml file is set to true. Table 1.0 Helm Chart configurations Desired State vs Current State The configurations in the various yaml files (e.g. Deployment, HPA, values, etc.) represent the “Desired” state of a Kubernetes deployment. The Kubernetes controllers constantly monitor the Current state of the deployment to bring it in alignment with the Desired state. Horizontal Pod Autoscaling (HPA) Horizontal Pod Autoscaling (HPA) is a feature in Kubernetes that automatically adjusts the number of replicas (pods) for your deployments based on resource metrics like CPU utilization and memory usage. SnapLogic supports HPA for deployments in a Kubernetes environment. The add-on Metrics server must be installed. Reference: Metrics-Server. Metrics collection is enabled by default in GKE as part of Cloud Monitoring. Note that Custom Metrics and External Metrics, and Vertical Pod Autoscaling (VPA) are not supported for SnapLogic deployments on Kubernetes. Groundplex deployment in a GKE environment - Example In this section, we will go over the various steps for a SnapLogic Groundplex deployment in a GKE environment. Groundplex creation Create a new Groundplex from the Admin Manager interface. Reference: Snaplex_creation. The nodes for this Snaplex will be updated when the application is deployed to the GKE environment. New Snaplex creation GKE Cluster creation Next, we create the GKE cluster on the Google Cloud console. We have created our cluster in Autopilot mode. In this mode, GKE manages the cluster and node configurations including scaling, load balancing, monitoring, metrics, and workload optimization. Reference: GKE Cluster GKE cluster Configure the SnapLogic platform Allowlist Add the SnapLogic platform IP addresses to the Allowlist. See Platform Allowlist. In GKE, this is usually done by configuring an Egress Firewall rule on the GKE cluster. Please refer to the GKE documentation for additional details. Firewall rule - Egress Helm configurations values.yaml The below table explains the configurations for some of the sections from the values.yaml file which we have used in our set up. The modified files are attached to this article for reference. Reference: Helm chart configuration Section Comments # Regular nodes count jccCount: 3 # Feedmaster nodes count feedmasterCount: 0 This defines the number of JCC pods. We have enabled HPA for our test scenario, so the jccCount will be picked from the HPA section. (i.e. minReplicas and maxReplicas). The pod count is the number of pods across all nodes of the cluster. No Feedmaster pods are configured in this example. Feedmaster count can be half of the JCC pod count. Feedmaster is used to distribute Ultra task requests to the JCC pods. HPA configuration is only applicable to the JCC pods and not to the Feedmaster pods. # Docker image of SnapLogic snaplex image: repository: snaplogic/snaplex tag: latest This specifies the latest and most recent release version of the repository image. You can specify a different tag if you need to update the version to a previous release for testing, etc. # SnapLogic configuration link snaplogic_config_link: https://uat.elastic.snaplogic.com/api/1/rest/plex/config/ org/proj_space/shared/project Retrieve the configuration link for the Snaplex by executing the Public API. The config link string is the portion before ?expires in the output value of the API. Example: snaplogic_config_link: https://uat.elastic.snaplogic.com/api/1/rest/plex/config/ QA/RB_Temp_Space/shared/RBGKE_node1 # SnapLogic Org admin credential snaplogic_secret: secret/mysecret Execute the kubectl command: kubectl apply -f snapSecret.yaml Please see the section To create the SnapLogic secret in this document: Org configurations. # CPU and memory limits/requests for the nodes limits: memory: 8Gi cpu: 2000m requests: memory: 8Gi cpu: 2000m Set requests and limits to the same values to ensure resource availability for the container processes. Avoid running other processes in the same container as the JCC so that the JCC can have the maximum amount of memory. # Default file ulimit and process ulimit sl_file_ulimit: 8192 sl_process_ulimit: 4096 The value should be more than the # of slots configured for the node. (Maximum Slots under Node properties of the Snaplex). If not set, then the node defaults will be used. (/etc/security/limits.conf). The JCC process is initialized with these values. # JCC HPA autoscaling: enabled: true minReplicas: 1 maxReplicas: 3 minReplicas defines the minimum number of Pods that must be running. maxReplicas defines the maximum number of Pods that can be scheduled on the node(s). The general guideline is to start with 1:2 or 1:3 Pods per node. The replica Pods are across all nodes of a deployment and not per node. targetAvgCPUUtilization: 60 targetAvgMemoryUtilization: 60 To enable these metrics, the Kubernetes Metrics Server installation is required. Metrics collection is enabled by default in GKE as part of Cloud Monitoring. targetAvgCPUUtilization: Average CPU utilization percentage (i.e. 60 = 60%) This is the average CPU utilization across all Pods. HPA will scale up or scale down Pods to maintain this average. targetAvgMemoryUtilization: Average memory utilization percentage. This parameter is used to specify the average memory utilization (as a percentage of the requested memory) that the HPA should maintain across all the replicas of a particular deployment or stateful set. scaleDownStabilizationWindowSeconds: 600 terminationGracePeriodSeconds: 900 # Enable IPv6 service for DNS routing to pods enableIPv6: false scaleDownStabilizationWindowSeconds is a parameter used in Kubernetes Horizontal Pod Autoscaler (HPA) It controls the amount of time the HPA waits (like a cool-down period) before scaling down the number of pods after a decrease in resource utilization. terminationGracePeriodSeconds defines the amount of time Kubernetes gives a pod to terminate before killing it. If the containers have not exited after terminationGracePeriodSeconds, then Kubernetes sends a SIGKILL signal to forcibly terminate the containers, and remove the pod from the cluster. Table 2.0 - values.yaml Load balancer configuration The service.yaml file contains a section for the Load balancer configuration. Autopilot mode in GKE supports the creation of a Load balancer service. Section Comments type: LoadBalancer ports: - port: 8081 protocol: TCP name: jcc selector: A Load balancer service will be created by GKE to route traffic to the application’s pods. The external IP address and port details must be configured on the Settings tab of the Snaplex. An example is included in the next section of this document. Table 3.0 service.yaml Deployment using Helm Upload the helm zip file package to the Cloud Shell instance by selecting the Upload option. The default Helm package for SnapLogic can be downloaded from here. It is recommended to download the latest package from the SnapLogic documentation link. The values.yaml file with additional custom configurations (as described in Tables 2.0 / 3.0 above) is attached to this article. Execute the command on the terminal to install and deploy the Snaplex release with a unique name such as snaplogic-snaplex using the configurations from the values.yaml file. The release name is a unique identifier, and can be different for multiple deployments such as Dev / Prod, etc. helm install snaplogic-snaplex . -f values.yaml <<Output>> NAME: snaplogic-snaplex NAMESPACE: default STATUS: deployed REVISION: 5 TEST SUITE: None NOTES: You can run this command to update an existing deployment with any new or updated Helm configurations. helm upgrade snaplogic-snaplex . -f values.yaml View the deployed application under the Workloads tab on the Google Cloud Console. Workloads This command returns the HPA details. $ kubectl describe hpa Name: snaplogic-snaplex-hpa Namespace: default Labels: app.kubernetes.io/instance=snaplogic-snaplex app.kubernetes.io/managed-by=Helm app.kubernetes.io/name=snaplogic-snaplex app.kubernetes.io/version=1.0 helm.sh/chart=snaplogic-snaplex-0.2.0 Annotations: meta.helm.sh/release-name: snaplogic-snaplex meta.helm.sh/release-namespace: default Deployment/snaplogic-snaplex-jcc Metrics: ( current / target ) resource cpu on pods (as a percentage of request): 8% (153m) / 60% resource memory on pods (as a percentage of request): 28% (1243540138666m) / 60% Min replicas: 1 Max replicas: 3 Run the kubectl command to list the services. You can see the external IP addresses for the Load balancer service. kubectl get services NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kubernetes ClusterIP 34.118.224.1 <none> 443/TCP 16d kubectl get services NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kubernetes ClusterIP 34.118.224.1 <none> 443/TCP 16d snaplogic-snaplex-regular LoadBalancer 34.118.227.164 34.45.230.213 8081:32526/TCP 25m Update Load balancer url on the Snaplex Note the external IP address for the LoadBalancer service, and update the host and port on the Load balancer field of the Snaplex. Example: http://1.3.4.5:8081 Load balancer Listing pods in GKE The following commands can be executed to view the pod statuses. The pod creation and maintenance is fully managed by GKE. $ kubectl top pods $ kubectl get pods kubectl get pods --field-selector=status.phase=Running NAME READY STATUS RESTARTS AGE snaplogic-snaplex-jcc-687d87994-crzw9 0/1 Running 0 2m snaplogic-snaplex-jcc-687d87994-kks7l 1/1 Running 0 2m38s snaplogic-snaplex-jcc-687d87994-pcfvp 1/1 Running 0 2m24s View node details in the SnapLogic Monitor application Each pod represents a JCC node. The maxReplica value is set to 3 so you would see a maximum of 3 nodes (pods) deployed. (Analyze -> Infrastructure tab). Snaplex nodes The below command uninstalls and deletes the deployment from the cluster. All deployed services, metadata, and associated resources are also removed. helm uninstall <deployment_name> Pod registration with the SnapLogic Control Plane Scenario Comments How are the Pod neighbors resolved and maintained by the SnapLogic Control Plane? When a JCC/FeedMaster node (Pod) starts, it registers with the SnapLogic Control Plane, and the Control Plane maintains the list of Pod neighbors. When a JCC/FeedMaster node (Pod) registers, it also publishes its IP address to the Control Plane. An internal list of Pod IP addresses is updated dynamically for neighbor to neighbor communication. DNS resolution is not used. How are the container repository versions updated? The latest Snaplex release build is updated in the docker repository version tagged ‘latest’. The pods will be deployed with this version on startup by referencing the tags from the values.yaml file. If the Snaplex version is updated on the Control Plane to a different version (e.g. main-2872), then the JCC nodes (pods) will be updated to match that version (i.e. main-2872). Reference Groundplex Deployment on Kubernetes https://kubernetes.io/ GKE HPA
ramaonline
11 months ago Place Sigma Framework Library
GKE
groundplex
Kubernetes
Platform
Platform administration
1.7KViews
2likes
0Comments
Data At Scale For AI At Scale: How To Think About Data Readiness
6 MIN READ This week's episode of the Enterprise Alchemists is another live recording from Integreat 2024 in London! This week we have Maks Shah of Syngenta; we had a fascinating conversation during the after-event cocktail party — which is why this episode is a bit shorter than normal. Maks's key takeaway was that "there was a common theme throughout all the presentations this afternoon, and that was that your data has to be fit for it". There is no hashtag#AI success without the data to feed it with. Establishing a solid data foundation is step zero on your journey to hashtag#GenAI.
Dominic
2 years ago Place Enterprise Alchemists
AI
Data
Data Science
GenAI
Podcast
2KViews
0likes
7Comments
The Rise of Agent-Based AI Systems in Enterprise IT, with Jeremiah Stone
26 MIN READ Guy and Dominic talked to Jeremiah Stone, SnapLogic CTO, about the rise of agentic AI, and what that means for traditional conceptions of enterprise architecture: "As an industry, we're turning the corner from shiny object, what does it do, to all right, drop it on your foot". And one of the things that we have learned is what is needed to get GenAI into production: "pair programming, but it's a pair programming model that is pairing a business process expert or an organizational implementer with a technical expert or a technical implementer". We mentioned Nicole Houts and Chris Ward, who did exactly that sort of pair programming here at SnapLogic, with some very significant results. We recorded right before Integreat 2024 in London, so I'm afraid the audio quality isn't up to our usual standards. This was our first time trying to record in the field, so let's call it a learning experience. Fortunately, there is as ever a transcript, so if you prefer to follow along by reading, please do that. There are also links in the show notes to the resources that we mentioned in our conversation.
Dominic
2 years ago Place Enterprise Alchemists
AgentCreator
agentic AI
AI
Enterprise Architects
GenAI
604Views
1like
0Comments
Securing SnapLogic APIs in Hybrid Deployments: The Role of WAF
Securing SnapLogic APIs in Hybrid Deployments: The Role of WAF APIs play a vital role in integrating on-premises, cloud-based, and third-party applications for SnapLogic integration workloads. As API connectivity scales over time, so does the need for robust security measures to protect these integration points from potential threats. This is where a Web Application Firewall (WAF) can be leveraged by organizations to ensure API security. A WAF, positioned between client applications and SnapLogic's Groundplex clusters (as seen in the diagrams), helps by inspecting and filtering traffic to and from SnapLogic's API endpoints. The WAF provides defense against a wide range of common web threats, including: SQL Injection Cross-site Scripting (XSS) Distributed Denial of Service (DDoS) attacks Brute-force attacks Organizations can implement a WAF in front of their SnapLogic's Groundplex clusters, whether in cloud environments like AWS, Azure, or on-premise data centers, to monitor and control API traffic. This ensures that only legitimate requests reach the integration layers, helping to prevent malicious traffic from compromising your critical data and services. The WAF inspects incoming API requests for common security threats, such as SQL injections, cross-site scripting (XSS), and other vulnerabilities, ensuring that integrations running in SnapLogic operate within a secure framework. This added layer of protection not only shields your infrastructure from external attacks but also helps maintain the integrity and performance of your API-driven workloads. Key Benefits of Deploying a WAF Enhanced API Protection: A WAF scrutinizes incoming requests, identifying and blocking malicious payloads, ensuring the APIs that connect your cloud apps and on-premise systems remain secure. Scalability and High Availability: In SnapLogic’s hybrid environments, including on-premise and cloud (Azure/AWS), a WAF helps ensure traffic is balanced and high availability is maintained, even during periods of peak demand. Compliance Support: Many industries require stringent security standards (e.g., HIPAA, GDPR). A WAF helps ensure SnapLogic's API traffic meets these regulatory requirements by preventing unauthorized data leakage and access. Traffic Filtering and Logging: WAFs can analyze traffic patterns and provide detailed logs of API interactions. This is valuable for detecting anomalies and improving incident response times. SnapLogic supports multiple deployment models, including on-premise and cloud configurations. Below are two typical deployment scenarios showing where WAF integrates into the SnapLogic runtime infrastructure (Snaplex) Single Region - Cloud-Native SnapLogic Deployment (Azure/AWS/GCP) In cloud-based deployments, organizations leverage platforms like Azure and AWS to scale SnapLogic integration workloads. A WAF (such as Azure Application Gateway) can be deployed in front of the API Gateway to add an additional security layer for all API interactions. This setup helps ensure that integrations can securely connect to a wide range of cloud apps and data sources, protecting them from external threats. On Premise - Multi Cluster Configuration In this example of an on-premise setup, an organization deploys a WAF (such as Akamai) in the network’s DMZ (Demilitarized Zone) to protect SnapLogic’s Groundplex clusters. The WAF inspects all incoming traffic from external clients and forwards only secure and legitimate API requests to the internal SnapLogic Groundplex nodes. This approach helps ensure that sensitive integration workflows, databases, and applications remain isolated from external threats. Traffic Flow Here’s a description of the flow of an API request as it passes through a Web Application Firewall (WAF) to the SnapLogic Snaplex infrastructure. 1. API Request from the Client Application Originating from the client (either a web application, mobile app, or another API client), the API request is sent over the internet to an endpoint. This request is typically directed at the API Gateway, which acts as the initial point of contact for all external API calls. The request contains various headers, data payloads, and parameters that specify what kind of operation (GET, POST, PUT, DELETE, etc.) the client wants to perform on the API. 2. Traffic Hits the Web Application Firewall (WAF) Before reaching the Snaplex infrastructure, the API request first passes through the WAF. The WAF is typically deployed between the public internet and the organization's internal network (cloud or on-premises). Inspection and Filtering: The WAF inspects the API request for any malicious content or behaviors that could indicate a security threat. This might include: SQL Injections Cross-Site Scripting (XSS) Distributed Denial of Service (DDoS) attacks Brute-force attacks Any other patterns that could compromise the API or application. Traffic Policies: Based on predefined security policies and rule sets (specific to the organization’s needs), the WAF determines if the request is safe to proceed or needs to be blocked. Requests that violate any of the rules (e.g., malformed headers, suspicious payloads, unexpected request methods) are blocked or redirected. 3. API Gateway or Load Balancer If the request passes through the WAF without being flagged as a security threat, it is forwarded to the organization’s API Gateway or load balancer. In a cloud-based architecture, this could be services like AWS Elastic Load Balancer or Azure Application Gateway, which manage API traffic and distribute it across backend resources. In an on-premise architecture, similar load balancing and routing components manage the flow. The API Gateway ensures that traffic is efficiently routed to the appropriate Snaplex nodes and that only valid, secure API requests proceed. 4. Reaching SnapLogic Groundplex Clusters After passing through the WAF and load balancer, the API request reaches SnapLogic's Groundplex clusters. Depending on the deployment (on-premise, AWS, Azure), the clusters can be distributed across different regions and environments. Within the Groundplex clusters, the request is processed by SnapLogic’s integration pipelines. The Groundplex cluster executes SnapLogic tasks, which involve data integration, orchestration, transformation, or connection to third-party applications, databases, or APIs. The request might trigger various integration workflows, such as: Connecting to an on-premise database (e.g., Oracle, MySQL) to retrieve or update data. Calling an external cloud-based service (e.g., Salesforce, Workday, etc.). Processing data transformations (ETL/ELT) in a data pipeline.
Dominic
2 years ago Place Sigma Framework Library
WAF
Web Application Firewall
1.7KViews
1like
1Comment