Secure your AgentCreator App - With SnapLogic API Management
Why Security is Essential for Generative AI Applications As generative AI applications transition from prototypes to enterprise-grade solutions, ensuring security becomes non-negotiable. These applications often interact with sensitive user data, internal databases, and decision-making logic that must be protected from unauthorized access. Streamlit, while great for quickly developing interactive AI interfaces, lacks built-in access control mechanisms. Therefore, integrating robust authentication and authorization workflows is critical to safeguarding both the user interface and backend APIs. Overview of the AgentCreator + Streamlit Architecture This guide focuses on securing a generative AI-powered Sales Agent application built with SnapLogic AgentCreator and deployed via Streamlit. The application integrates Salesforce OAuth 2.0 as an identity provider and secures its backend APIs using SnapLogic API Management. Through this setup, only authorized Salesforce users from a trusted domain can access the application, ensuring end-to-end security for both the frontend and backend. Understanding the Application Stack Role of SnapLogic's AgentCreator Toolkit The SnapLogic AgentCreator Toolkit enables developers and sales engineers to build sophisticated AI-powered agents without having to manage complex infrastructure. These agents operate within SnapLogic pipelines, making it easy to embed business logic, API integrations, and data processing in a modular way. For example, a sales assistant built with AgentCreator and exposed as an API using Triggered Tasks can pull real-time CRM data, generate intelligent responses, and return them via a clean web interface. Streamlit as User Interface On the frontend, Streamlit is used to build a simple, interactive web interface for users to query the Sales Agent. Importance of API Management in AI Workflows Once these agents are exposed via HTTP APIs, managing who accesses them—and how—is crucial. That’s where SnapLogic API Management comes in. It provides enterprise-grade tools for API publishing, securing endpoints, enforcing role-based access controls, and monitoring traffic. These features ensure that only verified users and clients can interact with your APIs, reducing the risk of unauthorized data access or abuse. However, the real challenge lies in securing both ends: The Streamlit UI, which needs to restrict access to authorized users. The SnapLogic APIs exposing the AgentCreator Pipelines, which must validate and authorize each incoming request. OAuth 2.0 Authentication: Fundamentals and Benefits What is OAuth 2.0? OAuth 2.0 is an open standard protocol designed for secure delegated access. Instead of sharing credentials directly, users grant applications access to their resources using access tokens. This model is particularly valuable in enterprise environments, where central identity management is crucial. By using OAuth 2.0, applications can authenticate users through trusted Identity Providers (IDPs) while maintaining a separation of concerns between authentication, authorization, and application logic. Why Use Salesforce as the Identity Provider (IDP)? Salesforce is a robust identity provider that many organizations already rely on for CRM, user management, and security. Leveraging Salesforce for OAuth 2.0 authentication allows developers to tap into a pre-existing user base and organizational trust framework.
In this tutorial, Salesforce is used to handle login and token issuance, ensuring that only authorized Salesforce users can access the Streamlit application. This integration also simplifies compliance with enterprise identity policies such as SSO, MFA, and domain-based restrictions. To address the authentication challenge, we use the OAuth 2.0 Authorization Code Flow, with Salesforce acting as both the Identity and Token Provider. Here is Salesforce’s official documentation on OAuth endpoints, which is helpful for configuring your connected app. 🔒 Note: While Salesforce is a logical choice for this example—since the Sales Agent interacts with Salesforce data—any OAuth2-compliant Identity Provider (IDP) such as Google, Okta, or Microsoft Entra ID (formerly Azure AD) can be used. The core authentication flow remains the same, with variations primarily in OAuth endpoints and app registration steps. Architecture Overview and Security Objectives Frontend (Streamlit) vs Backend (SnapLogic APIs) The application architecture separates the frontend interface and backend logic. The frontend is built using Streamlit, which allows users to interact with a visually intuitive dashboard. It handles login, displays AI-generated responses, and captures user inputs. The backend, powered by SnapLogic's AgentCreator, hosts the core business logic within pipelines that are exposed as APIs. This separation ensures flexibility and modular development, but it also introduces the challenge of securing both components independently yet cohesively. Threat Model and Security Goals The primary security threats in such a system include unauthorized access to the UI, data leaks through unsecured APIs, and token misuse. To mitigate these risks, the following security objectives are established: Authentication: Ensure only legitimate users from a trusted identity provider (Salesforce) can log in. Authorization: Grant API access based on user roles and domains, verified via SnapLogic APIM policies. Token Integrity: Validate and inspect access tokens before allowing backend communication with SnapLogic APIM Policies Secret Management: Store sensitive credentials (like Client ID and Secret) securely using Streamlit's secret management features. This layered approach aligns with enterprise security standards and provides a scalable model for future generative AI applications. Authentication & Authorization Flow Here’s how we securely manage access: 1. Login via Salesforce: Users are redirected to Salesforce’s login screen. After successful login, Salesforce redirects back to the app with an access token. The token and user identity info are stored in Streamlit’s session state. 2. Calling SnapLogic APIs: The frontend sends requests to SnapLogic’s triggered task APIs, attaching the Salesforce access token in the Authorization HTTP Header. 3. Securing APIs via SnapLogic Policies: Callout Authenticator Policy: Validates the token by sending it to Salesforce’s token validation endpoint, as Salesforce tokens are opaque and not self-contained like JWTs. AuthorizeByRole Policy: After extracting the user’s email address, this policy checks if the domain (e.g., @snaplogic.com) is allowed. If so, access is granted. Below you can find the complete OAuth 2 Authorization Code Flow enhanced with the Token Introspection & Authorization Flow This setup ensures end-to-end security, combining OAuth-based authentication with SnapLogic’s enterprise-grade API Management capabilities. 
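To make the introspection step more concrete, the sketch below shows the kind of call the Callout Authenticator policy performs behind the scenes. It is only an illustration: it assumes Salesforce's standard token introspection endpoint (/services/oauth2/introspect) and a Connected App that is allowed to introspect tokens. The policy performs this callout for you, so no such code is needed in the Streamlit app itself.

import requests

# Illustrative values only; in practice the introspection call is configured in the
# SnapLogic Callout Authenticator policy, not written in application code.
INTROSPECT_URL = "https://login.salesforce.com/services/oauth2/introspect"  # assumed endpoint
CLIENT_ID = "your_connected_app_client_id"
CLIENT_SECRET = "your_connected_app_client_secret"

def introspect(access_token):
    """Ask Salesforce whether an opaque access token is still active."""
    response = requests.post(
        INTROSPECT_URL,
        data={
            "token": access_token,
            "token_type_hint": "access_token",
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
        },
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

# The AuthorizeByRole policy then applies a domain check along these lines:
# result = introspect(token)
# allowed = result.get("active") and result.get("username", "").endswith("@snaplogic.com")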
In the following sections, we’ll walk through how to implement each part—from setting up the Salesforce Connected App to configuring policies in SnapLogic—so you can replicate or adapt this pattern for your own generative AI applications. Step 1: Set Up Salesforce Connected App Navigate to Salesforce Developer Console To initiate the OAuth 2.0 authentication flow, you’ll need to register your application as a Connected App in Salesforce. Begin by logging into your Salesforce Developer or Admin account. From the top-right gear icon, navigate to Setup → App Manager. Click on “New Connected App” to create a new OAuth-enabled application profile. Define OAuth Callback URLs and Scopes In the new Connected App form, set the following fields under the API (Enable OAuth Settings) section: Callback URL: This should be the URL of your Streamlit application (e.g., https://snaplogic-genai-builder.streamlit.app/Sales_Agent). Selected OAuth Scopes: Include at least openid, email, and profile. You may also include additional scopes depending on the level of access required. Ensure that the “Enable OAuth Settings” box is checked to make this app OAuth-compliant. Retrieve Client ID and Client Secret After saving the app configuration, Salesforce will generate a Consumer Key (Client ID) and a Consumer Secret. These are crucial for the OAuth exchange and must be securely stored. You will use these values later when configuring the Streamlit OAuth integration and environmental settings. Do not expose these secrets in your codebase or version control. 📄 For details on Salesforce OAuth endpoints, see: 👉 Salesforce OAuth Endpoints Documentation Step 2: Integrate OAuth with Streamlit Using streamlit-oauth Install and Configure streamlit-oauth Package To incorporate OAuth 2.0 authentication into your Streamlit application, you can use the third-party package streamlit-oauth (streamlit-oauth). This package abstracts the OAuth flow and simplifies integration with popular identity providers like Salesforce. To install it, run the following command in your terminal: pip install streamlit-oauth After installation, you'll configure the OAuth2Component to initiate the login process and handle token reception once authentication is successful. Handle ClientID and ClientSecret Securely Once users log in through Salesforce, the app receives an Access Token and an ID token. These tokens should never be exposed in the UI or logged publicly. Instead, store them securely in st.session_state, Streamlit's native session management system. This ensures the tokens are tied to the user's session and can be accessed for API calls later in the flow. Store Credentials via Streamlit Secrets Management Storing secrets such as CLIENT_ID and CLIENT_SECRET directly in your source code is a security risk. Streamlit provides a built-in Secrets Management system that allows you to store sensitive information in a .streamlit/secrets.toml file, which should be excluded from version control. Example: # .streamlit/secrets.toml SF_CLIENT_ID = "your_client_id" SF_CLIENT_SECRET = "your_client_secret" In your code, you can access these securely: CLIENT_ID = st.secrets["SF_CLIENT_ID"] CLIENT_SECRET = st.secrets["SF_CLIENT_SECRET"] Step 3: Manage Environment Settings with python-dotenv Why Environment Variables Matter Managing environment-specific configuration is essential for maintaining secure and scalable applications. 
In addition to storing sensitive credentials using Streamlit’s secrets management, storing dynamic OAuth parameters such as URLs, scopes, and redirect URIs in an environment file (e.g., .env) allows you to keep code clean and configuration flexible. This is particularly useful if you plan to deploy across multiple environments (development, staging, production) with different settings. Store OAuth Endpoints in .env Files To manage environment settings, use the python-dotenv package (python-dotenv), which loads environment variables from a .env file into your Python application. First, install the library: pip install python-dotenv Create a .env file in your project directory with the following format: SF_AUTHORIZE_URL=https://login.salesforce.com/services/oauth2/authorize SF_TOKEN_URL=https://login.salesforce.com/services/oauth2/token SF_REVOKE_TOKEN_URL=https://login.salesforce.com/services/oauth2/revoke SF_REDIRECT_URI=https://your-streamlit-app-url SF_SCOPE=id openid email profile Then, use the dotenv_values function to load the variables into your script: from dotenv import dotenv_values env = dotenv_values(".env") AUTHORIZE_URL = env["SF_AUTHORIZE_URL"] TOKEN_URL = env["SF_TOKEN_URL"] REVOKE_TOKEN_URL = env["SF_REVOKE_TOKEN_URL"] REDIRECT_URI = env["SF_REDIRECT_URI"] SCOPE = env["SF_SCOPE"] This approach ensures that your sensitive and environment-specific data is decoupled from the codebase, enhancing maintainability and security. Step 4: Configure OAuth Flow in Streamlit Define OAuth2 Component and Redirect Logic With your environment variables and secrets in place, it’s time to configure the OAuth flow in Streamlit using the OAuth2Component from the streamlit-oauth package. This component handles user redirection to the Salesforce login page, token retrieval, and response parsing upon return to your app. from streamlit_oauth import OAuth2Component oauth2 = OAuth2Component( client_id=CLIENT_ID, client_secret=CLIENT_SECRET, authorize_url=AUTHORIZE_URL, token_url=TOKEN_URL, redirect_uri=REDIRECT_URI ) # create a button to start the OAuth2 flow result = oauth2.authorize_button( name="Log in", icon="https://www.salesforce.com/etc/designs/sfdc-www/en_us/favicon.ico", redirect_uri=REDIRECT_URI, scope=SCOPE, use_container_width=False ) This button initiates the OAuth2 flow and handles redirection transparently. Once the user logs in successfully, Salesforce redirects them back to the app with a valid token. Handle Session State for Tokens and User Data After authentication, the returned tokens are stored in st.session_state to maintain a secure, per-user context. 
Here’s how to decode the token and extract user identity details:

import base64  # needed to decode the ID token payload
import json

if result:
    # decode the id_token and get the user's email address
    id_token = result["token"]["id_token"]
    access_token = result["token"]["access_token"]
    # verifying the signature is an optional step for security
    payload = id_token.split(".")[1]
    # add padding to the payload if needed
    payload += "=" * (-len(payload) % 4)
    payload = json.loads(base64.b64decode(payload))
    email = payload["email"]
    username = payload["name"]
    # store the token and its parts in session state
    st.session_state["SF_token"] = result["token"]
    st.session_state["SF_user"] = username
    st.session_state["SF_auth"] = email
    st.session_state["SF_access_token"] = access_token
    st.session_state["SF_id_token"] = id_token
    st.rerun()
else:
    st.write(f"Congrats **{st.session_state.SF_user}**, you are logged in now!")
    if st.button("Log out"):
        cleartoken()
        st.rerun()

This mechanism ensures that the authenticated user context is preserved across interactions, and sensitive tokens remain protected within the session. The username displays in the UI after a successful login. 😀 Step 5: Create and Expose SnapLogic Triggered Task Build Backend Logic with AgentCreator Snaps With user authentication handled on the frontend, the next step is to build the backend business logic using SnapLogic AgentCreator. This toolkit lets you design AI-powered pipelines that integrate with data sources, perform intelligent processing, and return contextual responses. You can use pre-built Snaps (SnapLogic connectors) for Salesforce, OpenAI, and other services to assemble your Sales Agent pipeline. Generate the Trigger URL for API Access Once your pipeline is tested and functional, expose it as an API using a Triggered Task: In SnapLogic Designer, open your Sales Agent pipeline. Click on “Create Task” and choose “Triggered Task”. Provide a meaningful name and set runtime parameters if needed. After saving, note the generated Trigger URL—this acts as your backend endpoint to which the Streamlit app will send requests. This URL is the bridge between your authenticated frontend and the secure AI logic on SnapLogic’s platform. However, before connecting it to Streamlit, you'll need to protect it using SnapLogic API Management, which we'll cover in the next section. Step 6: Secure API with SnapLogic API Manager Introduction to API Policies: Authentication and Authorization To prevent unauthorized access to your backend, you must secure the Triggered Task endpoint using SnapLogic API Management. SnapLogic enables policy-based security, allowing you to enforce authentication and authorization using Salesforce-issued tokens. Two primary policies will be applied: Callout Authenticator and Authorize By Role. The new Policy Editor of SnapLogic APIM 3.0 Add Callout Authenticator Policy This policy validates the access token received from Salesforce. Since Salesforce tokens are opaque (not self-contained like JWTs), the Callout Authenticator policy sends the token to Salesforce’s introspection endpoint for validation. If the token is active, Salesforce returns the user's metadata (email, scope, client ID, etc.).
Example of a valid token introspection response: { "active": true, "scope": "id refresh_token openid", "client_id": "3MVG9C...", "username": "mpentzek@snaplogic.com", "sub": "https://login.salesforce.com/id/...", "token_type": "access_token", "exp": 1743708730, "iat": 1743701530, "nbf": 1743701530 } If the token is invalid or expired, the response will simply show: { "active": false } Below you can see the configuration of the Callout Authenticator Policy: Extract the domain from the username (email) returned by the Introspection endpoint after successful token validation for use in the Authorize By Role Policy. Add AuthorizeByRole Policy Once the token is validated, the Authorize By Role policy inspects the username (email) returned by Salesforce. You can configure this policy to allow access only to users from a trusted domain (e.g., @snaplogic.com), ensuring that external users cannot exploit your API. For example, you might configure the policy to check for the presence of “snaplogic” in the domain portion of the email. This adds a second layer of security after token verification and supports internal-only access models. Step 7: Connect the Streamlit Frontend to the Secured API Pass Access Tokens in HTTP Authorization Header Once the user has successfully logged in and the access token is stored in st.session_state, you can use this token to securely communicate with your SnapLogic Triggered Task endpoint. The access token must be included in the HTTP request’s Authorization header using the Bearer token scheme. headers = { 'Authorization': f'Bearer {st.session_state["SF_access_token"]}' } This ensures that the SnapLogic API Manager can validate the request and apply both authentication and authorization policies before executing the backend logic. Display API Responses in the Streamlit UI To make the interaction seamless, you can capture the user’s input, send it to the secured API, and render the response directly in the Streamlit app. Here’s an example of how this interaction might look: import requests import streamlit as st prompt = st.text_input("Ask the Sales Agent something:") if st.button("Submit"): with st.spinner("Working..."): data = {"prompt": prompt} headers = { 'Authorization': f'Bearer {st.session_state["SF_access_token"]}' } response = requests.post( url="https://your-trigger-url-from-snaplogic", data=data, headers=headers, timeout=10, verify=False # Only disable in development ) if response.status_code == 200: st.success("Response received:") st.write(response.text) else: st.error(f"Error: {response.status_code}") This fully connects the frontend to the secured backend, enabling secure, real-time interactions with your generative AI agent. Common Pitfalls and Troubleshooting Handling Expired or Invalid Tokens One of the most common issues in OAuth-secured applications is dealing with expired or invalid tokens. Since Salesforce access tokens have a limited lifespan, users who stay inactive for a period may find their sessions invalidated. To address this: Always check the token's validity before making API calls. Gracefully handle 401 Unauthorized responses by prompting the user to log in again. Implement a token refresh mechanism if your application supports long-lived sessions (requires refresh token configuration in Salesforce). By proactively managing token lifecycle, you prevent disruptions to user experience and secure API communications. Debugging OAuth Redirection Errors OAuth redirection misconfigurations can block the authentication flow. 
Here are common issues and their solutions: Incorrect Callback URL: Ensure that the SF_REDIRECT_URI in your .env file matches exactly what’s defined in the Salesforce Connected App settings. Missing Scopes: If the token does not contain expected identity fields (like email), verify that all required scopes (openid, email, profile) are included in both the app config and OAuth request. Domain Restrictions: If access is denied even after successful login, confirm that the user’s email domain matches the policy set in the SnapLogic API Manager. Logging the returned error messages and using browser developer tools can help you pinpoint the issue during redirection and callback stages. Best Practices for Secure AI Application Deployment Rotate Secrets Regularly To reduce the risk of secret leakage and potential exploitation, it's essential to rotate sensitive credentials—such as CLIENT_ID and CLIENT_SECRET—on a regular basis. Even though Streamlit’s Secrets Management securely stores these values, periodic rotation ensures resilience against accidental exposure, insider threats, or repository misconfigurations. To streamline this, set calendar reminders or use automated DevSecOps pipelines that replace secrets and update environment files or secret stores accordingly. Monitor API Logs and Auth Failures Security doesn’t stop at implementation. Ongoing monitoring is critical for identifying potential misuse or intrusion attempts. SnapLogic’s API Management interface provides detailed metrics that can help you: Track API usage per user or IP address. Identify repeated authorization failures or token inspection errors. Spot anomalous patterns such as unexpected call volumes or malformed requests. Extending the Architecture Supporting Other OAuth Providers (Google, Okta, Entra ID) While this tutorial focuses on Salesforce as the OAuth 2.0 Identity Provider, the same security architecture can be extended to support other popular providers like Google, Okta, and Microsoft Entra ID (formerly Azure AD). These providers are fully OAuth-compliant and typically offer similar endpoints for authorization, token exchange, and user introspection. To switch providers, update the following in your .env file: SF_AUTHORIZE_URL SF_TOKEN_URL SF_SCOPE (as per provider documentation) Also, make sure your app is registered in the respective provider’s developer portal and configured with the correct redirect URI and scopes. Adding Role-Based Access Controls For larger deployments, simple domain-based filtering may not be sufficient. You can extend authorization logic by incorporating role-based access controls (RBAC). This can be achieved by: Including custom roles in the OAuth token payload (e.g., via custom claims). Parsing these roles in SnapLogic’s AuthorizeByRole policy. Restricting access to specific APIs or features based on user roles (e.g., admin, analyst, viewer). RBAC allows you to build multi-tiered applications with differentiated permissions while maintaining strong security governance. Conclusion Final Thoughts on Secure AI App Deployment Securing your generative AI applications is no longer optional—especially when they’re built for enterprise use cases involving sensitive data, customer interactions, and decision automation. This tutorial demonstrated a complete security pattern using SnapLogic AgentCreator and Streamlit, authenticated via Salesforce OAuth 2.0 and protected through SnapLogic API Management. 
By following this step-by-step approach, you ensure only verified users can access your app, and backend APIs are shielded by layered authentication and role-based authorization policies. The same architecture can easily be extended to other providers or scaled across multiple AI workflows within your organization. Resources for Further Learning SnapLogic Resources and Use Cases Salesforce Developer Docs Streamlit Documentation OAuth 2.0 Official Specification With a secure foundation in place, you’re now empowered to build and scale powerful, enterprise-grade AI applications confidently.
Unlocking the Power of LLMs with OpenAPI Tool Integration
Large Language Models (LLMs) are revolutionizing the way we interact with digital systems, from conversational agents to intelligent automation. But to truly harness their capabilities, especially in enterprise and developer ecosystems, it’s essential to bridge the gap between LLMs and external systems through tools—specifically APIs. This is where OpenAPI plays a pivotal role. What is OpenAPI? OpenAPI (formerly Swagger) is an open-source specification that defines a standard, machine-readable format for describing RESTful APIs. It enables developers and automated systems to understand an API’s structure—including endpoints, request parameters, authentication methods, and response types—without relying on traditional documentation or access to source code. Its adoption spans industries such as technology, finance, and healthcare, thanks to its interoperability with a wide array of tools and frameworks. Why OpenAPI Matters for LLMs Integrating OpenAPI with LLMs enhances their ability to interact with real-world systems. Here's how: Universal Interface: OpenAPI acts as a universal bridge to RESTful APIs, making it possible for LLMs to interact with services ranging from cloud infrastructure to productivity apps. Standardized Format: The standardized schema helps LLMs accurately interpret API functionality—including expected inputs and outputs—without ambiguity. Accelerated Tool Creation: Developers can efficiently build LLM-compatible tools by parsing OpenAPI definitions directly. Seamless Integration: With broad support from API tooling ecosystems, OpenAPI enables quick embedding of LLM agents into existing workflows. Supports Tool Calling: Tool calling allows LLMs to autonomously select and invoke relevant APIs based on user prompts—a key feature unlocked by structured OpenAPI descriptions. Enabling LLM Tool Calling with SnapLogic To connect LLMs with OpenAPI-defined tools, the OpenAPI Function Generator Snap plays a crucial role. This component converts any OpenAPI spec into a tool object that LLMs can use through the Tool Calling pipeline in SnapLogic. Input Options for the Generator Snap The generator supports multiple input methods: URL: Directly fetch the OpenAPI spec from a provided URL. Text Editor: Paste the raw spec into a built-in editor. Input Document: Pass the OpenAPI string as part of an input document via expression. File Upload: Select a spec file stored in the SLDB. Output Structure The generated tool output includes: sl_tool_metadata: Metadata such as security parameters, headers, and base URLs. json_schema: A schema of the input parameters. These tools can be passed into the Tool Calling Snap, which then resolves runtime variables like headers and endpoint URLs dynamically. Developers can chain this with an HTTP Client Snap to perform real API calls based on LLM outputs. Passing Through the Tool Calling Snap When the tool is passed through the Tool Calling Snap, it dynamically processes and resolves several key components using the metadata and user input: Resolved URL: The base URL and path parameters from the OpenAPI spec are combined with user-supplied values to generate the final API endpoint. Headers: Custom headers, or content-type headers are filled in based on the OpenAPI security definitions or context provided by the LLM. This resolved output makes it simple for downstream snaps (like HTTP Client) to directly execute the API call. 
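For orientation, here is a rough Python sketch of what a generated tool object could look like for a GET /products operation. The exact field names and layout inside sl_tool_metadata and json_schema are assumptions for illustration and may differ from what the Snap actually emits.

# Hypothetical shape of a generated tool definition; the two top-level sections
# (json_schema and sl_tool_metadata) are described above, the inner fields are assumed.
tool_definition = {
    "name": "getProducts",  # assumed to be derived from the OpenAPI operationId
    "description": "Retrieve the full list of products.",
    "json_schema": {
        # schema of the input parameters the LLM must supply
        "type": "object",
        "properties": {
            "limit": {"type": "integer", "description": "Maximum number of products to return."}
        },
        "required": [],
    },
    "sl_tool_metadata": {
        # request details resolved at tool-calling time
        "base_url": "https://fakestoreapi.com",
        "path": "/products",
        "method": "GET",
        "headers": {"Content-Type": "application/json"},
    },
}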
Action tools with HTTP Client Snap Once the Tool Calling Snap generates the resolved tool data, this output can be piped directly into an HTTP Client Snap for execution: This setup effectively turns a static OpenAPI definition into a fully dynamic and executable workflow, allowing LLMs to autonomously interact with real services. Real-World Use Cases With the right configuration, LLMs can interact with virtually any OpenAPI-compliant service. This opens up a wide range of practical applications across productivity tools, developer APIs, data services, and more. Example Use Case: Load Products from FakeStore API and Save as CSV in GitHub Gist This example shows how an LLM can orchestrate a two-step integration using OpenAPI specs and tool calling via SnapLogic: Fetch Data: Retrieve product data from FakeStore API. Transform & Upload: Format the data as CSV and post it as a public GitHub Gist using GitHub’s Gist API. Main Pipeline (download) Loop Pipeline (download, github openapi file, fake store openapi file) Prompt to LLM: “Load all products from FakeStore API and upload them as a CSV file to GitHub Gist.” Pipeline Flow Breakdown Step 1: FakeStore API Tool Call OpenAPI Tool: FakeStore API spec (loaded via URL or file). LLM Task: Recognize the available /products endpoint and trigger a GET request to retrieve the full list of products. Tool Calling Snap Output: Resolved URL to https://fakestoreapi.com/products, method GET, no authentication needed. Step 2: GitHub Gist API Tool Call OpenAPI Tool: GitHub Gist API spec, with token-based authentication defined in sl_tool_metadata. LLM Task: Use the POST /gists endpoint, and construct the request body with: description: e.g., "FakeStore Products Export" public: true files: A JSON object with one file (e.g., "products.csv": { content: "<csv data>" }) Step 3: Summarize the Result LLM Task: Extract and present key details from the final Gist API response, such as: Total number of products exported Link to the created Gist (e.g., html_url) Confirmation message for the user Final Result:
Multi Pipeline Function Generator - Simplifies Agent Worker Pipeline
This article introduces a new Snap called the “Multi Pipeline Function Generator”. The Multi Pipeline Function Generator is designed to take existing Pipelines in your SnapLogic Project and turn their configurations into function definitions for LLM-based tool calling. It achieves the following: It replaces the existing chain of function generators, therefore reduces the length of the worker pipeline. Combined with our updates to the tool calling snaps, this snap allows multiple tool calling branches to be merged into a single branch, simplifying the pipeline structure. With it, users can directly select the desired pipeline to be used as a tool from a dropdown menu. The snap will automatically retrieve the tool name, purpose, and parameters from the pipeline properties to generate a function definition in the required format. Problem Statement Currently, the complexity of the agent worker pipeline increases linearly with the number of tools it has. The image below shows a worker pipeline with three tools. It requires three function generators and has three tool calling branches to execute different tools. This becomes problematic when the number of tools is large, as the pipeline becomes very long both horizontally and vertically. Current Agent Worker Pipeline With Three Tools Solution Overview One Multi Pipeline Function Generator snap can replace multiple function generators (as long as the tool is a pipeline; it's not applicable if the tool is of another type, such as OpenAPI or APIM service). New Agent Worker Pipeline Using “Multi Pipeline Function Generator” Additionally, for each outputted tool definition, it includes the corresponding pipeline's path. This allows downstream components (the Pipeline Execute snap) to directly call the respective tool pipeline with the path, as shown below. The Multi Pipeline Function Generator snap allows users to select multiple tool pipelines at once through dropdown menus. It reads the necessary data for generating function definition from the pipeline properties. Of course, this requires that the data has been set up in the pipeline properties beforehand (will be explained later). The image below shows the settings for this snap. Snap Settings How to Use the Snap To use this snap, you need to: Fill in the necessary information for generating the function definition in the properties of your tool pipeline. The pipeline's name will become the function name The information under 'info -> purpose' will become the function description. Each key in your OpenAPI specification will be treated as a parameter, so you will ALSO need to add the expected input parameters to the list of pipeline parameters. Please note that in the current design, the pipeline parameters specified here are solely used for generating the function definition. When utilizing parameters within the pipeline, you do not need to retrieve their values using pipeline parameters. Instead, you can directly access the argument values from the input document, as determined by the model based on the function definition. Then, you can select this pipeline as a tool from the dropdown menu in the Multi Pipeline Function Generator snap. In the second output of the tool calling snap, we only need to keep one branch. In the pipeline execute snap, we can directly use the expression $sl_tool_metadata.path to dynamically retrieve the path of the tool pipeline being called. See image below. Below is an example of the pipeline properties for the tool 'CRM_insight' for your reference. 
Below is the settings page of the original function generator snap for comparison. As you can see, the information required is the same. The difference is that now we directly fill this information into the pipeline's properties. Step 3 - reduce the number of branches More Design Details The tool calling snap has also been updated to support $sl_tool_metadata.path, since the model's initial response doesn't include the pipeline path which is needed. After the tool calling snap receives the tools the model needs to call, it adds the sl_tool_metadata containing the pipeline path to the model's response and outputs it to the snap's second output view. This allows us to use it in the pipeline execute snap later. This feature is supported for tool calling with Amazon Bedrock, OpenAI, Azure OpenAI, and Google GenAI snap packs. The pipeline path can accept either a string or a list as input. By turning on the 'Aggregate input' mode, multiple input documents can be combined into a single function definition document for output, similar to that of a gate snap. This can be useful in scenarios like this: you use a SnapLogic list snap to enumerate all pipelines within a project, then use a filter snap to select the desired tool pipelines, and finally use the multi pipeline function generator to convert this series of pipelines into function definitions. Example Pipelines Download here. Conclusion In summary, the Multi Pipeline Function Generator snap streamlines the creation of function definitions for pipeline as tool in agent worker pipelines. This significantly reduces pipeline length in scenarios with numerous tools, and by associating pipeline information directly with the pipeline, it enhances overall manageability. Furthermore, its applicability extends across various providers.
Hello World with GenAI
GenAI is a powerful toolset designed to help you develop and optimize large language models (LLMs) such as OpenAI, Claude, Google Gemini, and more, within your own data pipelines on the SnapLogic platform. By leveraging SnapLogic Designer, you can seamlessly integrate LLMs with your data pipeline and store the responses on various data platforms supported by SnapLogic. Get Started This section will guide you through initiating your first interaction with LLM models on the SnapLogic platform. Follow these steps to get familiar with GenAI. 1. Open the Designer page on SnapLogic and create a new, empty pipeline. 2. Introduce the new Snap called “Chat Completion.” This Snap allows you to send prompts to LLM models. Locate it in the Snap list by searching for “Chat Completion.” 3. Drag the Chat Completion Snap to the canvas screen (we’ll use OpenAI as an example). 4. Configuring the Chat Completion Snap a. Click on the Snap to open the configuration modal. b. In the “Account” tab, select the account that will be used to connect to the LLM model. i. (Optional) If necessary, you can create a new account by selecting “Add New Account,” choosing the location, and entering your credentials. You can retrieve your API key from OpenAI API Keys. 5. Move to the “Settings” tab. a. Select a model from the list of available options by clicking to list the available models. b. Customize the “prompt” field, e.g., by entering “Hello GenAI.” 6. Save and close the settings. The pipeline should validate automatically, but if it doesn’t, you can manually validate it by clicking the validate button . 7. To check the result, click the circle to the right of the Snap. The LLM model’s response will appear in the “content” field. For example, it might return “Hello! How can I assist you today?” The “finish_reason” field will indicate “stop,” meaning the full chat completion has been generated. Using Prompt Generator In real-world scenarios, prompts are often more complex than a single sentence. To handle this, we introduced the “Prompt Generator” Snap, which helps create prompts using preset templates such as context Q&A. Let’s explore how to use this feature. 1. Search for “Prompt Generator” in the side panel and drag it to the canvas screen. In this example, we will use the OpenAI Prompt Generator. 2. To simulate mock input, drag the “JSON Generator” Snap to the canvas screen. 3. Connect the two Snaps as shown below: 4. Configure the JSON Generator to output the following data: [ { "occupation": "astronaut", "situation": "On a distant planet, exploring uncharted terrain." } ] The setting should be like the below: 5. Save and close the configuration. 6. Next, configure the Prompt Generator: a. Click on Edit prompt and remove any existing text. b. Paste the following text into the prompt box: “You are a {{occupation}}. {{situation}}. What action do you take?” 7. Save and close the configuration. 8. To test the prompt, validate the pipeline by clicking the validate button . The result will appear in the prompt field, showing how the template text is replaced with values from the input. Example Input: { "occupation": "astronaut", "situation": "On a distant planet, exploring uncharted terrain." } Template: “You are a {{occupation}}. {{situation}}. What action do you take?” Output: “You are a astronaut. On a distant planet, exploring uncharted terrain.. What action do you take?” Example Use Cases Below are some examples of how to use prompts to accomplish real tasks. You can modify these templates to suit your needs. 
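As a conceptual aside before the examples, the substitution performed by the Prompt Generator can be pictured with the following sketch. This is only an illustration of the mustache-style replacement shown above, not the Snap's actual implementation.

import re

def render_prompt(template, document):
    """Replace {{field}} placeholders with values from the input document."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda match: str(document.get(match.group(1), match.group(0))),
        template,
    )

template = "You are a {{occupation}}. {{situation}}. What action do you take?"
doc = {"occupation": "astronaut", "situation": "On a distant planet, exploring uncharted terrain."}

print(render_prompt(template, doc))
# You are a astronaut. On a distant planet, exploring uncharted terrain.. What action do you take?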
Text Summarization Prompt: Explain AI A: Output: Artificial Intelligence (AI) refers to the simulation of human intelligence processes by computer systems. These processes include learning (the acquisition of information and rules for using it), reasoning (using rules to reach approximate or definite conclusions), and self-correction. AI can be classified into several categories: **Narrow AI**: Also known as weak AI, this type of AI is designed for specific tasks. Examples include virtual assistants like Siri and Alexa, recommendation systems, and image recognition software. Narrow AI does not possess general intelligence—it operates within a limited context. **General AI**: Also known as strong AI or AGI (Artificial General Intelligence), this type refers to a hypothetical AI that can understand, learn, and apply knowledge across a wide range of tasks, much like a human. As of now, AGI remains a theoretical concept and has not been achieved. **Machine Learning (ML)**: A subset of AI that involves the development of algorithms that enable computers to learn from and make predictions or decisions based on data. ML can be further divided into supervised learning, unsupervised learning, and reinforcement learning. **Deep Learning**: A specialized form of machine learning that uses neural networks with many layers (hence "deep"), allowing the system to learn from vast amounts of data. It is particularly effective in areas like image and speech recognition. **Natural Language Processing (NLP)**: This area of AI focuses on the interaction between computers and humans through natural language. It enables machines to understand, interpret, and respond to human language in a way that is both meaningful and useful. AI has numerous applications across various industries, including healthcare (diagnosing diseases), finance (fraud detection), transportation (self-driving cars), and entertainment (recommendation systems). While the potential benefits of AI are vast, there are also concerns related to ethics, privacy, job displacement, and the need for regulation to ensure its responsible use. To summarize the explanation into one sentence: Prompt: Artificial Intelligence (AI) refers to the simulation of human intelligence processes by computer systems. These processes include learning (the acquisition of information and rules for using it), reasoning (using rules to reach approximate or definite conclusions), and self-correction. AI can be classified into several categories: **Narrow AI**: Also known as weak AI, this type of AI is designed for specific tasks. Examples include virtual assistants like Siri and Alexa, recommendation systems, and image recognition software. Narrow AI does not possess general intelligence—it operates within a limited context. **General AI**: Also known as strong AI or AGI (Artificial General Intelligence), this type refers to a hypothetical AI that can understand, learn, and apply knowledge across a wide range of tasks, much like a human. As of now, AGI remains a theoretical concept and has not been achieved. **Machine Learning (ML)**: A subset of AI that involves the development of algorithms that enable computers to learn from and make predictions or decisions based on data. ML can be further divided into supervised learning, unsupervised learning, and reinforcement learning. **Deep Learning**: A specialized form of machine learning that uses neural networks with many layers (hence "deep"), allowing the system to learn from vast amounts of data. 
It is particularly effective in areas like image and speech recognition. **Natural Language Processing (NLP)**: This area of AI focuses on the interaction between computers and humans through natural language. It enables machines to understand, interpret, and respond to human language in a way that is both meaningful and useful. AI has numerous applications across various industries, including healthcare (diagnosing diseases), finance (fraud detection), transportation (self-driving cars), and entertainment (recommendation systems). While the potential benefits of AI are vast, there are also concerns related to ethics, privacy, job displacement, and the need for regulation to ensure its responsible use. Explain the above in one sentence: Output: Artificial Intelligence (AI) encompasses the simulation of human cognitive processes by computer systems, classified into categories such as Narrow AI for specific tasks, General AI for broad understanding, Machine Learning (ML) for data-driven learning, Deep Learning for processing large datasets with neural networks, and Natural Language Processing (NLP) for human-computer language interaction, with applications across various industries alongside ethical and regulatory concerns. Information Extraction Prompt: Author-contribution statements and acknowledgements in research papers should state clearly and specifically whether, and to what extent, the authors used AI technologies such as ChatGPT in the preparation of their manuscript and analysis. They should also indicate which LLMs were used. This will alert editors and reviewers to scrutinize manuscripts more carefully for potential biases, inaccuracies and improper source crediting. Likewise, scientific journals should be transparent about their use of LLMs, for example when selecting submitted manuscripts. Mention the large language model based product mentioned in the paragraph above: Output: The large language model based product mentioned in the paragraph above is ChatGPT. Question Answering Prompt: Answer the question based on the context below. Keep the answer short and concise. Respond "Unsure about answer" if not sure about the answer. Context: Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use. Question: What was OKT3 originally sourced from? Answer: Output: Mice. Text Classification Prompt: Classify the text into neutral, negative or positive. Text: I think the food was okay. Sentiment: Output: Neutral Conversation Prompt: The following is a conversation with an AI research assistant. The assistant tone is technical and scientific. Human: Hello, who are you? AI: Greeting! I am an AI research assistant. How can I help you today? Human: Can you tell me about the creation of blackholes? AI: Output: Sure! Black holes are regions of spacetime where the gravitational force is so strong that nothing, not even light, can escape from it. They are created when a very massive star dies and its core collapses in on itself, forming a singularity of infinite density. The intense gravity of the singularity pulls in all the matter and radiation around it, creating the black hole. 
Code Generation Prompt: """ Table departments, columns = [DepartmentId, DepartmentName] Table students, columns = [DepartmentId, StudentId, StudentName] Create a MySQL query for all students in the Computer Science Department """ Output: SELECT StudentId, StudentName FROM students WHERE DepartmentId IN (SELECT DepartmentId FROM departments WHERE DepartmentName = 'Computer Science'); Reasoning Prompt: The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. Solve by breaking the problem into steps. First, identify the odd numbers, add them, and indicate whether the result is odd or even. Output: Odd numbers: 15, 5, 13, 7, 1 Sum: 41 41 is an odd number.
A Comprehensive Guide to Integrate Azure AI Search to Azure OpenAI
The Retrieval-Augmented Generation (RAG) pipeline has gained significant traction in recent years. Large Language Models (LLMs) leverage domain-specific knowledge through the RAG mechanism to generate insightful and contextually relevant responses. Implementing a RAG pipeline requires a well-architected infrastructure, including vector databases and a data ingestion pipeline to efficiently transfer information from source systems to the database. Azure offers Azure AI Search, a fully managed RAG solution that simplifies implementation and reduces operational complexity. As an enterprise-grade information retrieval system, Azure AI Search processes heterogeneous content, indexes data for optimized retrieval, and delivers relevant information through queries and applications, and it is the recommended retrieval system for developing RAG-based applications on Azure. It features native LLM integrations with Azure OpenAI Service and Azure Machine Learning, supports custom model integration mechanisms, and offers multiple relevance-tuning strategies to enhance search effectiveness. To further streamline RAG implementation, SnapLogic facilitates seamless integration between Azure AI Search and Azure OpenAI, enabling organizations to build intelligent and efficient AI-powered applications. The following guideline outlines the steps required to achieve this integration. Basic of Using Azure AI Search The following steps provide a structured approach to setting up and utilizing Azure AI Search for indexing and querying data. Step 1: Set Up Azure AI Search Subscribe to Azure AI Search via the Azure portal. Import Data: Click "Import Data" and select the data source you want to integrate. Create an Index: Define an index on the data source. Ensure the field you want Azure AI Search to process has the searchable attribute enabled. Configure the Indexer: Complete the setup wizard to create an indexer. Once the indexer is created, your Azure AI Search instance is ready to use. Step 2: Configure the Azure Data Source AI Snap To enable seamless integration between Azure AI Search and Azure OpenAI, follow these steps to configure the Azure Data Source AI Snap in Snaplogic Designer: Configure the Snap Account Set up the Azure Data Source AI Snap by configuring the associated account. Provide the AI Search Endpoint Enter the Azure AI Search endpoint to the snap settings. Specify the Index Name Define the index name that will be used for searching. Field Mapping (Optional) An array of values. A comma-separated string. Users can optionally provide a field mapping configuration to define relationships between different fields. Specify which field represents the title and which field contains the content. The content field can be either: The title field must be a string. Set the Query Type (Optional) The default query type is set to "simple". Users can modify this setting as needed or retain the default value. Connect to Azure Chat Completion Once configured, connect the Data Source AI Search Snap to Azure ChatCompletion to enable intelligent query responses using Azure OpenAI. Step 3: Configure Azure OpenAI Chat Completion To complete the integration and enable AI-powered responses, follow these steps to configure Azure OpenAI Chat Completion: Configure the Data Source Field Use the output of the Azure Data Source AI Search Snap as the input for Azure OpenAI Chat Completion. Provide the Prompt Define the prompt that you want to use for querying the AI model. 
The prompt should be designed to leverage the retrieved data effectively. Execute the Pipeline Run the pipeline to process the query. The ChatCompletion Snap will generate responses based on Azure AI Search results. The output will include a "citations" field, indicating the source of the retrieved information. Using Vector Queries in Azure AI Search Step 1: Set Up Azure AI Search Subscribe to Azure AI Search Access the Azure Portal and create an Azure AI Search service. Import and Vectorize Data Click "Import Data and Vectorize Data" and select the data source to be integrated. Embed Data into Vectors (if applicable) To enable vector search, data must be converted into vector embeddings using an embedding model. If your dataset already contains vectorized data, you can integrate it directly without re-vectorizing. Verify Index Creation After completing the index setup, the vector field will be visible in the index schema. Step 2: Configure the Azure Data Source AI Snap To enable seamless integration between Azure AI Search and Azure OpenAI, configure the Azure Data Source AI Snap by following these steps: Configure the Snap Account Set up the Azure Data Source AI Snap by configuring the associated SnapLogic account. Provide the Azure AI Search Endpoint Enter the Azure AI Search endpoint to establish the connection. Specify the Index Name Define the index name that will be used for vector-based searching. Configure Field Mapping (Optional) Users can define field mappings to specify relationships between different fields. Assign a title field and a content field: The content field can be: A list (array) of values. A comma-separated string. The title field must be a string. For vector-based queries, specify the vector field to inform Azure AI Search which field to use for vector comparisons. The vector field can be: A string. A list of strings. A comma-separated string. Set the Query Type Specify the query type as "vector" to enable vector-based searches. Connect to Azure Chat Completion Once configured, connect the Azure Data Source AI Search Snap to Azure ChatCompletion to enable AI-powered responses using Azure OpenAI. Step 3: Configure Azure OpenAI Chat Completion To complete the integration and enable AI-powered responses, follow these steps to configure Azure OpenAI Chat Completion: Configure the Data Source Field Use the output of the Azure Data Source AI Search Snap as the input for Azure OpenAI Chat Completion. Set Up the Embedding Model Dependency Ensure that the same embedding model used to vectorize the data is referenced in Azure OpenAI. This step is crucial for accurate vector similarity comparisons and retrieval performance. Provide the Prompt Define the prompt that will be used for querying the AI model. Ensure the prompt is structured to effectively leverage retrieved vector-based data for optimal AI responses. Using Semantic Queries in Azure AI Search Step 1: Set Up Azure AI Search Access Azure AI Search Navigate to the Azure AI Search service in the Azure Portal. Select the Index Choose the index you want to use for semantic search. Create a Semantic Configuration Define a new semantic configuration for the selected index. Configure Semantic Fields Specify the required fields: Title Field – Represents the document title. Content Field – Contains the main body of the document. Keywords Field – Includes key terms for enhanced semantic matching. Save the Configuration Once all fields are assigned, save the configuration. Your index is now ready for semantic search. 
Step 2: Configure the Azure Data Source AI Search Snap Change the Query Type Set the query type to "semantic" to enable semantic search capabilities. Specify the Semantic Configuration Enter the semantic configuration name created in Azure AI Search. Connect to Azure OpenAI Chat Completion Link the Azure Data Source AI Search Snap to Azure OpenAI ChatCompletion. This integration allows semantic search to enhance the accuracy and relevance of AI-generated responses. Customizing Search Results in Azure AI Search To further refine and enhance search accuracy and relevance, Azure AI Search allows users to customize their search queries with hybrid query types and filters. Hybrid Query Types Azure AI Search supports hybrid search, which allows combining different query types to improve search results: Hybrid of Vector and Simple Queries This combines vector-based similarity with traditional keyword-based search, ensuring both semantic relevance and text-based keyword matching. Hybrid of Vector and Semantic Queries This approach enhances vector similarity search with semantic ranking, enabling context-aware results with better relevance scoring. To enable hybrid search: Set the query type to either: "vector_simple_hybrid" "vector_semantic_hybrid" This ensures search results are a blend of the two selected query types. Applying Search Filters Filters help narrow down search results to match specific conditions or constraints. Steps to Apply Filters: Define a Filter Condition Use filters to restrict results based on specific criteria, such as date ranges, categories, or custom attributes. Please refer to the Azure AI Search documentation for the filter syntax (https://learn.microsoft.com/en-us/azure/search/search-filters) Ensure Index Fields are Filterable Filters only work if the index fields have the filterable attribute enabled. Before applying filters, verify that the selected index supports filtering. Integrate the Filter in Your Query Apply custom filters to refine search results based on your requirements. Conclusion Integrating Azure AI Search with Azure OpenAI unlocks powerful capabilities for retrieval-augmented generation (RAG), enabling organizations to build intelligent, AI-powered applications with enhanced search functionality. By leveraging vector, semantic, and hybrid search queries, businesses can optimize information retrieval and improve the relevance of AI-generated responses. This guide has outlined the key steps to: Set up Azure AI Search, including configuring vector and semantic search. Integrate Azure Data Source AI Search Snap, enabling seamless data retrieval. Configure Azure OpenAI Chat Completion, ensuring AI-generated responses are contextually aware and accurate. Customize search results using hybrid search queries and filtering mechanisms to refine and enhance query outcomes. By following these steps, organizations can maximize the effectiveness of Azure AI Search and OpenAI, improving search relevance, accuracy, and AI-driven insights for a wide range of applications. With scalability, flexibility, and advanced AI integration, this solution is ideal for businesses looking to deploy cutting-edge enterprise search and AI-driven automation.
A Comparison of Assistant and Non-Assistant Tool Calling Pipelines
Introduction At a high level, the logic behind assistant tool calling and non-assistant tool calling is fundamentally the same: the model instructs the user to call specific function(s) in order to answer the user's query. The user then executes the function and returns the result to the model, which uses it to generate an answer. This process is identical for both. However, since the assistant specifies the function definitions and access to tools as part of the Assistant configuration within the OpenAI or Azure OpenAI dashboard rather than within your pipelines, there will be major differences in the pipeline configuration. Additionally submitting tool responses to an Assistant comes with significant changes and challenges since the Assistant owns the conversational history rather than the pipeline. This article focuses on contrasting these differences. For a detailed understanding of assistant pipelines and non-assistant pipelines, please refer to the following article: Non-assistant pipelines: Introducing Tool Calling Snaps and LLM Agent Pipelines Assistant pipelines: Introducing Assistant Tool Calling Pipelines Part 1: Which System to Use: Non-Assistant or Assistant? When to Use Non-Assistant Tool Calling Pipelines: Non-Assistant Tool Calling Pipelines offer greater flexibility and control over the tool calling process, making them suitable for the following specific scenarios. When preferring a “run-time“ approach: Non-Assistant pipelines exhibit greater flexibility in function definition, offering a more "runtime" approach. You can dynamically adjust the available functions by simply adding or removing Function Generator snaps within the pipeline. In contrast, Assistant Tool Calling Pipelines necessitate a "design-time" approach. All available functions must be pre-defined within the Assistant configuration, requiring modifications to the Assistant definition in the OpenAI/Azure OpenAI dashboard. When wanting detailed chat history: Non-Assistant pipelines provide a comprehensive history of the interaction between the model and the tools in the output message list. The message list within the Non-Assistant pipeline preserves every model response and the results of each function execution. This detailed logging allows for thorough debugging, analysis, and auditing of the tool calling process. In contrast, Assistant pipelines maintain a more concise message history, focusing on key steps and omitting some intermediate details. While this can simplify the overall view of the message list, it can also make it more difficult to trace the exact sequence of events or diagnose issues that may arise during tool execution in child pipelines. When needing easier debugging and iterative development: Non-Assistant pipelines facilitate more granular debugging and iterative development. You can easily simulate individual steps of the agent by making calls to the model with specific function call histories. This allows for more precise control and experimentation during development, enabling you to isolate and address issues more effectively. For example, by providing three messages, we can "force" the model to call the second tool, allowing us to inspect the tool calling process and its result against our expectations. In contrast, debugging and iterating with Assistant pipelines can be more cumbersome. 
Since Assistants manage the conversation history internally, to simulate a specific step, you often need to replay the entire interaction from the beginning, potentially requiring multiple iterations to reach the desired state. This internal management of history makes it less straightforward to isolate and debug specific parts of the interaction. To simulate calling the third tool, we need to start a new thread from scratch and then call tool1 and tool2, repeating the preceding process. The current thread cannot be reused.

When to Use Assistant Tool Calling Pipelines: Assistant Tool Calling Pipelines also offer a streamlined approach to integrating LLMs with external tools, prioritizing ease of use and built-in functionalities. Consider using Assistant pipelines in the following situations:
For simplified pipeline design: Assistant pipelines reduce pipeline complexity by eliminating the need for Tool Generator snaps. In Non-Assistant pipelines, these snaps are essential for dynamically generating tool definitions within the pipeline itself. With Assistant pipelines, tool definitions are configured beforehand within the Assistant settings in the OpenAI/Azure OpenAI dashboard. This pre-configuration results in shorter, more manageable pipelines, simplifying development and maintenance.
When leveraging built-in tools is required: If your use case requires functionalities like searching external files or executing code, Assistant pipelines offer these capabilities out-of-the-box through their built-in File Search and Code Interpreter tools (see Part 5 for more details). These tools provide a convenient and efficient way to extend the LLM's capabilities without requiring custom implementation within the pipeline.

Part 2: A brief introduction to two pipelines
Non-assistant tool calling pipelines. Key points: Functions are defined in the worker. The worker pipeline's Tool Calling snap manages all model interactions. Function results are collected and sent to the model in the next iteration via the Tool Calling snap.
Assistant tool calling pipelines. Key points: No need to define functions in any pipeline; functions are pre-defined in the assistant. Two snaps interact with the model: Create and Run Thread, and Submit Tool Outputs. Function results are collected and sent to the model immediately during the current iteration.

Part 3: Comparison between two pipelines
Here are two primary reasons why the assistant and non-assistant pipelines differ, listed in decreasing order of importance:
Distinct methods of submitting tool results: For non-assistant pipelines, tool results are appended to the message history list and subsequently forwarded to the model during the next iteration. Non-assistant pipelines exhibit a "while-loop" behavior, where the worker interacts with the model at the beginning of the iteration, and while any tools need to be called, the worker executes those tool(s). In contrast, for assistants, tool results are specifically sent to a dedicated endpoint designed to handle tool call results within the current iteration. The assistant pipelines operate more like a "do-while-loop": the driver initiates the interaction by sending the prompt to the model; subsequently, the worker executes the tool(s) first and interacts with the model at the end of the iteration to deliver tool results.
Predefined and stored tool definitions for assistants: Unlike non-assistant pipelines, assistants have the capability to predefine and store function definitions.
This eliminates the need for the three Function Generator snaps to repeatedly transmit tool definitions to the model with each request. Consequently, the worker pipeline for assistants appears shorter. Due to the aforementioned differences, non-assistant pipelines have only one interaction point with the model, located in the worker. In contrast, assistant pipelines involve two interaction points: the driver sends the initial prompt to the model, while the worker sends tool results back to the model.

Part 4: Differences in snap settings
Stop condition of Pipeloop: A key difference in snap settings lies in the stop condition of the pipeloop. The Assistant pipeline's stop condition is $run.required_action == null. The Non-assistant pipeline's stop condition is $finish_reason != "tool_calls".
Assistant's output. Example when tool calls are required: Example when tool calls are NOT required:
Non-assistant's output. Example when tool calls are required: Example when tool calls are NOT required:

Part 5: Assistant's two built-in tools
The assistant not only supports all functions that can be defined in non-assistant pipelines but also provides two special built-in functions, file search and code interpreter, for user convenience. If the model determines that either of these tools is required, it will automatically call and execute the tool within the assistant without requiring manual user intervention. You don't need a tool call pipeline to experiment with file search and code interpreter; a simple Create and Run Thread snap is sufficient.
File search: File Search augments the Assistant with knowledge from outside its model, such as proprietary product information or documents provided by your users. OpenAI automatically parses and chunks your documents, creates and stores the embeddings, and uses both vector and keyword search to retrieve relevant content to answer user queries. Example Prompt: What is the number of federal fires between 2018 and 2022? The assistant's response is as below: The assistant's response is correct, as the answer to the prompt is in the first row of a table on the first page of wildfire_stats.pdf, a document accessible to the assistant via a vector store. Answer to the prompt: The file is stored in a vector store used by the assistant:
Code Interpreter: Code Interpreter allows Assistants to write and run Python code in a sandboxed execution environment. This tool can process files with diverse data and formatting, and generate files with data and images of graphs. Code Interpreter allows your Assistant to run code iteratively to solve challenging code and math problems. When your Assistant writes code that fails to run, it can iterate on this code by attempting to run different code until the code execution succeeds. Example Prompt: Find the number of federal fires between 2018 and 2022 and use Matplotlib to draw a line chart. * Matplotlib is a python library for creating plots. The assistant's response is as below: From the response, we can see that the assistant indicated it used file search to find 5 years of data and then generated an image file. This file can be downloaded from the assistant's dashboard under storage-files. Simply add a file extension like .png to see the image. Image file generated by assistant:

Part 6: Key Differences Summarized
Function Definition: Non-Assistant pipelines define functions within the worker pipeline using Function Generator snaps; Assistant pipelines use definitions that are pre-defined and stored within the Assistant configuration in the OpenAI/Azure OpenAI dashboard.
Tool Result Submission: Non-Assistant pipelines append tool results to the message history and send them to the model in the next iteration; Assistant pipelines send them to a dedicated endpoint within the current iteration.
Model Interaction Points: Non-Assistant pipelines have one (in the worker pipeline); Assistant pipelines have two (the driver sends the initial prompt, the worker sends tool results).
Built-in Tools: Non-Assistant pipelines have none; Assistant pipelines provide File Search and Code Interpreter.
Pipeline Complexity: Non-Assistant pipelines have a more complex structure due to function definition within the pipeline; Assistant pipelines have a simpler structure as functions are defined externally.

Introducing Tool Calling Snaps and LLM Agent Pipelines
Introduction
In this article, we will be introducing the following. Part 1: Four new classes of snaps for LLM function calling: Function Generator, Tool Calling, Function Result Generator, and Message Appender, which have been developed specifically for tool calling. Part 2: The Function Calling pipeline, to demonstrate how the new Function Calling snaps work together to perform LLM function calling. Part 3: Using the PipeLoop Snap to orchestrate Agent pipelines: iteratively call the Function Calling pipeline until the model generates a final result or meets other termination conditions, to perform agentic workflows.

Part 1: Introducing 4 new snap classes for tool calling
Function Generator Snap: creates a function definition.
Tool Calling Snap: sends the tool calling request to the model and retrieves the LLM model response.
Function Result Generator Snap: formats the result of a tool run to be sent back to the LLM.
Message Appender Snap: appends the tool results to the messages payload array.

Function Generator Snap
The Function Generator Snap facilitates the creation of a Tool definition, enabling the model to understand and utilize the available tools. Sample Output:

Tool Calling Snap
The Tool Calling Snap forwards user input and tool specifications to the model's API, receiving the model's generated output in return. This snap has 2 output views: the first view outputs the full response from the model along with the complete message payload, including the model's current response; the second view outputs the list of tools to call. In the OpenAI and Azure OpenAI Tool Calling Snap, a JSON argument field is added by SnapLogic, whose value is a JSON object derived by converting the string-formatted arguments of the tool call in the model's response. Sample Input: Sample Output - LLM Response View: Sample Output - Tool Calls View:

Function Result Generator Snap
The Function Result Generator Snap formats the results generated by user-invoked functions into a custom data output structure defined within SnapLogic. Different models have different requirements for the data type of the Content field. For example, Bedrock Converse requires Content to be a string or a JSON object, while OpenAI requires Content to be a string only. The Snap will stringify the content from the user if the format in the field is not supported. Sample Input: Sample Output:

Message Appender Snap
The Message Appender Snap adds the results of tool runs to the message list, serving as input for subsequent tool calls. Sample Input - First Input View - Messages Sample Input - Second Input View - Tool Result Sample Output
By leveraging the four new Snaps, we will be able to create pipelines that are capable of LLM function calling, which we will refer to as Function Calling Pipelines.

Part 2: Function Calling Pipeline Example
This pipeline demonstrates how to use the new snaps to perform LLM function calling.
Setup
We will be using the following four snaps for LLM Function calling: Function Generator Snap, Tool Calling Snap, Function Result Generator Snap, and Message Appender Snap.
The function calling pipeline incorporates two tools as pipelines:
get_current_weather (using PipeExec): This pipeline retrieves weather information for a given location. Pipeline setup: an HTTP Client that connects to the weatherapi endpoint, and a mapper that passes the JSON output to the content field.
foo_tool: A toy tool that outputs “foo” as the result, used to demonstrate multi-tool calling capabilities.
Pipeline setup: a mapper that outputs “foo” in the content output.

Execution Flow
The execution flow of this pipeline follows these steps:
The user provides the prompt (wrapped in a messages payload) in a JSON Generator Snap and creates tool definitions using the Function Generator Snap, which are then sent to the LLM through the Tool Calling Snap.
The Chat completions view of the Tool Calling Snap outputs the response from the LLM and adds the current response from the LLM into the messages payload; it is connected to the first input of the Message Appender Snap for processing, while the Tool calls view is connected to a router to pass tool calls to the individual tools.
The tools are invoked, then the results are formatted by the Function Result Generator Snap.
The Message Appender Snap collects and appends all tool invocation results to the messages array from the Chat completions view output of the Tool Calling Snap and outputs the modified messages array.
The output of the Message Appender contains the message history from the user prompt, the LLM model responses, and the tool calling output, which marks the end of this round of tool calling.

Part 3: Agent Pipelines
To orchestrate LLM function calling pipelines, or LLM Agent Pipelines, we introduce two patterns as pipelines to enable this functionality.
Agent Driver Pipeline: The Agent Driver Pipeline leverages the PipeLoop Snap to allow iterative executions on a single pipeline. The prompt input is defined and then sent to the Agent Worker Pipeline (a Function Calling pipeline). The output of the Function Calling pipeline is then collected and sent again as the second iteration input of the Function Calling pipeline; the iteration continues until the stop condition set in PipeLoop is met or the iteration limit is reached.
Agent Worker Pipeline: The Agent Worker Pipeline is similar to a Function Calling pipeline; the only difference is that the message payload is sent from the Agent Driver Pipeline through the PipeLoop Snap instead of a JSON Generator snap.

Agent Pipeline Example - get_weather
This example demonstrates a weather checking assistant. This agent is equipped with a single tool, get_weather, which retrieves the weather data of a given location.
Agent Driver Pipeline
In this example, the user will provide a payload like below, which asks about the weather of a given location (mocked here using a JSON Generator Snap): { "prompt": "What's the weather in San Francisco?" }
The system prompt for this weather assistant is then defined in the first Prompt Generator: "You are a helpful weather assistant that will answer questions about the weather of a given location. You will be assigned with a tool to check the weather of the location."
The user prompt for this case is simply the prompt payload from the user, which we will pass to the Agent Worker Pipeline through the PipeLoop Snap. We will stop the PipeLoop execution when the finish reason of the LLM is stop or end_turn (depending on the LLM model).
Agent Worker Pipeline
In the Agent Worker Pipeline, the flow follows these steps.
First Iteration: Create function definitions for the tools to be called, in this case the get_weather function. Pass the message payload (system and user prompts) and the tools payload (function definitions) to the Tool Calling Snap. The Tool Calling Snap will then decide to either call a tool or generate a result. In the first case, it will return a tool call decision for the pipeline to process.
[ { "toolUse": { "toolUseId": "tooluse_YOLmGccxRGWPmCKqxAKvgw", "name": "get_current_weather", "input": { "location": "San Francisco, CA", "unit": "fahrenheit" } } } ] The Worker pipeline will then diverge into two branches. The first branch will pass the messages payload of this round to the Message Appender Snap, and the second branch will pass the tool call request to the tool to invoke a call and get the weather of San Francisco. The result of the tool call will be collected and formatted by the Function Result Generator Snap, then passed to the Message Appender Snap so that the the Tool Call result can be added into the Message Payload. For this round, the finish reason of the LLM is tool_use , which means the execution should continue, and the output of the Message Appender will be sent directly to the input of the Agent Worker Pipeline. Message Appender Output Second Iteration: The updated message payload is then sent again with the function definitions to the Tool Calling Snap, the Tool Calling Snap for this round will then generate a result since it has retrieved the weather of San Francisco. The Tool Call output of the Tool Calling Snap will be empty for this round since no tool calls are required for this iteration. The message payload is sent directly to the Message Appender Snap, and the finish reason of the LLM is end_turn , which means the LLM has successfully carried out the request. PipeLoop execution will stop and the result will be sent to the output of the PipeLoop Snap in the Agent Driver Pipeline. And the execution is finished. Summary In this article, we have introduced the new Snaps for Tool calling - Function Generator, Tool Calling, Function Result Generator, and Message Appender. We have also talked about how to create tool-calling pipelines and Agent Pipeline patterns. Happy building!1.1KViews2likes0CommentsIntroducing Assistant Tool Calling Pipelines
Introduction
OpenAI and Azure OpenAI assistants can invoke models and utilize tools to accomplish tasks. This article primarily focuses on constructing pipelines to leverage the tool-calling capabilities of an existing assistant. Given the substantial similarity in assistant tool calling between OpenAI and Azure versions, the examples provided in this article are applicable to both platforms. In part 1, we'll provide a simple introduction to creating an assistant in the OpenAI Dashboard and adding user-defined tools for subsequent pipeline use. We'll provide all the necessary data and files. In part 2, we'll demonstrate two questions and their corresponding assistant responses to illustrate the types of tools the assistant can call itself, or require the user to call, in order to answer queries. In part 3, we'll introduce two new snaps, Tool Call Router and Submit Tool Outputs, along with upgrades to the existing two snaps, Run Thread and Create and Run Thread. In part 4, we'll delve into the pipeline workflow and the specific configurations required for setting up the snaps.

Part 1: Prerequisite - Set Up An Assistant in OpenAI Dashboard
OpenAI and Azure OpenAI assistants manage the system prompt, the model used to generate responses, tools (including file search, code interpreter, and other user-defined tools), and model configuration such as temperature and response format. Here we will only introduce the most basic settings, and you can adjust them according to your needs. Please refer to the OpenAI and Azure OpenAI documentation for more information.
Navigate to the OpenAI Dashboard: Go to the OpenAI dashboard - assistants and click the "Create" button in the top right corner to initiate the process of creating a new assistant.
Name Your Assistant: Provide a name for your new assistant. You can choose any name you prefer, such as "Test Assistant".
System Instruction (Optional): You can optionally provide a system instruction to guide the assistant's behavior. For now, let's skip this step.
Select a Model: Choose the model you want to use for your assistant. In this case, we'll select "gpt-4o-mini".
Enable Tools: Enable the "file search" tool. File Search is an OpenAI-provided managed RAG service. Using this tool allows the model to retrieve information relevant to the query from the vector store and use it to answer. In this case, please create a new vector store, upload the wildfire_stats.pdf file to the vector store, and add the vector store to the assistant. Enable the "code interpreter" tool. The code interpreter is also a built-in tool within the OpenAI assistant. It can run the code produced by the model directly and provide the output.
Create three custom functions with the following schema (an illustrative example of one such schema is sketched at the end of this part). By providing these definitions, we are enabling the model to identify which user-defined functions it can call. While the model can suggest the necessary function, the responsibility of executing the function lies with the user. Function definition: get_weather. Function definition: get_wiki_url. Function definition: get_webpage.
In this way, we've successfully created the assistant we'll be using. It should look similar to the image below. Now you can directly go to the playground and ask some questions to see how the assistant responds. Up to this point, you should have created an assistant with three user-defined functions. The file search tool should have access to a vector store that contains a file.
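For readers who have not entered a function schema in the dashboard before, the get_weather definition could look roughly like the sketch below. It is written as a Python dict that mirrors the JSON pasted into the Assistant's function editor; the parameter names ("location", "unit") and descriptions are assumptions for illustration, since the article's actual schemas are shown only as screenshots.

```python
# Illustrative sketch of a user-defined function schema for the Assistant.
# Parameter names and descriptions are assumptions, not the article's exact schema.
import json

get_weather_schema = {
    "name": "get_weather",
    "description": "Get the current weather for a given city.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and state, e.g. San Francisco, CA",
            },
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}

print(json.dumps(get_weather_schema, indent=2))  # paste the printed JSON into the dashboard
```

The get_wiki_url and get_webpage definitions follow the same pattern, each with a single string parameter (a city name and a URL, respectively).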
Part 2: Two Examples of Assistant Tool Calling
To help you understand how the assistant works, we will use the following pipeline to ask the newly created assistant two questions in this section and examine its responses. You can find the construction details for this pipeline in part 4. For now, let's focus on the pipeline's execution results. Pipeline Overview The Driver Pipeline The Worker Pipeline
Prompt One
Our first question to the assistant is: "What is the weather and the wiki url of San Francisco? And what is the content of the wiki page?" Through this query, we're evaluating the assistant's capability to: 1) identify the necessary tools for a task - in this case, all three of get_weather, get_wiki_url, and get_webpage should be called; 2) understand the sequential dependencies between tools. For example, the assistant should recognize that get_wiki_url must be called before get_webpage to acquire the necessary URL. As shown below, the model's response is both reasonable and correct.
Prompt Two
Our second question to the assistant is: What is the number of federal fires from 2018 to 2022, and can you write a Python code to sort the years based on the number of fires in ascending order and tell me the weather in San Francisco? The question might seem a bit odd on its own, but our goal is to evaluate how the assistant handles built-in tools such as file search and code interpreter. Specifically, we want to determine if it can effectively combine these built-in tools with user-defined functions in providing an answer. To answer this question, the model needs to first invoke the file search tool to retrieve the first row of data from the first table on the first page of the Wildfire PDF. Then, it generates a Python code snippet for sorting and calls the second tool, the code interpreter, to execute this code. Finally, it calls a third tool, the user-defined get_weather function, to obtain the weather in San Francisco. Expected Data in Wildfire PDF: As shown below, the model responds as expected. Up to this point, you should understand that the assistant can utilize three different categories of tools to answer user questions.

Part 3: Introduction of New Snaps
We'll start by focusing on the new elements of the pipeline, two newly introduced snaps and the attributes added to the existing ones, before delving into the overall pipeline details.
1. Tool Call Router (new)
The Tool Call Router snap simplifies the assistant's response (the run object) for easier downstream processing. It combines the functionalities of copy, mapper, and JSON splitter.
The Parallel tool call option determines whether the assistant can call multiple tools simultaneously. 4. Run Thread (upgraded) Same configuration is added to the Run Thread snap as well. Part 4: Hands-on Pipeline Construction Pipeline workflow overview There are a total of 5 pipelines. Driver pipeline : Sends the initial prompt to the assistant. Receives a response containing tool call requests. Passes the response to the "pipeloop" snap to trigger the worker pipelines to execute the tools. Worker pipeline: Executes the function calls specified in the tool call requests. Collects the results of the function calls. Sends the results back to the assistant. This pipeline is executed repeatedly until there are no more tools to call. get_weather pipeline: Takes a city name as input. Queries a weather API to get the current weather for the specified city. Outputs the retrieved weather information. get_wiki_url pipeline: Takes a city name as input. Searches for the Wikipedia page URL for the specified city. Outputs the found URL. get_webpage pipeline: Takes a URL as input Fetch the webpage by visiting the URL Use a model to summarize the content of the webpage Outputs the summary The Driver Pipeline The driver pipeline can be constructed in two ways: either using a combined "create and run" operation or by performing the creation and running steps sequentially. Both methods achieve the same result in this scenario. The Worker Pipeline The get_weather Pipeline You can get a free API key by signing up on Free Weather API - WeatherAPI.com. The get_wiki_url Pipeline The get_webpage Pipeline Get Client: Access the webpage pointed to by the URL and retrieve the HTML content. HTML Parser: Parse the HTML content into text format. Summarize: Generate a user prompt and concatenate it with the webpage text. OpenAI Summarize: Use the model to generate a summary of the webpage content. Input and output of key snaps We'll illustrate the essential inputs and outputs of the intermediate process through a single tool call interaction. 1. Create and Run Thread This snap forwards the user's initial prompt to the assistant and returns a run object. The highlight of this run object is the required action , which outlines the necessary tool calls. Output of Create and Run Thread - a run object 2. Tool Call Router It's important to note that the first output view not only holds the assistant's response but also an empty "tool_outputs" list. This list serves as a container for storing function results as they are gathered in subsequent message appenders. Tool Call Router - 1st output view The second output view extracts the tool calls from the required actions and converts the argument values into JSON format, storing them in json_arguments . This eliminates the need for subsequent argument conversion by each tool. Tool Call Router - 2nd output view 3. Pipeline Execute Snap - Get Weather Function Get Weather Function - Input Get Weather Function - Output The tool's output provides a full HTTP response, however, we're solely interested in the "entity" content which will serve as the tool's output. This extraction will occur in the subsequent snap, "Function Result Generator". 4. Pipeline Execute Snap - Get Wiki URL Function Get Wiki URL Function - Input Get Wiki URL Function - Output The tool's output provides a full HTTP response, however, we're solely interested in the "entity" content which will serve as the tool's output. This extraction will occur in the subsequent snap, "Function Result Generator". 5. 
Message Appender
The Message Appender's output contains a run object from upstream; however, we're solely interested in the tool_outputs field, which is a list of function results. Thus, in the subsequent snap, "Submit Tool Outputs", we will only use the tool_outputs field. Message Appender - Output
6. Submit Tool Outputs
This snap forwards function results to the assistant and receives a run object as a response. This object can either provide the final answer or dictate subsequent tool calls. In this example, the assistant's output specifies the next tool to be called, as indicated by the "required action". Submit Tool Outputs - Output - subsequent tool calls example
In the following example, the assistant outputs the final result. There's an extra message list in the output which contains the result itself as well as the original user prompt. Submit Tool Outputs - Output - final answer example

Snap settings
This article particularly emphasizes the loop condition settings in the pipeloop. We've configured the loop to terminate when the assistant's response indicates no further tool calls are required (i.e., "required_action" is null). This is because if there's no need for additional tool calls, there's no reason to continue executing the worker using Pipeloop.

Edge Case - When no tool call is needed
The previous driver pipeline had a limitation: it couldn't handle cases where the model could directly answer the user's query without calling any user-defined functions. This was because the output of Create and Run Thread wouldn't contain the required_action field. Since the pipeloop snap follows a do-while logic, it would always run at least once before checking the stop condition. Consequently, when the assistant didn't require a tool call, submitting the tool call output to the assistant in the worker pipeline would result in an error. The following driver pipeline offers a simple solution to this problem by using a router to bypass the pipeloop for requests that can be answered directly.

Multimodal Processing in LLM
Multimodal processing in Generative AI represents a transformative leap in how AI systems extract and synthesize information from multiple data types—such as text, images, audio, and video—simultaneously. Unlike traditional single-modality AI models, which focus on one type of data, Multimodal systems integrate and process diverse data streams in parallel, creating a holistic understanding of complex scenarios. This integrated approach is critical for applications that require not just isolated insights from one modality, but a coherent synthesis across different data sources, leading to outputs that are contextually richer and more accurate. Generative AI, with multimodal processing, is redefining text extraction, surpassing traditional OCR by interpreting text within its visual and contextual environment. Unlike OCR, which only converts images to text, generative AI analyzes the surrounding image context, layout, and meaning, enhancing accuracy and depth. For instance, in complex documents, it can differentiate between headings, body text, and annotations, structuring information more intelligently. Additionally, it excels in low-quality or multilingual texts, making it invaluable in industries requiring precision and nuanced interpretation. In video analysis, a generative AI equipped with Multimodal processing can simultaneously interpret the visual elements of a scene, the audio (such as dialogue or background sounds), and any associated text (like subtitles or metadata). This allows the AI to produce a description or summary of the scene that is far more nuanced than what could be achieved by analyzing the video or audio alone. The interplay between these modalities ensures that the generated description reflects not only the visual and auditory content but also the deeper context and meaning derived from their combination. In tasks such as image captioning, Multimodal AI systems go beyond simply recognizing objects in a photo. They can interpret the semantic relationship between the image and accompanying text, enhancing the relevance and specificity of the generated captions. This capability is particularly useful in fields where the context provided by one modality significantly influences the interpretation of another, such as in journalism, where images and written reports must align meaningfully, or in education, where visual aids are integrated with instructional text. Multimodal processing enables AI to synthesize medical images (such as X-rays or MRIs) with patient history, clinical notes, and even live doctor-patient interactions in highly specialized applications like medical diagnostics. This comprehensive analysis allows the AI to provide more accurate diagnoses and treatment recommendations, addressing the complex interplay of symptoms, historical data, and visual diagnostics. Similarly, in customer service, Multimodal AI systems can improve communication quality by analyzing the textual content of a customer's inquiry and the tone and sentiment of their voice, leading to more empathetic and effective responses. Beyond individual use cases, Multimodal processing plays a crucial role in improving the learning and generalization capabilities of AI models. By training on a broader spectrum of data types, AI systems develop more robust, flexible models that can adapt to a wider variety of tasks and scenarios. This is especially important in real-world environments where data is often heterogeneous and requires cross-modal understanding to interpret fully. 
As Multimodal processing technologies continue to advance, they promise to unlock new capabilities across diverse sectors. In entertainment, Multimodal AI could enhance interactive media experiences by seamlessly integrating voice, visuals, and narrative elements. In education, it could revolutionize personalized learning by adapting content delivery to different sensory inputs. In healthcare, the fusion of Multimodal data could lead to breakthroughs in precision medicine. Ultimately, the ability to understand and generate contextually rich, Multimodal content positions Generative AI as a cornerstone technology in the next wave of AI-driven innovation. Multimodal Content Generator Snap The Multimodal Content Generator Snap encodes file or document inputs into the Snap's multimodal content format, preparing it for seamless integration. The output from this Snap must be connected to the Prompt Generator Snap to complete and format the message payload for further processing. This streamlined setup enables efficient multimodal content handling within the Snap ecosystem. The Snap Properties Type - Select the type of multimodal content. Content Type - Define the specific content type for data transmitted to the LLM. Content - Specify the content path to the multimodal content data for processing. Document Name - Name the document for reference and identification purposes. Aggregate Input - Enable this option to combine all inputs into a single content. Encode Base64 - Enable this option to convert the text input into Base64 encoding. Note: The Content property appears only if the input view is of the document type. The value assigned to Content must be in Base64 format for document inputs, while Snap will automatically use binary as content for binary input types. The Document Name can be set specifically for multimodal document types. The Encode Base64 property encodes text input into Base64 by default. If unchecked, the content will be passed through without encoding. Designing a Multimodal Prompt Workflow In this process, we will integrate multiple Snaps to create a seamless workflow for multimodal content generation and prompt delivery. By connecting the Multimodal Content Generator Snap to the Prompt Generator Snap, we configure it to handle multimodal content. The finalized message payload will then be sent to Claude by Anthropic Claude on AWS Messages. Steps: 1. Add the File Reader Snap: Drag and drop the File Reader Snap onto the designer canvas. Configure the File Reader Snap by accessing its settings panel, then select a file containing images (e.g., a PDF file). Download the sample image files at the bottom of this post if you have not already. Sample image file (Japan_flowers.jpg) 2. Add the Multimodal Content Generator Snap: Drag and drop the Multimodal Content Generator Snap onto the designer and connect it to the File Reader Snap. Open its settings panel, select the file type, and specify the appropriate content type. Here's a refined description of the output attributes from the Multimodal Content Generator: sl_content: Contains the actual content encoded in Base64 format. sl_contentType: Indicates the content type of the data. This is either selected from the configuration or, if the input is a binary, it extracts the contentType from the binary header. sl_type: Specifies the content type as defined in the Snap settings; in this case, it will display "image." 3. 
Add the Prompt Generator Snap: Add the Prompt Generator Snap to the designer and link it to the Multimodal Content Generator Snap. In the settings panel, enable the Advanced Prompt Output checkbox and configure the Content property to use the input from the Multimodal Content Generator Snap. Click “Edit Prompt” and input your instructions 4. Add and Configure the LLM Snap: Add the Anthropic Claude on AWS Message API Snap as the LLM. Connect this Snap to the Prompt Generator Snap. In the settings, select a model that supports multimodal content. Enable the Use Message Payload checkbox and input the message payload in the Message Payload field. 5. Verify the Result: Review the output from the LLM Snap to ensure the multimodal content has been processed correctly. Validate that the generated response aligns with the expected content and format requirements. If adjustments are needed, revisit the settings in previous Snaps to refine the configuration. Multimodal Models for Advanced Data Extraction Multimodal models are redefining data extraction by advancing beyond traditional OCR capabilities. Unlike OCR, which primarily converts images to text, these models directly analyze and interpret content within PDFs and images, capturing complex contextual information such as layout, formatting, and semantic relationships that OCR alone cannot achieve. By understanding both textual and visual structures, multimodal AI can manage intricate documents, including tables, forms, and embedded graphics, without requiring separate OCR processes. This approach not only enhances accuracy but also optimizes workflows by reducing dependency on traditional OCR tools. In today’s data-rich environment, information is often presented in varied formats, making the ability to analyze and derive insights from diverse data sources essential. Imagine managing a collection of invoices saved as PDFs or photos from scanners and smartphones, where a streamlined approach is needed to interpret their contents. Multimodal large language models (LLMs) excel in these scenarios, enabling seamless extraction of information across file types. These models support tasks such as automatically identifying key details, generating comprehensive summaries, and analyzing trends within invoices whether from scanned documents or images. Here’s a step-by-step guide to implementing this functionality within SnapLogic. Sample invoice files (download the files at the bottom of this post if you have not already) Invoice1.pdf Invoice2.pdf Invoice3.jpeg (Sometimes, the invoice image might be tilted) Upload the invoice files Open Manager page and go to your project that will be used to store the pipelines and related files Click the + (plus) sign and select File The Upload File dialog pops up. Click “Choose Files” to select all the invoice files both PDF and image formats (download the sample invoice files at the bottom of this post if you have not already) Click Upload button and the uploaded files will be shown. Building the pipeline Add the JSON Generator Snap: Drag and drop the JSON Generator onto the designer canvas. Click on the Snap to open settings, then click the "Edit JSON" button Highlight all the text from the template and delete it. Paste all invoice filenames in the format below. The editor should look like this. Click "OK" in the lower-right corner to save the prompt Save the settings and close the Snap Add the File Reader Snap: Drag and drop the File Reader Snap onto the designer canvas Click the Snap to open the configuration panel. 
Connect the Snap to the JSON Generator Snap by following these steps:
Select the Views tab.
Click the plus (+) button on the Input pane to add the input view (input0).
Save the configuration.
The Snap on the canvas will now have the input view. Connect it to the JSON Generator Snap.
In the configuration panel, select the Settings tab. Set the File field by enabling the expression (click the equal sign in front of the text input) and set it to $filename to read all the files we specified in the JSON Generator Snap.
Validate the pipeline to see the File Reader output. Fields that will be used in the Multimodal Content Generator Snap: Content-type shows the file content type; Content-location shows the file path, which will be used in the document name.
Add the Multimodal Content Generator Snap:
Drag and drop the Multimodal Content Generator Snap onto the designer canvas and connect it to the File Reader Snap.
Click the Snap to open the settings panel and configure the following fields:
Type: enable the expression and set the value to $['content-location'].endsWith('.pdf') ? 'document' : 'image'
Document name: enable the expression and set the value to $['content-location'].snakeCase() Use the snake-case version of the file path as the document name to identify each file and make it compatible with the Amazon Bedrock Converse API. In snake case, words are lowercase and separated by underscores (_).
Aggregate input: check the checkbox. Use this option to combine all input files into a single document.
The settings should now look like the following.
Validate the pipeline to see the Multimodal Content Generator Snap output. The preview output should look like the image below. The sl_type will be document for the PDF file and image for the image file, and the name will be the simplified file path.
Add the Prompt Generator Snap:
Drag and drop the Prompt Generator Snap onto the designer canvas and connect it to the Multimodal Content Generator Snap.
Click the Snap to open the settings panel and configure the following fields:
Enable the Advanced Prompt Output checkbox.
Set the Content to $content to use the content input from the Multimodal Content Generator Snap.
Click “Edit Prompt” and input your instructions. For example: Based on the total quantity across all invoices, which product has the highest and lowest purchase quantities, and in which invoices are these details found?
Add and Configure the LLM Snap:
Add the Amazon Bedrock Converse API Snap as the LLM and connect this Snap to the Prompt Generator Snap.
Click the Snap to open the configuration panel.
Select the Account tab and select your account.
Select the Settings tab and select a model that supports multimodal content.
Enable the Use Message Payload checkbox.
Set the Message Payload to $messages to use the message from the Prompt Generator Snap.
Verify the result:
Validate the pipeline and open the preview of the Amazon Bedrock Converse API Snap. The result should look like the following:
In this example, the LLM successfully processes invoices in both PDF and image formats, demonstrating its ability to handle diverse inputs in a single workflow. By extracting and analyzing data across these formats, the LLM provides accurate responses and insights, showcasing the efficiency and flexibility of multimodal processing. You can adjust the queries in the Prompt Generator Snap to explore different results.

Introduction to PipeLoop
We all love the Pipeline Execute Snap; it greatly simplifies a complex pipeline by extracting sections into a sub-pipeline. But sometimes, we'd really like the ability to run a pipeline multiple times to perform some operations, like polling an endpoint or performing LLM tool calls. In this article, we will introduce the PipeLoop Snap, which adds iteration to the SnapLogic programming model. With PipeLoop, we can create new workflows that were previously hard to manage or even impossible.

What is PipeLoop
PipeLoop is a new Snap for iterative execution on a pipeline. For people who are familiar with iteration in programming languages, PipeLoop is essentially a do-while loop for pipelines. The user is required to provide an iteration limit as a hard cutoff to avoid resource depletion or an infinite loop, and an optional stop condition to control the execution. Just like we can pass input documents to PipeExec, we can also pass input documents to PipeLoop. The difference between the two is that the output document of the pipeline executed with PipeLoop will be used as the next round of input to continue the execution until the stop condition is met or the limit is reached. Due to this unique mechanism, the pipeline run by PipeLoop must have one unlinked input and one unlinked output to work properly. To put it simply, PipeLoop can be thought of as chaining a variable number of PipeExec Snaps running the same pipeline, with a condition to exit early.

PipeLoop execution flow
1. Input documents to PipeLoop are passed to the child pipeline for execution.
2. The child pipeline executes.
3. The child output is collected.
4. Evaluate the stop condition based on the document output. If true, exit and pass the output document to PipeLoop; otherwise continue.
5. Check if the iteration limit is reached. If true, exit and pass the output document to PipeLoop; otherwise continue.
6. Use the output document as the next round of input and continue from step 1.

PipeLoop execution walkthrough
Let's start with a very simple example. We'll create a workflow using PipeLoop that increments a number from 1 to 3. For simplicity, we will refer to the pipeline with PipeLoop as the “Parent pipeline”, and the pipeline that is executed by PipeLoop as the “Child pipeline”.
Parent pipeline setup: The parent pipeline consists of one JSON Generator Snap with one document as input, and one PipeLoop Snap running the pipeline “child” with stop condition “$num >= 3”. We'll also enable “Debug Iteration output” to see the output of each round in this walkthrough.
Child pipeline setup: The child pipeline consists of a single mapper snap that increments “$num” by 1, which satisfies the requirement “a pipeline with one unlinked input and one unlinked output” for a pipeline to be run by PipeLoop.
Output: The output of PipeLoop consists of two major sections when Debug mode is enabled: the output fields and _iteration_documents. We can see the final output is “num”: 3, which means PipeLoop has successfully carried out the task.

PipeLoop features
There are multiple features in PipeLoop that can be helpful when building iterating pipelines. We'll categorize them based on where the features are located.
Properties
There are 4 main sections in the properties of the PipeLoop Snap: Pipeline, Pipeline Parameters, Loop options, and Execution Options.
Pipeline: The pipeline to be run.
Pipeline Parameters: We'll take a deeper dive into this in the Pipeline Parameters section.
Loop options: Loop options are property settings that are related to the iterations of this snap.
Stop condition: The Stop condition field allows the user to set an expression to be evaluated after the first execution has occurred. If the expression evaluates to true, the iteration is stopped. The stop condition can also be set to false if the user wishes to use this as a traditional for loop. There are cases where the user might pass an unintended value into the Stop condition field. In this scenario, PipeLoop generates a warning when the user provides a non-boolean String as the Stop condition, and the stop condition is treated as false. Non-boolean Stop condition warning
Iteration limit: The Iteration limit field allows the user to limit the maximum number of iterations that could potentially occur. This field can also be used to limit the total number of executions if the Stop condition is set to false. Setting a large value for the Iteration limit with debug mode on could be dangerous. The accumulated documents could quickly deplete CPU and RAM resources. To prevent this, PipeLoop generates a warning in the Pipeline Validation Statistics tab when the Iteration limit is set to greater than or equal to 1000 with Debug mode enabled. Large iteration limit with debug mode enabled warning
Debug iteration outputs: This toggle field enables the output from the child pipelines for each iteration and the stop condition evaluation to be added into the final output as a separate field. Output example with Debug iteration outputs enabled
Execution options
Execute On: Specifies where the pipeline execution should take place. Currently, only local executions (local Snaplex, local node) are supported.
Execution Label: We'll take a deeper dive into this in the Monitoring section.

Pipeline Parameters
If you are familiar with Pipeline Parameters in PipeExec, feel free to skip to the next section, as the instructions are identical.
Introduction to Pipeline Parameters
Before we take a look at the Pipeline Parameters support in the PipeLoop Snap, let's take a step back and see what pipeline parameters are and how they can be leveraged. Pipeline parameters are String constants that can be defined in the Edit Pipeline Configuration settings. Users can use these parameters as constants anywhere in the pipeline. One major difference between Pipeline parameters and Pipeline variables is that Pipeline parameters are referenced using an underscore prefix, whereas Pipeline variables are referenced using a dollar sign prefix. Pipeline Parameters in Edit Pipeline Configuration Accessing Pipeline Parameters in an expression field
Example
Let's take a look at Pipeline Parameters in action with PipeLoop. Our target here is to print out “Hello PipeLoop!” n times, where n is the value of “num”. We'll add two parameters in the child pipeline, param1 and param2. To demonstrate, we assign “value1” to param1 and keep it empty for param2. We'll then add a message field with the value “Hello PipeLoop!” in the JSON Generator so that we can assign the String value to param2. Now we're able to use param2 as a constant in the child pipeline. PipeLoop also has field name suggestions built into the Parameter name fields for ease of use. PipeLoop Pipeline Parameters in action
For our child pipeline, we'll add a new row in the Mapping table to print out “Hello PipeLoop!” repeatedly (followed by a newline character).
One thing to bear in mind is that the order of the Mapping table does not affect the output (the number of “Hello PipeLoop!” printed in this case), as the output fields are updated after the execution of the current iteration is finished. Child Pipeline configuration for our task
Here's the final result: we can see “Hello PipeLoop!” is printed twice. Mission complete.
Remarks: Pipeline Parameters are String constants that can be set in Edit Pipeline Configuration. Users can pass a String to Pipeline Parameters defined in the Child pipeline in PipeLoop. Pipeline Parameters in PipeLoop will override previous pipeline parameter values defined in the Child pipeline if the parameters share the same name. Pipeline Parameters are constants, which means the values will not change between iterations even if the user attempts to modify them.

Monitoring
When a snap in a pipeline is executed, there will not be any output until the execution is finished. Therefore, because PipeLoop runs an iterating pipeline execution as a single snap, it is slightly difficult to know where the execution currently is, or which pipeline execution corresponds to which input document. To deal with this, we have two extra features that add more visibility to the PipeLoop execution.
Pipeline Statistics progress bar: During the execution of PipeLoop, a progress bar will be available in the Pipeline Validation Statistics tab, so that the user can get an idea of which iteration PipeLoop is currently on. Note that the progress bar might not reflect the actual iteration index if the child pipeline executions are short, due to polling intervals. PipeLoop iteration progress bar
Execution Label: When a PipeLoop with multiple input documents is executed, the user will not be able to tell which pipeline execution is linked to which input document in the SnapLogic Monitor. The Execution label is the answer to this problem. The user can pass a value in the Execution label field that differentiates input documents, so that each input document will have its own label in the SnapLogic Monitor during execution. Here's an example of two input documents running on the child pipeline. We set the Execution label with the expression “child_label” + $num, so the execution for the first document will have the label “child_label0” and the second execution will have the label “child_label1”. Execution label settings SnapLogic Monitor View

Summary
In this article, we introduced PipeLoop, a new Snap for iterative execution workflows. The pipeline run by PipeLoop must have one unlinked input and one unlinked output. PipeLoop has the following features: Pipeline Parameters support, a Stop condition to exit early (with warnings), an Iteration limit to avoid infinite loops (with warnings), Debug mode, an Execution label to differentiate runs in Monitor, and a progress bar for status tracking. A conceptual sketch of the PipeLoop control flow follows below. Happy Building!
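As a closing illustration (not part of the original pipelines), the execution flow described in this article boils down to roughly the following do-while-style loop. This is a conceptual sketch only: run_child, the stop condition callable, and the document shape are stand-ins for the SnapLogic internals, assuming only the behavior documented above.

```python
# Conceptual sketch of the PipeLoop control flow; not SnapLogic internals.
# run_child() stands in for one execution of the child pipeline, and the real
# Snap evaluates the stop condition with the SnapLogic expression language.
from typing import Any, Callable, Dict

def pipeloop(document: Dict[str, Any],
             run_child: Callable[[Dict[str, Any]], Dict[str, Any]],
             stop_condition: Callable[[Dict[str, Any]], bool],
             iteration_limit: int) -> Dict[str, Any]:
    for _ in range(iteration_limit):          # iteration limit as the hard cutoff
        document = run_child(document)        # run the child, collect its output
        if stop_condition(document):          # evaluate stop condition on the output
            break
    return document                           # final output document of PipeLoop

# Mirrors the walkthrough: increment $num until it reaches 3.
result = pipeloop({"num": 1},
                  run_child=lambda doc: {"num": doc["num"] + 1},
                  stop_condition=lambda doc: doc["num"] >= 3,
                  iteration_limit=10)
print(result)  # {'num': 3}
```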