Gartner - 10 Best Practices for Scaling Generative AI
I recently came back from Gartner's Data and Analytics Summit in Orlando, Florida. As expected, GenAI was a big area of focus and interest. One of the sessions I attended was "10 best practices for scaling Generative AI." The session highlighted the rapid adoption of generative AI, with 45% of organizations piloting and 10% already in production as of September 2023. While benefits like workforce productivity, multi-domain applications, and competitive differentiation are evident, there are also significant risks around data loss, hallucinations, black-box behavior, copyright issues, and potential misuse. Through 2025, Gartner predicts at least 30% of generative AI projects will be abandoned after proof-of-concept due to issues like poor data quality, inadequate risk controls, escalating costs, or unclear business value. To successfully scale generative AI, the session outlined 10 best practices:
- Continuously prioritize use cases aligned to the organization's AI ambition and measure business value.
- Create a decision framework for build vs. buy, evaluating model training, security, integration, and pricing.
- Pilot use cases with an eye towards future scalability needs around data, privacy, security, etc.
- Design a composable platform architecture to improve flexibility and avoid vendor lock-in.
- Put responsible AI principles at the forefront across fairness, ethics, privacy, compliance, etc. Evaluate risk mitigation tools.
- Invest in data and AI literacy programs across functions and leadership.
- Instill robust data engineering practices like knowledge graphs and vector embeddings.
- Enable seamless human-AI collaboration with human-in-the-loop and communities of practice.
- Apply FinOps practices to monitor, audit, and optimize generative AI costs.
- Adopt an agile, product-centric approach with continuous updates based on user feedback.
The session stressed balancing individual and organizational needs while making responsible AI the cornerstone for scaling generative AI capabilities. Hope you found these useful. What are your thoughts on best practices for scaling GenAI?

GenAI App Builder Getting Started Series: Part 1 - HR Q&A example
👋 Welcome!
Hello everyone, and welcome to our technical guide to getting started with GenAI App Builder on SnapLogic! At the time of publishing, GenAI App Builder is available for testing and will be generally available in our February release. For existing customers and partners, you can request access for testing GenAI App Builder by speaking to your Customer Success Manager or another member of your account team. If you're not yet a customer, you can speak to your Sales team about testing GenAI App Builder.
🤔 What is GenAI App Builder?
Before we begin, let's take a moment to understand what GenAI App Builder is and talk, at a high level, about its components. GenAI App Builder is the latest offering in the SnapLogic AI portfolio, focused on helping modern enterprises create applications with generative AI faster, using a low-/no-code interface. That feels like a mouthful of buzzwords, so let me paint a picture (skip this if you're familiar with GenAI or watch our video, "Enabling employee and customer self-service"). Imagine yourself as a member of an HR team responsible for recruiting year round. Every new employee has an enrollment period after or just before their start date, and every existing employee has open enrollment once per year. During this time, employees need to choose between different medical insurance offerings, which usually involves a comparison of deductibles, networks, max out-of-pocket, and other related features and limits. As you're thinking about all of this material, sorting out how to explain it all to your employees, you're interrupted by your Slack or Teams DM noise. Bing bong! Questions start flooding in:
- Hi, I'm a new employee and I'm wondering, when do I get paid? What happens if payday is on a weekend or holiday? Speaking of holidays, what are the company-recognized holidays this year?
- Hi, my financial advisor said I should change my insurance plan to one with an HSA. Can you help me figure out which plan(s) include an HSA and confirm the maximum contribution limits for a family this year?
- Hi, how does vacation accrual work? When does vacation roll over? Is unused vacation paid out or lost?
All these questions and many others are answered in documents the HR team manages, including the employee handbook, insurance comparison charts, disability insurance sheets, life insurance sheets, other data sheets, etc. What if, instead of you having to answer all these questions, you could leverage a human-sounding large language model (LLM) to field them for you, while making sure it references only the source documents you provide so you don't have to worry about hallucinations?! Enter GenAI App Builder!
🏗 Building an HR Q&A example
Once you have access to test GenAI App Builder, you can use the following steps to start building out an HR Q&A example that will answer questions using only the employee handbook or whichever document you provide. In this guide we will cover the two pipelines used: one that loads data and one that we will use to answer questions. We will not get into Snap customization or Snap details with this guide - it is just meant to show a quick use case. We do assume that you are familiar enough with SnapLogic to create a new pipeline or import an existing one, search for Snaps, connect Snaps, and a few other simple steps. We will walk you through anything that is new to SnapLogic or that needs some additional context. We also assume you have some familiarity with generative AI in this guide.
We will also make a video with similar content in the near future, so I'll update or reply to this post once that content is available.
Prerequisites
In order to complete this guide, you will need the items below regardless of whether or not you use the Community-supported chatbot UI from SnapLogic.
- Access to a Pinecone instance (sign up for a free account at https://www.pinecone.io) with an existing index
- Access to Azure OpenAI or OpenAI
- A file to load, such as your company's employee handbook
Loading data
Our first step is to load data into the vector database using a pipeline similar to the one below, which we will call the "Indexer" pipeline since it helps populate the Pinecone index. If you cannot find the pattern in the Pattern Library, it is attached below as "Indexer_Feb2024.slp". The steps below assume you have already imported the pipeline or are building it as we go. To add more color here, loading data into the vector database only needs to be done when the files are updated. In the HR scenario, this might be once a year for open enrollment documents and maybe a few times a year for the employee handbook. We will explore some other use cases in the future where document updates would be much more frequent.
1. Click on the "File Reader" Snap to open its settings.
2. Click on the icon at the far right of the "File" field as shown in the screenshot below.
3. Click the "Upload" button in the upper-right corner of the window that pops up.
4. Select the PDF file from your local system that you want to index (we are using an employee handbook and you're welcome to do the same) to upload it, then make sure it is selected.
5. Save and close the "File Reader" Snap once your file is selected.
6. Leave the "PDF Parser" Snap with default settings.
7. Click on the "Chunker" Snap to open it, then mirror the settings in the screenshot below.
8. Now open the "Azure OpenAI Embedder" or "OpenAI Embedder" Snap (you may need to replace the embedder that came with the Pattern or import with the one you have an account for).
9. Go to the "Account" tab and create a new account for the embedder you're using. You need to replace the variable {YOUR_ACCOUNT_LABEL} with a label for the account that makes sense for you, then replace {YOUR_ENDPOINT} with the appropriate snippet from your Azure OpenAI endpoint. Validate the account if you can to make sure it works. After you save your new account you can go back to the main "Settings" tab on the Snap.
10. If the account setup was successful, you should now be able to click the chat bubble icon at the far right of the "Deployment ID" field to suggest a "Deployment ID" - in our environment shown in the screenshot below, you can see we have one named "Jump-emb-ada-002" which I can now select.
11. Finally, make sure the "Text to embed" field is set as shown below, then save and close this Snap.
12. Now open the "Mapper" Snap so we can map the output of the embedder Snap to the "Pinecone Upsert" Snap as shown in the screenshot below.
If it is difficult to see the mappings in the screenshot above, here is a zoomed-in version:
For a little more context here, we're mapping the $embedding object coming out of the embedder Snap to the $values object in Pinecone, which is required. If that were all you mapped, though, your Q&A example would always reply with something like "I don't know" because there would be no text to ground the answer. To include the original text, we make use of the very flexible "metadata" object in Pinecone by mapping $original.chunk to $metadata.chunk. We also statically set $metadata.source to "Employee Handbook.pdf", which allows the retriever pipeline to return the source file used in answering a question (in a real-world scenario, you would probably determine the source dynamically or programmatically, such as from the filename, so this pipeline could load other files too).
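To make the mapping concrete, a single record sent to the "Pinecone Upsert" Snap ends up shaped roughly like the sketch below. This is an illustration only: the vector is truncated and the text values are placeholders, not output copied from the pipeline.

    record = {
        # $embedding mapped to $values; a real text-embedding-ada-002 vector has 1536 numbers
        "values": [0.0123, -0.0456, 0.0789],
        "metadata": {
            # $original.chunk mapped to $metadata.chunk: the text the LLM will later answer from
            "chunk": "Employees are paid on a semi-monthly basis ...",
            # statically set in this example so the retriever can cite its source
            "source": "Employee Handbook.pdf"
        }
    }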
13. Save and close the "Mapper" Snap.
14. Finally, open the "Pinecone Upsert" Snap, then click the "Account" tab and create a new account with your Pinecone API Key; validate it to make sure it works before saving.
15. Back on the main "Settings" tab of the "Pinecone Upsert" Snap, you can now click on the chat bubble icon to suggest existing indexes in Pinecone. For example, in our screenshot below you can see we have four which have been obscured and one named "se-demo." Indexes cannot be created on the fly, so you will have to make sure the index is created in the Pinecone web interface.
The last setting we'll talk about for the Indexer pipeline is the "Namespace" field in the "Pinecone Upsert" Snap. Setting a namespace is optional. Namespaces in Pinecone create a logical separation between vectors within an index and can be created on the fly during pipeline execution. For example, you could create a namespace like "2024_enrollment" for all documents published in 2024 for open enrollment and another called "2024_employeehandbook", keeping those documents in separate namespaces. Although these can be used just for internal organization, you can also direct a chatbot to use only one namespace to answer questions. We'll talk about this more in the "Answering Questions" section below, which covers the Retriever pipeline.
16. Save and close the "Pinecone Upsert" Snap.
You should now be able to validate the entire pipeline to see what the data looks like as it flows through the Snaps, and when you're ready to commit the data to Pinecone, you can execute the pipeline.
Answering Questions
To answer questions using the data we just loaded into Pinecone, we're going to recreate or import the Retriever pipeline (attached as "Retriever_Feb2024.slp"). If you import the pipeline you may need to add additional "Mapper" Snaps as shown below. We will walk through that in the steps below; just know this is what we'll end up with at the end of our first article. The screenshot above shows what the pattern will look like when you import it. Since this first part of the series will only take us up to the point of testing in SnapLogic, our first few steps will involve some changes with that in mind.
1. Right-click on the "HTTP Router" Snap, then click "Disable Snap".
2. Click the circle between the "HTTP Router" and the embedder Snap to disconnect them.
3. Drag the "HTTP Router" Snap somewhere out of the way on the canvas (you can also delete it if you're comfortable replacing it later); your pipeline should now look like this:
4. In the asset palette on the left, search for the "JSON Generator" (it should appear before you finish typing that all out):
5. Drag a "JSON Generator" onto the canvas, connecting it to the "Azure OpenAI Embedder" or "OpenAI Embedder" Snap.
6. Click on the "JSON Generator" to open it, then click on the "Edit JSON" button in the main Settings tab.
7. Highlight all the text from the template and delete it so we have a clean slate to work with.
8. Paste in this text, replacing "Your question here." with an actual question you want to ask that can be answered from the document you loaded with your Indexer pipeline. For example, I loaded an employee handbook and I will ask the question, "When do I get paid?"
[ { "prompt" : "Your question here." } ]
Your "JSON Generator" should now look something like this but with your question:
9. Click "OK" in the lower-right corner to save the prompt.
10. Click on the "Azure OpenAI Embedder" or "OpenAI Embedder" Snap to view its settings.
11. Click on the Account tab, then use the drop-down box to select the account you created in the section above ("Loading Data", steps 8-9).
12. Click on the chat bubble icon to suggest "Deployment IDs" and choose the same one you chose in "Loading Data", step 10.
13. Set the "Text to embed" field to $prompt as shown in the screenshot below:
14. Save and close the "Azure OpenAI Embedder" or "OpenAI Embedder" Snap.
15. Click on the Mapper immediately after the embedder Snap.
16. Create a mapping for $embedding that maps to $vector.
17. Check the "Pass through" box; this Mapper Snap should now look like this:
18. Save and close this "Mapper".
19. Open the "Pinecone Query" Snap.
20. Click the Account tab, then use the drop-down to select the Pinecone account you created in "Loading Data", step 14.
21. Use the chat bubble on the right side of the "Index name" field to select your existing index.
22. [OPTIONAL] Use the chat bubble on the right side of the "Namespace" field to select your existing namespace, if you created one; the "Pinecone Query" Snap should now look like this:
23. Save and close the "Pinecone Query" Snap.
24. Click on the "Mapper" Snap after the "Pinecone Query" Snap.
25. In this "Mapper" we need to map the three items listed below, which are also shown in the following screenshot. If you're not familiar with the $original JSON key, it occurs when an upstream Snap has implicit pass through, or, like the "Mapper" in step 17, we explicitly enable pass through, allowing us to access the original JSON document that went into the upstream Snap. (NOTE: If you're validating your pipeline along the way or making use of our Dynamic Validation, you may notice that no Target Schema shows up in this Mapper until after you complete steps 27-30.)
- Map $original.original.prompt to $prompt
- Map jsonPath($, "$matches[*].metadata.chunk") to jsonPath($, "$context[*].data")
- Map jsonPath($, "$matches[*].metadata.source") to jsonPath($, "$context[*].source")
26. Save and close that "Mapper".
27. Click on the "Azure OpenAI Prompt Generator" or "OpenAI Prompt Generator" so we can set our prompt.
28. Click on the "Edit prompt" button and make sure your default prompt looks like the screenshot below. On lines 4-6 you can see we are using mustache templating like {{#context}} {{source}} {{/context}}, which is the same as the jsonPath($, "$context[*].source") from the "Mapper" in step 25 above. We'll talk about this more in future articles - for now, just know this will be a way for you to customize the prompt and the data included in the future.
29. Click "OK" in the lower-right corner.
30. Save and close the prompt generator Snap.
31. Click on the "Azure OpenAI Chat Completions" or "OpenAI Chat Completions" Snap.
32. Click the "Account" tab, then use the drop-down box to select the account you created earlier.
33. Click the chat bubble icon to the far right of the "Deployment ID" field to suggest a deployment; this ID may be different than the one you've chosen in previous "Azure OpenAI" or "OpenAI" Snaps since we're selecting an LLM this time instead of an embedding model.
34. Set the "Prompt" field to $prompt; your Snap should look something like this:
35. Save and close the chat completions Snap.
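Conceptually, the Retriever pipeline we just assembled implements a standard retrieval-augmented generation (RAG) flow: embed the question, fetch the closest chunks from Pinecone, build a grounded prompt, and send it to the chat model. The Python sketch below is only an illustration of that flow; embed_text, query_pinecone, and call_chat_model are hypothetical placeholders, not SnapLogic or vendor APIs.

    def answer_question(question, top_k=3):
        # 1. Embed the user's question (the embedder Snap)
        query_vector = embed_text(question)

        # 2. Retrieve the top-K most similar chunks (the Pinecone Query Snap)
        matches = query_pinecone(vector=query_vector, top_k=top_k)

        # 3. Assemble a grounded prompt from the retrieved chunks (the Prompt Generator Snap)
        context = "\n".join(m["metadata"]["chunk"] for m in matches)
        sources = ", ".join(m["metadata"]["source"] for m in matches)
        prompt = (
            "Answer the question using only the context below.\n"
            f"Context:\n{context}\n"
            f"Question: {question}\n"
            f"Cite your sources: {sources}"
        )

        # 4. Generate the answer (the Chat Completions Snap)
        return call_chat_model(prompt)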
Testing our example
Now it's time to validate our pipeline and take a look at the output! Once validated, the pipeline should look something like this: If you click the preview data output on the last Snap, the chat completions Snap, you should see output that looks like this: The answer to our prompt is under $choices[0].message.content. For the test above, I asked the question "When do I get paid?" against an employee handbook and the answer was this: Employees are paid on a semi-monthly basis (24 pay periods per year), with payday on the 15th and the last day of the month. If a regular payday falls on a Company-recognized holiday or on a weekend, paychecks will be distributed the preceding business day. The related context is retrieved from the following sources: [Employee Handbook.pdf]
Wrapping up
Stay tuned for further articles in the "GenAI App Builder Getting Started Series" for more use cases, closer looks at individual Snaps and their settings, and even how to connect a chat interface! Most if not all of these articles will also have an associated video if you learn better that way! If you have issues with the setup or find a missing step or detail, please reply to this thread to let us know!

SnapGPT - Security and Data Handling Protocols
Authors: Aaron Kesler, Jump Thanawut, Scott Monteith
Security and Data Handling Protocols for SnapGPT
SnapLogic acknowledges and respects the data concerns of our customers. The purpose of this document is to present our data handling and global data protection standards for SnapGPT.
Overview & SnapLogic's Approach to AI / LLM: SnapLogic utilizes high-quality enterprise large language models (LLMs), selecting the most appropriate one for each specific task. Current support includes Azure OpenAI GPT, Anthropic Claude on Amazon Bedrock, and Google Vertex PaLM.
Product & Data:
Product Features & Scope: SnapGPT offers a range of features, each designed to enhance user experience and productivity in various aspects of pipeline and SQL query generation:
- Input Prompts: This feature allows customers to interact directly with the LLM by providing input prompts. These prompts are the primary method through which users can specify their requirements or ask questions to the LLM.
- Describe Pipeline: This skill enables users to obtain a comprehensive description of an existing pipeline. It helps in understanding and documenting the pipeline's structure and functionality.
- Analyze Pipeline: This feature ingests the entire pipeline configuration and analyzes it to make suggestions for optimization and improvement. It assists users in enhancing the efficiency and effectiveness of their pipelines.
- Mapper Configuration: Facilitates the configuration of the Mapper Snap by generating expressions to simplify the process of mapping input to output.
- Pipeline Generation: Users can create prototype pipelines using simple input prompts. This feature is geared towards streamlining the pipeline creation process, making it more accessible and less time-consuming.
- SQL Generation without Schema: Tailored for situations where the schema information is not available or cannot be shared, this feature generates SQL queries based solely on the customer's prompt, offering flexibility and convenience.
- SQL Generation with Schema (coming February 2024): This advanced feature generates SQL queries by taking into account the schema of the input database. It is particularly useful for creating contextually accurate and efficient SQL queries.
Data Usage & Opt-Out Options: At SnapLogic, we recognize the importance of data security and user privacy in the rapidly evolving generative AI space. SnapGPT has been designed with these principles at its core, ensuring that customers can leverage the power of AI and machine learning while maintaining control over their data. Our approach prioritizes transparency, giving users the ability to opt out of data sharing, and aligning with industry best practices for data handling. This commitment reflects our dedication to not only providing advanced AI solutions but also ensuring that these solutions align with the highest standards of privacy and data protection.
Data Usage in SnapGPT: SnapGPT is designed to handle customer data with the utmost care and precision, ensuring that data usage is aligned with the functionality of each feature:
Customer Input and Interaction: Customer inputs, such as prompts or pipeline configurations, are key to the functionality of SnapGPT. This data is used solely for the purpose of processing specific requests and generating responses or suggestions relevant to the user's query. No data is retained for model training purposes.
Feature-Specific Data Handling: Each feature/skill of SnapGPT, like pipeline analysis or SQL generation, uses customer data differently.
See the table below for details on each skill.

Skill Name: Input Prompts
Description of the Skill: Direct input prompts from customers are transferred to the LLM and tracked by SnapLogic analytics.
Data Transferred to LLM: Prompt details only; these are not stored or used for training by the LLM.

Skill Name: Describe & Analyze Pipeline
Description of the Skill: Allows customers to describe a pipeline, with the entire pipeline configuration relayed to the LLM.
Data Transferred to LLM: Entire pipeline configuration, excluding account credential information.

Skill Name: Mapper Configuration
Description of the Skill: Enables sending input schema information within the prompt to the LLM for the "Mapper configuration" feature.
Data Transferred to LLM: Input schema information, without account credential information.

Skill Name: Pipeline Generation
Description of the Skill: Uses input prompts to create pipeline prototypes by transmitting them to the LLM.
Data Transferred to LLM: Input prompts only; not stored or used for training by the LLM.

Skill Name: SQL Generation W/out Schema
Description of the Skill: Generates SQL queries based only on the customer's prompt in situations where schema information cannot be shared.
Data Transferred to LLM: Only the customer's prompt; no schema information is used.

Skill Name: SQL Generation W/ Schema (Feb 2024)
Description of the Skill: Generates accurate SQL queries by considering the schema of the input database.
Data Transferred to LLM: Schema of the input database, excluding any account credentials, enhancing query accuracy.

Future Adaptations: In the near future, we intend to offer customers opt-out options. Choosing to opt out of including any environment-specific data in SnapGPT prompts can impact the quality of responses from SnapGPT, as it will lack additional context. As of the current version, usage of SnapGPT will include sending the data from the features listed above to the LLMs. We recommend that customers who are not comfortable with the described data transfers wait for the opt-out option to become available.
Impact of Opting Out: Choosing to opt out of data sharing may impact the functionality and effectiveness of SnapGPT. For example, opting out of schema retrieval in SQL Generation may lead to less precise query outputs. Users are advised to consider these impacts when setting their data sharing preferences.
Data Processing:
Architecture:
Data Flow:
Data Retention & Residency: SnapLogic is committed to ensuring the secure handling and appropriate residency of customer data. Our data retention policies are designed to respect customer privacy while providing the necessary functionality of SnapGPT:
Data Retention:
No Retention for Model Training: SnapGPT is designed to prioritize user privacy. Therefore, no customer data processed by SnapGPT is retained for the purpose of model training. This ensures that user data is not used in any way to train or refine the underlying AI models.
Storing Usage Data for Adoption Tracking: While we do not retain data for model training, SnapLogic stores usage data related to SnapGPT in Heap Analytics. This is strictly for the purpose of tracking product adoption and usage patterns. The collection of usage data helps us understand how our customers interact with SnapGPT, enabling us to continuously improve the product and tailor it to user needs.
Data Residency:
Location-Based Data Storage: Our control planes in the United States and the EMEA region adhere to the specific data residency policies of these locations. We ensure compliance with regional data protection and privacy laws, offering customers the assurance that their data is managed in accordance with local regulations.
Controls – Admin, Groups, Users: SnapLogic provides robust control mechanisms for administrators, while ensuring that group and user-level controls align with organizational policies: Administrators have granular control over the use of SnapGPT within their organization. They can determine what data is shared with the LLM and have the ability to opt out of data sharing to meet specific data retention and sharing policies. Additionally, admins can control user access to various features and skills, ensuring alignment with organizational needs and security policies. Group Controls: Currently, groups do not have specific controls over SnapGPT. Group-level policies are managed by administrators to ensure consistency and security across the organization. User Controls: Users can access and utilize the features and skills of SnapGPT to which they are entitled. User entitlements are managed by administrators, ensuring that each user has access to the necessary tools for their role while maintaining data security and compliance. Guidelines for Secure and Compliant use of SnapGPT At SnapLogic, we understand the critical importance of data security and compliance in today’s digital landscape. As such, we are dedicated to providing our customers with the tools and knowledge necessary to utilize SnapGPT in a way that aligns with their internal information security (InfoSec) and privacy policies. This section offers guidelines to help ensure that your interaction with SnapGPT is both secure and compliant with your organizational standards. Customer Data Control: Customers are encouraged to actively manage and control the data they share with SnapGPT. By understanding and utilizing the available admin and user controls, customers can ensure that their use of SnapGPT aligns with their internal InfoSec and privacy policies. Best Practices for Data Sharing: We recommend that customers review and follow best practices for data sharing, especially when working with sensitive or confidential information. This includes using anonymization or pseudonymization techniques where appropriate, and sharing only the data in prompts and pipelines that is necessary for the task at hand. Integrating with Internal Policies: Customers should integrate their use of SnapGPT with their existing InfoSec and privacy frameworks. This integration ensures that data handling through SnapGPT remains consistent with the organization’s overall data protection strategy. Regular Review and Adjustment: Customers are advised to regularly review their data sharing settings and practices with SnapGPT, adjusting them as necessary to remain aligned with evolving InfoSec and privacy requirements. Training and Awareness: We also suggest that customers provide regular training and awareness programs to their users about the responsible and secure use of AI tools like SnapGPT, emphasizing the importance of data privacy and protection. Compliance: For detailed information on SnapLogic’s commitment to compliance with various regulatory standards and data security measures, please visit our comprehensive overview at SnapLogic Security & Compliance (https://www.snaplogic.com/security-standards). This resource provides an in-depth look at how we adhere to global data protection regulations, manage data security, and ensure the highest standards of compliance across all our products, including SnapGPT. 
For specific compliance inquiries or more information on how we handle compliance in relation to SnapGPT, please contact the SnapLogic Compliance Team at Security@snaplogic.com. For further details or inquiries regarding SnapGPT or any other SnapLogic AI services, please contact our SnapLogic AI Services Team (ai-services@snaplogic.com). For more information on SnapLogic Security and Compliance: https://www.snaplogic.com/security-standards

Embeddings and Vector Databases
What are embeddings
Embeddings are numerical representations of real-world objects, like text, images, or audio. They are generated by machine learning models as vectors (arrays of numbers), where the distance between vectors can be seen as the degree of similarity between objects. While an embedding model assigns its own meaning to each of its dimensions, there is no guarantee that different embedding models use their dimensions in the same way. For example, the words "cat", "dog", and "apple" might be embedded into the following vectors:
cat -> (1, -1, 2)
dog -> (1.5, -1.5, 1.8)
apple -> (-1, 2, 0)
These vectors are made up for a simpler example. Real vectors are much larger; see the Dimension section for details. Visualizing these vectors as points in a 3D space, we can see that "cat" and "dog" are closer, while "apple" is positioned further away.
Figure 1. Vectors as points in a 3D space
By embedding words and contexts into vectors, we enable systems to assess how related two embedded items are to each other via vector comparison.
Dimension of embeddings
The dimension of embeddings refers to the length of the vector representing the object. In the previous example, we embedded each word into a 3-dimensional vector. However, a 3-dimensional embedding inevitably leads to a massive loss of information. In reality, word embeddings typically require hundreds or thousands of dimensions to capture the nuances of language. For example:
- OpenAI's text-embedding-ada-002 model outputs a 1536-dimensional vector
- Google Gemini's text-embedding-004 model outputs a 768-dimensional vector
- Amazon Titan's amazon.titan-embed-text-v2:0 model outputs a default 1024-dimensional vector
Figure 2. Using text-embedding-ada-002 to embed the sentence "I have a calico cat."
In short, an embedding is a vector that represents a real-world object. The distance between these vectors indicates the similarity between the objects.
Limitation of embedding models
Embedding models are subject to a crucial limitation: the token limit, where a token can be a word, punctuation mark, or subword part. This constraint defines the maximum amount of text a model can process in a single input. For instance, the Amazon Titan Text Embeddings models can handle up to 8,192 tokens. When input text exceeds the limit, the model typically truncates it, discarding the remaining information. This can lead to a loss of context and diminished embedding quality, as crucial details might be omitted. To address this, several strategies can help mitigate its impact:
- Text Summarization or Chunking: Long texts can be summarized or divided into smaller, manageable chunks before embedding.
- Model Selection: Different embedding models have varying token limits. Choosing a model with a higher limit can accommodate longer inputs.
What is a Vector Database
Vector databases are optimized for storing embeddings, enabling fast retrieval and similarity search. By calculating the similarity between the query vector and the other vectors in the database, the system returns the vectors with the highest similarity, indicating the most relevant content. The following diagram illustrates a vector database search. A query vector 'favorite sport' is compared to a set of stored vectors, each representing a text phrase. The nearest neighbor, 'I like football', is returned as the top result.
Figure 3. Vector Query Example
Figure 4. Store Vectors into Database
Figure 5. Retrieve Vectors from Database
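To make the idea concrete, here is a small self-contained Python sketch that ranks stored vectors by cosine similarity (one of the measures described in the next section), using the made-up 3-dimensional vectors from earlier. The query vector is also invented for illustration; a real system would use a vector database and full-size embeddings.

    import math

    def cosine_similarity(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # Toy "embeddings" from the example above
    vectors = {
        "cat": (1, -1, 2),
        "dog": (1.5, -1.5, 1.8),
        "apple": (-1, 2, 0),
    }

    query = (1.2, -1.1, 1.9)  # pretend this is the embedding of a new phrase

    # Rank the stored items by similarity to the query (a tiny "Top K" search)
    ranked = sorted(vectors.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    print(ranked)  # "cat" and "dog" score close to 1, while "apple" scores negative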
When working with vector databases, two key parameters come into play: Top K and the similarity measure (or distance function).
Top K
When querying a vector database, the goal is often to retrieve the most similar items to a given query vector. This is where the Top K concept comes into play. Top K refers to retrieving the top K most similar items based on a similarity metric. For instance, if you're building a product recommendation system, you might want to find the top 10 products similar to the one a user is currently viewing. In this case, K would be 10. The vector database would return the 10 product vectors closest to the query product's vector.
Similarity Measures
To determine the similarity between vectors, various distance metrics are employed, including:
- Cosine Similarity: This measures the cosine of the angle between two vectors. It is often used for text-based applications as it captures semantic similarity well. A value closer to 1 indicates higher similarity.
- Euclidean Distance: This calculates the straight-line distance between two points in Euclidean space. It is sensitive to magnitude differences between vectors.
- Manhattan Distance: Also known as L1 distance, it calculates the sum of the absolute differences between corresponding elements of two vectors. It is less sensitive to outliers compared to Euclidean distance.
Figure 6. Similarity Measures
There are many other similarity measures not listed here. The choice of distance metric depends on the specific application and the nature of the data. It is recommended to experiment with various similarity metrics to see which one produces better results.
What embedders are supported in SnapLogic
As of October 2024, SnapLogic has supported embedders for major models and continues to expand its support. Supported embedders include:
- Amazon Titan Embedder
- OpenAI Embedder
- Azure OpenAI Embedder
- Google Gemini Embedder
What vector databases are supported in SnapLogic
- Pinecone
- OpenSearch
- MongoDB
- Snowflake
- Postgres
- AlloyDB
Pipeline examples
Embed a text file
1. Read the file using the File Reader snap.
2. Convert the binary input to a document format using the Binary to Document snap, as all embedders require document input.
3. Embed the document using your chosen embedder snap.
Figure 7. Embed a File
Figure 8. Output of the Embedder Snap
Store a Vector
1. Utilize the JSON Generator snap to simulate a document as input, containing the original text to be stored in the vector database.
2. Vectorize the original text using the embedder snap.
3. Employ a mapper snap to format the structure into the format required by Pinecone - the vector field is named "values", and the original text and other relevant data are placed in the "metadata" field.
4. Store the data in the vector database using the vector database's upsert/insert snap.
Figure 9. Store a Vector into Database
Figure 10. A Vector in the Pinecone Database
Retrieve Vectors
1. Utilize the JSON Generator snap to simulate the text to be queried.
2. Vectorize the original text using the embedder snap.
3. Employ a mapper snap to format the structure into the format required by Pinecone, naming the query vector as "vector".
4. Retrieve the top 1 vector, which is the nearest neighbor.
Figure 11. Retrieve Vectors from a Database
[ { "content" : "favorite sport" } ]
Figure 12. Query Text
Figure 13. All Vectors in the Database
{ "matches": [ { "id": "db873b4d-81d9-421c-9718-5a2c2bd9e720", "score": 0.547461033, "values": [], "metadata": { "content": "I like football." } } ] }
Figure 14. Pipeline Output: the Closest Neighbor to the Query
Embedders and vector databases are widely used in applications such as Retrieval Augmented Generation (RAG) and building chat assistants.
Multimodal Embeddings
While the focus thus far has been on text embeddings, the concept extends beyond words and sentences. Multimodal embeddings represent a powerful advancement, enabling the representation of various data types, such as images, audio, and video, within a unified vector space. By projecting different modalities into a shared semantic space, complex relationships and interactions between these data types can be explored. For instance, an image of a cat and the word "cat" might be positioned closely together in a multimodal embedding space, reflecting their semantic similarity. This capability opens up a vast array of possibilities, including image search with text queries, video content understanding, and advanced recommendation systems that consider multiple data modalities.

Recipes for Success with SnapLogic's GenAI App Builder: From Integration to Automation
For this episode of the Enterprise Alchemists podcast, Guy and Dominic invited Aaron Kesler and Roger Sramkoski to join them to discuss why SnapLogic's GenAI App Builder is the key to success with AI projects. Aaron is the Senior Product Manager for all things AI at SnapLogic, and Roger is a Senior Technical Product Marketing Manager focused on AI. We kept things concrete, discussing real-world results that early adopters have already been able to deliver by using SnapLogic's integration capabilities to power their new AI-driven experiences.

Introduction to PipeLoop
We all love the Pipeline Execute Snap: it greatly simplifies a complex pipeline by extracting sections into a sub-pipeline. But sometimes we really want the ability to run a pipeline multiple times to perform some operation, like polling an endpoint or performing LLM tool calls. In this article, we will introduce the PipeLoop Snap, which adds iteration to the SnapLogic programming model. With PipeLoop, we can create new workflows that were previously hard to manage or even impossible.
What is PipeLoop
PipeLoop is a new Snap for iterative execution of a pipeline. For people who are familiar with iteration in programming languages, PipeLoop is essentially a do-while loop for pipelines. The user is required to provide an iteration limit as a hard cutoff to avoid resource depletion or an infinite loop, and an optional stop condition to control the execution. Just like we can pass input documents to PipeExec, we can also pass input documents to PipeLoop. The difference between the two is that the output document of the pipeline executed with PipeLoop is used as the input for the next round, and execution continues until the stop condition is met or the limit is reached. Due to this unique mechanism, the pipeline run by PipeLoop must have one unlinked input and one unlinked output to work properly. To put it simply, PipeLoop can be thought of as chaining a variable-length series of PipeExec Snaps running the same pipeline, with a condition to exit early.
PipeLoop execution flow
1. Input documents to PipeLoop are passed to the child pipeline for execution.
2. Child pipeline executes.
3. Child output is collected.
4. Evaluate the stop condition based on the output document. If true, exit and pass the output document to PipeLoop; otherwise continue.
5. Check if the iteration limit is reached. If true, exit and pass the output document to PipeLoop; otherwise continue.
6. Use the output document as the next round of input and continue (1.)
PipeLoop execution walkthrough
Let's start with a very simple example. We'll create a workflow using PipeLoop that increments a number from 1 to 3. For simplicity, we will refer to the pipeline with PipeLoop as the "Parent pipeline", and the pipeline that is executed by PipeLoop as the "Child pipeline".
Parent pipeline setup
The parent pipeline consists of one JSON Generator Snap with one document as input, and one PipeLoop Snap running the pipeline "child" with the stop condition "$num >= 3". We'll also enable "Debug iteration output" to see the output of each round in this walkthrough.
Child pipeline setup
The child pipeline consists of a single Mapper Snap that increments "$num" by 1, which satisfies the requirement that a pipeline run by PipeLoop have one unlinked input and one unlinked output.
Output
The output of PipeLoop consists of two major sections when Debug mode is enabled: the output fields and _iteration_documents. We can see the final output is "num": 3, which means PipeLoop has successfully carried out the task.
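The control flow applied in this walkthrough can be summarized in a short sketch. This is pseudocode for the documented execution flow only; the function names are illustrative, not SnapLogic internals.

    def pipeloop(input_doc, run_child_pipeline, stop_condition, iteration_limit):
        doc = input_doc
        for _ in range(iteration_limit):   # step 5: the iteration limit is a hard cutoff
            # steps 1-3: run the child pipeline and collect its output document
            doc = run_child_pipeline(doc)
            # step 4: evaluate the stop condition against the output document
            if stop_condition(doc):
                break
            # step 6: otherwise the output becomes the next round's input
        return doc  # the last output document becomes PipeLoop's output

    # For the walkthrough above:
    #   run_child_pipeline = lambda d: {"num": d["num"] + 1}
    #   stop_condition     = lambda d: d["num"] >= 3
    # pipeloop({"num": 1}, ...) returns {"num": 3} after two iterations.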
PipeLoop features
There are multiple features in PipeLoop that can be helpful when building iterating pipelines. We'll categorize them by where the features are located.
Properties
There are 4 main sections in the properties of the PipeLoop Snap:
- Pipeline
- Pipeline Parameters
- Loop options
- Execution options
Pipeline
The pipeline to be run.
Pipeline Parameters
We'll take a deeper dive into this in the Pipeline Parameters section.
Loop options
Loop options are property settings related to the iterations of this Snap.
Stop condition
The Stop condition field allows the user to set an expression to be evaluated after the first execution has occurred. If the expression evaluates to true, the iteration is stopped. The stop condition can also be set to false if the user wishes to use this as a traditional for loop. There are cases where the user might pass an unintended value into the Stop condition field. In this scenario, PipeLoop generates a warning when the user provides a non-boolean String as the Stop condition, and the stop condition is treated as false.
Non-boolean Stop condition warning
Iteration limit
The Iteration limit field allows the user to limit the maximum number of iterations that could potentially occur. This field can also be used to limit the total number of executions if the Stop condition is set to false. Setting a large value for the Iteration limit with debug mode on could be dangerous: the accumulated documents could quickly deplete CPU and RAM resources. To prevent this, PipeLoop generates a warning in the Pipeline Validation Statistics tab when the Iteration limit is set to greater than or equal to 1000 with Debug mode enabled.
Large iteration limit with debug mode enabled warning
Debug iteration outputs
This toggle field enables the output from the child pipelines for each iteration, along with the stop condition evaluation, to be added to the final output as a separate field.
Output example with Debug iteration outputs enabled
Execution options
Execute On: Specifies where the pipeline execution should take place. Currently only local executions (local Snaplex, local node) are supported.
Execution Label: We'll take a deeper dive into this in the Monitoring section.
Pipeline Parameters
For users who are familiar with Pipeline Parameters in PipeExec, feel free to skip to the next section, as the instructions are identical.
Introduction to Pipeline Parameters
Before we take a look at the Pipeline Parameters support in the PipeLoop Snap, let's take a step back and see what pipeline parameters are and how they can be leveraged. Pipeline parameters are String constants that can be defined in the Edit Pipeline Configuration settings. Users can use these parameters as constants anywhere in the pipeline. One major difference between pipeline parameters and pipeline variables is that pipeline parameters are referred to using an underscore prefix, whereas pipeline variables are referred to using a dollar sign prefix.
Pipeline Parameters in Edit Pipeline Configuration
Accessing Pipeline Parameters in an expression field
Example
Let's take a look at Pipeline Parameters in action with PipeLoop. Our target here is to print out "Hello PipeLoop!" n times, where n is the value of "num". We'll add two parameters in the child pipeline, param1 and param2. To demonstrate, we assign "value1" to param1 and keep param2 empty. We'll then add a message field with the value "Hello PipeLoop!" in the JSON Generator so that we can assign the String value to param2. Now we're able to use param2 as a constant in the child pipeline. PipeLoop also has field name suggestions built into the Parameter name fields for ease of use.
PipeLoop Pipeline Parameters in action
For our child pipeline, we'll add a new row in the Mapping table to print out "Hello PipeLoop!" repeatedly (followed by a newline character).
One thing to bear in mind is that the order of the Mapping table does not affect the output (the number of "Hello PipeLoop!" lines printed in this case), as the output fields are updated after the execution of the current iteration is finished.
Child Pipeline configuration for our task
Here's the final result: we can see "Hello PipeLoop!" is printed twice. Mission complete.
Remarks
- Pipeline Parameters are String constants that can be set in Edit Pipeline Configuration.
- Users can pass a String to Pipeline Parameters defined in the Child pipeline in PipeLoop.
- Pipeline Parameters in PipeLoop will override parameter values previously defined in the Child pipeline if the parameters share the same name.
- Pipeline Parameters are constants, which means their values will not be modified during iterations.
Monitoring
When a Snap in a pipeline is executed, there will not be any output until the execution is finished. Therefore, because an iterating pipeline executes as a single Snap, it is slightly difficult to know which iteration the execution is currently on, or which pipeline execution corresponds to which input document. To deal with this, we have two extra features that add more visibility to the PipeLoop execution.
Pipeline Statistics progress bar
During the execution of PipeLoop, a progress bar is available in the Pipeline Validation Statistics tab, so that the user can get an idea of which iteration PipeLoop is currently on. Note that the progress bar might not reflect the actual iteration index if the child pipeline executions are short, due to polling intervals.
PipeLoop iteration progress bar
Execution Label
When a PipeLoop with multiple input documents is executed, the user will not be able to tell which pipeline execution is linked to which input document in the SnapLogic Monitor. The Execution label is the answer to this problem. The user can pass a value into the Execution label field that differentiates input documents, so that each input document will have its own label in the SnapLogic Monitor during execution. Here's an example of two input documents running on the child pipeline. We set the Execution label with the expression "child_label" + $num, so the execution for the first document will have the label "child_label0" and the second execution will have the label "child_label1".
Execution label settings
SnapLogic Monitor View
Summary
In this article, we introduced PipeLoop, a new Snap for iterative execution workflows. The pipeline run by PipeLoop must have one unlinked input and one unlinked output. PipeLoop has the following features:
- Pipeline Parameters support
- Stop condition to exit early, with warnings
- Iteration limit to avoid infinite loops, with warnings
- Debug mode
- Execution label to differentiate runs in Monitor
- Progress bar for status tracking
Happy Building!

Guide for Advanced GenAI App Patterns
In the rapidly evolving field of Generative AI (GenAI), foundational knowledge can take you far, but it's the mastery of advanced patterns that truly empowers you to build sophisticated, scalable, and efficient applications. As the complexity of AI-driven tasks grows, so does the need for robust strategies that can handle diverse scenarios—from maintaining context in multi-turn conversations to dynamically generating content based on user inputs. This guide delves into these advanced patterns, offering a deep dive into the strategies that can elevate your GenAI applications. Whether you're an admin seeking to optimize your AI systems or a developer aiming to push the boundaries of what's possible, understanding and implementing these patterns will enable you to manage and solve complex challenges with confidence. 1. Advanced Prompt Engineering 1.1 Comprehensive Control of Response Format In GenAI applications, controlling the output format is crucial for ensuring that responses align with specific user requirements. Advanced prompt engineering allows you to craft prompts that provide precise instructions on how the AI should structure its output. This approach not only improves the consistency of responses but also makes them more aligned with the desired objectives. For instance, you can design prompts with a detailed structure that includes multiple elements such as Context, Objective, Style, Audience, and desired Response Length. This method allows for granular control over the output. A sample prompt might look like this: Context: Provide background information on the topic to set the stage. Objective: Clearly define the purpose of the response. Style: Specify whether the response should be formal, informal, technical, or creative. Audience: Identify the target audience, which influences the language and depth of explanation. Response format: Instruct the AI to generate a response that takes approximately 3 minutes to read, ensuring depth and comprehensiveness, typically spanning 4-5 paragraphs. This level of detail in prompt engineering ensures that the AI-generated content meets specific needs, making it suitable for various use cases, such as generating educational material, detailed reports, or customer communications. 1.2 Few-Shot Learning Few-shot learning is an advanced technique where the AI model is provided with a small number of examples (often just a few) within the prompt to guide its output. This method is particularly powerful when dealing with tasks that require the model to understand and replicate complex patterns or formats with minimal input. By incorporating examples directly into the prompt, you can train the model to produce more accurate and contextually appropriate responses even in scenarios where large amounts of training data are not available. This approach is invaluable in customizing the AI's behavior for niche applications where generalized training data may not suffice. Example Prompt: "Generate a product description for the following items. The description should be concise, highlight key features, and appeal to a target audience interested in quality and innovation. Example 1: Product Name: SmartHome WiFi Thermostat Description: "The SmartHome WiFi Thermostat is your ultimate solution for energy-efficient temperature control. With seamless integration into your smart home ecosystem, it offers intuitive controls, customizable settings, and real-time energy usage reports. 
Designed for modern living, it ensures comfort while helping you save on energy bills."
Example 2:
Product Name: EcoBrew Stainless Steel Coffee Maker
Description: "The EcoBrew Stainless Steel Coffee Maker combines sustainability with top-tier performance. Made from durable stainless steel, this eco-friendly coffee maker brews the perfect cup every time, with minimal environmental impact. Ideal for the conscious coffee lover, it offers precision brewing and a sleek design that complements any kitchen."
New Item:
Product Name: UltraLight Travel Backpack
Description:
AI-Generated Output: "UltraLight Travel Backpack: The UltraLight Travel Backpack is engineered for the modern adventurer. Weighing just under a pound, it offers unmatched portability without sacrificing storage capacity. Featuring water-resistant fabric, multiple compartments, and ergonomic shoulder straps, this backpack is designed for those who need durability and convenience on the go. Perfect for day hikes, weekend getaways, or everyday use, the UltraLight Travel Backpack is your reliable companion on any journey."
Explanation: In this example, the AI was given two product descriptions that demonstrate the desired style, tone, and structure of the output. When asked to generate a new description for the "UltraLight Travel Backpack," the AI used the patterns from the provided examples to create a similar, contextually appropriate product description. Despite only being shown two examples, the model effectively captured the key elements needed to generate a new, high-quality output. This approach is highly beneficial when you need the AI to produce consistent results across similar tasks, especially in scenarios where creating extensive training data is impractical. By providing just a few examples, you guide the AI's understanding, allowing it to apply the learned pattern to new, but related, tasks.
1.3 Chain of Thought
The chain-of-thought pattern encourages the AI to generate responses that follow a logical sequence, mirroring human reasoning. This technique is particularly useful in complex scenarios where the AI needs to make decisions, solve problems, or explain concepts step-by-step. By structuring prompts that lead the AI through a series of thought processes, you can guide it to produce more coherent and rational outputs. This is especially effective in applications requiring detailed explanations, such as scientific reasoning, technical problem-solving, or any situation where the AI needs to justify its conclusions. For instance, a prompt might instruct the AI to break down a complex problem into smaller, manageable parts and tackle each one sequentially. The AI would first identify the key components of the problem, then work through each one, explaining its reasoning at each step. This method not only enhances the clarity of the response but also improves the accuracy and relevance of the AI's conclusions.
2. Multi-modal Processing
Multi-modal processing in Generative AI is a cutting-edge approach that allows AI systems to integrate and process multiple types of data (such as text, images, audio, and video) simultaneously. This capability is crucial for applications that require a deep understanding of content across different modalities, leading to more accurate and contextually enriched outputs.
For instance, in a scenario where an AI is tasked with generating a description of a scene from a video, multi-modal processing enables it to analyze both the visual elements and the accompanying audio to produce a description that reflects not just what is seen but also the context provided by sound. Similarly, when processing text and images together, such as in a captioning task, the AI can better understand the relationship between the words and the visual content, leading to more precise and relevant captions. This advanced pattern is particularly beneficial in complex environments where understanding the nuances across different data types is key to delivering high-quality outputs. For example, in medical diagnostics, AI systems using multi-modal processing can analyze medical images alongside patient records and spoken notes to offer more accurate diagnoses. In customer service, AI can interpret and respond to customer queries by simultaneously analyzing text and voice tone, improving the quality of interactions. Moreover, multi-modal processing enhances the AI's ability to learn from varied data sources, allowing it to build more robust models that generalize better across different tasks. This makes it an essential tool in the development of AI applications that need to operate in real-world scenarios where data is rarely homogeneous. By leveraging multi-modal processing, AI systems can generate responses that are not only more comprehensive but also tailored to the specific needs of the task at hand, making them highly effective in a wide range of applications. As this technology continues to evolve, it promises to unlock new possibilities in fields as diverse as entertainment, education, healthcare, and beyond. Example In many situations, data may include both images and text that need to be analyzed together to gain comprehensive insights. To effectively process and integrate these different data types, you can utilize a multi-modal processing pipeline in SnapLogic. This approach allows the Generative AI model to simultaneously analyze data from both sources, maintaining the integrity of each modality. This pipeline is composed of two distinct stages. The first stage focuses on extracting images from the source data and converting them into base64 format. The second stage involves generating a prompt using advanced prompt engineering techniques, which is then fed into the Large Language Model (LLM). The visual representation of this process is divided into two parts, as shown in the picture above. Extract the image from the source Add the File Reader Snap: Drag and drop the “File Reader” Snap onto the designer. Configure the File Reader Snap: Click on the “File Reader” Snap to access its settings panel. Then, select a file that contains images. In this case, we select a pdf file. Add the PDF Parser Snap: Drag and drop the “PDF Parser” Snap onto the designer and set the parser type to be “Pages to images converter” Configure views: Click on the “Views” tab and then select the output to be “Binary”. Convert to Base64: Add and connect “Binary to Document” snap to the PDF Parser snap. Then, configure the encoding to ENCODE_BASE64. Construct the prompt and send it to the GenAI Add a JSON Generator Snap: Drag the JSON Generator Snap and connect it to the preceding Mapper Snap. Then, click “Edit JSON” to modify the JSON string in the JSON editor mode. AWS Claude on Message allows you to send images via the prompt by configuring the source attribute within the content. 
You can construct the image prompt as demonstrated in the screenshot.
Provide instruction with Prompt Generator: Add the Prompt Generator Snap and connect it to the JSON Generator Snap. Next, select the “Advanced Prompt Output” checkbox to enable the advanced prompt payload. Finally, click “Edit Prompt” to enter your specific instructions. The advanced prompt output will be structured as an array of messages, as illustrated in the screenshot below.
Send to GenAI: Add the Anthropic Claude on AWS Messages Snap and enter your credentials to access the AWS Bedrock service. Ensure that the “Use Message Payload” checkbox is selected, and then configure the message payload using $messages, which is the output from the previous Snap.
After completing these steps, you can process the image using the LLM independently. This approach allows the LLM to focus on extracting detailed information from the image. Once the image has been processed, you can then combine this data with other sources, such as text or structured data, to generate a more comprehensive and accurate analysis. This multi-modal integration ensures that the insights derived from different data types are effectively synthesized, leading to richer and more precise results.
3. Semantic Caching
To optimize both the cost and response time associated with using Large Language Models (LLMs), implementing a semantic caching mechanism is a highly effective strategy. Semantic caching involves storing responses generated by the model and reusing them when the system encounters queries with the same or similar meanings. This approach not only enhances the overall efficiency of the system but also significantly reduces the operational costs tied to model usage. The fundamental principle behind semantic caching is that many user queries are often semantically similar, even if they are phrased differently. By identifying and caching the responses to these semantically equivalent queries, the system can bypass the need to repeatedly invoke the LLM, which is resource-intensive. Instead, the system can quickly retrieve and return the cached response, leading to faster response times and a more seamless user experience. From a cost perspective, semantic caching directly translates into savings. Each time the system serves a response from the cache rather than querying the LLM, it avoids the computational expense associated with generating a new response. This reduction in the number of LLM invocations directly correlates with lower service costs, making the solution more economically viable, particularly in environments with high query volumes. Additionally, semantic caching contributes to system scalability. As the demand on the LLM grows, the caching mechanism helps manage the load more effectively, ensuring that response times remain consistent even as the system scales. This is crucial for maintaining the quality of service, especially in real-time applications where latency is a critical factor. Implementing semantic caching as part of the LLM deployment strategy offers a dual benefit: optimizing response times for end-users and minimizing the operational costs of model usage. This approach not only enhances the performance and scalability of AI-driven systems but also ensures that they remain cost-effective and responsive as user demand increases.
Implementation Concept for Semantic Caching
Semantic caching is a strategic approach designed to optimize both response time and computational efficiency in AI-driven systems.
The implementation of semantic caching involves the following key steps: Query Submission and Vectorization: When a user submits a query, the system first processes this input by converting it into an embedding—a vectorized representation of the query. This embedding captures the semantic meaning of the query, enabling efficient comparison with previously stored data. Cache Lookup and Matching: The system then performs a lookup in the vector cache, which contains embeddings of previous queries along with their corresponding responses. During this lookup, the system searches for an existing embedding that closely matches the new query's embedding. Matching Threshold: A critical component of this process is the match threshold, which can be adjusted to control the sensitivity of the matching algorithm. This threshold determines how closely the new query needs to align with a stored embedding for the cache to consider it a match. Cache Hit and Response Retrieval: If the system identifies a match within the defined threshold, it retrieves the corresponding response from the cache. This "cache hit" allows the system to deliver the response to the user rapidly, bypassing the need for further processing. By serving responses directly from the cache, the system conserves computational resources and reduces response times. Cache Miss and LLM Processing: In cases where no suitable match is found in the cache—a "cache miss"—the system forwards the query to the Large Language Model (LLM). The LLM processes the query and generates a new response, ensuring that the user receives a relevant and accurate answer even for novel queries. Response Storage and Cache Management: After the LLM generates a new response, the system not only delivers this response to the user but also stores the response along with its associated query embedding back into the vector cache. This step ensures that if a similar query is submitted in the future, the system can serve the response directly from the cache, further optimizing the system’s efficiency. Time-to-Live (TTL) Adjustment: To maintain the relevance and accuracy of cached responses, the system can adjust the Time-to-Live (TTL) for each entry in the cache. The TTL determines how long a response remains valid in the cache before it is considered outdated and automatically removed. By fine-tuning the TTL settings, the system ensures that only up-to-date and contextually appropriate responses are served, thereby preventing the use of stale or irrelevant data. Implement Semantic Caching in Snaplogic The concept of semantic caching can be effectively implemented within SnapLogic, leveraging its robust pipeline capabilities. Below is an outline of how this implementation can be achieved: Embedding the Query: The process begins with the embedding of the user’s query (prompt). Using SnapLogic's capabilities, an embedder, such as the Amazon Titan Embedder, is employed to convert the prompt into a vectorized representation. This embedding captures the semantic meaning of the prompt, making it suitable for comparison with previously stored embeddings. Vector Cache Lookup: Once the prompt has been embedded, the system proceeds to search for a matching entry in the vector cache. In this implementation, the Snowflake Vector Database serves as the vector cache, storing embeddings of past queries along with their corresponding responses. This lookup is crucial for determining whether a similar query has been processed before. 
Flow Routing with Router Snap: After the lookup, the system uses a Router Snap to manage the flow based on whether a match (cache hit) is found or not (cache miss). The Router Snap directs the workflow as follows: Cache Hit: If a matching embedding is found in the vector cache, the Router Snap routes the process to immediately return the cached response to the user. This ensures rapid response times by avoiding unnecessary processing. Cache Miss: If no match is found, the Router Snap directs the workflow to request a new response from the Large Language Model (LLM). The LLM processes the prompt and generates a new, relevant response. Storing and Responding: In the event of a cache miss, after the LLM generates a new response, the system not only sends this response to the user but also stores the new embedding and response in the Snowflake Vector Database for future use. This step enhances the efficiency of subsequent queries, as similar prompts can be handled directly from the cache. 4. Multiplexing AI Agents Multiplexing AI agents refers to a strategy where multiple generative AI models, each specialized in a specific task, are utilized in parallel to address complex queries. This approach is akin to assembling a panel of experts, where each agent contributes its expertise to provide a comprehensive solution. Here is the key feature of using multiplexing AI Agents Specialization: A central advantage of multiplexing AI agents is the specialization of each agent in handling specific tasks or domains. Multiplexing ensures that responses are more relevant and accurate by assigning each AI model to a particular area of expertise. For example, one agent might be optimized for natural language understanding, another for technical problem-solving, and a third for summarizing complex data. This allows the system to handle multi-dimensional queries effectively, as each agent focuses on what it does best. This specialization significantly reduces the likelihood of errors or irrelevant responses, as the AI agents are tailored to their specific tasks. In scenarios where a query spans multiple domains—such as asking a technical question with a business aspect—the system can route different parts of the query to the appropriate agent. This structured approach allows for extracting more relevant and accurate information, leading to a solution that addresses all facets of the problem. Parallel Processing: Multiplexing AI agents take full advantage of parallel processing capabilities. By running multiple agents simultaneously, the system can tackle different aspects of a query at the same time, speeding up the overall response time. This parallel approach enhances both performance and scalability, as the workload is distributed among multiple agents rather than relying on a single model to process the entire task. For example, in a customer support application, one agent could handle the analysis of a customer’s previous interactions while another agent generates a response to a technical issue, and yet another creates a follow-up action plan. Each agent works on its respective task in parallel, and the system integrates its outputs into a cohesive response. This method not only accelerates problem-solving but also ensures that different dimensions of the problem are addressed simultaneously. Dynamic Task Allocation: In a multiplexing system, dynamic task allocation is crucial for efficiently distributing tasks among the specialized agents. 
A larger, general-purpose model, such as AWS Claude 3 Sonet, can act as an orchestrator, assessing the context of the query and determining which parts of the task should be delegated to smaller, more specialized agents. The orchestrator ensures that each task is assigned to the model best equipped to handle it. For instance, if a user submits a complex query about legal regulations and data security, the general model can break down the query, sending legal-related questions to an AI agent specialized in legal analysis and security-related queries to a security-focused agent like TinyLlama or a similar model. This dynamic delegation allows for the most relevant models to be used at the right time, improving both the efficiency and accuracy of the overall response. Integration of Outputs: Once the specialized agents have processed their respective tasks, the system must integrate their outputs to form a cohesive and comprehensive response. This integration is a critical feature of multiplexing, as it ensures that all aspects of a query are addressed without overlap or contradiction. The system combines the insights generated by each agent, creating a final output that reflects the full scope of the user’s request. In many cases, the integration process also includes filtering or refining the outputs to remove any inconsistencies or redundancies, ensuring that the response is logical and cohesive. This collaborative approach increases the reliability of the system, as it allows different agents to complement one another’s knowledge and expertise. Additionally, multiplexing reduces the likelihood of hallucinations—incorrect or nonsensical outputs that can sometimes occur with single, large-scale models. By dividing tasks among specialized agents, the system ensures that each part of the problem is handled by an AI that is specifically trained for that domain, minimizing the chance of erroneous or out-of-context responses. Improved Accuracy and Contextual Understanding: Multiplexing AI agents contribute to improved overall accuracy by distributing tasks to models that are more finely tuned to specific contexts or subjects. This approach ensures that the AI system can better understand and address the nuances of a query, particularly when the input involves complex or highly specialized information. Each agent’s deep focus on a specific task leads to a higher level of precision, resulting in a more accurate final output. Furthermore, multiplexing allows the system to build a more detailed contextual understanding. Since different agents are responsible for different elements of a task, the system can synthesize more detailed and context-aware responses. This holistic view is crucial for ensuring that the solution provided is not only accurate but also relevant to the specific situation presented by the user. In SnapLogic, we offer comprehensive support for building advanced workflows by integrating our GenAI Builder Snap. This feature allows users to incorporate generative AI capabilities into their workflow automation processes seamlessly. By leveraging the GenAI Builder Snap, users can harness the power of artificial intelligence to automate complex decision-making, data processing, and content generation tasks within their existing workflows. This integration provides a streamlined approach to embedding AI-driven functionalities, enhancing both efficiency and precision across various operational domains. 
For instance, users can design workflows where the GenAI Builder Snap collaborates with other SnapLogic components, such as data pipelines and transformation processes, to deliver intelligent, context-aware automation tailored to their unique business needs. In the example pipelines, the system sends a prompt simultaneously to multiple AI agents, each with its specialized area of expertise. These agents independently process the specific aspects of the prompt related to their specialization. Once the agents generate their respective outputs, the results are then joined together to form a cohesive response. To further enhance the clarity and conciseness of the final output, a summarization agent is employed. This summarization agent aggregates and refines the detailed responses from each specialized agent, distilling the information into a concise, unified summary that captures the key points from all the agents, ensuring a coherent and well-structured final response. 5. Multi-agent conversation Multi-agent conversation refers to the interaction and communication between multiple autonomous agents, typically AI systems, working together to achieve a shared goal. This framework is widely used in areas like collaborative problem-solving, multi-user systems, and complex task coordination where multiple perspectives or expertise areas are required. Unlike a single-agent conversation, where one AI handles all inputs and outputs, a multi-agent system divides tasks among several specialized agents, allowing for greater efficiency, deeper contextual understanding, and enhanced problem-solving capabilities. Here are the key features of using multi-agent conversations. Specialization and Expertise: Each agent in a multi-agent system is designed with a specific role or domain of expertise. This allows the system to leverage agents with specialized capabilities to handle different aspects of a task. For example, one agent might focus on natural language processing (NLP) to understand input, while another might handle complex calculations or retrieve data from external sources. This division of labor ensures that tasks are processed by the most capable agents, leading to more accurate and efficient results. Specialization reduces the likelihood of errors and allows for a deeper, domain-specific understanding of the problem. Collaboration and Coordination: In a multi-agent conversation, agents don’t work in isolation—they collaborate to achieve a shared goal. Each agent contributes its output to the broader conversation, sharing information and coordinating actions to ensure that the overall task is completed successfully. This collaboration is crucial when handling complex problems that require input from multiple domains. Effective coordination ensures that agents do not duplicate work or cause conflicts. Through predefined protocols or negotiation mechanisms, agents are able to work together harmoniously, producing a coherent solution that integrates their various inputs. Scalability: Multi-agent systems are inherently scalable, making them ideal for handling increasingly complex tasks. As the system grows in complexity or encounters new challenges, additional agents with specific skills can be introduced without overloading the system. Each agent can work independently, and the system's modular design allows for smooth expansion. Scalability ensures that the system can handle larger datasets, more diverse inputs, or more complex tasks as the environment evolves. 
This adaptability is essential in dynamic environments where workloads or requirements change over time.
Distributed Decision-Making: In a multi-agent system, decision-making is often decentralized, meaning each agent has the autonomy to make decisions based on its expertise and the information available to it. This distributed decision-making process allows agents to handle tasks in parallel, without needing constant oversight from a central controller. Since agents can operate independently, decisions are made more quickly, and bottlenecks are avoided. This decentralized approach also enhances the system's resilience, as it avoids over-reliance on a single decision point and enables more adaptive and localized problem-solving.
Fault Tolerance and Redundancy: Multi-agent systems are naturally resilient to errors and failures. Since each agent operates independently, the failure of one agent does not disrupt the entire system. Other agents can continue their tasks or, if necessary, take over the work of a failed agent. This built-in redundancy ensures the system can continue functioning even when some agents encounter issues. Fault tolerance is particularly valuable in complex systems, as it enhances reliability and minimizes downtime, allowing the system to maintain performance even under adverse conditions.
SnapLogic provides robust capabilities for integrating workflow automation with Generative AI (GenAI), allowing users to seamlessly build advanced multi-agent conversation systems by combining the GenAI Snap with other Snaps within their pipeline. This integration enables users to create sophisticated workflows where multiple AI agents, each with their specialization, collaborate to process complex queries and tasks.
In this example, we demonstrate a simple implementation of a multi-agent conversation system, leveraging a manager agent to oversee and control the workflow. The process begins by submitting a prompt to a large foundational model, which, in this case, is AWS Claude 3 Sonnet. This model acts as the manager agent responsible for interpreting the prompt and determining the appropriate routing for different parts of the task. Based on the content and context of the prompt, the manager agent makes decisions on how to distribute the workload across specialized agents. After the initial prompt is processed, we utilize the Router Snap to dynamically route the output to the corresponding specialized agents. Each agent is tailored to handle a specific domain or task, such as data analysis, natural language processing, or knowledge retrieval, ensuring that the most relevant and specialized agent addresses each part of the query. Once the specialized agents have completed their respective tasks, their outputs are gathered and consolidated. The system then sends the final, aggregated result to the output destination. This approach ensures that all aspects of the query are addressed efficiently and accurately, with each agent contributing its expertise to the overall solution. The flexibility of SnapLogic's platform, combined with the integration of GenAI models and Snaps, makes it easy for users to design, scale, and optimize complex multi-agent conversational workflows. By automating task routing and agent collaboration, SnapLogic enables more intelligent, scalable, and context-aware solutions for addressing a wide range of use cases, from customer service automation to advanced data processing.
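As a rough illustration of the routing idea outside SnapLogic, the Python sketch below uses a keyword-based classify_intent function as a stand-in for the manager model and plain functions as stand-ins for the specialized agents; none of these are SnapLogic components or real vendor APIs, and the keyword rules are arbitrary assumptions.

```python
from typing import Callable, Dict

# Hypothetical specialized agents; in the pipeline these would be separate
# LLM Snaps, each with its own system prompt and area of expertise.
def legal_agent(question: str) -> str:
    return f"[legal analysis of: {question}]"

def security_agent(question: str) -> str:
    return f"[security analysis of: {question}]"

def general_agent(question: str) -> str:
    return f"[general answer to: {question}]"

AGENTS: Dict[str, Callable[[str], str]] = {
    "legal": legal_agent,
    "security": security_agent,
    "general": general_agent,
}

def classify_intent(question: str) -> str:
    """Stand-in for the manager model that decides which specialist
    should handle the question (here, simple keyword rules)."""
    lowered = question.lower()
    if "regulation" in lowered or "contract" in lowered:
        return "legal"
    if "encryption" in lowered or "breach" in lowered:
        return "security"
    return "general"

def route(question: str) -> str:
    """Mimics the Router Snap: dispatch the question to the chosen agent."""
    return AGENTS[classify_intent(question)](question)

if __name__ == "__main__":
    print(route("Does this contract meet the new data regulation?"))
```

In the actual pipeline, the classification step is performed by the manager LLM and the dispatch by the Router Snap, but the control flow is the same.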
6. Retrieval-Augmented Generation (RAG)
To enhance the specificity and relevance of responses generated by a Generative AI (GenAI) model, it is crucial to provide the model with sufficient context. Contextual information helps the model understand the nuances of the task at hand, enabling it to generate more accurate and meaningful outputs. However, in many cases, the amount of context needed to fully inform the model exceeds the token limit that the model can process in a single prompt. This is where a technique known as Retrieval-Augmented Generation (RAG) becomes particularly valuable. RAG is designed to optimize the way context is fed into the GenAI model. Rather than attempting to fit all the necessary information into the limited input space, RAG utilizes a retrieval mechanism that dynamically sources relevant information from an external knowledge base. This approach allows users to overcome the token limit challenge by fetching only the most pertinent information at the time of query generation, ensuring that the context provided to the model remains focused and concise. The RAG framework can be broken down into two primary phases:
Embedding Knowledge into a Vector Database: In the initial phase, the relevant content is embedded into a vector space using a machine learning model that transforms textual data into a format conducive to similarity matching. This embedding process effectively converts text into vectors, making it easier to store and retrieve later based on its semantic meaning. Once embedded, the knowledge is stored in a vector database for future access. In SnapLogic, embedding knowledge into a vector database can be accomplished through a streamlined pipeline designed for efficiency and scalability. The process begins with reading a PDF file using the File Reader Snap, followed by extracting the content with the PDF Parser Snap, which converts the document into a structured text format. Once the text is available, the Chunker Snap is used to intelligently segment the content into smaller, manageable chunks. These chunks are specifically sized to align with the input constraints of the model, ensuring optimal performance during later stages of retrieval. After chunking the text, each segment is processed and embedded into a vector representation, which is then stored in the vector database. This enables efficient similarity-based retrieval, allowing the system to quickly access relevant pieces of information as needed. By utilizing this pipeline in SnapLogic, users can easily manage and store large volumes of knowledge in a way that supports high-performance, context-driven AI applications.
Retrieving Context through Similarity Matching: When a query is received, the system performs similarity matching to retrieve the most relevant content from the vector database. By evaluating the similarity between the embedded query and the stored vectors, RAG identifies the most pertinent pieces of information, which are then used to augment the input prompt. This step ensures that the GenAI model receives focused and contextually enriched data, allowing it to generate more insightful and accurate responses. To retrieve relevant context from the vector database in SnapLogic, users can leverage an embedder snap, such as the Amazon Titan Embedder, to transform the incoming prompt into a vector representation. This vector serves as the key for performing a similarity-based search within the vector database where the previously embedded knowledge is stored.
The vector search mechanism efficiently identifies the most relevant pieces of information, ensuring that only the most contextually appropriate content is retrieved. Once the pertinent knowledge is retrieved, it can be seamlessly integrated into the overall prompt-generation process. This is typically achieved by feeding the retrieved context into a prompt generator snap, which structures the information in a format optimized for use by the Generative AI model. In this case, the final prompt, enriched with the relevant context, is sent to the GenAI Snap, such as Anthropic Claude within the AWS Messages Snap. This approach ensures that the model receives highly specific and relevant information, ultimately enhancing the accuracy and relevance of its generated responses. By implementing RAG, users can fully harness the potential of GenAI models, even when dealing with complex queries that demand a significant amount of context. This approach not only enhances the accuracy of the model's responses but also ensures that the model remains efficient and scalable, making it a powerful tool for a wide range of real-world applications. 7. Tool Calling and Contextual instruction Traditional GenAI models are limited by the data they were trained on. Once trained, these models cannot access new or updated information unless they are retrained. This limitation means that without external input, models can only generate responses based on the static content within their training corpus. However, in a world where data is constantly evolving, relying on static knowledge is often inadequate, especially for tasks that require current or real-time information. In many real-world applications, Generative AI (GenAI) models need access to real-time data to generate contextually accurate and relevant responses. For example, if a user asks for the current weather in a particular location, the model cannot rely solely on pre-trained knowledge, as this data is dynamic and constantly changing. In such scenarios, traditional prompt engineering techniques are insufficient, as they primarily rely on static information that was available at the time of the model's training. This is where the tool-calling technique becomes invaluable. Tool calling refers to the ability of a GenAI model to interact with external tools, APIs, or databases to retrieve specific information in real-time. Instead of relying on its internal knowledge, which may be outdated or incomplete, the model can request up-to-date data from external sources and use it to generate a response that is both accurate and contextually relevant. This process significantly expands the capabilities of GenAI, allowing it to move beyond static, pre-trained content and incorporate dynamic, real-world data into its responses. For instance, when a user asks for live weather updates, stock market prices, or traffic conditions, the GenAI model can trigger a tool call to an external API—such as a weather service, financial data provider, or mapping service—to fetch the necessary data. This fetched data is then integrated into the model’s response, enabling it to provide an accurate and timely answer that would not have been possible using static prompts alone. Contextual instruction plays a critical role in the tool calling process. Before calling an external tool, the GenAI model must understand the nature of the user’s request and identify when external data is needed. For example, if a user asks, "What is the weather like in Paris right now?" 
the model recognizes that the question requires real-time weather information and that this cannot be answered based on internal knowledge alone. The model is thus programmed to trigger a tool call to a relevant weather service API, retrieve the live weather data for Paris, and incorporate it into the final response. This ability to understand and differentiate between static knowledge (which can be answered with pre-trained data) and dynamic, real-time information (which requires external tool calling) is essential for GenAI models to operate effectively in complex, real-world environments. Use Cases for Tool Calling Real-Time Data Retrieval: GenAI models can call external APIs to retrieve real-time data such as weather conditions, stock prices, news updates, or live sports scores. These tool calls ensure that the AI provides up-to-date and accurate responses that reflect the latest information. Complex Calculations and Specialized Tasks: Tool calling allows AI models to handle tasks that require specific calculations or domain expertise. For instance, an AI model handling a financial query can call an external financial analysis tool to perform complex calculations or retrieve historical stock market data. Integration with Enterprise Systems: In business environments, GenAI models can interact with external systems such as CRM platforms, ERP systems, or databases to retrieve or update information in real time. For example, a GenAI-driven customer service bot can pull account information from a CRM system or check order statuses from an external order management tool. Access to Specialized Knowledge: Tool calling allows AI models to fetch specialized information from databases or knowledge repositories that fall outside their domain of training. For example, a medical AI assistant could call an external database of medical research papers to provide the most current treatment options for a particular condition. Implementation of Tool Calling in Generative AI Systems Tool calling has become an integral feature in many advanced Generative AI (GenAI) models, allowing them to extend their functionality by interacting with external systems and services. For instance, AWS Anthropic Claude supports tool calling via the Message API, providing developers with a structured way to integrate external data and functionality directly into the model's response workflow. This capability allows the model to enhance its responses by incorporating real-time information, performing specific functions, or utilizing external APIs that provide specialized data beyond the model's training. To implement tool calling with AWS Anthropic Claude, users can leverage the Message API, which allows for seamless integration with external systems. The tool calling mechanism is activated by sending a message with a specific "tools" parameter. This parameter defines how the external tool or API will be called, using a JSON schema to structure the function call. This approach enables the GenAI model to recognize when external input is required and initiate a tool call based on the instructions provided. Implementation process Defining the Tool Schema: To initiate a tool call, users need to send a request with the "tools" parameter. This parameter is defined in a structured JSON schema, which includes details about the external tool or API that the GenAI model will call. The JSON schema outlines how the tool should be used, including the function name, parameters, and any necessary inputs for making the call. 
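To make the schema idea concrete, a hypothetical tool definition in this style might look like the sketch below. The Python dictionaries mirror the publicly documented "tools" structure of the Anthropic Messages API; the get_current_weather function and its parameters are illustrative assumptions, not something provided by the model or by SnapLogic.

```python
# Hypothetical weather tool described with the "tools" parameter's JSON schema.
weather_tool = {
    "name": "get_current_weather",
    "description": "Retrieve the weather for a given location.",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City to look up, e.g. 'New York'",
            },
            "time": {
                "type": "string",
                "description": "Point in time to query; use 'current' for live conditions",
            },
        },
        "required": ["location"],
    },
}

# The tools list travels alongside the user message; if the model decides the
# tool is needed, it replies with stop_reason "tool_use" and the inputs to send.
request_body = {
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "What is the weather in New York right now?"}
    ],
    "tools": [weather_tool],
}
```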
For example, if the tool is a weather API, the schema might define parameters such as location and time, allowing the model to query the API with these inputs to retrieve current weather data. Message Structure and Request Initiation: Once the tool schema is defined, the user can send a message to AWS Anthropic Claude containing the "tools" parameter alongside the prompt or query. The model will then interpret the request and, based on the context of the conversation or task, determine if it needs to call the external tool specified in the schema. If a tool call is required, the model will respond with a "stop_reason" value of "tool_use". This response indicates that the model is pausing its generation to call the external tool, rather than completing the response using only its internal knowledge. Tool Call Execution: When the model responds with "stop_reason": "tool_use", it signals that the external API or function should be called with the inputs provided. At this point, the external API (as specified in the JSON schema) is triggered to fetch the required data or perform the designated task. For example, if the user asks, "What is the weather in New York right now?", and the JSON schema defines a weather API tool, the model will pause and call the API with the location parameter set to "New York" and the time parameter set to "current." Handling the API Response: After the external tool processes the request and returns the result, the user (or system) sends a follow-up message containing the "tool_result". This message includes the output from the tool call, which can then be integrated into the ongoing conversation or task. In practice, this might look like a weather API returning a JSON object with temperature, humidity, and weather conditions. The response is passed back to the GenAI model via a user message, which contains the "tool_result" data. Final Response Generation: Once the model receives the "tool_result", it processes the data and completes the response. This allows the GenAI model to provide a final answer that incorporates real-time or specialized information retrieved from the external system. In our weather example, the final response might be, "The current weather in New York is 72°F with clear skies." Currently, SnapLogic does not yet provide native support for tool calling within the GenAI Snap Pack. However, we recognize the immense potential and value this feature can bring to users, enabling seamless integration with external systems and services for real-time data and advanced functionalities. We are actively working on incorporating tool calling capabilities into future updates of the platform. This enhancement will further empower users to build more dynamic and intelligent workflows, expanding the possibilities of automation and AI-driven solutions. We are excited about the potential it holds and look forward to sharing these innovations soon 8. Memory Cognition for LLMs Most large language models (LLMs) operate within a context window limitation, meaning they can only process and analyze a finite number of tokens (words, phrases, or symbols) at any given time. This limitation poses significant challenges, particularly when dealing with complex tasks, extended dialogues, or interactions that require long-term contextual understanding. 
For example, if a conversation or task extends beyond the token limit, the model loses awareness of earlier portions of the interaction, leading to responses that may become disconnected, repetitive, or contextually irrelevant. This limitation becomes especially problematic in applications where maintaining continuity and coherence across long interactions is crucial. In customer service scenarios, project management tools, or educational applications, it is often necessary to remember detailed information from earlier exchanges or to track progress over time. However, traditional models constrained by a fixed token window struggle to maintain relevance in such situations, as they are unable to "remember" or access earlier parts of the conversation once the context window is exceeded. To address these limitations and enable LLMs to handle longer and more complex interactions, we employ a technique known as memory cognition. This technique extends the capabilities of LLMs by introducing mechanisms that allow the model to retain, recall, and dynamically integrate past interactions or information, even when those interactions fall outside the immediate context window. Memory Cognition Components in Generative AI Applications To successfully implement memory cognition in Generative AI (GenAI) applications, a comprehensive and structured approach is required. This involves integrating various memory components that work together to enable the AI system to retain, retrieve, and utilize relevant information across different interactions. Memory cognition enables the AI model to go beyond stateless, short-term processing, creating a more context-aware, adaptive, and intelligent system capable of long-term interaction and decision-making. Here are the key components of memory cognition that must be considered when developing a GenAI application: Short-Term Memory (Session Memory) Short-term memory, commonly referred to as session memory, encompasses the model's capability to retain context and information during a single interaction or session. This component is vital for maintaining coherence in multi-turn conversations and short-term tasks. It enables the model to sustain continuity in its responses by referencing earlier parts of the conversation, thereby preventing the user from repeating previously provided information. Typically, short-term memory is restricted to the duration of the interaction. Once the session concludes or a new session begins, the memory is either reset or gradually decayed. This ensures the model can recall relevant details from earlier in the same session, creating a more seamless and fluid conversational experience. For example, in a customer service chatbot, short-term memory allows the AI to remember a customer’s issue throughout the conversation, ensuring that the problem is consistently addressed without needing the user to restate it multiple times. However, in large language models, short-term memory is often limited by the model's context window, which is constrained by the maximum number of tokens it can process in a single prompt. As new input is added during the conversation, older dialogue parts may be discarded or forgotten, depending on the token limit. This necessitates careful management of short-term memory to ensure that critical information is retained throughout the session. Long-Term Memory Long-term memory significantly enhances the model's capability by allowing it to retain information beyond the scope of a single session. 
Unlike short-term memory, which is confined to a single interaction, long-term memory persists across multiple interactions, enabling the AI to recall important information about users, their preferences, past conversations, or task-specific details, regardless of the time elapsed between sessions. This type of memory is typically stored in an external database or knowledge repository, ensuring it remains accessible over time and does not expire when a session ends. Long-term memory is especially valuable in applications that require the retention of critical or personalized information, such as user preferences, history, or recurring tasks. It allows for highly personalized interactions, as the AI can reference stored information to tailor its responses based on the user's previous interactions. For example, in virtual assistant applications, long-term memory enables the AI to remember a user's preferences—such as their favorite music or regular appointment times—and use this information to provide customized responses and recommendations. In enterprise environments, such as customer support systems, long-term memory enables the AI to reference previous issues or inquiries from the same user, allowing it to offer more informed and tailored assistance. This capability enhances the user experience by reducing the need for repetition and improving the overall efficiency and effectiveness of the interaction. Long-term memory, therefore, plays a crucial role in enabling AI systems to deliver consistent, contextually aware, and personalized responses across multiple sessions. Memory Management Dynamic memory management refers to the AI model’s ability to intelligently manage and prioritize stored information, continuously adjusting what is retained, discarded, or retrieved based on its relevance to the task at hand. This capability is crucial for optimizing both short-term and long-term memory usage, ensuring that the model remains responsive and efficient without being burdened by irrelevant or outdated information. Effective dynamic memory management allows the AI system to adapt its memory allocation in real-time, based on the immediate requirements of the conversation or task. In practical terms, dynamic memory management enables the AI to prioritize important information, such as key facts, user preferences, or contextually critical data, while discarding or de-prioritizing trivial or outdated details. For example, during an ongoing conversation, the system may focus on retaining essential pieces of information that are frequently referenced or highly relevant to the user’s current query, while allowing less pertinent information to decay or be removed. This process ensures that the AI can maintain a clear focus on what matters most, enhancing both accuracy and efficiency. To facilitate this, the system often employs relevance scoring mechanisms to evaluate and rank the importance of stored memories. Each piece of memory can be assigned a priority score based on factors such as how frequently it is referenced or its importance to the current task. Higher-priority memories are retained for longer periods, while lower-priority or outdated entries may be marked for removal. This scoring system helps prevent memory overload by ensuring that only the most pertinent information is retained over time. Dynamic memory management also includes memory decay mechanisms, wherein older or less relevant information gradually "fades" or is automatically removed from storage, preventing memory bloat. 
This ensures that the AI retains only the most critical data, avoiding inefficiencies and ensuring optimal performance, especially in large-scale applications that involve substantial amounts of data or memory-intensive operations. To further optimize resource usage, automated processes can be implemented to "forget" memory entries that have not been referenced for a significant amount of time or are no longer relevant to ongoing tasks. These processes ensure that memory resources, such as storage and processing power, are allocated efficiently, particularly in environments with large-scale memory requirements. By dynamically managing memory, the AI can continue to provide contextually accurate and timely responses while maintaining a balanced and efficient memory system. Implementation of memory cognition in Snaplogic SnapLogic provides robust capabilities for integrating with databases and storage systems, making it an ideal platform for creating workflows to manage memory cognition in AI applications. In the following example, we demonstrate a basic memory cognition pattern using SnapLogic to handle both short-term and long-term memory. Overview of the Workflow The workflow begins by embedding the prompt into a vector representation. This vector is then used to retrieve relevant memories from long-term memory storage. Long-term memory can be stored in a vector database, which is well-suited for similarity-based retrieval, or in a traditional database or key-value store, depending on the application requirements. Similarly, short-term memory can be stored in a regular database or a key-value store to keep track of recent interactions. Retrieving Memories Once the prompt is embedded, we retrieve relevant information from both short-term and long-term memory systems. The retrieval process is based on similarity scoring, where the similarity score indicates the relevance of the stored memory to the current prompt. For long-term memory, this typically involves querying a vector database, while short-term memory may be retrieved from a traditional relational database or key-value store. After retrieving the relevant memories from both systems, the data is fed into a memory management module. In this example, we implement a simple memory management mechanism using a script within SnapLogic. Memory Management The memory management module employs a sliding window technique, which is a straightforward yet effective way to manage memory. As new memory is added, older memories gradually fade out until they are removed from the memory stack. This ensures that the AI retains the most recent and relevant information while discarding outdated or less useful memories. The sliding window mechanism prioritizes newer or more relevant memories, placing them at the top of the memory stack, while older memories are pushed out over time. Generating the Final Prompt and Interacting with the LLM Once the memory management module has constructed the full context by combining short-term and long-term memory, the system generates the final prompt. This prompt is then sent to the language model for processing. In this case, we use AWS Claude through the Message API as the large language model (LLM) to generate a response based on the provided context. 
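As a rough sketch of the sliding-window idea described above, the Python example below keeps only the most recent exchanges when assembling the context. It is a simplified stand-in for the script used in the pipeline, not the actual Script Snap code; the window size and message format are arbitrary assumptions.

```python
from collections import deque
from typing import Deque, Dict, List

WINDOW_SIZE = 6  # arbitrary: keep the six most recent messages


class ShortTermMemory:
    """Sliding-window store: adding new entries pushes the oldest ones out."""

    def __init__(self, window_size: int = WINDOW_SIZE) -> None:
        self.window: Deque[Dict[str, str]] = deque(maxlen=window_size)

    def add(self, role: str, text: str) -> None:
        self.window.append({"role": role, "content": text})

    def as_context(self) -> List[Dict[str, str]]:
        return list(self.window)


def build_prompt(short_term: ShortTermMemory,
                 long_term_snippets: List[str],
                 user_prompt: str) -> List[Dict[str, str]]:
    """Combine retrieved long-term memories, the recent window,
    and the new user prompt into a single message list for the LLM."""
    context_note = "Relevant past information:\n" + "\n".join(long_term_snippets)
    return (
        [{"role": "user", "content": context_note}]
        + short_term.as_context()
        + [{"role": "user", "content": user_prompt}]
    )


# Example usage
memory = ShortTermMemory()
memory.add("user", "My name is Dana and I prefer metric units.")
memory.add("assistant", "Noted, Dana. I will use metric units.")
messages = build_prompt(memory, ["Dana's preferred units: metric"],
                        "How tall is Mont Blanc?")
```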
Updating Memory
Upon receiving a response from the LLM, the workflow proceeds to update both short-term and long-term memory systems to ensure continuity and relevance in future interactions:
Long-Term Memory: The long-term memory is refreshed by associating the original prompt with the LLM's response. In this context, the query key corresponds to the initial prompt, while the value is the response generated by the model. This update enables the system to store pertinent knowledge that can be accessed during future interactions, allowing for more informed and contextually aware responses over time.
Short-Term Memory: The short-term memory is updated by appending the LLM's response to the most recent memory stack. This process ensures that the immediate context of the current conversation is maintained, allowing for seamless transitions and consistency in subsequent interactions within the session.
This example demonstrates how SnapLogic can be effectively used to manage memory cognition in AI applications. By integrating with databases and leveraging SnapLogic's powerful workflow automation, we can create an intelligent memory management system that handles both short-term and long-term memory. The sliding window mechanism ensures that the AI remains contextually aware while avoiding memory overload, and AWS Claude provides the processing power to generate responses based on rich contextual understanding. This approach offers a scalable and flexible solution for managing memory cognition in AI-driven workflows.
GenAI App Builder Getting Started Series: Part 2 - Purchase Order Processing
👋 Welcome!
Hello everyone and welcome to our second guide in the GenAI App Builder Getting Started Series! First things first, GenAI App Builder is now generally available for all customers to purchase or test in SnapLabs. If you are a customer or partner who wants access to SnapLabs, please reach out to your Customer Success Manager and they can grant you access. If you are not yet a customer, you can check out our GenAI App Builder videos, then when you’re ready to take the next step, request a demo with our sales team!
🤔 What is GenAI App Builder?
If you’re coming here from Part 1, you may notice that GenAI Builder is now GenAI App Builder. Thank you to our customers who shared feedback on how we could improve the name to better align with the purpose. The original name had led to some confusion that its purpose was to train LLMs.
📑 Purchase Order Processing Example
In this example we will demonstrate how to use GenAI in a SnapLogic Pipeline to act like a function written in natural language to extract information from a PDF. The slide below shows an example of how we use natural language to extract the required fields in JSON format that would allow us to make this small pattern part of a larger app or data integration workflow.
✅ Prerequisites
To follow along with this guide, you will need the items below:
Access to GenAI App Builder (in your company’s organization or in SnapLabs)
Your own API account with access to Azure OpenAI, OpenAI, or Anthropic Claude on Amazon Bedrock.
⬆️ Import the pipeline
At the bottom of this post you will find several files if you want to just use a pattern to see this in action in your own environment and explore it further. If you are familiar with SnapLogic and want to build the Pipeline on your own you can do that as well and just download the example PDF or try your own!
PurchaseOrderExample.pdf
InvoiceProcessing_CommunityArticlePipeline_2024_06_28.slp (zipped)
Once you are signed in to SnapLogic or SnapLabs you can start with the steps below to import the Pipeline:
In Designer, click the icon shown in the screenshot below to import the Pipeline
Select the file in the File Browser window that pops up
In the Add New Pipeline panel that opens you can change name and project location if desired
Press the Save button in the lower-right corner
🚧 Parsing the file
If you imported the pipeline using the steps above, then your pipeline should look like the one below. The steps below assume you imported the pipeline. If you are familiar enough with SnapLogic to build this on your own, you can drag the Snaps shown below to create the Pipeline then follow along with us.
🔈 NOTE: The instructions here are completed with the Amazon Bedrock Prompt Generator and the Anthropic Claude on AWS for the last two Snaps in the Pipeline. You can swap these out for Azure OpenAI or OpenAI Snaps if you prefer to use those LLMs.
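Before walking through the Snaps, it may help to see the idea in miniature: the prompt acts like a small function written in natural language that takes the parsed PDF text as input and returns JSON. The Python sketch below is purely conceptual; extract_fields and llm_complete are hypothetical stand-ins for the Prompt Generator and Chat Completions Snaps, not real SnapLogic or vendor APIs, and the placeholder return value is illustrative only.

```python
PROMPT_TEMPLATE = (
    "Instruction: Your task is to pull out the company name, the date created, "
    "date shipped, invoice number, P.O. number, vendor from vendor details, "
    "recipient name from recipient details, subtotal, 'Shipping & handling', "
    "tax rate, sales tax, and total from the context below. "
    "Give the results back in JSON.\n\nContext: {context}\n"
)


def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for the Chat Completions Snap / LLM API call."""
    return '{"company_name": "..."}'  # placeholder response for illustration


def extract_fields(pdf_text: str) -> str:
    """Treat the prompt as a 'function written in natural language':
    substitute the parsed PDF text and ask the model for JSON back."""
    return llm_complete(PROMPT_TEMPLATE.format(context=pdf_text))


print(extract_fields("PURCHASE ORDER ... (parsed PDF text goes here)"))
```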
Click the File Reader Snap to open its settings
Click the icon at the far right of the File field as shown in the screenshot below
Click the Upload File button in the upper-right corner of the window that pops up
Select the PDF file from your file browser (download the PurchaseOrderExample.pdf file at the bottom of this post if you have not already)
Save and close the File Reader Snap once your file is selected
No edits are needed for the PDF Parser Snap, so we'll skip over that one
Click the Mapper Snap
Add $text in the Expression field and $context in the Target path fields as shown below
Save and close the Mapper Snap
Click on the fourth Snap, the Prompt Generator Snap (we will demonstrate here with the Amazon Bedrock Prompt Generator Snap - you do not have to use Amazon Bedrock though, you can use any of the other LLM Prompt Generators we have, like Azure OpenAI, OpenAI, etc.)
Click the Edit Prompt button as shown in the screenshot below so we can modify the prompt used for the LLM
You should see a pre-generated prompt like the one below:
Copy the prompt below and replace the default prompt:
Instruction: Your task is to pull out the company name, the date created, date shipped, invoice number, P.O. number, vendor from vendor details, recipient name from recipient details, subtotal, 'Shipping & handling', tax rate, sales tax, and total from the context below. Give the results back in JSON.
Context: {{context}}
The Prompt Generator text should now look like the screenshot below:
Click the Ok button in the lower-right corner to save our prompt changes
Click on the last Snap, the Chat Completions Snap (we will demonstrate here with the Anthropic Claude on AWS Chat Completions Snap - you do not have to use Anthropic Claude on AWS though, you can use any of the other LLM Chat Completions Snaps we have, like Azure OpenAI, OpenAI, etc.)
Click the Account tab
Click Add Account; if you have an existing LLM account to use you can select that here and skip to step 22 below
Select the type of account you want then press Continue - available options will depend on which LLM Chat Completions Snap you chose
Enter in the required credentials for the LLM account you chose; here is an example of the Amazon Bedrock Account
Press the Apply button when done entering the credentials
Verify your account is now selected in the Account tab
Click on the Settings tab
Click on the Suggest icon to the right of the Model name field as shown in the screenshot below and select the model you want to use
Type $prompt in the Prompt field as shown in the screenshot below:
Expand the Model Parameters section by clicking on it (if you are using OpenAI or Azure OpenAI, you can leave Maximum Tokens blank; for Anthropic Claude on AWS you will need to increase Maximum Tokens from 200 to something higher; you can see where we set 50,000 below)
Save and close the Chat Completions Snap
🎬 Testing our example
At this point we are ready to test our Pipeline and observe the results! The screenshot below shows you where you can click to Validate the Pipeline, which should have every Snap turn green with preview output as shown below. If you have any errors or questions, please reply to share them with us! Here is the JSON output after the Anthropic Claude on AWS Chat Completions Snap (note that other LLMs will have different API output structures):
Extras! Want to play with this further? Try adding a Copy Snap after the Mapper and sending the file to multiple LLMs at once, then review the results.
Try changing {{context}} in the Prompt Generator to something else so you can drop the Mapper from the pipeline
🏁 Wrapping up
Congratulations, you have now completed at least one GenAI App Builder integration in SnapLogic! 😎 Stay tuned to the SnapLabs channel here in the Integration Nation for more content on GenAI App Builder in the future! Please share any thoughts, comments, concerns, or feedback in a reply or DM RogerSramkoski!
Unlock the Future of AI: Discover Project SnapChain and Build Your Own RAG Chatbot
To say we've journeyed through a realm of groundbreaking advancements since the release of SnapGPT in August (has it already been 4 months?!) is just scratching the surface. At AWS re:Invent 2023 not only did we showcase SnapGPT, but we also unveiled our revolutionary generative AI capability - Project SnapChain. Our customers have been thrilled with how SnapGPT has transformed their pipeline creation and documentation processes. But the excitement doesn't stop there - they're eager to delve into building their own generative AI applications using their unique data and documents. We're inviting you to a special event - this Wednesday, December 6th, at 11 AM ET (8 AM PT) for an exclusive behind-the-scenes look at Project SnapChain in action. In this interactive webinar, we're not just sharing insights; we're guiding you on how to construct a RAG-based chatbot using nothing but Snaps, along with your data and documents. What's more, you'll have the chance to put this knowledge into practice in our SnapLabs environment! Join us to be part of this innovative journey and unlock the power to create. Reserve your spot now and be at the forefront of AI innovation. We can't wait to see you there! Sign up here: https://www.snaplogic.com/resources/webcasts/snaplabs-corner-december-2023
Multimodal Processing in LLM
Multimodal processing in Generative AI represents a transformative leap in how AI systems extract and synthesize information from multiple data types—such as text, images, audio, and video—simultaneously. Unlike traditional single-modality AI models, which focus on one type of data, Multimodal systems integrate and process diverse data streams in parallel, creating a holistic understanding of complex scenarios. This integrated approach is critical for applications that require not just isolated insights from one modality, but a coherent synthesis across different data sources, leading to outputs that are contextually richer and more accurate. Generative AI, with multimodal processing, is redefining text extraction, surpassing traditional OCR by interpreting text within its visual and contextual environment. Unlike OCR, which only converts images to text, generative AI analyzes the surrounding image context, layout, and meaning, enhancing accuracy and depth. For instance, in complex documents, it can differentiate between headings, body text, and annotations, structuring information more intelligently. Additionally, it excels in low-quality or multilingual texts, making it invaluable in industries requiring precision and nuanced interpretation. In video analysis, a generative AI equipped with Multimodal processing can simultaneously interpret the visual elements of a scene, the audio (such as dialogue or background sounds), and any associated text (like subtitles or metadata). This allows the AI to produce a description or summary of the scene that is far more nuanced than what could be achieved by analyzing the video or audio alone. The interplay between these modalities ensures that the generated description reflects not only the visual and auditory content but also the deeper context and meaning derived from their combination. In tasks such as image captioning, Multimodal AI systems go beyond simply recognizing objects in a photo. They can interpret the semantic relationship between the image and accompanying text, enhancing the relevance and specificity of the generated captions. This capability is particularly useful in fields where the context provided by one modality significantly influences the interpretation of another, such as in journalism, where images and written reports must align meaningfully, or in education, where visual aids are integrated with instructional text. Multimodal processing enables AI to synthesize medical images (such as X-rays or MRIs) with patient history, clinical notes, and even live doctor-patient interactions in highly specialized applications like medical diagnostics. This comprehensive analysis allows the AI to provide more accurate diagnoses and treatment recommendations, addressing the complex interplay of symptoms, historical data, and visual diagnostics. Similarly, in customer service, Multimodal AI systems can improve communication quality by analyzing the textual content of a customer's inquiry and the tone and sentiment of their voice, leading to more empathetic and effective responses. Beyond individual use cases, Multimodal processing plays a crucial role in improving the learning and generalization capabilities of AI models. By training on a broader spectrum of data types, AI systems develop more robust, flexible models that can adapt to a wider variety of tasks and scenarios. This is especially important in real-world environments where data is often heterogeneous and requires cross-modal understanding to interpret fully. 
As multimodal processing technologies continue to advance, they promise to unlock new capabilities across diverse sectors. In entertainment, multimodal AI could enhance interactive media experiences by seamlessly integrating voice, visuals, and narrative elements. In education, it could enable personalized learning by adapting content delivery to different sensory inputs. In healthcare, the fusion of multimodal data could lead to breakthroughs in precision medicine. Ultimately, the ability to understand and generate contextually rich, multimodal content positions generative AI as a cornerstone technology in the next wave of AI-driven innovation.

Multimodal Content Generator Snap

The Multimodal Content Generator Snap encodes file or document inputs into the Snap's multimodal content format, preparing them for downstream use. The output of this Snap must be connected to the Prompt Generator Snap, which completes and formats the message payload for further processing. This streamlined setup enables efficient multimodal content handling within the Snap ecosystem.

The Snap Properties
- Type - Select the type of multimodal content.
- Content Type - Define the specific content type for the data transmitted to the LLM.
- Content - Specify the path to the multimodal content data for processing.
- Document Name - Name the document for reference and identification purposes.
- Aggregate Input - Enable this option to combine all inputs into a single content.
- Encode Base64 - Enable this option to convert text input into Base64 encoding.

Note: The Content property appears only if the input view is of the document type. The value assigned to Content must be in Base64 format for document inputs, while the Snap automatically uses the binary data as content for binary input types. The Document Name can be set specifically for multimodal document types. The Encode Base64 property encodes text input into Base64 by default; if unchecked, the content is passed through without encoding.

Designing a Multimodal Prompt Workflow

In this walkthrough, we integrate multiple Snaps into a workflow for multimodal content generation and prompt delivery. The Multimodal Content Generator Snap is connected to the Prompt Generator Snap, which is configured to handle multimodal content, and the finalized message payload is then sent to Claude via the Anthropic Claude on AWS Messages Snap.

Steps:
1. Add the File Reader Snap:
- Drag and drop the File Reader Snap onto the designer canvas.
- Open its settings panel and select a file containing images (e.g., a PDF file). Download the sample image file (Japan_flowers.jpg) at the bottom of this post if you have not already.
2. Add the Multimodal Content Generator Snap:
- Drag and drop the Multimodal Content Generator Snap onto the designer and connect it to the File Reader Snap.
- Open its settings panel, select the file type, and specify the appropriate content type.
- The output attributes of the Multimodal Content Generator are:
  - sl_content: the actual content, encoded in Base64 format.
  - sl_contentType: the content type of the data, taken from the Snap configuration or, for binary input, extracted from the contentType in the binary header.
  - sl_type: the content type as defined in the Snap settings; in this case it will display "image".
  See the sketch below for the kind of message these fields end up in.
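To make the Base64 step more concrete, here is a minimal Python sketch of how an image such as Japan_flowers.jpg gets encoded and placed into a multimodal message, assuming the standard Anthropic Messages API image block format. The mapping comments to sl_content and sl_contentType are illustrative only; they are not the Snap's documented output schema.

```python
import base64

# Hypothetical local copy of the sample image; in the pipeline the File Reader Snap
# supplies these bytes instead.
with open("Japan_flowers.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")  # roughly what sl_content carries

# An Anthropic Messages API style payload with one image block and one text block.
# The Prompt Generator Snap assembles a message payload along these lines for the
# downstream Claude Snap.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",  # compare sl_contentType
                    "data": image_b64,           # compare sl_content
                },
            },
            {"type": "text", "text": "Describe this image in two sentences."},
        ],
    }
]
```

Base64 is used because image bytes have to travel as text inside a JSON message body; the Snaps take care of this encoding for you.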
3. Add the Prompt Generator Snap:
- Add the Prompt Generator Snap to the designer and link it to the Multimodal Content Generator Snap.
- In the settings panel, enable the Advanced Prompt Output checkbox and configure the Content property to use the input from the Multimodal Content Generator Snap.
- Click "Edit Prompt" and enter your instructions.
4. Add and Configure the LLM Snap:
- Add the Anthropic Claude on AWS Message API Snap as the LLM and connect it to the Prompt Generator Snap.
- In the settings, select a model that supports multimodal content.
- Enable the Use Message Payload checkbox and enter the message payload in the Message Payload field.
5. Verify the Result:
- Review the output from the LLM Snap to ensure the multimodal content has been processed correctly.
- Validate that the generated response aligns with the expected content and format requirements. If adjustments are needed, revisit the settings in the previous Snaps to refine the configuration.

Multimodal Models for Advanced Data Extraction

Multimodal models are redefining data extraction by advancing beyond traditional OCR capabilities. Unlike OCR, which primarily converts images to text, these models directly analyze and interpret content within PDFs and images, capturing contextual information such as layout, formatting, and semantic relationships that OCR alone cannot. By understanding both textual and visual structure, multimodal AI can handle intricate documents, including tables, forms, and embedded graphics, without requiring a separate OCR process. This approach not only improves accuracy but also streamlines workflows by reducing dependency on traditional OCR tools.

In today's data-rich environment, information arrives in varied formats, and the ability to analyze and derive insights from diverse sources is essential. Imagine managing a collection of invoices saved as PDFs or as photos from scanners and smartphones, where a streamlined approach is needed to interpret their contents. Multimodal large language models (LLMs) excel in these scenarios, enabling seamless extraction of information across file types: automatically identifying key details, generating summaries, and analyzing trends across invoices, whether they come from scanned documents or images. Here's a step-by-step guide to implementing this functionality within SnapLogic.

Sample invoice files (download the files at the bottom of this post if you have not already):
- Invoice1.pdf
- Invoice2.pdf
- Invoice3.jpeg (sometimes the invoice image might be tilted)

Upload the invoice files
- Open the Manager page and go to the project that will store the pipelines and related files.
- Click the + (plus) sign and select File.
- In the Upload File dialog, click "Choose Files" and select all the invoice files, both PDF and image formats.
- Click the Upload button; the uploaded files will be listed.

Building the pipeline

Add the JSON Generator Snap:
- Drag and drop the JSON Generator onto the designer canvas.
- Click the Snap to open its settings, then click the "Edit JSON" button.
- Highlight all the text from the template and delete it.
- Paste the invoice filenames as a JSON array of documents, each with a filename field (a sketch of the expected shape follows this list).
- Click "OK" in the lower-right corner to save.
- Save the settings and close the Snap.
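For reference, here is a rough sketch of the shape the JSON Generator content can take. The project path shown is made up, so substitute the project you uploaded the invoices to; the one field the rest of the pipeline relies on is filename, which the File Reader Snap reads through the $filename expression.

```python
import json

# Hypothetical project path -- replace with the project space/project where you uploaded the files.
project_path = "/MyOrg/shared/invoices"

# One document per file; the File Reader Snap downstream reads each file via $filename.
invoice_files = [
    {"filename": f"{project_path}/Invoice1.pdf"},
    {"filename": f"{project_path}/Invoice2.pdf"},
    {"filename": f"{project_path}/Invoice3.jpeg"},
]

# JSON of this shape is what goes into the JSON Generator's "Edit JSON" editor.
print(json.dumps(invoice_files, indent=2))
```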
Add the File Reader Snap:
- Drag and drop the File Reader Snap onto the designer canvas and click it to open the configuration panel.
- Connect the Snap to the JSON Generator Snap by following these steps:
  - Select the Views tab.
  - Click the plus (+) button on the Input pane to add the input view (input0).
  - Save the configuration. The Snap on the canvas now has an input view; connect it to the JSON Generator Snap.
- In the configuration panel, select the Settings tab.
- Enable expressions on the File field by clicking the equal sign in front of the text input, then set it to $filename to read all the files specified in the JSON Generator Snap.
- Validate the pipeline to see the File Reader output. Two fields will be used in the Multimodal Content Generator Snap: content-type, which shows the file's content type, and content-location, which shows the file path and will be used for the document name.

Add the Multimodal Content Generator Snap:
- Drag and drop the Multimodal Content Generator Snap onto the designer canvas and connect it to the File Reader Snap.
- Click the Snap to open the settings panel and configure the following fields:
  - Type: enable the expression and set the value to $['content-location'].endsWith('.pdf') ? 'document' : 'image'
  - Document name: enable the expression and set the value to $['content-location'].snakeCase(). The snake-case version of the file path (lowercase words separated by underscores) identifies each file and keeps the name compatible with the Amazon Bedrock Converse API.
  - Aggregate input: check the checkbox to combine all input files into a single content.
- Validate the pipeline to see the Multimodal Content Generator Snap output. In the preview, sl_type will be document for the PDF files and image for the image file, and the name will be the snake-cased file path.

Add the Prompt Generator Snap:
- Drag and drop the Prompt Generator Snap onto the designer canvas and connect it to the Multimodal Content Generator Snap.
- Click the Snap to open the settings panel and configure the following fields:
  - Enable the Advanced Prompt Output checkbox.
  - Set the Content to $content to use the content input from the Multimodal Content Generator Snap.
  - Click "Edit Prompt" and enter your instructions, for example: Based on the total quantity across all invoices, which product has the highest and lowest purchase quantities, and in which invoices are these details found?

Add and Configure the LLM Snap:
- Add the Amazon Bedrock Converse API Snap as the LLM and connect it to the Prompt Generator Snap.
- Click the Snap to open the configuration panel and, on the Account tab, select your account.
- On the Settings tab, select a model that supports multimodal content, enable the Use Message Payload checkbox, and set the Message Payload to $messages to use the message from the Prompt Generator Snap.

Verify the result:
- Validate the pipeline and open the preview of the Amazon Bedrock Converse API Snap to see the model's answer.

In this example, the LLM successfully processes invoices in both PDF and image formats, demonstrating its ability to handle diverse inputs in a single workflow. By extracting and analyzing data across these formats, the LLM provides accurate responses and insights, showcasing the efficiency and flexibility of multimodal processing. You can adjust the queries in the Prompt Generator Snap to explore different results.
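If you want to see what an equivalent request looks like outside the pipeline, here is a minimal boto3 sketch of a Bedrock Converse API call that mixes a text prompt, a PDF document block, and an image block, roughly mirroring what the Prompt Generator and Converse API Snaps assemble. The model ID, region, and document name are illustrative; use any Bedrock model that supports multimodal content.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Local copies of two of the sample invoices; in the pipeline these bytes come from
# the File Reader and Multimodal Content Generator Snaps.
with open("Invoice1.pdf", "rb") as f:
    pdf_bytes = f.read()
with open("Invoice3.jpeg", "rb") as f:
    jpeg_bytes = f.read()

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # illustrative multimodal-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "Across these invoices, which product has the highest and "
                         "lowest purchase quantities, and in which invoices?"},
                {"document": {"format": "pdf", "name": "invoice1",
                              "source": {"bytes": pdf_bytes}}},
                {"image": {"format": "jpeg", "source": {"bytes": jpeg_bytes}}},
            ],
        }
    ],
)

# The model's answer is returned as text content blocks.
print(response["output"]["message"]["content"][0]["text"])
```

Running this requires AWS credentials with Bedrock access and local copies of the sample invoice files; the Converse API accepts raw bytes for document and image blocks, so no manual Base64 handling is needed here.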