Gartner - 10 Best Practices for Scaling Generative AI
I recently came back from Gartner's Data and Analytics Summit in Orlando, Florida. As expected, GenAI was a big area of focus and interest. One of the sessions that I attended was "10 best practices for scaling Generative AI." The session highlighted the rapid adoption of generative AI, with 45% of organizations piloting and 10% already in production as of September 2023. While the benefits like workforce productivity, multi-domain applications, and competitive differentiation are evident, there are also significant risks around data loss, hallucinations, black box nature, copyright issues, and potential misuse. Through 2025, Gartner predicts at least 30% of generative AI projects will be abandoned after proof-of-concept due to issues like poor data quality, inadequate risk controls, escalating costs, or unclear business value. To successfully scale generative AI, the session outlined 10 best practices:
1. Continuously prioritize use cases aligned to the organization's AI ambition and measure business value.
2. Create a decision framework for build vs. buy, evaluating model training, security, integration, and pricing.
3. Pilot use cases with an eye towards future scalability needs around data, privacy, security, etc.
4. Design a composable platform architecture to improve flexibility and avoid vendor lock-in.
5. Put responsible AI principles at the forefront across fairness, ethics, privacy, compliance, etc., and evaluate risk mitigation tools.
6. Invest in data and AI literacy programs across functions and leadership.
7. Instill robust data engineering practices like knowledge graphs and vector embeddings.
8. Enable seamless human-AI collaboration with human-in-the-loop and communities of practice.
9. Apply FinOps practices to monitor, audit, and optimize generative AI costs.
10. Adopt an agile, product-centric approach with continuous updates based on user feedback.
The session stressed balancing individual and organizational needs while making responsible AI the cornerstone for scaling generative AI capabilities. Hope you found these useful. What are your thoughts on best practices for scaling GenAI?
GenAI App Builder Getting Started Series: Part 1 - HR Q&A example
👋 Welcome! Hello everyone, and welcome to our technical guide to getting started with GenAI App Builder on SnapLogic! At the time of publishing, GenAI App Builder is available for testing and will be generally available in our February release. For existing customers and partners, you can request access for testing GenAI App Builder by speaking to your Customer Success Manager or another member of your account team. If you're not yet a customer, you can speak to your Sales team about testing GenAI App Builder. 🤔 What is GenAI App Builder? Before we begin, let's take a moment to understand what GenAI App Builder is and, at a high level, talk about its components. GenAI App Builder is the latest offering in the SnapLogic AI portfolio, focused on helping modern enterprises create applications with Generative AI faster, using a low-/no-code interface. That feels like a mouthful of buzzwords, so let me paint a picture (skip this if you're familiar with GenAI or watch our video, "Enabling employee and customer self-service"). Imagine yourself as a member of an HR team responsible for recruiting year round. Every new employee has an enrollment period after or just before their start date, and every existing employee has open enrollment once per year. During this time, employees need to choose between different medical insurance offerings, which usually involves a comparison of deductibles, networks, max-out-of-pocket, and other related features and limits. As you're thinking about all of this material, sorting out how to explain it all to your employees, you're interrupted by your Slack or Teams DM noise. Bing bong! Questions start flooding in:
- Hi, I'm a new employee and I'm wondering, when do I get paid? What happens if payday is on a weekend or holiday?
- Speaking of holidays, what are the company-recognized holidays this year?
- Hi, my financial advisor said I should change my insurance plan to one with an HSA. Can you help me figure out which plan(s) include an HSA and confirm the maximum contribution limits for a family this year?
- Hi, how does vacation accrual work? When does vacation roll over? Is unused vacation paid out or lost?
All these questions and many others are answered in documents the HR team manages, including the employee handbook, insurance comparison charts, disability insurance sheets, life insurance sheets, other data sheets, etc. What if, instead of you having to answer all these questions, you could leverage a human-sounding large language model (LLM) to field them for you, while making sure it references only the source documents you provide, so you don't have to worry about hallucinations?! Enter GenAI App Builder! 🏗 Building an HR Q&A example Once you have access to test GenAI App Builder, you can use the following steps to start building out an HR Q&A example that will answer questions using only the employee handbook or whichever document you provide. In this guide we will cover the two pipelines used: one that loads data and one that we will use to answer questions. We will not get into Snap customization or Snap details with this guide - it is just meant to show a quick use case. We do assume that you are familiar enough with SnapLogic to create a new Pipeline or import an existing one, search for Snaps, connect Snaps, and perform a few other simple steps. We will walk you through anything that is new to SnapLogic or that needs some additional context. We also assume you have some familiarity with Generative AI in this guide.
We will also make a video with similar content in the near future, so I'll update or reply to this post once that content is available. Prerequisites In order to complete this guide, you will need the items below regardless of whether or not you use the Community-supported chatbot UI from SnapLogic.
- Access to a Pinecone instance (sign up for a free account at https://www.pinecone.io) with an existing index
- Access to Azure OpenAI or OpenAI
- A file to load, such as your company's employee handbook
Loading data Our first step is to load data into the vector database using a Pipeline similar to the one below, which we will call the "Indexer" Pipeline since it helps populate the Pinecone Index. If you cannot find the pattern in the Pattern Library, you can find the Pipeline attached below as "Indexer_Feb2024.slp". The steps below assume you have already imported the Pipeline or are building it as we go through. To add more color here, loading data into the vector database only needs to be done when the files are updated. In the HR scenario, this might be once a year for open enrollment documents and maybe a few times a year for the employee handbook. We will explore some other use cases in the future where document updates would be much more frequent.
1. Click on the "File Reader" Snap to open its settings
2. Click on the icon at the far right of the "File" field as shown in the screenshot below
3. Click the "Upload" button in the upper-right corner of the window that pops up
4. Select the PDF file from your local system that you want to index (we are using an employee handbook and you're welcome to do the same) to upload it, then make sure it is selected
5. Save and close the "File Reader" Snap once your file is selected
6. Leave the "PDF Parser" Snap with default settings
7. Click on the "Chunker" Snap to open it, then mirror the settings in the screenshot below.
8. Now open the "Azure OpenAI Embedder" or "OpenAI Embedder" Snap (you may need to replace the embedder that came in the Pattern or import with the appropriate one you have an account with).
9. Go to the "Account" tab and create a new account for the embedder you're using. You need to replace the variable {YOUR_ACCOUNT_LABEL} with a label for the account that makes sense for you, then replace {YOUR_ENDPOINT} with the appropriate snippet from your Azure OpenAI endpoint. Validate the account if you can to make sure it works. After you save your new account you can go back to the main "Settings" tab on the Snap.
10. If the account setup was successful, you should now be able to click the chat bubble icon at the far right of the "Deployment ID" field to suggest a "Deployment ID" - in our environment shown in the screenshot below, you can see we have one named "Jump-emb-ada-002" which I can now select.
11. Finally, make sure the "Text to embed" field is set as shown below, then save and close this Snap.
12. Now open the "Mapper" Snap so we can map the output of the embedder Snap to the "Pinecone Upsert" Snap as shown in the screenshot below. If it is difficult to see the mappings in the screenshot above, here is a zoomed-in version: For a little more context here, we're mapping the $embedding object coming out of the embedder Snap to the $values object in Pinecone, which is required. If that was all you mapped, though, your Q&A example would always reply with something like "I don't know" because there is no data. To fix that, we need to make use of the very flexible "metadata" object in Pinecone by mapping $original.chunk to $metadata.chunk. We also statically set $metadata.source to "Employee Handbook.pdf", which allows the retriever Pipeline to return the source file used in answering a question (in a real-world scenario, you would probably determine the source dynamically/programmatically, such as using the filename, so this pipeline could load other files too).
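If it helps to picture the result of these mappings, here is a minimal sketch (not actual Snap output) of the shape of a single record that the "Pinecone Upsert" Snap would receive. The id and vector values are invented, and a real text-embedding-ada-002 embedding would contain 1536 numbers rather than four.

```python
# Illustrative only: the shape of one upserted record after the Mapper.
# "values" carries the embedding ($embedding -> $values), "metadata.chunk"
# carries the original chunk text ($original.chunk -> $metadata.chunk), and
# "metadata.source" is the statically set file name.
record = {
    "id": "employee-handbook-chunk-001",              # made-up id, shown for completeness
    "values": [0.0123, -0.0456, 0.0789, -0.0012],     # truncated; real vectors are 1536-dimensional
    "metadata": {
        "chunk": "Employees are paid on a semi-monthly basis...",
        "source": "Employee Handbook.pdf",
    },
}
print(record["metadata"]["source"])
```

(Pinecone records also carry an id; it is included here only to make the sketch complete.)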
13. Save and close the "Mapper" Snap
14. Finally, open the "Pinecone Upsert" Snap, then click the "Account" tab and create a new account with your Pinecone API Key and validate it to make sure it works before saving
15. Back on the main "Settings" tab of the "Pinecone Upsert" Snap, you can now click on the chat bubble icon to suggest existing indexes in Pinecone. For example, in our screenshot below you can see we have four which have been obscured and one named "se-demo." Indexes cannot be created on the fly, so you will have to make sure the index is created in the Pinecone web interface.
16. The last setting we'll talk about for the Indexer pipeline is the "Namespace" field in the "Pinecone Upsert" Snap. Setting a namespace is optional. Namespaces in Pinecone create a logical separation between vectors within an index and can be created on the fly during Pipeline execution. For example, you could create a namespace like "2024_enrollment" for all documents published in 2024 for open enrollment and another called "2024_employeehandbook" to separate those documents into separate namespaces. Although these can be used just for internal organization purposes, you can also direct a chatbot to only use one namespace to answer questions. We'll talk about this more in the "Answering Questions" section below, which covers the Retriever Pipeline.
17. Save and close the "Pinecone Upsert" Snap
18. You should now be able to validate the entire Pipeline to see what the data looks like as it flows through the Snaps, and when you're ready to commit the data to Pinecone, you can Execute the Pipeline.
Answering Questions To answer questions using the data we just loaded into Pinecone, we're going to recreate or import the Retriever Pipeline (attached as "Retriever_Feb2024.slp"). If you import the Pipeline you may need to add additional "Mapper" Snaps as shown below. We will walk through that in the steps below; just know this is what we'll end up with at the end of our first article. The screenshot above shows what the pattern will look like when you import it. Since this first part of the series will only take us up to the point of testing in SnapLogic, our first few steps will involve some changes with that in mind.
1. Right-click on the "HTTP Router" Snap, click "Disable Snap"
2. Click the circle between the "HTTP Router" and embedder Snap to disconnect them
3. Drag the "HTTP Router" Snap somewhere out of the way on the canvas (you can also delete it if you're comfortable replacing it later); your Pipeline should now look like this:
4. In the asset palette on the left, search for the "JSON Generator" (it should appear before you finish typing that all out):
5. Drag a "JSON Generator" onto the canvas, connecting it to the "Azure OpenAI Embedder" or "OpenAI Embedder" Snap
6. Click on the "JSON Generator" to open it, then click on the "Edit JSON" button in the main Settings tab
7. Highlight all the text from the template and delete it so we have a clean slate to work with
8. Paste in this text, replacing "Your question here." with an actual question you want to ask that can be answered from the document you loaded with your Indexer Pipeline. For example, I loaded an employee handbook and I will ask the question, "When do I get paid?"
[ { "prompt" : "Your question here." } ] Your "JSON Generator" should now look something like this but with your question: Click "OK" in the lower-right corner to save the prompt Click no the "Azure OpenAI Embedder" or "OpenAI Embedder" Snap to view its settings Click on the Account tab, then use the drop-down box to select the account you created in the section above ("Loading Data", steps 8-9) Click on the chat bubble icon to suggest "Deployment IDs" and choose the same one you chose in "Loading Data", step 10 Set the "Text to embed" field to $prompt as shown in the screenshot below: Save and close the "Azure OpenAI Embedder" or "OpenAI Embedder" Snap Click on the Mapper immediately after the embedder Snap Create a mapping for $embedding that maps to $vector Check the "Pass through" box; this Mapper Snap should now look like this: Save and close this "Mapper" Open the "Pinecone Query" Snap Click the Account tab, then use the drop-down to select the Pinecone account you created in "Loading Data", step 14 Use the chat bubble on the right side of the "Index name" field to select your existing Index [OPTIONAL] Use the chat bubble on the right side of the "Namespace" field to select your existing Namespace, if you created one; the "Pinecone Query" Snap should now look like this: Save and close the "Pinecone Query" Snap. Click on the "Mapper" Snap after the "Pinecone Query" Snap. In this "Mapper" we need to map the three items listed below, which are also shown in the following screenshot. If you're not familiar with the $original JSON key, it occurs when an upstream Snap has implicit pass through, or like the "Mapper" in step 17, we explicitly enable pass through, allowing us to access the original JSON document that went into the upstream Snap. (NOTE: If you're validating your pipeline along the way or making use of our Dynamic Validation, you may notice that no Target Schema shows up in this Mapper until after you complete steps 27-30.) Map $original.original.prompt to $prompt Map jsonPath($, "$matches[*].metadata.chunk") to jsonPath($, "$context[*].data") Map jsonPath($, "$matches[*].metadata.source") to jsonPath($, "$context[*].source") Save and close that "Mapper". Click on the "Azure OpenAI Prompt Generator" or "OpenAI Prompt Generator" so we can set our prompt. Click on the "Edit prompt" button and make sure your default prompt looks like the screenshot below. On lines 4-6 you can see we are using mustache templating like {{#context}} {{source}} {{/context}} which is the same as the jsonPath($, "$context[*].source") from the "Mapper" in step 25 above. We'll talk about this more in future articles - for now, just know this will be a way for you customize the prompt and data included in the future. Click "OK" in the lower-right corner Save and close the prompt generator Snap Click on the "Azure OpenAI Chat Completions" or "OpenAI Chat Completions" Snap Click the "Account" tab then use the drop-down box to select the account you created earlier Click the chat bubble icon to the far right of the "Deployment ID" field to suggest a deployment; this ID may be different than the one you've chosen in previous "Azure OpenAI" or "OpenAI" Snaps since we're selecting an LLM this team instead of an embedding model Set the "Prompt" field to $prompt; your Snap should look something like this: Save and close the chat completions Snap Testing our example Now it's time to validate our pipeline and take a look at the output! 
Once validated, the Pipeline should look something like this: If you click the preview data output on the last Snap, the chat completions Snap, you should see output that looks like this: The answer to our prompt is under $choices[0].message.content. For the test above, I asked the question "When do I get paid?" against an employee handbook and the answer was this: Employees are paid on a semi-monthly basis (24 pay periods per year), with payday on the 15th and the last day of the month. If a regular payday falls on a Company-recognized holiday or on a weekend, paychecks will be distributed the preceding business day. The related context is retrieved from the following sources: [Employee Handbook.pdf] Wrapping up Stay tuned for further articles in the "GenAI App Builder Getting Started Series" for more use cases, closer looks at individual Snaps and their settings, and even how to connect a chat interface! Most if not all of these articles will also have an associated video if you learn better that way! If you have issues with the setup or find a missing step or detail, please reply to this thread to let us know!
SnapGPT - Security and Data Handling Protocols
Authors: Aaron Kesler, Jump Thanawut, Scott Monteith Security and Data Handling Protocols for SnapGPT SnapLogic acknowledges and respects the data concerns of our customers. The purpose of this document is to present our data handling and global data protection standards for SnapGPT. Overview & SnapLogic's Approach to AI / LLM: SnapLogic utilizes high-quality Enterprise Large Language Models (LLMs), selecting the most appropriate one for each specific task. Current support includes Azure OpenAI GPT, Anthropic Claude on Amazon Bedrock, and Google Vertex PaLM. Product & Data: Product Features & Scope: SnapGPT offers a range of features, each designed to enhance user experience and productivity in various aspects of pipeline and SQL query generation:
- Input Prompts: This feature allows customers to interact directly with the LLM by providing input prompts. These prompts are the primary method through which users can specify their requirements or ask questions of the LLM.
- Describe Pipeline: This skill enables users to obtain a comprehensive description of an existing pipeline. It helps in understanding and documenting the pipeline's structure and functionality.
- Analyze Pipeline: This feature ingests the entire pipeline configuration and analyzes it to make suggestions for optimization and improvement. It assists users in enhancing the efficiency and effectiveness of their pipelines.
- Mapper Configuration: Facilitates the configuration of the Mapper snap by generating expressions to simplify the process of mapping input to output.
- Pipeline Generation: Users can create prototype pipelines using simple input prompts. This feature is geared towards streamlining the pipeline creation process, making it more accessible and less time-consuming.
- SQL Generation without Schema: Tailored for situations where the schema information is not available or cannot be shared, this feature generates SQL queries based solely on the customer's prompt, offering flexibility and convenience.
- SQL Generation with Schema (coming February 2024): This advanced feature generates SQL queries by taking into account the schema of the input database. It is particularly useful for creating contextually accurate and efficient SQL queries.
Data Usage & Opt-Out Options: At SnapLogic, we recognize the importance of data security and user privacy in the rapidly evolving Generative AI space. SnapGPT has been designed with these principles at its core, ensuring that customers can leverage the power of AI and machine learning while maintaining control over their data. Our approach prioritizes transparency, giving users the ability to opt out of data sharing, and aligning with industry best practices for data handling. This commitment reflects our dedication to not only providing advanced AI solutions but also ensuring that these solutions align with the highest standards of privacy and data protection. Data Usage in SnapGPT: SnapGPT is designed to handle customer data with the utmost care and precision, ensuring that data usage is aligned with the functionality of each feature: Customer Input and Interaction: Customer inputs, such as prompts or pipeline configurations, are key to the functionality of SnapGPT. This data is used solely for the purpose of processing specific requests and generating responses or suggestions relevant to the user's query. No data is retained for model training purposes. Feature-Specific Data Handling: Each feature/skill of SnapGPT, like pipeline analysis or SQL generation, uses customer data differently.
See the table below for details on each skill.
Input Prompts
- Description: Direct input prompts from customers are transferred to the LLM and tracked by SnapLogic analytics.
- Data transferred to LLM: Prompt details only; these are not stored or used for training by the LLM.
Describe & Analyze Pipeline
- Description: Allows customers to describe a pipeline, with the entire pipeline configuration relayed to the LLM.
- Data transferred to LLM: Entire pipeline configuration, excluding account credential information.
Mapper Configuration
- Description: Enables sending input schema information within the prompt to the LLM for the "Mapper configuration" feature.
- Data transferred to LLM: Input schema information, without account credential information.
Pipeline Generation
- Description: Uses input prompts to create pipeline prototypes by transmitting them to the LLM.
- Data transferred to LLM: Input prompts only; not stored or used for training by the LLM.
SQL Generation W/out Schema
- Description: Generates SQL queries based only on the customer's prompt in situations where schema information cannot be shared.
- Data transferred to LLM: Only the customer's prompt; no schema information is used.
SQL Generation W/ Schema (Feb 2024)
- Description: Generates accurate SQL queries by considering the schema of the input database.
- Data transferred to LLM: Schema of the input database, excluding any account credentials, enhancing query accuracy.
Future Adaptations: In the near future, we intend to offer customers opt-out options. Choosing to opt out of including any environment-specific data in SnapGPT prompts can impact the quality of responses from SnapGPT, as it will lack additional context. As of the current version, usage of SnapGPT will include sending the data from the features listed above to the LLMs. We recommend that customers who are not comfortable with the described data transfers wait for the opt-out option to become available. Impact of Opting Out: Choosing to opt out of data sharing may impact the functionality and effectiveness of SnapGPT. For example, opting out of schema retrieval in SQL Generation may lead to less precise query outputs. Users are advised to consider these impacts when setting their data sharing preferences. Data Processing: Architecture: Data Flow: Data Retention & Residency: SnapLogic is committed to ensuring the secure handling and appropriate residency of customer data. Our data retention policies are designed to respect customer privacy while providing the necessary functionality of SnapGPT: Data Retention: No Retention for Model Training: SnapGPT is designed to prioritize user privacy. Therefore, no customer data processed by SnapGPT is retained for the purpose of model training. This ensures that user data is not used in any way to train or refine the underlying AI models. Storing Usage Data for Adoption Tracking: While we do not retain data for model training, SnapLogic stores usage data related to SnapGPT in Heap Analytics. This is strictly for the purpose of tracking product adoption and usage patterns. The collection of usage data helps us understand how our customers interact with SnapGPT, enabling us to continuously improve the product and tailor it to user needs. Data Residency: Location-Based Data Storage: Our control planes in the United States and the EMEA region adhere to the specific data residency policies of these locations. We ensure compliance with regional data protection and privacy laws, offering customers the assurance that their data is managed in accordance with local regulations.
Controls – Admin, Groups, Users: SnapLogic provides robust control mechanisms for administrators, while ensuring that group and user-level controls align with organizational policies: Administrators have granular control over the use of SnapGPT within their organization. They can determine what data is shared with the LLM and have the ability to opt out of data sharing to meet specific data retention and sharing policies. Additionally, admins can control user access to various features and skills, ensuring alignment with organizational needs and security policies. Group Controls: Currently, groups do not have specific controls over SnapGPT. Group-level policies are managed by administrators to ensure consistency and security across the organization. User Controls: Users can access and utilize the features and skills of SnapGPT to which they are entitled. User entitlements are managed by administrators, ensuring that each user has access to the necessary tools for their role while maintaining data security and compliance. Guidelines for Secure and Compliant use of SnapGPT At SnapLogic, we understand the critical importance of data security and compliance in today’s digital landscape. As such, we are dedicated to providing our customers with the tools and knowledge necessary to utilize SnapGPT in a way that aligns with their internal information security (InfoSec) and privacy policies. This section offers guidelines to help ensure that your interaction with SnapGPT is both secure and compliant with your organizational standards. Customer Data Control: Customers are encouraged to actively manage and control the data they share with SnapGPT. By understanding and utilizing the available admin and user controls, customers can ensure that their use of SnapGPT aligns with their internal InfoSec and privacy policies. Best Practices for Data Sharing: We recommend that customers review and follow best practices for data sharing, especially when working with sensitive or confidential information. This includes using anonymization or pseudonymization techniques where appropriate, and sharing only the data in prompts and pipelines that is necessary for the task at hand. Integrating with Internal Policies: Customers should integrate their use of SnapGPT with their existing InfoSec and privacy frameworks. This integration ensures that data handling through SnapGPT remains consistent with the organization’s overall data protection strategy. Regular Review and Adjustment: Customers are advised to regularly review their data sharing settings and practices with SnapGPT, adjusting them as necessary to remain aligned with evolving InfoSec and privacy requirements. Training and Awareness: We also suggest that customers provide regular training and awareness programs to their users about the responsible and secure use of AI tools like SnapGPT, emphasizing the importance of data privacy and protection. Compliance: For detailed information on SnapLogic’s commitment to compliance with various regulatory standards and data security measures, please visit our comprehensive overview at SnapLogic Security & Compliance (https://www.snaplogic.com/security-standards). This resource provides an in-depth look at how we adhere to global data protection regulations, manage data security, and ensure the highest standards of compliance across all our products, including SnapGPT. 
For specific compliance inquiries or more information on how we handle compliance in relation to SnapGPT, please contact the SnapLogic Compliance Team at Security@snaplogic.com. For further details or inquiries regarding SnapGPT or any other SnapLogic AI services, please contact our SnapLogic AI Services Team (ai-services@snaplogic.com). For more information on SnapLogic Security and Compliance: https://www.snaplogic.com/security-standards
Embeddings and Vector Databases
What are embeddings Embeddings are numerical representations of real-world objects, like text, images or audio. They are generated by machine learning models as vectors (arrays of numbers), where the distance between vectors can be seen as the degree of similarity between objects. While an embedding model may assign its own meaning to each dimension, there is no guarantee that different embedding models assign the same meaning to their dimensions. For example, the words "cat", "dog" and "apple" might be embedded into the following vectors:
cat -> (1, -1, 2)
dog -> (1.5, -1.5, 1.8)
apple -> (-1, 2, 0)
These vectors are made up for a simpler example; real vectors are much larger (see the Dimension section for details). Visualizing these vectors as points in a 3D space, we can see that "cat" and "dog" are closer, while "apple" is positioned further away. Figure 1. Vectors as points in a 3D space By embedding words and contexts into vectors, we enable systems to assess how related two embedded items are to each other via vector comparison. Dimension of embeddings The dimension of embeddings refers to the length of the vector representing the object. In the previous example, we embedded each word into a 3-dimensional vector. However, a 3-dimensional embedding inevitably leads to a massive loss of information. In reality, word embeddings typically require hundreds or thousands of dimensions to capture the nuances of language. For example:
- OpenAI's text-embedding-ada-002 model outputs a 1536-dimensional vector
- Google Gemini's text-embedding-004 model outputs a 768-dimensional vector
- Amazon Titan's amazon.titan-embed-text-v2:0 model outputs a default 1024-dimensional vector
Figure 2. Using text-embedding-ada-002 to embed the sentence "I have a calico cat." In short, an embedding is a vector that represents a real-world object. The distance between these vectors indicates the similarity between the objects. Limitation of embedding models Embedding models are subject to a crucial limitation: the token limit, where a token can be a word, punctuation mark, or subword part. This constraint defines the maximum amount of text a model can process in a single input. For instance, the Amazon Titan Text Embeddings models can handle up to 8,192 tokens. When input text exceeds the limit, the model typically truncates it, discarding the remaining information. This can lead to a loss of context and diminished embedding quality, as crucial details might be omitted. Several strategies can help mitigate this impact:
- Text Summarization or Chunking: Long texts can be summarized or divided into smaller, manageable chunks before embedding.
- Model Selection: Different embedding models have varying token limits. Choosing a model with a higher limit can accommodate longer inputs.
What is a Vector Database Vector databases are optimized for storing embeddings, enabling fast retrieval and similarity search. By calculating the similarity between the query vector and the other vectors in the database, the system returns the vectors with the highest similarity, indicating the most relevant content. The following diagram illustrates a vector database search. A query vector 'favorite sport' is compared to a set of stored vectors, each representing a text phrase. The nearest neighbor, 'I like football', is returned as the top result. Figure 3. Vector Query Example Figure 4. Store Vectors into Database Figure 5. Retrieve Vectors from Database
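To make vector similarity concrete, here is a minimal Python sketch (independent of any SnapLogic Snap) that compares the made-up "cat", "dog", and "apple" vectors from the example above using two of the measures described in the next section.

```python
import math

# Toy 3-D vectors from the example above; real embeddings have hundreds or
# thousands of dimensions.
vectors = {
    "cat":   [1.0, -1.0, 2.0],
    "dog":   [1.5, -1.5, 1.8],
    "apple": [-1.0, 2.0, 0.0],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

for word in ("dog", "apple"):
    print(word,
          "cosine:", round(cosine_similarity(vectors["cat"], vectors[word]), 3),
          "euclidean:", round(euclidean_distance(vectors["cat"], vectors[word]), 3))
# "dog" has a much higher cosine similarity (about 0.97) and a much smaller
# Euclidean distance to "cat" than "apple" does, matching the 3D plot above.
```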
When working with vector databases, two key parameters come into play: Top K and the similarity measure (or distance function). Top K When querying a vector database, the goal is often to retrieve the most similar items to a given query vector. This is where the Top K concept comes into play. Top K refers to retrieving the top K most similar items based on a similarity metric. For instance, if you're building a product recommendation system, you might want to find the top 10 products similar to the one a user is currently viewing. In this case, K would be 10. The vector database would return the 10 product vectors closest to the query product's vector. Similarity Measures To determine the similarity between vectors, various distance metrics are employed, including:
- Cosine Similarity: This measures the cosine of the angle between two vectors. It is often used for text-based applications as it captures semantic similarity well. A value closer to 1 indicates higher similarity.
- Euclidean Distance: This calculates the straight-line distance between two points in Euclidean space. It is sensitive to magnitude differences between vectors.
- Manhattan Distance: Also known as L1 distance, it calculates the sum of the absolute differences between corresponding elements of two vectors. It is less sensitive to outliers compared to Euclidean distance.
Figure 6. Similarity Measures There are many other similarity measures not listed here. The choice of distance metric depends on the specific application and the nature of the data. It is recommended to experiment with various similarity metrics to see which one produces better results. What embedders are supported in SnapLogic As of October 2024, SnapLogic supports embedders for major models and continues to expand its support. Supported embedders include:
- Amazon Titan Embedder
- OpenAI Embedder
- Azure OpenAI Embedder
- Google Gemini Embedder
What vector databases are supported in SnapLogic
- Pinecone
- OpenSearch
- MongoDB
- Snowflake
- Postgres
- AlloyDB
Pipeline examples Embed a text file
1. Read the file using the File Reader snap.
2. Convert the binary input to a document format using the Binary to Document snap, as all embedders require document input.
3. Embed the document using your chosen embedder snap.
Figure 7. Embed a File Figure 8. Output of the Embedder Snap
Store a Vector
1. Utilize the JSON Generator snap to simulate a document as input, containing the original text to be stored in the vector database.
2. Vectorize the original text using the embedder snap.
3. Employ a mapper snap to format the structure into the format required by Pinecone - the vector field is named "values", and the original text and other relevant data are placed in the "metadata" field.
4. Store the data in the vector database using the vector database's upsert/insert snap.
Figure 9. Store a Vector into Database Figure 10. A Vector in the Pinecone Database
Retrieve Vectors
1. Utilize the JSON Generator snap to simulate the text to be queried.
2. Vectorize the original text using the embedder snap.
3. Employ a mapper snap to format the structure into the format required by Pinecone, naming the query vector "vector".
4. Retrieve the top 1 vector, which is the nearest neighbor.
Figure 11. Retrieve Vectors from a Database
[ { "content" : "favorite sport" } ]
Figure 12. Query Text Figure 13. All Vectors in the Database
{ "matches": [ { "id": "db873b4d-81d9-421c-9718-5a2c2bd9e720", "score": 0.547461033, "values": [], "metadata": { "content": "I like football." } } ] }
Figure 14. Pipeline Output: the Closest Neighbor to the Query
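For comparison, here is a rough sketch of the same retrieve flow written directly against the OpenAI and Pinecone Python clients. The index name, embedding model, and expected output come from the example above, but treat the exact client calls as an assumption-laden illustration rather than a drop-in replacement for the pipeline.

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()                         # assumes OPENAI_API_KEY in the environment
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")   # placeholder key
index = pc.Index("se-demo")                      # index name from the example above

# 1. Embed the query text
query = "favorite sport"
embedding = openai_client.embeddings.create(
    model="text-embedding-ada-002",
    input=query,
).data[0].embedding

# 2. Retrieve the top 1 nearest neighbor, including its metadata
result = index.query(vector=embedding, top_k=1, include_metadata=True)
for match in result.matches:
    print(match.score, match.metadata["content"])
# With the data loaded above, this should print something close to:
# 0.547... I like football.
```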
Embedders and vector databases are widely used in applications such as Retrieval Augmented Generation (RAG) and building chat assistants. Multimodal Embeddings While the focus thus far has been on text embeddings, the concept extends beyond words and sentences. Multimodal embeddings represent a powerful advancement, enabling the representation of various data types, such as images, audio, and video, within a unified vector space. By projecting different modalities into a shared semantic space, complex relationships and interactions between these data types can be explored. For instance, an image of a cat and the word "cat" might be positioned closely together in a multimodal embedding space, reflecting their semantic similarity. This capability opens up a vast array of possibilities, including image search with text queries, video content understanding, and advanced recommendation systems that consider multiple data modalities.
Recipes for Success with SnapLogic's GenAI App Builder: From Integration to Automation
For this episode of the Enterprise Alchemists podcast, Guy and Dominic invited Aaron Kesler and Roger Sramkoski to join them to discuss why SnapLogic's GenAI App Builder is the key to success with AI projects. Aaron is the Senior Product Manager for all things AI at SnapLogic, and Roger is a Senior Technical Product Marketing Manager focused on AI. We kept things concrete, discussing real-world results that early adopters have already been able to deliver by using SnapLogic's integration capabilities to power their new AI-driven experiences.
Unlock the Future of AI: Discover Project SnapChain and Build Your Own RAG Chatbot
To say we've journeyed through a realm of groundbreaking advancements since the release of SnapGPT in August (has it already been 4 months?!) is just scratching the surface. At AWS re:Invent 2023 not only did we showcase SnapGPT, but we also unveiled our revolutionary generative AI capability - Project SnapChain. Our customers have been thrilled with how SnapGPT has transformed their pipeline creation and documentation processes. But the excitement doesn't stop there - they're eager to delve into building their own generative AI applications using their unique data and documents. We're inviting you to a special event - this Wednesday, December 6th, at 11 AM ET (8 AM PT) for an exclusive behind-the-scenes look at Project SnapChain in action. In this interactive webinar, we're not just sharing insights; we're guiding you on how to construct a RAG-based chatbot using nothing but Snaps, along with your data and documents. What's more, you'll have the chance to put this knowledge into practice in our SnapLabs environment! Join us to be part of this innovative journey and unlock the power to create. Reserve your spot now and be at the forefront of AI innovation. We can't wait to see you there! Sign up here: https://www.snaplogic.com/resources/webcasts/snaplabs-corner-december-2023
LLM response logging for analytics
Why do we need LLM Observability? GenAI applications are great; they answer the way a human does. But how do you know GPT isn't being "too creative" when a result from the LLM says "Company finances are facing issues due to insufficient sun coverage"? As the scope of GenAI apps broadens, the vulnerability expands, and since LLM outputs are non-deterministic, a setup that once worked isn't guaranteed to always work. Here's a comparison of the reasons why an LLM prompt fails versus why a RAG application fails. What could go wrong in the configuration?
LLM prompts
- Suboptimal model parameters: temperature too high / tokens too small
- Uninformative system prompts
RAG
- Indexing
  - The data wasn't chunked with the right size: information is sparse yet the window is small
  - Wrong distance was used: used Euclidean distance instead of cosine
  - Dimension was too small / too large
- Retrieval
  - Top K too big: too much irrelevant context fetched
  - Top K too small: not enough relevant context to generate a result
  - Filter misused
- And everything in LLM prompts
Although observability does not magically solve all problems, it gives us a good chance to figure out what might have gone wrong. LLM Observability provides methodologies to help developers better understand LLM applications, model performance, and biases, and can help resolve issues before they reach the end users. What are common issues and how does observability help? Observability helps understanding in many ways, from performance bottlenecks to error detection, security, and debugging. Here's a list of common questions we might ask ourselves and how observability may come in handy.
- How long does it take to generate an answer? Monitoring LLM response times and database query times helps identify potential bottlenecks of the application.
- Is the context retrieved from the Vector Database relevant? Logging database queries and retrieved results helps identify better performing queries, and can assist with chunk size configuration based on retrieved results.
- How many tokens are used in a call? Monitoring token usage can help determine the cost of each LLM call.
- How much better/worse is my new configuration setup doing? Parameter monitoring and response logging help compare the performance of different models and model configurations.
- How is the GenAI application performing overall? Tracing stages of the application and evaluation helps identify the performance of the application.
- What are users asking? Logging and analyzing user prompts helps understand user needs and can help evaluate whether optimizations can be introduced to reduce costs. It also helps identify security vulnerabilities by monitoring malicious attempts, so you can proactively respond to mitigate threats.
What should be tracked? GenAI applications involve components chained together. Depending on the use case, there are events and input/output parameters that we want to capture and analyze. A list of components to consider:
- Vector Database metadata
  - Vector dimension: The vector dimension used in the vector database
  - Distance function: The way two vectors are compared in the vector database
- Vector Indexing parameters
  - Chunk configuration: How a chunk is configured, including the size of the chunk, the unit of chunks, etc. This affects information density in a chunk.
- Vector Query parameters
  - Query: The query used to retrieve context from the Vector Database
  - Top K: The maximum number of vectors to retrieve from the Vector Database
- Prompt templates
  - System prompt: The prompt to be used throughout the application
  - Prompt template: The template used to construct a prompt. Prompts work differently in different models and LLM providers
- LLM request metadata
  - Prompt: The input sent to the LLM model from each end user, combined with the template
  - Model name: The LLM model used for generation, which affects the capability of the application
  - Tokens: The token limit for a single request
  - Temperature: The parameter for setting the creativity and randomness of the model
  - Top P: The range of word selection; the smaller the value, the narrower the pool of words sampled from
- LLM response metadata
  - Tokens: The number of tokens used in input and output generation, which affects costs
  - Request details: May include information such as guardrails, the id of the request, etc.
- Execution metrics
  - Execution time: Time taken to process individual requests
Pipeline examples Logging a Chat completions pipeline We're using MongoDB to store model parameters and LLM responses as JSON documents for easy processing. Logging a RAG pipeline In this case, we're storing parameters to the RAG system (Agent Retrieve in this case) and the model. We're using JSON Generator Snaps to parameterize all input parameters to the RAG system and the LLM models. We then concatenate the response from the Vector Database, the LLM model, and the parameters we provided for the requests.
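To tie the list above together, here is a minimal sketch of how one logged call might be written to MongoDB as a single JSON document. The collection name, field names, and values are illustrative, not the exact schema produced by the pipelines shown above.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

# Connection string and database/collection names are placeholders.
client = MongoClient("mongodb://localhost:27017")
logs = client["genai_observability"]["llm_calls"]

log_entry = {
    "timestamp": datetime.now(timezone.utc),
    "request": {                          # LLM request metadata
        "model": "gpt-4o",
        "temperature": 0.2,
        "top_p": 0.9,
        "max_tokens": 512,
        "system_prompt": "Answer only from the provided context.",
        "prompt": "When do I get paid?",
    },
    "retrieval": {                        # Vector query parameters (RAG only)
        "index": "se-demo",
        "top_k": 3,
        "distance_function": "cosine",
        "chunks_returned": 3,
    },
    "response": {                         # LLM response metadata
        "content": "Employees are paid on a semi-monthly basis ...",
        "input_tokens": 412,
        "output_tokens": 58,
    },
    "execution_time_ms": 1840,            # execution metrics
}
logs.insert_one(log_entry)
```

Storing each call as one document like this makes it easy to query later, for example to chart token usage per model or to find the slowest requests.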
Discover Project SnapChain: Build your own Chatbot with Snaps and pipelines!
Hey SnapLabs Community! I hope you're ready for our next experiment. Since you loved SnapGPT so much, we have been hard at work figuring out the easiest way for you to build your own chatbot with your own data for your organization to use internally. Check out the post below and sign up for our SnapLabs corner webinar happening tomorrow (Wednesday December 6th) at 11AM ET (8 AM PT). See you there! Unlock the Future of AI: Discover Project SnapChain and Build Your Own RAG Chatbot
What is Retrieval-Augmented Generation (RAG)?
What is Retrieval-Augmented Generation (RAG)? Retrieval-Augmented Generation (RAG) is the process of enhancing the reference data used by large language models (LLMs) by integrating them with traditional information retrieval systems. This hybrid approach allows LLMs to access and utilize external knowledge bases, databases, and other authoritative sources of information, thereby improving the accuracy, relevance, and currency of the generated responses without requiring extensive retraining. Without RAG, LLMs generate responses based on the information they were trained on. With RAG, the response generation process is enriched by integrating external information into the generation. How does Retrieval-Augmented Generation work? Retrieval-Augmented Generation works by bringing together multiple systems or services to generate the prompt sent to the LLM. This means some setup is required so the different systems and services can feed the appropriate data into a RAG workflow. This involves several key steps:
1. External Data Source Creation: External data refers to information outside the original training data of the LLM. This data can come from a variety of sources such as APIs, databases, document repositories, and web pages. The data is pre-processed and converted into numerical representations (embeddings) using embedding models, and then stored in a searchable vector database along with a reference to the data that was used to generate the embedding. This forms a knowledge library that can be used to augment a prompt when calling into the LLM for generation of a response to a given input.
2. Retrieval of Relevant Information: When a user inputs a query, it is embedded into a vector representation and matched against the entries in the vector database. The vector database retrieves the most relevant documents or data based on semantic similarity. For example, a query about company leave policies would retrieve both the general leave policy document and the specific role leave policies.
3. Augmentation of LLM Prompt: The retrieved information is then integrated into the prompt to send to the LLM using prompt engineering techniques. This fully formed prompt is sent to the LLM, providing additional context and relevant data that enables the model to generate more accurate and contextually appropriate responses.
4. Generation of Response: The LLM processes the augmented prompt and generates a response that is coherent, contextually appropriate, and enriched with accurate, up-to-date information.
The following diagram illustrates the flow of data when using RAG with LLMs.
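For readers who prefer code to diagrams, here is a minimal Python sketch of the same four steps, assuming the OpenAI and Pinecone Python clients. The model names, index name, question, and prompt wording are placeholders, not a prescribed implementation.

```python
from openai import OpenAI
from pinecone import Pinecone

llm = OpenAI()                                                    # assumes OPENAI_API_KEY is set
index = Pinecone(api_key="YOUR_PINECONE_API_KEY").Index("knowledge-library")

question = "How many vacation days do new employees get?"

# 2. Retrieval: embed the user query and look up the most relevant chunks
query_vector = llm.embeddings.create(
    model="text-embedding-ada-002", input=question
).data[0].embedding
matches = index.query(vector=query_vector, top_k=3, include_metadata=True).matches
context = "\n".join(m.metadata["chunk"] for m in matches)

# 3. Augmentation: fold the retrieved context into the prompt
prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)

# 4. Generation: ask the LLM to answer from the augmented prompt
answer = llm.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content
print(answer)
```

Step 1 (building the knowledge library) happens ahead of time, by chunking and embedding documents and upserting them into the index, as described above.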
Why use Retrieval-Augmented Generation? RAG addresses several inherent challenges of using LLMs by leveraging external data sources:
1. Enhanced Accuracy and Relevance: By accessing up-to-date and authoritative information, RAG ensures that the generated responses are accurate, specific, and relevant to the user's query. This is particularly important for applications requiring precise and current information, such as specific company details, release dates and release items, new features available for a product, individual product details, etc.
2. Cost-Effective Implementation: RAG enables organizations to enhance the performance of LLMs without the need for expensive and time-consuming fine-tuning or custom model training. By incorporating external knowledge libraries, RAG provides a more efficient way to update and expand the model's knowledge base.
3. Improved User Trust: With RAG, responses can include citations or references to the original sources of information, increasing transparency and trust. Users can verify the source of the information, which enhances the credibility and trustworthiness of an AI system.
4. Greater Developer Control: Developers can easily update and manage the external knowledge sources used by the LLM, allowing for flexible adaptation to changing requirements or specific domain needs. This control includes the ability to restrict sensitive information retrieval and ensure the correctness of generated responses. Doing this in conjunction with an evaluation framework (link to evaluation pipeline article) can help roll out newer content more rapidly to downstream consumers.
SnapLogic GenAI App Builder: Building RAG with Ease SnapLogic GenAI App Builder empowers business users to create large language model (LLM) powered solutions without requiring any coding skills. This tool provides the fastest path to developing generative enterprise applications by leveraging services from industry leaders such as OpenAI, Azure OpenAI, Amazon Bedrock, Anthropic Claude on AWS, and Google Gemini. Users can effortlessly create LLM applications and workflows using this robust platform. With SnapLogic GenAI App Builder, you can construct both an indexing pipeline and a Retrieval-Augmented Generation (RAG) pipeline with minimal effort. Indexing Pipeline This pipeline is designed to store the contents of a PDF file into a knowledge library, making the content readily accessible for future use. Snaps used: File Reader, PDF Parser, Chunker, Amazon Titan Embedder, Mapper, OpenSearch Upsert. After running this pipeline, we would be able to view these vectors in OpenSearch. RAG Pipeline This pipeline enables the creation of a chatbot capable of answering questions based on the information stored in the knowledge library. Snaps used: HTTP Router, Amazon Titan Embedder, Mapper, OpenSearch Query, Amazon Bedrock Prompt Generator, Anthropic Claude on AWS Messages. To implement these pipelines, the solution utilizes the Amazon Bedrock Snap Pack and the OpenSearch Snap Pack. However, users have the flexibility to employ other LLM and vector database Snaps to achieve similar functionality.
A Comparison of Assistant and Non-Assistant Tool Calling Pipelines
Introduction At a high level, the logic behind assistant tool calling and non-assistant tool calling is fundamentally the same: the model instructs the user to call specific function(s) in order to answer the user's query. The user then executes the function and returns the result to the model, which uses it to generate an answer. This process is identical for both. However, since the assistant specifies the function definitions and access to tools as part of the Assistant configuration within the OpenAI or Azure OpenAI dashboard rather than within your pipelines, there will be major differences in the pipeline configuration. Additionally submitting tool responses to an Assistant comes with significant changes and challenges since the Assistant owns the conversational history rather than the pipeline. This article focuses on contrasting these differences. For a detailed understanding of assistant pipelines and non-assistant pipelines, please refer to the following article: Non-assistant pipelines: Introducing Tool Calling Snaps and LLM Agent Pipelines Assistant pipelines: Introducing Assistant Tool Calling Pipelines Part 1: Which System to Use: Non-Assistant or Assistant? When to Use Non-Assistant Tool Calling Pipelines: Non-Assistant Tool Calling Pipelines offer greater flexibility and control over the tool calling process, making them suitable for the following specific scenarios. When preferring a “run-time“ approach: Non-Assistant pipelines exhibit greater flexibility in function definition, offering a more "runtime" approach. You can dynamically adjust the available functions by simply adding or removing Function Generator snaps within the pipeline. In contrast, Assistant Tool Calling Pipelines necessitate a "design-time" approach. All available functions must be pre-defined within the Assistant configuration, requiring modifications to the Assistant definition in the OpenAI/Azure OpenAI dashboard. When wanting detailed chat history: Non-Assistant pipelines provide a comprehensive history of the interaction between the model and the tools in the output message list. The message list within the Non-Assistant pipeline preserves every model response and the results of each function execution. This detailed logging allows for thorough debugging, analysis, and auditing of the tool calling process. In contrast, Assistant pipelines maintain a more concise message history, focusing on key steps and omitting some intermediate details. While this can simplify the overall view of the message list, it can also make it more difficult to trace the exact sequence of events or diagnose issues that may arise during tool execution in child pipelines. When needing easier debugging and iterative development: Non-Assistant pipelines facilitate more granular debugging and iterative development. You can easily simulate individual steps of the agent by making calls to the model with specific function call histories. This allows for more precise control and experimentation during development, enabling you to isolate and address issues more effectively. For example, by providing three messages, we can "force" the model to call the second tool, allowing us to inspect the tool calling process and its result against our expectations. In contrast, debugging and iterating with Assistant pipelines can be more cumbersome. 
Since Assistants manage the conversation history internally, to simulate a specific step, you often need to replay the entire interaction from the beginning, potentially requiring multiple iterations to reach the desired state. This internal management of history makes it less straightforward to isolate and debug specific parts of the interaction. To simulate calling the third tool, we need to start a new thread from scratch and then call tool1 and tool2, repeating the preceding process. The current thread cannot be reused. When to Use Assistant Tool Calling Pipelines: Assistant Tool Calling Pipelines also offer a streamlined approach to integrating LLMs with external tools, prioritizing ease of use and built-in functionalities. Consider using Assistant pipelines in the following situations: For simplified pipeline design: Assistant pipelines reduce pipeline complexity by eliminating the need for Tool Generator snaps. In Non-Assistant pipelines, these snaps are essential for dynamically generating tool definitions within the pipeline itself. With Assistant pipelines, tool definitions are configured beforehand within the Assistant settings in the OpenAI/Azure OpenAI dashboard. This pre-configuration results in shorter, more manageable pipelines, simplifying development and maintenance. When leveraging built-in tools is required: If your use case requires functionalities like searching external files or executing code, Assistant pipelines offer these capabilities out-of-the-box through their built-in File Search and Code Interpreter tools (see Part 5 for more details). These tools provide a convenient and efficient way to extend the LLM's capabilities without requiring custom implementation within the pipeline. Part 2: A brief introduction to two pipelines Non-assistant tool calling pipelines Key points:
- Functions are defined in the worker.
- The worker pipeline's Tool Calling snap manages all model interactions.
- Function results are collected and sent to the model in the next iteration via the Tool Calling snap.
Assistant tool calling pipelines Key points:
- No need to define functions in any pipeline. Functions are pre-defined in the assistant.
- Two Snaps interact with the model: Create and Run Thread, and Submit Tool Outputs.
- Function results are collected and sent to the model immediately during the current iteration.
Part 3: Comparison between two pipelines Here are two primary reasons why the assistant and non-assistant pipelines differ, listed in decreasing order of importance: Distinct methods of submitting tool results: For non-assistant pipelines, tool results are appended to the message history list and subsequently forwarded to the model during the next iteration. Non-assistant pipelines exhibit a "while-loop" behavior, where the worker interacts with the model at the beginning of the iteration, and while any tools need to be called, the worker executes those tool(s). In contrast, for assistants, tool results are specifically sent to a dedicated endpoint designed to handle tool call results within the current iteration. The assistant pipelines operate more like a "do-while-loop." The driver initiates the interaction by sending the prompt to the model. Subsequently, the worker executes the tool(s) first and interacts with the model at the end of the iteration to deliver tool results. Predefined and stored tool definitions for assistants: Unlike non-assistant pipelines, assistants have the capability to predefine and store function definitions. This eliminates the need for the three Function Generator snaps to repeatedly transmit tool definitions to the model with each request. Consequently, the worker pipeline for assistants appears shorter. Due to the aforementioned differences, non-assistant pipelines have only one interaction point with the model, located in the worker. In contrast, assistant pipelines involve two interaction points: the driver sends the initial prompt to the model, while the worker sends tool results back to the model.
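To ground the comparison, here is a rough Python sketch of the two submission styles using the OpenAI SDK directly: Chat Completions for the non-assistant "while loop" and the beta Assistants API for the "do-while loop". The tool definition, the assistant id, and the run_tool() helper are placeholders, not anything produced by the pipelines above.

```python
import json
from openai import OpenAI

client = OpenAI()

def run_tool(tool_call):
    """Placeholder: execute the requested function and return its result as a string."""
    args = json.loads(tool_call.function.arguments)
    return json.dumps({"result": f"ran {tool_call.function.name} with {args}"})

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "parameters": {"type": "object", "properties": {"order_id": {"type": "string"}}},
    },
}]

# --- Non-assistant style: "while loop" --------------------------------------
# Tool results are appended to the message list and sent back on the NEXT call.
messages = [{"role": "user", "content": "What is the status of order 42?"}]
while True:
    response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    choice = response.choices[0]
    if choice.finish_reason != "tool_calls":          # same stop condition as the Pipeloop
        print(choice.message.content)
        break
    messages.append(choice.message)                   # keep the model's tool-call request
    for tc in choice.message.tool_calls:
        messages.append({"role": "tool", "tool_call_id": tc.id, "content": run_tool(tc)})

# --- Assistant style: "do-while loop" ---------------------------------------
# Tool results go to a dedicated endpoint within the CURRENT run; the Assistant
# (id is a placeholder) already holds the tool definitions.
thread = client.beta.threads.create(
    messages=[{"role": "user", "content": "What is the status of order 42?"}]
)
run = client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id="asst_123")
while run.required_action is not None:                # same stop condition as the Pipeloop
    outputs = [{"tool_call_id": tc.id, "output": run_tool(tc)}
               for tc in run.required_action.submit_tool_outputs.tool_calls]
    run = client.beta.threads.runs.submit_tool_outputs_and_poll(
        thread_id=thread.id, run_id=run.id, tool_outputs=outputs
    )
```

Note how the first loop calls the model at the top of each iteration, while the second executes the pending tools first and only then returns to the model, mirroring the while-loop versus do-while-loop framing above.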
Part 4: Differences in snap settings Stop condition of Pipeloop A key difference in snap settings lies in the stop condition of the Pipeloop.
- Assistant pipeline's stop condition: $run.required_action == null
- Non-assistant pipeline's stop condition: $finish_reason != "tool_calls"
Assistant's output Example when tool calls are required: Example when tool calls are NOT required: Non-assistant's output Example when tool calls are required: Example when tool calls are NOT required: Part 5: Assistant's two built-in tools The assistant not only supports all functions that can be defined in non-assistant pipelines but also provides two special built-in functions, file search and code interpreter, for user convenience. If the model determines that either of these tools is required, it will automatically call and execute the tool within the assistant without requiring manual user intervention. You don't need a tool call pipeline to experiment with file search and code interpreter; a simple Create and Run Thread snap is sufficient. File search File Search augments the Assistant with knowledge from outside its model, such as proprietary product information or documents provided by your users. OpenAI automatically parses and chunks your documents, creates and stores the embeddings, and uses both vector and keyword search to retrieve relevant content to answer user queries. Example Prompt: What is the number of federal fires between 2018 and 2022? The assistant's response is as below: The assistant's response is correct, as the answer to the prompt is in the first row of a table on the first page of wildfire_stats.pdf, a document accessible to the assistant via a vector store. Answer to the prompt: The file is stored in a vector store used by the assistant: Code Interpreter Code Interpreter allows Assistants to write and run Python code in a sandboxed execution environment. This tool can process files with diverse data and formatting, and generate files with data and images of graphs. Code Interpreter allows your Assistant to run code iteratively to solve challenging code and math problems. When your Assistant writes code that fails to run, it can iterate on this code by attempting to run different code until the code execution succeeds. Example Prompt: Find the number of federal fires between 2018 and 2022 and use Matplotlib to draw a line chart. * Matplotlib is a Python library for creating plots. The assistant's response is as below: From the response, we can see that the assistant indicated it used file search to find 5 years of data and then generated an image file. This file can be downloaded from the assistant's dashboard under storage-files. Simply add a file extension like .png to see the image. Image file generated by assistant:
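As a point of reference outside SnapLogic, enabling these two built-in tools on an Assistant looks roughly like the following when using the beta Assistants interface of the OpenAI Python SDK. The model name and vector store id are placeholders.

```python
from openai import OpenAI

client = OpenAI()

# Sketch only: create an Assistant with both built-in tools enabled and attach
# an existing vector store (e.g., the one holding wildfire_stats.pdf).
assistant = client.beta.assistants.create(
    name="Wildfire stats assistant",
    model="gpt-4o",
    instructions="Answer questions about the uploaded wildfire statistics.",
    tools=[{"type": "file_search"}, {"type": "code_interpreter"}],
    tool_resources={"file_search": {"vector_store_ids": ["vs_abc123"]}},
)
print(assistant.id)
```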
Part 6: Key Differences Summarized
Function Definition
- Non-Assistant Tool Calling Pipelines: Defined within the worker pipeline using Function Generator snaps.
- Assistant Tool Calling Pipelines: Pre-defined and stored within the Assistant configuration in the OpenAI/Azure OpenAI dashboard.
Tool Result Submission
- Non-Assistant Tool Calling Pipelines: Appended to the message history and sent to the model in the next iteration.
- Assistant Tool Calling Pipelines: Sent to a dedicated endpoint within the current iteration.
Model Interaction Points
- Non-Assistant Tool Calling Pipelines: One (in the worker pipeline).
- Assistant Tool Calling Pipelines: Two (driver sends initial prompt, worker sends tool results).
Built-in Tools
- Non-Assistant Tool Calling Pipelines: None.
- Assistant Tool Calling Pipelines: File Search and Code Interpreter.
Pipeline Complexity
- Non-Assistant Tool Calling Pipelines: More complex pipeline structure due to function definition within the pipeline.
- Assistant Tool Calling Pipelines: Simpler pipeline structure as functions are defined externally.