Agent Creator

23 Topics

Embeddings and Vector Databases
What are embeddings Embeddings are numerical representations of real-world objects, like text, images or audio. They are generated by machine learning models as vectors, an array of numbers, where the distance between vectors can be seens as the degree of similarity between objects. While an embedding model may have its own meaning for each of the dimensions, there’s no guarantee between embedding models of the meaning for each of the dimensions used by the embedding models. For example, the word “cat”, “dog” and “apple” might be embedded into the following vectors: cat -> (1, -1, 2) dog -> (1.5, -1.5, 1.8) apple -> (-1, 2, 0) These vectors are made-up for a simpler example. Real vectors are much larger, see the Dimension section for details. Visualizing these vectors as points in a 3D space, we can see that "cat" and "dog" are closer, while "apple" is positioned further away. Figure 1. Vectors as points in a 3D space By embedding words and contexts into vectors, we enable systems to assess how related two embedded items are to each other via vector comparison. Dimension of embeddings The dimension of embeddings refers to the length of the vector representing the object. In the previous example, we embedded each word into a 3-dimensional vector. However, a 3-dimensional embedding inevitably leads to a massive loss of information. In reality, word embeddings typically require hundreds or thousands of dimensions to capture the nuances of language. For example, OpenAI's text-embedding-ada-002 model outputs a 1536-dimensional vector Google Gemini's text-embedding-004 model outputs a 768-dimensional vector Amazon Titan's amazon.titan-embed-text-v2:0 model outputs a default 1024-dimensional vector Figure 2. Using text-embedding-ada-002 to embed the sentence “I have a calico cat.” In short, an embedding is a vector that represents a real-world object. The distance between these vectors indicates the similarity between the objects. Limitation of embedding models Embedding models are subject to a crucial limitation: the token limit, where a token can be a word, punctuation mark, or subword part. This constraint defines the maximum amount of text a model can process in a single input. For instance, the Amazon Titan Text Embeddings models can handle up to 8,192 tokens. When input text exceeds the limit, the model typically truncates it, discarding the remaining information. This can lead to a loss of context and diminished embedding quality, as crucial details might be omitted. To address this, several strategies can help mitigate its impact: Text Summarization or Chunking: Long texts can be summarized or divided into smaller, manageable chunks before embedding. Model Selection: Different embedding models have varying token limits. Choosing a model with a higher limit can accommodate longer inputs. What is a Vector Database Vector databases are optimized for storing embeddings, enabling fast retrieval and similarity search. By calculating the similarity between the query vector and the other vectors in the database, the system returns the vectors with the highest similarity, indicating the most relevant content. The following diagram illustrates a vector database search. A query vector 'favorite sport' is compared to a set of stored vectors, each representing a text phrase. The nearest neighbor, 'I like football', is returned as the top result. Figure 3. Vector Query Example Figure 4. Store Vectors into Database Figure 5. Retrieve Vectors from Database When working with vector databases, two key parameters come into play: Top K and similarity measure (or distance function). Top K When querying a vector database, the goal is often to retrieve the most similar items to a given query vector. This is where the Top K concept comes into play. Top K refers to retrieving the top K most similar items based on a similarity metric. For instance, if you're building a product recommendation system, you might want to find the top 10 products similar to the one a user is currently viewing. In this case, K would be 10. The vector database would return the 10 product vectors closest to the query product's vector. Similarity Measures To determine the similarity between vectors, various distance metrics are employed, including: Cosine Similarity: This measures the cosine of the angle between two vectors. It is often used for text-based applications as it captures semantic similarity well. A value closer to 1 indicates higher similarity. Euclidean Distance: This calculates the straight-line distance between two points in Euclidean space. It is sensitive to magnitude differences between vectors. Manhattan Distance: Also known as L1 distance, it calculates the sum of the absolute differences between corresponding elements of two vectors. It is less sensitive to outliers compared to Euclidean distance. Figure 6. Similarity Measures There are many other similarity measures not listed here. The choice of distance metric depends on the specific application and the nature of the data. It is recommended to experiment with various similarity metrics to see which one produces better results. What embedders are supported in SnapLogic As of October 2024, SnapLogic has supported embedders for major models and continues to expand its support. Supported embedders include: Amazon Titan Embedder OpenAI Embedder Azure OpenAi Embedder Google Gemini Embedder What vector databases are supported in SnapLogic Pinecone OpenSearch MongoDB Snowflake Postgres AlloyDB Pipeline examples Embed a text file Read the file using the File Reader snap. Convert the binary input to a document format using the Binary to Document snap, as all embedders require document input. Embed the document using your chosen embedder snap. Figure 7. Embed a File Figure 8. Output of the Embedder Snap Store a Vector Utilize the JSON Generator snap to simulate a document as input, containing the original text to be stored in the vector database. Vectorize the original text using the embedder snap. Employ a mapper snap to format the structure into the format required by Pinecone - the vector field is named "values", and the original text and other relevant data are placed in the "metadata" field. Store the data in the vector database using the vector database's upsert/insert snap. Figure 9. Store a Vector into Database Figure 10. A Vector in the Pinecone Database Retrieve Vectors Utilize the JSON Generator snap to simulate the text to be queried. Vectorize the original text using the embedder snap. Employ a mapper snap to format the structure into the format required by Pinecone, naming the query vector as "vector". Retrieve the top 1 vector, which is the nearest neighbor. Figure 11. Retrieve Vectors from a Database [ { "content" : "favorite sport" } ] Figure 12. Query Text Figure 13. All Vectors in the Database { "matches": [ { "id": "db873b4d-81d9-421c-9718-5a2c2bd9e720", "score": 0.547461033, "values": [], "metadata": { "content": "I like football." } } ] } Figure 14. Pipeline Output: the Closest Neighbor to the Query Embedder and vector databases are widely used in applications such as Retrieval Augmented Generation (RAG) and building chat assistants. Multimodal Embeddings While the focus thus far has been on text embeddings, the concept extends beyond words and sentences. Multimodal embeddings represent a powerful advancement, enabling the representation of various data types, such as images, audio, and video, within a unified vector space. By projecting different modalities into a shared semantic space, complex relationships and interactions between these data types can be explored. For instance, an image of a cat and the word "cat" might be positioned closely together in a multimodal embedding space, reflecting their semantic similarity. This capability opens up a vast array of possibilities, including image search with text queries, video content understanding, and advanced recommendation systems that consider multiple data modalities.
Luna
2 years ago Place SnapLogic Technical Blog
3.3KViews
5likes
0Comments
Using Snaplogic to Extract Data
Using Snaplogic to Extract Data This guide provides a step-by-step approach to using SnapLogic to extract data from formats such as PDFs, CSVs, JSON, and HTML. Whether you are new to SnapLogic or looking to refine your data extraction processes, this guide will help you create and manage effective pipelines. Data Formats in SnapLogic SnapLogic is capable of reading data from a wide range of sources and can convert this data into two primary formats: Document Format: A document in SnapLogic is a JSON-like format used for internal data processing. This format allows users to interact with data attributes in a manner similar to JSON. It also supports SnapLogic’s internal functions, such as text processing, enabling users to apply expressions for data manipulation and transformation. These transformations can be performed before the data is sent to downstream connectors, ensuring that the data is appropriately structured and formatted for subsequent processing. Binary Format: The binary format in SnapLogic is used to handle raw data streams, such as images, PDFs, or any other file types that do not conform to structured data formats. This format is particularly useful when dealing with non-text data or when the data needs to be passed through the pipeline without alteration. SnapLogic can convert binary data to other formats when necessary, allowing for processing, storage, or transformation before being transmitted to downstream components. This ensures flexibility and efficiency in handling various types of data within the pipeline. Extracting Data from Files in SnapLogic SnapLogic offers robust capabilities for extracting data from various file types, making it a versatile tool in data integration workflows. To facilitate this process, SnapLogic provides the SnapLogic File System (SLFS), a built-in storage solution where users can upload and manage files directly within the SnapLogic environment. However, there are important considerations and best practices to keep in mind when utilizing SLFS for data extraction. Using SnapLogic File System (SLFS) The SnapLogic File System (SLFS) allows users to store and access files for processing within their pipelines. To extract data from a file using SnapLogic, users first need to upload the file to SLFS. Once uploaded, the file can be accessed and processed by various Snaps designed to read and manipulate file content. However, SLFS has a file size limitation, allowing users to upload files up to a maximum of 100MB per file. This limitation can present challenges when dealing with larger datasets or files. If the file exceeds this size limit, users would need to split the content into multiple smaller files before uploading them to SLFS. While this approach may work for smaller or segmented data, it is generally not recommended for handling large-scale data processing tasks. Best Practices for Handling Large Files For processing files larger than 100MB, it is advisable to leverage external storage systems rather than relying on SLFS. External file systems provide greater flexibility and scalability, allowing you to handle significantly larger files without the need for manual segmentation. SnapLogic seamlessly integrates with various external storage solutions, enabling efficient data extraction and processing. SFTP Servers: Users can utilize Secure File Transfer Protocol (SFTP) servers for storing and accessing large files. SnapLogic supports SFTP integration, allowing you to securely read and process files directly from the server within your pipelines. Object Storage Services (e.g., Amazon S3): Cloud-based object storage services like Amazon S3 offer virtually unlimited storage capacity, making them ideal for managing large datasets. SnapLogic’s native support for S3 allows users to read, write, and manage files stored in S3 buckets, ensuring smooth data processing without the constraints of SLFS. By integrating SnapLogic with these external storage solutions, users can efficiently manage large files, streamline data extraction workflows, and maintain optimal performance across their pipelines. This approach not only overcomes the limitations of SLFS but also aligns with best practices for scalable and reliable data processing. Reading File Data and process in SnapLogic In this section, we'll delve into the process of reading data from various file formats using SnapLogic. This guide outlines the key steps, configuration options, and best practices to ensure efficient and effective data extraction. To read data from a file using SnapLogic, you can follow these two essential steps: 1. Configuring the File Reader Snap Configuring the File Reader Snap is crucial for setting up a SnapLogic pipeline that can accurately and efficiently access data from various file systems and storage solutions. This Snap serves as the gateway for bringing data into your pipeline, and its proper configuration is key to seamless data extraction. Key Considerations: Identifying the File Location: SnapLogic supports a wide range of file storage protocols, enabling access to files stored both locally and in remote environments SLFS (SnapLogic File System): Ideal for smaller files stored temporarily within SnapLogic. Amazon S3: A scalable cloud storage service best managed with the S3 File Reader Snap for optimized performance and ease of use. SFTP Servers: For secure file transfers over a network using SFTP. HTTP/HTTPS: Allows direct access to files hosted on web servers. Azure Blob Storage: Optimized for handling large amounts of unstructured data in the cloud. Selecting the Appropriate Protocol: Choosing the correct protocol ensures efficient file access based on the storage location. Each protocol requires specific configurations to enable secure and accurate data retrieval. Below are the guidelines for configuring various storage protocols. SLFS: Specify the file path relative to the SLFS root. Amazon S3: It is recommended to use the S3 File Reader Snap for S3-specific features. If using the File Reader Snap, ensure the correct bucket name, file path, and AWS credentials are provided. SFTP: Input the server address, file path, and authentication details (username, password, or SSH key). HTTP/HTTPS: Enter the full URL, including any necessary authentication tokens. Azure Blob Storage: Provide the container name, blob path, and relevant credentials. File Path Configuration: Properly formatting the file path is crucial for successful data access. Each protocol requires specific path structures SLFS: Use a relative path (e.g., /myfiles/data.csv). S3: Format the path for S3 buckets (e.g., s3://mybucket/data/myfile.csv), and consider using the S3 File Reader Snap for simplicity. SFTP: Provide the full path relative to the SFTP root (e.g., /home/user/data/myfile.csv). HTTP/HTTPS: Use the full URL (e.g., https://www.example.com/files/myfile.csv). Azure Blob Storage: Specify the blob path within the container. Authentication and Credentials: Securely accessing external storage systems requires robust authentication. Depending on the storage type, different methods are used to ensure authorized access to sensitive data. Proper configuration of these methods is essential for maintaining data integrity and security. S3: Use AWS Access Key ID and Secret Access Key, or IAM roles for AWS environments. The S3 File Reader Snap simplifies this process. SFTP: Configure with username/password or SSH keys, ensuring the correct key file is specified. HTTP/HTTPS: Input basic authentication credentials or bearer tokens as needed. Azure Blob Storage: Use SAS tokens or AAD credentials depending on your security requirements. 2. Selecting the Appropriate Parser Snap Once the File Reader Snap is configured, the next step is to select the appropriate Parser Snap for processing the file data: CSV Files: Use the CSV Parser Snap to read and parse comma-separated values (CSV) files. This Snap handles various delimiters and header rows, allowing direct mapping of CSV data to downstream processes. JSON Files: The JSON Parser Snap is ideal for parsing both flat and nested JSON structures, converting them into SnapLogic’s Document format for further processing. XML Files: The XML Parser Snap effectively handles XML data, supporting the parsing of complex XML structures, including attributes and nested elements. HTML Files: Use the HTML Parser Snap to extract data from HTML documents, with support for XPath or CSS selectors to accurately pinpoint specific elements within the HTML. PDF Files: The PDF Parser Snap is designed to convert PDFs into a structured format like JSON, enabling further manipulation within SnapLogic. Example of reading a file in Snaplogic Reading CSV File Add the File Reader Snap: Drag and drop the “File Reader” Snap onto the designer workspace. Configure the File Reader Snap: Click on the “File Reader” Snap to access its settings panel. Select or Upload a File: In the file settings, click on the folder icon to either upload a new file or select an existing one from the directory. Save Configuration: After selecting or uploading the desired file, click "Save" and then close the settings panel. Add the CSV Parser Snap: Drag and drop the “CSV Parser” Snap onto the workspace, positioning it after the File Reader. The default settings for the CSV Parser are typically sufficient for most use cases. Validate or Execute the Pipeline: Validate or execute the pipeline to process and read the contents of the CSV file. Reading HTML Content with File Reader Add the File Reader Snap: Drag and drop the “File Reader” Snap onto the designer workspace. Configure the File Reader Snap: Click on the “File Reader” Snap to access its settings panel. Then put the url that you want to connect to the settings Save Configuration: Click "Save" and close the settings. Add the HTML Parser Snap: Drag the "HTML Parser" snap to the Designer and connect it to the File Reader snap and leave the configuration as a default Configure the HTML Parser Snap: Open the HTML Parser settings and go to the "Views" tab. Choose to output either as a document or binary. Validate or Execute the Pipeline: Validate or execute the pipeline to retrieve the text content from the given URL. Reading HTML Content with HTTP Client Add the HTTP Client Snap: Drag and drop the “HTTP Client” Snap onto the designer workspace. Configure the HTTP Client Snap: Click on the “HTTP Client” Snap to access its settings panel. Then put the url that you want to connect to the settings. The HTTP Client snap allows user to do complex html calling with various of configuration. Users can select “Request Methods” or provide the “Pagination” mechanism to get continuous content Configure the HTTP Client Snap Views: Click the “Views” tab and select output to “Binary”. Then save and close the settings. Add the HTML Parser Snap: Drag the "HTML Parser" snap to the Designer then connect it to the File Reader snap and leave the configuration as a default Validate or Execute the Pipeline: Validate or execute the pipeline to retrieve the text content from the given URL. View output: The output will show the content in text format Reading PDF File To effectively read and extract information from a PDF file, users must first understand the nature of the content within the file. PDF files can generally be classified into two categories: those containing text-based content and those containing image-based content. For PDFs with text-based content, users can utilize a standard PDF parser, which is designed to extract and output the textual information in a readable format. These parsers are widely available and are efficient in converting the embedded text in a PDF document into a text format that can be further processed or analyzed. However, if the PDF contains image-based content, such as scanned documents or images of text, a different approach is required. In this case, Optical Character Recognition (OCR) technology must be employed. OCR services are specialized tools that analyze the images within the PDF and convert the visual representations of text into machine-readable text. This process is crucial for making the content accessible and editable, especially when dealing with scanned documents or other image-heavy files. By understanding these distinctions and choosing the appropriate tool for the type of content within a PDF, users can effectively extract the necessary information, ensuring accuracy and efficiency in their workflow. Reading PDF File with PDF Parser snap Add the File Reader Snap: Drag and drop the “File Reader” Snap onto the designer workspace. Configure the File Reader Snap: Click on the “File Reader” Snap to access its settings panel. Then, select or upload a pdf file that you want to read Save Configuration: Click "Save" and close the settings. Add the PDF Parser Snap: The PDF Parser snap can parse the pdf file into a text. Configure the PDF Parser Snap: Users can select a proper “Parser type” to parse the pdf file. In this case, we select “Text extractor” so we can get all the text from the pdf file. Validate or Execute the Pipeline: Validate or execute the pipeline to retrieve the text content from the given PDF file. The response contains pdf text that is extracted from the pdf file. Reading PDF Files with Unstructured Prerequisite: Users need to have an unstructured API key to access the unstructured service or have the unstructured instance deploy locally to use the unstructured API Add the File Reader Snap: Drag and drop the “File Reader” Snap onto the designer workspace. Configure the File Reader Snap: Click on the “File Reader” Snap to access its settings panel. Then, select or upload a pdf file that you want to read Save Configuration: Click "Save" and close the settings. Add the Partition API Snap: The partition API snap utilizes the unstructured partition API to parse a pdf file. Configure the Partition API Snap: Users can select a strategy to parse a pdf file. In this case, we use “fast” strategy to parse only text and not include images and tables. If you select hi_res, the image and table will be parsed as a base64. Validate or Execute the Pipeline: Validate or execute the pipeline to retrieve the text content from the given PDF file. The response also contains type of the content that is parsed from the PDF. So users can use “type” attribute to process further for example, users can process the title and content separately from the footer. Reading PDF File with Adobe Service Users can leverage the Adobe API for PDF parsing tasks. SnapLogic offers the Adobe Extract Snap for extracting text from PDF documents, as well as the Adobe OCR Snap for processing image-based PDF content. When dealing with PDFs that contain image-based content, the Adobe OCR Snap is the appropriate tool, enabling accurate extraction of text from images within the document. Reading PDF File with Adobe OCR Prerequisite: Users need to have Adobe key to access the Adobe OCR service before using the Adobe OCR snap Add the File Reader Snap: Drag and drop the “File Reader” Snap onto the designer workspace. Configure the File Reader Snap: Click on the “File Reader” Snap to access its settings panel. Then, select or upload a pdf file that you want to read Save Configuration: Click "Save" and close the settings. Add the Adobe OCR Snap: The Adobe OCR Snap utilizes the Adobe API to efficiently parse PDF files with image-based content. Users can rely on the default settings provided within the Adobe OCR Snap to accurately extract text from images embedded in the PDF. Validate or Execute the Pipeline: Validate or execute the pipeline to retrieve the text content from the given PDF file. The response contains a binary content that contains the extracted information.
BankTanapat
2 years ago Place SnapLogic Technical Blog
2.3KViews
2likes
0Comments
Introduction to PipeLoop
We all love the Pipeline Execute Snap, it greatly simplifies a complex pipeline by extracting sections into a sub-pipeline. But sometimes, we’d really want the ability to run a pipeline multiple times to perform some operations, like polling from an endpoint or performing LLM Tool calls. In this article, we will introduce the PipeLoop Snap, which adds iteration to the SnapLogic programming model. With PipeLoop, we can create new workflows that are previously hard to manage or even impossible. What is PipeLoop PipeLoop is a new Snap for iterative execution on a pipeline. For people who are familiar with iterations within programming languages, PipeLoop is essentially a do-while loop for pipelines. The user is required to provide an iteration limit as a hard cutoff to avoid resource depletion or infinite loop, and an optional stop condition to control the execution. Just like we can pass input documents to PipeExec, we can also pass input documents to PipeLoop, the difference between the two is that the output document of the pipeline executed with PipeLoop will be used as the next round of input to continue the execution until the stop condition is met or limit is reached. Due to this unique mechanism, the pipeline run by PipeLoop must have one unlinked input and one unlinked output to work properly. To put it simply, PipeLoop can be thought of as chaining a bunch of PipeExec Snaps with the same pipeline with variable length and a condition to exit early. PipeLoop execution flow 1. Input documents to PipeLoop are passed to the child pipeline for execution. 2. Child pipeline executes. 3. Child output is collected. 4. Evaluate stop condition based on document output. If true, exit and pass the output document to PipeLoop, otherwise continue. 5. Check if the iteration limit is reached. If true, exit and pass the output document to PipeLoop, otherwise continue. 6. Use the output document as the next round of input and continue (1.) PipeLoop execution walkthrough Let’s start with a very simple example. We’ll create a workflow using PipeLoop that increments a number from 1 to 3. For simplicity, we will refer to the pipeline with PipeLoop as the “Parent pipeline”, and the pipeline that is executed by PipeLoop as the “Child pipeline”. Parent pipeline setup The parent pipeline consists of one JSON Generator Snap with one document as input, and one PipeLoop Snap running the pipeline “child” with stop condition “$num >= 3”. We’ll also enable “Debug Iteration output” to see the output of each round in this walkthrough. Child pipeline setup The child pipeline consists of a single mapper snap that increments “$num” by 1, which satisfies the requirement “a pipeline with one unlinked input and one unlinked output” for a pipeline to be run by PipeLoop. Output The output of PipeLoop consists of two major sections when Debug mode is enabled: the output fields, and _iteration_documents. We can see the final output is “num”: 3, which means PipeLoop has successfully carried out the task. PipeLoop features There are multiple features in PipeLoop that can be helpful when building iterating pipelines. We’ll categorize them from where the features are located. Properties There are 4 main sections in the property of the PipeLoop Snap. Pipeline Pipeline Parameters Loop options Execution Options Pipeline The pipeline to be run. Pipeline Parameters We’ll take a deeper dive into this in the Pipeline Parameters section. Loop options Loop options are property settings that are related to iterations of this snap. Stop condition The Stop condition field allows the user to set an expression to be evaluated after the first execution has occurred. If the expression is evaluated to true, the iteration will be stopped. The stop condition can be also set to false if the user wishes to use this as a traditional for loop. There are cases where the user might pass an unintended value into the Stop condition field. In this scenario, PipeLoop generates a warning when the user provides a non-boolean String as the Stop condition, while the stop condition will be treated as false. Non-boolean Stop condition warning Iteration limit The Iteration limit field allows the user to limit the maximum number of iterations that could potentially occur. This field can also be used to limit the total number of executions if the Stop condition is set to false. Setting a large value for the Iteration limit with debug mode on could be dangerous. The accumulated documents could quickly deplete CPU and RAM resources. To prevent this, PipeLoop generates a warning in the Pipeline Validation Statistics tab when the Iteration limit is set to greater than or equal to 1000 with Debug mode set to enabled. Large iteration limit with debug mode enabled warning Debug iteration outputs This toggle field enables the output from the child pipelines for each iteration and the stop condition evaluation to be added into the final output as a separate field. Output example with Debug iteration outputs enabled Execution options Execute On To specify where the pipeline execution should take place. Currently only local executions (local snaplex, local node) are supported. Execution Label We’ll take a deeper dive into this in the Monitoring section. Pipeline Parameters For users that are familiar with Pipeline Parameters in PipeExec, feel free to skip to the next section as the instructions are identical. Introduction to Pipeline Parameters Before we take a look at the Pipeline Parameters support in the PipeLoop Snap, let’s take a step back and see what pipeline parameters are and how pipeline parameters can be leveraged. Pipeline parameters are String constants that can be defined in the Edit Pipeline Configuration settings. Users can use the parameters as a constant to be used anywhere in the pipeline. One major difference for Pipeline parameters and Pipeline variables is that Pipeline parameters are referred using an underscore prefix, whereas Pipeline variables are referred using a dollar sign prefix. Pipeline Parameters in Edit Pipeline Configuration Accessing Pipeline Parameters in an expression field Example Let’s take a look at Pipeline Parameters in action with PipeLoop. Our target here is to print out “Hello PipeLoop!” n times where n is the value of “num”. We’ll add two parameters in the child pipeline, param1 and param2. To demonstrate, we assign “value1” to param1 and keep it empty for param2. We’ll then add a message field with the value “Hello PipeLoop!” in the JSON Generator so that we can assign the String value to param2. Now we’re able to use param2 as a constant in the child pipeline. PipeLoop also has field name suggestions built in the Parameter name fields for ease of use. PipeLoop Pipeline Parameters in action For our child pipeline, we’ll add a new row in the Mapping table to print out “Hello PipeLoop!” repeatedly (followed with a new line character). One thing to bear in mind is that the order of the Mapping table does not affect the output (the number of “Hello PipeLoop!” printed in this case), as the output fields are updated after the execution of current iteration is finished. Child Pipeline configuration for our task Here’s the final result, we can see “Hello PipeLoop!” is being printed twice. Mission complete. Remarks Pipeline Parameters are String constants that can be set in Edit Pipeline Configuration. Users can pass a String to Pipeline Parameters defined in the Child pipeline in PipeLoop. Pipeline Parameters in PipeLoop will override previous pipeline parameter values defined in the Child pipeline if the parameters share the same name. Pipeline Parameters are constants, which means the values will not be modified during iterations even if the users did so. Monitoring When a snap in a pipeline is executed, there will not be any output until the execution is finished. Therefore, due to the nature of iterating pipeline execution as a single snap, it is slightly difficult to know where the execution is currently at, or which pipeline execution is corresponding to which input document. To deal with this, we have two extra features that can add more visibility to the PipeLoop execution. Pipeline Statistics progress bar During the execution of PipeLoop, a progress bar will be available in the Pipeline Validation Statistics tab, so that the user can get an idea of which iteration the PipeLoop is currently at. Note that the progress bar might not reflect the actual iteration index if the child pipeline executions are short, due to polling intervals. PipeLoop iteration progress bar Execution Label When a PipeLoop with multiple input documents is executed, the user will not be able to tell which pipeline execution is linked to which input document in the SnapLogic Monitor. Execution label is the answer to this problem. The user can pass in a value in the Execution label field that can differentiate input documents so that each input document will have its own label in the Snaplogic Monitor during Execution. Here’s an example of two input documents running on the child pipeline. We set the Execution label with the expression “child_label” + $num, so the execution for the first document will have the label “child_label0” and the second execution will have the label “child_label1”. Execution label settings SnapLogic Monitor View Summary In this article, we introduced PipeLoop, a new Snap for iterative execution workflows. The pipeline run by PipeLoop must have one unlinked input and one unlinked output. PipeLoop has the following features: Pipeline Parameters support Stop condition to exit early with warnings Iteration limit to avoid infinite loop with warnings Debug mode Execution label to differentiate runs in Monitor Progress bar for status tracking Happy Building!
tfan
2 years ago Place SnapLogic Technical Blog
2.1KViews
5likes
0Comments
Guide for Advanced GenAI App Patterns
In the rapidly evolving field of Generative AI (GenAI), foundational knowledge can take you far, but it's the mastery of advanced patterns that truly empowers you to build sophisticated, scalable, and efficient applications. As the complexity of AI-driven tasks grows, so does the need for robust strategies that can handle diverse scenarios—from maintaining context in multi-turn conversations to dynamically generating content based on user inputs. This guide delves into these advanced patterns, offering a deep dive into the strategies that can elevate your GenAI applications. Whether you're an admin seeking to optimize your AI systems or a developer aiming to push the boundaries of what's possible, understanding and implementing these patterns will enable you to manage and solve complex challenges with confidence. 1. Advanced Prompt Engineering 1.1 Comprehensive Control of Response Format In GenAI applications, controlling the output format is crucial for ensuring that responses align with specific user requirements. Advanced prompt engineering allows you to craft prompts that provide precise instructions on how the AI should structure its output. This approach not only improves the consistency of responses but also makes them more aligned with the desired objectives. For instance, you can design prompts with a detailed structure that includes multiple elements such as Context, Objective, Style, Audience, and desired Response Length. This method allows for granular control over the output. A sample prompt might look like this: Context: Provide background information on the topic to set the stage. Objective: Clearly define the purpose of the response. Style: Specify whether the response should be formal, informal, technical, or creative. Audience: Identify the target audience, which influences the language and depth of explanation. Response format: Instruct the AI to generate a response that takes approximately 3 minutes to read, ensuring depth and comprehensiveness, typically spanning 4-5 paragraphs. This level of detail in prompt engineering ensures that the AI-generated content meets specific needs, making it suitable for various use cases, such as generating educational material, detailed reports, or customer communications. 1.2 Few-Shot Learning Few-shot learning is an advanced technique where the AI model is provided with a small number of examples (often just a few) within the prompt to guide its output. This method is particularly powerful when dealing with tasks that require the model to understand and replicate complex patterns or formats with minimal input. By incorporating examples directly into the prompt, you can train the model to produce more accurate and contextually appropriate responses even in scenarios where large amounts of training data are not available. This approach is invaluable in customizing the AI's behavior for niche applications where generalized training data may not suffice. Example Prompt: "Generate a product description for the following items. The description should be concise, highlight key features, and appeal to a target audience interested in quality and innovation. Example 1: Product Name: SmartHome WiFi Thermostat Description: "The SmartHome WiFi Thermostat is your ultimate solution for energy-efficient temperature control. With seamless integration into your smart home ecosystem, it offers intuitive controls, customizable settings, and real-time energy usage reports. Designed for modern living, it ensures comfort while helping you save on energy bills." Example 2: Product Name: EcoBrew Stainless Steel Coffee Maker Description: "The EcoBrew Stainless Steel Coffee Maker combines sustainability with top-tier performance. Made from durable stainless steel, this eco-friendly coffee maker brews the perfect cup every time, with minimal environmental impact. Ideal for the conscious coffee lover, it offers precision brewing and a sleek design that complements any kitchen." New Item: Product Name: UltraLight Travel Backpack Description: AI-Generated Output: "UltraLight Travel Backpack: The UltraLight Travel Backpack is engineered for the modern adventurer. Weighing just under a pound, it offers unmatched portability without sacrificing storage capacity. Featuring water-resistant fabric, multiple compartments, and ergonomic shoulder straps, this backpack is designed for those who need durability and convenience on the go. Perfect for day hikes, weekend getaways, or everyday use, the UltraLight Travel Backpack is your reliable companion on any journey." Explanation: In this example, the AI was given two product descriptions that demonstrate the desired style, tone, and structure of the output. When asked to generate a new description for the "UltraLight Travel Backpack," the AI used the patterns from the provided examples to create a similar, contextually appropriate product description. Despite only being shown two examples, the model effectively captured the key elements needed to generate a new, high-quality output. This approach is highly beneficial when you need the AI to produce consistent results across similar tasks, especially in scenarios where creating extensive training data is impractical. By providing just a few examples, you guide the AI's understanding, allowing it to apply the learned pattern to new, but related, tasks. 1.3 Chain of Thought The chain of thought patterns encourages the AI to generate responses that follow a logical sequence, mirroring human reasoning. This technique is particularly useful in complex scenarios where the AI needs to make decisions, solve problems, or explain concepts step-by-step. By structuring prompts that lead the AI through a series of thought processes, you can guide it to produce more coherent and rational outputs. This is especially effective in applications requiring detailed explanations, such as scientific reasoning, technical problem-solving, or any situation where the AI needs to justify its conclusions.For instance, a prompt might instruct the AI to break down a complex problem into smaller, manageable parts and tackle each one sequentially. The AI would first identify the key components of the problem, then work through each one, explaining its reasoning at each step. This method not only enhances the clarity of the response but also improves the accuracy and relevance of the AI’s conclusions. 2. Multi-modal Processing Multi-modal processing in Generative AI is a cutting-edge approach that allows AI systems to integrate and process multiple types of data—such as text, images, audio, and video—simultaneously. This capability is crucial for applications that require a deep understanding of content across different modalities, leading to more accurate and contextually enriched outputs. For instance, in a scenario where an AI is tasked with generating a description of a scene from a video, multi-modal processing enables it to analyze both the visual elements and the accompanying audio to produce a description that reflects not just what is seen but also the context provided by sound. Similarly, when processing text and images together, such as in a captioning task, the AI can better understand the relationship between the words and the visual content, leading to more precise and relevant captions. This advanced pattern is particularly beneficial in complex environments where understanding the nuances across different data types is key to delivering high-quality outputs. For example, in medical diagnostics, AI systems using multi-modal processing can analyze medical images alongside patient records and spoken notes to offer more accurate diagnoses. In customer service, AI can interpret and respond to customer queries by simultaneously analyzing text and voice tone, improving the quality of interactions. Moreover, multi-modal processing enhances the AI's ability to learn from varied data sources, allowing it to build more robust models that generalize better across different tasks. This makes it an essential tool in the development of AI applications that need to operate in real-world scenarios where data is rarely homogeneous. By leveraging multi-modal processing, AI systems can generate responses that are not only more comprehensive but also tailored to the specific needs of the task at hand, making them highly effective in a wide range of applications. As this technology continues to evolve, it promises to unlock new possibilities in fields as diverse as entertainment, education, healthcare, and beyond. Example In many situations, data may include both images and text that need to be analyzed together to gain comprehensive insights. To effectively process and integrate these different data types, you can utilize a multi-modal processing pipeline in SnapLogic. This approach allows the Generative AI model to simultaneously analyze data from both sources, maintaining the integrity of each modality. This pipeline is composed of two distinct stages. The first stage focuses on extracting images from the source data and converting them into base64 format. The second stage involves generating a prompt using advanced prompt engineering techniques, which is then fed into the Large Language Model (LLM). The visual representation of this process is divided into two parts, as shown in the picture above. Extract the image from the source Add the File Reader Snap: Drag and drop the “File Reader” Snap onto the designer. Configure the File Reader Snap: Click on the “File Reader” Snap to access its settings panel. Then, select a file that contains images. In this case, we select a pdf file. Add the PDF Parser Snap: Drag and drop the “PDF Parser” Snap onto the designer and set the parser type to be “Pages to images converter” Configure views: Click on the “Views” tab and then select the output to be “Binary”. Convert to Base64: Add and connect “Binary to Document” snap to the PDF Parser snap. Then, configure the encoding to ENCODE_BASE64. Construct the prompt and send it to the GenAI Add a JSON Generator Snap: Drag the JSON Generator Snap and connect it to the preceding Mapper Snap. Then, click “Edit JSON” to modify the JSON string in the JSON editor mode. AWS Claude on Message allows you to send images via the prompt by configuring the source attribute within the content. You can construct the image prompt as demonstrated in the screenshot. Provide instruction with Prompt Generator: Add the prompt generator Snap and connect it to the JSON Generator Snap. Next, select the “Advanced Prompt Output” checkbox to enable the advanced prompt payload. Finally, click “Edit Prompt” to enter your specific instructions. The advanced prompt output will be structured as an array of messages, as illustrated in the screenshot below. Send to GenAI: Add the AWS Claude on AWS Message Snap and enter your credentials to access the AWS Bedrock service. Ensure that the “Use Message Payload” checkbox is selected, and then configure the message payload using $messages, which is the output from the previous Snap. After completing these steps, you can process the image using the LLM independently. This approach allows the LLM to focus on extracting detailed information from the image. Once the image has been processed, you can then combine this data with other sources, such as text or structured data, to generate a more comprehensive and accurate analysis. This multi-modal integration ensures that the insights derived from different data types are effectively synthesized, leading to richer and more precise results. 3. Semantic Caching To optimize both the cost and response time associated with using Large Language Models (LLMs), implementing a semantic caching mechanism is a highly effective strategy. Semantic caching involves storing responses generated by the model and reusing them when the system encounters queries with the same or similar meanings. This approach not only enhances the overall efficiency of the system but also significantly reduces the operational costs tied to model usage. The fundamental principle behind semantic caching is that many user queries are often semantically similar, even if they are phrased differently. By identifying and caching the responses to these semantically equivalent queries, the system can bypass the need to repeatedly invoke the LLM, which is resource-intensive. Instead, the system can quickly retrieve and return the cached response, leading to faster response times and a more seamless user experience. From a cost perspective, semantic caching directly translates into savings. Each time the system serves a response from the cache rather than querying the LLM, it avoids the computational expense associated with generating a new response. This reduction in the number of LLM invocations directly correlates with lower service costs, making the solution more economically viable, particularly in environments with high query volumes. Additionally, semantic caching contributes to system scalability. As the demand on the LLM grows, the caching mechanism helps manage the load more effectively, ensuring that response times remain consistent even as the system scales. This is crucial for maintaining the quality of service, especially in real-time applications where latency is a critical factor. Implementing semantic caching as part of the LLM deployment strategy offers a dual benefit: optimizing response times for end-users and minimizing the operational costs of model usage. This approach not only enhances the performance and scalability of AI-driven systems but also ensures that they remain cost-effective and responsive as user demand increases. Implementation Concept for Semantic Caching Semantic caching is a strategic approach designed to optimize both response time and computational efficiency in AI-driven systems. The implementation of semantic caching involves the following key steps: Query Submission and Vectorization: When a user submits a query, the system first processes this input by converting it into an embedding—a vectorized representation of the query. This embedding captures the semantic meaning of the query, enabling efficient comparison with previously stored data. Cache Lookup and Matching: The system then performs a lookup in the vector cache, which contains embeddings of previous queries along with their corresponding responses. During this lookup, the system searches for an existing embedding that closely matches the new query's embedding. Matching Threshold: A critical component of this process is the match threshold, which can be adjusted to control the sensitivity of the matching algorithm. This threshold determines how closely the new query needs to align with a stored embedding for the cache to consider it a match. Cache Hit and Response Retrieval: If the system identifies a match within the defined threshold, it retrieves the corresponding response from the cache. This "cache hit" allows the system to deliver the response to the user rapidly, bypassing the need for further processing. By serving responses directly from the cache, the system conserves computational resources and reduces response times. Cache Miss and LLM Processing: In cases where no suitable match is found in the cache—a "cache miss"—the system forwards the query to the Large Language Model (LLM). The LLM processes the query and generates a new response, ensuring that the user receives a relevant and accurate answer even for novel queries. Response Storage and Cache Management: After the LLM generates a new response, the system not only delivers this response to the user but also stores the response along with its associated query embedding back into the vector cache. This step ensures that if a similar query is submitted in the future, the system can serve the response directly from the cache, further optimizing the system’s efficiency. Time-to-Live (TTL) Adjustment: To maintain the relevance and accuracy of cached responses, the system can adjust the Time-to-Live (TTL) for each entry in the cache. The TTL determines how long a response remains valid in the cache before it is considered outdated and automatically removed. By fine-tuning the TTL settings, the system ensures that only up-to-date and contextually appropriate responses are served, thereby preventing the use of stale or irrelevant data. Implement Semantic Caching in Snaplogic The concept of semantic caching can be effectively implemented within SnapLogic, leveraging its robust pipeline capabilities. Below is an outline of how this implementation can be achieved: Embedding the Query: The process begins with the embedding of the user’s query (prompt). Using SnapLogic's capabilities, an embedder, such as the Amazon Titan Embedder, is employed to convert the prompt into a vectorized representation. This embedding captures the semantic meaning of the prompt, making it suitable for comparison with previously stored embeddings. Vector Cache Lookup: Once the prompt has been embedded, the system proceeds to search for a matching entry in the vector cache. In this implementation, the Snowflake Vector Database serves as the vector cache, storing embeddings of past queries along with their corresponding responses. This lookup is crucial for determining whether a similar query has been processed before. Flow Routing with Router Snap: After the lookup, the system uses a Router Snap to manage the flow based on whether a match (cache hit) is found or not (cache miss). The Router Snap directs the workflow as follows: Cache Hit: If a matching embedding is found in the vector cache, the Router Snap routes the process to immediately return the cached response to the user. This ensures rapid response times by avoiding unnecessary processing. Cache Miss: If no match is found, the Router Snap directs the workflow to request a new response from the Large Language Model (LLM). The LLM processes the prompt and generates a new, relevant response. Storing and Responding: In the event of a cache miss, after the LLM generates a new response, the system not only sends this response to the user but also stores the new embedding and response in the Snowflake Vector Database for future use. This step enhances the efficiency of subsequent queries, as similar prompts can be handled directly from the cache. 4. Multiplexing AI Agents Multiplexing AI agents refers to a strategy where multiple generative AI models, each specialized in a specific task, are utilized in parallel to address complex queries. This approach is akin to assembling a panel of experts, where each agent contributes its expertise to provide a comprehensive solution. Here is the key feature of using multiplexing AI Agents Specialization: A central advantage of multiplexing AI agents is the specialization of each agent in handling specific tasks or domains. Multiplexing ensures that responses are more relevant and accurate by assigning each AI model to a particular area of expertise. For example, one agent might be optimized for natural language understanding, another for technical problem-solving, and a third for summarizing complex data. This allows the system to handle multi-dimensional queries effectively, as each agent focuses on what it does best. This specialization significantly reduces the likelihood of errors or irrelevant responses, as the AI agents are tailored to their specific tasks. In scenarios where a query spans multiple domains—such as asking a technical question with a business aspect—the system can route different parts of the query to the appropriate agent. This structured approach allows for extracting more relevant and accurate information, leading to a solution that addresses all facets of the problem. Parallel Processing: Multiplexing AI agents take full advantage of parallel processing capabilities. By running multiple agents simultaneously, the system can tackle different aspects of a query at the same time, speeding up the overall response time. This parallel approach enhances both performance and scalability, as the workload is distributed among multiple agents rather than relying on a single model to process the entire task. For example, in a customer support application, one agent could handle the analysis of a customer’s previous interactions while another agent generates a response to a technical issue, and yet another creates a follow-up action plan. Each agent works on its respective task in parallel, and the system integrates its outputs into a cohesive response. This method not only accelerates problem-solving but also ensures that different dimensions of the problem are addressed simultaneously. Dynamic Task Allocation: In a multiplexing system, dynamic task allocation is crucial for efficiently distributing tasks among the specialized agents. A larger, general-purpose model, such as AWS Claude 3 Sonet, can act as an orchestrator, assessing the context of the query and determining which parts of the task should be delegated to smaller, more specialized agents. The orchestrator ensures that each task is assigned to the model best equipped to handle it. For instance, if a user submits a complex query about legal regulations and data security, the general model can break down the query, sending legal-related questions to an AI agent specialized in legal analysis and security-related queries to a security-focused agent like TinyLlama or a similar model. This dynamic delegation allows for the most relevant models to be used at the right time, improving both the efficiency and accuracy of the overall response. Integration of Outputs: Once the specialized agents have processed their respective tasks, the system must integrate their outputs to form a cohesive and comprehensive response. This integration is a critical feature of multiplexing, as it ensures that all aspects of a query are addressed without overlap or contradiction. The system combines the insights generated by each agent, creating a final output that reflects the full scope of the user’s request. In many cases, the integration process also includes filtering or refining the outputs to remove any inconsistencies or redundancies, ensuring that the response is logical and cohesive. This collaborative approach increases the reliability of the system, as it allows different agents to complement one another’s knowledge and expertise. Additionally, multiplexing reduces the likelihood of hallucinations—incorrect or nonsensical outputs that can sometimes occur with single, large-scale models. By dividing tasks among specialized agents, the system ensures that each part of the problem is handled by an AI that is specifically trained for that domain, minimizing the chance of erroneous or out-of-context responses. Improved Accuracy and Contextual Understanding: Multiplexing AI agents contribute to improved overall accuracy by distributing tasks to models that are more finely tuned to specific contexts or subjects. This approach ensures that the AI system can better understand and address the nuances of a query, particularly when the input involves complex or highly specialized information. Each agent’s deep focus on a specific task leads to a higher level of precision, resulting in a more accurate final output. Furthermore, multiplexing allows the system to build a more detailed contextual understanding. Since different agents are responsible for different elements of a task, the system can synthesize more detailed and context-aware responses. This holistic view is crucial for ensuring that the solution provided is not only accurate but also relevant to the specific situation presented by the user. In SnapLogic, we offer comprehensive support for building advanced workflows by integrating our GenAI Builder Snap. This feature allows users to incorporate generative AI capabilities into their workflow automation processes seamlessly. By leveraging the GenAI Builder Snap, users can harness the power of artificial intelligence to automate complex decision-making, data processing, and content generation tasks within their existing workflows. This integration provides a streamlined approach to embedding AI-driven functionalities, enhancing both efficiency and precision across various operational domains. For instance, users can design workflows where the GenAI Builder Snap collaborates with other SnapLogic components, such as data pipelines and transformation processes, to deliver intelligent, context-aware automation tailored to their unique business needs. In the example pipelines, the system sends a prompt simultaneously to multiple AI agents, each with its specialized area of expertise. These agents independently process the specific aspects of the prompt related to their specialization. Once the agents generate their respective outputs, the results are then joined together to form a cohesive response. To further enhance the clarity and conciseness of the final output, a summarization agent is employed. This summarization agent aggregates and refines the detailed responses from each specialized agent, distilling the information into a concise, unified summary that captures the key points from all the agents, ensuring a coherent and well-structured final response. 5. Multi-agent conversation Multi-agent conversation refers to the interaction and communication between multiple autonomous agents, typically AI systems, working together to achieve a shared goal. This framework is widely used in areas like collaborative problem-solving, multi-user systems, and complex task coordination where multiple perspectives or expertise areas are required. Unlike a single-agent conversation, where one AI handles all inputs and outputs, a multi-agent system divides tasks among several specialized agents, allowing for greater efficiency, deeper contextual understanding, and enhanced problem-solving capabilities. Here are the key features of using multi-agent conversations. Specialization and Expertise: Each agent in a multi-agent system is designed with a specific role or domain of expertise. This allows the system to leverage agents with specialized capabilities to handle different aspects of a task. For example, one agent might focus on natural language processing (NLP) to understand input, while another might handle complex calculations or retrieve data from external sources. This division of labor ensures that tasks are processed by the most capable agents, leading to more accurate and efficient results. Specialization reduces the likelihood of errors and allows for a deeper, domain-specific understanding of the problem. Collaboration and Coordination: In a multi-agent conversation, agents don’t work in isolation—they collaborate to achieve a shared goal. Each agent contributes its output to the broader conversation, sharing information and coordinating actions to ensure that the overall task is completed successfully. This collaboration is crucial when handling complex problems that require input from multiple domains. Effective coordination ensures that agents do not duplicate work or cause conflicts. Through predefined protocols or negotiation mechanisms, agents are able to work together harmoniously, producing a coherent solution that integrates their various inputs. Scalability: Multi-agent systems are inherently scalable, making them ideal for handling increasingly complex tasks. As the system grows in complexity or encounters new challenges, additional agents with specific skills can be introduced without overloading the system. Each agent can work independently, and the system's modular design allows for smooth expansion. Scalability ensures that the system can handle larger datasets, more diverse inputs, or more complex tasks as the environment evolves. This adaptability is essential in dynamic environments where workloads or requirements change over time. Distributed Decision-Making: In a multi-agent system, decision-making is often decentralized, meaning each agent has the autonomy to make decisions based on its expertise and the information available to it. This distributed decision-making process allows agents to handle tasks in parallel, without needing constant oversight from a central controller. Since agents can operate independently, decisions are made more quickly, and bottlenecks are avoided. This decentralized approach also enhances the system's resilience, as it avoids over-reliance on a single decision point and enables more adaptive and localized problem-solving. Fault Tolerance and Redundancy: Multi-agent systems are naturally resilient to errors and failures. Since each agent operates independently, the failure of one agent does not disrupt the entire system. Other agents can continue their tasks or, if necessary, take over the work of a failed agent. This built-in redundancy ensures the system can continue functioning even when some agents encounter issues. Fault tolerance is particularly valuable in complex systems, as it enhances reliability and minimizes downtime, allowing the system to maintain performance even under adverse conditions. SnapLogic provides robust capabilities for integrating workflow automation with Generative AI (GenAI), allowing users to seamlessly build advanced multi-agent conversation systems by combining the GenAI Snap with other Snaps within their pipeline. This integration enables users to create sophisticated workflows where multiple AI agents, each with their specialization, collaborate to process complex queries and tasks. In this example, we demonstrate a simple implementation of a multi-agent conversation system, leveraging a manager agent to oversee and control the workflow. The process begins by submitting a prompt to a large foundational model, which, in this case, is AWS Claude 3 Sonet. This model acts as the manager agent responsible for interpreting the prompt and determining the appropriate routing for different parts of the task. Based on the content and context of the prompt, the manager agent makes decisions on how to distribute the workload across specialized agents. After the initial prompt is processed, we utilize the Router Snap to dynamically route the output to the corresponding specialized agents. Each agent is tailored to handle a specific domain or task, such as data analysis, natural language processing, or knowledge retrieval, ensuring that the most relevant and specialized agent addresses each part of the query. Once the specialized agents have completed their respective tasks, their outputs are gathered and consolidated. The system then sends the final, aggregated result to the output destination. This approach ensures that all aspects of the query are addressed efficiently and accurately, with each agent contributing its expertise to the overall solution. The flexibility of SnapLogic’s platform, combined with the integration of GenAI models and Snaps, makes it easy for users to design, scale, and optimize complex multi-agent conversational workflows. By automating task routing and agent collaboration, SnapLogic enables more intelligent, scalable, and context-aware solutions for addressing a wide range of use cases, from customer service automation to advanced data processing. 6. Retrieval Augment Generation (RAG) To enhance the specificity and relevance of responses generated by a Generative AI (GenAI) model, it is crucial to provide the model with sufficient context. Contextual information helps the model understand the nuances of the task at hand, enabling it to generate more accurate and meaningful outputs. However, in many cases, the amount of context needed to fully inform the model exceeds the token limit that the model can process in a single prompt. This is where a technique known as Retrieval-Augmented Generation (RAG) becomes particularly valuable. RAG is designed to optimize the way context is fed into the GenAI model. Rather than attempting to fit all the necessary information into the limited input space, RAG utilizes a retrieval mechanism that dynamically sources relevant information from an external knowledge base. This approach allows users to overcome the token limit challenge by fetching only the most pertinent information at the time of query generation, ensuring that the context provided to the model remains focused and concise. The RAG framework can be broken down into two primary phases: Embedding Knowledge into a Vector Database: In the initial phase, the relevant content is embedded into a vector space using a machine learning model that transforms textual data into a format conducive to similarity matching. This embedding process effectively converts text into vectors, making it easier to store and retrieve later based on its semantic meaning. Once embedded, the knowledge is stored in a vector database for future access. In SnapLogic, embedding knowledge into a vector database can be accomplished through a streamlined pipeline designed for efficiency and scalability. The process begins with reading a PDF file using the File Reader Snap, followed by extracting the content with the PDF Parser Snap, which converts the document into a structured text format. Once the text is available, the Chunker Snap is used to intelligently segment the content into smaller, manageable chunks. These chunks are specifically sized to align with the input constraints of the model, ensuring optimal performance during later stages of retrieval. After chunking the text, each segment is processed and embedded into a vector representation, which is then stored in the vector database. This enables efficient similarity-based retrieval, allowing the system to quickly access relevant pieces of information as needed. By utilizing this pipeline in SnapLogic, users can easily manage and store large volumes of knowledge in a way that supports high-performance, context-driven AI applications. Retrieving Context through Similarity Matching: When a query is received, the system performs similarity matching to retrieve the most relevant content from the vector database. By evaluating the similarity between the embedded query and the stored vectors, RAG identifies the most pertinent pieces of information, which are then used to augment the input prompt. This step ensures that the GenAI model receives focused and contextually enriched data, allowing it to generate more insightful and accurate responses. To retrieve relevant context from the vector database in SnapLogic, users can leverage an embedder snap, such as the AWS Titan Embedder, to transform the incoming prompt into a vector representation. This vector serves as the key for performing a similarity-based search within the vector database where the previously embedded knowledge is stored. The vector search mechanism efficiently identifies the most relevant pieces of information, ensuring that only the most contextually appropriate content is retrieved. Once the pertinent knowledge is retrieved, it can be seamlessly integrated into the overall prompt-generation process. This is typically achieved by feeding the retrieved context into a prompt generator snap, which structures the information in a format optimized for use by the Generative AI model. In this case, the final prompt, enriched with the relevant context, is sent to the GenAI Snap, such as Anthropic Claude within the AWS Messages Snap. This approach ensures that the model receives highly specific and relevant information, ultimately enhancing the accuracy and relevance of its generated responses. By implementing RAG, users can fully harness the potential of GenAI models, even when dealing with complex queries that demand a significant amount of context. This approach not only enhances the accuracy of the model's responses but also ensures that the model remains efficient and scalable, making it a powerful tool for a wide range of real-world applications. 7. Tool Calling and Contextual instruction Traditional GenAI models are limited by the data they were trained on. Once trained, these models cannot access new or updated information unless they are retrained. This limitation means that without external input, models can only generate responses based on the static content within their training corpus. However, in a world where data is constantly evolving, relying on static knowledge is often inadequate, especially for tasks that require current or real-time information. In many real-world applications, Generative AI (GenAI) models need access to real-time data to generate contextually accurate and relevant responses. For example, if a user asks for the current weather in a particular location, the model cannot rely solely on pre-trained knowledge, as this data is dynamic and constantly changing. In such scenarios, traditional prompt engineering techniques are insufficient, as they primarily rely on static information that was available at the time of the model's training. This is where the tool-calling technique becomes invaluable. Tool calling refers to the ability of a GenAI model to interact with external tools, APIs, or databases to retrieve specific information in real-time. Instead of relying on its internal knowledge, which may be outdated or incomplete, the model can request up-to-date data from external sources and use it to generate a response that is both accurate and contextually relevant. This process significantly expands the capabilities of GenAI, allowing it to move beyond static, pre-trained content and incorporate dynamic, real-world data into its responses. For instance, when a user asks for live weather updates, stock market prices, or traffic conditions, the GenAI model can trigger a tool call to an external API—such as a weather service, financial data provider, or mapping service—to fetch the necessary data. This fetched data is then integrated into the model’s response, enabling it to provide an accurate and timely answer that would not have been possible using static prompts alone. Contextual instruction plays a critical role in the tool calling process. Before calling an external tool, the GenAI model must understand the nature of the user’s request and identify when external data is needed. For example, if a user asks, "What is the weather like in Paris right now?" the model recognizes that the question requires real-time weather information and that this cannot be answered based on internal knowledge alone. The model is thus programmed to trigger a tool call to a relevant weather service API, retrieve the live weather data for Paris, and incorporate it into the final response. This ability to understand and differentiate between static knowledge (which can be answered with pre-trained data) and dynamic, real-time information (which requires external tool calling) is essential for GenAI models to operate effectively in complex, real-world environments. Use Cases for Tool Calling Real-Time Data Retrieval: GenAI models can call external APIs to retrieve real-time data such as weather conditions, stock prices, news updates, or live sports scores. These tool calls ensure that the AI provides up-to-date and accurate responses that reflect the latest information. Complex Calculations and Specialized Tasks: Tool calling allows AI models to handle tasks that require specific calculations or domain expertise. For instance, an AI model handling a financial query can call an external financial analysis tool to perform complex calculations or retrieve historical stock market data. Integration with Enterprise Systems: In business environments, GenAI models can interact with external systems such as CRM platforms, ERP systems, or databases to retrieve or update information in real time. For example, a GenAI-driven customer service bot can pull account information from a CRM system or check order statuses from an external order management tool. Access to Specialized Knowledge: Tool calling allows AI models to fetch specialized information from databases or knowledge repositories that fall outside their domain of training. For example, a medical AI assistant could call an external database of medical research papers to provide the most current treatment options for a particular condition. Implementation of Tool Calling in Generative AI Systems Tool calling has become an integral feature in many advanced Generative AI (GenAI) models, allowing them to extend their functionality by interacting with external systems and services. For instance, AWS Anthropic Claude supports tool calling via the Message API, providing developers with a structured way to integrate external data and functionality directly into the model's response workflow. This capability allows the model to enhance its responses by incorporating real-time information, performing specific functions, or utilizing external APIs that provide specialized data beyond the model's training. To implement tool calling with AWS Anthropic Claude, users can leverage the Message API, which allows for seamless integration with external systems. The tool calling mechanism is activated by sending a message with a specific "tools" parameter. This parameter defines how the external tool or API will be called, using a JSON schema to structure the function call. This approach enables the GenAI model to recognize when external input is required and initiate a tool call based on the instructions provided. Implementation process Defining the Tool Schema: To initiate a tool call, users need to send a request with the "tools" parameter. This parameter is defined in a structured JSON schema, which includes details about the external tool or API that the GenAI model will call. The JSON schema outlines how the tool should be used, including the function name, parameters, and any necessary inputs for making the call. For example, if the tool is a weather API, the schema might define parameters such as location and time, allowing the model to query the API with these inputs to retrieve current weather data. Message Structure and Request Initiation: Once the tool schema is defined, the user can send a message to AWS Anthropic Claude containing the "tools" parameter alongside the prompt or query. The model will then interpret the request and, based on the context of the conversation or task, determine if it needs to call the external tool specified in the schema. If a tool call is required, the model will respond with a "stop_reason" value of "tool_use". This response indicates that the model is pausing its generation to call the external tool, rather than completing the response using only its internal knowledge. Tool Call Execution: When the model responds with "stop_reason": "tool_use", it signals that the external API or function should be called with the inputs provided. At this point, the external API (as specified in the JSON schema) is triggered to fetch the required data or perform the designated task. For example, if the user asks, "What is the weather in New York right now?", and the JSON schema defines a weather API tool, the model will pause and call the API with the location parameter set to "New York" and the time parameter set to "current." Handling the API Response: After the external tool processes the request and returns the result, the user (or system) sends a follow-up message containing the "tool_result". This message includes the output from the tool call, which can then be integrated into the ongoing conversation or task. In practice, this might look like a weather API returning a JSON object with temperature, humidity, and weather conditions. The response is passed back to the GenAI model via a user message, which contains the "tool_result" data. Final Response Generation: Once the model receives the "tool_result", it processes the data and completes the response. This allows the GenAI model to provide a final answer that incorporates real-time or specialized information retrieved from the external system. In our weather example, the final response might be, "The current weather in New York is 72°F with clear skies." Currently, SnapLogic does not yet provide native support for tool calling within the GenAI Snap Pack. However, we recognize the immense potential and value this feature can bring to users, enabling seamless integration with external systems and services for real-time data and advanced functionalities. We are actively working on incorporating tool calling capabilities into future updates of the platform. This enhancement will further empower users to build more dynamic and intelligent workflows, expanding the possibilities of automation and AI-driven solutions. We are excited about the potential it holds and look forward to sharing these innovations soon 8. Memory Cognition for LLMs Most large language models (LLMs) operate within a context window limitation, meaning they can only process and analyze a finite number of tokens (words, phrases, or symbols) at any given time. This limitation poses significant challenges, particularly when dealing with complex tasks, extended dialogues, or interactions that require long-term contextual understanding. For example, if a conversation or task extends beyond the token limit, the model loses awareness of earlier portions of the interaction, leading to responses that may become disconnected, repetitive, or contextually irrelevant. This limitation becomes especially problematic in applications where maintaining continuity and coherence across long interactions is crucial. In customer service scenarios, project management tools, or educational applications, it is often necessary to remember detailed information from earlier exchanges or to track progress over time. However, traditional models constrained by a fixed token window struggle to maintain relevance in such situations, as they are unable to "remember" or access earlier parts of the conversation once the context window is exceeded. To address these limitations and enable LLMs to handle longer and more complex interactions, we employ a technique known as memory cognition. This technique extends the capabilities of LLMs by introducing mechanisms that allow the model to retain, recall, and dynamically integrate past interactions or information, even when those interactions fall outside the immediate context window. Memory Cognition Components in Generative AI Applications To successfully implement memory cognition in Generative AI (GenAI) applications, a comprehensive and structured approach is required. This involves integrating various memory components that work together to enable the AI system to retain, retrieve, and utilize relevant information across different interactions. Memory cognition enables the AI model to go beyond stateless, short-term processing, creating a more context-aware, adaptive, and intelligent system capable of long-term interaction and decision-making. Here are the key components of memory cognition that must be considered when developing a GenAI application: Short-Term Memory (Session Memory) Short-term memory, commonly referred to as session memory, encompasses the model's capability to retain context and information during a single interaction or session. This component is vital for maintaining coherence in multi-turn conversations and short-term tasks. It enables the model to sustain continuity in its responses by referencing earlier parts of the conversation, thereby preventing the user from repeating previously provided information. Typically, short-term memory is restricted to the duration of the interaction. Once the session concludes or a new session begins, the memory is either reset or gradually decayed. This ensures the model can recall relevant details from earlier in the same session, creating a more seamless and fluid conversational experience. For example, in a customer service chatbot, short-term memory allows the AI to remember a customer’s issue throughout the conversation, ensuring that the problem is consistently addressed without needing the user to restate it multiple times. However, in large language models, short-term memory is often limited by the model's context window, which is constrained by the maximum number of tokens it can process in a single prompt. As new input is added during the conversation, older dialogue parts may be discarded or forgotten, depending on the token limit. This necessitates careful management of short-term memory to ensure that critical information is retained throughout the session. Long-Term Memory Long-term memory significantly enhances the model's capability by allowing it to retain information beyond the scope of a single session. Unlike short-term memory, which is confined to a single interaction, long-term memory persists across multiple interactions, enabling the AI to recall important information about users, their preferences, past conversations, or task-specific details, regardless of the time elapsed between sessions. This type of memory is typically stored in an external database or knowledge repository, ensuring it remains accessible over time and does not expire when a session ends. Long-term memory is especially valuable in applications that require the retention of critical or personalized information, such as user preferences, history, or recurring tasks. It allows for highly personalized interactions, as the AI can reference stored information to tailor its responses based on the user's previous interactions. For example, in virtual assistant applications, long-term memory enables the AI to remember a user's preferences—such as their favorite music or regular appointment times—and use this information to provide customized responses and recommendations. In enterprise environments, such as customer support systems, long-term memory enables the AI to reference previous issues or inquiries from the same user, allowing it to offer more informed and tailored assistance. This capability enhances the user experience by reducing the need for repetition and improving the overall efficiency and effectiveness of the interaction. Long-term memory, therefore, plays a crucial role in enabling AI systems to deliver consistent, contextually aware, and personalized responses across multiple sessions. Memory Management Dynamic memory management refers to the AI model’s ability to intelligently manage and prioritize stored information, continuously adjusting what is retained, discarded, or retrieved based on its relevance to the task at hand. This capability is crucial for optimizing both short-term and long-term memory usage, ensuring that the model remains responsive and efficient without being burdened by irrelevant or outdated information. Effective dynamic memory management allows the AI system to adapt its memory allocation in real-time, based on the immediate requirements of the conversation or task. In practical terms, dynamic memory management enables the AI to prioritize important information, such as key facts, user preferences, or contextually critical data, while discarding or de-prioritizing trivial or outdated details. For example, during an ongoing conversation, the system may focus on retaining essential pieces of information that are frequently referenced or highly relevant to the user’s current query, while allowing less pertinent information to decay or be removed. This process ensures that the AI can maintain a clear focus on what matters most, enhancing both accuracy and efficiency. To facilitate this, the system often employs relevance scoring mechanisms to evaluate and rank the importance of stored memories. Each piece of memory can be assigned a priority score based on factors such as how frequently it is referenced or its importance to the current task. Higher-priority memories are retained for longer periods, while lower-priority or outdated entries may be marked for removal. This scoring system helps prevent memory overload by ensuring that only the most pertinent information is retained over time. Dynamic memory management also includes memory decay mechanisms, wherein older or less relevant information gradually "fades" or is automatically removed from storage, preventing memory bloat. This ensures that the AI retains only the most critical data, avoiding inefficiencies and ensuring optimal performance, especially in large-scale applications that involve substantial amounts of data or memory-intensive operations. To further optimize resource usage, automated processes can be implemented to "forget" memory entries that have not been referenced for a significant amount of time or are no longer relevant to ongoing tasks. These processes ensure that memory resources, such as storage and processing power, are allocated efficiently, particularly in environments with large-scale memory requirements. By dynamically managing memory, the AI can continue to provide contextually accurate and timely responses while maintaining a balanced and efficient memory system. Implementation of memory cognition in Snaplogic SnapLogic provides robust capabilities for integrating with databases and storage systems, making it an ideal platform for creating workflows to manage memory cognition in AI applications. In the following example, we demonstrate a basic memory cognition pattern using SnapLogic to handle both short-term and long-term memory. Overview of the Workflow The workflow begins by embedding the prompt into a vector representation. This vector is then used to retrieve relevant memories from long-term memory storage. Long-term memory can be stored in a vector database, which is well-suited for similarity-based retrieval, or in a traditional database or key-value store, depending on the application requirements. Similarly, short-term memory can be stored in a regular database or a key-value store to keep track of recent interactions. Retrieving Memories Once the prompt is embedded, we retrieve relevant information from both short-term and long-term memory systems. The retrieval process is based on similarity scoring, where the similarity score indicates the relevance of the stored memory to the current prompt. For long-term memory, this typically involves querying a vector database, while short-term memory may be retrieved from a traditional relational database or key-value store. After retrieving the relevant memories from both systems, the data is fed into a memory management module. In this example, we implement a simple memory management mechanism using a script within SnapLogic. Memory Management The memory management module employs a sliding window technique, which is a straightforward yet effective way to manage memory. As new memory is added, older memories gradually fade out until they are removed from the memory stack. This ensures that the AI retains the most recent and relevant information while discarding outdated or less useful memories. The sliding window mechanism prioritizes newer or more relevant memories, placing them at the top of the memory stack, while older memories are pushed out over time. Generating the Final Prompt and Interacting with the LLM Once the memory management module has constructed the full context by combining short-term and long-term memory, the system generates the final prompt. This prompt is then sent to the language model for processing. In this case, we use AWS Claude through the Message API as the large language model (LLM) to generate a response based on the provided context. Updating Memory Upon receiving a response from the LLM, the workflow proceeds to update both short-term and long-term memory systems to ensure continuity and relevance in future interactions: Long-Term Memory: The long-term memory is refreshed by associating the original prompt with the LLM's response. In this context, the query key corresponds to the initial prompt, while the value is the response generated by the model. This update enables the system to store pertinent knowledge that can be accessed during future interactions, allowing for more informed and contextually aware responses over time. Short-Term Memory: The short-term memory is updated by appending the LLM's response to the most recent memory stack. This process ensures that the immediate context of the current conversation is maintained, allowing for seamless transitions and consistency in subsequent interactions within the session. This example demonstrates how SnapLogic can be effectively used to manage memory cognition in AI applications. By integrating with databases and leveraging SnapLogic’s powerful workflow automation, we can create an intelligent memory management system that handles both short-term and long-term memory. The sliding window mechanism ensures that the AI remains contextually aware while avoiding memory overload, and AWS Claude provides the processing power to generate responses based on rich contextual understanding. This approach offers a scalable and flexible solution for managing memory cognition in AI-driven workflows.
BankTanapat
2 years ago Place SnapLogic Technical Blog
1.9KViews
4likes
0Comments
Using Mustache Templating with the Prompt Generator Snap in SnapLogic
In the world of AI-driven data integration, the ability to dynamically generate prompts is crucial for creating adaptable and responsive workflows. The Prompt Generator Snap in SnapLogic leverages Mustache templating to allow users to craft dynamic text outputs based on input data. This whitepaper aims to educate users on the fundamentals of Mustache templating and how to effectively utilize it within the Prompt Generator Snap.
AaronK
2 years ago Place SnapLogic Technical Blog
1.8KViews
3likes
0Comments
Multimodal Processing in LLM
Multimodal processing in Generative AI represents a transformative leap in how AI systems extract and synthesize information from multiple data types—such as text, images, audio, and video—simultaneously. Unlike traditional single-modality AI models, which focus on one type of data, Multimodal systems integrate and process diverse data streams in parallel, creating a holistic understanding of complex scenarios. This integrated approach is critical for applications that require not just isolated insights from one modality, but a coherent synthesis across different data sources, leading to outputs that are contextually richer and more accurate. Generative AI, with multimodal processing, is redefining text extraction, surpassing traditional OCR by interpreting text within its visual and contextual environment. Unlike OCR, which only converts images to text, generative AI analyzes the surrounding image context, layout, and meaning, enhancing accuracy and depth. For instance, in complex documents, it can differentiate between headings, body text, and annotations, structuring information more intelligently. Additionally, it excels in low-quality or multilingual texts, making it invaluable in industries requiring precision and nuanced interpretation. In video analysis, a generative AI equipped with Multimodal processing can simultaneously interpret the visual elements of a scene, the audio (such as dialogue or background sounds), and any associated text (like subtitles or metadata). This allows the AI to produce a description or summary of the scene that is far more nuanced than what could be achieved by analyzing the video or audio alone. The interplay between these modalities ensures that the generated description reflects not only the visual and auditory content but also the deeper context and meaning derived from their combination. In tasks such as image captioning, Multimodal AI systems go beyond simply recognizing objects in a photo. They can interpret the semantic relationship between the image and accompanying text, enhancing the relevance and specificity of the generated captions. This capability is particularly useful in fields where the context provided by one modality significantly influences the interpretation of another, such as in journalism, where images and written reports must align meaningfully, or in education, where visual aids are integrated with instructional text. Multimodal processing enables AI to synthesize medical images (such as X-rays or MRIs) with patient history, clinical notes, and even live doctor-patient interactions in highly specialized applications like medical diagnostics. This comprehensive analysis allows the AI to provide more accurate diagnoses and treatment recommendations, addressing the complex interplay of symptoms, historical data, and visual diagnostics. Similarly, in customer service, Multimodal AI systems can improve communication quality by analyzing the textual content of a customer's inquiry and the tone and sentiment of their voice, leading to more empathetic and effective responses. Beyond individual use cases, Multimodal processing plays a crucial role in improving the learning and generalization capabilities of AI models. By training on a broader spectrum of data types, AI systems develop more robust, flexible models that can adapt to a wider variety of tasks and scenarios. This is especially important in real-world environments where data is often heterogeneous and requires cross-modal understanding to interpret fully. As Multimodal processing technologies continue to advance, they promise to unlock new capabilities across diverse sectors. In entertainment, Multimodal AI could enhance interactive media experiences by seamlessly integrating voice, visuals, and narrative elements. In education, it could revolutionize personalized learning by adapting content delivery to different sensory inputs. In healthcare, the fusion of Multimodal data could lead to breakthroughs in precision medicine. Ultimately, the ability to understand and generate contextually rich, Multimodal content positions Generative AI as a cornerstone technology in the next wave of AI-driven innovation. Multimodal Content Generator Snap The Multimodal Content Generator Snap encodes file or document inputs into the Snap's multimodal content format, preparing it for seamless integration. The output from this Snap must be connected to the Prompt Generator Snap to complete and format the message payload for further processing. This streamlined setup enables efficient multimodal content handling within the Snap ecosystem. The Snap Properties Type - Select the type of multimodal content. Content Type - Define the specific content type for data transmitted to the LLM. Content - Specify the content path to the multimodal content data for processing. Document Name - Name the document for reference and identification purposes. Aggregate Input - Enable this option to combine all inputs into a single content. Encode Base64 - Enable this option to convert the text input into Base64 encoding. Note: The Content property appears only if the input view is of the document type. The value assigned to Content must be in Base64 format for document inputs, while Snap will automatically use binary as content for binary input types. The Document Name can be set specifically for multimodal document types. The Encode Base64 property encodes text input into Base64 by default. If unchecked, the content will be passed through without encoding. Designing a Multimodal Prompt Workflow In this process, we will integrate multiple Snaps to create a seamless workflow for multimodal content generation and prompt delivery. By connecting the Multimodal Content Generator Snap to the Prompt Generator Snap, we configure it to handle multimodal content. The finalized message payload will then be sent to Claude by Anthropic Claude on AWS Messages. Steps: 1. Add the File Reader Snap: Drag and drop the File Reader Snap onto the designer canvas. Configure the File Reader Snap by accessing its settings panel, then select a file containing images (e.g., a PDF file). Download the sample image files at the bottom of this post if you have not already. Sample image file (Japan_flowers.jpg) 2. Add the Multimodal Content Generator Snap: Drag and drop the Multimodal Content Generator Snap onto the designer and connect it to the File Reader Snap. Open its settings panel, select the file type, and specify the appropriate content type. Here's a refined description of the output attributes from the Multimodal Content Generator: sl_content: Contains the actual content encoded in Base64 format. sl_contentType: Indicates the content type of the data. This is either selected from the configuration or, if the input is a binary, it extracts the contentType from the binary header. sl_type: Specifies the content type as defined in the Snap settings; in this case, it will display "image." 3. Add the Prompt Generator Snap: Add the Prompt Generator Snap to the designer and link it to the Multimodal Content Generator Snap. In the settings panel, enable the Advanced Prompt Output checkbox and configure the Content property to use the input from the Multimodal Content Generator Snap. Click “Edit Prompt” and input your instructions 4. Add and Configure the LLM Snap: Add the Anthropic Claude on AWS Message API Snap as the LLM. Connect this Snap to the Prompt Generator Snap. In the settings, select a model that supports multimodal content. Enable the Use Message Payload checkbox and input the message payload in the Message Payload field. 5. Verify the Result: Review the output from the LLM Snap to ensure the multimodal content has been processed correctly. Validate that the generated response aligns with the expected content and format requirements. If adjustments are needed, revisit the settings in previous Snaps to refine the configuration. Multimodal Models for Advanced Data Extraction Multimodal models are redefining data extraction by advancing beyond traditional OCR capabilities. Unlike OCR, which primarily converts images to text, these models directly analyze and interpret content within PDFs and images, capturing complex contextual information such as layout, formatting, and semantic relationships that OCR alone cannot achieve. By understanding both textual and visual structures, multimodal AI can manage intricate documents, including tables, forms, and embedded graphics, without requiring separate OCR processes. This approach not only enhances accuracy but also optimizes workflows by reducing dependency on traditional OCR tools. In today’s data-rich environment, information is often presented in varied formats, making the ability to analyze and derive insights from diverse data sources essential. Imagine managing a collection of invoices saved as PDFs or photos from scanners and smartphones, where a streamlined approach is needed to interpret their contents. Multimodal large language models (LLMs) excel in these scenarios, enabling seamless extraction of information across file types. These models support tasks such as automatically identifying key details, generating comprehensive summaries, and analyzing trends within invoices whether from scanned documents or images. Here’s a step-by-step guide to implementing this functionality within SnapLogic. Sample invoice files (download the files at the bottom of this post if you have not already) Invoice1.pdf Invoice2.pdf Invoice3.jpeg (Sometimes, the invoice image might be tilted) Upload the invoice files Open Manager page and go to your project that will be used to store the pipelines and related files Click the + (plus) sign and select File The Upload File dialog pops up. Click “Choose Files” to select all the invoice files both PDF and image formats (download the sample invoice files at the bottom of this post if you have not already) Click Upload button and the uploaded files will be shown. Building the pipeline Add the JSON Generator Snap: Drag and drop the JSON Generator onto the designer canvas. Click on the Snap to open settings, then click the "Edit JSON" button Highlight all the text from the template and delete it. Paste all invoice filenames in the format below. The editor should look like this. Click "OK" in the lower-right corner to save the prompt Save the settings and close the Snap Add the File Reader Snap: Drag and drop the File Reader Snap onto the designer canvas Click the Snap to open the configuration panel. Connect the Snap to the JSON Generator Snap by following these steps: Select Views tab Click plus(+) button on the Input pane to add the input view(input0) Save the configuration The Snap on the canvas will have the input view. Connecting it to the JSON Generator Snap In the configuration panel, select the Settings tab Set the File field by enabling expression by clicking the equal sign in front of the text input and set it to $filename to read all the files we specified in the JSON Generator Snap Validate the pipeline to see the File Reader output. Fields that will be used in the Multimodal Content Generator Snap Content-type shows file content type Content-location shows the file path and it will be used in the document name Add the Multimodal Content Generator Snap: Drag and drop the Multimodal Content Generator Snap onto the designer canvas and connect to the File Reader Snap Click the Snap to open the settings panel and configure the following fields: Type: enable the expression set the value to $['content-location'].endsWith('.pdf') ? 'document' : 'image' Document name enable the expression set the value to $['content-location'].snakeCase() Use the snake-case version of the file path as the document name to identify each file and make it compatible with the Amazon Bedrock Converse API. In snake case, words are lowercase and separated by underscores(_). Aggregate input check the checkbox Use this option to combine all input files into a single document. The settings should now look like the following Validate the pipeline to see the Multimodal Content Generator Snap output. The preview output should look like the below image. The sl_type will be document for the pdf file and image for the image file and the name will be the simplified file path. Add the Prompt Generator Snap: Drag and drop the Prompt Generator Snap onto the designer canvas and connect to the Multimodal Content Generator Snap Click the Snap to open the settings panel and configure the following fields: Enable the Advanced Prompt Output checkbox Set the Content to $content to use the content input from the Multimodal Content Generator Snap Click “Edit Prompt” and input your instructions. For example, Based on the total quantity across all invoices, which product has the highest and lowest purchase quantities, and in which invoices are these details found? Add and Configure the LLM Snap: Add the Amazon Bedrock Converse API Snap as the LLM Connect this Snap to the Prompt Generator Snap Click the Snap to open the configuration panel Select the Account tab and select your account Select the Settings tab Select a model that supports multimodal content. Enable the Use Message Payload checkbox Set the Message Payload to $messages to use the message from the Prompt Generator Snap Verify the result: Validate the pipeline and open the preview of the Amazon Bedrock Converse API Snap. The result should look like the following: In this example, the LLM successfully processes invoices in both PDF and image formats, demonstrating its ability to handle diverse inputs in a single workflow. By extracting and analyzing data across these formats, the LLM provides accurate responses and insights, showcasing the efficiency and flexibility of multimodal processing. You can adjust the queries in the Prompt Generator Snap to explore different results.
BankTanapat
2 years ago Place SnapLogic Technical Blog
1.7KViews
4likes
0Comments
Advance Prompt Engineering
This guide will cover some advanced prompt engineering techniques and how to apply them in Snaplogic GenAI App Builder to help you tackle more complex tasks and enhance overall performance. You will learn how to use system prompts, structure responses in JSON, create complex prompts, manage tokens, and consider prompt and context size. First, let’s level set on what exactly prompt engineering is and why it’s important. What is Prompt Engineering? At its core, prompt engineering is about designing the input (the “prompt”) that you give to an AI model. The way you phrase your prompt can significantly impact the quality and relevance of the model’s output. It’s not just about what you ask the AI to do, but how you ask it. Why is Prompt Engineering Important? Even the most advanced AI models rely heavily on the prompts they receive. A well-crafted prompt can lead to insightful, accurate, and highly relevant responses, while a poorly structured prompt can result in vague, inaccurate, or irrelevant answers. Understanding the nuances of prompt engineering can help you maximize the effectiveness of your AI applications. Prerequisites Basic of Snaplogic OpenAI, Azure OpenAI, Amazon Bedrock Anthropic Claude, or Google Gemini account System prompt The System prompt is a special input that defines the LLM’'s behavior, tone, and boundaries before it engages with users. It establishes the context and sets the rules for interactions, ensuring that the assistant's responses align with the desired persona and goals. Imagine you’re a travel agent assistant. Your job is to provide clients with tailored and precise travel recommendations. To do this effectively, it’s essential to establish the LLM’s behavior through the System prompt: defining the assistant’s role, setting the appropriate tone and style, and including important instructions. Drag the "OpenAI Chat Completion" "Azure OpenAI Chat Completion" "Anthropic Claude on AWS Messages", or "Google Gemini Generate" onto the canvas Select "Account" tab and select your configured account Select “Settings” tab to configure these fields Select a model: click the chat bubble icon to view the list of available models and select your preferred one. Set “Prompt” “I’m organizing a 3-day honeymoon in Japan. We’re interested in cultural experiences, and nature.” Set “System Prompt”: expand the “Advanced prompt Configuration” and enter the System prompt “You are a travel agent assistant. Provide personalized travel recommendations based on user preferences. Maintain a friendly and conversational tone. If asked about travel plan, include accommodations and restaurants.” Save and close the snap settings panel Validate the pipeline and let’s see the result by clicking the preview data output To summarize the above example System prompt: You are a travel agent assistant. Provide personalized travel recommendations based on user preferences. Maintain a friendly and conversational tone. If asked about travel plan, include accommodations and restaurants. Question: I’m organizing a 3-day honeymoon in Japan. We’re interested in cultural experiences, and nature. Answer: That's so exciting! Japan is the perfect destination for a honeymoon filled with cultural experiences and beautiful scenery. For your 3-day trip, I recommend starting in Kyoto, known for its traditional temples and gardens. You can visit iconic sites like Fushimi Inari Shrine, Arashiyama Bamboo Grove, and Kinkaku-ji Temple. Be sure to catch a traditional tea ceremony or try on a kimono for a truly immersive experience. Next, head to Hakone for a taste of Japan's stunning natural beauty. You can explore the Hakone Open-Air Museum, take a relaxing dip in an onsen (hot spring), and enjoy a scenic cruise on Lake Ashi with views of Mt. Fuji. For accommodations, I suggest staying in a traditional ryokan for an authentic Japanese experience. Ryokans offer tatami-matted rooms, futon beds, and kaiseki dinners featuring seasonal dishes. As for dining, make sure to try local specialties like sushi, ramen, and tempura. And don't miss out on trying sake, Japan's famous rice wine! Let me know if you'd like more details or assistance with booking accommodations and restaurants. Have a wonderful honeymoon in Japan! 🎎🌸🗾 The response adheres to the system prompt by providing personalized recommendations in a friendly tone, including details on accommodations and dining options. JSON response Structuring response in JSON format is a crucial technique for enhancing clarity, maintaining consistency, and ensuring seamless integration with various systems, including front-end applications, APIs, and databases. For example, if you need to present information from the above travel agent assistant example in a predefined format on a website, generate a PDF, or connect with other services that facilitate booking hotels, and activities, it is essential to format the prompt response as JSON. This approach ensures compatibility and smooth interaction across different platforms and services. Let's try modifying the system prompt from the previous example to produce output in a specific JSON format. Click the Chat Completion snap to open settings. Update the system prompt to instruct the LLM to produce the JSON response: "You are a travel agent assistant. Provide a JSON response that includes destination, trip_duration, list of activities, list of hotels (with fields for name and description), and list of restaurants(with fields for name, location, and description)." Check the “JSON mode” checkbox. The snap will output a field named json_output that contains the parsed JSON object of response. Save and close the snap settings panel. Validate the pipeline and let’s see the result. The prompt answer is the JSON string and the parsed JSON object can be found in the “json_output” field since the JSON mode is enabled. The JSON response complies with the structure specified in the system prompt, ensuring that all necessary fields are included. The structured format supports seamless integration with downstream applications. For a travel agency, this capability allows for the efficient generation of personalized itineraries, which can be utilized to populate web pages, generate PDFs or Excel documents, send emails, or directly update travel booking systems, including querying flight availability and checking hotel options. Complex prompt Using a list of messages to incorporate conversation history helps maintain context in ongoing dialogues. This approach ensures responses are relevant and coherent, improving the overall flow of the conversation. Additionally, these messages can be provided as examples of user responses to guide the model in interacting effectively. By including previous interactions, it enhances continuity and user engagement, facilitating the model's ability to handle complex, multi-turn exchanges. This technique allows the model to generate more natural and accurate responses, especially when building on earlier details, resulting in a more seamless and intuitive conversation. Moreover, they can be used for example of response to let model know how should interact with user. Each message contain a role and content. The common roles are: System: Provides the initial context, setting the tone and behavior for the LLM. User: Represents the user’s input, guiding the conversation based on their queries or commands. Assistant/Model: Contains previous responses from the LLM or examples of desired behavior. This section will guide you through the process of constructing a message list and using it as input for the LLM. We'll create the following pipeline to make a travel agent assistant be able to answer questions by leveraging the context from previous conversations. In this example, user asks about Japan's attractions in April and later inquires about the weather without specifying a location or time. Let’s create the pipeline and see how it works. Drag the "JSON Generator" snap onto the canvas. Click on the "JSON Generator" to open it, then click on the "Edit JSON" button in the main Settings tab Highlight all the text from the template and delete it. Paste in this text. This prompt will be used as the user question. { "prompt": "Can you tell me what the weather’s going to be like?" } The "JSON Generator" should now look like this Click "OK" in the lower-right corner to save the prompt Save the settings and close the snap Drag the “OpenAI Prompt Generator” or “Azure OpenAI Prompt Generator” onto the canvas. Connect the Prompt Generator to the “JSON Generator” Click on the "Prompt Generator" to open settings. Change the label to “System Prompt” Click on the "Edit prompt" to open the prompt editor Highlight all the text from the template and delete it. Paste in this text. We will use it as the system prompt. You are a travel agent assistant. Provide personalized travel recommendations based on user preferences. The prompt editor should now look like this Click "OK" in the lower-right corner to save the prompt Select the “Advanced prompt output” checkbox. The “User role” field will be populated. Set the “User role” field to “SYSTEM” The final settings of the “System Prompt” should now look like this. Save the settings and close the snap Drag the second “Prompt Generator” onto the canvas and connect it to the prior snap. This snap will handle the previous user’s questions. Follow step 9 to 17 as a guide to configure the following fields Label: User Message 1 Prompt editor: I am planning a trip to Japan in April. Can you help me find some tourist attractions? User role: USER The final settings of the “User Message 1” should be like this. Drag the third “Prompt Generator” onto the canvas and connect it to the prior snap. This snap will handle the previous LLM’s answer. Follow step 9 to 17 as a guide to configure the following fields Label: Assistant Message Prompt editor: Sure! Some tourist attractions in Japan during your trip in April are: 1. Cherry Blossom Viewing 2. Fushimi Inari Shrine 3. Hiroshima Peace Memorial Park 4. Mount Fuji 5. Gion District Let me know if you need more information or assistance with planning your trip! User role: ASSISTANT The final settings of the “Assistant Message” should be like this. Drag the fourth “Prompt Generator” onto the canvas and connect it to the prior snap. This snap will handle the user question. Follow step 9 to 17 as a guide to configure the following fields: Label: User Message 2 Prompt editor: {{prompt}} User role: USER The final settings of the “User Message 2” should be like this. Drag the “Chat Completion” onto the canvas and connect it to “User Message 2”. Click on the "Chat Completion" to open settings. Select the account in the Account tab. Select the Settings tab. Select the model name. Check “Use message payload” checkbox. The prompt generator will create a list of messages in the "messages" field. Enabling "Use message payload" is necessary to use this list of messages as input. The “Message payload” field appears. Set the value to $messages. The settings of the Chat Completion should now look like this Save and close the setting panel Validate the pipeline and let’s see the result. Click on the output view of “User Message 2” to see the message payload, which we have constructed using the advanced mode of the prompt generator snap. Click on the output view of “Chat Completion” snap to see the LLM response. The result is: In April, the weather in Japan is generally mild and pleasant with cherry blossoms in full bloom. The temperatures are typically around 10-20°C (50-68°F) and there may be occasional rain showers. It's a great time to explore outdoor attractions and enjoy the beautiful spring scenery. Make sure to pack layers and an umbrella just in case! The model effectively delivered weather information for Japan in April, even the last user query did not specify location or time. This is possible because the model uses the entire conversation history to understand the context and flow of the dialogue. Furthermore, the model echoed the user’s question before responding, maintaining a consistent conversational style. To achieve the best results, make sure your message list is complete and well-organized, as this will help the LLM generate more relevant and coherent responses, enhancing the quality of the interaction. Tokens Tokens are units of text, including words, character sets, or combinations of words and punctuation, that language models use to process and generate language. They can range from single characters or punctuation marks to entire words or parts of words, depending on the model. For instance, the word "artificial" might be split into tokens like "art", "ifi", and "cial". The total number of tokens in a prompt affects the model's response capability. Each model has a maximum token limit, which includes both the input and output. For instance, GPT-3.5-Turbo has a limit of 4,096 tokens, while GPT-4 has limits of 8,192 tokens and 32,768 tokens for the 32k context version. Effective token management ensures responses remain within these limits, improving efficiency, reducing costs, and enhancing accuracy. To manage token usage effectively, the maximum tokens parameter is essential. It sets a limit on the number of tokens the model can generate, ensuring the combined total of input and output stays within the model’s capacity. Setting a maximum tokens parameter has several benefits: it prevents responses from becoming excessively long, reduces response times by generating more concise outputs, optimizes performance, and minimizes costs by controlling token usage. Additionally, it enhances user experience by providing clear, focused, and quicker responses. Use case Examples: Customer Support Chatbots: By setting maximum tokens, you ensure that the chatbot's responses are brief and focused, providing quick, relevant answers to user inquiries without overwhelming them with excessive detail. This enhances user experience and keeps interactions efficient. Content summarization: Helps generate concise summaries of long texts, suitable for applications with space constraints, such as mobile apps or notifications. Interactive Storytelling: Controls the length of narrative segments or dialogue options, maintaining engaging and well-paced storytelling. Product Descriptions: Generate brief and effective product descriptions for e-commerce platforms, maintaining relevance and fitting within space constraints. Let's walk through how to configure the maximum tokens in the SnapLogic Chat Completion snap using the prompt: “Describe the Photosynthesis in simple terms.”. We’ll see how the LLM behaves with and without the maximum token setting. Drag the “OpenAI Chat Completion” or “Azure OpenAI Chat Completion”, or “Google Gemini Generate” onto the canvas Select “Account” tab and select your configured account Select “Settings” tab Select your preferred model to use Set prompt to the message “Describe the Photosynthesis in simple terms.” The Chat Completion settings should now look like this Save the snap settings and validate the pipeline to see the result. In the result, the “usage” field provide us the token consumption detail. prompt_tokens: tokens used by the input completion_tokens: tokens used for generating response. total_tokens: the combined number of tokens used for both the input prompt and the generated response We can see that the response is quite long and the token used for response (completion_tokens) is 241. Let’s set the maximum token and see the result again Expand the “Model parameters” Set “Maximum tokens” to 100 Save the snap settings and validate the pipeline to see the result. The result is more concise compared to the output when the maximum tokens are not set. In this case, the number of completion_tokens used is only 84, indicating a shorter and more focused response. Using the maximum tokens effectively ensures that responses are concise and relevant, optimizing both performance and cost-efficiency. By setting this limit, you can prevent excessively long outputs, reduce response times, and maintain clarity in the generated content. To achieve optimal results, align the maximum tokens setting with your specific needs, such as the desired response length and application requirements. Regularly review and adjust this parameter to balance brevity with completeness, ensuring that the outputs remain useful and within operational constraints. Prompt size considerations In the previous section, we covered techniques for managing response size to stay within token limits. Now, we turn our focus to prompt size and context considerations. By ensuring that both prompts and context are appropriately sized, you can improve the accuracy and relevance of model responses while staying within token limits. Here are some techniques for managing prompt and context size: Keep prompt clear and concise By making prompts clear and direct, you reduce token usage, which helps keep the prompt within the model's limits. Focusing on essential information and removing unnecessary words enhances the accuracy and relevance of the model's responses. Additionally, specifying the desired output length further optimizes the interaction, preventing excessively long responses and improving overall efficiency. Example Prompt: “Could you please provide a detailed explanation of how the process of photosynthesis works in plants, including the roles of chlorophyll, sunlight, and water?” Better prompt: "Explain the process of photosynthesis in plants, including the roles of chlorophyll, sunlight, and water, in about 50 words." Splitting complex tasks into simpler prompts Breaking down complex tasks into smaller, more manageable subtasks not only reduces the size of each individual prompt but also enables the model to process each part more efficiently. This approach ensures that each prompt stays within token limits, resulting in clearer and more accurate responses. Example Complex Task: "Write a detailed report on the economic impact of climate change in developing countries, including statistical analysis, case studies, and policy recommendations." Simplified Prompts: "Summarize the economic impact of climate change in developing countries." "Provide a statistical analysis of how climate change affects agriculture in developing countries." "List case studies that demonstrate the economic consequences of climate change in developing countries." "Suggest policy recommendations for mitigating the economic impact of climate change in developing countries." Use a sliding window for chat history From the complex prompt section, we know that including the entire chat history helps maintain context, but it can also quickly use up available tokens. To optimize prompt size, employ a sliding window approach. This technique involves including only a portion of the chat history, focusing on recent and relevant exchanges, to keep the prompt within token limits. Summarize contexts Use a summarization technique to condense context into a brief summary. Instead of including extensive conversation history, create a concise summary that captures the essential information. This approach reduces token usage while retaining key details for generating accurate responses. By applying these techniques, you can effectively manage prompt and context size, ensuring that interactions remain efficient and relevant while optimizing token usage.
ChompooPanida
2 years ago Place SnapLogic Technical Blog
1.7KViews
7likes
0Comments
A Comprehensive Guide to Integrate Azure AI Search to Azure OpenAI
The Retrieval-Augmented Generation (RAG) pipeline has gained significant traction in recent years. Large Language Models (LLMs) leverage domain-specific knowledge through the RAG mechanism to generate insightful and contextually relevant responses. Implementing a RAG pipeline requires a well-architected infrastructure, including vector databases and a data ingestion pipeline to efficiently transfer information from source systems to the database. Azure offers Azure AI Search, a fully managed RAG solution that simplifies implementation and reduces operational complexity. As an enterprise-grade information retrieval system, Azure AI Search processes heterogeneous content, indexes data for optimized retrieval, and delivers relevant information through queries and applications, and it is the recommended retrieval system for developing RAG-based applications on Azure. It features native LLM integrations with Azure OpenAI Service and Azure Machine Learning, supports custom model integration mechanisms, and offers multiple relevance-tuning strategies to enhance search effectiveness. To further streamline RAG implementation, SnapLogic facilitates seamless integration between Azure AI Search and Azure OpenAI, enabling organizations to build intelligent and efficient AI-powered applications. The following guideline outlines the steps required to achieve this integration. Basic of Using Azure AI Search The following steps provide a structured approach to setting up and utilizing Azure AI Search for indexing and querying data. Step 1: Set Up Azure AI Search Subscribe to Azure AI Search via the Azure portal. Import Data: Click "Import Data" and select the data source you want to integrate. Create an Index: Define an index on the data source. Ensure the field you want Azure AI Search to process has the searchable attribute enabled. Configure the Indexer: Complete the setup wizard to create an indexer. Once the indexer is created, your Azure AI Search instance is ready to use. Step 2: Configure the Azure Data Source AI Snap To enable seamless integration between Azure AI Search and Azure OpenAI, follow these steps to configure the Azure Data Source AI Snap in Snaplogic Designer: Configure the Snap Account Set up the Azure Data Source AI Snap by configuring the associated account. Provide the AI Search Endpoint Enter the Azure AI Search endpoint to the snap settings. Specify the Index Name Define the index name that will be used for searching. Field Mapping (Optional) An array of values. A comma-separated string. Users can optionally provide a field mapping configuration to define relationships between different fields. Specify which field represents the title and which field contains the content. The content field can be either: The title field must be a string. Set the Query Type (Optional) The default query type is set to "simple". Users can modify this setting as needed or retain the default value. Connect to Azure Chat Completion Once configured, connect the Data Source AI Search Snap to Azure ChatCompletion to enable intelligent query responses using Azure OpenAI. Step 3: Configure Azure OpenAI Chat Completion To complete the integration and enable AI-powered responses, follow these steps to configure Azure OpenAI Chat Completion: Configure the Data Source Field Use the output of the Azure Data Source AI Search Snap as the input for Azure OpenAI Chat Completion. Provide the Prompt Define the prompt that you want to use for querying the AI model. The prompt should be designed to leverage the retrieved data effectively. Execute the Pipeline Run the pipeline to process the query. The ChatCompletion Snap will generate responses based on Azure AI Search results. The output will include a "citations" field, indicating the source of the retrieved information. Using Vector Queries in Azure AI Search Step 1: Set Up Azure AI Search Subscribe to Azure AI Search Access the Azure Portal and create an Azure AI Search service. Import and Vectorize Data Click "Import Data and Vectorize Data" and select the data source to be integrated. Embed Data into Vectors (if applicable) To enable vector search, data must be converted into vector embeddings using an embedding model. If your dataset already contains vectorized data, you can integrate it directly without re-vectorizing. Verify Index Creation After completing the index setup, the vector field will be visible in the index schema. Step 2: Configure the Azure Data Source AI Snap To enable seamless integration between Azure AI Search and Azure OpenAI, configure the Azure Data Source AI Snap by following these steps: Configure the Snap Account Set up the Azure Data Source AI Snap by configuring the associated SnapLogic account. Provide the Azure AI Search Endpoint Enter the Azure AI Search endpoint to establish the connection. Specify the Index Name Define the index name that will be used for vector-based searching. Configure Field Mapping (Optional) Users can define field mappings to specify relationships between different fields. Assign a title field and a content field: The content field can be: A list (array) of values. A comma-separated string. The title field must be a string. For vector-based queries, specify the vector field to inform Azure AI Search which field to use for vector comparisons. The vector field can be: A string. A list of strings. A comma-separated string. Set the Query Type Specify the query type as "vector" to enable vector-based searches. Connect to Azure Chat Completion Once configured, connect the Azure Data Source AI Search Snap to Azure ChatCompletion to enable AI-powered responses using Azure OpenAI. Step 3: Configure Azure OpenAI Chat Completion To complete the integration and enable AI-powered responses, follow these steps to configure Azure OpenAI Chat Completion: Configure the Data Source Field Use the output of the Azure Data Source AI Search Snap as the input for Azure OpenAI Chat Completion. Set Up the Embedding Model Dependency Ensure that the same embedding model used to vectorize the data is referenced in Azure OpenAI. This step is crucial for accurate vector similarity comparisons and retrieval performance. Provide the Prompt Define the prompt that will be used for querying the AI model. Ensure the prompt is structured to effectively leverage retrieved vector-based data for optimal AI responses. Using Semantic Queries in Azure AI Search Step 1: Set Up Azure AI Search Access Azure AI Search Navigate to the Azure AI Search service in the Azure Portal. Select the Index Choose the index you want to use for semantic search. Create a Semantic Configuration Define a new semantic configuration for the selected index. Configure Semantic Fields Specify the required fields: Title Field – Represents the document title. Content Field – Contains the main body of the document. Keywords Field – Includes key terms for enhanced semantic matching. Save the Configuration Once all fields are assigned, save the configuration. Your index is now ready for semantic search. Step 2: Configure the Azure Data Source AI Search Snap Change the Query Type Set the query type to "semantic" to enable semantic search capabilities. Specify the Semantic Configuration Enter the semantic configuration name created in Azure AI Search. Connect to Azure OpenAI Chat Completion Link the Azure Data Source AI Search Snap to Azure OpenAI ChatCompletion. This integration allows semantic search to enhance the accuracy and relevance of AI-generated responses. Customizing Search Results in Azure AI Search To further refine and enhance search accuracy and relevance, Azure AI Search allows users to customize their search queries with hybrid query types and filters. Hybrid Query Types Azure AI Search supports hybrid search, which allows combining different query types to improve search results: Hybrid of Vector and Simple Queries This combines vector-based similarity with traditional keyword-based search, ensuring both semantic relevance and text-based keyword matching. Hybrid of Vector and Semantic Queries This approach enhances vector similarity search with semantic ranking, enabling context-aware results with better relevance scoring. To enable hybrid search: Set the query type to either: "vector_simple_hybrid" "vector_semantic_hybrid" This ensures search results are a blend of the two selected query types. Applying Search Filters Filters help narrow down search results to match specific conditions or constraints. Steps to Apply Filters: Define a Filter Condition Use filters to restrict results based on specific criteria, such as date ranges, categories, or custom attributes. Please refer the filter syntax to Azure AI Search document (https://learn.microsoft.com/en-us/azure/search/search-filters) Ensure Index Fields are Filterable Filters only work if the index fields have the filterable attribute enabled. Before applying filters, verify that the selected index supports filtering. Integrate the Filter in Your Query Apply custom filters to refine search results based on your requirements. Conclusion Integrating Azure AI Search with Azure OpenAI unlocks powerful capabilities for retrieval-augmented generation (RAG), enabling organizations to build intelligent, AI-powered applications with enhanced search functionality. By leveraging vector, semantic, and hybrid search queries, businesses can optimize information retrieval and improve the relevance of AI-generated responses. This guide has outlined the key steps to: Set up Azure AI Search, including configuring vector and semantic search. Integrate Azure Data Source AI Search Snap, enabling seamless data retrieval. Configure Azure OpenAI Chat Completion, ensuring AI-generated responses are contextually aware and accurate. Customize search results using hybrid search queries and filtering mechanisms to refine and enhance query outcomes. By following these steps, organizations can maximize the effectiveness of Azure AI Search and OpenAI, improving search relevance, accuracy, and AI-driven insights for a wide range of applications. With scalability, flexibility, and advanced AI integration, this solution is ideal for businesses looking to deploy cutting-edge enterprise search and AI-driven automation.
BankTanapat
9 months ago Place SnapLogic Technical Blog
1.6KViews
1like
0Comments
LLM response logging for analytics
Why do we need LLM Observability? GenAI applications are great, they answer like how a human does. But how do you know if GPT isn’t being “too creative” to you when results from the LLM shows “Company finances are facing issues due to insufficient sun coverage”? As the scope of GenAI apps broaden, the vulnerability expands, and since LLM outputs are non-deterministic, a setup that once worked isn’t guaranteed to always work. Here’s an example of comparing the reasons why an LLM prompt fails vs why a RAG application fails. What could go wrong in the configuration? LLM prompts Suboptimal model parameters Temperature too high / tokens too small Uninformative System prompts RAG Indexing The data wasn’t chunked with the right size, information is sparse yet the window is small. Wrong distance was used. Used Euclidean distance instead of cosine Dimension was too small / too large Retrieval Top K too big, too much irrelevant context fetched Top K too small, not enough relevant context to generate result Filter misused And everything in LLM Prompts Although observability does not magically solve all problems, it gives us a good chance to figure out what might have gone wrong. LLM Observability provides methodologies to help developers better understand LLM applications, model performances, biases, and can help resolve issues before they reach the end users. What are common issues and how observability helps? Observability helps understanding in many ways, from performance bottlenecks to error detection, security and debugging. Here’s a list of common questions we might ask ourselves and how observability may come in handy. How long does it take to generate an answer? Monitor LLM response times and database query times helps identify potential bottlenecks of the application. Is the context retrieved from the Vector Database relevant? Logging database query and results retrieved helps identify better performing queries. Can assist on chunk size configuration based on retrieved results. How many tokens are used in a call? Monitor token usage can help determine the cost of each LLM call. How much better/worse is my new configuration setup doing? Parameter monitoring and response logging helps compare the performance of different models and model configurations. How is the GenAI application performing overall? Tracing stages of the application and evaluation helps identify the performance of the application What are users asking? Logging and analyzing user prompts help understand user needs and can help evaluate if optimizations can be introduced to reduce costs. Helps identify security vulnerabilities by monitoring malicious attempts and help proactively respond to mitigate threats. What should be tracked? GenAI applications involve components chained together. Depending on the use case, there are events and input/output parameters that we want to capture and analyze. A list of components to consider: Vector Database metadata Vector dimension: The vector dimension used to in the vector database Distance function: The way two vectors are compared in the vector database Vector Indexing parameters Chunk configuration: How a chunk is configured, including the size of the chunk, the unit of chunks, etc. This affects information density in a chunk. Vector Query parameters Query: The query used to retrieve context from the Vector Database Top K: The maximum number of vectors to retrieve from the Vector Database Prompt templates System prompt: The prompt to be used throughout the application Prompt Template: The template used to construct a prompt. Prompts work differently in different models and LLM providers LLM request metadata Prompt: The input sent to the LLM model from each end-user, combined with the template Model name: The LLM model used for generation, which affects the capability of the application Tokens: The number of tokens limit for a single request Temperature: The parameter for setting the creativity and randomness of the model Top P: The range of selection of words, the smaller the value the narrower the word selection is sampled from. LLM response metadata Tokens: The number of tokens used in input and output generation, affects costs Request details: May include information such as guardrails, id of the request, etc. Execution Metrics Execution time: Time taken to process individual requests Pipeline examples Logging a Chat completions pipeline We're using MongoDB to store model parameters and LLM responses as JSON documents for easy processing. Logging a RAG pipeline In this case, we're storing parameters to the RAG system (Agent Retrieve in this case) and the model. We're using JSON Generator Snaps to parameterize all input parameters to the RAG system and the LLM models. We then concat the response from the Vector Database, LLM model, and the parameters we provided for the requests.
tfan
2 years ago Place SnapLogic Technical Blog
1.3KViews
3likes
1Comment
What is Retrieval-Augmented Generation (RAG)?
What is Retrieval-Augmented Generation (RAG)? Retrieval-Augmented Generation (RAG) is the process of enhancing the reference data used by language models (LLMs) through integrating them with traditional information retrieval systems. This hybrid approach allows LLMs to access and utilize external knowledge bases, databases, and other authoritative sources of information, thereby improving the accuracy, relevance, and currency of the generated responses without requiring extensive retraining. Without RAG, LLMs generate responses based on the information they were trained on. With RAG, the response generation process is enriched by integrating external information into the generation. How does Retrieval-Augmented Generation work? Retrieval-Augmented Generation works through bringing multiple systems or services to generate the prompt to the LLM. This means there will be required setup to support the different systems and services to feed the appropriate data for a RAG workflow. This involves several key steps: 1. External Data Source Creation: External data refers to information outside the original training data of the LLM. This data can come from a variety of sources such as APIs, databases, document repositories, and web pages. The data is pre-processed and converted into numerical representations (embeddings) using embedding models, and then stored in a searchable vector database along with reference to the data that was used to generate the embedding. This forms a knowledge library that can be used to augment a prompt when calling into the LLM for generation of a response to a given input. 2. Retrieval of Relevant Information: When a user inputs a query, it is embedded into a vector representation and matched against the entries in the vector database. The vector database retrieves the most relevant documents or data based on semantic similarity. For example, a query about company leave policies would retrieve both the general leave policy document and the specific role leave policies. 3. Augmentation of LLM Prompt: The retrieved information is then integrated into the prompt to send to the LLM using prompt engineering techniques. This fully formed prompt is sent to the LLM, providing additional context and relevant data that enables the model to generate more accurate and contextually appropriate responses. 4. Generation of Response: The LLM processes the augmented prompt and generates a response that is coherent, contextually appropriate, and enriched with accurate, up-to-date information. The following diagram illustrates the flow of data when using RAG with LLMs. Why use Retrieval-Augmented Generation? RAG addresses several inherent challenges of using LLMs by leveraging external data sources: 1. Enhanced Accuracy and Relevance: By accessing up-to-date and authoritative information, RAG ensures that the generated responses are accurate, specific, and relevant to the user's query. This is particularly important for applications requiring precise and current information, such as specific company details, release dates and release items, new features available for a product, individual product details, etc.. 2. Cost-Effective Implementation: RAG enables organizations to enhance the performance of LLMs without the need for expensive and time-consuming fine-tuning or custom model training. By incorporating external knowledge libraries, RAG provides a more efficient way to update and expand the model's basis of knowledge. 3. Improved User Trust: With RAG, responses can include citations or references to the original sources of information, increasing transparency and trust. Users can verify the source of the information, which enhances the credibility and trust of an AI system. 4. Greater Developer Control: Developers can easily update and manage the external knowledge sources used by the LLM, allowing for flexible adaptation to changing requirements or specific domain needs. This control includes the ability to restrict sensitive information retrieval and ensure the correctness of generated responses. Doing this in conjunction with an evaluation framework (link to evaluation pipeline article) can help to roll out newer content more rapidly to downstream consumers. Snaplogic GenAI App Builder: Building RAG with Ease Snaplogic GenAI App Builder empowers business users to create large language model (LLM) powered solutions without requiring any coding skills. This tool provides the fastest path to developing generative enterprise applications by leveraging services from industry leaders such as OpenAI, Azure OpenAI, Amazon Bedrock, Anthropic Claude on AWS, and Google Gemini. Users can effortlessly create LLM applications and workflows using this robust platform. With Snaplogic GenAI App Builder, you can construct both an indexing pipeline and a Retrieval-Augmented Generation (RAG) pipeline with minimal effort. Indexing Pipeline This pipeline is designed to store the contents of a PDF file into a knowledge library, making the content readily accessible for future use. Snaps used: File Reader, PDF Parser, Chunker, Amazon Titan Embedder, Mapper, OpenSearch Upsert. After running this pipeline, we would be able to view these vectors in OpenSearch. RAG Pipeline This pipeline enables the creation of a chatbot capable of answering questions based on the information stored in the knowledge library. Snap used: HTTP Router, Amazon Titan Embedder, Mapper, OpenSearch Query, Amazon Bedrock Prompt Generator, Anthropic Claude on AWS Messages. To implement these pipelines, the solution utilizes the Amazon Bedrock Snap Pack and the OpenSearch Snap Pack. However, users have the flexibility to employ other LLM and vector database Snaps to achieve similar functionality.
Shumin
2 years ago Place SnapLogic Technical Blog
1.3KViews
4likes
0Comments