SnapLogic - Integration Nation

RogerSramkoski · ‎02-01-2024

👋 Welcome!

Hello everyone, and welcome to our technical guide to getting started with GenAI App Builder on SnapLogic! At the time of publishing, GenAI App Builder is available for testing and will be generally available in our February release. For existing customers and partners, you can request access for testing GenAI App Builder by speaking to your Customer Success Manager or other member of your account team. If you're not yet a customer, you can speak to your Sales team about testing GenAI App Builder.

🤔 What is GenAI App Builder?

Before we begin, let's take a moment to understand what GenAI App Builder is and at least a high-level talk about the components. GenAI App Builder is the latest offering in SnapLogic AI portfolio, focused on helping modern enterprises create applications with Generative AI faster, using a low-/no-code interface. That feels like a mouthful of buzzwords, so let me paint a picture (skip this if you're familiar with GenAI or watch our video, "Enabling employee and customer self-service").

Imagine yourself as a member of an HR team responsible for recruiting year round. Every new employee has an enrollment period after or just before their start date, and every existing employee has open enrollment once per year. During this time, employees need to choose between different medical insurance offerings, which usually involves a comparison of deductibles, networks, max-out-of-pocket, and other related features and limits. As you're thinking about all of this material, sorting out how to explain it all to your employees, you're interrupted by your Slack or Teams DM noise. Bing bong!

Questions start flooding in:

Hi, I'm a new employee and I'm wondering, when do I get paid? What happens payday is on a weekend or holiday? Speaking of holidays, what are company-recognized holidays this year?
Hi, my financial account said I should change my insurance plan to one with an HSA. Can you help me figure out which plan(s) include an HSA and confirm the maximum contribution limits for a family this year?
Hi, how does vacation accrual work? When does vacation rollover? Is unpaid vacation paid out or lost?

All these questions and many others are answered in documents the HR team manages, including the employee handbook, insurance comparison charts, disability insurance sheets, life insurance sheets, other data sheets, etc. What if, instead of you having to answer all these questions, you would leverage a human-sounding large language model (LLM) to field these questions for you by making sure they referenced only the source documents you provide, so you don't have to worry about hallucinations?!

Enter GenAI Builder!

🏗 Building an HR Q&A example

Once you have access to test GenAI App Builder, you can use the following steps to start building out an HR Q&A example that will answer questions using only the employee handbook or whichever document that you provide. In this guide we will cover the two pipelines used, one that loads data and one that we will use to answer questions. We will not get into Snap customization or Snap details with this guide - it is just meant to show a quick use case.

We do assume that you are familiar enough with SnapLogic to create a new Pipeline or import and existing one, search for Snaps, connect Snaps, and a few other simple steps. We will walk you through anything that is new to SnapLogic or that needs some additional context. We also assume you have some familiarity with Generative AI in this guide.

We will also make a video with similar content in the near future, so I'll update or reply to this post once that content is available.

Prerequisites

In order to complete this guide, you will need the items below regardless of whether or not you use the Community-supported chatbot UI from SnapLogic.

Access to a Pinecone instance (sign up for a free account at https://www.pinecone.io) with an existing index
Access to Azure OpenAI or OpenAI
You need a file to load, such as your company's employee handbook

Loading data

Our first step is to load data into the vector database using a Pipeline similar to the one below, which we will call the "Indexer" Pipeline since it helps populate the Pinecone Index. If you cannot find the patterns in the Pattern Library, you can find it attached below as "Indexer_Feb2024.slp". The steps below assume you have already imported the Pipeline or are building it as we go through.

To add more color here, loading data into the vector database is only something that needs to be done when the files are updated. In the HR scenario, this might be once a year for open enrollment documents and maybe a few times a year for the employee handbook. We will explore some other use cases in the future where document updates would be much frequent.

Click on the "File Reader" Snap to open its settings
Click on the icon at the far right of the "File" field as shown in the screenshot below
Click the "Upload" button in the upper-right corner of the window that pops up
Select the PDF file from your local system that you want to index (we are using an employee handbook and you're welcome to do the same) to upload it, then make sure it is selected
Save and close the "File Reader" Snap once your file is selected
Leave the "PDF Parser" Snap with default settings
Click on the "Chunker" Snap to open it, then mirror the settings in the screenshot below.
Now open the "Azure OpenAI Embedder" or "OpenAI Embedder" Snap (you may need to replace the embedder that came in the Pattern or import with the appropriate one you have an account with).
Go to the "Account" tab and create a new account for the embedder you're using. You need to replace the variables {YOUR_ACCOUNT_LABEL} with a label for the account that makes sense for you, then replace {YOUR_ENDPOINT} with the appropriate snippet from your Azure OpenAI endpoint. Validate the account if you can to make sure it works.
After you save your new account you can go back to the main "Settings" tab on the Snap. If the account setup was successful, you should now be able to click the chat bubble icon at the far right of the "Deployment ID" field to suggest a "Deployment ID" - in our environment shown in the screenshot below, you can see we have one named "Jump-emb-ada-002" which I can now select.
Finally, make sure the "Text to embed" field is set as shown below, then save and close this Snap.
Now open the "Mapper" Snap so we can map the output of the embedder Snap to the "Pinecone Upsert" Snap as shown in the screenshot below.

If it is difficult to see the mappings in the screenshot above, here is a zoomed in version:

For a little more context here, we're mapping the $embedding object coming out of the embedder Snap to the $values object in Pinecone, which is required. If that was all you mapped though, your Q&A example would always reply with something like "I don't know" because there is no data. To do that, we need to make use of the very flexible "metadata" object in Pinecone by mapping $original.chunk to $metadata.chunk. We also statically set $metadata.source to "Employee Handbook.pdf" which allows the retriever Pipeline to return the source file used in answering a question (in a real-world scenario, you would probably determine the source dynamically/programmatically such as using the filename so this pipeline could load other files too).
Save and close the "Mapper" Snap
Finally, open the "Pinecone Upsert" Snap then click the "Account" tab and create a new account with your Pinecone API Key and validate it to make sure it works before saving
Back on the main "Settings" tab of the "Pinecone Upsert" Snap, you can now click on the chat bubble icon to suggest existing indexes in Pinecone. For example, in our screenshot below you can see we have four which have been obscured and one named "se-demo." Indexes cannot be created on the fly, so you will have to make sure the index is created in the Pinecone web interface.
The last setting we'll talk about for the Indexer pipeline is the "Namespace" field in the "Pinecone Upsert" Snap. Setting a namespace is optional. Namespaces in Pinecone create a logical separation between vectors within an index and can be created on-the-fly during Pipeline execution.

For example, you could create an index like "2024_enrollment" for all documents published in 2024 for open enrollment and another called "2024_employeehandbook" to separate those documents into separate namespaces. Although these can be used just for internal purposes of organization, you can also direct a chatbot to only use one namespace to answer questions. We'll talk about this more in the "Answering Questions" section below which covers the Retriever Pipeline.
Save and close the "Pinecone Upsert" Snap
You should now be able to validate the entire Pipeline to see what the data looks like as it flows through the Snaps, and when you're ready to commit the data to Pinecone, you can Execute the Pipeline.

Answering Questions

To answer questions using the data we just loaded into Pinecone, we're going to recreate or import the Retriever Pipeline (attached as "Retriever_Feb2024.slp"). If you import the Pipeline you may need to add additional "Mapper" Snaps as shown below. We will walk through that in the steps below, just know this is what we'll end up with at the end of our first article.

The screenshot above shows what the pattern will look like when you import it. Since this first part of the series will only take us up to the point of testing in SnapLogic, our first few steps will involve some changes with that in mind.

Right-click on the "HTTP Router" Snap, click "Disable Snap"
Click the circle between "HTTP Router" and embedder Snap to disconnect them
Drag the "HTTP Router" Snap somewhere out of the way on the canvas (you can also delete it if you're comfortable replacing it later); your Pipeline should now look like this:
In the asset palette on the left, search for the "JSON Generator" (it should appear before you finish typing that all out):
Drag a "JSON Generator" onto the canvas, connecting it to the "Azure OpenAI Embedder" or "OpenAI Embedder" Snap
Click on the "JSON Generator" to open it, then click on the "Edit JSON" button in the main Settings tab
Highlight all the text from the template and delete it so we have a clean slate to work with
Paste in this text, replacing "Your question here." with an actual question you want to ask that can be answered from the document you loaded with your Indexer Pipeline. For example, I loaded an employee handbook and I will ask the question, "When do I get paid?"
```
[
    {
        "prompt" : "Your question here."
    }
]
```
Your "JSON Generator" should now look something like this but with your question:
Click "OK" in the lower-right corner to save the prompt
Click no the "Azure OpenAI Embedder" or "OpenAI Embedder" Snap to view its settings
Click on the Account tab, then use the drop-down box to select the account you created in the section above ("Loading Data", steps 8-9)
Click on the chat bubble icon to suggest "Deployment IDs" and choose the same one you chose in "Loading Data", step 10
Set the "Text to embed" field to $prompt as shown in the screenshot below:
Save and close the "Azure OpenAI Embedder" or "OpenAI Embedder" Snap
Click on the Mapper immediately after the embedder Snap
Create a mapping for $embedding that maps to $vector
Check the "Pass through" box; this Mapper Snap should now look like this:
Save and close this "Mapper"
Open the "Pinecone Query" Snap
Click the Account tab, then use the drop-down to select the Pinecone account you created in "Loading Data", step 14
Use the chat bubble on the right side of the "Index name" field to select your existing Index
[OPTIONAL] Use the chat bubble on the right side of the "Namespace" field to select your existing Namespace, if you created one; the "Pinecone Query" Snap should now look like this:
Save and close the "Pinecone Query" Snap.
Click on the "Mapper" Snap after the "Pinecone Query" Snap.
In this "Mapper" we need to map the three items listed below, which are also shown in the following screenshot. If you're not familiar with the $original JSON key, it occurs when an upstream Snap has implicit pass through, or like the "Mapper" in step 17, we explicitly enable pass through, allowing us to access the original JSON document that went into the upstream Snap. (NOTE: If you're validating your pipeline along the way or making use of our Dynamic Validation, you may notice that no Target Schema shows up in this Mapper until after you complete steps 27-30.)
Map $original.original.prompt to $prompt
Map jsonPath($, "$matches[*].metadata.chunk") to jsonPath($, "$context[*].data")
Map jsonPath($, "$matches[*].metadata.source") to jsonPath($, "$context[*].source")
Save and close that "Mapper".
Click on the "Azure OpenAI Prompt Generator" or "OpenAI Prompt Generator" so we can set our prompt.
Click on the "Edit prompt" button and make sure your default prompt looks like the screenshot below. On lines 4-6 you can see we are using mustache templating like {{#context}} {{source}} {{/context}} which is the same as the jsonPath($, "$context[*].source") from the "Mapper" in step 25 above. We'll talk about this more in future articles - for now, just know this will be a way for you customize the prompt and data included in the future.
Click "OK" in the lower-right corner
Save and close the prompt generator Snap
Click on the "Azure OpenAI Chat Completions" or "OpenAI Chat Completions" Snap
Click the "Account" tab then use the drop-down box to select the account you created earlier
Click the chat bubble icon to the far right of the "Deployment ID" field to suggest a deployment; this ID may be different than the one you've chosen in previous "Azure OpenAI" or "OpenAI" Snaps since we're selecting an LLM this team instead of an embedding model
Set the "Prompt" field to $prompt; your Snap should look something like this:
Save and close the chat completions Snap

Testing our example

Now it's time to validate our pipeline and take a look at the output! Once validated the Pipeline should look something like this:

If you click the preview data output on the last Snap, the chat completions Snap, you should see output that looks like this:

The answer to our prompt is under $choices[0].message.content. For the test above, I asked the question "When do I get paid?" against an employee handbook and the answer was this:

Employees are paid on a semi-monthly basis (24 pay periods per year), with payday on the 15th and the last day of the month. If a regular payday falls on a Company-recognized holiday or on a weekend, paychecks will be distributed the preceding business day.
The related context is retrieved from the following sources: [Employee Handbook.pdf]

Wrapping up

Stay tuned for further articles in the "GenAI App Builder Getting Started Series" for more use cases, closer looks at individual Snaps and their settings, and even how to connect a chat interface! Most if not all of these articles will also have an associated video if you learn better that way!

If you have issues with the setup, find a missing step or detail, please reply to this thread to let us know!

RogerSramkoski · ‎05-10-2024

Hi everyone, GenAI Builder is now generally available in our Elastic environment. New features and functionality will continue to be available in SnapLabs first for public preview before making their way into Elastic, our production environment. Please reach out if you have any questions!