Hi Sharan,

If your goal is simply to return a specific PDF page to the user, one option would be to physically split the document and store individual pages. However, this introduces additional complexity around document lifecycle management (updates, re-ingestion, monitoring, etc.) and is generally not necessary for your use case.

Based on your description, the real requirement is to return the page of an existing document that corresponds to a retrieved vector, not to ingest each page as a standalone document. For that scenario, your current approach is already sufficient: by storing the sourceURL and pageNumber as metadata during ingestion, you already have everything needed to present the correct page back to the user.

Since you surface RAG responses via a Microsoft Teams app, a practical frontend solution is to render the PDF with a viewer such as PDF.js, passing the document URL and page number dynamically. For example:
<iframe
  src="/pdfjs/web/viewer.html?file=/docs/manual.pdf#page=5"
  width="100%"
  height="100%"
  style="border:none">
</iframe>

Both the document URL and the page number can be parameterized directly from your vector metadata. This lets the user open the PDF at the exact page referenced by the answer while still being able to scroll to adjacent pages for additional context, which reduces the need for follow-up queries.

In summary: your RAG ingestion setup is already sound, and there's no need to ingest individual PDF pages as separate entities. The remaining work is primarily on the frontend: using the existing metadata to present the correct page to the user in a seamless and verifiable way.

Hope this helps.
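To make the "parameterize it from your metadata" step concrete, here is a minimal sketch of building the viewer URL from the metadata returned with a retrieved vector. It assumes a self-hosted PDF.js viewer at /pdfjs/web/viewer.html (as in the iframe example) and that pageNumber is 1-based; the names ChunkMetadata, VIEWER_PATH, buildViewerUrl and buildViewerIframe are hypothetical, not part of PDF.js or Teams. The document URL is percent-encoded so a sourceURL containing "?" or "&" does not get mixed up with the viewer's own query string. Also keep in mind that the stock PDF.js viewer may refuse to open a file served from a different origin than the viewer itself, so documents stored elsewhere might need to be proxied or served from the same origin.

```typescript
// Hypothetical shape of the metadata stored with each vector during ingestion.
// Field names follow the post (sourceURL, pageNumber); adjust to your schema.
interface ChunkMetadata {
  sourceURL: string;
  pageNumber: number; // assumed to be 1-based, as PDF.js page numbers are
}

// Assumed path of a self-hosted PDF.js viewer (same as the iframe example).
const VIEWER_PATH = "/pdfjs/web/viewer.html";

// Builds a viewer URL that opens the document at the page stored in the metadata.
function buildViewerUrl(meta: ChunkMetadata, viewerPath = VIEWER_PATH): string {
  // Fall back to page 1 if the stored page number is missing or invalid.
  const page =
    Number.isInteger(meta.pageNumber) && meta.pageNumber > 0 ? meta.pageNumber : 1;
  // Percent-encode the document URL so characters such as "?" or "&" in it
  // are not mistaken for parameters of the viewer URL itself.
  const file = encodeURIComponent(meta.sourceURL);
  return `${viewerPath}?file=${file}#page=${page}`;
}

// Renders the same iframe markup as the example, with src taken from metadata.
// encodeURIComponent escapes '"', so the src value stays inside the attribute.
function buildViewerIframe(meta: ChunkMetadata): string {
  return `<iframe src="${buildViewerUrl(meta)}" width="100%" height="100%" style="border:none"></iframe>`;
}

// Example: metadata returned alongside a retrieved vector.
const retrieved: ChunkMetadata = { sourceURL: "/docs/manual.pdf", pageNumber: 5 };
console.log(buildViewerUrl(retrieved));
// prints "/pdfjs/web/viewer.html?file=%2Fdocs%2Fmanual.pdf#page=5"
console.log(buildViewerIframe(retrieved));
```

If your ingestion pipeline stores 0-based page indices instead, add 1 before building the URL. In a React-based Teams tab, for instance, you would more likely pass the result of buildViewerUrl to an iframe's src prop than build the markup string by hand.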
Hi Integration Nation,

If you are interested in Agent Creator and how you can build a scalable AI Knowledge Assistant, please check out this blog post on LinkedIn. In it, I've shared the challenges we faced while working with a major German enterprise customer to build an AI Knowledge Assistant. You will learn why agents with many tools become harder to control, and how we were able to improve quality, performance, and costs by switching the information source from relational data to a Knowledge Graph. If you have a similar use case or need some guidance, I'm happy to exchange ideas :)
