I am in the middle of a PoC to upgrade our v1 RAG chatbot from the Chat Completions API to the Agents SDK with access to tools (vector search, etc.). I am wondering if there are tools to port over the existing LangChain implementation, which also generates citations. Citations are a huge part of our chatbot and I do not see an easy migration path. Further, how do I keep my existing LangChain implementation within the Agents SDK? Our v1 is live in prod, so v2 has to have the same functionality at a minimum. I'd appreciate any pointers from folks who are in the middle of an upgrade to the newer Agents SDK for RAG chatbots.
Hi there, thanks for sharing this. I’m not using LangChain or the OpenAI Agents SDK in my RAG system. Instead, I built a fully custom stack. I describe the approach here:
Building first RAG system - #3 by lucmachine
I chose not to use LangChain or the SDK because:
- I wanted full control over chunking, embedding, and metadata handling
- My documents are complex (GHG protocols, legal texts), so I built regex-aware semantic chunkers
- LangChain and the SDK add layers of abstraction that didn’t match my transparency and citation needs
I think a solid chunking strategy is needed to get a better chatbot. Some PDFs are really bad source documents. I'm also playing around with the idea of adding tags to my chunks by having the LLM propose them.
- Tags = “what is this about?”
- Citation = “where did this come from?”
That said, I don’t have direct experience with the SDK upgrade path you’re on, but I do have strong prompting experience and wanted to offer a concept that might help.
Start of LLM Transmission: →
Suggestion: Bridge LangChain-style Citations into Agents SDK
If citations are central to your RAG chatbot, here’s a potential solution path:
1. Extract and Store Rich Metadata Per Chunk
If not already, include the following metadata in your vector database (e.g., Pinecone, pgvector): doc_title, section_heading, chunk_index, source_url, and so on.
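A rough sketch (Python) of what one chunk record with that metadata could look like before upserting; every field value here is made up for illustration, and the Pinecone call is only indicative:

```python
embedding = [0.0] * 1536  # placeholder - use the real embedding for this chunk

chunk_record = {
    "id": "ghg-protocol-0042",   # stable chunk id
    "values": embedding,          # the vector itself
    "metadata": {
        "doc_title": "GHG_Protocol_Corporate_Standard.pdf",
        "section_heading": "3.2 Setting Operational Boundaries",
        "chunk_index": 42,
        "page": 12,
        "source_url": "https://example.com/ghg-protocol.pdf",
    },
}

# With Pinecone's client this would be roughly index.upsert(vectors=[chunk_record]);
# with pgvector the metadata would typically live in a JSONB column instead.
```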
2. Add Metadata Into Retrieved Context
When your retrieval tool (whether from LangChain or SDK Tools API) returns relevant chunks, inject them into the context with metadata visible. For example:
[Source: NAME_OF_SOURCE_DOCUMENT.pdf | Section 3.2 | Page 12]
Carbon credits must be tracked independently to avoid double counting...
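Here's a minimal helper that builds that block, assuming each retrieved chunk is a dict with "text" and "metadata" keys; adapt the keys to whatever your retriever actually returns:

```python
def format_chunks_for_context(chunks: list[dict]) -> str:
    """Prepend a visible citation header to each retrieved chunk."""
    blocks = []
    for c in chunks:
        m = c["metadata"]
        header = f"[Source: {m['doc_title']} | {m['section_heading']} | Page {m['page']}]"
        blocks.append(f"{header}\n{c['text']}")
    return "\n\n".join(blocks)  # one block per chunk, separated by blank lines
```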
3. Prompt the Model Explicitly to Preserve Citations
Add prompt instructions like:
“For each factual answer, include the citation of the document and section. Use the format:
[Source: {{doc_title}}, Section: {{section_heading}}, Page: {{page}}]”
This primes the LLM to include citations in the final response.
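For example, on a plain Chat Completions call (current openai Python client assumed; the model name is only an example), that instruction and the formatted context from the sketch above could be combined like this:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM = (
    "Answer only from the provided context. For each factual answer, include the "
    "citation of the document and section, using the format: "
    "[Source: {doc_title}, Section: {section_heading}, Page: {page}]"
)

def answer(question: str, context_block: str, model: str = "gpt-4o-mini") -> str:
    # context_block could be the output of format_chunks_for_context() above
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"Context:\n{context_block}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```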
4. Middleware Post-Validation (Optional)
Optionally, check citations at generation time by matching the quoted passage against the original chunk using fuzzy matching or hashing.
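A very rough version of that check, using difflib from the standard library (the threshold is arbitrary and should be tuned on your own data; something like rapidfuzz would be faster and smarter):

```python
from difflib import SequenceMatcher

def quote_is_grounded(quote: str, chunk_text: str, threshold: float = 0.85) -> bool:
    """Does the cited quote appear (nearly) verbatim somewhere in the source chunk?

    Uses the longest common block between quote and chunk as a cheap proxy.
    """
    if not quote:
        return False
    sm = SequenceMatcher(None, quote.lower(), chunk_text.lower())
    match = sm.find_longest_match(0, len(quote), 0, len(chunk_text))
    return match.size / len(quote) >= threshold
```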
5. Use the LLM to Create Semantic Tags for Chunks
Tags help categorize chunks by topic, theme, or function (e.g. ["scope 1", "baseline", "reporting year", "offsets"]) and can be used to improve filtering, clustering, or search refinement.
A. During Ingestion:
For each chunk, before or after creating the embedding:
- Send the chunk to the LLM with a prompt like:
“Given this text, return 3 to 5 concise tags that describe its main concepts or topics.”
Include something like this (a sketch of the ingestion-time call itself follows below):

```json
{
  "chunk_text": "Emissions from owned vehicles should be reported as Scope 1. Companies must include all mobile combustion sources in the base year inventory...",
  "response": ["scope 1", "mobile combustion", "base year"]
}
```
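Here's roughly how that call could look per chunk (current openai Python client assumed; model name and prompt wording are only examples, and the JSON parsing is deliberately defensive):

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

TAG_PROMPT = (
    "Given this text, return 3 to 5 concise tags that describe its main "
    "concepts or topics. Respond with a JSON array of strings only."
)

def tag_chunk(chunk_text: str, model: str = "gpt-4o-mini") -> list[str]:
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": TAG_PROMPT},
            {"role": "user", "content": chunk_text},
        ],
    )
    try:
        tags = json.loads(resp.choices[0].message.content)
    except (TypeError, json.JSONDecodeError):
        tags = []  # fall back to no tags rather than failing the ingestion run
    return [t.strip().lower() for t in tags if isinstance(t, str)]
```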
B. Store Those Tags as Metadata
Tags would go into a tags column in your vector DB (e.g., pgvector or Pinecone), as an array or JSON field.
C. Use Tags for Filtering or Grouping
When doing retrieval or displaying answers, you could (see the sketch after this list):
- Filter by tag (e.g. “only show Scope 1 content”)
- Group search results by tag category
- Show tags to the user as context (“This answer came from content tagged: Scope 1, baseline…”)
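A pgvector-flavoured sketch of tag-filtered retrieval; the table and column names (chunks, tags text[], embedding vector) are assumptions, not your real schema, and with Pinecone the equivalent would be a metadata filter on the query:

```python
import psycopg  # psycopg 3

SQL = """
SELECT id, text, metadata
FROM chunks
WHERE tags && %(tags)s                          -- chunk carries at least one requested tag
ORDER BY embedding <=> %(query_vec)s::vector    -- pgvector cosine-distance operator
LIMIT 5;
"""

def search_with_tags(conn: psycopg.Connection, query_vec: list[float], tags: list[str]):
    """Return the 5 nearest chunks that share at least one of the given tags."""
    with conn.cursor() as cur:
        cur.execute(SQL, {"tags": tags, "query_vec": str(query_vec)})
        return cur.fetchall()
```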
Tools to Use
| Tool | Role |
|---|---|
| LangChain | Can wrap the LLM call to generate tags using LLMChain or PromptTemplate |
| OpenAI SDK | Can embed a tool or step in your ingestion pipeline to call GPT and label each chunk |
| Your own ingestion script | Can just make a single call to openai.ChatCompletion.create() and insert the result into the chunk metadata |
End of LLM transmission
I hope this helps!
Luc
@lucmachine - Thank you for sharing the details. We have all the functionality in place; today we generate citations using LangChain's runnables rather than prompt engineering - the former is slightly more deterministic, as we feed each doc with the metadata that matters to us. My question is really how I "upgrade" this to the newer Agents SDK framework. Very primitively, I will have to convert everything to a tool (aka function call) and run evals to perform regression testing. Perhaps what I am looking for in the long run is OOB "tools" for generating citations, memory/state management, and streaming. From what I gather, the Responses API does all of this natively, which is a huge change over the existing Chat Completions API. For reference, here are my next steps in this migration PoC:
- Create the right tools - vector search, citation generation, history - by wrapping the existing LangChain functions
- Orchestrate the knowledge bot using the Agents SDK and those tools (rough sketch below)
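Roughly what I have in mind for the wrapping step (untested sketch; retrieve_with_citations stands in for our existing LangChain runnable, and the metadata keys are just examples):

```python
from agents import Agent, Runner, function_tool  # pip install openai-agents

def retrieve_with_citations(query: str) -> list[dict]:
    """Placeholder for the existing LangChain retrieval; returns chunks plus metadata."""
    ...

@function_tool
def vector_search(query: str) -> str:
    """Search the knowledge base and return chunks with citation metadata attached."""
    docs = retrieve_with_citations(query)
    return "\n\n".join(
        f"[Source: {d['doc_title']} | {d['section_heading']} | Page {d['page']}]\n{d['text']}"
        for d in docs
    )

rag_agent = Agent(
    name="Knowledge bot",
    instructions=(
        "Answer using the vector_search tool. For each factual claim, cite the source "
        "in the format [Source: doc_title, Section: section_heading, Page: page]."
    ),
    tools=[vector_search],
)

# result = Runner.run_sync(rag_agent, "How should Scope 1 emissions be reported?")
# print(result.final_output)
```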