Implementing RAG via Custom Functions in OpenAI Assistants

Hi everyone,

I’m exploring the possibility of implementing Retrieval-Augmented Generation (RAG) directly using the “custom functions” feature in OpenAI Assistants.

Specifically, given a vector storage database (such as Pinecone or a similar service), I’m wondering if it’s feasible to integrate the RAG context search functionality directly within the Assistant through custom functions.

For example, if I provide the database URL, API key, index name, and top_k, I would expect the Assistant to retrieve the relevant context from the database before generating a response.
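To make that concrete, here is a rough sketch of the kind of function declaration I have in mind when creating the Assistant (assuming the v1 OpenAI Python SDK; `search_vector_db` and its parameters are just placeholders, not an existing API):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Placeholder schema for the retrieval function the Assistant would call
    assistant = client.beta.assistants.create(
        name="RAG Assistant",
        model="gpt-4-turbo",
        instructions="Call search_vector_db before answering questions about the knowledge base.",
        tools=[{
            "type": "function",
            "function": {
                "name": "search_vector_db",
                "description": "Return the top_k most relevant text chunks for a query.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "The user query to search for."},
                        "top_k": {"type": "integer", "description": "How many chunks to return."},
                    },
                    "required": ["query"],
                },
            },
        }],
    )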

On top of other benefits, this approach could potentially improve latency by reducing the overhead of external API calls.

Has anyone tried this, or do you think this approach is feasible?

Thank you in advance for any suggestions!

Hi @abc01,

Yes, totally doable.

Depending on the complexity of the data, a good starting point would be a combination of PostgreSQL for the relational database, Weaviate for vector storage and vector search, and Directus to manage both and expose a CRUD API (along with custom webhooks for extra features), all behind a Traefik proxy.

What would be the app you’re building?


Thank you @sergeliatko,

Assuming I make use of Pinecone, wouldn't it 'simply' be a matter of writing a custom function that essentially combines the following three?

  1. def query_pinecone(query_text, top_k=5):
         # Convert the query to an embedding
         query_embedding = get_embedding(query_text)

         # Query Pinecone; include_metadata=True returns the stored text with each match
         result = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
         return result

  2. def get_relevant_context(query):
         result = query_pinecone(query)
         relevant_text = [match["metadata"]["text"] for match in result["matches"]]
         return " ".join(relevant_text)

  3. def generate_answer_with_context(query):
         context = get_relevant_context(query)
         prompt = f"Context: {context}\n\nUser Query: {query}\nAnswer:"
         response = openai.chat.completions.create(
             model="gpt-3.5-turbo",
             messages=[{"role": "user", "content": prompt}],
         )
         return response.choices[0].message.content.strip()
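
For completeness, here is a rough sketch of the setup those three fragments assume but don't show (the `get_embedding` helper and the `index` handle); the model name, index name, and client initialization are assumptions to adapt to your SDK versions:

    import os
    import openai  # the OpenAI key is read from the OPENAI_API_KEY environment variable
    from pinecone import Pinecone  # pinecone-client v3+; older versions use pinecone.init(...)

    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    index = pc.Index("my-index")  # hypothetical index name

    def get_embedding(text, model="text-embedding-ada-002"):
        # Embed the query with the same model that was used to index the documents
        response = openai.embeddings.create(model=model, input=[text])
        return response.data[0].embedding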


Thank you

Sure, you could start like that as well. Personally, I prefer something that's ready to scale if your project takes off. But if your goal is to test a concept, choose whatever is easier and faster.

Thank you for the feedback. I realize my initial question might not have been clear. What I'm specifically asking about is not the choice of vector management solution (e.g., Weaviate, Pinecone, Elasticsearch) or the external systems involved. My question is whether it's possible to implement the search and retrieval of context directly within the OpenAI Assistant using the 'Functions' or 'Code Interpreter' features, the goal being to minimize external API calls and reduce latency. Or do you believe this process (the search and retrieval of context) must necessarily be initiated from outside the Assistant?

If you believe this integration is possible within the Assistant, should it be done via 'Code Interpreter' or 'Functions'? Or do you think the search and retrieval process must necessarily be initiated externally, with the retrieved context then fed into the Assistant?

Personally, I don't use Assistants, as I don't really see the benefit of giving the AI control over what context is used to form the responses my apps need. But that's a personal choice, plus the specifics of what I usually build.

Assistants are cool when you don’t want to deal with thread and context management. The price for skipping that bootstrap is:

It's the assistant that picks the most relevant messages from the current thread to answer the user…

I prefer using the tools I mentioned (relatively easy to work with) and the chat endpoints, to keep the tools stateless and have full control over the context.

It's totally doable to use function calling to let assistants access whatever context they judge necessary. But have you evaluated their judgement, and the tradeoffs of such an approach? If it's you who will be handling the retrieval and the context anyway, are you sure you need Assistants?
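
If you do go the function-calling route, keep in mind that the retrieval itself still runs in your own code: the assistant only emits a tool call, and you execute it and submit the output back. Here is a rough sketch of that loop with the v1 Python SDK (reusing the placeholder `assistant` and `get_relevant_context` from the posts above):

    import json
    import time
    from openai import OpenAI

    client = OpenAI()

    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content="What does the knowledge base say about X?"
    )
    run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
        if run.status == "requires_action":
            # The assistant decided to call the retrieval function;
            # your code executes it and hands the result back.
            outputs = []
            for call in run.required_action.submit_tool_outputs.tool_calls:
                args = json.loads(call.function.arguments)
                context = get_relevant_context(args["query"])  # your Pinecone lookup
                outputs.append({"tool_call_id": call.id, "output": context})
            run = client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread.id, run_id=run.id, tool_outputs=outputs
            )
        elif run.status in ("completed", "failed", "cancelled", "expired"):
            break
        time.sleep(1)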


As for the screenshot: when I need to add features to assistants (like custom GPTs for my wife, to handle her website sales and reservation stats plus attendee seat info, with pre-processing in Google Sheets), I prefer exposing custom API definitions (basically the same thing as function calling, except the GPT handles the API request for you) with all the features they might need, plus a good set of instructions/workflow descriptions inside the knowledge files when they don't fit into the bot instructions limit. Works well. Here is an example: