Implementing RAG via Custom Functions in OpenAI Assistants

Hi everyone,

I’m exploring the possibility of implementing Retrieval-Augmented Generation (RAG) directly using the “custom functions” feature in OpenAI Assistants.

Specifically, given a vector storage database (such as Pinecone or a similar service), I’m wondering if it’s feasible to integrate the RAG context search functionality directly within the Assistant through custom functions.

For example, if I provide the database URL, API key, index name, and top_k, I would expect the Assistant to retrieve the relevant context from the database before generating a response.
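To make that concrete, here is a rough sketch of the kind of function declaration I have in mind when creating the Assistant (assuming the v1 OpenAI Python SDK; `search_vector_db` and its parameters are just placeholders, not an existing API):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Placeholder schema for the retrieval function the Assistant would call
    assistant = client.beta.assistants.create(
        name="RAG Assistant",
        model="gpt-4-turbo",
        instructions="Call search_vector_db before answering questions about the knowledge base.",
        tools=[{
            "type": "function",
            "function": {
                "name": "search_vector_db",
                "description": "Return the top_k most relevant text chunks for a query.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "The user query to search for."},
                        "top_k": {"type": "integer", "description": "How many chunks to return."},
                    },
                    "required": ["query"],
                },
            },
        }],
    )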

On top of other benefits, this approach could potentially improve latency by reducing the overhead of external API calls.

Has anyone tried this, or do you think this approach is feasible?

Thank you in advance for any suggestions!

Hi @abc01,

Yes, totally doable.

Depending on the complexity of the data, a good starting point would be a combination of PostgreSQL for the relational database, Weaviate for vector storage and vector search, and Directus to manage both and expose a CRUD API (along with custom webhooks for extra features), all behind a Traefik proxy.

What would be the app you’re building?


Thank you @sergeliatko,

Assuming I make use of Pinecone, wouldn't it 'simply' be a matter of writing a custom function that essentially combines the following three?

  1. def query_pinecone(query_text, top_k=5):
         # Convert the query to an embedding
         query_embedding = get_embedding(query_text)

         # Query Pinecone; include_metadata=True returns the stored text with each match
         result = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
         return result

  2. def get_relevant_context(query):
         result = query_pinecone(query)
         relevant_text = [match["metadata"]["text"] for match in result["matches"]]
         return " ".join(relevant_text)

  3. def generate_answer_with_context(query):
         context = get_relevant_context(query)
         prompt = f"Context: {context}\n\nUser Query: {query}\nAnswer:"
         response = openai.chat.completions.create(
             model="gpt-3.5-turbo",
             messages=[{"role": "user", "content": prompt}],
         )
         return response.choices[0].message.content.strip()
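
For completeness, here is a rough sketch of the setup those three fragments assume but don't show (the `get_embedding` helper and the `index` handle); the model name, index name, and client initialization are assumptions to adapt to your SDK versions:

    import os
    import openai  # the OpenAI key is read from the OPENAI_API_KEY environment variable
    from pinecone import Pinecone  # pinecone-client v3+; older versions use pinecone.init(...)

    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    index = pc.Index("my-index")  # hypothetical index name

    def get_embedding(text, model="text-embedding-ada-002"):
        # Embed the query with the same model that was used to index the documents
        response = openai.embeddings.create(model=model, input=[text])
        return response.data[0].embedding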


Thank you

Sure, you could start like that as well. Personally, I prefer something that's ready to scale if your project takes off. But if your goal is to test a concept, choose whatever is easier and faster.

Thank you for the feedback. I realize my initial question might not have been clear. What I'm specifically asking about is not the choice of vector management solution (e.g., Weaviate, Pinecone, Elasticsearch) or the external systems involved. My question is whether it's possible to implement the search and retrieval of context directly within the OpenAI Assistant using the 'Functions' or 'Code Interpreter' features, the goal being to minimize external API calls and reduce latency. Or do you believe this process (the search and retrieval of context) must necessarily be initiated from outside the Assistant?

If you believe this integration is possible within the Assistant, should it be done via 'Code Interpreter' or 'Functions'? Or do you think the search and retrieval process must necessarily be initiated externally, with the retrieved context then fed into the Assistant?

Personally, I don't use Assistants, as I don't really see the benefit of giving the AI control over what context is used to form the responses my apps need. But that's a personal choice, plus the specifics of what I usually build.

Assistants are cool when you don’t want to deal with thread and context management. The price for skipping that bootstrap is:

It's the assistant that picks the most relevant messages from the current thread to answer the user…

I prefer using the tools I mentioned (relatively easy to work with) and the chat endpoints, to keep the tools stateless and have full control over the context.

It's totally doable to use function calling to let assistants access whatever context they judge necessary. But have you evaluated their judgement, and the tradeoffs of such an approach? If it's you who will be handling the retrieval and the context anyway, are you sure you need Assistants?
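
If you do go the function-calling route, keep in mind that the retrieval itself still runs in your own code: the assistant only emits a tool call, and you execute it and submit the output back. Here is a rough sketch of that loop with the v1 Python SDK (reusing the placeholder `assistant` and `get_relevant_context` from the posts above):

    import json
    import time
    from openai import OpenAI

    client = OpenAI()

    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content="What does the knowledge base say about X?"
    )
    run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
        if run.status == "requires_action":
            # The assistant decided to call the retrieval function;
            # your code executes it and hands the result back.
            outputs = []
            for call in run.required_action.submit_tool_outputs.tool_calls:
                args = json.loads(call.function.arguments)
                context = get_relevant_context(args["query"])  # your Pinecone lookup
                outputs.append({"tool_call_id": call.id, "output": context})
            run = client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread.id, run_id=run.id, tool_outputs=outputs
            )
        elif run.status in ("completed", "failed", "cancelled", "expired"):
            break
        time.sleep(1)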


As for the screenshot: when I need to add features to assistants (like custom GPTs for my wife, to handle her website sales and reservation stats plus attendee seat info, with pre-processing in Google Sheets), I prefer exposing custom API definitions (basically the same thing as function calling, except the GPT handles the API request for you) with all the features they might need, plus a good set of instructions/workflow descriptions inside the knowledge files when they don't fit into the bot instructions limit. Works well. Here is an example: