Designing a Custom Chatbot with RAG and Function Calling

I want to create a custom chatbot using OpenAI’s API that utilizes both Retrieval-Augmented Generation (RAG) for fetching up-to-date contextual information and function calling to perform specific actions (e.g., processing orders, querying databases). For example, in an e-commerce chatbot, the system could retrieve real-time product availability using RAG and then call a function to process an order.

How can I design such a system? What are the best practices for integrating RAG and function calling together?

I’ve tried a basic approach where I fetch documents from an IndexDB based on the query and then make a GPT-4 API call that incorporates function calling. However, the performance has been subpar. Specifically, the model loses context: if a function call needs a parameter clarified, the model asks the user for clarification, but once the user answers it restarts the RAG step instead of proceeding with the pending function call.

Here’s the code I’ve been working with:

```python
response = openai_client.chat.completions.create(
    model=azure_openai_chat_completions_deployment_name,
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": formatted_user_question_template},
    ],
    tools=tools,
    temperature=0.1,
    max_tokens=MAX_COMPLETION_TOKENS,
    parallel_tool_calls=False,
)
```

`formatted_user_question_template` contains the chat history, the user question, and the retrieved documents.
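For reference, the kind of state-keeping loop I'm aiming for looks roughly like this. It's only a sketch: `run_tool_loop` and `call_tool` are names I made up, and the message shapes follow the Chat Completions tool-calling protocol. The key point is that the assistant turn is appended to `messages` *with* its `tool_calls`, so a later clarification turn still sees the pending call instead of falling back to retrieval.

```python
import json

def run_tool_loop(client, model, messages, tools, call_tool, max_rounds=5):
    """Drive the Chat Completions tool-call protocol while keeping all
    conversation state in `messages`. `call_tool(name, args)` dispatches
    to the real implementations (e.g. process_order)."""
    for _ in range(max_rounds):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools, temperature=0.1
        )
        msg = response.choices[0].message
        # Re-serialize any tool calls so the model sees its own pending
        # call on the next turn.
        tool_calls = [
            {"id": c.id, "type": "function",
             "function": {"name": c.function.name,
                          "arguments": c.function.arguments}}
            for c in (msg.tool_calls or [])
        ]
        assistant_msg = {"role": "assistant", "content": msg.content}
        if tool_calls:
            assistant_msg["tool_calls"] = tool_calls
        messages.append(assistant_msg)
        if not tool_calls:
            return msg.content  # final answer for the user
        # Execute each call and feed the result back as a "tool" message.
        for call in msg.tool_calls:
            result = call_tool(call.function.name,
                               json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
    return None
```

With this structure, the retrieved documents can go into a per-turn system message while the user/assistant/tool history stays intact across clarification rounds.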


Hi @satellitef and welcome to the community!

The performance of a RAG system tends to be highly dependent on the “retriever” component. It’s quite an elaborate topic but you can have a look at some of the “considerations” mentioned here.

On the retrieval side, you may have purely structured data (like looking up product IDs), but often you will also have unstructured data (product descriptions). Embedding the unstructured data, performing vector search, and then re-ranking the results using structured data and other “signals” is usually how you improve performance.
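As a rough illustration of that re-ranking step, here is a minimal sketch that blends vector-search similarity with a structured availability signal. The function name, the `inventory` lookup, and the weighting are my own assumptions, not a prescribed recipe:

```python
def rerank(hits, inventory, stock_weight=0.2):
    """Re-rank vector-search hits with a structured signal.

    `hits` is a list of (product_id, similarity) pairs from vector
    search; `inventory` maps product_id -> units in stock. In-stock
    products get a small boost, so semantically similar but
    unavailable items drop down the list.
    """
    def score(hit):
        pid, sim = hit
        in_stock = 1.0 if inventory.get(pid, 0) > 0 else 0.0
        return (1 - stock_weight) * sim + stock_weight * in_stock
    return sorted(hits, key=score, reverse=True)
```

In a real system you would fold in more signals (price, popularity, recency) and tune the weights against held-out queries.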


Hi @platypus thank you so much for your response! I actually came across that article already and am now looking for a more advanced and detailed solution, example, or article. I appreciate your help!


No problem, and great!

I think my comment still stands - improving retrieval and re-ranking is key here. A while back Anthropic came out with contextual retrieval, which essentially combines old-school techniques (like BM25) with modern vector search. I would also recommend looking at a classic text from Stanford, *Introduction to Information Retrieval*.

Hi,
RAG

  • For RAG I would recommend considering Redis and Qdrant, which both allow fast retrieval; there are of course other options you can take into account, these are just some examples.
  • Additionally, I would recommend Qdrant in particular for similarity search.
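
To illustrate what a vector store like Qdrant is doing for similarity search, here is a brute-force cosine-similarity lookup in plain Python. Names are illustrative only; a real deployment would use the qdrant-client API and an approximate-nearest-neighbour index rather than a linear scan:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, corpus, k=3):
    """corpus: list of (doc_id, embedding). Return the k nearest ids."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```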

Function Calling Integration

  • For function calling, I think AutoGen from Microsoft is a good choice you could look into; it also integrates easily with OpenAI.

Just some thoughts and recommendations 🙂
Hope this helps