Assistant vs Chat with Embeddings

Hey everyone,
I use the Azure OpenAI API and have built my own chatbot using the Chat Completions API with embeddings.

The chatbot is called from the backend for each of my users and is fed the following prompt data:

  • A system prompt with the guidelines
  • The message history with all previous messages (both the user's questions and the chatbot's answers)
  • A series of documentation texts retrieved from .md files. These files are selected using cosine similarity between the latest message (user prompt) and the static embeddings of the documentation files, checked one by one (see the sketch after this list)
  • The user's project, which consists of a large JSON file that can change even between messages
  • The latest message (user prompt/input)
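
For reference, the retrieval step above looks roughly like this. A minimal sketch, assuming the `openai` Python SDK with an Azure client; the deployment names, endpoint, and the `DOC_EMBEDDINGS` cache are placeholders, not my actual setup:

```python
import numpy as np
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="...",                                          # placeholder
    api_version="2024-02-01",                               # assumption: any recent version
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder
)

# Precomputed once: {filename: embedding vector} for each .md file
DOC_EMBEDDINGS: dict[str, np.ndarray] = {}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_docs(user_prompt: str, k: int = 3) -> list[str]:
    # Embed only the latest user message
    resp = client.embeddings.create(
        model="text-embedding-ada-002",  # assumption: your embedding deployment name
        input=user_prompt,
    )
    query = np.array(resp.data[0].embedding)
    # One-by-one cosine check against each static doc embedding
    scored = sorted(
        DOC_EMBEDDINGS.items(),
        key=lambda kv: cosine_similarity(query, kv[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]
```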

The question is whether I should instead create a static assistant in the backend that contains all the files in file-search mode, where each user then gets their own thread to work with.
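
To frame the comparison, the Assistants route would look roughly like this. A minimal sketch, reusing the `client` from above and assuming the beta Assistants API surface of the SDK; the model name and vector store ID are placeholders:

```python
# One static assistant, created once in the backend,
# with the .md docs uploaded to a vector store for file search
assistant = client.beta.assistants.create(
    model="gpt-4o",  # assumption: your chat deployment name
    instructions="<system prompt with the guidelines>",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": ["vs_docs"]}},  # placeholder ID
)

# One thread per user; the thread keeps the message history server-side
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="latest user prompt"
)
run = client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id=assistant.id
)
```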

From the research I've done so far, I know that with the Assistant I have less control over costs and tuning, while with Chat I have more control but it's harder to get working.

In my case I already have the Chat implementation and it works fine. But I'd like your opinion based on your experience, because I've essentially built an Assistant out of my chat bot, and I'm wondering whether it's worth the effort to evaluate the Assistants API as well.

What really concerns me at this point is that I feed the large JSON to the model with every message, and I need a way to avoid that. Maybe use embeddings for that as well? The issue is that it can actually change from call to call.
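
One workaround I'm considering: chunk the JSON, embed the chunks, and cache the embeddings by content hash, so only the chunks that actually changed between calls get re-embedded. A hypothetical sketch reusing the `client` from the first snippet; `chunk_json` and the cache are my own illustration, not an existing API:

```python
import hashlib
import json

EMBEDDING_CACHE: dict[str, np.ndarray] = {}  # content hash -> embedding

def chunk_json(project: dict) -> list[str]:
    # Naive chunking: one chunk per top-level key (adapt to your schema)
    return [json.dumps({k: v}) for k, v in project.items()]

def embed_project(project: dict) -> list[np.ndarray]:
    vectors = []
    for chunk in chunk_json(project):
        key = hashlib.sha256(chunk.encode()).hexdigest()
        if key not in EMBEDDING_CACHE:  # re-embed only changed chunks
            resp = client.embeddings.create(
                model="text-embedding-ada-002",  # assumption: embedding deployment
                input=chunk,
            )
            EMBEDDING_CACHE[key] = np.array(resp.data[0].embedding)
        vectors.append(EMBEDDING_CACHE[key])
    return vectors
```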

Note:
I have no access to token usage in my calls when streaming, because Azure hasn't really enabled usage reporting for streamed responses…
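
As a stopgap, I've seen people estimate usage locally with `tiktoken`. A rough sketch; the encoding choice is an assumption, and this only approximates the server-side count since it ignores per-message overhead:

```python
import tiktoken

def estimate_tokens(messages: list[dict], model: str = "gpt-4") -> int:
    # Approximation only; the server adds a few tokens of framing per message
    enc = tiktoken.encoding_for_model(model)
    return sum(len(enc.encode(m["content"])) for m in messages)
```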
