Switching from Assistants API to Chat Completion?

Has anyone switched from the Assistants API to Chat Completion and can share their experience here?
I am considering making the switch, but it seems I would need to reimplement threads, runs, the context window, and possibly many other mechanisms…
Any insights or recommendations on how to deal with this?

I like the simplicity of the Assistants API, but it is still in beta and there are several drawbacks making me consider switching to Chat Completion or other LLM solutions until it improves:

  • Too slow for chat apps (where users expect an instant answer)
  • Instability, including critical bugs that take several hours to resolve, leaving my app completely unusable
  • Not being able to use all the parameters available with chat completion (frequency penalty, max tokens, presence penalty, response format, temperature, …)
  • Not being able to use fine-tuned models with the Assistants API, which makes my assistant answer like every other non-fine-tuned assistant, always “delving deeper into the tapestry of” something, even after experimenting with many different prompt instructions.

Insights, feedback, and recommendations appreciated! :pray:


Here’s one potential way of implementing this. The basic architectural philosophy is to use Assistants, Threads, Messages & Runs as-is and use metadata to drive the inputs to chat completion. In this manner, you get chat completion until the Assistants API matures.

A basic implementation can subclass BaseAssistant, BaseThread, BaseMessage & BaseRun from betaassi/src/openai_session_handler/models at main · icdev2dev/betaassi · GitHub:

from typing import Optional
from pydantic import Field
from openai_session_handler.models import BaseAssistant, BaseThread, BaseMessage  # import path approximate

class CCAssistant(BaseAssistant):
    pass

class CCThread(BaseThread):
    pass

class CCMessage(BaseMessage):
    p_role: Optional[str] = Field(default="")

Each variable declared this way uses up one metadata field (out of the 15 remaining, since each of the Base* classes already consumes one of the 16 metadata fields).

Then you can track the real role of every message in the thread through p_role: initialize the thread’s first message to provide the “system” context from the assistant’s instructions, setting message.role to "user" (the only role the API accepts here) but distinguishing the actual role through p_role="system".
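
For illustration, here is a minimal sketch of that first step using the raw v1 Python SDK directly (the betaassi subclasses wrap this); the plain "p_role" metadata key mirrors the CCMessage field above, and the instruction text is just a placeholder:

from openai import OpenAI

client = OpenAI()

instructions = "You are a concise assistant."  # placeholder for the assistant's instructions

# The Messages API only accepts role="user" here, so the "system" context is
# stored as a user message and tagged through metadata instead.
thread = client.beta.threads.create(
    messages=[{
        "role": "user",
        "content": instructions,
        "metadata": {"p_role": "system"},
    }]
)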

Every time you need an answer from chat completion, the chat completion messages are formed by iterating over the entire set of messages in the thread, in order of insertion. A small transformation is required for each message to match the format the chat completion API expects.
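
A sketch of that transformation, assuming plain-text message content and the p_role convention above (p_role="chatcompletion" marks model answers written back to the thread, as described next):

def thread_to_chat_messages(client, thread_id: str) -> list[dict]:
    # Walk the thread in insertion order and map each stored message into the
    # {"role", "content"} shape the chat completion API expects.
    chat_messages = []
    for msg in client.beta.threads.messages.list(thread_id=thread_id, order="asc"):
        p_role = (msg.metadata or {}).get("p_role", "")
        if p_role == "system":
            role = "system"
        elif p_role == "chatcompletion":
            role = "assistant"  # an earlier model answer stored back into the thread
        else:
            role = "user"
        chat_messages.append({"role": role, "content": msg.content[0].text.value})  # assumes text content blocks
    return chat_messages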

When you get an answer from chat completion, you can append it to the thread as a message with role="user" and p_role="chatcompletion" (remembering that the message role is restricted to "user").
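
Putting the last two steps together might look like this (continuing the sketches above; the model name and parameters are placeholders, and this is where the chat-completion-only parameters and fine-tuned models become usable):

completion = client.chat.completions.create(
    model="gpt-4o",        # or a fine-tuned model, with any chat completion params
    temperature=0.2,
    messages=thread_to_chat_messages(client, thread.id),
)
answer = completion.choices[0].message.content

# Write the answer back onto the thread; role must be "user", so the real
# origin is recorded in p_role instead.
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=answer,
    metadata={"p_role": "chatcompletion"},
)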

That in a nutshell is it.

Edit: I will open-source the implementation soon, but with the above, anyone can get started.


Thanks a lot @icdev2dev for the solution you’ve proposed! I’m definitely going to give it a try as it seems to offer the best of both worlds: the ease of use of the Assistants API with the advanced functionalities of chat completion.

I haven’t explored the code in detail yet, but I’m curious about your take on the compatibility of this solution with LLMs other than OpenAI’s. Do you believe it’s possible to mimic the structure of the Assistants API (Thread, Message, Run, Assistant) and develop a system that’s LLM-agnostic, or do you foresee major incompatibilities?
Or maybe I’m trying to reinvent the wheel? :sweat_smile:

Glad that you asked!

The first answer is from a purely technical perspective. From that perspective, I view the Assistants API (Assistant, Thread, Message and Run, aka ATMR) as having a dual role. The first is the “storage role”: storing the data on which LLMs operate; the second is the “execution role”: running the stored data through the LLMs to produce the necessary output. Both of those roles are currently conflated in the usage of the Assistants API. What betaassi essentially provides is a clean separation of the two, so that one can be agnostic about what one uses for the “execution role” (in this case, chat completion).
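
My own rough illustration of that separation (not betaassi’s actual API): once the stored thread has been transformed into plain role/content dicts, the “execution role” is just a function over that list, so any chat-completion-shaped backend can be plugged in:

from typing import Callable

ChatBackend = Callable[[list[dict]], str]  # messages in -> answer text out

def execute(chat_messages: list[dict], backend: ChatBackend) -> str:
    # chat_messages would come from a transformation like thread_to_chat_messages above;
    # the storage (Threads) never needs to know which backend produced the answer.
    return backend(chat_messages)

def openai_backend(chat_messages: list[dict]) -> str:
    from openai import OpenAI
    completion = OpenAI().chat.completions.create(model="gpt-4o", messages=chat_messages)
    return completion.choices[0].message.content

# Swapping openai_backend for another provider's client leaves the stored threads untouched.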

Because I believe the fundamental data structures (ATMR) are well thought out for working with LLMs, I believe the use cases that benefit from this “technical perspective” extend beyond just chat completion. For instance, it is “easy” to create “snapshots” of threads by providing a list_snapshots pseudo-variable within a subclass of BaseThread, backed by n real metadata attributes such as list_snapshot_1, list_snapshot_2, list_snapshot_3 (using three of the 16 metadata attributes). In this manner, the snapshot of a thread is itself a thread, and only its thread_id is stored in list_snapshot. Clearly there are limitations to how one can use such a snapshot mechanism; nevertheless, the fact that this is possible is nice.
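
A hypothetical sketch of that snapshot idea, following the same subclassing pattern as CCThread above (field names and the import path are assumptions, not the actual betaassi implementation):

from typing import Optional
from pydantic import Field
from openai_session_handler.models import BaseThread  # path approximate

class SnapshotThread(BaseThread):
    # three real metadata slots (out of the 16 the API allows) back one
    # pseudo "list_snapshots" variable; each holds the thread_id of a snapshot thread
    list_snapshot_1: Optional[str] = Field(default="")
    list_snapshot_2: Optional[str] = Field(default="")
    list_snapshot_3: Optional[str] = Field(default="")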

The second answer is more of a business perspective. While it is theoretically possible to extend this philosophy to storing all of the data on OpenAI and executing on a completely different platform (i.e. not OpenAI), I am not sure about the overall viability of doing so over the longer term, given that creating threads, assistants et al. is currently free on the OpenAI platform. The even longer-term solution could involve a platform that stores the Assistants API structures (ATMR) separately as well. But that is not today.

Hope this helps!
