Questions on Creating 100K+ threads

If a service needs 100,000+ threads to serve 100,000 users because each thread represents a conversation between a user and the assistant, are there any specific things the service needs to worry about from an OpenAI perspective?

Assume that:

  • the ability to intermingle NamedMessages (messages belonging to a specific user) in a single thread has been ruled out, ultimately for latency reasons.
  • there is a robust mechanism in place to locate a specific thread belonging to a user (aka NamedThreads)
  • Currently, because of the brittle (and functionally incomplete) nature of the Beta APIs, the service does not intend to use the run mechanism.
  • The service intends to keep all threads alive by posting a heartbeat message onto each thread periodically (a minimal sketch follows this list).
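
A minimal sketch of that heartbeat, assuming the openai Python SDK's beta threads endpoint; the metadata marker used here is my own convention, not an OpenAI one:

```python
from openai import OpenAI

client = OpenAI()

def post_heartbeat(thread_id: str) -> None:
    """Post a throwaway message so the thread never sits inactive for 60 days."""
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",  # the beta API only accepts user-authored messages
        content="heartbeat",
        # Tagging the message lets it be filtered out later when the
        # conversation is rebuilt; "heartbeat" is this sketch's convention.
        metadata={"actual_role": "heartbeat"},
    )
```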

So the question is:

  • Would so many threads cause eyebrows to be raised by OpenAI Infra Team?

The assistants endpoint cannot handle a fraction of that usage, even with streaming instead of polling, because you are limited to 60 API calls per minute as an organization.

Threads do not expire until they have been inactive for 60 days. Thus, you can maintain a ChatGPT-like list of thread titles and recall them. However, a long-term user may not expect their chats to start dropping off the face of the Earth (unlike ChatGPT), so you'll need to programmatically manage all threads associated with customers in a database, deleting them and expecting them to become unavailable after expiration.
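
A minimal sketch of that bookkeeping, assuming the openai Python SDK and a local SQLite table (the table layout here is my assumption):

```python
import sqlite3
from openai import OpenAI

client = OpenAI()
db = sqlite3.connect("threads.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS user_threads ("
    "user_id TEXT, thread_id TEXT PRIMARY KEY, title TEXT)"
)

def expire_thread(thread_id: str) -> None:
    """Delete the thread server-side and drop the local record for it."""
    client.beta.threads.delete(thread_id)
    db.execute("DELETE FROM user_threads WHERE thread_id = ?", (thread_id,))
    db.commit()
```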

If you’re thinking about just using threads as a storage mechanism without ever making API calls to use assistants, you can see that’s a pretty poor fit. Consider the amount of your own database required to maintain customers, thread IDs, and titles, plus the latency of recalling messages from a thread API call just to repopulate a user interface: you might as well manage all conversation on your side…and avoid assistants entirely.

  • Own Database
    There is NO “own database” of mine. In my design there is NO storage outside of OpenAI’s assistants, threads, and messages.

  • Thread expiry
    Since I intend to post a heartbeat message onto each thread periodically (every ~14 days), I believe that will keep threads from ever becoming inactive.

  • API Call Limits
    You do bring up the point of 60 API calls per minute, which effectively means ~86,400 per day. Most users will use this service maybe once a week, so that budget should comfortably cover the load (a throttle sketch follows this list).
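
For what it’s worth, staying under that per-minute budget is easy to enforce client-side; here is a minimal rolling-window throttle sketch (the 60/minute limit is the figure quoted above):

```python
import time
from collections import deque

class MinuteThrottle:
    """Blocks until a call fits within `limit` calls per rolling minute."""

    def __init__(self, limit: int = 60):
        self.limit = limit
        self.calls: deque = deque()  # monotonic timestamps of recent calls

    def acquire(self) -> None:
        while True:
            now = time.monotonic()
            # Discard timestamps that have left the one-minute window.
            while self.calls and now - self.calls[0] > 60:
                self.calls.popleft()
            if len(self.calls) < self.limit:
                self.calls.append(now)
                return
            # Sleep just long enough for the oldest call to age out.
            time.sleep(60 - (now - self.calls[0]) + 0.01)
```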

That all said, my question still remains unanswered:

  • Will so many threads cause eyebrows to be raised at the OpenAI infra level?

I would think not. Imagine the billions of conversations from 100M+ free ChatGPT users maintained forever…

If you are still doing lots of billed API use otherwise, they probably won’t run some future scanner on you that says “hey, buddy, quit making all those threads without ever running assistants”.

Threads alone are not that useful, as you can’t place real assistant messages on them, and that “heartbeat” means adding more messages that would break a conversation.

  • real assistant messages on threads

It is true that one cannot, currently, put real assistant messages on a thread. The service does not intend to execute a run to reach a completion; rather, it uses Chat Completions.

As has been demonstrated here (openairetro/examples/temparature/core.py at main · icdev2dev/openairetro · GitHub), it is completely feasible to have another field called actual_role take the place of the role.
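
In outline, the pattern looks like the sketch below: every message is stored as role="user" (the only role the beta message endpoint accepts), and the true speaker is recorded in metadata. The actual_role name follows the linked example; the rest is my sketch.

```python
from openai import OpenAI

client = OpenAI()

def add_message(thread_id: str, actual_role: str, text: str) -> None:
    """Store any speaker's turn on a thread, tagging the real role in metadata."""
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",  # the beta API only accepts user-authored messages
        content=text,
        metadata={"actual_role": actual_role},  # e.g. "user" or "assistant"
    )

# e.g. record the model's reply even though the API forbids role="assistant":
# add_message(thread_id, "assistant", completion_text)
```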

It may be noted that all of this rigmarole is necessary because of the brittle and functionally incomplete beta AssistantApi.

  • quit making so many threads
    The service intends to make use of Chat Completions currently, so I think OpenAI won’t mind too much.

@icdev2dev there is really no point in using a slow and moody API like the current Assistants beta.

You are better off starting on the free tier of a managed database service; you can use it as an API in the same way, and you avoid the common issues with this beta API, from latency to downtime, which can really affect user experience.

If you’re happy with how they define convos, just copy the data model.
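
For instance, a minimal SQLite copy of that data model might look like the sketch below (table and column names are my assumptions, loosely mirroring threads and messages):

```python
import sqlite3

db = sqlite3.connect("conversations.db")
db.executescript("""
CREATE TABLE IF NOT EXISTS threads (
    id         TEXT PRIMARY KEY,
    user_id    TEXT NOT NULL,
    title      TEXT,
    created_at INTEGER
);
CREATE TABLE IF NOT EXISTS messages (
    id         TEXT PRIMARY KEY,
    thread_id  TEXT NOT NULL REFERENCES threads(id),
    role       TEXT NOT NULL,        -- 'user' or 'assistant'
    content    TEXT NOT NULL,
    created_at INTEGER
);
""")
```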

Happy coding!


So I think there are two aspects to this AssistantApi. One is the data plane (where data creation happens: assistant.create, thread.create, etc.) and the second is the execution plane (where all the completion-related work happens).

So far as I can tell, the data plane seems to be alright. If you have pointers to counter that, I’m happy to look.

The moodiness comes into play at the execution plane, again so far as I can tell. So I’m trying to avoid that by using Chat Completions.

That’s why I say: use the semantics of the AssistantApi but the mechanics of Chat Completions.
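
Concretely, that split could look like the sketch below: the thread is the store of record (data plane), while the completion itself goes through chat.completions (execution plane). The actual_role and heartbeat conventions are the ones described above; the model name is a placeholder.

```python
from openai import OpenAI

client = OpenAI()

def complete_on_thread(thread_id: str, model: str = "gpt-3.5-turbo") -> str:
    # Data plane: pull the stored turns back out of the thread.
    stored = client.beta.threads.messages.list(thread_id=thread_id, order="asc")
    history = [
        {
            "role": (m.metadata or {}).get("actual_role", "user"),
            "content": m.content[0].text.value,
        }
        for m in stored.data
        if (m.metadata or {}).get("actual_role") != "heartbeat"  # skip keep-alives
    ]
    # Execution plane: complete with Chat Completions instead of a run.
    reply = client.chat.completions.create(model=model, messages=history)
    text = reply.choices[0].message.content
    # Write the assistant turn back onto the thread, tagged with its true role.
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=text,
        metadata={"actual_role": "assistant"},
    )
    return text
```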

If I were to consider moving to, let’s say, Redis in a managed-services mode, that would be in a larger context, i.e. reducing our exposure to OpenAI by exploring alternatives like Groq. I might get there if the APIs don’t get better soon (starting with just getting the APIs to complete more reliably).

Hope that helps with my thought process.