Limitations of assistants and threads

neil12 · November 24, 2024, 5:28pm

Hi, we’re looking into using the Assistants API with threads and have a few questions:

We’re building a SaaS tool where enterprise customers have access to AI features. Should the setup be either:
a) A separate assistant for each customer (the name of the assistant would be the uid of the customer), or
b) A single assistant that all customers use, with each customer having their own thread under that assistant?
Are there any cost implications of creating assistants and threads? Given that pricing is based on token usage, in theory the price is the same whether you use 1m tokens across 1 thread or 100 threads?
Are there limits on the number of assistants and the number of threads you can create?

Thanks!

_j · November 24, 2024, 5:49pm

You will find that the Assistants platform has its own limitations that cannot be overcome, because it operates one way only: a document search tool that follows OpenAI’s instructions and not yours, conversation management and token costs that follow OpenAI’s wishes and not yours, etc. You should explore and get a good understanding of Chat Completions first, and then see why you would want Assistants for server-side conversations or for the two tool helpers offered, running Python code and vector storage. Also, as a “beta” it has a very low limit on the total API calls that can be made, a figure no longer published: 60-300 per minute depending on the usage tier.

The non-AI parts of creating an assistant, creating a thread, adding messages, having untold numbers of them, do not have additional cost. It is when you run that you incur AI generation fees from the input context that OpenAI places and for what is generated.
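As a rough illustration of why the number of threads can still matter for cost, here is a minimal sketch of how billed input tokens add up when each run re-reads the thread so far. The token counts are invented, and the model assumes the whole thread is fed back on every run with no truncation, which is a simplification and not a description of how OpenAI actually manages context.

```python
def billed_input_tokens(turns, instruction_tokens=0):
    """Total input tokens billed across runs on one thread.

    Assumption (simplified): every run re-reads the instructions plus all
    messages already on the thread, with no truncation.
    `turns` is a list of (user_message_tokens, assistant_reply_tokens).
    """
    total = 0
    context = instruction_tokens
    for user_tokens, reply_tokens in turns:
        context += user_tokens   # the new user message joins the thread
        total += context         # the run is billed for the whole context
        context += reply_tokens  # the assistant's reply joins the thread too
    return total


# Hypothetical sizes: ten turns of a 100-token question and a 200-token answer.
turns = [(100, 200)] * 10

one_long_thread = billed_input_tokens(turns)
ten_short_threads = sum(billed_input_tokens([turn]) for turn in turns)

print(one_long_thread, ten_short_threads)
```

Under these assumptions the same ten exchanges cost far more input tokens on one long thread than on ten fresh threads, because earlier messages are billed again on every run.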

Adding search documents does not incur a fee for the one-time embeddings run done to extract and chunk the document, but the storage space then used has daily charges. The Python code interpreter has per-session charges that are almost per-use because they expire quickly.

There is no need and no point in creating individual assistants per customer. An assistant is not a model or a conversation; it is basically a text block of instructions and some settings that can be applied to any run. It can have one purpose or identity, and a thread can be switched to a different assistant for its latest message. Reuse is a feature.

Threads, of course, are user-specific, and you must maintain a synchronized database of your own: mapping customers to thread IDs, thread titles that you’d create and keep, and the user’s last preferred assistant to use on the thread or other settings you have, such as enabled tools. Because the conversation is server-side, you have to do even more work than if it were your own chat database. You must maintain the state of uploaded files in their own auto-created, fast-expiring store, images that will break a thread if deleted, and generated files from code. You must track the server-side expiration of message vector stores, the expiration of code interpreter downloads (or keep your own local copies against that), and the expiration of inactive threads you can no longer engage in. You must then maintain file storage to prevent orphans, along with vector stores that have no metadata about their use.
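The customer-to-thread mapping described above could be sketched roughly as follows, using SQLite from the Python standard library. The table layout, column names and helper functions are all hypothetical, chosen only for illustration; the thread and assistant IDs would be whatever the API returned when you created them.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a local store mapping customers to their threads."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS threads (
               thread_id    TEXT PRIMARY KEY,  -- ID returned by the API
               customer_id  TEXT NOT NULL,     -- your own customer uid
               title        TEXT NOT NULL,     -- a title you create and keep
               assistant_id TEXT,              -- last preferred assistant
               last_used    TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def remember_thread(conn, customer_id, thread_id, title, assistant_id=None):
    """Insert a thread record, or overwrite the existing one for that ID."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO threads "
            "(thread_id, customer_id, title, assistant_id) VALUES (?, ?, ?, ?)",
            (thread_id, customer_id, title, assistant_id),
        )


def threads_for(conn, customer_id):
    """Return (thread_id, title, assistant_id) rows for one customer."""
    rows = conn.execute(
        "SELECT thread_id, title, assistant_id FROM threads "
        "WHERE customer_id = ? ORDER BY thread_id",
        (customer_id,),
    )
    return rows.fetchall()


# Placeholder IDs for illustration only.
conn = open_store()
remember_thread(conn, "cust_1", "thread_abc", "Billing question", "asst_main")
print(threads_for(conn, "cust_1"))
```

A real deployment would also need columns or tables for the uploaded files, vector stores and expirations mentioned above; this sketch covers only the thread lookup.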

There’s no limit on thread counts or assistant counts, only limits on the API call rate for using the year-long beta.

neil12 · December 3, 2024, 9:37am

Thanks! So after a bit of playing with assistants I have learned the following:

PROS

Threads are useful for us as we need to retain a history. Prior to threads we would store the messages array in the database; however, this is impractical as you end up having to send the entire (ever-growing) history with each request.
You can store things like the JSON response schema as part of the assistant, so there is no need to send the schema as part of every request.

CONS

We want JSON schema responses, which appear to be supported only by gpt-4o-mini and not gpt-4o. This is annoying, as gpt-4o gives better results for us.
We have to stream the response, as that is the only way to know when the run has completed. The alternative would be to repeatedly poll the run to check if it has completed, which would be terrible.
API docs appear to be out of date. We are finding that what is in the docs and what is in the latest version of the SDK don’t align.
All assistants are viewable and editable directly in the playground. Super handy for debugging, but it feels somewhat risky if that assistant is being used in production.
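For anyone who does end up polling instead of streaming, a backoff loop keeps the call count well below "spam" levels. This is only a sketch: `fetch_status` is a hypothetical callable you would supply to look up the run and return its status string, and the status names below reflect my understanding of the run lifecycle, so they should be checked against the current API reference.

```python
import time

# Statuses after which further polling is pointless (assumed names).
# "requires_action" is included because the run waits for tool outputs.
STOP_STATES = {
    "completed", "failed", "cancelled", "expired", "incomplete",
    "requires_action",
}


def wait_for_run(fetch_status, initial_delay=0.5, max_delay=8.0,
                 timeout=120.0, sleep=time.sleep):
    """Poll `fetch_status()` with exponential backoff until a stop state.

    `fetch_status` is a hypothetical zero-argument callable returning the
    run's current status string. `sleep` is injectable so the loop can be
    tested without waiting.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        status = fetch_status()
        if status in STOP_STATES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"run still {status!r} after {waited:.1f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # double the wait, up to a cap


# Simulated run that finishes on the third check.
fake_statuses = iter(["queued", "in_progress", "completed"])
recorded_delays = []
result = wait_for_run(lambda: next(fake_statuses), sleep=recorded_delays.append)
print(result, recorded_delays)
```

Doubling the delay up to a cap means a short run is noticed quickly while a long one costs only a handful of status calls, which matters given the low per-minute call limits mentioned earlier in the thread.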

nickm · December 3, 2024, 12:21pm

API docs appear to be out of date. Finding that what is in the docs, vs what is in the latest version of the sdk don’t align.

The documentation does seem a bit stale in some areas. It’s also hard to understand, imho (but I am not l33t hackerz).

One thing that is unfortunate at the moment is the inability of the file search tool to extract the verbatim content used in its file search. This was a feature in v1 of the API, but they removed it for v2 for now (they said it’s coming back, though). If you try to write code to extract the content yourself from the files used in the search, it will hallucinate. Due to this, I’m not using this extracted content feature at the moment.
