You will find that the Assistants platform has limitations of its own that cannot be overcome, because it operates one way only: a document search tool that follows OpenAI’s instructions and not yours, conversation management and token costs that follow OpenAI’s wishes and not yours, and so on. You should first explore and get a good understanding of Chat Completions, and then see why you would want Assistants for server-side conversations or for the two tool helpers offered, running Python code and vector storage. Also, as a “beta” it allows only a very low total number of API calls: the since de-published limit was 60-300 per minute depending on the usage tier.
The non-AI parts (creating an assistant, creating a thread, adding messages, even untold numbers of them) do not have additional cost. It is when you start a run that you incur AI generation fees, both for the input context that OpenAI places and for what is generated.
Adding search documents does not incur a fee for the one-time run that extracts, chunks, and embeds the document, but the storage space then used has daily charges. The Python code interpreter has per-session charges that are almost per-use, because sessions expire quickly.
There is no need and no point in creating individual assistants per customer. An assistant is not a model or a conversation; it is basically a text block of instructions and some settings that can be applied to any run. Each can have one purpose or identity, and a thread can be switched to another assistant for its latest message. Reuse is a feature.
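As a rough sketch of that reuse, assuming the official `openai` Python library (v1.x) and its `client.beta.threads.runs.create` method: the assistant ID is just a parameter chosen per run, so the same thread can be answered by different assistants. The helper name `start_run` and the IDs in the usage note are placeholders, not anything from the API.

```python
def start_run(client, thread_id: str, assistant_id: str):
    """Start a run on an existing thread with whichever assistant is wanted now.

    `client` is assumed to be an `openai.OpenAI()` instance. Nothing here is
    tied to a customer, so one assistant_id can serve every user's thread.
    """
    return client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id,
    )
```

For example, `start_run(client, "thread_abc", "asst_support")` followed later by `start_run(client, "thread_abc", "asst_billing")` would answer the same thread with two different sets of instructions (both IDs are made up for illustration).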
Threads, of course, are user-specific, and you must maintain a synchronized database of your own: mapping customers to thread IDs, thread titles that you’d create and keep, and the user’s last preferred assistant for the thread, or other settings you have such as enabled tools. Because it is server-side, you have to do even more work than if it were your own chat database: maintaining the state of uploaded files in their own auto-created fast-expiring store; images that will break a thread if deleted; generated files from code; tracking the server-side expiration of message vector stores; the expiration of code interpreter downloads, or keeping your local copies against that; and the expiration of inactive threads that you can no longer engage with. Then there is maintaining file storage to prevent orphaned files and vector stores that have no metadata about their use.
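The bookkeeping for threads can be sketched with a small local table. This is a minimal illustration using Python's built-in `sqlite3`; the table layout, the function names, and the `max_idle_seconds` threshold are all assumptions for the example, and it covers only the thread mapping, not files or vector stores.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS threads (
    thread_id      TEXT PRIMARY KEY,  -- ID returned when the thread was created
    customer_id    TEXT NOT NULL,     -- your own customer key
    title          TEXT,              -- a title you create and keep yourself
    last_assistant TEXT,              -- assistant ID last used on this thread
    last_active    REAL NOT NULL      -- Unix time of the last run
);
"""

def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local mapping database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def record_thread(conn, customer_id, thread_id, title, assistant_id, now):
    """Insert a thread, or overwrite its row after each new run."""
    conn.execute(
        "INSERT OR REPLACE INTO threads VALUES (?, ?, ?, ?, ?)",
        (thread_id, customer_id, title, assistant_id, now),
    )
    conn.commit()

def threads_for(conn, customer_id):
    """List a customer's threads, most recently active first."""
    rows = conn.execute(
        "SELECT thread_id, title, last_assistant FROM threads "
        "WHERE customer_id = ? ORDER BY last_active DESC",
        (customer_id,),
    )
    return rows.fetchall()

def stale_threads(conn, now, max_idle_seconds):
    """Thread IDs idle longer than a threshold you choose (an assumption here)."""
    rows = conn.execute(
        "SELECT thread_id FROM threads WHERE last_active < ?",
        (now - max_idle_seconds,),
    )
    return [row[0] for row in rows]
```

Calling `record_thread` after every run keeps `last_assistant` and `last_active` current, and `stale_threads` gives you a list to check against whatever server-side inactivity expiration applies, so you can stop offering threads that can no longer be used.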
There’s no limitation on thread counts or assistant counts, only limitations on the API call rate while using the year-long beta.