Assistants API Retrieval Pricing: how much does this cost?

I’m not sure I understand the pricing for my application:

I need to create 40,000 assistants, each with a single text file of roughly 100 KB.

Does that mean each assistant will cost 0.20 x 0.0001 = $0.00002 per day, and that therefore the retrieval part of the cost for the entire set of 40,000 assistants will be around 40,000 x 0.20 x 0.0001 = $0.80 per day?
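To spell out my reading of the pricing, here is the arithmetic as a small sketch. Note this assumes the quoted $0.20/GB per assistant per day, and treats 100 KB as 100,000 bytes and 1 GB as 10^9 bytes; whether OpenAI bills in decimal or binary units is my assumption:

```python
# Sketch of the pricing arithmetic, assuming $0.20/GB per assistant per day.
PRICE_PER_GB_DAY = 0.20
FILE_BYTES = 100_000        # ~100 KB stored per assistant
NUM_ASSISTANTS = 40_000

gb_per_assistant = FILE_BYTES / 1e9                      # 0.0001 GB
cost_per_assistant_day = PRICE_PER_GB_DAY * gb_per_assistant
total_per_day = cost_per_assistant_day * NUM_ASSISTANTS

print(f"${cost_per_assistant_day:.5f} per assistant per day")  # $0.00002
print(f"${total_per_day:.2f} per day for all assistants")      # $0.80
```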

If not, how much will it cost?

The docs merely state:

How will Retrieval in the API be priced?

Retrieval is priced at $0.20/GB per assistant per day. If your application stores 1GB of files for one day and passes it to two Assistants for the purpose of retrieval (e.g., customer-facing Assistant #1 and internal employee Assistant #2), you’ll be charged twice for this storage fee (2 * $0.20 per day). This fee does not vary with the number of end users and threads retrieving knowledge from a given assistant.

Thank you.

There has to be some confusion here. Why are you creating 40,000 assistants? An assistant is a configuration for a conversation or thread. You can have one assistant, the Doctor Assistant, and 40,000 conversations or threads; each will have its own context, and all users will see the same persona (assistant), the Doctor Assistant. At least that is how I understand assistants.

Because I want to give each assistant exactly one unique file for retrieval context, and no other data. I want to prevent each assistant from even knowing the other files exist.


With that many unique assistants, you'd be better off running your own RAG, or maybe you can fit the 100 KB into the context window without RAG at all? (Pricier option, but no infrastructure.)

Sounds like a “unique information per person” situation. Which is not the apparent intent of the OpenAI Assistant offering. Their offering is a single entity that serves everyone with the same knowledge.


I don’t know what their offering is from the pricing and docs sections. Hence my question and calculations above.

If you know the official answer (eg that the costs above are incorrect, and what the correct ones are) please let me know.

I don’t see any max assistants listed in any of my quotas or the docs. But it’s not wise to assume they are infinite either.

Just start building them, and let us know! Maybe OpenAI can chime in here.


Will do if I find an answer. Alternatively if we could set specific documents for different instances of the same assistant, or just to enable and disable docs for each thread, that would work too.

If you find out please let me know.

40,000 assistants is insane. As @curt.kennedy mentioned, you should first try whether you can even create 40,000 to begin with. I don't think that this is possible.

If you want 40,000 contained instances for documents you would be MUCH better off just using metadata along with some business logic to retrieve the documents (the same business logic that would match the user to the assistant).
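To illustrate what "metadata along with some business logic" could look like: the same lookup that would have matched a user to one of 40,000 assistants can instead match the user to a single file ID, reused with one assistant. All the names here are hypothetical, just a sketch of the idea:

```python
# Sketch (not the Assistants API itself): one assistant, with business
# logic mapping each user to exactly one retrieval file.
user_to_file = {
    "user_123": "file-abc",
    "user_456": "file-def",
}

def file_for_user(user_id: str) -> str:
    """Business logic: one user -> exactly one document, nothing else visible."""
    try:
        return user_to_file[user_id]
    except KeyError:
        raise ValueError(f"no document registered for {user_id!r}")

print(file_for_user("user_123"))  # file-abc
```

The isolation requirement is then enforced by your own code rather than by assistant boundaries.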

For pricing, nobody (besides staff) will know. As of now it’s free so we can’t check our billing.


do you mean standard RAG with something like pinecone + normal openai chat?

If not, what do you mean?

If yes, I know that can work but it would not be better. It would be more complicated. It’s simpler to create 40k assistants if that’s allowed and if the stated pricing and limits are indeed correct.

Your program must somehow match the user to the correct assistant, right?

Why not just have one assistant, and instead match the user to the document? You can attach metadata to track which document has been attached

Because I don’t want to match a user to a document. I want a gpt assistant to interact in a chat interface using the single file for RAG.

In other words, I want exactly what 40k assistants are.


You can attach files to your messages
(Screenshot, 2023-11-08: create-message API documentation)


You can at creation time, not at message time.

One way forward would be to create a new assistant on the spot each time I need to chat with one particular document, but that would probably add latency and make things very weird.

Since this is a consumer product, this might effectively mean 10,000 assistants being created and then immediately deleted every day, which would make the per-day $0.20/GB storage fee ambiguous and is likely not going to be OK with OpenAI even if latency were not a problem.

That’s the reason why creating 40k assistants, each with one file, would make sense, as long as the pricing section is in fact correct.

The documentation I showed you is for creating a message, at message time.

But I think you have already decided what you're going to do. The first step is to try to create 40,000 assistants, or even to cycle through 10,000 a day.


That’s not for adding files. It just references which already-uploaded files should be used for the message. The maximum number of files is 20, not 40,000, so it wouldn’t work for this.

Good to know that you can select which of the 20 files you want to use, but not workable for this case it seems.

You attach the file to the thread. It still makes no sense to me why you would want to create 40,000 assistants.

So that each chat session or thread has access to one and only one file out of 40,000.

It may be worth investigating. You can only upload 20 files directly to an assistant to always potentially use for retrieval. But, you can upload as many files as you want to your directory:

Upload a file that can be used across various endpoints/features. The size of all the files uploaded by one organization can be up to 100 GB.

And thinking further, this is a “hacky” way to bypass the limits. If it does work then it will only be for a while, and you could risk losing your account.


Let me assume that you have 40,000 customers and you want each customer to be able to talk to their “private consultant” based on some individual data of that person.
I would not put the personal data of so many people into the cloud system; if problems appear, you will be in trouble.
You should rather think of a private web backend that uses the chat API with the data of the logged-in customer.
That seems safer for customer data, and should not be a big problem when using the chat endpoint with the 128k context window.
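One possible shape of that private backend: fetch the logged-in customer's document server-side and pass it in the prompt of a plain chat-completion request, with no Assistants and no retrieval storage fee. The document store, `load_customer_document`, and the model name are all my assumptions here:

```python
# Sketch of a private backend: the customer's document never leaves
# your control except as prompt content in a single chat request.
def load_customer_document(customer_id: str) -> str:
    # Placeholder for your own storage (database, S3, disk, ...).
    docs = {"cust-1": "Policy text for customer 1..."}
    return docs[customer_id]

def build_chat_request(customer_id: str, question: str) -> dict:
    doc = load_customer_document(customer_id)
    return {
        "model": "gpt-4-1106-preview",  # 128k context window
        "messages": [
            {"role": "system",
             "content": f"Answer using only this document:\n\n{doc}"},
            {"role": "user", "content": question},
        ],
    }

req = build_chat_request("cust-1", "What does my policy cover?")
print(req["messages"][0]["content"])
```

The returned dict is exactly what you would POST to the chat completions endpoint; nothing about the other 39,999 customers ever appears in the request.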
