I’m not sure I understand the pricing for my application:
I need to create 40,000 assistants, each with a single roughly 100 KB text file.
Does that mean each assistant will cost $0.20 × 0.0001 = $0.00002 per day, and that the retrieval part of the cost for the entire set of 40,000 assistants will therefore be around 40,000 × $0.20 × 0.0001 = $0.80 per day?
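For reference, here is the arithmetic behind that estimate as a small sketch, assuming the documented rate of $0.20/GB per assistant per day and one 100 KB file per assistant:

```python
# Cost estimate for retrieval storage, per the docs' $0.20/GB/assistant/day rate.
RATE_PER_GB_PER_DAY = 0.20
FILE_SIZE_GB = 100 / 1_000_000   # 100 KB expressed in GB (~0.0001 GB)
NUM_ASSISTANTS = 40_000

cost_per_assistant = RATE_PER_GB_PER_DAY * FILE_SIZE_GB   # dollars per assistant per day
total_cost_per_day = cost_per_assistant * NUM_ASSISTANTS  # dollars per day for all assistants

print(f"${cost_per_assistant:.5f} per assistant/day, ${total_cost_per_day:.2f} total/day")
```

So if the fee really is billed pro rata by file size, the numbers in the question check out: $0.00002 per assistant and $0.80 for all 40,000 per day.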
If not, how much will it cost?
The docs merely state:
How will Retrieval in the API be priced?
Retrieval is priced at $0.20/GB per assistant per day. If your application stores 1GB of files for one day and passes it to two Assistants for the purpose of retrieval (e.g., customer-facing Assistant #1 and internal employee Assistant #2), you’ll be charged twice for this storage fee (2 * $0.20 per day). This fee does not vary with the number of end users and threads retrieving knowledge from a given assistant.
There has to be some confusion here. Why are you creating 40,000 assistants? An assistant is a configuration for a conversation or thread. You can have one assistant, the Doctor Assistant, and have 40,000 conversations or threads; each will have its own context, and all users will see the same persona (the Doctor Assistant). At least, that is how I understand assistants.
With that many unique assistants, you'd be better off running your own RAG, or maybe you can fit the 100 KB into the context window without RAG at all? (Pricier option, but no infrastructure.)
Sounds like a “unique information per person” situation. Which is not the apparent intent of the OpenAI Assistant offering. Their offering is a single entity that serves everyone with the same knowledge.
40,000 assistants is insane. As @curt.kennedy mentioned, you should first try simply creating 40,000 of them. I don't think it's even possible.
If you want 40,000 contained instances for documents you would be MUCH better off just using metadata along with some business logic to retrieve the documents (the same business logic that would match the user to the assistant).
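A minimal sketch of that metadata approach, assuming each user maps to exactly one document (the `user_documents` store and field names here are hypothetical; in practice this would be a database table):

```python
# Hypothetical metadata store: one document per user, keyed by user ID.
# A dict stands in for what would normally be a database table.
user_documents = {
    "user_123": {"doc_id": "doc_a", "path": "docs/doc_a.txt"},
    "user_456": {"doc_id": "doc_b", "path": "docs/doc_b.txt"},
}

def document_for_user(user_id: str) -> str:
    """Business logic that matches a user to their document --
    the same matching that would otherwise pick one of 40,000 assistants."""
    record = user_documents.get(user_id)
    if record is None:
        raise KeyError(f"no document registered for {user_id}")
    return record["doc_id"]

print(document_for_user("user_123"))  # doc_a
```

The lookup replaces the per-user assistant entirely: one shared assistant (or plain chat endpoint) receives whichever document the metadata resolves to.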
For pricing, nobody (besides staff) will know. As of now it’s free so we can’t check our billing.
One way forward would be to create a new assistant on the spot each time I need to chat with one particular document, but that would probably add latency and make things very weird.
Since this is a consumer product this might effectively mean 10k assistants would be created and then immediately deleted every day, which would introduce ambiguity to the 20c per day cost and is likely not going to be ok with OpenAI even if latency were not a problem.
Let me assume that you have 40,000 customers and you want each customer to be able to talk to their "private consultant" based on that person's individual data.
I would not put personal data from so many people into the cloud system; if problems arise, you will be in trouble.
You should instead think of a private web backend that uses the Chat API with the data of the logged-in customer.
That seems safer for customer data to me, and it should not be a big problem when using the chat endpoint with the 128k context window.
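A sketch of that backend pattern, assuming the logged-in customer's document (~100 KB) is small enough to inline into the prompt. How you fetch the document from your own storage is up to you; only the payload-building step is shown here:

```python
def build_chat_messages(document_text: str, question: str) -> list[dict]:
    """Build a Chat Completions message list that inlines the customer's
    private document instead of relying on hosted retrieval."""
    return [
        {
            "role": "system",
            "content": (
                "Answer using only the customer's document below.\n\n"
                f"--- DOCUMENT ---\n{document_text}"
            ),
        },
        {"role": "user", "content": question},
    ]

# Your backend would then send this with its OpenAI client, e.g. a
# 128k-context model such as gpt-4-turbo:
#   client.chat.completions.create(model="gpt-4-turbo", messages=messages)
messages = build_chat_messages("Policy: refunds within 30 days.", "Can I get a refund?")
print(messages[1]["content"])  # Can I get a refund?
```

Because the document travels with each request, nothing customer-specific is stored on OpenAI's side between calls, and there is no per-assistant storage fee at all.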