I’m not sure I understand the pricing for my application:
I need to create 40,000 assistants, each with a single text file of roughly 100 KB.
Does that mean each assistant will cost 0.20 x 0.0001 = $0.00002 per day, and that therefore the retrieval part of the cost for the entire set of 40,000 assistants will be around 40,000 x 0.20 x 0.0001 = $0.80 per day?
If not, how much will it cost?
The docs merely state:
How will Retrieval in the API be priced?
Retrieval is priced at $0.20/GB per assistant per day. If your application stores 1GB of files for one day and passes it to two Assistants for the purpose of retrieval (e.g., customer-facing Assistant #1 and internal employee Assistant #2), you’ll be charged twice for this storage fee (2 * $0.20 per day). This fee does not vary with the number of end users and threads retrieving knowledge from a given assistant.
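For reference, here is the arithmetic spelled out, assuming the fee is pro-rated linearly by file size (the docs quote a per-GB rate but don't explicitly confirm pro-rating for files far below 1 GB):

```python
# Sanity check of the retrieval cost estimate, assuming the $0.20/GB/day
# fee is pro-rated linearly by stored file size (not confirmed in the docs).
RATE_PER_GB_PER_DAY = 0.20
FILE_SIZE_GB = 100_000 / 1_000_000_000  # 100 KB in decimal GB = 0.0001
NUM_ASSISTANTS = 40_000

per_assistant_daily = RATE_PER_GB_PER_DAY * FILE_SIZE_GB   # $0.00002
total_daily = per_assistant_daily * NUM_ASSISTANTS         # $0.80

print(f"Per assistant:  ${per_assistant_daily:.5f}/day")
print(f"All assistants: ${total_daily:.2f}/day")
```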
There has to be a confusion. Why are you creating 40,000 assistants? An assistant is a configuration for a conversation or thread. You can have one assistant, the Doctor Assistant, and 40,000 conversations or threads; each of them will have its own context, and the users will all see the same persona (assistant), the Doctor Assistant. At least that is how I understand assistants.
Because I want to give each assistant exactly one unique file for retrieval context, and no other data. I want to prevent each assistant from even knowing about the other files.
With that many unique assistants, you’d be better off just running your own RAG, or maybe you can fit the 100 KB into the context window without RAG? (Pricier option, but no infrastructure.)
Sounds like a “unique information per person” situation, which is not the apparent intent of the OpenAI Assistants offering. Their offering is a single entity that serves everyone with the same knowledge.
Will do if I find an answer. Alternatively if we could set specific documents for different instances of the same assistant, or just to enable and disable docs for each thread, that would work too.
40,000 assistants is insane. As @curt.kennedy mentioned, you should just try creating 40,000 to begin with. I don’t think that is even possible.
If you want 40,000 contained document instances, you would be MUCH better off just using metadata along with some business logic to retrieve the documents (the same business logic that would match the user to the assistant); see the sketch below.
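A minimal sketch of that idea, assuming one shared assistant and a hypothetical user-to-file mapping (in the current beta, messages accept a `file_ids` list referencing already-uploaded files):

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical mapping maintained by your own business logic: the same
# lookup that would have picked an assistant picks a file ID instead.
# In practice this lives in your database, keyed however you key users.
USER_TO_FILE_ID = {
    "user_123": "file-abc123",  # ID returned when the file was uploaded
}

def start_thread_for_user(user_id: str, question: str):
    thread = client.beta.threads.create()
    # Attach only this user's file to the message, so retrieval in this
    # thread can only ever see that one document.
    client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=question,
        file_ids=[USER_TO_FILE_ID[user_id]],
    )
    # One shared assistant serves everyone; "asst_..." is a placeholder.
    return client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id="asst_...",
    )
```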
For pricing, nobody (besides staff) will know. As of now it’s free so we can’t check our billing.
Do you mean standard RAG with something like Pinecone + normal OpenAI chat?
If not, what do you mean?
If yes, I know that can work but it would not be better. It would be more complicated. It’s simpler to create 40k assistants if that’s allowed and if the stated pricing and limits are indeed correct.
One way forward would be to create a new assistant on the spot each time I need to chat with one particular document, but that would probably add latency and make things very weird.
Since this is a consumer product, this might effectively mean 10k assistants would be created and then immediately deleted every day, which would introduce ambiguity into the $0.20/GB/day cost and is likely not going to be OK with OpenAI even if latency were not a problem.
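For anyone who wants to test that “create on the spot” path, a rough sketch with the Python SDK (the model name is an assumption, and run polling is omitted):

```python
from openai import OpenAI

client = OpenAI()

def chat_with_document(file_id: str, question: str) -> str:
    # Create a throwaway assistant scoped to exactly one uploaded file.
    assistant = client.beta.assistants.create(
        name="single-doc-assistant",
        model="gpt-4-1106-preview",  # assumption: any retrieval-capable model
        tools=[{"type": "retrieval"}],
        file_ids=[file_id],
    )
    try:
        thread = client.beta.threads.create()
        client.beta.threads.messages.create(
            thread_id=thread.id, role="user", content=question
        )
        run = client.beta.threads.runs.create(
            thread_id=thread.id, assistant_id=assistant.id
        )
        # ... poll the run status and read the reply here; omitted for brevity.
        return run.id
    finally:
        # Delete immediately so the per-assistant storage fee (whatever its
        # granularity turns out to be) stops accruing.
        client.beta.assistants.delete(assistant.id)
```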
But I think you have decided what you’re going to do. I think the first step is to try to even create 40,000 assistants, or to try cycling through 10,000 a day.
That’s not for adding files, it’s just for referencing which already-uploaded files should be used for the message. And the maximum number of files per assistant is 20, not 40,000, so it wouldn’t work for this.
Good to know that you can select which of the 20 files you want to use, but it doesn’t seem workable for this case.
It may be worth investigating. You can only attach 20 files directly to an assistant for it to always potentially use for retrieval. But you can upload as many files as you want to your organization’s file storage:
Upload a file that can be used across various endpoints/features. The size of all the files uploaded by one organization can be up to 100 GB.
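In other words, uploads go through the general files endpoint and are bounded only by the 100 GB org limit, not the 20-files-per-assistant limit. A sketch (the file name is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Files uploaded with purpose="assistants" land in org-level storage
# (up to 100 GB total); they only count against the 20-file assistant
# limit once you actually attach them to an assistant.
uploaded = client.files.create(
    file=open("customer_42.txt", "rb"),
    purpose="assistants",
)
print(uploaded.id)  # e.g. "file-abc123"; store this in your own database
```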
And thinking further, this is a “hacky” way to bypass the limits. If it does work, it will probably only work for a while, and you could risk losing your account.
Let me assume that you have 40,000 customers and you want each customer to be able to talk to their “private consultant” based on some individual data of that person.
I would not put the personal data of that many people into the cloud system; if problems come up, you will be in trouble.
You should rather think of a private web backend that uses the chat API with the data of the logged-in customer.
That seems safer to me for customer data, and it should not be a big problem when using the chat endpoint with a 128k context window.
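Something along these lines (a sketch; the storage path and model name are assumptions, and the point is that only the logged-in customer’s data ever reaches the API):

```python
from openai import OpenAI

client = OpenAI()

def load_customer_document(customer_id: str) -> str:
    # Hypothetical private storage in your own backend; the customer's
    # file never sits in OpenAI's file storage at all.
    with open(f"/srv/customer-data/{customer_id}.txt", encoding="utf-8") as f:
        return f.read()

def consult(customer_id: str, question: str) -> str:
    document = load_customer_document(customer_id)
    # ~100 KB of text fits comfortably within a 128k-token context window.
    response = client.chat.completions.create(
        model="gpt-4-1106-preview",  # assumption: any 128k-context chat model
        messages=[
            {"role": "system",
             "content": "Answer only from the customer's document below.\n\n"
                        + document},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```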