I hope you are well. I am building a mobile app that uses the GPT Assistants API to power the avatar that users will be chatting with.
I am trying to build a financial model, and it is very difficult to estimate what the Assistants will cost, because it obviously depends on tokenisation and so on.
How would you go about building a method sophisticated enough to make the cost assumption for the OpenAI Assistants as accurate as possible?
For retrieval, if you’re using 1 GB of data, it’s $0.20/GB/day. Adding that, your total would be $6.21 for the tokens plus $0.20 for the retrieval, making it $6.41 in total. You can then adjust that based on how large the average file in your context would be.
Prices can change, so check OpenAI’s pricing page for the latest. Hope that helps clear things up! Let me know if you need more clarification.
You can calculate a rough estimate based on the system prompt length and the length of the conversations you want to store as context. Here is the pricing list for the different models: https://openai.com/api/pricing/
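For illustration, here is a minimal sketch of that kind of estimate in Python. The per-token prices are placeholders (look them up on the pricing page), and I’m assuming the cl100k_base tokeniser used by recent GPT models:

```python
# Rough per-turn cost estimate. Prices are placeholders --
# check https://openai.com/api/pricing/ for current numbers.
import tiktoken

INPUT_PRICE_PER_1K = 0.01   # assumed $/1K input tokens (placeholder)
OUTPUT_PRICE_PER_1K = 0.03  # assumed $/1K output tokens (placeholder)

enc = tiktoken.get_encoding("cl100k_base")

def estimate_turn_cost(system_prompt: str, history: list[str],
                       expected_reply_tokens: int) -> float:
    # The whole thread (system prompt + stored conversation) is re-sent as
    # input on every run, so input tokens grow as the conversation does.
    input_tokens = len(enc.encode(system_prompt)) \
        + sum(len(enc.encode(m)) for m in history)
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
        + (expected_reply_tokens / 1000) * OUTPUT_PRICE_PER_1K

print(estimate_turn_cost("You are a friendly avatar.", ["Hi!", "Hello there!"], 300))
```

Run that over your expected conversation lengths and you get a cost curve per user rather than a single guess.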
From what I have been reading and exploring, it seems the document tokens are also added to each user message, which can sometimes be up to 16,000 tokens.
Yes. You can set limits on the Assistant’s response: you can customise how many tokens it is allowed to respond with. And you of course give the assistant the message, so you decide the input tokens.
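For example, a sketch using the v2 run parameters max_prompt_tokens and max_completion_tokens (verify the names against the current API reference):

```python
from openai import OpenAI

client = OpenAI()

# Caps on tokens a single run may consume and emit (Assistants API v2).
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",   # placeholder IDs
    assistant_id="asst_abc123",
    max_prompt_tokens=4000,      # cap on input tokens for the run
    max_completion_tokens=500,   # cap on tokens the assistant may emit
)
```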
All of this information is on the OpenAI pricing website, though it is not easy to interpret.
When the AI within Assistants emits a tool call and receives a language response, these are also stored as messages within the thread with their own roles, and they are not presented to you as messages you can retrieve. The AI can then continue with successive tool calls, or finally produce output for the user.
These tool responses remain part of the thread and are re-sent with every run until they expire, either because of the limited model context length or because you use truncation_strategy to limit the number of past turns.
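Something like this, assuming the v2 Python SDK:

```python
from openai import OpenAI

client = OpenAI()

# Keep only the last N messages in the context sent to the model.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",   # placeholder IDs
    assistant_id="asst_abc123",
    truncation_strategy={"type": "last_messages", "last_messages": 10},
)
```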
The “document tokens” figure above came from a private message that was solicited. The file search that can be run on uploaded documents is another tool the AI can use internally, with the results added to the thread as context. The token count comes from the documents being chunked into 800-token pieces (with overlap), and the search returning 20 chunks (with no relevancy cutoff applied), for 800 × 20 = 16,000, varying with how many partial chunks or chunks from small documents land in the results. (gpt-3.5 gets 5 chunks back.)
There are new parameters that let you specify the number of returned chunks, along with the file chunking size, when you create a vector store.
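A sketch of those knobs, assuming the v2 endpoints in the Python SDK (verify the parameter names against the current docs):

```python
from openai import OpenAI

client = OpenAI()

# Smaller chunks at indexing time...
vector_store = client.beta.vector_stores.create(
    name="avatar-docs",  # placeholder name
    chunking_strategy={
        "type": "static",
        "static": {"max_chunk_size_tokens": 400, "chunk_overlap_tokens": 100},
    },
)

# ...and fewer returned chunks at query time both shrink the file_search
# overhead injected into each run, e.g. 5 x 400 = 2,000 tokens instead of
# the default 20 x 800 = 16,000.
assistant = client.beta.assistants.create(
    model="gpt-4-turbo",  # placeholder model
    tools=[{"type": "file_search", "file_search": {"max_num_results": 5}}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```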
Assistants does not have a high-quality way to limit its response length; that must instead be done through instructions to the AI. The current parameters only terminate output after the cost has already been incurred, especially if you set unrealistically low values that do not give the tools room to operate.
Sorry for the late reply.
Due to the limited flexibility, I moved to LangChain to build the RAG pipeline. Now things like retrieval, trimming the messages, etc. are much more flexible to adjust as we want, and it really seems to work.
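For anyone curious, the message-trimming part looks roughly like this in LangChain (a sketch using langchain_core’s trim_messages helper; check the LangChain docs for the current signature):

```python
from langchain_core.messages import (
    AIMessage, HumanMessage, SystemMessage, trim_messages,
)

history = [
    SystemMessage(content="You are the avatar."),
    HumanMessage(content="Hi!"),
    AIMessage(content="Hello! How can I help?"),
    HumanMessage(content="Tell me about my spending this month."),
]

# Keep the system message plus the most recent turns. Using len as the
# counter treats each message as one unit; swap in a real token counter
# (or a chat model) for token-based trimming.
trimmed = trim_messages(
    history,
    max_tokens=3,
    strategy="last",
    token_counter=len,
    include_system=True,
)
```

Because you control this step yourself, you decide exactly what gets re-sent each turn, which is precisely the cost lever the Assistants thread hides from you.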