As a Ph.D. student doing AI-in-education research, I've been using OpenAI's Assistants API for my project. My experience has led to some important observations and concerns:
Retrieval Charges: Despite OpenAI stating that retrieval is free until 1/12/2024, I was charged for each retrieval of my PDF, which significantly inflated my costs.
Token Count Discrepancy: The API appears to read the raw PDF data, resulting in an inflated token count and higher costs. In my case, I counted 3,566 tokens with the official tokenizer, while the Assistants API billed around 13k tokens for the retrieved document.
Tokenization Limitation: The API appends the entire conversation thread, including any PDFs (when retrieval is active), to each message. It keeps appending until the thread accumulates roughly 128k tokens (the GPT-4 Turbo context limit).
Context Window Management: OpenAI’s current setup does not allow users to control the length of the context window. While OpenAI is considering enabling this feature, there’s no definitive timeline or update.
Documentation Clarity on Threads: The official documentation lacks clear guidance on the cost per thread. Questions about thread creation costs, management, deletion, and whether these can be controlled via the API remain unanswered.
Cost Analysis:
Expected Cost: Based on OpenAI’s pricing and official tokenizer, I calculated the expected cost for my usage as $26.07.
Incurred Cost: The actual cost came to $189.40, significantly higher than expected. This includes charges for failed attempts, which are not clearly outlined in OpenAI's pricing model. The inflated costs were driven mainly by the re-retrieval of the document for every message and the appending of the entire conversation thread to each new message.
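The compounding described above can be sketched with rough numbers. The ~13k-token document size is the observation from earlier in this post; the per-turn size and the input price are assumptions (the price shown is the commonly cited gpt-4-1106-preview input rate, but check the current pricing page):

```python
DOC_TOKENS = 13_000        # tokens billed for the retrieved PDF each run (observed above)
TURN_TOKENS = 200          # assumed average tokens added per message (illustrative)
CONTEXT_CAP = 128_000      # GPT-4 Turbo context limit
PRICE_PER_1K_INPUT = 0.01  # assumed gpt-4-1106-preview input price, USD per 1k tokens

def estimated_input_cost(n_messages: int) -> float:
    """Estimate input-token cost when every run re-sends the document
    plus the entire thread so far (capped at the context limit)."""
    total = 0
    for i in range(1, n_messages + 1):
        total += min(DOC_TOKENS + i * TURN_TOKENS, CONTEXT_CAP)
    return total / 1000 * PRICE_PER_1K_INPUT
```

Under these assumptions, 100 messages already cost roughly $23 in input tokens alone, even though the document's actual text is only ~3.5k tokens; the growth is roughly quadratic in message count until the context cap kicks in.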
I conducted a few preliminary tests before proceeding to a full run. In my initial tests, I sent a few messages without looping over the prompts. Since these early conversations were brief, the cost per message appeared minimal and didn't raise any concerns. However, due to time constraints in my research, I soon progressed to looping over the prompts and wasn't able to monitor the cost during the run. It was in this phase that the significant costs, previously unnoticed in the shorter tests, became apparent.
In summary, my experience with the Assistant API has been financially burdensome, contradicting OpenAI’s claims of cost-efficiency. The lack of transparency in pricing and the apparent hidden costs have made it challenging to continue the use of OpenAI’s GPT models.
Persistently steering new users, and anyone proposing consumer-facing products, away from Assistants is a wise choice. They don't need to experience this for themselves.
The only case I could even make for Assistants is an internal company application, where substantial costs can be absorbed alongside other $4,000 per-seat licenses.
Thanks for the response. Glad to be part of the community. Looking at the documentation, it's not very clear whether they were talking about storage. I wasn't concerned about storage anyway, because my file is under 50 KB. Plus, I even computed the cost assuming the retrieval feature appended the PDF to each message, and the costs still didn't add up. I don't see why providing a PDF would be useful if they are going to append the whole PDF to every message in the thread.
Thanks for your response. So far, due to numerous hidden costs and a lack of detail in the documentation, it only looks good on paper. If they actually provided more control over retrieval and the context window, it might be worthwhile. In its current state, I don't see it being cost-effective for any company whose goal is to deploy a GPT-based app to a large audience.
I hope context window management is updated soon. Casual use cases won't necessarily need the full chat history up to 128k tokens; it's just expensive. I love the Assistants API and already use it for my small team; I just need more control.
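Until that control exists, one workaround is to skip the Assistants thread entirely and manage history yourself against the Chat Completions endpoint. A minimal sketch of client-side truncation; the helper and its token-counting callback are illustrative, not part of any SDK:

```python
def truncate_history(messages, max_tokens, count_tokens):
    """Keep the first (system) message plus the most recent turns whose
    combined token count fits within max_tokens.

    `messages` is a list of chat-style dicts; `count_tokens` is any
    callable mapping one message dict to its token count (e.g. backed
    by tiktoken in real use).
    """
    system, rest = messages[:1], messages[1:]
    budget = max_tokens - sum(count_tokens(m) for m in system)
    kept = []
    for m in reversed(rest):          # walk from newest to oldest
        cost = count_tokens(m)
        if cost > budget:
            break                     # older turns no longer fit
        kept.append(m)
        budget -= cost
    return system + list(reversed(kept))
```

The truncated list is then passed straight to a Chat Completions call, so each request only pays for the window you chose rather than the full thread.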
I’ve been using the assistant API since the beginning too. Costs are way up and I’ve been getting a ton of failed runs lately. I will probably switch back tonight and wait until it’s a bit more mature.
I was just browsing the threads management page and noticed why I was getting failed runs. I don't think the API was returning a proper error message; I'll have to double-check my error handling. All I was getting was a failed-run error, as I recall:
Rate limit reached for gpt-4-1106-preview in organization X on tokens_usage_based per day: Limit 500000, Used 497557, Requested 4096. Please try again in 4m45.638s. Visit https://platform.openai.com/account/rate-limits to learn more.
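When a run fails with this kind of rate-limit error, the suggested wait time is embedded in the message text. A small sketch for parsing it so a retry loop can back off; the exact message format isn't documented and may change, hence the fallback (pure Python, no SDK calls assumed):

```python
import re

def parse_retry_after(message: str, default: float = 60.0) -> float:
    """Pull the suggested wait (in seconds) out of a rate-limit message
    like '... Please try again in 4m45.638s.' Returns `default` if the
    expected pattern isn't found."""
    m = re.search(r"try again in (?:(\d+)h)?(?:(\d+)m)?([\d.]+)s", message)
    if m is None:
        return default
    hours = int(m.group(1) or 0)
    minutes = int(m.group(2) or 0)
    seconds = float(m.group(3))
    return hours * 3600 + minutes * 60 + seconds
```

For the message above, this yields 285.638 seconds; sleeping for that long (plus a small buffer) before re-creating the run avoids burning further requests against an exhausted daily token quota.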
Thanks for elaborating on your perspective. I confirm your well described observations and frustrations. It is pretty absurd that the tokens for instructions and data are counted with each message. The way Retrieval is handled and charged today kills most business cases.