Earlier, I used Chat Completions for my application with the GPT-4-turbo model and had costs under $1 a day. But due to my application's requirements, I had to switch to the Assistants API with the GPT-4-turbo model, along with the File Search functionality. Since then, costs have exploded and continue to do so ($8-10 per day).
My average daily cost clearly went up once I started using retrieval, but I see no direct link between the daily cost and how much I actually use the Assistant; it feels random. A peak of about $12 (23/04/2024) is what made me start digging into this. I found it was caused by an enormously large number of context tokens (on the order of 1 million) being fed to the model in order to generate the response.
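For anyone who wants to check this on their own runs, here is a minimal sketch of how I inspect the per-run token breakdown (assuming a recent `openai` Python SDK and the v2 Assistants API; `ASSISTANT_ID` and `THREAD_ID` are placeholders for your own resources):

```python
from openai import OpenAI

client = OpenAI()

# Create a run and wait for it to reach a terminal state.
run = client.beta.threads.runs.create_and_poll(
    thread_id="THREAD_ID",        # placeholder
    assistant_id="ASSISTANT_ID",  # placeholder
)

# Once the run completes, run.usage breaks down the billed tokens.
# prompt_tokens is where the huge "context token" counts show up.
print(run.usage.prompt_tokens, run.usage.completion_tokens, run.usage.total_tokens)
```

In my case, virtually all of the cost lands in `prompt_tokens`, not in the completions.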
These context tokens are beyond my understanding. In my use case, not every request should involve the file search mechanism: only specific user queries require retrieval from the uploaded data, and I have explicitly stated in the instructions prompt when it should be used. Yet it appears the entire context is being sent with each and every request. We have zero transparency into what happens behind the scenes with these context tokens, and as a developer integrating OpenAI's resources into my application and experimenting, the expenses I have to bear are far too high. How can I think of putting this into production with such enormous costs?
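The only per-run workarounds I have found so far are sketched below (based on my reading of the v2 run-creation parameters; the token cap and message count are illustrative values, and whether these reliably bound the file search context is exactly what I would like OpenAI to clarify):

```python
from openai import OpenAI

client = OpenAI()

# For queries that should NOT touch the uploaded files, force the model
# to answer without tools; for retrieval queries, use "auto" instead.
run = client.beta.threads.runs.create_and_poll(
    thread_id="THREAD_ID",        # placeholder
    assistant_id="ASSISTANT_ID",  # placeholder
    tool_choice="none",           # skip file_search for this run
    max_prompt_tokens=20000,      # hard cap on billed prompt tokens (illustrative)
    # Only feed the most recent thread messages into the context window:
    truncation_strategy={"type": "last_messages", "last_messages": 5},
)
```

Even with these, I still cannot see or control how many retrieved chunks file_search injects per call, which is the actual source of the problem.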
When I look into 'My activity', the high costs are all attributed to these mysterious 'context tokens'.
Releasing something in beta while it still has issues is fine, but benefiting from that by charging testers inexplicably high amounts compared to normal costs is unjustifiable.
This is a genuine request to OpenAI: kindly look into this issue and offer a solution or method that lets me control these costs. I would also like to hear whether anything is in progress to give us some control over, and transparency into, these context tokens.