This seems to be the conclusion I’ve come to as well. Am I right in thinking that we are billed for every token in every run step, as well as for every run status response? I’m not clear on which tokens I’m being charged for; I just know that when I use GPT-4 Turbo it’s incredibly expensive, and when I use GPT-3.5 Turbo it’s bad at retrieval. Either I’m doing something very wrong, or they are really going to have to restructure their pricing if they want people to build assistants at all, let alone put them into production.
Another revelation that is undocumented but almost designed to burn through tokens: there seems to be a function for browsing retrieval files in parts, like paging through a web page. That gives the model yet another way to run again and again, billing you for a fully loaded context each time.
I would normally dump the prompts and functions and document the whole thing, but I can save money by simply recommending that the feature be avoided and shut off.
If I upload a file, create an assistant with that file, ask a question, and then delete everything, how much should I expect to be charged? Does the time-based storage pricing round up to one day automatically?
@_j So we are creating new threads every so often to mitigate this, but even so we are being charged for more than our apparent token consumption. Have you heard anything about charges exceeding token usage?
Yes, you are going to be charged unseen and unknown amounts. OpenAI went out of their way to return no token usage and no record of the internal language used when calling functions, and then combined that with a new usage page that removes the prior by-the-minute token breakdown.
You write an “instruction” to guide the AI. Is that wrapped in hundreds of tokens telling the AI what it can and cannot do with a “user” instruction provided there, like custom instructions in ChatGPT?
You upload a file. Does that come with 500 tokens of instructions and functions telling the AI to recursively browse little chunks of the file? (Answer: yes it does. There is a `myfiles_browser` function that is not documented as an expense and is allowed to operate recursively, “browsing” your uploaded files.)
You enable Code Interpreter. More unseen system instructions telling it how to act, how to call the tool, and how to format output. Throw in a few more instructions to iterate until it’s sure?
And then it calls one of those functions you didn’t program yourself, and it continues to loop because of a faulty context system that doesn’t let the AI fix its errors, all while the thread is loaded up to the model’s maximum context and retrieval is loaded up to the model’s maximum context (which is exactly what the documentation describes).
All of that can make a “test, are you there” message approach $1.
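To put a rough number on that: here is a back-of-the-envelope sketch. The per-1K input price and 128K context window are assumptions based on the gpt-4-1106-preview figures published at the time; check the current pricing page before relying on them.

```python
# Rough input-token cost of Assistants run steps.
# Assumed figures (verify against OpenAI's current pricing page):
GPT4_TURBO_INPUT_PER_1K = 0.01   # USD per 1K input tokens (gpt-4-1106-preview)
CONTEXT_WINDOW = 128_000         # tokens

def step_cost(input_tokens: int, price_per_1k: float) -> float:
    """Input-token cost of one model invocation."""
    return input_tokens / 1000 * price_per_1k

# One step whose thread/retrieval context is loaded to the maximum:
full_step = step_cost(CONTEXT_WINDOW, GPT4_TURBO_INPUT_PER_1K)
print(f"${full_step:.2f} per fully loaded step")  # → $1.28 per fully loaded step

# Three hidden browsing/retrieval iterations at ~30K tokens each:
print(f"${3 * step_cost(30_000, GPT4_TURBO_INPUT_PER_1K):.2f}")  # → $0.90
```

A single maxed-out step already exceeds $1 of input tokens alone, before output tokens or any hidden tool iterations are counted.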
Until OpenAI addresses multiple issues, all of which are at the developer’s expense, I would avoid this system and look at developing your own RAG solution for “answer questions about my documents”.
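For reference, the core of a do-it-yourself RAG loop is small. A minimal sketch follows; the word-overlap scoring is a toy stand-in for a real embeddings API (e.g. cosine similarity over embedding vectors), and the chunk size and top-k are arbitrary choices:

```python
def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query: str, passage: str) -> float:
    """Toy relevance score via word overlap; swap in embedding similarity."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / (len(q) or 1)

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Top-k chunks by score; only these go into the prompt."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble a bounded prompt: you decide how much context each call gets."""
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The point is that you control exactly how many tokens go into each call, instead of a thread silently loading itself to the model’s maximum context.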
Yeah, after some testing I found the costs can vary wildly too. I do appreciate how the Assistant creates its own steps, but the downside is the economics and the step polling. I was able to get my example running smoothly, but there are quite a few steps involved in getting the messages and tool outputs to display on the front end.
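For anyone counting those steps, the polling part looks roughly like this. It is sketched against a generic `fetch_status` callable (which in practice would wrap something like a run-retrieval API call); the interval, timeout, and terminal-status set are assumptions:

```python
import time

def poll_run(fetch_status, interval: float = 1.0, timeout: float = 120.0,
             sleep=time.sleep) -> str:
    """Poll until the run leaves an in-progress state.

    Every poll is a request you make, and every model step running behind
    it is tokens you pay for.
    """
    terminal = {"completed", "failed", "cancelled", "expired", "requires_action"}
    waited = 0.0
    while waited < timeout:
        status = fetch_status()
        if status in terminal:
            return status
        sleep(interval)
        waited += interval
    raise TimeoutError("run did not finish within the timeout")
```

Only after this loop resolves can you fetch the new messages and any tool outputs to render on the front end, which is where the extra steps pile up.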
Of course, this can all be recreated using the original Chat Completions API, where you can better control costs. I didn’t even realize they aren’t charging for Code Interpreter yet, so that is another cost to add.
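With Chat Completions the cost cap is explicit: you trim the message history to a token budget before every call. A sketch, using a crude 4-characters-per-token estimate in place of a real tokenizer (the budget number and heuristic are illustrative assumptions):

```python
def est_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system message plus the newest turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(est_tokens(m["content"]) for m in system)
    for m in reversed(rest):                  # walk from newest to oldest
        cost = est_tokens(m["content"])
        if used + cost > budget:
            break                             # oldest turns fall off first
        kept.append(m)
        used += cost
    return system + list(reversed(kept))
```

Whatever `trim_history` returns is the entire input the model sees, so the per-call spend has a hard ceiling you chose, unlike an Assistants thread.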
Yes tough to justify…
I wrote this a week ago: Assistants API and RAG - Best of Both Worlds?
With just a wee bit more control over the Assistants API (where we could optionally deploy in a RAG scenario) I see plenty of upside.
But, I think at this point, OpenAI is committed not to selling the picks and shovels, but leasing them out.
Probably easier to run GPT4All locally… just sayin’…