Since the pricing model is quite complex, and it's not clear what was sent to the model for a particular thread run (system message + message history + retrieved content chunks), it would be useful to surface how much the thread run cost, in both tokens and dollars.
How about exposing the input and output tokens consumed by every single step, placed right in the run object or step objects as metadata, along with the ability to retrieve the full AI input and generated output of every step?
Then a downloadable ledger in the usage page listing every single API call made by an assistant, with all associated IDs.
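Something along these lines, to be concrete; this is a purely hypothetical shape, since no such usage field exists on step objects today:

```python
# Purely hypothetical sketch of what a run step object COULD look like
# if per-step usage were surfaced; the "usage" field is not a real field today.
hypothetical_run_step = {
    "id": "step_abc123",
    "object": "thread.run.step",
    "run_id": "run_abc123",
    "thread_id": "thread_abc123",
    "type": "message_creation",
    "status": "completed",
    "usage": {                      # <-- the requested addition
        "prompt_tokens": 4312,      # system message + history + retrieved chunks
        "completion_tokens": 287,
        "total_tokens": 4599,
    },
    # ...plus a way to retrieve the exact AI input and generated output for this step
}
```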
Or would the truth be dangerous to the entire project?
You wanted an API-level token tracking system, and that is what is being discussed. I feel that you are moving the goalposts. I do not see any intentional hiding of information, but you are free to hold that opinion.
“Would you be excited / would it be useful if we released usage tracking based on API key for the API?”
That is not useful, except for those who mistakenly think that OpenAI should be tracking their ten customers' usage for them, or who think OpenAI should be running ChatGPT for them in an iframe while they get a cut of the action.
OpenAI have obviously, and for their own reasons, built “usage” so that you can only see use by model and only aggregated over an entire day. And they released it the same hour of DevDay as “assistants”. Giving me one “usage” figure per API key adds nothing, except that I could make one call per day per key.
What IS useful is immediately seeing what is actually being consumed by an assistant run, in real time, at every step.
If you make your own code replacement for every single feature of “assistants” (including its “feature” of not streaming), you get the input and output token statistics of every single API call and can log and diagnose the AI inputs and responses on every iteration of the loop. You would see when one chat is totaling over $20.
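As a rough sketch of what that looks like against plain chat completions with the v1 Python SDK (the model name and per-token rates are placeholders; check the current pricing page):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder rates in $ per 1K tokens; substitute the currently published pricing.
PROMPT_RATE, COMPLETION_RATE = 0.01, 0.03

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize our refund policy."},
]

response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # whatever model your assistant would use
    messages=messages,
)

# Every non-streaming chat completion reports its own token usage.
usage = response.usage
cost = (usage.prompt_tokens * PROMPT_RATE + usage.completion_tokens * COMPLETION_RATE) / 1000
print(f"prompt={usage.prompt_tokens} completion={usage.completion_tokens} ~${cost:.4f}")
```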
OpenAI isn’t even forthcoming in describing how retrieval works, with its own “myfiles_browser” function tokens injected and the model iterating and scrolling through documents at your expense.
For me: I make sure I have not used the model I want to test that day, then I run my test and stop, then I wait for the API usage page to catch up (for me that is usually about 10-15 minutes), and then I can see the token count used for that session.
For me, understanding the cost incurred by a conversation would be incredibly useful. I would think this means that the cost breakdown should live at the run object level.
Of course, if it were only available for each iteration of running a thread (i.e. it becomes complex when the thread ‘requires action’), we could always take the cost for that individual thread run and copy it up into the metadata of the run object.
But all of this breeds complexity; it would be best if the answer were just in the run object.
However, run objects are, I think, volatile once they have ended, so you would need to monitor them and persist the results somewhere else.
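A minimal sketch of that kind of monitoring, assuming the openai v1 Python SDK and a local JSONL file as the persistence layer; the run object here carries no usage figures, so you can only persist what the API returns plus your own timestamps:

```python
import json
import time
from openai import OpenAI

client = OpenAI()

def persist_run(thread_id: str, run_id: str, path: str = "run_ledger.jsonl") -> dict:
    """Poll a run until it stops progressing, then append it to a local ledger."""
    # Statuses where polling should stop (requires_action needs tool-output handling).
    stop_states = {"completed", "failed", "cancelled", "expired", "requires_action"}
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status in stop_states:
            break
        time.sleep(1)

    record = {
        "captured_at": time.time(),
        "run": run.model_dump(),  # persist whatever the API returned, verbatim
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```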
In its current state, the Assistants API is usable for a quick prototype or test. Integrating it into an application is not feasible, because you do not have control over the costs, history, data retrieval, and tools used.
For more complex use cases it is better to replicate the functionality within your application.
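For example, a bare-bones replacement for the conversational part might keep its own history and a running token total instead of relying on threads (a sketch only; there is no history trimming, retrieval, or tool handling here, and the model name is just an example):

```python
from openai import OpenAI

client = OpenAI()

class MiniAssistant:
    """Minimal stand-in for an assistant thread: you own the history and the token tally."""

    def __init__(self, instructions: str, model: str = "gpt-4-1106-preview"):
        self.model = model
        self.messages = [{"role": "system", "content": instructions}]
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        response = client.chat.completions.create(model=self.model, messages=self.messages)
        reply = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        # Each non-streaming completion reports its own usage, so the totals are exact.
        self.prompt_tokens += response.usage.prompt_tokens
        self.completion_tokens += response.usage.completion_tokens
        return reply

bot = MiniAssistant("You are a terse support assistant.")
print(bot.ask("Where do I find my invoices?"))
print(f"so far: {bot.prompt_tokens} prompt + {bot.completion_tokens} completion tokens")
```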