Has anyone used both the Assistants API and the Responses API?
Is the Responses API more expensive just because it includes File Search?
My colleague and I can’t reach a consensus because we seem to be interpreting it very differently. To me, the Responses API just has two line items, with an additional File Search cost — but that doesn’t necessarily make it more expensive, because with the Assistants API the input tokens will be larger, since it has to process the file as input tokens on top of the user prompt.
I’m aware there is the Responses API without File Search, but we’re not ready to build our own vector store yet or handle RAG.
The answer is unlikely to be “nobody has used both”
How a file search tool call works (when the AI decides to use it):
Turn 1:
user: price of widget x?
assistant: (internal tool call with query, which can also be multiple query strings)
Automatic Turn 2:
user: price of widget x?
assistant: (internal tool call with query, which can also be multiple query strings)
tool: 15000 tokens of chunks
assistant: I don’t have information about widgets.
(more turns by AI choice)
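In request terms, that automatic second turn looks roughly like this. This is a sketch of what the model effectively sees, not the exact API schema; the field names and the `file_search` payload shape are illustrative:

```python
# Sketch of the conversation the model sees on the automatic second turn.
# Field names are illustrative stand-ins, not exact API schemas.
conversation = [
    {"role": "user", "content": "price of widget x?"},
    # The assistant's internal tool call, which can carry multiple queries
    {"role": "assistant", "tool_call": {"name": "file_search",
                                        "queries": ["widget x price"]}},
    # The retrieved chunks come back as a tool message -- every token of
    # this is billed as input on the next generation
    {"role": "tool", "content": "...15000 tokens of retrieved chunks..."},
]

# All of the above is sent back to the model, which then produces the
# visible answer (and may choose to call the tool again).
```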
You are billed in input tokens for the retrieved data on both Assistants and on Responses. Responses just tacks on an additional fee per query (and since the AI can repeatedly call internal hosted tools and send multiple queries at once, multiple fees).
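To make that concrete, here is a back-of-the-envelope cost sketch. All the rates below are placeholders I picked for illustration, not real pricing; substitute your model’s actual per-token rates and the actual per-call file search fee:

```python
def estimate_cost(input_tokens: int, output_tokens: int, search_calls: int,
                  in_rate: float = 2.50,    # $/1M input tokens (placeholder)
                  out_rate: float = 10.00,  # $/1M output tokens (placeholder)
                  search_fee: float = 0.0025):  # $/file_search call (placeholder)
    """Token costs are the same on both APIs for the same input;
    Responses adds search_fee * search_calls on top."""
    token_cost = (input_tokens * in_rate +
                  output_tokens * out_rate) / 1_000_000
    return token_cost + search_calls * search_fee

# A turn that retrieves 15,000 tokens of chunks: the chunk tokens dominate,
# and the per-call fee is a small extra on Responses.
assistants = estimate_cost(16_000, 300, 0)  # no per-call fee
responses  = estimate_cost(16_000, 300, 1)  # one file_search fee
```

The point of the sketch: the two totals differ only by the per-call fee; the 15,000 tokens of retrieved chunks are billed as input either way.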
When using the same models and ensuring the same input and parameters, the operation and cost will be similar for a limited-length conversation, apart from the additional fee for the same service.
The important point is that while Assistants has a working “truncation” parameter to limit the number of turns of chat history from a thread, Responses has no method for budgeting, and the chat and tool results can grow to the maximum input of the model, greatly amplifying the costs if you use any of their server-side chat state products.
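If you keep the chat state client-side instead, you can budget it yourself before each request: drop the oldest turns until the history fits a token budget. A minimal sketch, using a crude characters-per-token estimate in place of a real tokenizer:

```python
def rough_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (~4 chars/token for English)
    return max(1, len(text) // 4)

def truncate_history(messages, budget_tokens: int):
    """Keep the most recent messages whose combined size fits the budget.
    Always keeps the latest message; drops the oldest first."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = rough_tokens(msg["content"])
        if kept and used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    {"role": "user", "content": "old question " * 200},
    {"role": "assistant", "content": "old answer " * 200},
    {"role": "user", "content": "price of widget x?"},
]
trimmed = truncate_history(history, budget_tokens=100)
```

Here the two large old messages blow the 100-token budget, so only the latest user message survives. A real implementation would count tokens with the model’s actual tokenizer and would also need to keep tool-call/tool-result pairs together.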
We used the Assistants API and switched over to the Responses API when it became available. Like you, we were disappointed to see that they are charging for file-search services in the Responses API when they did not do so in the Assistants API. I believe the free file search in the Assistants API was essentially an oversight. And they are obsoleting the Assistants API next year, so you won’t have a choice; at most you could save a bit of money by delaying the transition. In our experience, the cost of the file searches is relatively modest compared with the cost of the tokens. But, of course, that will depend on the nature of your application.
I should note that @_j is quite correct – and something we ran into – which is that the convenient response history chaining can lead not only to higher costs but to actual failures when the context grows beyond the max allowed. So be careful of that. It seems like a major mistake by the OpenAI developers. They should have some way to automatically truncate that history.
I should also mention that my understanding is that the Responses API charges you BOTH for the file_search action PLUS for the additional input tokens that are returned by the search and added into the context.
Does the Responses API charge for a tool call? Let’s say I implement my own file search tool and use the Responses API to trigger my tool call. My tool then does what’s needed and submits the output. I understand that the input/output tokens would be charged; the question is whether the tool call itself would be charged additionally.
When you call your own tool (a function tool), this means that you will then need to call the model again with the results of calling the tool. You have to pay for the input and output tokens of each of these two generations. However, typically the output tokens from the first cycle are small because they are just reasoning plus tool selection and args. Also, if you only append the tool outputs to the same inputs from the first cycle, then the model is likely to use its cache for a lot of your input tokens on the second pass.
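A sketch of how those two cycles add up. The rates and the cached-input discount below are placeholders for illustration, not real pricing:

```python
def cycle_cost(new_input: int, cached_input: int, output: int,
               in_rate: float = 2.50,      # $/1M fresh input tokens (placeholder)
               cached_rate: float = 1.25,  # $/1M cached input tokens (placeholder)
               out_rate: float = 10.00):   # $/1M output tokens (placeholder)
    """Cost of one generation; cached input is billed at a reduced rate."""
    return (new_input * in_rate + cached_input * cached_rate +
            output * out_rate) / 1_000_000

# Cycle 1: full prompt in, small output (reasoning + tool name + args)
first = cycle_cost(new_input=4_000, cached_input=0, output=50)

# Cycle 2: same prompt (now largely cached) plus the appended tool output,
# with the full visible answer generated
second = cycle_cost(new_input=1_000, cached_input=4_000, output=400)

total = first + second  # you pay for both generations, no extra per-call fee
```

The cycle-2 savings come entirely from the cache hit on the unchanged prefix, which is why appending the tool result to the same input, rather than rewriting it, matters.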
I mention this because it took us a while to understand that this 2-cycle generation (or more cycles if you call multiple tools sequentially) is really 2 generations and you have to pay for both.