Has anyone used both the Assistants API and the Responses API?
Is the Responses API more expensive just because it includes File Search?
My colleague and I can’t reach a consensus because we seem to be interpreting it very differently. To me, the Responses API just has two line items, with an additional File Search cost — but that doesn’t necessarily make it more expensive, because with the Assistants API, the input tokens will be larger since it has to process the file as input tokens on top of the user prompt.
I’m aware there is the Responses API without File Search, but we’re not ready to build our own vector store yet or handle RAG.
The answer is unlikely to be “nobody has used both”
The way a file search tool call works (when the AI chooses to invoke it):

Turn 1:
- user: price of widget x?
- assistant: (internal tool call with a query, which can also be multiple query strings)

Turn 2 (automatic):
- user: price of widget x?
- assistant: (internal tool call with a query, which can also be multiple query strings)
- tool: 15000 tokens of chunks
- assistant: I don’t have information about widgets.

(more turns by AI choice)
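To make the billing consequence of those turns concrete: everything above the final assistant message, including the 15,000 tokens of retrieved chunks, is resubmitted as input on the follow-up model call. A rough sketch of the accounting (the per-message token counts are made-up placeholders, not real tokenizer output):

```python
# Rough sketch: how retrieved file-search chunks inflate billed input tokens.
# Token counts are illustrative placeholders, not real tokenizer numbers.

turn_1_input = [
    ("user", 8),          # "price of widget x?"
]
turn_2_input = [
    ("user", 8),          # the original question, resent automatically
    ("assistant", 40),    # the internal tool call with its query string(s)
    ("tool", 15000),      # retrieved chunks, injected into the context
]

billed_input_turn_1 = sum(tokens for _, tokens in turn_1_input)
billed_input_turn_2 = sum(tokens for _, tokens in turn_2_input)

print(billed_input_turn_1)  # 8
print(billed_input_turn_2)  # 15048
```

The point is that the retrieval results dominate the input-token bill on the automatic second turn, regardless of which API made the call.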
You are billed in input tokens for the retrieved data on both Assistants and on Responses. Responses just tacks on an additional fee per query (and, where the AI repeatedly calls internal hosted tools or sends multiple queries at once, multiple fees).
When using the same models and ensuring the same input and parameters, the operation and cost will be similar for a limited-length conversation, apart from the additional fee for the same service.
What's important is that while Assistants has a working “truncation” parameter to limit the number of turns of chat history pulled from a thread, Responses has no method for budgeting, and the chat and tool results can grow to the maximum input of the model, greatly amplifying the costs if you use any of their server-side chat state products.
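If you manage the message list yourself instead of relying on server-side state, you can do that budgeting client-side. A minimal trimming sketch (the function name, `keep_last` count, and message shape are my own illustration, not an official API):

```python
def trim_history(messages, keep_last=4):
    """Keep any system message plus the last `keep_last` messages.

    `messages` is a list of dicts like {"role": ..., "content": ...}.
    This is the kind of client-side truncation you must do yourself
    when the API offers no server-side budgeting parameter.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

history = (
    [{"role": "system", "content": "You are a pricing assistant."}]
    + [{"role": "user", "content": f"question {i}"} for i in range(10)]
)
trimmed = trim_history(history, keep_last=4)
print(len(trimmed))  # 5: the system message plus the 4 most recent messages
```

A real version would budget by token count rather than message count, but the principle is the same: cap what you resend, or the context (and bill) only grows.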
We used the Assistants API and switched over to the Responses API when it became available. Like you, we were disappointed to see that they are charging for file-search services in the Responses API when they did not do so in the Assistants API. I believe the free file-search services in the Assistants API were essentially an oversight. And they are deprecating the Assistants API next year, so you won’t have a choice; you could only save a bit of money by delaying the transition. In our experience, the cost of the file searches is relatively modest compared with the cost of the tokens. But, of course, that will depend on the nature of your application.
I should note that @_j is quite correct – and something we ran into – which is that the convenient response history chaining can lead not only to higher costs but to actual failures when the context grows beyond the max allowed. So be careful of that. It seems like a major mistake by the OpenAI developers. They should have some way to automatically truncate that history.
I should also mention that my understanding is that the Responses API charges you BOTH for the file_search call PLUS for the additional input tokens that the search returns and adds into the context.
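To put rough numbers on that double charge, here is a back-of-envelope estimate. Both rates below are assumptions for illustration only; check OpenAI's current pricing page for real figures:

```python
# Back-of-envelope cost of one Responses turn that uses file search.
# Both rates are ASSUMED for illustration; verify against current pricing.
TOOL_CALL_FEE = 2.50 / 1000      # assumed dollars per file_search call
INPUT_RATE = 2.50 / 1_000_000    # assumed dollars per input token

prompt_tokens = 200
retrieved_tokens = 15_000        # chunks injected back into the context
searches = 1

token_cost = (prompt_tokens + retrieved_tokens) * INPUT_RATE
fee_cost = searches * TOOL_CALL_FEE
total = token_cost + fee_cost
print(token_cost, fee_cost, total)
```

Under these assumed rates the per-search fee is a small fraction of the token cost of the retrieved chunks, which matches our experience that the searches themselves are the modest part of the bill.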