Responses vs Assistants API cost

Has anyone used both the Assistants API and the Responses API?
Is the Responses API more expensive just because it includes File Search?

My colleague and I can’t reach a consensus because we seem to be interpreting it very differently. To me, the Responses API just has two line items, with an additional File Search cost — but that doesn’t necessarily make it more expensive, because with the Assistants API, the input tokens will be larger since it has to process the file as input tokens on top of the user prompt.

I’m aware there is the Responses API without File Search, but we’re not ready to build our own vector store yet or handle RAG.

The answer is unlikely to be “nobody has used both” 🙂

The way a tool call to file search works (when wanted by AI):

Turn 1:

user: price of widget x?
assistant: (internal tool call with query, which can also be multiple query strings)

Automatic Turn 2:

user: price of widget x?
assistant: (internal tool call with query, which can also be multiple query strings)
tool: 15000 tokens of chunks
assistant: I don’t have information about widgets.

(more turns by AI choice)

You are billed in input tokens for the retrieved data on both Assistants and Responses. Responses just tacks on an additional fee per query (and, since the AI can repeatedly call internal hosted tools and send multiple queries at once, potentially multiple fees).
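To make the comparison concrete, here is a back-of-envelope sketch of one file-search turn on each API. All prices and token counts below are illustrative assumptions (not official rates; check the current OpenAI pricing page), and `turn_cost` is a helper invented for this example.

```python
# Back-of-envelope comparison of one file-search turn on both APIs.
# All rates here are illustrative assumptions, not official prices.

def turn_cost(prompt_tokens, chunk_tokens, output_tokens,
              input_price_per_1m, output_price_per_1m,
              per_search_fee=0.0, searches=1):
    """Cost of a single turn: retrieved chunks are billed as input
    tokens on BOTH APIs; Responses adds a flat fee per search call."""
    input_cost = (prompt_tokens + chunk_tokens) * input_price_per_1m / 1_000_000
    output_cost = output_tokens * output_price_per_1m / 1_000_000
    return input_cost + output_cost + per_search_fee * searches

# Assumed example rates: $2.50/1M input, $10/1M output tokens,
# and $0.0025 per file-search call on Responses.
assistants = turn_cost(200, 15_000, 300, 2.50, 10.00)
responses = turn_cost(200, 15_000, 300, 2.50, 10.00, per_search_fee=0.0025)

print(f"Assistants: ${assistants:.4f}")  # → Assistants: $0.0410
print(f"Responses:  ${responses:.4f}")   # → Responses:  $0.0435
```

The 15,000 chunk tokens dominate either way; the per-search fee is a small increment on top, which matches the point above that the big cost driver is the retrieved data billed as input.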

When using the same models and ensuring the same input and parameters, the operation and cost will be similar for a limited-length conversation, apart from the additional per-query fee for the same service.

The important point is that while Assistants has a working “truncation” parameter to limit the number of turns of chat history from a thread, Responses has no method for budgeting, and the chat and tool results can grow to the maximum input of the model, greatly amplifying the costs if you use any of their server-side chat state products.
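A sketch of the difference, assuming the Assistants “create run” schema for `truncation_strategy` (the `asst_example` id and `trim_history` helper are placeholders for illustration): on Assistants the cap is a request parameter; on Responses you have to trim the input list yourself before sending it.

```python
# Assistants lets you cap how much thread history a run sees via the
# truncation_strategy parameter; Responses has no equivalent, so you
# must budget the input yourself. Ids below are placeholders.

assistants_run_payload = {
    "assistant_id": "asst_example",
    "truncation_strategy": {
        "type": "last_messages",
        "last_messages": 10,   # run only sees the last 10 messages
    },
}

def trim_history(messages, max_messages=10):
    """Manual budget for Responses: keep only the most recent turns."""
    return messages[-max_messages:]

history = [{"role": "user", "content": f"msg {i}"} for i in range(25)]
trimmed = trim_history(history)
print(len(trimmed))  # → 10
```

Note that manual trimming like this only works if you manage the conversation list yourself rather than relying on server-side chaining of prior responses.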

We used the Assistants API and switched over to the Responses API when it became available. Like you, we were disappointed to see that they charge for file-search services in the Responses API when they did not do so in the Assistants API. I believe the free file search in the Assistants API was essentially an oversight. And they are deprecating the Assistants API next year, so you won’t have a choice; you could save a bit of money by delaying the transition. In our experience, the cost of the file searches is relatively modest compared with the cost of the tokens. But, of course, that will depend on the nature of your application.

I should note that @_j is quite correct – and something we ran into – which is that the convenient response history chaining can lead not only to higher costs but to actual failures when the context grows beyond the max allowed. So be careful of that. It seems like a major mistake by the OpenAI developers. They should have some way to automatically truncate that history.

2 Likes

that is not what i said at all

What you said:

What was answered:

It is not more expensive only because of the fees when your AI calls file search. It is also more expensive because there is no chat-history length management.

I should also mention that my understanding is that the Responses API charges you BOTH for the file_search action PLUS for the additional input tokens that are returned by the search and added into the context.

Does the Responses API charge for a tool call? Let’s say I implement my own file search tool and use the Responses API to trigger my tool call. My tool then does what’s needed and submits the output. I understand that the input/output tokens would be charged; the question is whether the tool call itself would be charged additionally.

1 Like

Developer-created functions that you service yourself have no additional charge.

Billed per AI usage (the AI model sending to the tool):

  • web search (every internal instance)
  • file search (every internal instance)
  • code interpreter (auto container creation, 20 minute life)

Billed by your own API action:

  • code interpreter (manual container creation)
  • file search vector store data consumption, priced by GB/day

They have yet to invent a charge for using your own functions or internet-based MCP tool functions.
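The two billing categories above show up side by side in the `tools` array of a Responses request. A minimal sketch, assuming the Responses API tool shapes (the vector store id, function name, and schema are placeholders):

```python
# Tool declarations in a Responses API request body. Hosted tools like
# file_search carry a per-call fee; a developer function you service
# yourself costs nothing beyond the tokens. Ids/names are placeholders.

tools = [
    {   # hosted tool: billed per search the model issues
        "type": "file_search",
        "vector_store_ids": ["vs_example"],
    },
    {   # developer function: no extra charge beyond tokens
        "type": "function",
        "name": "lookup_widget_price",
        "description": "Look up the price of a widget by SKU.",
        "parameters": {
            "type": "object",
            "properties": {"sku": {"type": "string"}},
            "required": ["sku"],
        },
    },
]

hosted = [t for t in tools if t["type"] != "function"]
print([t["type"] for t in hosted])  # → ['file_search']
```

Only the first entry can generate per-use fees; the second is serviced entirely on your side.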

1 Like

Thanks for clarifying this. It is helpful.

One more point to be aware of…

When you call your own tool (a function tool), this means that you will then need to call the model again with the results of calling the tool. You have to pay for the input and output tokens of each of these two generations. However, typically the output tokens from the first cycle are small because they are just reasoning plus tool selection and args. Also, if you only append the tool outputs to the same inputs from the first cycle, then the model is likely to use its cache for a lot of your input tokens on the second pass.

I mention this because it took us a while to understand that this 2-cycle generation (or more cycles if you call multiple tools sequentially) is really 2 generations and you have to pay for both.
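The two-generation flow described above can be sketched like this. The model call is stubbed out (`fake_model_call` is a stand-in invented for this example, not the real client method), so only the billing-relevant shape is shown: each pass is a billed generation, and the second pass keeps the first pass’s input as a prefix so the prompt cache can apply.

```python
# Stubbed sketch of the two-generation function-tool flow. In real use
# each fake_model_call would be a billed Responses request; keeping the
# first call's input as a prefix of the second helps the prompt cache.

def fake_model_call(input_items):
    """Stand-in for the real API call. First pass returns a function
    call; once a tool output is present, returns the final text."""
    if any(item.get("type") == "function_call_output" for item in input_items):
        return {"type": "message", "content": "Widget x costs $15."}
    return {"type": "function_call", "name": "lookup_widget_price",
            "call_id": "call_1", "arguments": '{"sku": "x"}'}

billed_generations = 0
input_items = [{"role": "user", "content": "price of widget x?"}]

out = fake_model_call(input_items)           # generation 1 (billed)
billed_generations += 1

if out["type"] == "function_call":
    result = "15.00"                         # run your own tool locally
    input_items += [out, {"type": "function_call_output",
                          "call_id": out["call_id"], "output": result}]
    out = fake_model_call(input_items)       # generation 2 (billed)
    billed_generations += 1

print(billed_generations)  # → 2
print(out["content"])      # → Widget x costs $15.
```

If the model chained several tools sequentially, the same loop would simply run more times, with one billed generation per cycle.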

2 Likes

Yes this was a little unclear in the docs, thanks for clarifying it.

1 Like

Exactly, that’s the solution.