OpenAI charging too much for web searches?

I’m using the new OpenAI Agents SDK to build an agent which uses the web search tool.
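
Roughly this setup, as a minimal sketch (assuming the Python Agents SDK; the names and prompt here are placeholders, not my actual code):

```python
# Minimal sketch of the agent setup, assuming the Python Agents SDK
from agents import Agent, Runner, WebSearchTool

agent = Agent(
    name="Research assistant",
    instructions="Answer the question, searching the web when needed.",
    tools=[WebSearchTool(search_context_size="medium")],  # "medium" is the default size
)

result = Runner.run_sync(agent, "What changed in the latest OpenAI pricing?")
print(result.final_output)
```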

You can see that I’ve done 21 web searches so far, for which I’ve been charged over $2.

According to their pricing documentation this costs $35 per 1k tool calls, so that should give me a total of about $0.74.

Am I missing something?

3 Likes

$0.035 per usage x 21 = $0.735.

$0.035 per call, times three internally iterated tool usages per response (or three parallel tool calls), times 21 = $2.205, which is what you show.
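
Spelled out (documented per-call price, with an assumed threefold multiplier):

```python
# Sanity check: documented price vs. what the usage page shows
per_call = 35.00 / 1000            # "medium" gpt-4o web search: $35 per 1k calls
searches = 21

print(per_call * searches)         # ≈ 0.735 -> what the pricing table predicts
print(per_call * searches * 3)     # ≈ 2.205 -> roughly the $2+ actually billed
```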

The context loading may be making multiple search tool calls before it arrives at what it considers a satisfactory answer. Or it may be that every input triggers some kind of internet search step, even if that step is external and AI-powered and that AI decides an actual search is not necessary. Neither behavior is described anywhere in the documentation.

Continued use of the model, even for requests that shouldn't trigger an internet search, racks up mounting search bills.

BUG 1

I made just one inquiry on Chat Completions with the web search model. The usage page shows “web search tool calls - gpt4o, med” at $0.105: confirming a price per use 3x higher than documented.

The web search seems to be external context placement out of the AI’s control.
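
For reference, the single request was essentially this (a sketch; the exact prompt is immaterial, and web_search_options is the documented way to set the search context size on the -search-preview models):

```python
# Sketch of the single Chat Completions request behind that $0.105 line item
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-search-preview",
    web_search_options={"search_context_size": "medium"},  # "medium" is the default
    messages=[{"role": "user", "content": "What is the current weather in Paris?"}],
)
print(response.choices[0].message.content)
```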

BUG 2

The Chat Completions search model also has the complete text of the file_search tool (for vector stores) placed into its tool context, even though it cannot use vector stores or any tools other than functions, and the Playground does not allow it any tools. It is the version of the tool that ChatGPT uses, citing automatic inclusions, not the Assistants version of file search. Seemingly more tokens burned on a distraction.

Full tools listing of gpt-4o-search-preview-2025-03-11 on Chat Completions:
# Tools

## file_search

// Tool for browsing the files uploaded by the user. To use this tool, set the recipient of your message as `to=file_search.msearch`.
// Parts of the documents uploaded by users will be automatically included in the conversation. Only use this tool when the relevant parts don't contain the necessary information to fulfill the user's request.
// Please provide citations for your answers and render them in the following format: `〖{message idx}:{search idx}†{source}〗`.
// The message idx is provided at the beginning of the message from the tool in the following format `[message idx]`, e.g. [3].
// The search index should be extracted from the search results, e.g. # 〖13†Paris†4f4915f6-2a0b-4eb5-85d1-352e00c125bb〗refers to the 13th search result, which comes from a document titled "Paris" with ID 4f4...

namespace file_search {

// Issues multiple queries to a search over the file(s) uploaded by the user and displays the results.
// You can issue up to five queries to the msearch command at a time. However, you should only issue multiple queries when the user's question needs to be decomposed / rewritten to find different facts.
// In other scenarios, prefer providing a single, well-designed query. Avoid short queries that are extremely broad and will return unrelated results.
// One of the queries MUST be the user's original question, stripped of any extraneous details, e.g. instructions or unnecessary context. However, you must fill in relevant context from the rest of the conversation to make the question complete. E.g. "What was their age?" => "What was Kevin's age?" because the preceding conversation makes it clear that the user is talking about Kevin.
// Here are some examples of how to use the msearch command:
// User: What was the GDP of France and Italy in the 1970s? => {"queries": ["What was the GDP of France and Italy in the 1970s?", "france gdp 1970", "italy gdp 1970"]} # User's question is copied over.
// User: What does the report say about the GPT4 performance on MMLU? => {"queries": ["What does the report say about the GPT4 performance on MMLU?"]}
// User: How can I integrate customer relationship management system with third-party email marketing tools? => {"queries": ["How can I integrate customer relationship management system with third-party email marketing tools?", "customer management system marketing integration"]}
// User: What are the best practices for data security and privacy for our cloud storage services? => {"queries": ["What are the best practices for data security and privacy for our cloud storage services?"]}
// User: What was the average P/E ratio for APPL in Q4 2023? The P/E ratio is calculated by dividing the market value price per share by the company's earnings per share (EPS).  => {"queries": ["What was the average P/E ratio for APPL in Q4 2023?"]} # Instructions are removed from the user's question.
// REMEMBER: One of the queries MUST be the user's original question, stripped of any extraneous details, but with ambiguous references resolved using context from the conversation. It MUST be a complete sentence.
type msearch = (_: {
queries?: string[],
}) => any;

} // namespace file_search

You are trained on data up to October 2023.
2 Likes

The cost for the web search tool is based on input and output tokens plus a cost per session.

The dashboard screenshot you are sharing displays the number of searches only.

The table detailing the costs by ‘Search context size’ shows the cost per tool invocation. What you are likely missing is the number of tokens consumed by the model using the tool before providing the final answer.

You can find these values using the legacy dashboard via the activity tab.
I hope this helps!

The cost is not “per session” (like a code interpreter session that lasts an hour before re-billing you).

What is shown in the screenshot is the usage page line item for the separate use of the web search tool, which is distinctly metered in “calls”.

| Model | Search context size | Cost |
| --- | --- | --- |
| gpt-4o or gpt-4o-search-preview | low | $30.00 / 1k calls |
| | medium (default) | $35.00 / 1k calls |
| | high | $50.00 / 1k calls |
| gpt-4o-mini or gpt-4o-mini-search-preview | low | $25.00 / 1k calls |
| | medium (default) | $27.50 / 1k calls |
| | high | $30.00 / 1k calls |

Two usages:

But for those two calls, you receive billing for six usages:

You are not wrong, but the answer is incomplete. Below are the input and output token costs for the search tool from the same pricing page.

On the activity dashboard we can see the token usage, but not per tool.

When specifically using the gpt-4o-search-preview model, a model name specially added by OpenAI to provide internet search on Chat Completions without needing a tool specification (since you cannot use internal tools there, only your own functions), you’d be paying the model’s token pricing for the amount of search-result input context that allows the AI to answer.

However, that input consumption should appear under the model name’s token billing on the usage page, not under the search tool call invocation billing, which gives you no clue about any token count or consumption (not to mention that Chat Completions does not bill the internet search tokens as input…).

The token consumption will also be returned in the API’s token usage object.
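
For example, a minimal sketch (standard Chat Completions response fields; the search-result context lands in ordinary prompt tokens):

```python
# The usage object on the response reports the token consumption
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-search-preview",
    messages=[{"role": "user", "content": "Summarize today's top tech headline."}],
)
u = response.usage
print(f"prompt={u.prompt_tokens} completion={u.completion_tokens} total={u.total_tokens}")
```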

The “overbilling” per call seems far too consistent (an exact multiple of the per-call price) to be a result of dynamic web search size.

I also used the search tool once with the “Medium” setting and once with the “High” setting. The cost was $0.07 for one Medium usage and $0.06 for one High usage.

According to the pricing table, the cost per Medium usage should be $0.035 per call (since it’s $35 per 1K calls), and for High, it should be $0.05 per call (since it’s $50 per 1K calls).

The amount charged on the Usage page roughly matches the expected cost when using High, but it does not match at all when using Medium.

The cost for using Medium once was double the expected price from the pricing table.
At that time, I was only using the built-in search tool as a tool.

1 Like

In your opinion, does this mean that if ResponseFunctionWebSearch is used and the response returns “usage=Usage(requests=1, input_tokens=499, output_tokens=449, total_tokens=948)”,

we’d use the $35/1k calls number for the tool, plus $2.50/million input and $10/million output?

It doesn’t look like there’s a way in the Agents SDK to change the model used for search, so I’m assuming it defaults to 4o, not mini.
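
If so, the estimate for that single run would be something like this (a sketch, assuming gpt-4o token rates and the documented medium per-call price apply):

```python
# Hypothetical cost estimate for that run, assuming gpt-4o token rates and the
# documented $35/1k "medium" web-search call price
input_tokens, output_tokens, searches = 499, 449, 1

cost = (
    input_tokens * 2.50 / 1_000_000       # $2.50 per 1M input tokens
    + output_tokens * 10.00 / 1_000_000   # $10.00 per 1M output tokens
    + searches * 35.00 / 1_000            # $35 per 1k web search calls
)
print(f"${cost:.4f}")  # ≈ $0.0407
```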

2 Likes

I thought I’d again investigate the billing placed on a “clean” project by a single use of this model on Chat Completions.

(Asking the search model itself on “high” is useless):

But the intention was to get billed.

The cost impact now seems repaired - the expected $0.05 per “high” gpt-4o call:

The legacy usage page gives the model’s token consumption:

(two screenshots of the legacy usage page)

and the Playground’s report of that call:
(screenshot of the Playground usage report)

Comparing favorably to the billing a week before of two calls:
(screenshot of the earlier billing)

(A day with other billing issues, like complimentary data sharing tokens being billed on Responses.)