O3-deep-research - 1 million tokens spent .. no output :(

In general the growth of our internal token usage over time is making me realize that there is no upper limit for this kind of stuff. It will continue to grow.
But I can also see that for stuff like deep research, good prompting and guard rails are just as important, or even more so, because now you get back pages of results that can no longer be easily validated.


I mainly use the API, not ChatGPT, for anything important… I've spent WAY MORE on ChatGPT than on the API.

It really is about asking the right questions ^^. And for us ‘cheaps’… At the right time

(Best results still from API but maybe the questions I ask there)

I have three live websites using the API. They are tightly and periodically optimised but still cost $. There’s no way around that.


Yeah I’m a selfish nub who doesn’t run websites and just relays my knowledge locally (kinda weird I know)…

I used to build websites… Is that still where the world is going?

They should have sources, no?

This is a website (and PWA) … so yeah, I think websites are as relevant as ever. If anything I think apps are becoming less relevant and redundant. A lot of apps are just wrapped websites these days.

(this is super off topic now so very happy if these posts are moved)

Oooh… ‘Wasted thought’ I think it’s still on topic :smiley:

I am not thinking apps… Apps?

I only think people.

No disrespect… I know how hard websites are…

But this is not the mission statement of OpenAI…

‘web’ / ‘internet’ doesn’t even feature in the charter

https://openai.com/charter/?utm_source=chatgpt.com

Sure, but there are no easy ‘evals’ to run on a 10-page report. You can of course let an AI read it :slight_smile:


Yeah. Even checking one page summaries is time consuming

Can you break the problem down?

I likewise keep running into the same issue, even when using the o4-mini-deep-research variant instead of o3-deep-research. Even when I pass an additional “max_tool_calls = 10”, the issue keeps happening. Running the requests in the background (background=true, store=true) does not seem to fix it either. It feels as if the power of these deep research API requests is only available to higher-tier consumers rather than individual developers. I'll have to spend a bit more on the API side if that is the only fix :slight_smile:


I submitted a Deep Research API query with an input prompt of only 165 tokens. However, the system ultimately reported around 2 million input tokens consumed.

Why is that? From my understanding, reasoning and response generation are accounted for in the output token calculation, not input. “Advanced processing like reasoning and analysis” is already reflected in the output tokens, so I don’t understand why the input token count is so high.
I also raised this question in a separate thread:
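One possible explanation (my reading of how multi-turn tool use gets billed, not an official statement): each internal search/reasoning turn re-submits the accumulated context as fresh input tokens, so billed input grows roughly quadratically with the number of turns. A back-of-the-envelope sketch with made-up turn counts:

```python
# Sketch of how a tiny prompt could plausibly balloon into ~2M billed input
# tokens. The turn count and tokens-per-turn below are invented numbers,
# purely for illustration.

def estimated_input_tokens(prompt_tokens: int, turns: int, tokens_per_turn: int) -> int:
    """Sum the context re-sent on each of `turns` internal tool-call turns."""
    total = 0
    context = prompt_tokens
    for _ in range(turns):
        total += context            # the whole context is billed as input again
        context += tokens_per_turn  # each turn appends new search results
    return total

# A 165-token prompt with ~40 turns, each adding ~2,500 tokens of results:
print(estimated_input_tokens(165, 40, 2500))  # → 1956600, i.e. ≈ 2 million
```

If the billing works anything like this, the prompt itself is a rounding error; it's the repeatedly re-fed search results that dominate.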


I’m getting the same problem. I had 4.5 million input tokens consumed for a prompt of a hundred or so tokens, with nothing in the output.

Does anyone know how to set something like max input tokens? Moreover, it only returned 8 sources for me, so I have no clue why 4.5 million tokens were necessary.

You set max_tokens in the API call.

For deep research models, the closest thing is setting max_tool_calls.

> You can also use the max_tool_calls parameter when creating a deep research request to control the total number of tool calls (like to web search or an MCP server) that the model will make before returning a result. This is the primary tool available to you to constrain cost and latency when using these models.
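For reference, a minimal sketch of what such a request body might look like. max_tool_calls, background, and store come from the docs and posts above; the tool type and model choice are my assumptions, so check the current API reference before relying on them:

```python
# Sketch of a deep research request body for the Responses API.
# The "web_search_preview" tool type is an assumption on my part --
# verify against the current reference docs.

def build_deep_research_request(prompt: str, max_tool_calls: int = 10) -> dict:
    return {
        "model": "o4-mini-deep-research",    # cheaper variant of o3-deep-research
        "input": prompt,
        "background": True,                  # run asynchronously
        "store": True,                       # keep the response so it can be polled
        "tools": [{"type": "web_search_preview"}],
        "max_tool_calls": max_tool_calls,    # caps tool calls -> caps cost/latency
    }

body = build_deep_research_request("Summarise recent findings on X", max_tool_calls=5)
# Send with e.g. client.responses.create(**body) using the openai SDK.
```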


Had this exact same experience today: sent one test request, saw it was 1.7 million tokens, and my immediate reaction was “WTF, this makes no sense”.

From my understanding (as others have mentioned), the options as they exist today are:

  • use o4-mini-deep-research if you can get comparable output. It may consume just as many tokens, but each token costs less
  • use the Batch API (again, if feasible)
  • set max_tool_calls to limit the tokens spent on web search and data ingestion
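To make the price difference between the two models concrete, here's a rough cost calculator. The per-million-token prices below are placeholders from memory, not authoritative; verify against the current pricing page before budgeting anything:

```python
# Rough cost comparison for a run like the ones reported in this thread.
# PRICE_PER_M values are assumed placeholder prices, NOT confirmed figures.

PRICE_PER_M = {                        # (input $, output $) per 1M tokens
    "o3-deep-research": (10.00, 40.00),
    "o4-mini-deep-research": (2.00, 8.00),
}

def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICE_PER_M[model]
    return inp * input_tokens / 1e6 + out * output_tokens / 1e6

# A 2M-input / 50k-output run, similar to the numbers reported above:
for m in PRICE_PER_M:
    print(m, round(run_cost(m, 2_000_000, 50_000), 2))
```

Under these assumed prices a single 2M-token run differs by roughly 5x between the two models, which is why the mini variant is worth trying first.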

We should also expect the costs to come down over time as these models mature and new models are introduced. Using o3-deep-research via the API is almost prohibitively expensive for most use cases today unless you have a lot of token budget to burn, but that will likely change and follow the same trajectory as GPT-4.

In my case the initial problem was not setting max_output_tokens high enough. (Which I know sounds strange…)

Bit late to the party, but I keep getting this error after it has been running for a couple of hours:

"RuntimeError: Response failed: ResponseError(code='server_error', message='An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID wfr_019ac5750fc07223ad10ea67ddb303b9 in your message.')"

In addition, I am finding that max_tool_calls doesn’t do anything: set to 5 (as a test), and we still get 100+ tool calls on the runs that do work!

Any ideas?
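Not a fix for the underlying server_error, but since the message itself says the request can be retried, a retry wrapper with backoff at least automates that part. A generic sketch; `fetch` and `flaky` are stand-ins I made up, not real SDK names:

```python
import time

# Generic retry sketch for transient server_error failures. `fetch` is a
# stand-in for whatever call raises (e.g. polling a background response).

def retry(fetch, attempts: int = 4, base_delay: float = 1.0):
    for attempt in range(attempts):
        try:
            return fetch()
        except RuntimeError as err:
            if "server_error" not in str(err) or attempt == attempts - 1:
                raise                              # non-retryable, or out of attempts
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff

# Example with a fake fetch that fails once, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("Response failed: server_error")
    return "report text"

print(retry(flaky, base_delay=0.01))  # → report text
```

It won't help if the server keeps failing every time, but for intermittent failures on multi-hour runs it beats resubmitting by hand.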