In general the growth of our internal token usage over time is making me realize that there is no upper limit for this kind of stuff. It will continue to grow.
But I can also see that for stuff like deep research good prompting and guard rails are just as important or even more. Because now you get back pages of results that cannot be easily validated anymore.
I mainly use API not ChatGPT for anything important… Spent WAY MORE on ChatGPT than API
It really is about asking the right questions ^^. And for us “cheaps”… At the right time
(Best results still from API but maybe the questions I ask there)
I have three live websites using the API. They are tightly and periodically optimised but still cost $. There’s no way around that.
Yeah I’m a selfish nub who doesn’t run websites and just relays my knowledge locally (kinda weird I know)…
I used to build websites… Is that still where the world is going?
They should have sources, no?
This is a website (and PWA) … so yeah, I think websites are as relevant as ever. If anything I think apps are becoming less relevant and redundant. A lot of apps are just wrapped websites these days.
(this is super off topic now so very happy if these posts are moved)
Oooh… “Wasted thought” I think it’s still on topic
I am not thinking apps… Apps?
I only think people.
No disrespect… I know how hard websites are…
But this is not the mission statement of OpenAI…
“web” / “internet” doesn’t even feature in the charter
Sure, but there are no easy “evals” to run on a 10-page report. You can of course let an AI read it
Yeah. Even checking one page summaries is time consuming
Can you break the problem down?
I likewise keep running into the same issue, even when using the o4-mini-deep-research variant instead of o3-deep-research. Even when I pass an additional `max_tool_calls = 10`, the issue keeps happening. Running the requests in the background (background=true, store=true) does not seem to fix it either. It feels as if the power of these deep research API requests is only available to the higher-tier consumers rather than individual developers. I’ll have to spend a bit more on the API side if that is the only fix
I submitted a Deep Research API query with an input prompt of only 165 tokens. However, the system ultimately reported around 2 million input tokens consumed.
Why is that? From my understanding, reasoning and response generation are accounted for in the output token calculation, not input. “Advanced processing like reasoning and analysis” is already reflected in the output tokens, so I don’t understand why the input token count is so high.
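One plausible explanation (worth verifying against OpenAI's pricing docs) is that each reasoning turn re-sends the growing context — your prompt plus all the web content fetched by earlier tool calls — and every re-send is billed as input tokens. The numbers below are made up, but the arithmetic shows how a 165-token prompt can plausibly balloon to ~2 million billed input tokens:

```python
# Sketch: why a tiny prompt can bill millions of input tokens.
# Assumption (not confirmed by the docs quoted in this thread):
# each reasoning turn re-sends the full accumulated context, and
# billed input tokens are the sum across all turns.

def total_input_tokens(prompt_tokens: int, tool_result_tokens: list[int]) -> int:
    """Sum the input tokens billed across turns, assuming each turn
    re-sends the prompt plus all tool results gathered so far."""
    context = prompt_tokens
    billed = context              # first turn: prompt only
    for result in tool_result_tokens:
        context += result         # fetched page content joins the context
        billed += context         # next turn re-sends the whole context
    return billed

# A 165-token prompt plus 20 searches returning ~10k tokens each
# already lands around 2.1 million billed input tokens:
print(total_input_tokens(165, [10_000] * 20))  # → 2103465
```

If that model of billing is roughly right, the fetched-page volume, not your prompt, dominates the bill — which is also why capping tool calls helps.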
And also raised this question in separate branch:
I’m getting the same problem. I had 4.5 million input tokens consumed for a prompt that was a hundred or so tokens, with nothing in the output.
Does anyone know how to set something like a max input token limit? Moreover, it only returned 8 sources for me, so I have no clue why 4.5 million tokens were necessary.
You set max_tokens in the API call.
For deep research models, the closest thing is setting max_tool_calls.
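As a rough illustration of that advice, here is how a capped deep research request might be assembled for the Responses API. The parameter names follow what posters in this thread report using (`max_tool_calls`, `background`, `store`); check the current API reference before relying on them:

```python
# Sketch of a deep research request capped with max_tool_calls.
# No network call is made here; the dict is what you would pass to
# client.responses.create(**params) with the official openai SDK.

def build_request(prompt: str, max_tool_calls: int = 10) -> dict:
    """Assemble keyword arguments for a capped deep research call."""
    return {
        "model": "o4-mini-deep-research",
        "input": prompt,
        "tools": [{"type": "web_search_preview"}],
        "max_tool_calls": max_tool_calls,  # caps searches/fetches, and with them token spend
        "background": True,                # long runs survive client-side timeouts
        "store": True,                     # needed to poll a background response later
    }

params = build_request("Summarise recent findings on topic X", max_tool_calls=5)
# client = openai.OpenAI(); resp = client.responses.create(**params)
```

There is no documented cap on input tokens themselves; `max_tool_calls` is the nearest lever because it limits how much fetched content enters the context.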
had this exact same experience today: sent one test request, saw it was 1.7 million tokens, and my immediate reaction was “WTF, this makes no sense”.
from my understanding (as others have mentioned), the options as they exist today are:
- use `o4-mini-deep-research` if you can get comparable output. it’ll consume the same number of tokens, but they cost less
- use the Batch API (again, if feasible)
- set `max_tool_calls` to limit the tokens used on web search and data ingestion
we should also expect the costs for these to come down over time as these models mature and new models are introduced. using o3-deep-research via API is almost prohibitively expensive for most use cases today unless you have a lot of token budget to burn, but that will likely change and follow the same patterns as GPT-4
In my case the initial problem was not having `max_output_tokens` high enough. (Which I know sounds strange…)
Bit late to the party - but I keep getting this error after the request has been running for a couple of hours:
"RuntimeError: Response failed: ResponseError(code=âserver_errorâ, message=âAn error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID wfr_019ac5750fc07223ad10ea67ddb303b9 in your message.â)
"
In addition, I am finding that max_tool_calls doesn’t do anything - I set it to 5 (as a test) and we still get 100+ tool calls, on the runs that do work!
Any ideas?
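Since the error message itself says the request can be retried, one practical mitigation for the multi-hour runs is a retry wrapper with backoff. A minimal sketch (the `RuntimeError` type matches the traceback above; swap in whatever your client actually raises):

```python
import time

# Sketch: retry transient server_error failures with exponential backoff.
# `call` is any zero-argument function that creates or polls the response.

def with_retries(call, attempts: int = 3, base_delay: float = 2.0):
    """Retry `call` on RuntimeError, backing off exponentially.
    Re-raises the last error so the request ID in its message survives
    for an OpenAI support ticket."""
    for attempt in range(attempts):
        try:
            return call()
        except RuntimeError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

This doesn't explain the ignored `max_tool_calls` cap, though; if 5 reliably becomes 100+, that is worth reporting with the request ID from the error.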
