Hi everyone,
I’ve been exploring the gpt-4o-mini-search-preview model recently, and I’m really impressed with its performance. However, I noticed that it currently doesn’t support streaming responses.
Does anyone know if streaming support is planned for this model, and if so, when it might be available?
Would love to hear from anyone who has insights or updates from the OpenAI team. Thanks in advance!
_j
March 21, 2025, 9:09am
Now?
This is a Chat-Completions-only model. On the Responses API, you can instead use web search as a tool (which is also less expensive, since searches run only at the AI’s discretion).
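For comparison, here is a sketch of search-as-a-tool on the Responses API. The parameter shapes follow the OpenAI Python SDK as of early 2025 (`tools=[{"type": "web_search_preview"}]`); the model and prompt are placeholders.

```python
# Sketch: web search as an optional tool on the Responses API.
# The request payload; the model decides whether to invoke search,
# so you are billed for a search call only when one actually happens.
request = {
    "model": "gpt-4o-mini",
    "tools": [{"type": "web_search_preview"}],
    "input": "What happened in the news today?",
}

# Actual call (requires an API key; commented out here):
# from openai import OpenAI
# client = OpenAI()
# response = client.responses.create(**request)
# print(response.output_text)
```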
I only had to remove top_p from my streaming benchmark.
Stream:gpt-4o-mini-search-preview
...................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
ChatCompletionChunk(id='chatcmpl-1234', choices=[], created=1742547098, model='gpt-4o-mini-search-preview-2025-03-11', object='chat.completion.chunk', service_tier='default', system_fingerprint='', usage=CompletionUsage(completion_tokens=1381, prompt_tokens=1852, total_tokens=3233, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))
For 2 trials of gpt-4o-mini-search-preview @ 2025-03-21:
Stat                 Average    Cold       Minimum    Maximum
stream rate          90.350     92.4       88.3       92.4
latency (s)          2.577      3.1215     2.0329     3.1215
total response (s)   18.071     18.4752    17.6661    18.4752
total rate           77.489     76.806     76.806     78.172
response tokens      1400.000   1419       1381       1419
To report:
max_tokens=256 was ignored: the dots above are the chunk count (len(dots) == 1123), and some chunks carry multiple tokens, for a total of 1381 completion tokens.
The internet retrieval on this model is not a tool under the main AI’s control; you get billed for “web search tool calls” regardless, an extra $0.03 even for poems.
No prompt caching, despite sending the same ~1800 input tokens that would normally activate it. The search results injected into the system message likely break it.
Oh sorry, I’m going to try without top_p now. Thanks for the info!