Does anyone know when GPT-4o-mini-search-preview will support streaming?

Hi everyone,
I’ve been exploring the gpt-4o-mini-search-preview model recently, and I’m really impressed with its performance. However, I noticed that it currently doesn’t support streaming responses.

Does anyone know if streaming support is planned for this model, and if so, when it might be available?
Would love to hear from anyone who has insights or updates from the OpenAI team. Thanks in advance!


Now?

This is a Chat-Completions-only model. On the Responses API, you can instead use web search as a tool (which is also less expensive, since searches are performed at the AI's discretion).
I only had to remove top_p from my streaming benchmark.
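For reference, here is a minimal sketch of what such a streaming request could look like. The prompt text is my own placeholder; the key point is that top_p is simply left out of the request, and stream_options asks for usage in the final chunk:

```python
import os

# Sketch of a Chat Completions streaming request for this model.
# Note: no top_p -- the search-preview models reject that sampling
# parameter, so it is omitted from the request entirely.
params = {
    "model": "gpt-4o-mini-search-preview",
    "messages": [{"role": "user", "content": "What's in the news today?"}],
    "stream": True,
    "stream_options": {"include_usage": True},  # final chunk carries usage
}

# Only attempt a live call when credentials are available.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    for chunk in client.chat.completions.create(**params):
        if chunk.choices and chunk.choices[0].delta.content:
            print(".", end="", flush=True)  # one dot per content chunk
```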

Stream: gpt-4o-mini-search-preview
...................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
ChatCompletionChunk(id='chatcmpl-1234', choices=[], created=1742547098, model='gpt-4o-mini-search-preview-2025-03-11', object='chat.completion.chunk', service_tier='default', system_fingerprint='', usage=CompletionUsage(completion_tokens=1381, prompt_tokens=1852, total_tokens=3233, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))

For 2 trials of gpt-4o-mini-search-preview @ 2025-03-21:

Stat                 Average    Cold      Minimum   Maximum
stream rate          90.350     92.4      88.3      92.4
latency (s)          2.577      3.1215    2.0329    3.1215
total response (s)   18.071     18.4752   17.6661   18.4752
total rate           77.489     76.806    76.806    78.172
response tokens      1400.000   1419      1381      1419
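For anyone reproducing this, here is one plausible way these figures could be derived from three timestamps and the billed token count (my assumption of the definitions: latency is time to first chunk, stream rate excludes that latency, total rate does not):

```python
def stream_stats(first_chunk_t, end_t, start_t, completion_tokens):
    """Derive benchmark-style stats from timestamps and a token count.

    first_chunk_t: wall-clock time the first content chunk arrived
    end_t:         wall-clock time the stream finished
    start_t:       wall-clock time the request was sent
    """
    latency = first_chunk_t - start_t   # time to first chunk (s)
    total = end_t - start_t             # total response time (s)
    return {
        "latency (s)": latency,
        "total response (s)": total,
        # tokens/s once streaming has begun
        "stream rate": completion_tokens / (total - latency),
        # tokens/s over the whole request
        "total rate": completion_tokens / total,
    }
```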

To report:

  • max_tokens=256 was ignored. The dots above are the chunk count (len(dots) == 1123), and some chunks carry more than one token, for a total of 1381 completion tokens.

  • The internet retrieval on this model is not a tool under the main AI's control; you get billed for "web search tool calls" regardless, an extra $0.03 even for poems.

  • No prompt caching, despite sending the same ~1,800 input tokens, which should be enough to activate it. The search results injected into the system message likely break cache matching.
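The chunk-count-versus-token-count gap in the first point can be illustrated with stand-in chunk objects (the real ones come from the OpenAI SDK; these are fakes just to show the tally):

```python
from types import SimpleNamespace

def tally(chunks):
    """Count content chunks ("dots") and read billed completion tokens
    from the final usage-bearing chunk, if any."""
    dots = sum(1 for c in chunks if c.choices)
    usage = next((c.usage for c in chunks if c.usage), None)
    return dots, (usage.completion_tokens if usage else None)

# Fake stream: three content chunks, then a final usage-only chunk.
fake = [SimpleNamespace(choices=[object()], usage=None) for _ in range(3)]
fake.append(SimpleNamespace(choices=[], usage=SimpleNamespace(completion_tokens=5)))

dots, tokens = tally(fake)  # dots < tokens: chunks can carry several tokens
```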


Oh sorry, I'm going to try without top_p now. Thanks for your info!