Fairly similar results regardless of the parameters I pass to `client.responses.create()`

alonn · May 2, 2025, 9:54am

Hello all,
I’m trying to bridge the gap between the rich answers I get in the ChatGPT UI with web search and the ones I get from the API.
Whatever I pass to the API (with gpt-4o, but also with other models), I get pretty similar results.
For example, I’ve tried:

`search_context_size`: "high" / "medium" / "low"
`max_output_tokens` = 2000
different temperatures
different explicit instructions (e.g. "use at least 10 different sources", "prefer reputable sources like Wikipedia, TechCrunch, Forbes, Wired, The Verge, CNET, Zapier, and similar")
different variations of the prompt (sometimes crafted with ChatGPT)

etc.
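One way to make this comparison systematic is to generate every parameter combination up front and run them all against the same prompt. The snippet below is only a sketch of that idea: the temperature values and the `build_requests` helper are illustrative assumptions, not settings taken from the runs described above. It builds the request arguments without calling the API.

```python
from itertools import product

# Assumed grid: the context sizes come from the list above,
# the temperature values are illustrative placeholders.
CONTEXT_SIZES = ["low", "medium", "high"]
TEMPERATURES = [0.0, 0.3, 1.0]


def build_requests(prompt, instructions, model="gpt-4o"):
    """Return one kwargs dict per (search_context_size, temperature) pair."""
    requests = []
    for size, temp in product(CONTEXT_SIZES, TEMPERATURES):
        requests.append({
            "model": model,
            "tools": [{
                "type": "web_search_preview",
                "search_context_size": size,
            }],
            "tool_choice": "required",
            "temperature": temp,
            "input": prompt,
            "instructions": instructions,
            "max_output_tokens": 2000,
        })
    return requests


grid = build_requests(
    "What are the best project management tools for startups in 2025?",
    "Cite reputable sources.",
)
print(len(grid))  # one entry per combination
```

Each dict can then be passed as `client.responses.create(**kwargs)`, and the responses compared side by side on length and citations.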

I even asked deep research for suggestions.

I typically get a response of roughly 400-500 tokens with exactly 5 annotations (almost always with repeats), mostly pointing to low-authority sources.

So there are two issues:

The parameters don’t seem to change the output much.
The API responses are worse than the UI’s (even when the UI is in temporary mode).

Would love some help and guidance!

Thanks!

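Example call:

from openai import OpenAI

# The original call used a local helper, get_openai_client() (not shown);
# a default client is assumed here instead.
client = OpenAI()  # reads OPENAI_API_KEY from the environment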

prompt = "What are the best project management tools for startups in 2025?"

# instructions= "Provide helpful, trustworthy answers using high-quality sources. Return a response that's as close as possible to what a user would get with the same prompt when using ChatGPT UI, model 4o with web search"


instructions = """You are ChatGPT, an intelligent assistant developed by OpenAI. You are helpful, thorough, and professional. Always provide accurate, up-to-date answers. If a user asks a question involving recent developments or product comparisons, use the web search tool to find relevant, trustworthy sources. Cite those sources clearly using Markdown-style links (e.g., [TechRadar](https://www.techradar.com)) and avoid referencing unknown or low-quality sites.

When listing products, companies, or comparisons, ensure the content is neutral and balanced. Use a bulleted list when helpful. Do not invent information or cite fake sources. If the user asks for commercial or product recommendations, prefer reputable sources like Wikipedia, TechCrunch, Forbes, Wired, The Verge, CNET, Zapier, and similar.

If no sufficient web results are available, say so clearly and do not hallucinate.

Do not mention your tools or capabilities unless asked.
"""

response = client.responses.create(
    model="gpt-4o",
    tools=[{
        "type": "web_search_preview",
        "search_context_size": "high",
    }],
    tool_choice="required",
    temperature=0.3,
    input=prompt,
    instructions=instructions,
    max_output_tokens=2000,
    truncation="disabled",
)
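
To put numbers on the "exactly 5 annotations, almost always with repeats" observation, the citations can be counted directly from the response. The sketch below assumes the response has been converted to a plain dict (for example with `response.model_dump()`) and that citations appear as `url_citation` annotations on `output_text` parts. `summarize_citations` is a hypothetical helper, and the sample data is hand-made, not real API output.

```python
from collections import Counter
from urllib.parse import urlparse


def summarize_citations(response_dict):
    """Count url_citation annotations and distinct domains in a response dict."""
    urls = []
    for item in response_dict.get("output", []):
        if item.get("type") != "message":
            continue  # skip web_search_call and other non-message items
        for part in item.get("content") or []:
            for ann in part.get("annotations") or []:
                if ann.get("type") == "url_citation":
                    urls.append(ann["url"])
    domains = Counter(urlparse(u).netloc for u in urls)
    return {
        "total": len(urls),
        "unique_urls": len(set(urls)),
        "domains": dict(domains),
    }


# Hand-made sample in the assumed shape (not real API output).
sample = {
    "output": [
        {"type": "web_search_call", "id": "ws_1", "status": "completed"},
        {
            "type": "message",
            "content": [{
                "type": "output_text",
                "text": "...",
                "annotations": [
                    {"type": "url_citation", "url": "https://example.com/a", "title": "A"},
                    {"type": "url_citation", "url": "https://example.com/a", "title": "A"},
                    {"type": "url_citation", "url": "https://example.org/b", "title": "B"},
                ],
            }],
        },
    ]
}

summary = summarize_citations(sample)
print(summary)
```

Running this over responses produced with different `search_context_size` or temperature values would show whether the citation count and source mix actually move, alongside `response.usage.output_tokens` for length.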

alonn · May 4, 2025, 2:56pm

Hi @edwinarbus … maybe you could help?

alonn · May 4, 2025, 3:01pm

Also tagging @PaulBellow b/c it’s quite clear you’re the great oracle of this place, Paul, so maybe you would know
