Fairly similar results regardless of the parameters I pass to `client.responses.create()`

Hello all,
I’m trying to close the gap between the rich answers I get in the ChatGPT UI with web search and the ones I get from the API.
Whatever I send to the API (gpt-4o, but other models too), the results come back pretty similar.
Things I’ve tried varying:

  • `search_context_size`: "high" / "medium" / "low"
  • `max_output_tokens` = 2000
  • different temperatures
  • different explicit instructions (e.g. “use at least 10 different sources”, “prefer reputable sources like Wikipedia, TechCrunch, Forbes, Wired, The Verge, CNET, Zapier, and similar”)
  • different variations of the prompt (sometimes crafted with ChatGPT)

etc.
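To compare these settings systematically rather than one at a time, the combinations can be generated up front. This is a minimal sketch, not a definitive setup: `build_request_variants` is a hypothetical helper, and the temperatures other than 0.3 are illustrative values that do not come from my tests.

```python
from itertools import product


def build_request_variants(prompt, model="gpt-4o"):
    """Return one kwargs dict per (search_context_size, temperature) pair.

    Hypothetical helper: each dict mirrors the arguments of the example
    call in this post and is meant to be passed as
    client.responses.create(**variant).
    """
    context_sizes = ["low", "medium", "high"]
    temperatures = [0.0, 0.3, 1.0]  # illustrative; only 0.3 appears in the post
    variants = []
    for size, temp in product(context_sizes, temperatures):
        variants.append({
            "model": model,
            "input": prompt,
            "tools": [{"type": "web_search_preview", "search_context_size": size}],
            "tool_choice": "required",
            "temperature": temp,
            "max_output_tokens": 2000,
        })
    return variants


variants = build_request_variants(
    "What are the best project management tools for startups in 2025?"
)
print(len(variants))  # 3 context sizes x 3 temperatures = 9
```

Keeping every other argument fixed across the grid makes it easier to tell whether a difference in the output comes from the parameter being varied or from run-to-run noise.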

I even asked deep research for suggestions.

I typically get a response of roughly 400-500 tokens with exactly 5 annotations (almost always including repeats), citing low-authority sources.
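These numbers can be measured with something like the sketch below. It assumes the response object exposes `usage.output_tokens` and an `output` list whose `message` items carry `url_citation` annotations with a `url` field. `summarize_citations` is a hypothetical helper, and the `fake` object is made-up stand-in data so the snippet runs without an API call.

```python
from collections import Counter
from types import SimpleNamespace as NS
from urllib.parse import urlparse


def summarize_citations(response):
    """Count output tokens, citation annotations, unique URLs and domains."""
    urls = []
    for item in response.output:
        # Skip non-message items such as the web search call itself.
        if getattr(item, "type", None) != "message":
            continue
        for part in item.content:
            for ann in getattr(part, "annotations", None) or []:
                if getattr(ann, "type", None) == "url_citation":
                    urls.append(ann.url)
    domains = Counter(urlparse(u).netloc for u in urls)
    return {
        "output_tokens": response.usage.output_tokens,
        "annotations": len(urls),
        "unique_urls": len(set(urls)),
        "domains": dict(domains),
    }


# Made-up stand-in for a real response, shaped per the assumption above.
fake = NS(
    usage=NS(output_tokens=450),
    output=[
        NS(type="web_search_call"),
        NS(type="message", content=[
            NS(type="output_text", text="...", annotations=[
                NS(type="url_citation", url="https://example.com/a"),
                NS(type="url_citation", url="https://example.com/a"),
                NS(type="url_citation", url="https://example.org/b"),
            ]),
        ]),
    ],
)

summary = summarize_citations(fake)
print(summary)
```

A gap between `annotations` and `unique_urls` is what I mean by repeats, and the `domains` count gives a quick view of which sites dominate.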

So there are two issues:

  1. The parameters don’t seem to change much.
  2. API responses are worse than the UI’s (even with the UI in temporary mode).
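One way to put a number on the first issue is to compare the sets of cited URLs between runs. The sketch below uses Jaccard overlap as an assumed similarity measure. `jaccard` and `pairwise_overlap` are hypothetical helpers, and the URLs are placeholders rather than real results.

```python
def jaccard(a, b):
    """Jaccard overlap of two URL collections (1.0 means identical sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_overlap(runs):
    """runs maps a label (e.g. a parameter setting) to its list of cited URLs."""
    labels = sorted(runs)
    return {
        (x, y): jaccard(runs[x], runs[y])
        for i, x in enumerate(labels)
        for y in labels[i + 1:]
    }


# Placeholder data: cited URLs collected from two runs with different settings.
runs = {
    "low": ["https://example.com/a", "https://example.com/b"],
    "high": ["https://example.com/a", "https://example.com/c"],
}

overlap = pairwise_overlap(runs)
print(overlap)
```

If the overlap between "low" and "high" is about as large as the overlap between two runs with identical settings, the parameter is probably not what drives the sources.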

Would love some help and guidance!

Thanks!

Example call:

client = get_openai_client()

prompt = "What are the best project management tools for startups in 2025?"

# instructions= "Provide helpful, trustworthy answers using high-quality sources. Return a response that's as close as possible to what a user would get with the same prompt when using ChatGPT UI, model 4o with web search"


instructions = """You are ChatGPT, an intelligent assistant developed by OpenAI. You are helpful, thorough, and professional. Always provide accurate, up-to-date answers. If a user asks a question involving recent developments or product comparisons, use the web search tool to find relevant, trustworthy sources. Cite those sources clearly using Markdown-style links (e.g., [TechRadar](https://www.techradar.com)) and avoid referencing unknown or low-quality sites.

When listing products, companies, or comparisons, ensure the content is neutral and balanced. Use a bulleted list when helpful. Do not invent information or cite fake sources. If the user asks for commercial or product recommendations, prefer reputable sources like Wikipedia, TechCrunch, Forbes, Wired, The Verge, CNET, Zapier, and similar.

If no sufficient web results are available, say so clearly and do not hallucinate.

Do not mention your tools or capabilities unless asked.
"""

response = client.responses.create(
    model="gpt-4o",
    # Built-in web search tool, with the largest search context setting.
    tools=[{
        "type": "web_search_preview",
        "search_context_size": "high",
    }],
    tool_choice="required",  # force the model to call the tool
    temperature=0.3,
    input=prompt,
    instructions=instructions,
    max_output_tokens=2000,
    truncation="disabled",
)

Hi @edwinarbus … maybe you could help? 🙏

Also tagging @PaulBellow because it’s quite clear you’re the great oracle of this place, Paul, so maybe you would know 😃