Why Do Vision Models Count Correctly in UI But Not Via API?

I’m experiencing a persistent issue with object counting using OpenAI vision models:

The problem: When analyzing the exact same image with 28 coins:

  • ChatGPT UI (o4-mini, o3, GPT-4o): Consistently counts 28 coins correctly
  • API/Playground (o4-mini, o3, GPT-4.1): Always returns incorrect counts (25, 30, 35)

I’ve extensively tested various parameters in API calls:

  • Different temperature values (0-0.5)
  • All reasoning_effort settings
  • Adjusted max_tokens (10-4000)
  • Various prompting strategies
  • Stripped down system prompts to bare minimum

Despite identical images and near-identical prompts, the UI consistently succeeds where the API fails. Our backend uses openaiService.js with a standard system prompt that we’ve progressively simplified.
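For reference, here is a minimal sketch of the kind of call involved, using the official openai Node SDK. The model name, image URL, prompt text, and system prompt are placeholders, not the exact contents of openaiService.js:

```javascript
// Minimal sketch of the vision counting call (placeholders, not the actual production code).
// Assumes the official "openai" npm package and OPENAI_API_KEY set in the environment.
import OpenAI from "openai";

const client = new OpenAI();

async function countCoins(imageUrl) {
  const response = await client.chat.completions.create({
    model: "gpt-4.1",   // o4-mini / o3 were also tested (with reasoning_effort instead of temperature)
    temperature: 0,     // values from 0 to 0.5 were tried
    max_tokens: 4000,   // values from 10 to 4000 were tried
    messages: [
      // Stripped-down system prompt
      { role: "system", content: "You are a careful visual counter." },
      {
        role: "user",
        content: [
          { type: "text", text: "Count the coins in this image. Reply with the number only." },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  });
  return response.choices[0].message.content;
}
```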

Has anyone else encountered this discrepancy between UI and API for vision counting tasks? Are there hidden UI parameters or different model versions being served?

That’s very interesting.

My guess is that the ChatGPT UI is applying some kind of middleware processing, as it often does. Counting is a known weak spot for LLMs, so presumably they haven’t solved it in the model itself yet and are using some middleware step or additional tool call/reasoning within the web app to compensate?
