I am using OpenAI's newly launched Responses API with the GPT-4o model, and my requirement is to restrict the number of outputs per API call. Even when I instruct it to do so via the system/developer message, it sometimes returns more than one output object.
Is there any parameter to control this and force OpenAI to generate only one output per response?
There is no specific API control that covers what you are likely experiencing.
That said, what you describe is hard to pin down, since "number of outputs" does not refer to anything clear-cut about API model behavior.
The only thing I can think you might be referring to is a previously seen issue where a structured-output JSON is not immediately followed by a token ending the response; instead, the model sometimes continues writing a second JSON object.
Otherwise, it could simply be a matter of prompting technique versus a model that does not follow along.
You can start by reducing top_p from its default of 1.00 to 0.10 and see if restricting sampling to the most probable tokens gets you more of the expected response style.
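As a minimal sketch, setting top_p on a Responses API call with the Python SDK looks like this (the instructions and input strings here are placeholders, not your original prompt):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Lower top_p so sampling is restricted to the highest-probability tokens,
# which can reduce off-script continuations after the intended JSON.
response = client.responses.create(
    model="gpt-4o",
    instructions="Respond with exactly one JSON object and nothing else.",
    input="Give me one model name as a JSON object.",
    top_p=0.1,
)

print(response.output_text)  # SDK convenience accessor for the text output
```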
I can simulate this, though: below is a particular model and example, using a deliberately poor prompt pattern, showing that a new model does not predict ending the response after a JSON object with high enough certainty. That cannot be tuned away, because there is no logit_bias parameter.
Conditions
gpt-4.1-nano; temperature/top_p: default.
(Note: the Prompts Playground damages output presentation by reformatting text, and it undesirably reverts to "store": true.)
System message
assistant task: randomly output only one "model" from this list of items, never duplicating a past answer. If the possibilities are exhausted, you output "error".
[gpt-4o, gpt-4o-2024-05-13, gpt-4o-2024-08-06, gpt-4o-2024-11-20, gpt-4o-audio-preview, gpt-4o-audio-preview-2024-10-01, gpt-4o-audio-preview-2024-12-17, gpt-4o-mini, gpt-4o-mini-2024-07-18, gpt-4o-mini-audio-preview, gpt-4o-mini-audio-preview-2024-12-17, gpt-4o-mini-realtime-preview, gpt-4o-mini-realtime-preview-2024-12-17, gpt-4o-mini-search-preview, gpt-4o-mini-search-preview-2025-03-11, gpt-4o-mini-transcribe, gpt-4o-mini-tts, gpt-4o-realtime-preview, gpt-4o-realtime-preview-2024-10-01, gpt-4o-realtime-preview-2024-12-17, gpt-4o-search-preview, gpt-4o-search-preview-2025-03-11, gpt-4o-transcribe]
Response: a single line of a JSONL. No whitespace nor linefeed allowed. "model": {"type": "string"}
Example response:
`{"model": "gpt-5-extreme"}`
User input: none; just keep whacking “send” in a conversational context.
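Outside the Playground, a minimal sketch of the same test against the API might look like the following. The previous_response_id chaining and the stand-in user turn are my assumptions (the original test used the Playground with no user input), and chaining requires stored responses:

```python
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    'assistant task: randomly output only one "model" from this list of items, '
    'never duplicating a past answer. If the possibilities are exhausted, you '
    'output "error".\n'
    "[gpt-4o, gpt-4o-mini, ...]\n"  # abbreviated; use the full list above
    'Response: a single line of a JSONL. No whitespace nor linefeed allowed. '
    '"model": {"type": "string"}'
)

# Repeatedly "whack send": chain turns with previous_response_id so the model
# sees its past answers. Note this relies on store=True (the default).
previous_id = None
for _ in range(5):
    resp = client.responses.create(
        model="gpt-4.1-nano",
        instructions=SYSTEM,
        input="(send)",  # stand-in; the original test had no user input
        previous_response_id=previous_id,
    )
    previous_id = resp.id
    print(resp.output_text)
```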
If you use a text.format with a response json_schema including "strict": true, you should be able to more reliably produce one object per response; a second JSON in the same response would then be a grammar-enforcement failure.
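A hedged sketch of that structured-output setup on the Responses API (the schema name "model_choice" is mine; the schema itself mirrors the spec in the system message above):

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1-nano",
    instructions='Randomly output one "model" name as a JSON object.',
    input="(send)",  # stand-in user turn, as above
    text={
        "format": {
            "type": "json_schema",
            "name": "model_choice",  # hypothetical schema name
            "strict": True,  # grammar-enforced: one object, no trailing text
            "schema": {
                "type": "object",
                "properties": {"model": {"type": "string"}},
                "required": ["model"],
                "additionalProperties": False,  # required by strict mode
            },
        }
    },
)

print(response.output_text)  # e.g. {"model":"gpt-4o-mini"}
```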