OpenAI Response API - Limit number of outputs

There is no specific control that covers what you likely experience.

However, the question itself is hard to interpret: "number of outputs" does not refer to anything clear-cut in API model behavior.

The only thing I can think you might be referring to is a previously seen issue where a structured-output JSON response is not immediately followed by a token ending the response; instead, the model sometimes continues and writes a second JSON object.
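If that is what you are seeing, a client-side workaround is to keep only the first complete JSON object and discard anything the model appended after it. A minimal sketch using the standard library (the function name is just an illustration):

```python
import json

def first_json_object(text: str):
    """Parse and return the first complete JSON object in `text`,
    ignoring any trailing content (e.g. a second JSON object)."""
    decoder = json.JSONDecoder()
    obj, _end = decoder.raw_decode(text.lstrip())
    return obj

# A doubled response like '{"name": "a"}{"name": "b"}' yields only
# the first object.
```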

Otherwise, it could simply be a matter of prompting technique versus a model that isn't following instructions.

You can start by reducing top_p from its default of 1.00 to 0.10 and see whether restricting sampling to the most likely tokens gets you more of the expected response style.
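As a sketch, that is just one extra parameter on the request; the model name and input here are placeholders, not a claim about your setup:

```python
# Request parameters for a Responses API call with constrained sampling.
# "gpt-4o-mini" and the input text are placeholders -- use your own.
params = {
    "model": "gpt-4o-mini",
    "input": "Return exactly one JSON object describing the item.",
    "top_p": 0.10,  # reduced from the default of 1.00
}

# With the official SDK this would be sent as:
#   from openai import OpenAI
#   client = OpenAI()
#   response = client.responses.create(**params)
```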