We’ve been porting our Python-based chat application to the Responses API using the official openai wrapper package. Things went smoothly until we started testing with a prompt that requires complex reasoning. We made a streaming responses.create() call to kick off the reasoning-based response, and we never saw any response.output_text events, although we did see other events, including reasoning-related ones.
After noticing that the number of output tokens in the stalled response was always exactly 2048, we realized that we were hitting a default output token limit but never seeing any output, because the reasoning process was consuming a large share of the output tokens. When we increased max_output_tokens in the create() call (by a lot!), the response completed successfully.
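For anyone wanting to check whether they are in the same situation, here is a minimal sketch of the bookkeeping we ended up doing by hand. It assumes the usage object on the final response exposes output_tokens and output_tokens_details.reasoning_tokens, which is my understanding of the SDK; the helper name reasoning_share and the reasoning token count in the demo are made up for illustration.

```python
from types import SimpleNamespace


def reasoning_share(usage, max_output_tokens):
    """Summarize how much of the output budget went to reasoning tokens.

    `usage` is assumed to look like response.usage from the Responses API.
    """
    output_tokens = usage.output_tokens
    # Reasoning tokens are assumed to be reported under output_tokens_details;
    # a missing field is treated as zero so non-reasoning models also work.
    details = getattr(usage, "output_tokens_details", None)
    reasoning_tokens = getattr(details, "reasoning_tokens", 0) or 0
    return {
        "output_tokens": output_tokens,
        "reasoning_tokens": reasoning_tokens,
        "visible_tokens": output_tokens - reasoning_tokens,
        "hit_limit": output_tokens >= max_output_tokens,
    }


# Stand-in for response.usage from a stalled run. The 2048 total matches what
# we observed; the reasoning_tokens value is illustrative, not measured.
usage = SimpleNamespace(
    output_tokens=2048,
    output_tokens_details=SimpleNamespace(reasoning_tokens=2048),
)
print(reasoning_share(usage, max_output_tokens=2048))
```

If visible_tokens comes out at or near zero while hit_limit is true, the budget was spent on reasoning before any text could be produced, and raising max_output_tokens is the likely fix.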
My remaining concern is that the response failed silently, as far as we can tell, when it hit the output token limit. Sometimes we’d get a response.completed event as if the response had completed successfully; at other times the connection would simply be dropped.
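In the meantime, a client-side guard along these lines would at least make both failure modes loud. This is only a sketch under assumptions: it relies on the streamed events carrying a type string, on text arriving as response.output_text.delta events, on terminal events (response.completed, response.incomplete, response.failed) carrying a response with a status and optional incomplete_details.reason, and on our chat app always expecting some text back. The names collect_stream_text and TruncatedResponseError are hypothetical, and the demo uses fake events rather than a real API call.

```python
from types import SimpleNamespace

TERMINAL_TYPES = ("response.completed", "response.incomplete", "response.failed")


class TruncatedResponseError(RuntimeError):
    """Raised when a streamed response ends without complete, usable text."""


def collect_stream_text(events):
    """Gather text deltas and fail loudly if the response did not finish."""
    chunks = []
    terminal = None
    for event in events:
        if event.type == "response.output_text.delta":
            chunks.append(event.delta)
        elif event.type in TERMINAL_TYPES:
            terminal = event
    # Case 1: the connection dropped before any terminal event arrived.
    if terminal is None:
        raise TruncatedResponseError("stream ended without a terminal event")
    response = terminal.response
    status = getattr(response, "status", None)
    # Case 2: the stream ended with a non-success event or status.
    if terminal.type != "response.completed" or status != "completed":
        details = getattr(response, "incomplete_details", None)
        reason = getattr(details, "reason", None)
        raise TruncatedResponseError(f"status={status!r}, reason={reason!r}")
    # Case 3: reported as completed but no text was produced (assumption:
    # this app always expects text, so an empty result is treated as an error).
    if not chunks:
        raise TruncatedResponseError("response completed with no output text")
    return "".join(chunks)


# With the real client this would be roughly:
#   stream = client.responses.create(..., stream=True, max_output_tokens=...)
#   text = collect_stream_text(stream)
fake_events = [
    SimpleNamespace(type="response.reasoning_summary_text.delta", delta="..."),
    SimpleNamespace(
        type="response.incomplete",
        response=SimpleNamespace(
            status="incomplete",
            incomplete_details=SimpleNamespace(reason="max_output_tokens"),
        ),
    ),
]
try:
    collect_stream_text(fake_events)
except TruncatedResponseError as exc:
    print("truncated:", exc)
```

Treating an empty completed response as an error is a judgment call that suits a chat UI; an app that expects tool calls without text would want to relax that last check.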
Unless there’s something we should be doing that we aren’t, may I suggest that some error event be generated when the output token limit is hit instead of just “going dark” or dropping the connection?
Thanks in advance for any guidance or feedback you might offer!