Hello there,
I’m experiencing sporadic errors with the beta.chat.completions.parse
endpoint when using gpt-4o-mini-2024-07-18
with structured outputs (openai v1.59.7, Python).
My setup:
- Temperature: 0.0
- System message: ~1000 tokens
- User message: ~5000 tokens (including few-shot examples to guide the model)
- Pydantic response format (note: this is a dummy example, not the real field names I'm using):

  class OutputFormat(BaseModel):
      output1: bool
      output2: bool
      output3: str
- Expected output size: 100-250 tokens
The few-shot examples I provide show the output format as something like:

  {
      "output1": True
      "output2": False
      "output3": <some text>
  }
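As a side note, Python-style True/False are not valid JSON booleans, so if the few-shot examples are pasted into the prompt verbatim, the model is being shown output that the structured-output JSON grammar would never actually produce. A quick stdlib check illustrates the difference (the field values here are placeholders):

```python
import json

# Valid JSON uses lowercase booleans and commas between members.
valid = '{"output1": true, "output2": false, "output3": "some text"}'
parsed = json.loads(valid)
print(parsed["output1"])  # parses to a Python bool

# The Python-style literal from the few-shot example is not parseable JSON.
invalid = '{"output1": True "output2": False "output3": some text}'
try:
    json.loads(invalid)
except json.JSONDecodeError as e:
    print("not valid JSON:", e)
```

Whether this mismatch contributes to the truncation is unclear, but aligning the few-shot examples with strict JSON removes one variable.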
This works the majority of the time. However, very sporadically, I have started to observe the following error:
  raise LengthFinishReasonError(completion=chat_completion)
  openai.LengthFinishReasonError: Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=5, prompt_tokens=6116, total_tokens=6121, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=6016))
The error is very difficult to replicate; it generally only appears under high load. I've also noticed that only 5 completion tokens are generated in these error cases, which appears to be just the opening tokens up to and including the "output1": key of the structured output.
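Since the failure is sporadic, I've considered wrapping the call in a retry with backoff so the pipeline keeps running while I diagnose. A minimal sketch of what I have in mind; here `call` is a stand-in for the real `client.beta.chat.completions.parse(...)` invocation and `retryable_exc` would be `openai.LengthFinishReasonError`:

```python
import time

def parse_with_retry(call, retryable_exc, max_attempts=3, base_delay=1.0):
    """Retry `call` when it raises `retryable_exc`, with exponential backoff.

    `call` is a zero-argument callable (e.g. a lambda wrapping the real
    client.beta.chat.completions.parse(...) request). On the final attempt
    the exception is re-raised so the caller can log the full completion
    object (finish_reason, usage, cached_tokens) for diagnosis.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retryable_exc:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

But this only masks the symptom, and I'd still like to understand the root cause.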
Does anyone have ideas about what could be causing this error? How could I even diagnose the output?