I’ve read through several other threads that seem similar but, as far as I can tell, are different.
The streamed response I am seeing is truncated in a way that prevents me from parsing the final message, so I never receive the finish_reason.
payload:
frequency_penalty: 0
max_tokens: 2048
messages: [{role: "system",…}, {role: "user",…}, {role: "user",…},…]
model: "gpt-4o"
presence_penalty: 0
stream: true
temperature: 0.5
top_p: 1
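For reference, the request is sent along these lines (a minimal fetch sketch, not my exact client code; the auth handling and elided prompts are illustrative):

```ts
// Minimal sketch of an equivalent streaming request.
// Assumes a module context with top-level await and fetch available.
const resp = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o",
    stream: true,
    max_tokens: 2048,
    temperature: 0.5,
    top_p: 1,
    frequency_penalty: 0,
    presence_penalty: 0,
    messages: [
      { role: "system", content: "..." }, // real prompts elided, as above
      { role: "user", content: "..." },
    ],
  }),
});
```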
response:
This does not happen every time, but it happens often enough to be a serious concern (roughly 1 in 5 attempts).
The truncated output is also shorter than the 2048-token limit, so it does not look like a simple max_tokens cutoff.
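For completeness, here is a sketch of SSE consumption with partial-line buffering (handleDelta is a hypothetical stand-in for the real token handler). If the problem were just a network chunk boundary landing mid-JSON, buffering like this should handle it, which is why the truncation looks like more than a client-side parsing issue:

```ts
// Sketch of reading the SSE stream, continuing from `resp` above.
// A network chunk can end mid-line, so only complete "data: ..." lines
// are parsed; the trailing partial line is held back in `buffer`.
const handleDelta = (text: string) => process.stdout.write(text);

const reader = resp.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  const lines = buffer.split("\n");
  buffer = lines.pop() ?? ""; // keep the (possibly partial) last line

  for (const line of lines) {
    const data = line.trim();
    if (!data.startsWith("data:")) continue;
    const payload = data.slice(5).trim();
    if (payload === "[DONE]") continue; // end-of-stream sentinel
    const event = JSON.parse(payload);
    const choice = event.choices?.[0];
    if (choice?.delta?.content) handleDelta(choice.delta.content);
    if (choice?.finish_reason) console.log("finish_reason:", choice.finish_reason);
  }
}
```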
Apologies if a duplicate thread exists.