Streamed response truncating under token limit

I’ve read through several other threads that seem similar but, as far as I can tell, are different.

The streamed response I am seeing is truncated in a way that prevents me from parsing the final message, so I never receive the finish_reason.

payload (an equivalent SDK call is sketched below the list):

frequency_penalty: 0
max_tokens: 2048
messages: [{role: "system",…}, {role: "user",…}, {role: "user",…},…]
model: "gpt-4o"
presence_penalty: 0
stream: true
temperature: 0.5
top_p: 1
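
For context, here is a minimal sketch of an equivalent call via the official openai Node SDK (my actual client differs, and the real prompts are omitted; the placeholder messages are illustrative only). Logging finish_reason per chunk is how I confirm it never arrives:

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function main() {
  const stream = await client.chat.completions.create({
    model: "gpt-4o",
    stream: true,
    max_tokens: 2048,
    temperature: 0.5,
    top_p: 1,
    frequency_penalty: 0,
    presence_penalty: 0,
    messages: [
      { role: "system", content: "..." }, // placeholders; real prompts omitted
      { role: "user", content: "..." },
    ],
  });

  let finishReason: string | null = null;
  for await (const chunk of stream) {
    const choice = chunk.choices[0];
    if (choice?.delta?.content) process.stdout.write(choice.delta.content);
    if (choice?.finish_reason) finishReason = choice.finish_reason;
  }
  // On a healthy stream this is "stop" (or "length" at the token limit);
  // on the truncated streams it is still null here.
  console.log("\nfinish_reason:", finishReason);
}

main();
```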

response:

This does not happen every time, but it happens often enough to be a serious concern (roughly 1 in 5 attempts).

The output is also well short of the max_tokens limit, so this does not look like an ordinary length cutoff.
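
To make that measurable per request, the chat completions endpoint can append a final usage chunk to the stream via stream_options. A sketch under the same assumptions as above:

```ts
const stream = await client.chat.completions.create({
  model: "gpt-4o",
  stream: true,
  stream_options: { include_usage: true }, // final chunk carries usage
  max_tokens: 2048,
  messages: [{ role: "user", content: "..." }], // placeholder
});

for await (const chunk of stream) {
  if (chunk.usage) {
    // Only present on the last chunk, whose choices array is empty.
    console.log("completion_tokens:", chunk.usage.completion_tokens, "of 2048");
  }
}
```

On the truncated streams, of course, this usage chunk would also never arrive, which itself helps distinguish a server-side stop from a dropped connection.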

Apologies if a duplicate thread exists.
