Streamed response truncating under token limit

I’ve read through several other threads that seem similar but, as far as I can tell, are different.

The streamed response I am seeing is truncated in a way that prevents me from parsing the final message, so I never receive the finish_reason.

payload (an equivalent SDK call is sketched below the list):

frequency_penalty: 0
max_tokens: 2048
messages: [{role: "system",…}, {role: "user",…}, {role: "user",…},…]
model: "gpt-4o"
presence_penalty: 0
stream: true
temperature: 0.5
top_p: 1
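
For context, here is a minimal sketch of an equivalent call via the official openai Node SDK (my actual client differs, and the real prompts are omitted; the placeholder messages are illustrative only). Logging finish_reason per chunk is how I confirm it never arrives:

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function main() {
  const stream = await client.chat.completions.create({
    model: "gpt-4o",
    stream: true,
    max_tokens: 2048,
    temperature: 0.5,
    top_p: 1,
    frequency_penalty: 0,
    presence_penalty: 0,
    messages: [
      { role: "system", content: "..." }, // placeholders; real prompts omitted
      { role: "user", content: "..." },
    ],
  });

  let finishReason: string | null = null;
  for await (const chunk of stream) {
    const choice = chunk.choices[0];
    if (choice?.delta?.content) process.stdout.write(choice.delta.content);
    if (choice?.finish_reason) finishReason = choice.finish_reason;
  }
  // On a healthy stream this is "stop" (or "length" at the token limit);
  // on the truncated streams it is still null here.
  console.log("\nfinish_reason:", finishReason);
}

main();
```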

response:

This does not happen every time, but it happens often enough to be a serious concern (roughly 1 in 5 attempts).

The output is also well short of the max_tokens limit, so this does not look like an ordinary length cutoff.
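
To make that measurable per request, the chat completions endpoint can append a final usage chunk to the stream via stream_options. A sketch under the same assumptions as above:

```ts
const stream = await client.chat.completions.create({
  model: "gpt-4o",
  stream: true,
  stream_options: { include_usage: true }, // final chunk carries usage
  max_tokens: 2048,
  messages: [{ role: "user", content: "..." }], // placeholder
});

for await (const chunk of stream) {
  if (chunk.usage) {
    // Only present on the last chunk, whose choices array is empty.
    console.log("completion_tokens:", chunk.usage.completion_tokens, "of 2048");
  }
}
```

On the truncated streams, of course, this usage chunk would also never arrive, which itself helps distinguish a server-side stop from a dropped connection.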

Apologies if a duplicate thread exists.
