Gpt-3.5-turbo with n > 1 cuts off to the shortest completion

jamesyu · March 12, 2023, 8:02pm

I have verified that, when streaming, gpt-3.5-turbo with n > 1 cuts every completion off at the length of the shortest one (to the exact same number of tokens).

Every index reports a finish_reason of stop, as expected, but many of the completions are left mid-sentence.
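For anyone who wants to check their own streams, here is a minimal sketch of one way to do it. It is not the exact script behind the observation above. It assumes each streamed chunk has already been converted to a plain dict shaped like `{"choices": [{"index": ..., "delta": {"content": ...}, "finish_reason": ...}]}`, and it treats the number of content deltas as a rough stand-in for token count, which is only an approximation. The helper names `collect_streamed_choices` and `looks_truncated` are made up for illustration.

```python
from collections import defaultdict


def collect_streamed_choices(chunks):
    """Group streamed deltas by choice index.

    `chunks` is an iterable of dicts in the assumed chunk shape.
    Returns {index: {"text": str, "finish_reason": str or None, "deltas": int}}.
    """
    results = defaultdict(
        lambda: {"text": "", "finish_reason": None, "deltas": 0}
    )
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            entry = results[choice["index"]]
            content = choice.get("delta", {}).get("content")
            if content:
                entry["text"] += content
                entry["deltas"] += 1  # rough proxy for tokens (assumption)
            if choice.get("finish_reason") is not None:
                entry["finish_reason"] = choice["finish_reason"]
    return dict(results)


def looks_truncated(results):
    """Heuristic only: all choices have the same delta count and at
    least one of them does not end with sentence-final punctuation."""
    counts = {r["deltas"] for r in results.values()}
    unfinished = [
        i for i, r in results.items()
        if not r["text"].rstrip().endswith((".", "!", "?", '"'))
    ]
    return len(results) > 1 and len(counts) == 1 and bool(unfinished)
```

If `looks_truncated` returns True while every choice still reports `finish_reason == "stop"`, that matches the symptom described here. Identical lengths can also happen by chance on very short outputs, so it is worth repeating over several requests.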

Has anyone else seen this?
