Does setting stream=true (using the streaming API) with the chat/completions APIs increase the output token limit? For example, models like gpt-3.5-turbo have a 4k token limit shared between input tokens and output completion tokens. Does each chunk of streamed data count toward the total 4k token limit, or is each chunk counted separately with its own token limit? Thanks a lot. I assume it is a total limit, since the API docs don't mention otherwise and I couldn't find any related information.
Hi and welcome to the developer forum!
Setting the stream parameter to true has no effect on the generated content or its length; the output is simply fed to you as it is generated rather than all at once at the end.
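To illustrate the point, here is a minimal sketch (the generator is a stand-in, not the real API): consuming a stream chunk by chunk reconstructs exactly the same text a non-streaming call would hand you at the end.

```python
def fake_model_generate():
    """Stand-in for the model: yields tokens one at a time,
    the way stream=True delivers chunks as they are produced."""
    for token in ["The", " answer", " is", " 42", "."]:
        yield token

# Non-streaming: wait for everything, receive one blob.
full_response = "".join(fake_model_generate())

# Streaming: receive the same tokens as they arrive.
streamed = []
for chunk in fake_model_generate():
    streamed.append(chunk)  # a real client might print(chunk, end="") here

# Identical content either way -- streaming only changes *when* you get it.
assert "".join(streamed) == full_response
```

The same tokens are produced in the same order; streaming just lets you act on them earlier.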
The way these models work, the token limit is the size of the context window held in GPU memory, where the model both reads the previous text and generates the next token (and then the next, and the next).
For technical reasons, making this window larger is quite expensive.
The generation of the 100th token needs just as much prior context as the generation of the first, so all of the prompt and all previously generated tokens count against the limit, whether you stream them out or receive them in a batch. When you reach the end of the reserved token window, generation hard-stops.
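A sketch of that shared budget, assuming a 4k context window (the numbers are illustrative): prompt tokens and completion tokens draw from the same pool, so the room left for generation shrinks as the prompt grows, streaming or not.

```python
CONTEXT_WINDOW = 4096  # e.g. gpt-3.5-turbo's 4k limit

def max_completion_tokens(prompt_tokens: int) -> int:
    """Tokens left for generation once the prompt fills part of the window.
    Streamed or batched, generation hard-stops when the window is full."""
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(remaining, 0)

print(max_completion_tokens(1000))  # 3096 tokens of room to generate
print(max_completion_tokens(4096))  # 0 -- the prompt alone fills the window
```

This is why a very long prompt can truncate the completion even when you asked for more output.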