4096 response limit vs 128 000 context window

Hi Alwyn and welcome!

It’s very common for output to remain significantly below the 4096 output-token limit. Around 800–900 words tends to be the upper end of what the model returns in a single API call, and none of your proposed actions will change that.
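To put that in perspective, here's some rough arithmetic using the common rule of thumb that one token is about 0.75 English words (an assumption; the real ratio varies by text and tokenizer):

```python
# Rule of thumb: 1 token ~ 0.75 English words (assumption, varies by text)
WORDS_PER_TOKEN = 0.75

def estimated_tokens(word_count: int) -> int:
    """Rough token estimate for a given English word count."""
    return round(word_count / WORDS_PER_TOKEN)

print(estimated_tokens(900))   # ~1200 tokens, well under the 4096 output cap
```

So even a 900-word reply only uses roughly 1200 of the 4096 allowed output tokens; the model simply stops on its own long before the cap.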

@_j recently made a few good posts about this issue, but I’m struggling to find them right now.

Edit: Here is one of the posts that speaks to that:
