How are OpenAI consumers generating large responses even when models have a token limit?

How can I generate large responses (20k tokens) from chat completions via streaming?

GPT-4 maxes out at 8k tokens.

I see OpenAI consumers like Perplexity doing this even with unique inputs (like a PDF), generating output based on the PDF content!

GPT-4-Turbo has a 4k output limit, non-turbo GPT-4 has 8k, and GPT-4-32k has 32k of context. The models are, however, trained to produce concise responses, so it may take some work to get 20k of output reliably. It's not an area I have done any work in; hopefully others have.
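As far as I know, the usual workaround is to stitch several completions together: when a response stops at the limit (finish_reason == "length"), append it to the conversation and ask the model to continue. Streaming only changes how tokens are delivered, not how many the model can emit per call. A minimal sketch with the Python SDK, assuming GPT-4 and a 20k-token budget (the prompts and budget are illustrative, not anything Perplexity has documented):

```python
# Sketch: chain completions until the accumulated output reaches a token budget.
from openai import OpenAI
import tiktoken

client = OpenAI()  # reads OPENAI_API_KEY from the environment
enc = tiktoken.encoding_for_model("gpt-4")

messages = [
    {"role": "system", "content": "Write an exhaustive, detailed report."},
    {"role": "user", "content": "Summarize the following document: ..."},
]

full_response = ""
while len(enc.encode(full_response)) < 20_000:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=messages,
        max_tokens=4096,
    )
    chunk = resp.choices[0].message.content
    full_response += chunk
    if resp.choices[0].finish_reason != "length":
        break  # the model finished on its own; stop stitching
    # Feed the partial answer back and ask the model to pick up where it left off.
    messages.append({"role": "assistant", "content": chunk})
    messages.append({"role": "user", "content": "Continue exactly where you stopped."})

print(full_response)
```

The catch is that each continuation resends the growing conversation as input, so cost rises quickly and the model can still drift or repeat itself across seams.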

Step 1: have a business model that pays for such large model use.

GPT-4-32k completion tokens cost $0.12 / 1K, so a 20k-token response is $2.40 for just the output.

“Tokens”, BTW, are not a character count. Tokenization is an efficient encoding that approaches one token per common word in English.

You can try pasting some text into the OpenAI tokenizer and see how many tokens it actually consumes.
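You can also do the same check programmatically with OpenAI's tiktoken library, which ships the same encodings the models use. A small sketch (the sample string is illustrative):

```python
# Count tokens locally with tiktoken instead of the web tokenizer.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")  # resolves to the cl100k_base encoding
text = "Tokens are not characters: this sentence is shorter in tokens than in chars."
tokens = enc.encode(text)
print(f"{len(text)} characters -> {len(tokens)} tokens")
```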
