How are openai consumers generating large responses even when there is token limit on models?

vish30 · February 14, 2024, 4:37pm

How to generate large responses (20k tokens) for chat completions via stream?

GPT-4 maxes out at 8k tokens.

I see OpenAI consumers like perplexity doing this even with unique inputs (like a PDF) and generation based on the PDF content!

Foxalabs · February 14, 2024, 4:40pm

GPT-4-Turbo has an output of 4k, GPT-4 non turbo is 8k and GPT-4-32k has 32k of output context, the model is however trained to produce concise responses so it may take some work to get 20k of output reliably. Not an area I have done any work in, hopefully others have.

_j · February 14, 2024, 4:47pm

Step 1: have a business model that pays for such large model use.

GPT-4-32k costs? $0.12 / 1K tokens = $2.40 for just the response.

“Tokens” BTW is not character count. It is an efficient encoding method that approaches one token per word in English.

You can try out pasting some text and see how many tokens it actually consumes:

Topic		Replies	Views
GPT-4 128K only has 4096 completion tokens API gpt-4	9	27048	February 27, 2024
Maximum token allowed for chat gpt model gpt 3.5 turbo API chatgpt	3	2697	February 15, 2024
How do I get gpt to throw out more tokens in API? API gpt-4	3	2037	December 16, 2023
GPT 4 Turbo is limited to 4K? API gpt-4	16	14029	April 9, 2024
What is the maximum response length (output tokens) for each GPT model? API	6	40352	November 7, 2024

How are openai consumers generating large responses even when there is token limit on models?

Related topics