Insights on ChatGPT Enterprise Using GPT-4-1106-Preview Based on Context Length Specifications

My understanding is that for many models, usable context length is largely VRAM-bound, so it could very well be the same model, just running on cheaper hardware for shorter contexts.

This could also explain why the OpenAI API charges by input tokens and enforces an output limit; perhaps they route requests to nodes with specific hardware configurations based on prompt length :thinking:

```mermaid
flowchart LR
    gpt4["GPT-4"] --> a["tiktoken +4k"]
    a --> n8["8k"]
    a --> n16["16k"]
    a --> n24["24k"]
    a --> n32["32k"]
    a --> etc["..."]
    a --> n128["128k"]
```
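
If that guess is right, the dispatch logic could be as simple as the Python sketch below: tokenize the prompt with tiktoken, add a hypothetical 4k reservation for the completion, and pick the smallest context tier that fits. The tier list, the reserve, and the `pick_tier` helper are all illustrative assumptions on my part, not anything OpenAI has documented.

```python
# A minimal sketch of the routing speculation above, assuming (hypothetically)
# fixed context-length tiers and a flat 4k output reservation; none of this
# reflects OpenAI's actual infrastructure.
import tiktoken

CONTEXT_TIERS = [8_000, 16_000, 24_000, 32_000, 128_000]  # assumed node pools
OUTPUT_RESERVE = 4_000  # assumed completion headroom ("tiktoken +4k")

def pick_tier(prompt: str, model: str = "gpt-4") -> int:
    """Return the smallest context tier that fits the prompt tokens + reserve."""
    enc = tiktoken.encoding_for_model(model)
    needed = len(enc.encode(prompt)) + OUTPUT_RESERVE
    for tier in CONTEXT_TIERS:
        if needed <= tier:
            return tier
    raise ValueError("prompt exceeds the largest context tier")

print(pick_tier("Hello, world!"))  # -> 8000
```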