What does the 32K context window actually mean, does anyone know?
For example, GPT-4’s current 8K context window doesn’t actually behave like 8K of context to me. It’s more like ~7K of input and at most ~1K of output, depending on the prompt.
I.e., any output longer than ~1K tokens seems to get cut off, unless it’s a fairly simple encoding task.
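To illustrate the arithmetic I’m assuming here: the context window is a shared budget, so whatever the prompt consumes is subtracted from the room left for the completion. (A rough sketch; the function names and the ~4-characters-per-token heuristic are my own, not anything official.)

```python
def estimate_tokens(text: str) -> int:
    # Very rough heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def max_output_budget(prompt: str, context_window: int = 8192) -> int:
    # The window is shared between input and output:
    # output budget = total window minus tokens used by the prompt.
    return context_window - estimate_tokens(prompt)

# A ~28K-character prompt (~7K tokens) leaves only ~1K tokens of output
# in an 8K window, which matches what I'm seeing.
prompt = "x" * 28000
print(max_output_budget(prompt))  # ~1192 tokens left for the completion
```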
The pricing scheme of 2x for output tokens makes me even more curious.
This isn’t a huge issue for my use cases, and I can work around it, but I feel the “32K context” claim is a bit vague.
I looked through the technical report but didn’t see anything on this topic. [2303.08774] GPT-4 Technical Report