I’m trying to wrap my head around the total context window length. When I’m in ChatGPT, I know I can do a few pages of text before I lose the context and it stops remembering what I posted originally.
Is there a way to get that same context window size with the API? I get the impression that each API call is independent from the previous ones, so you lose the context window very quickly.
In my case, I have prompts that hover around 1K tokens and the complete response easily takes more than that.
Is there a way to feed the API with previous answers so it can reproduce the ChatGPT behavior? If so, where can I find examples/documentation for that?
The ChatGPT web portal still uses the API behind the curtain; it just manages the context length for you, stripping text from the start of the buffer as new messages arrive and leaving room for newly generated content. It still sends everything from scratch every time you press send; it just hides that fact to produce a convincing chat experience.
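The trimming described above can be sketched in plain Python. This is a minimal illustration, not the actual ChatGPT implementation: the function names are made up, and the token count is approximated at roughly four characters per token (a real client would use a proper tokenizer such as `tiktoken`).

```python
# Sketch of a rolling context buffer: keep as much recent history as fits,
# drop the oldest turns first, and reserve space for the model's reply.
# estimate_tokens() and build_prompt() are hypothetical helpers.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def build_prompt(history, new_message, context_limit=4096, reply_reserve=1024):
    """Return the (role, text) turns that fit the budget, oldest dropped first.

    history: list of (role, text) tuples, oldest first.
    Leaves `reply_reserve` tokens free for the generated answer.
    """
    budget = context_limit - reply_reserve - estimate_tokens(new_message)
    kept = []
    # Walk newest-to-oldest, keeping recent turns while they fit.
    for role, text in reversed(history):
        cost = estimate_tokens(text)
        if cost > budget:
            break  # this turn (and everything older) is stripped
        kept.append((role, text))
        budget -= cost
    kept.reverse()
    return kept + [("user", new_message)]

history = [
    ("user", "a" * 8000),        # old turn, ~2000 tokens
    ("assistant", "b" * 8000),   # ~2000 tokens
    ("user", "hi"),
    ("assistant", "hello!"),
]
messages = build_prompt(history, "What did I just say?")
```

With these numbers the oldest 8,000-character turn no longer fits and is dropped, while everything more recent survives, which is exactly the "strips text from the start of the buffer" behavior.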
Hmmm… OK, I’ll buy that. But from your description, it seems like ChatGPT has a way to break the 2K/4K limit I’m hitting through the API.
I’m pretty sure that (for now) the 8K window is sufficient for my needs. I checked some of the conversations I have and the longest one I found was under 8K. Hitting the 6K mark is pretty common though.
Is there a documented way to replicate that behavior?
Sure, you can set the request’s `max_tokens` to any value up to the model’s maximum context, so long as you leave room for your prompt: prompt tokens plus completion tokens must fit within the context window together.
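The budget arithmetic is simple enough to write down. A minimal sketch, assuming an 8,192-token context window for illustration (the actual limit depends on the model); `completion_budget` is a made-up helper, and in practice you would count prompt tokens with a real tokenizer:

```python
# The completion's max_tokens plus the prompt's token count must stay
# within the model's context window; this computes the largest safe value.

def completion_budget(prompt_tokens: int, context_limit: int = 8192) -> int:
    """Largest max_tokens value that still fits alongside the prompt."""
    remaining = context_limit - prompt_tokens
    if remaining <= 0:
        raise ValueError("Prompt already fills the context window")
    return remaining

print(completion_budget(1000))  # 1K-token prompt in an 8K window -> 7192
```

So a ~1K-token prompt against an 8K window leaves roughly 7K tokens for the response.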
Hmmm… OK, I was testing with the playground which limits everything to 4K. I didn’t try with separate calls in Python or something like that. Let me try that, then.