When building a chat (like ChatGPT) using the API, every single API call needs to include the history of the conversation so that the AI answers in the context of the full conversation. The question is: how far back do I need to go when including this “log” in the prompt? Do I need to include everything said since the start, or can I just include the last few exchanges? Has anyone found a good balance for how much back-log to include?
It depends on how much context you want. However, keep in mind the token limit, which covers both the prompt and the output.
There’s some code on GitHub, I believe, that has most of the work done already…
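In the meantime, here’s a minimal sketch of one common approach: keep only the most recent turns that still fit under a token budget. It assumes the tiktoken library for counting, and the 1,500-token budget is an arbitrary example, not a recommendation.

```python
# Minimal sketch: keep only the newest turns that fit a token budget.
# Assumes the tiktoken library; the budget value is an arbitrary example.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def trim_history(turns: list[str], budget: int = 1500) -> list[str]:
    """Walk backwards from the newest turn, keeping turns until the
    combined token count would exceed the budget."""
    kept, used = [], 0
    for turn in reversed(turns):
        n = len(enc.encode(turn))
        if used + n > budget:
            break
        kept.append(turn)
        used += n
    return list(reversed(kept))  # restore chronological order

history = ["User: Hi", "AI: Hello! How can I help?", "User: Summarize our chat"]
prompt = "\n".join(trim_history(history))
```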
Hope this helps.
-
You mean max_tokens? That only limits the output, NOT the prompt. (See the first code example here: max_tokens is set to 7, and only the output is limited to 7 tokens, while the total token count is 12.)
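For reference, a sketch of that example, assuming the pre-1.0 openai-python library and its legacy completions endpoint:

```python
# Sketch of the quickstart-style example referenced above, assuming the
# pre-1.0 openai-python library. max_tokens caps the completion only.
import openai

openai.api_key = "sk-..."  # placeholder

resp = openai.Completion.create(
    model="text-davinci-003",
    prompt="Say this is a test",
    max_tokens=7,  # limits the *output*, not the prompt
)
print(resp["usage"])
# e.g. {'prompt_tokens': 5, 'completion_tokens': 7, 'total_tokens': 12}
```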
-
Can you point to a good resource? The only code example I’ve seen kept the complete history, which would accumulate a huge number of tokens if prompt tokens are paid for too.
Thanks for pointing the example out.
So my understanding now is that max_tokens controls the maximum number of tokens used in the completion part only.
BUT the completion part + the prompt cannot exceed the model’s context window (e.g. 2048 or 4096 tokens).
So this means if you ask a big question with lots of examples (i.e. a long prompt), you reduce the tokens available for the completion. This effectively caps max_tokens at whatever remains after the prompt is deducted (even if max_tokens accepts a larger value).
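A sketch of that arithmetic, using the tiktoken library for counting and a 4096-token context window purely for illustration:

```python
# Sketch: the effective completion budget is the model's context window
# minus the prompt's token count. The window size and encoding here are
# assumptions for illustration, not tied to a specific model.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

CONTEXT_WINDOW = 4096
prompt = "A big question with lots of examples..."
prompt_tokens = len(enc.encode(prompt))

# Requesting more than this is pointless: the completion is cut off at
# the window boundary regardless of what max_tokens is set to.
max_useful_completion = CONTEXT_WINDOW - prompt_tokens
```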
As a side note: even though max_tokens controls the completion text length, we are billed for the prompt AND the completion tokens added together.
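A quick illustration of that billing arithmetic (the per-token rate below is a made-up placeholder, not a real price):

```python
# Sketch of the billing note: prompt and completion tokens are billed
# together. PRICE_PER_1K_TOKENS is a hypothetical placeholder rate.
PRICE_PER_1K_TOKENS = 0.02  # hypothetical

prompt_tokens = 500
completion_tokens = 300
billed_tokens = prompt_tokens + completion_tokens  # 800
cost = billed_tokens / 1000 * PRICE_PER_1K_TOKENS  # 0.016
```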