Our plugin calls often return about 6k tokens in the JSON response.
ChatGPT calls this function multiple times in a single monologue.
We noticed that the beginning of ChatGPT's own output was remembered, but some of the plugin response text directly after it was not.
It seems as if the JSON response gets truncated first when the context grows too long. Our experiments aren't 100% conclusive on this. Can anybody else confirm this observation?
A 6k-token response, plus the prompt and any overhead from other text in the chain, could be pushing you past the 8k limit and forcing data out of the rolling context window, first in, first out.
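One way to check whether you're hitting that limit is to count the tokens yourself before sending the conversation. Here is a minimal sketch assuming the tiktoken library and an 8k context window; the placeholder strings and the 8192 limit are illustrative, not taken from the original post.

```python
# Minimal sketch: estimate how much of the context window a plugin
# response plus prompt consumes. Assumes tiktoken and an 8k-token model.
import tiktoken

CONTEXT_LIMIT = 8192  # assumed 8k context window

enc = tiktoken.encoding_for_model("gpt-4")

def tokens(text: str) -> int:
    """Return the number of tokens the model would see for this text."""
    return len(enc.encode(text))

prompt = "..."            # system + user prompt (placeholder)
plugin_response = "..."   # the ~6k-token JSON the plugin returned (placeholder)

used = tokens(prompt) + tokens(plugin_response)
print(f"{used} of {CONTEXT_LIMIT} tokens used; "
      f"{CONTEXT_LIMIT - used} left for the model's reply and later turns")
```

If the count is already close to the limit before the model answers, older content (including the plugin JSON) will be dropped from the window on the next turn.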