I am using the Completions API with the text-davinci-003 model to implement a GPT-3 chat client in C# (using the OpenAI-API-dotnet NuGet package).
To make sure each completion takes the previous chat history into account (e.g. human instructions like ‘in the following text, always replace “airplane” with “aircraft”’), the chat client sends the entire chat history as a single prompt with every request. In that prompt, previous GPT-3 completions alternate with previous human prompts, separated by line breaks. This works fine, but when the chat history exceeds the 4000-token limit of the text-davinci-003 model, I get a 400 (Bad Request) HTTP status code.
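To make the setup concrete, here is a minimal sketch of how the prompt is assembled. `ChatHistory`, `Add` and `BuildPrompt` are hypothetical names used only for illustration; they are not part of OpenAI-API-dotnet, and the actual request to the Completions API is left out.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of OpenAI-API-dotnet.
public class ChatHistory
{
    // Human prompts and GPT-3 completions, in the order they occurred.
    private readonly List<string> _turns = new List<string>();

    // Records one turn (a human prompt or a completion), trimmed of
    // surrounding whitespace so that turns stay one per line break.
    public void Add(string text) => _turns.Add(text.Trim());

    // Joins the whole conversation into a single prompt,
    // with turns separated by line breaks.
    public string BuildPrompt() => string.Join("\n", _turns);
}
```

After every exchange, both the human prompt and the returned completion are added, and the result of `BuildPrompt()` is sent as the prompt of the next request, so the prompt grows with each turn.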
At the same time, it is not easy to get the token count for the text, because the chat client can be used with any language, and the documentation says that token counts for languages other than English can be very different (e.g. ‘Cómo estás’ in Spanish is 5 tokens long, while a similar two-word phrase in English is only 2 tokens long).
The Playground on the OpenAI website does not seem to have this token limit, i.e. one can continue the chat session indefinitely. Am I doing something wrong? Is it possible to make the service remember the previous conversation without sending the entire chat history? If the entire chat history must be sent for the service to take it into account, is there a reliable way to count the tokens in a request so that the chat history can be shortened (or flushed) accordingly?
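For reference, the only stopgap I can think of is a deliberately pessimistic one. As far as I understand, GPT-3 uses a byte-level BPE tokenizer, so a text should never produce more tokens than it has UTF-8 bytes. Under that assumption, the byte count can serve as a language-independent upper bound, and the oldest turns can be dropped until the rest fits. This is only a sketch: `HistoryTrimmer`, `TokenUpperBound`, `TrimToBudget` and `maxPromptTokens` are names I made up, not part of any library.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of OpenAI-API-dotnet.
public static class HistoryTrimmer
{
    // Pessimistic estimate, NOT an exact count: assumes a byte-level BPE
    // tokenizer, where every token covers at least one UTF-8 byte.
    public static int TokenUpperBound(string text) =>
        Encoding.UTF8.GetByteCount(text);

    // Keeps the most recent turns whose combined estimate fits
    // within maxPromptTokens, and returns them in chronological order.
    public static List<string> TrimToBudget(IReadOnlyList<string> turns, int maxPromptTokens)
    {
        var kept = new List<string>();
        int used = 0;

        // Walk backwards from the newest turn so recent context survives.
        for (int i = turns.Count - 1; i >= 0; i--)
        {
            int cost = TokenUpperBound(turns[i]) + 1; // +1 for the joining line break
            if (used + cost > maxPromptTokens) break;
            used += cost;
            kept.Add(turns[i]);
        }

        kept.Reverse(); // restore chronological order
        return kept;
    }
}
```

Here `maxPromptTokens` would have to be the model limit minus whatever is reserved for the completion itself. The obvious drawbacks are that the estimate wastes a large part of the available context, and that dropping the oldest turns also discards early instructions such as the ‘airplane’ rule above, which is why I am looking for an exact token count or a better approach.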