So I'm wondering what the best way is to handle large amounts of data with gpt-3.5-turbo. If I, for example, store responses in an array, would the best solution be to delete early responses to free up token space for new ones?
And what happens when the token limit is exceeded? If I send a message with an array that exceeds the token limit, will I get charged for the array I sent without getting a response, or will the fetch just be cancelled?
I have the same questions.
As far as I know, if you want to keep context with the GPT API, you need to append your prompts and the responses to an array and send that array with each ChatCompletion call. Each response is a dictionary like:
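Something along these lines, in the chat completions message format (a sketch; the example content is made up):

```python
# Shape of one assistant message as appended to the running
# conversation (chat completions "role"/"content" format):
assistant_message = {
    "role": "assistant",
    "content": "Paris is the capital of France.",
}
```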
While the user input is:
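Roughly like this, with the whole conversation kept in one list that is resent on every request so the model retains context (a sketch; the client call is shown commented out, and the contents are placeholders):

```python
# The running conversation: one dict per message, oldest first.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# A real call would look roughly like this, then the reply is
# appended so the next turn still has the full context:
# response = client.chat.completions.create(
#     model="gpt-3.5-turbo", messages=messages
# )
# messages.append({"role": "assistant",
#                  "content": response.choices[0].message.content})
```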
As far as I know, if the request has more tokens than supported, it will be dropped and you should not be charged, but someone should verify this.
If you hit this limit, you can technically remove the first element of the array to save tokens and then try again, but you lose some of the context. Alternatively, I've heard you may be able to use another library or a separate GPT API call to summarize the context and shorten the array that way.
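The drop-the-oldest approach could be sketched like this. The `max_messages` cap and the helper name are my own choices, and a real implementation would count tokens (e.g. with a tokenizer) rather than messages:

```python
def trim_context(messages, max_messages=20):
    """Drop the oldest non-system messages when the list grows too long.

    Rough sketch: keeps any system messages, then keeps only the most
    recent turns so the total stays under max_messages (an arbitrary
    cap, not an official limit).
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    keep = max_messages - len(system)
    return system + rest[-keep:]
```

You would call `trim_context(messages)` before each request (or after catching a context-length error) and resend the trimmed list.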
If someone has a better idea I’d be glad to hear it.