I can hit the API with a request and have an error like this barfed back at me: openai.error.InvalidRequestError: This model’s maximum context length is 4097 tokens, however you requested 6227 tokens (2137 in your prompt; 4090 for the completion)
And I can put that exact same request into the ChatGPT web interface and get a successful output.
I’d like to use the API as opposed to building some Selenium jank - any solutions?
ChatGPT automatically truncates prompt text that is too long; the API instead tells you when the prompt is too long. As a programmer you should handle errors like this yourself by applying your own truncation or summarisation method. You can use tiktoken (https://github.com/openai/tiktoken), OpenAI's fast BPE tokeniser, to calculate your text length in tokens, so you don't go over the limit and can handle that occurrence gracefully.
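For example, here is a minimal sketch of a pre-flight check with tiktoken. The 4097-token context limit comes from the error message above; the model name, completion budget, and helper name are illustrative assumptions:

```python
import tiktoken

MODEL = "gpt-3.5-turbo"   # assumed model; the 4097 limit matches the error above
CONTEXT_LIMIT = 4097      # max context length reported by the API
MAX_COMPLETION = 1024     # tokens to reserve for the completion (illustrative)

enc = tiktoken.encoding_for_model(MODEL)

def truncate_prompt(prompt: str) -> str:
    """Trim the prompt so prompt tokens + completion budget fit the context."""
    budget = CONTEXT_LIMIT - MAX_COMPLETION
    tokens = enc.encode(prompt)
    if len(tokens) <= budget:
        return prompt
    # Keep the first `budget` tokens and decode back to text.
    return enc.decode(tokens[:budget])
```

Counting tokens up front means you can decide per request whether to truncate, summarise, or split the input, instead of waiting for the API to reject it.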
You should also do that when you fire off multiple requests in parallel. After some debugging I just found out why I sometimes get timeouts on 50 requests while at other times 100 go through without any problems: the limit is on tokens, not on the number of requests.
I have a TPM (tokens per minute) limit of 120k on Azure, and I am sending code to the model for evaluation against a large catalog of weighted criteria.
And yeah, the code snippets vary in size, so sometimes the token limit is reached after just a few requests.
So I have to take up to my manually set [process limit per minute] of code snippets, generate the prompts for them, calculate the overall reserved tokens each one needs (tiktoken-counted request tokens + max_tokens), and sum those up iteratively to see which snippets should be added to the evaluation stack.
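A sketch of that batching logic, assuming the 120k TPM budget from above; build_prompt, the max_tokens value, and the model name are hypothetical stand-ins:

```python
import tiktoken

TPM_LIMIT = 120_000   # Azure tokens-per-minute quota mentioned above
MAX_TOKENS = 1_000    # completion budget reserved per request (illustrative)

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")  # assumed model

def build_prompt(snippet: str) -> str:
    # Hypothetical: wraps a code snippet in the evaluation-criteria prompt.
    return f"Evaluate the following code:\n{snippet}"

def next_batch(snippets: list[str]) -> tuple[list[str], list[str]]:
    """Greedily fill one minute's evaluation stack under the TPM budget.

    Returns (batch to send this minute, remaining snippets for later).
    """
    batch, used = [], 0
    for i, snippet in enumerate(snippets):
        # Reserve prompt tokens plus the full completion budget per request.
        reserved = len(enc.encode(build_prompt(snippet))) + MAX_TOKENS
        if used + reserved > TPM_LIMIT:
            return batch, snippets[i:]
        batch.append(snippet)
        used += reserved
    return batch, []
```

Reserving max_tokens for every request is deliberately pessimistic: the completion may come back shorter, but budgeting for the worst case keeps you under the quota.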