Hello,
I am at Tier 1 of API usage. I've found that even though I'm coming in well under a given model's token limit (in this case gpt-3.5-turbo-0125, but it happens with every model), my responses are getting cut off. Measured with the tokenizer, my latest input is 290 tokens, and the response I received was 295 tokens, cut off mid-sentence.
This happens regardless of the model, and regardless of whether or not I set the max_tokens parameter.
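For reference, here's roughly the shape of the request I'm sending, plus a small helper I've been using to interpret the `finish_reason` field the API returns with each choice (the prompt and max_tokens value below are placeholders, not my actual 290-token input):

```python
import json

# Roughly what I'm sending (placeholder prompt, not my real input).
payload = {
    "model": "gpt-3.5-turbo-0125",
    "messages": [{"role": "user", "content": "..."}],
    "max_tokens": 1024,  # I've tried both setting this and leaving it out
}
print(json.dumps(payload, indent=2))


def explain_finish_reason(reason: str) -> str:
    """Map the API's finish_reason to a human-readable cause."""
    return {
        "stop": "completed naturally (model emitted a stop)",
        "length": "truncated: hit max_tokens or the context window",
    }.get(reason, f"other: {reason}")


# e.g. explain_finish_reason(resp.choices[0].finish_reason)
print(explain_finish_reason("length"))
```

If `finish_reason` comes back as "length", the cutoff is a hard token limit (max_tokens or the context window) rather than anything about the content itself.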
Any ideas?