Finetune model completion cut off too short

We’re running into a problem when calling the completion API endpoint for one of our fine-tunes. The same request, with the same fine-tune and the same request parameters, returns a complete completion in the playground, but when we call the API directly we receive a cut-off completion.

We’re aware of the prompt tokens + completion tokens limit of 2048 tokens so we have tried resolving this issue by reducing the length of the prompt and extending the max_tokens request parameter, but this doesn’t resolve the issue, and wouldn’t really make sense either, as our requests return the correct response when making the same request via the playground.

Is there perhaps a limit on API completions that we’re not aware of?

The reason we have long completions for this particular model is that we’re packing our prompts and completions. Our training file for this fine-tune has the following format:

{"prompt": "Question 1\nQuestion 2\nQuestion 3\n...Question x\nAnswers:", "completion": "\nAnswer 1\nAnswer 2\nAnswer 3\n... Answer x\n\n###\n\n"}
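For readers who want to reproduce the packed format, here is a minimal sketch of how such a record could be built and how a packed completion could be split back into answers. The helper names `pack_example` and `unpack_completion` are hypothetical, and the sketch assumes each question and each answer fits on a single line.

```python
import json

# End-of-completion marker, as used in the training record above.
SEPARATOR = "\n\n###\n\n"

def pack_example(questions, answers):
    """Build one packed JSONL training record from parallel lists (hypothetical helper)."""
    prompt = "\n".join(questions) + "\nAnswers:"
    completion = "\n" + "\n".join(answers) + SEPARATOR
    return json.dumps({"prompt": prompt, "completion": completion})

def unpack_completion(text):
    """Split a packed completion back into one answer per line (assumes single-line answers)."""
    body = text.split("###")[0]
    return [line for line in body.strip().split("\n") if line]

record = json.loads(pack_example(["Question 1", "Question 2"], ["Answer 1", "Answer 2"]))
print(record["prompt"])
print(unpack_completion(record["completion"]))
```

Longer answers make the packed completion longer, so the combined token count of the prompt and the completion grows quickly with the number of packed questions.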

We’ve used this format before, without running into the short completion cut off issue, but on our latest packed fine-tune, our answers are longer resulting in a much longer overall completion.

The reason we are “packing” our calls is to reduce the number of API calls made by our system and improve performance.

I’ve seen other people have the same issue. One question: are you passing all parameters, such as top_p and the frequency penalties? I wonder if maybe there’s an implicit or default setting that you need to pass in the API call. One thing I do every now and then when I get inexplicable results is to click on the “View Code” button and compare its output with my own call to see what I’m doing wrong.

The last time I did this was when I realized that “engine” and “model” are different parameters if you’re using fine-tuned models.
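The “View Code” comparison can also be done mechanically. Below is a rough sketch that diffs two parameter dictionaries and reports every setting that differs or is missing on one side. The function name `diff_params` and all the values shown are made up for illustration; they are not the playground’s actual defaults.

```python
_UNSET = "<unset>"

def diff_params(playground, api_call):
    """Return {name: (playground_value, api_value)} for every parameter that differs."""
    differences = {}
    for name in sorted(set(playground) | set(api_call)):
        left = playground.get(name, _UNSET)
        right = api_call.get(name, _UNSET)
        if left != right:
            differences[name] = (left, right)
    return differences

# Hypothetical values, for illustration only.
from_view_code = {"temperature": 0.7, "max_tokens": 256, "top_p": 1, "stop": ["###"]}
from_my_code = {"temperature": 0.7, "top_p": 1}

for name, (theirs, mine) in diff_params(from_view_code, from_my_code).items():
    print(f"{name}: playground={theirs!r} api={mine!r}")
```

A parameter that shows up as unset on the API side is a good first suspect, since the API then falls back to its own default rather than the value the playground was sending.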

Hi @daveshapautomator, we are specifying all the request parameters, but that wasn’t the issue after all…

{
  "error": {
    "message": "This model's maximum context length is 2049 tokens, however you requested 3693 tokens (1669 in your prompt; 2024 for the completion). Please reduce your prompt; or completion length.",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

This was the error message we received. It turns out we just need to calculate the number of tokens in our request and expected response more carefully.
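The arithmetic behind that error is simple enough to check before sending a request. Here is a small sketch using the numbers from the error message; the function names are hypothetical, and the 2049-token limit is the one quoted in the error, so other models will need a different value.

```python
def fits_context(prompt_tokens, max_tokens, context_limit=2049):
    """True if prompt plus requested completion stays within the model's context length."""
    return prompt_tokens + max_tokens <= context_limit

def max_completion_tokens(prompt_tokens, context_limit=2049):
    """Largest max_tokens value that can be requested for a prompt of this size."""
    remaining = context_limit - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone exceeds the context limit")
    return remaining

# Numbers from the error message: 1669 prompt tokens, 2024 requested for the completion.
print(fits_context(1669, 2024))      # the request that was rejected
print(max_completion_tokens(1669))   # what could have been requested instead
```

With a 1669-token prompt, only 380 tokens are left for the completion, so a packed prompt of that size cannot receive a long packed answer from this model.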

Thank you for the message.

In my experience, just dividing the length of the string by 4.15 gives a very good estimate of the number of tokens for English-language text (well below 5% error). The length includes spaces, punctuation, etc.
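As a quick sketch, that rule of thumb is a one-liner. The function name `estimate_tokens` is made up here, and the 4.15 characters-per-token figure is the empirical value quoted above for English text, not an exact property of the tokenizer.

```python
def estimate_tokens(text, chars_per_token=4.15):
    """Rough token estimate: character count (spaces and punctuation included) / 4.15."""
    return round(len(text) / chars_per_token)

prompt = "Question 1\nQuestion 2\nQuestion 3\nAnswers:"
print(estimate_tokens(prompt))
```

An estimate like this is useful for budgeting and cost checks; for requests that sit close to the context limit, counting with a real tokenizer is the safer option.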

I ran a tokenizer on random snippets of a large document and calculated the coefficient of the line of best fit :slightly_smiling_face:, forcing the bias term to 0.

We use this internally to estimate costs in BookMapp.
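For anyone who wants to repeat the fit on their own text, a line of best fit with the bias term forced to 0 has a closed-form slope. The sketch below assumes character count is regressed on token count, which is my reading of the description rather than something stated above, and the sample numbers are made up.

```python
def fit_chars_per_token(token_counts, char_counts):
    """Least-squares slope of chars = slope * tokens, with no intercept term."""
    numerator = sum(t * c for t, c in zip(token_counts, char_counts))
    denominator = sum(t * t for t in token_counts)
    if denominator == 0:
        raise ValueError("need at least one snippet with a non-zero token count")
    return numerator / denominator

# Made-up measurements: (tokens, characters) for a few snippets.
tokens = [10, 20, 30]
chars = [42, 82, 125]
print(round(fit_chars_per_token(tokens, chars), 3))
```

In practice the token counts would come from running a real tokenizer over random snippets, as described above.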

hey @carla - was this issue ever resolved for you? I’m also running into the same problem - the openai playground is returning the full response, but the API’s response is too short and missing important details. I’ve also compared it with the result from clicking view code in the playground, but I’m doing everything specified in that result. Would you be able to help?

Hi @paulmbw,

In my case, I was calling a babbage fine-tune model with 1669 tokens in my request, but I had the max tokens for the response set to 2049 because I thought that was the highest allowed for babbage. The API added my input tokens to my requested output tokens, saw that together they exceeded the limit (for babbage, the limit is 2049 for input and output tokens combined), and I would get an error response. (Take into consideration that davinci model limits are double that of babbage.) The number of tokens can roughly be estimated as 1.5 tokens per word. If your request is 100 tokens, you can set your max tokens parameter to 1949 at most.

That said, it sounds like you have a different problem. If a completion stops too short, it could be that your max tokens is set too low. If you have not set it at all, the API falls back to a small default value (16 tokens for the completions endpoint, as far as I know).

Another reason I can think of is that you have a stop sequence set that’s being generated in the wrong place. For example, you might use a double new line as a stop sequence. GPT models are not particularly good at learning to use whitespace correctly. I always struggle when fine-tuning responses that should contain a certain number of new lines or tabs; my models start inserting new lines and spaces in places I don’t expect. Again, that may not be your problem either. I’m just mentioning it because your short-response issue might be due to the stop sequence you’re specifying in your API request.
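To make the stop-sequence point concrete, here is a small simulation of how a completion gets truncated at the first stop sequence it contains. This is a sketch of the behaviour, not the API’s implementation; `apply_stop` is a hypothetical helper and the raw text is invented.

```python
def apply_stop(text, stop_sequences):
    """Cut text just before the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]

# Invented raw output in which the model emitted a blank line between answers.
raw = "\nAnswer 1\n\nAnswer 2\n\n###\n\n"

print(repr(apply_stop(raw, ["\n\n"])))         # stops after the first answer
print(repr(apply_stop(raw, ["\n\n###\n\n"])))  # keeps both answers
```

With a double new line as the stop sequence, a single stray blank line is enough to lose everything after the first answer, which looks exactly like a completion that was cut off too short.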

I hope this helps! If you’re still new, welcome! And enjoy, it’s exciting times for AI.


Hi @carla,

Many thanks for the detailed and prompt response. It turns out I needed to play around with the settings (such as frequency penalty and presence penalty) - this did influence the output and I got it to a point where it was returning the entire response.

For context I’m using the text-davinci-003 model and I’m working on a side project that summarises customer feedback (so you don’t have to read through thousands of responses from customers to understand if they are happy/unhappy about your product :slight_smile: )

I’ll defo reach out with some more questions, I’m very much new to this space, so thank you!


Hello, I am running into a similar issue, but without fine-tuning. Can you please share how you modified the settings?