Fine-tuned model completion cut off too short

We’re running into a problem when calling the completions API endpoint for one of our fine-tunes. The same request, with the same fine-tune and request parameters, returns a complete completion in the Playground, but when we call it via the API the completion is cut off.

We’re aware of the 2048-token limit on prompt tokens + completion tokens, so we have tried reducing the length of the prompt and extending the max_tokens request parameter. This doesn’t resolve the issue, and it wouldn’t really make sense anyway, since the same request returns the correct response in the Playground.

Is there perhaps a limit on API completions that we’re not aware of?

The reason we have long completions for this particular model is that we’re packing our prompts and completions. Our training file for this fine-tune has the following format:

{"prompt": "Question 1\nQuestion 2\nQuestion 3\n...Question x\nAnswers:", "completion": "\nAnswer 1\nAnswer 2\nAnswer 3\n... Answer x\n\n###\n\n"}

We’ve used this format before without running into the cut-off issue, but in our latest packed fine-tune the answers are longer, resulting in a much longer overall completion.

The reason we are “packing” our calls is to reduce the number of API calls made by our system and improve performance.
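For reference, here is a minimal sketch of how one packed line in the format above could be assembled; the question/answer lists and output file name are purely illustrative, not our actual pipeline:

```python
import json

# Illustrative data only; our real questions/answers come from our own system.
questions = ["Question 1", "Question 2", "Question 3"]
answers = ["Answer 1", "Answer 2", "Answer 3"]

packed = {
    # The prompt ends with the "Answers:" cue used in training.
    "prompt": "\n".join(questions) + "\nAnswers:",
    # The completion starts with a newline and ends with the "\n\n###\n\n" stop sequence.
    "completion": "\n" + "\n".join(answers) + "\n\n###\n\n",
}

with open("training_data.jsonl", "a") as f:
    f.write(json.dumps(packed) + "\n")
```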

I’ve seen other people have the same issue. One question: are you passing all parameters, such as top_p and the frequency penalties? I wonder if there’s an implicit or default setting that you need to pass in the API call. One thing I do every now and then when I get inexplicable results is to click the “View Code” button in the Playground and compare it against what I’m sending.

The last time I did this was when I realized that “engine” and “model” are different parameters when you’re using fine-tuned models.
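For example, with the (pre-1.0) openai Python library, a fully explicit completion call might look like the sketch below; the fine-tune name and parameter values are placeholders, not a recommendation:

```python
import openai

# Base models were historically addressed with `engine`; fine-tuned models take `model`.
# The model name below is a placeholder for your own fine-tune.
response = openai.Completion.create(
    model="davinci:ft-your-org-2022-01-01-00-00-00",
    prompt="Question 1\nQuestion 2\nAnswers:",
    max_tokens=256,
    temperature=0.7,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    stop=["\n\n###\n\n"],
)
print(response["choices"][0]["text"])
```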

Hi @daveshapautomator, we are specifying all the request parameters, but that wasn’t the issue after all…

{
  "error": {
    "message": "This model's maximum context length is 2049 tokens, however you requested 3693 tokens (1669 in your prompt; 2024 for the completion). Please reduce your prompt; or completion length.",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

This was the error message we received. It turns out we just need to calculate the number of tokens in our request and expected completion more carefully.
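For anyone hitting the same error, here is a rough sketch of the kind of check we added before each call. It assumes the tiktoken package and the r50k_base (GPT-2 style) encoding used by the GPT-3 base models; adapt it to whatever tokenizer matches your model:

```python
import tiktoken  # assumption: tiktoken is installed

CONTEXT_LIMIT = 2049  # from the error message above
enc = tiktoken.get_encoding("r50k_base")  # GPT-2 style encoding used by GPT-3 models

def safe_max_tokens(prompt: str, headroom: int = 0) -> int:
    """Largest max_tokens that still fits prompt + completion in the context window."""
    prompt_tokens = len(enc.encode(prompt))
    return max(0, CONTEXT_LIMIT - prompt_tokens - headroom)

prompt = "Question 1\nQuestion 2\nQuestion 3\nAnswers:"
print(safe_max_tokens(prompt))  # pass this as max_tokens instead of a fixed value like 2024
```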

Thank you for the message.

In my experience, just dividing the length of the string by 4.15 gives a very good estimate of the number of tokens for English-language text (well below 5% error). The length includes spaces, punctuation, etc.

I ran a tokenizer on random snippets of a large document and calculated the coefficient of the line of best fit :slightly_smiling_face:, forcing the bias term to 0.

We use this internally to estimate costs in BookMapp.
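A minimal sketch of that heuristic (the 4.15 divisor is just the fitted coefficient mentioned above, so treat the result as an estimate, not an exact count):

```python
def estimate_tokens(text: str) -> int:
    # ~4.15 characters per token for English text, spaces and punctuation included.
    return round(len(text) / 4.15)

print(estimate_tokens("Dividing the character count by 4.15 gives a quick token estimate."))
```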
