GPT-4 Turbo is lazy and truncates output arbitrarily

I’m using the API for translations, and the newer GPT-4 Turbo models regularly just truncate replies with “…” at an arbitrary cutoff point (sometimes even breaking the JSON function_call parameters). I ask for translations of 55 strings and I get 29 back (and since I don’t know which 29, I have to throw everything out).

This is becoming a noticeable cost issue, as a significant percentage of my queries return unusable data and I have to re-query later with gpt-4-0613.

Is there a known workaround? A magic word that I can add to my prompts that will make the thing work harder?

Hi @jwr - welcome to the Community!

Regarding the issue with GPT-4: is your expected output, i.e. the translation of the 55 strings, within the output token limit? Technically, the output token limit is 4,096 tokens. In practice, however, the model appears to have been “trained” to produce far fewer output tokens, on the order of 900-1,200 tokens (roughly 600-800 words). This could be part of the reason for the cutoffs you are seeing. The obvious workaround is to reduce the number of strings sent in one API call so the total translated text comes to no more than 600-800 words.
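
If it helps, here’s a minimal sketch of that batching idea in Python. The 600-word cap and the word-count heuristic are rough assumptions (translations can expand or shrink relative to the source), and `translate_batch` is a placeholder for your existing API call:

```python
# Sketch: cap each API call so the *output* stays under the practical
# completion limit. The cap and the word-count proxy are assumptions;
# tune them for your strings and target languages.
MAX_OUTPUT_WORDS = 600  # conservative end of the 600-800 word range

def make_batches(strings, max_words=MAX_OUTPUT_WORDS):
    batches, batch, words = [], [], 0
    for s in strings:
        n = len(s.split())  # crude proxy for the translated length
        if batch and words + n > max_words:
            batches.append(batch)
            batch, words = [], 0
        batch.append(s)
        words += n
    if batch:
        batches.append(batch)
    return batches

# usage (translate_batch is your own API wrapper):
# translations = [translate_batch(b) for b in make_batches(my_strings)]
```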

If you are really just using GPT-4 for translation and cost is a concern, then you might also want to consider a dedicated translation API instead. I have used the Azure translation REST API for the past year and am happy with the results I am getting. You get 2M chars for free. See this link. There are a few other options out on the market.
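
For reference, a bare-bones example of that REST call (Python with the `requests` library; the key, region, and language codes are placeholders you’d fill in from your own Azure resource):

```python
# Minimal Azure Translator v3 sketch. Key and region are placeholders.
import requests

AZURE_KEY = "your-key"
AZURE_REGION = "your-region"
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def azure_translate(strings, to_lang, from_lang="en"):
    resp = requests.post(
        ENDPOINT,
        params={"api-version": "3.0", "from": from_lang, "to": to_lang},
        headers={
            "Ocp-Apim-Subscription-Key": AZURE_KEY,
            "Ocp-Apim-Subscription-Region": AZURE_REGION,
            "Content-Type": "application/json",
        },
        json=[{"text": s} for s in strings],
    )
    resp.raise_for_status()
    # One result object per input string, in the same order
    return [item["translations"][0]["text"] for item in resp.json()]

# azure_translate(["Hello world"], to_lang="fi")
```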

My typical API calls:

usage: total 6244 prompt 2786 completion 3458
usage: total 6148 prompt 3092 completion 3056
usage: total 6409 prompt 2995 completion 3414
usage: total 6236 prompt 3172 completion 3064

But when GPT feels lazy, I’ll get things like:

usage: total 5646 prompt 3280 completion 2366
usage: total 3985 prompt 3294 completion 691
usage: total 3659 prompt 3313 completion 346
usage: total 4483 prompt 3209 completion 1274

Note that some of the truncations are really early. 346 tokens?

Of course I can reduce the number of strings per call. But that costs money, because the overhead of my (fairly large) context prompt becomes much more significant. And even that gives me no guarantees: I tried smaller queries, and the truncation still happens.
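
For now the only mitigation I have is the one I mentioned above: detect the bad replies and re-query. A rough sketch of that check (OpenAI Python SDK v1; the `translations` field and the helper name are illustrative, not my actual schema):

```python
# Sketch of a "detect and re-query" check on each reply. Assumes the
# function call should return exactly one translation per input string;
# field and helper names are illustrative.
import json

def translations_or_none(response, expected_count):
    choice = response.choices[0]
    if choice.finish_reason == "length":
        return None  # model hit the token cap mid-reply
    try:
        args = json.loads(choice.message.function_call.arguments)
    except (json.JSONDecodeError, AttributeError, TypeError):
        return None  # truncated or otherwise malformed JSON
    items = args.get("translations", [])
    if len(items) != expected_count:
        return None  # silently dropped strings: retry the whole batch
    return items
```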

Amusingly enough, it happens for some languages more than for others. Nordic languages seem to do the worst (Finnish, Danish, Swedish).

And yes, I do need GPT-4 for this level of context-aware translation.

What does your actual prompt look like (system message / user message)?