GPT-4 Turbo is lazy and truncates output arbitrarily

I’m using the API for translations, and the newer GPT-4 Turbo models regularly just truncate replies with “…” at an arbitrary cutoff point (sometimes even breaking the JSON function_call parameters). I ask for translations of 55 strings and I get 29 back (and since I don’t know which 29, I have to throw everything out).

This is becoming a noticeable cost issue, as a significant percentage of my queries return unusable data and I have to re-query later with gpt-4-0613.

Is there a known workaround? A magic word that I can add to my prompts that will make the thing work harder?

Hi @jwr - welcome to the Community!

Regarding the issue with GPT-4: is your expected output, i.e. the translation of the 55 strings, within the output token limit? Technically, the output token limit is 4,096 tokens. In practice, however, the model appears to have been “trained” to produce far fewer output tokens, on the order of 900-1,200 tokens (roughly 600-800 words). This could be part of the reason for the cutoffs you are seeing. The obvious workaround is to reduce the number of strings sent in one API call so the total translated text comes to no more than 600-800 words.
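
If it helps, here’s a minimal sketch of that batching idea in Python. The 600-word cap and the word-count heuristic are rough assumptions (translations can expand or shrink relative to the source), and `translate_batch` is a placeholder for your existing API call:

```python
# Sketch: cap each API call so the *output* stays under the practical
# completion limit. The cap and the word-count proxy are assumptions;
# tune them for your strings and target languages.
MAX_OUTPUT_WORDS = 600  # conservative end of the 600-800 word range

def make_batches(strings, max_words=MAX_OUTPUT_WORDS):
    batches, batch, words = [], [], 0
    for s in strings:
        n = len(s.split())  # crude proxy for the translated length
        if batch and words + n > max_words:
            batches.append(batch)
            batch, words = [], 0
        batch.append(s)
        words += n
    if batch:
        batches.append(batch)
    return batches

# usage (translate_batch is your own API wrapper):
# translations = [translate_batch(b) for b in make_batches(my_strings)]
```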

If you are really just using GPT-4 for translation and cost is a concern, then you might also want to consider a dedicated translation API instead. I have used the Azure translation REST API for the past year and am happy with the results I am getting. You get 2M chars for free. See this link. There are a few other options out on the market.
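
For reference, a bare-bones example of that REST call (Python with the `requests` library; the key, region, and language codes are placeholders you’d fill in from your own Azure resource):

```python
# Minimal Azure Translator v3 sketch. Key and region are placeholders.
import requests

AZURE_KEY = "your-key"
AZURE_REGION = "your-region"
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def azure_translate(strings, to_lang, from_lang="en"):
    resp = requests.post(
        ENDPOINT,
        params={"api-version": "3.0", "from": from_lang, "to": to_lang},
        headers={
            "Ocp-Apim-Subscription-Key": AZURE_KEY,
            "Ocp-Apim-Subscription-Region": AZURE_REGION,
            "Content-Type": "application/json",
        },
        json=[{"text": s} for s in strings],
    )
    resp.raise_for_status()
    # One result object per input string, in the same order
    return [item["translations"][0]["text"] for item in resp.json()]

# azure_translate(["Hello world"], to_lang="fi")
```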

My typical API calls:

usage: total 6244 prompt 2786 completion 3458
usage: total 6148 prompt 3092 completion 3056
usage: total 6409 prompt 2995 completion 3414
usage: total 6236 prompt 3172 completion 3064

But when GPT feels lazy, I’ll get things like:

usage: total 5646 prompt 3280 completion 2366
usage: total 3985 prompt 3294 completion 691
usage: total 3659 prompt 3313 completion 346
usage: total 4483 prompt 3209 completion 1274

Note that some of the truncations are really early. 346 tokens?

Of course I can reduce the number of strings per call. But that costs money, because the overhead of my (fairly large) context prompt becomes much more significant. And even that gives me no guarantees: I tried smaller queries, and the truncation still happens.
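
For now the only mitigation I have is the one I mentioned above: detect the bad replies and re-query. A rough sketch of that check (OpenAI Python SDK v1; the `translations` field and the helper name are illustrative, not my actual schema):

```python
# Sketch of a "detect and re-query" check on each reply. Assumes the
# function call should return exactly one translation per input string;
# field and helper names are illustrative.
import json

def translations_or_none(response, expected_count):
    choice = response.choices[0]
    if choice.finish_reason == "length":
        return None  # model hit the token cap mid-reply
    try:
        args = json.loads(choice.message.function_call.arguments)
    except (json.JSONDecodeError, AttributeError, TypeError):
        return None  # truncated or otherwise malformed JSON
    items = args.get("translations", [])
    if len(items) != expected_count:
        return None  # silently dropped strings: retry the whole batch
    return items
```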

Amusingly enough, it happens for some languages more than for others. Nordic languages seem to do the worst (Finnish, Danish, Swedish).

And yes, I do need GPT-4 for this level of context-aware translation.

What does your actual prompt look like (system message / user message)?