ChatGPT only partially answers my request

I am using GPT to translate some texts via the Python API.
I am providing the input text in JSON format like this:

{
      "id": "id1",
      "text": "text1"
},
{
      "id": "id2",
      "text": "text2"
}

I am asking GPT to translate the 'text' key of every object and to answer in JSON, using a response_format.

My input data is quite big, up to 50k tokens, so I am using tiktoken to split it into 4096-token chunks.
I don't understand why, but very often GPT only partially answers my request.

This is the CompletionUsage details when it does work:

CompletionUsage(completion_tokens = 4077, prompt_tokens = 4173, total_tokens = 8250, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens = 0, audio_tokens = 0, reasoning_tokens = 0, rejected_prediction_tokens = 0), prompt_tokens_details=PromptTokensDetails(audio_tokens = 0, cached_tokens = 3968))

This is the CompletionUsage details when it doesn’t work:

CompletionUsage(completion_tokens = 2178, prompt_tokens = 4147, total_tokens = 6325, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens = 0, audio_tokens = 0, reasoning_tokens = 0, rejected_prediction_tokens = 0), prompt_tokens_details=PromptTokensDetails(audio_tokens = 0, cached_tokens = 3968))

The finish_reason is indeed always stop
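(Checked roughly like this, as a minimal sketch, where response is the object returned by client.chat.completions.create:)

choice = response.choices[0]
print(choice.finish_reason)              # always "stop"; "length" would indicate a hard output-token cutoff
print(response.usage.completion_tokens)  # how many tokens the model actually generated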

This is sample code (I have left out the response_format as it is very wordy):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "developer",
            "content": "You are a professional translator into Italian. "
                       "You will be provided a JSON input, in which you have to translate EVERY 'text' key of the objects, and only those. "
                       "Only answer with a JSON format like this: { 'id': 'Given id in the input json', 'text': 'translation1' }, { 'id': 'Given id in the input json', 'text': 'translation2' }"
        },
        {
            "role": "user",
            "content":  # sending a maximum of 4096 tokens per chunk
                "{ 'id': 'id1', 'text': 'text1' }, "
                "{ 'id': 'id2', 'text': 'text2' }"
        }
    ],
)
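
For reference, a response_format like the one I omitted would typically be a structured-output JSON schema along these lines; the sketch below is only illustrative (the schema names are made up), not my exact schema:

# Illustrative sketch of a structured-output response_format (hypothetical schema names)
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "translation_list",   # hypothetical name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "id": {"type": "string"},
                            "text": {"type": "string"}
                        },
                        "required": ["id", "text"],
                        "additionalProperties": False
                    }
                }
            },
            "required": ["items"],
            "additionalProperties": False
        }
    }
}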

Am I doing something wrong here? Is this a prompt issue?

You have discovered that the AI will often minimize the amount of output it produces. We know this is a decision OpenAI made to keep computation costs down, because AI models didn't always act this way.

gpt-4o-mini in particular is driven very much by its chat training, with only a hint of original insight showing through. Since this model is cheap, it doesn't cost much more in repeated prompts to cut your chunks down even further, into the 800-token range. This will also improve the quality of the content you do get.

My real developer message is around 300 tokens, so that means the data to translate should not exceed 500 tokens, which I find rather small. Isn't there any way to let the model use the maximum number of tokens it needs? Or is there another model better suited to my case?

It seems like you should have GPT create a Python script that can help preprocess your data. You have the right mindset, just the wrong approach given the constraints the model faces.

I simply asked GPT: couldn't they just get GPT to create a Python script that helps them achieve this better? (I shared your post.) Here is what GPT supplied in response:

GPT 4o:
Yes, they could take a different approach by asking GPT to generate a Python script that performs the translation in smaller, manageable chunks instead of trying to process everything in a single API call.

The issue they’re facing is likely due to token limits and how GPT handles long responses. Even though the prompt fits within the 4096-token chunk size, the model may not always be able to complete the full response before hitting internal limitations.

A more reliable approach would be:
1. Preprocess the data: Use a Python script to load the JSON and break it into smaller batches of, say, 512 or 1024 tokens per request instead of the max 4096.
2. Make multiple API calls: Translate each batch separately and collect the results.
3. Reassemble the data: Merge all translated results back into a structured format.

Here’s a Python script to do this efficiently:

import json
import tiktoken
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI()

# Load input data (example format)
input_data = [
    {"id": "id1", "text": "Hello, how are you?"},
    {"id": "id2", "text": "This is a test sentence."},
    # Add more entries...
]

# Tokenizer setup for GPT-4o-mini
enc = tiktoken.encoding_for_model("gpt-4o-mini")
TOKEN_LIMIT = 3500  # Slightly lower than 4096 for safety

def count_tokens(text):
    return len(enc.encode(text))

# Chunking logic
batches = []
current_batch = []
current_tokens = 0

for entry in input_data:
    entry_json = json.dumps(entry)
    entry_tokens = count_tokens(entry_json)
    
    if current_tokens + entry_tokens > TOKEN_LIMIT:
        batches.append(current_batch)
        current_batch = []
        current_tokens = 0
    
    current_batch.append(entry)
    current_tokens += entry_tokens

if current_batch:
    batches.append(current_batch)

# Translation function
def translate_batch(batch):
    user_message = json.dumps(batch, ensure_ascii=False)

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a professional translator. Translate only the 'text' fields to Italian. "
                                          "Reply with a JSON object of the form {\"translations\": [{\"id\": ..., \"text\": ...}, ...]}."},
            {"role": "user", "content": user_message}
        ],
        response_format={"type": "json_object"}  # JSON mode requires a JSON object, not a bare array
    )

    return json.loads(response.choices[0].message.content)["translations"]

# Process all batches
translated_data = []
for batch in batches:
    translated_data.extend(translate_batch(batch))

# Save output
with open("translated_output.json", "w", encoding="utf-8") as f:
    json.dump(translated_data, f, ensure_ascii=False, indent=2)

print("Translation complete. Results saved to translated_output.json.")

Why this works better:
• Avoids max token issues: The script ensures that no batch exceeds safe token limits.
• Handles large data sets: It efficiently processes even 50k+ tokens by breaking them into batches.
• Improves reliability: Instead of relying on a single API call, it spreads the workload, reducing the risk of incomplete responses.

If they want full control over translation quality and efficiency, this is the way to go.

Hope that helps!

Maybe there is a misunderstanding, but this is already what I am actually doing. I am just using bigger chunks.

It is not about the amount of input you place; it is about the AI's training on how long it will write before it finds ways to finish up or close an output prematurely. You can have 10,000 tokens of document or past chat turns on the "input" side; what matters is how long the assistant will keep writing any single response before it starts visibly wanting to wrap up, in the kinds of outputs where you can see the language change.

The AI will even compress the quality of individual tasks if you ask for a lot of list items or a lot of different things done in one response.

You do not need to stick with the model that is going to perform the worst here. Try gpt-4o-2024-11-20, and also tell the AI in the developer message that it is a specialist AI trained for processing a thousand key elements of JSON in one response, or whatever convincing is needed about its ability to write longer (it still cannot reasonably approach 4k output tokens at once).
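
Whatever model you pick, a practical safeguard is to compare the returned ids against the input and re-request only what is missing. A minimal sketch, assuming each chunk is a list of {'id', 'text'} objects and reusing a translate_batch-style helper like the one in the script above:

def translate_with_retries(batch, max_retries=3):
    """Call the API, then re-request any ids that came back missing."""
    remaining = {item["id"]: item for item in batch}
    translated = {}

    for _ in range(max_retries):
        if not remaining:
            break
        result = translate_batch(list(remaining.values()))  # helper defined earlier in the thread
        for item in result:
            if item.get("id") in remaining:
                translated[item["id"]] = item
                del remaining[item["id"]]
        # Anything still in `remaining` was silently dropped; loop and ask again with a smaller payload.

    # Preserve the original order of the input batch
    return [translated[item["id"]] for item in batch if item["id"] in translated]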


Thank you. I think I will reduce the amount of input data to 500 tokens, and see how it behaves before changing models.