GPT-4o not returning all indexes of the array despite prompting in various ways

I am sending an array of strings, delimited by a rare marker string, to GPT, like below:
[#AT_1# "string 1", #AT_2# "string 2", …]

I am sending specific instructions to creatively rewrite this data in the target locale. All of this data is part of a user template, and it is very important that GPT returns every index, since the translated content goes back into the UI at fixed indexes.

But GPT randomly combines or splits the strings of multiple indexes into one index, thus reducing the number of indexes in the final output.

I need suggestions on how to enforce that all indexes are returned, or a way to know which strings it combined or split so that I can handle it on my end.


Could you explain your case in more detail? We need to know more about the prompt engineering you did. Welcome to the forum! :slight_smile:

This is one of the points in the user prompt. There are other instructions as well, but the points below contain all the instructions related to this issue.

\n5. [text] is an array that contains multiple strings.\n6. Each String is uniquely identified by a delimeter like #INDEX_n# where n is the array index. IMPORTANT: YOU MUST NOT CREATIVELY REWRITE OR MODIFY OR REMOVE THESE INDEX DELIMETERS.\n7. Each string followed by must be creatively rewritten separately and should be stitched back in the output array in the SAME ORDER as the input. \n8. Generate the content in a similar Array as provided in input ARRAY [text].\n9. Do not include any message or explanation in the output except for the creatively rewritten strings.\n10. DO NOT add newlines or any new information in the response if is not there in [text]. Keep the escape characters as it is.\n11. IMPORTANT: You MUST NOT COMBINE OR SPLIT multiple strings output in one index in the output array. REMEMBER: The output must be an array containing the same number of strings as in the [text] JSON array and in the same order as the corresponding source strings. DO NOT return the prompt in any case.\n12. IMPORTANT: ONCE THE OUTPUT IS GENERATED, YOU MUST ENSURE THAT IT IS A VALID JSON PROGRAMATICALLY.\n\n"

    prompt += "[text] =" + indexed_text_json + "\n[target_locale] =" + locale + ". "

The system prompt looks something like this:
sys_prompt = "You are a helpful linguist tasked with creatively rewriting text from the provided JSON array named [text] into the target locale : " + locale + "."

For small arrays (~10 items) it works fine, but with big arrays it combines a few indexes based on context. We want to enforce a strict check on this, such that each string is translated individually and returned in the same order as the source array.
Thanks for checking @allyssonallan

@Ric_2004 try calling the API with a temperature of 0:

from openai import OpenAI

# Assumes version 1.0 or later of the openai Python package
client = OpenAI(api_key='YOUR_API_KEY')

def translate_text(prompt):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful linguist..."},
            {"role": "user", "content": prompt}
        ],
        temperature=0,    # reduce sampling randomness
        max_tokens=1000,  # raise this if large arrays come back truncated
    )
    return response.choices[0].message.content

# Example 
translated_text = translate_text(prompt)

A validation strategy plus chunking might fit:

import json
import re

def validate_output(original, translated):
    """Return True if both texts contain the same set of #AT_n# index markers."""
    # Collect the numeric part of every marker found in each text
    original_indexes = re.findall(r'#AT_(\d+)#', original)
    translated_indexes = re.findall(r'#AT_(\d+)#', translated)
    
    missing = set(original_indexes) - set(translated_indexes)
    extra = set(translated_indexes) - set(original_indexes)
    
    if missing:
        print(f"Missing indexes: {missing}")
    if extra:
        print(f"Extra indexes: {extra}")
    
    return not missing and not extra

# Example 
if not validate_output(original_text, final_text):
    # Handle inconsistencies
    pass

For the chunking, you can use a json.dumps() strategy.
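To make the chunking idea a bit more concrete, here is a rough sketch rather than a tested solution. It assumes the source is a plain Python list of strings, that markers follow the #AT_n# form with a zero-based index, and that around 10 strings per request is a safe size (since small arrays reportedly work fine). The function name chunk_indexed_strings and its parameters are made up for illustration.

```python
import json

def chunk_indexed_strings(strings, chunk_size=10, marker="#AT_{}#"):
    """Split `strings` into JSON array chunks, tagging each string with its global index."""
    chunks = []
    for start in range(0, len(strings), chunk_size):
        tagged = [
            f"{marker.format(start + offset)} {text}"
            for offset, text in enumerate(strings[start:start + chunk_size])
        ]
        # json.dumps escapes quotes and newlines, so every chunk is a valid JSON array
        chunks.append(json.dumps(tagged, ensure_ascii=False))
    return chunks

# Example: 25 strings become 3 chunks (10 + 10 + 5)
strings = [f"string {i}" for i in range(25)]
chunks = chunk_indexed_strings(strings, chunk_size=10)
print(len(chunks))
print(chunks[2])
```

Each chunk can then be sent as its own [text] value in a separate request. Because the markers carry the global index, the results from different chunks can be matched back to the original positions regardless of which request they came from.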

Please, check it out:

The issue is that we don't want any mismatches, and it does not return all indexes. It just puts string i+1 at index i, which changes the string at that particular index altogether.

Have you tried running your use case by GPT and asking it to write you a parser?
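A parser along those lines might look like the sketch below. It is only an illustration under some assumptions: the model output is a valid JSON array of strings, the markers look like #AT_n# with a zero-based index, and falling back to the untranslated source string is acceptable for anything that cannot be placed. The name place_by_marker is hypothetical. The point is to place each returned string by its marker instead of by its position in the output, so a merged or dropped item cannot shift the strings that follow it.

```python
import json
import re

MARKER = re.compile(r"#AT_(\d+)#\s*")

def place_by_marker(source, model_output):
    """Return (result, missing): output strings placed at the index named by their marker."""
    result = list(source)  # default to the source string at every index
    seen = set()
    for item in json.loads(model_output):
        item = item.strip()
        matches = list(MARKER.finditer(item))
        if len(matches) != 1 or matches[0].start() != 0:
            continue  # merged item (several markers) or no leading marker: skip it
        idx = int(matches[0].group(1))
        if 0 <= idx < len(source) and idx not in seen:
            result[idx] = item[matches[0].end():]  # text after the marker
            seen.add(idx)
    missing = sorted(set(range(len(source))) - seen)
    return result, missing

# Example: the model merged indexes 1 and 2 into a single array item
source = ["Hello", "Good morning", "Bye"]
output = json.dumps(["#AT_0# Hola", "#AT_1# Buenos días #AT_2# Adiós"])
result, missing = place_by_marker(source, output)
print(result)
print(missing)
```

The missing list tells you which indexes to resend, for example in a smaller request. One limitation: if the model merges two strings and also drops the second marker, the merged text still lands at the first index and only the second index is reported as missing, so it may be safer to retry the neighbours of a missing index as well.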