Issue with API Suddenly Giving the Same Code to Very Different Documents

I have been using GPT's API to code document clarity. I have run the same group of documents with multiple similar instruction sets and always get variation between the documents in how they are rated. The ratings are generally pretty stable. This was October-December 2024. Now I am trying to run the same code and it assigns the same clarity rating to every document, even though they are clearly different and not even a month ago GPT could distinguish between them with these instructions. Does anyone know what could be going on?

from tenacity import retry, stop_after_attempt, wait_exponential
import openai

@retry(wait=wait_exponential(multiplier=1, min=10, max=120), stop=stop_after_attempt(10))
def classify_document_clarity(document):
    clarity_instructions = """
    Role: You are an expert in evaluating the clarity of political policy statements. Your task is to rate how clearly the speaker's own national or international policy positions are presented in the document, not their descriptions of others' positions or the current state of policy. Take your time to think through your response.

    Output only the final numerical score.
    """
    try:
        response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": clarity_instructions},
                {"role": "user", "content": f"Document:\n{document}"}
            ],
            temperature=1
        )
        print(response)
        clarity_rating = response.choices[0].message.content
        print(clarity_rating)
        return int(clarity_rating)

    except openai.RateLimitError as e:
        print(f"Rate limit error in classify_document_clarity: {e}")
        raise  # Reraise the exception for retry

    except openai.APIError as e:
        print(f"API error in classify_document_clarity: {e}")
        raise  # Reraise for retry

    except Exception as e:
        print(f"Unexpected error: {e}")
        raise  # Reraise for retry

Welcome to the community!

There are a couple of things that come to mind:

  1. Are you actually sending different documents? I had a similar incident where it turned out that, due to a bug, I was always sending the same document. (A quick way to check is sketched after this list.)

  2. Did it actually work before? Looking at the prompt, "Take your time to think through your response." combined with "Output only the final numerical score." isn't actually a useful instruction, unless it's intended as a deceptive prompting strategy; the model has nowhere to do that thinking when it's only allowed to output a single number.

  3. OpenAI often makes changes to the models. Using gpt-4o-mini rather than a fixed snapshot, e.g. gpt-4o-mini-2024-07-18, is a little risky, because they might swap out the model without you knowing. However, according to https://platform.openai.com/docs/models#gpt-4o-mini, it doesn't look like they changed major model versions. That's not to say they don't occasionally perform ninja tweaks without telling anyone anyway. So this particular issue isn't something that would have been within your control.
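
On point 1, a quick sanity check you could drop in before anything gets sent: log a short fingerprint of each document so you can confirm they really differ. This is just a sketch; documents here stands in for whatever iterable you actually loop over.

import hashlib

def fingerprint(document):
    # Short, stable fingerprint of the raw text
    return hashlib.sha256(document.encode("utf-8")).hexdigest()[:12]

for i, doc in enumerate(documents):  # `documents` is a placeholder for your own list
    print(i, fingerprint(doc), doc[:80].replace("\n", " "))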

What I would suggest, if it's in your budget, is to use a CoT approach (perhaps using a JSON schema) that provides reasoning first and a score later. This way it's easy to "debug" "why" you're not getting the response you're expecting.
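
For example, here's a minimal sketch of that reasoning-then-score pattern using Structured Outputs with a JSON schema. The field names, the pinned snapshot, and the temperature are assumptions to illustrate the idea, not a drop-in replacement for your function.

import json
import openai

cot_instructions = """
Role: You are an expert in evaluating the clarity of political policy statements.
Briefly explain your reasoning first, then give the final clarity score.
"""

clarity_schema = {
    "name": "clarity_rating",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            # reasoning is listed before score so the explanation is generated first
            "reasoning": {"type": "string"},
            "score": {"type": "integer"}
        },
        "required": ["reasoning", "score"],
        "additionalProperties": False
    }
}

def classify_with_reasoning(document):
    response = openai.chat.completions.create(
        model="gpt-4o-mini-2024-07-18",  # pinned snapshot, see point 3
        messages=[
            {"role": "system", "content": cot_instructions},
            {"role": "user", "content": f"Document:\n{document}"}
        ],
        response_format={"type": "json_schema", "json_schema": clarity_schema},
        temperature=0  # lower variance for a grading task; your call
    )
    result = json.loads(response.choices[0].message.content)
    print(result["reasoning"])  # inspect why the model chose the score
    return result["score"]

The reasoning field is purely for you: when a score looks off, you can read how the model interpreted the document, which is usually much faster than guessing from the number alone.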