I have been using GPT’s API to code document clarity. From October to December 2024, I ran the same group of documents with multiple similar instruction sets and always got variation between the documents in how they were rated; those ratings were generally pretty stable across runs. Now I am trying to run the same code and it assigns the same clarity rating to every document, even though the documents are clearly different and less than a month ago GPT could distinguish between them with these same instructions. Does anyone know what could be going on?
import openai
from tenacity import retry, wait_exponential, stop_after_attempt

@retry(wait=wait_exponential(multiplier=1, min=10, max=120), stop=stop_after_attempt(10))
def classify_document_clarity(document):
    clarity_instructions = """
    Role: You are an expert in evaluating the clarity of political policy statements. Your task is to rate how clearly the speaker’s own national or international policy positions are presented in the document, not their descriptions of others’ positions or the current state of policy. Take your time to think through your response.
    Output only the final numerical score.
    """
    try:
        response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": clarity_instructions},
                {"role": "user", "content": f"Document:\n{document}"}
            ],
            temperature=1
        )
        print(response)  # full API response object, kept for debugging
        clarity_rating = response.choices[0].message.content
        print(clarity_rating)
        return int(clarity_rating)
    except openai.RateLimitError as e:
        print(f"Rate limit error in classify_document_clarity: {e}")
        raise  # Reraise the exception for retry
    except openai.APIError as e:
        print(f"API error in classify_document_clarity: {e}")
        raise  # Reraise for retry
    except Exception as e:
        print(f"Unexpected error: {e}")
        raise  # Reraise for retry
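
For context, the function is called in a loop over the group of documents, roughly like the sketch below. The documents list here is just a placeholder with invented example strings, not my actual corpus:

documents = [
    "We will raise the national minimum wage to $15 within two years.",  # placeholder example
    "Some have argued for changes to trade policy; views differ widely.",  # placeholder example
]

ratings = []
for i, doc in enumerate(documents):
    rating = classify_document_clarity(doc)
    ratings.append(rating)
    print(f"Document {i}: clarity rating {rating}")

# Symptom: every rating printed here is now identical, where it used to vary by document.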