The API produces drastically less relevant results than ChatGPT (UI)

Hello community,

I’m discovering with dismay that the performance of the GPT-4o mini API is catastrophically poor compared to GPT-4o mini as offered through the ChatGPT user interface.

I’m analyzing CV relevance for job offers. With the user interface version, the results are stunning: GPT-4o mini analyzes CVs in detail and accurately assesses the relevance of a profile for a position.

But as soon as I switch to the API, the results are completely random and make no sense. It’s as if it doesn’t even read the CVs I send it.

To be precise:

  • I don’t put any context in the user interface: starting from a blank conversation, with just the prompt, the job offer and the CV, the analysis is excellent.
  • I’ve tried reducing the temperature to 0.
  • I’m using the exact same prompt, the same offer, the same CV (see the sketch of my API call just after this list).
  • I don’t expect to get exactly the same results: I know there’s a degree of randomness. But here, the GPT-4o mini API tells me that a CTO/developer with 10 years of experience would make a perfect store employee/cashier, whereas the user interface version clearly identifies that he is far too overqualified. This is no longer randomness: these are incoherent, unusable results.
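
For reference, my API call looks roughly like this (a minimal Python sketch; the placeholder strings stand for the exact same prompt, job offer and CV texts I paste into the UI):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholders: in my real code these hold the exact same prompt, job offer
# and CV text that I paste into the ChatGPT interface.
prompt_text = "ACTION: Assess candidate fit via CV/job offer analysis. ..."
job_offer_text = "<full job offer text>"
cv_text = "<full CV text>"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0,
    messages=[
        {
            "role": "user",
            "content": f"{prompt_text}\n\nJOB OFFER:\n{job_offer_text}\n\nCV:\n{cv_text}",
        }
    ],
)
print(response.choices[0].message.content)
```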

How can I achieve the same level of performance between the API and the User Interface?

For your information, here is the prompt I am using:

ACTION: Assess candidate fit via CV/job offer analysis.
ROLE: Experienced, detail-oriented recruiter.
STEPS:

  1. Analyze the attached CV and job offer.

  2. Evaluate the candidate based on:

  • Professional experience (relevance, duration, progression)
  • Technical skills (tools, languages, processes, standards)
  • Language proficiency (spoken and written)
  • Academic background (relevant degrees, specializations)
  • Location and mobility (suitability for the position)
  • Years of experience (matching requirements, not over-experienced)
  3. Generate a score from 0 to 100%, in 5% increments (0%, 5%, 10%, 15%, etc.).
    Use this scale as a guide, but fine-tune based on your analysis:
    0-25%: No match.
    30-45%: Very low match, critical gaps in almost all areas.
    50-65%: Low match, some critical gaps preclude candidacy.
    70-75%: Possible match, despite some shortcomings.
    80-85%: Good match, minor points for improvement.
    90-95%: Very good match, near-ideal profile.
    100%: Perfect match with all job requirements.

If score >=50%, summarize the rationale (<50 words) in French. Ensure the comment is unique and tailored to the specific details of the CV and job offer. Use the candidate's first name. Mind the candidate's gender in the rationale.

OUTPUT:
JSON: {"score": [0-100%], "comment": "<50-word French summary" (if score >=50%)}

RULES:
Score/summary only.
No extra comments.
JSON format, escaped quotes.
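
And this is roughly how I read the JSON described in OUTPUT (again a sketch; `raw` stands for the message content returned by the call above, with a purely illustrative value):

```python
import json

# `raw` stands for response.choices[0].message.content from the call sketched
# earlier; the value below only illustrates the expected shape.
raw = '{"score": "85%", "comment": "Jean correspond très bien au poste..."}'
result = json.loads(raw)

# The score may come back as 85 or as "85%", so normalise it before comparing.
score = int(str(result["score"]).rstrip("%"))
comment = result.get("comment", "")

if score >= 50:
    print(f"{score}% - {comment}")
else:
    print(f"{score}% - below threshold, no comment expected")
```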