The API produces drastically less relevant results than ChatGPT (UI)

Hello community,

I’m discovering with dismay that the performance of the GPT-4o mini API is catastrophically poor compared to GPT-4o mini as offered through the ChatGPT user interface.

I’m analyzing CV relevance for job offers. With the user interface version, the results are stunning: GPT-4o mini analyzes CVs in detail and accurately assesses the relevance of a profile for a position.

But as soon as I switch to the API, the results are completely random and make no sense. It’s as if it doesn’t even read the CVs I send it.

To be precise:

  • I don’t put any context in the user interface: starting from a blank conversation, with just the prompt, the job offer and the CV, the analysis is excellent.
  • I’ve tried reducing the temperature to 0.
  • I’m using the exact same prompt, the same offer, the same CV (see the sketch of my API call just after this list).
  • I don’t expect to get exactly the same results: I know there’s a degree of randomness. But here, the GPT-4o mini API tells me that a CTO/developer with 10 years of experience would make a perfect store employee/cashier, whereas the user interface version clearly identifies that he is far too overqualified. This is no longer randomness: these are incoherent, unusable results.
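
For reference, my API call looks roughly like this (a minimal Python sketch; the placeholder strings stand for the exact same prompt, job offer and CV texts I paste into the UI):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholders: in my real code these hold the exact same prompt, job offer
# and CV text that I paste into the ChatGPT interface.
prompt_text = "ACTION: Assess candidate fit via CV/job offer analysis. ..."
job_offer_text = "<full job offer text>"
cv_text = "<full CV text>"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0,
    messages=[
        {
            "role": "user",
            "content": f"{prompt_text}\n\nJOB OFFER:\n{job_offer_text}\n\nCV:\n{cv_text}",
        }
    ],
)
print(response.choices[0].message.content)
```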

How can I achieve the same level of performance between the API and the User Interface?

For your information, here is the prompt I am using:

ACTION: Assess candidate fit via CV/job offer analysis.
ROLE: Experienced, detail-oriented recruiter.
STEPS:

  1. Analyze the attached CV and job offer.

  2. Evaluate the candidate based on:

  • Professional experience (relevance, duration, progression)
  • Technical skills (tools, languages, processes, standards)
  • Language proficiency (spoken and written)
  • Academic background (relevant degrees, specializations)
  • Location and mobility (suitability for the position)
  • Years of experience (matching requirements, not over-experienced)
  3. Generate a score from 0 to 100%, in 5% increments (0%, 5%, 10%, 15%, etc.).
    Use this scale as a guide, but fine-tune based on your analysis:
    0-25%: No match.
    30-45%: Very low match, critical gaps in almost all areas.
    50-65%: Low match, some critical gaps preclude candidacy.
    70-75%: Possible match, despite some shortcomings.
    80-85%: Good match, minor points for improvement.
    90-95%: Very good match, near-ideal profile.
    100%: Perfect match with all job requirements.

If score >=50%, summarize the rationale (<50 words) in French. Ensure the comment is unique and tailored to the specific details of the CV and job offer. Use the candidate's first name. Mind the candidate's gender in the rationale.

OUTPUT:
JSON: {"score": [0-100%], "comment": "<50-word French summary" (if score >=50%)}

RULES:
Score/summary only.
No extra comments.
JSON format, escaped quotes.
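
And this is roughly how I read the JSON described in OUTPUT (again a sketch; `raw` stands for the message content returned by the call above, with a purely illustrative value):

```python
import json

# `raw` stands for response.choices[0].message.content from the call sketched
# earlier; the value below only illustrates the expected shape.
raw = '{"score": "85%", "comment": "Jean correspond très bien au poste..."}'
result = json.loads(raw)

# The score may come back as 85 or as "85%", so normalise it before comparing.
score = int(str(result["score"]).rstrip("%"))
comment = result.get("comment", "")

if score >= 50:
    print(f"{score}% - {comment}")
else:
    print(f"{score}% - below threshold, no comment expected")
```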