Different results: ChatGPT 3.5 vs. the API (gpt-3.5-turbo)

I am trying to parse job descriptions into a structured format. Here is my JD:

GPT-3.5 web user interface output (runtime: instant). A pretty well-parsed output with minor mistakes:

{
  "job_title": "Senior Software Engineer - Python",
  "required_skills": ["Python", "Javascript", "AWS", "CI/CD", "Terraform"],
  "additional_skills": ["Typescript", "React", "Redux/MobX", "Kafka"],
  "required_competencies": ["Unit Testing", "Test-Driven Development", "Design Patterns", "Asynchronous Programming"],
  "additional_competencies": ["Event-Driven Architectures"],
  "required_experiences": ["Bachelor's degree in Computer Science or related field", "Solid experience with Python frameworks (e.g., Flask, FastAPI)", "Hands-on experience with AWS services"],
  "additional_experiences": ["Experience with serverless functions", "Experience in setting up CI/CD pipelines using GitLab"],
  "job_location": "Berlin, Germany",
  "work_setting": ["Hybrid", "Remote"],
  "languages": ["Fluent English (required)", "Additional languages a plus"],
  "education": ["Bachelor's degree in Computer Science, Engineering, or related field (or equivalent practical experience)"],
  "base_compensation": "Yearly gross salary (not specified in JD)",
  "additional_compensation": ["Equity", "ESOP", "VSOP", "Performance Bonuses"],
  "benefits": ["Study Budget", "Home Office Budget", "Paid Leave Days"],
  "company_differentiator": ["Innovative technology for easy travel", "Commitment to green technologies and sustainability"]
}

GPT-3.5-Turbo 16k Assistants API output (runtime: 3.5 minutes!). A very disappointing result:

"job_title": "Senior Software Engineer (m/f/d) - Python",
  "required_skills": ["Python and Javascript", "configuring Datadog monitoring and observability"],
  "additional_skills": [],
  "required_competencies": [],
  "additional_competencies": [],
  "required_experiences": ["Python frameworks such as Flask or FastAPI"],
  "additional_experiences": ["AWS cloud services and resource management (SNS/SQS/S3/ECS/EC2/lambdas)"],
  "job_location": "Berlin",
  "work_setting": "Organize your own schedule",
  "base_compensation": "up to 60 days of working from",
  "company_differentiator": ["We celebrate diversity"]

The same prompts were used, of course.

Any hints on how to fix this?
Should I use GPT-3.5-Turbo-1106 or plain GPT-3.5-Turbo?
The discrepancy is very misleading.

Hi @kychasticus
To gain a better understanding of different AI models' capabilities, you might consider experimenting with tools like the OpenAI Playground or LLM Spark. These platforms offer a hands-on experience that could help you determine the most suitable solution for your specific requirements.

Hey! I am using the Assistants functionality through the Python SDK. The Playground is just a visual representation of the Assistants API in my case.

Do you have any practical suggestions I can use to make the Assistants API responses match the ChatGPT ones?

Using the Assistants API we cannot control much to get responses like ChatGPT's; we have to rely on clear instructions and the right model.
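For example, here is a minimal sketch using the openai Python SDK (v1.x). The model snapshot (gpt-3.5-turbo-1106), the instruction wording, and the jd_text placeholder are illustrative assumptions, not a confirmed fix:

# Minimal sketch, openai Python SDK v1.x. Model snapshot, instruction
# text, and jd_text are assumptions for illustration only.
import time
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="JD Parser",
    model="gpt-3.5-turbo-1106",  # pin an explicit snapshot instead of an alias
    instructions=(
        "Parse the job description into a single JSON object with exactly "
        "these keys: job_title, required_skills, additional_skills, "
        "required_competencies, additional_competencies, required_experiences, "
        "additional_experiences, job_location, work_setting, languages, "
        "education, base_compensation, additional_compensation, benefits, "
        "company_differentiator. Return only the JSON object, no commentary."
    ),
)

jd_text = "<your job description here>"  # hypothetical placeholder

thread = client.beta.threads.create()
client.beta.threads.messages.create(thread_id=thread.id, role="user", content=jd_text)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

# Poll until the run finishes, then read the newest (assistant) message.
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)

Pinning an explicit snapshot and enumerating the exact keys you expect tends to narrow the gap between Assistants API output and what you see in ChatGPT.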

Thank you so much! This seems to have worked for me. I had the same problems as you. It has become much better; all that remains is to keep monitoring it from a distance.

It’s a shame that there are a hundred topics like this and no adequate response from the administration.

Tomorrow I will post your answer in the other similar topics on the forum.

Update, the next day: I still get various incorrect results, but there are fewer of them. I am continuing to search for a solution.

Same issue with me.

Below is the query I’m sending:

{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "system",
            "content": "This is a field in excel. So don't return explanation unless asked specifically. Return results only. Please check all the condition carefully"
        },
        {
            "role": "user",
            "content": "value is $100000. If value equals to 15000 then multiply by 3"
        }
    ],
    "temperature": 0.7
}

The problem is that it also fails to evaluate the condition; it gives wrong results.

Can someone let me know what I’m doing wrong here?

  1. Spreadsheets don’t have “fields”. They have rows, columns, and cells.
  2. Cells have values, or they have formulas.
  3. Formulas usually make reference to other cells. Formulas that don’t calculate from other data are fixed until modified and not that useful.

So:
1. What you show as the user message cannot be anything other than a text cell.
2. The AI is unlikely to differentiate between returning explanations, returning results, or checking.
3. A temperature of 0.7 leaves a chance of unlikely “wrong” tokens being sampled, and thus wrong responses.
4. AI don’t math good (a quick sketch of the deterministic alternative follows this list).
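As for point 4: a condition like this is deterministic, so plain code is the reliable way to evaluate it. A trivial sketch using the values from the post above:

# Deterministic check in Python; values taken from the quoted prompt.
value = 100000  # the "$100000" from the prompt, read as a number
result = value * 3 if value == 15000 else value
print(result)  # 100000 -- the condition is false, so the value is unchanged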

We can still rectify this, even though even a human can’t determine the truly desired answer from that prompt.

Here is a system message:

You are ExcelAI. You perform spreadsheet calculations, providing resulting answers for all formula cells supplied in a list of cells. Your output must be the value of cells evaluated out-of-order by precedence of which calculation must be performed first in the spreadsheet cell list provided by the user. No chat, just calculated cells.

Here is a user input:

A1: 100000
B1: =IF(A1=15000, A1*3, A1)
C1: A1/4

gpt-4-turbo response at top_p=0.01:

B1: 100000
C1: 25000
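For completeness, here is a hedged sketch of the same exchange as a direct Chat Completions call (openai Python SDK v1.x); the exact gpt-4-turbo snapshot name is an assumption, since the post only says “gpt-4-turbo”:

# Sketch: reproduce the ExcelAI example via Chat Completions.
from openai import OpenAI

client = OpenAI()

system_message = (
    "You are ExcelAI. You perform spreadsheet calculations, providing "
    "resulting answers for all formula cells supplied in a list of cells. "
    "Your output must be the value of cells evaluated out-of-order by "
    "precedence of which calculation must be performed first in the "
    "spreadsheet cell list provided by the user. No chat, just calculated cells."
)
user_input = "A1: 100000\nB1: =IF(A1=15000, A1*3, A1)\nC1: A1/4"

response = client.chat.completions.create(
    model="gpt-4-turbo-preview",  # assumed snapshot name
    top_p=0.01,                   # near-deterministic sampling, as above
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_input},
    ],
)
print(response.choices[0].message.content)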


Came here to research why the web interface (for 4-Turbo) gives pretty good results while the API is much less helpful. I guess there’s no solution, but “me too” here for posterity.