Different results: ChatGPT 3.5 vs API (gpt-3.5-turbo)

I’ve been running prompts in ChatGPT 3.5 and consistently getting pretty good results. I then coded the same prompts into API calls, and the results have been underwhelming. I’ve varied the frequency and presence penalties between 0 and 0.8 while leaving temperature at 0.7. My assumption was that the gpt-3.5-turbo API model would behave like the ChatGPT UI. Maybe I’m doing something wrong?

The scenario: in the ChatGPT UI, I set the role, e.g. “You are taxGPT - an IRS tax assistant model trained by OpenAI. You are very familiar with answering tax questions about small businesses.” Then I send the prompt right below that. The completions are pretty good, and it provides a little background on why it gave that specific answer.

Now in the API call, I set the system role, set the user prompt, and test a few variations of the frequency and presence penalties. The responses from the API seem less creative and provide less explanation of why it’s giving me its answer.
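For reference, here is roughly the shape of the call I’m making (a minimal sketch with the 0.x openai Python package; the key and the user question are placeholders, not my real ones):

```python
import openai

openai.api_key = "sk-..."  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": (
                "You are taxGPT - an IRS tax assistant model trained by OpenAI. "
                "You are very familiar with answering tax questions about small businesses."
            ),
        },
        # Illustrative question only; my real prompts are longer.
        {"role": "user", "content": "Can I deduct a home office as a sole proprietor?"},
    ],
    temperature=0.7,        # matches what I compare against in the UI
    frequency_penalty=0.0,  # I tested values between 0 and 0.8
    presence_penalty=0.0,   # I tested values between 0 and 0.8
)
print(response.choices[0].message.content)
```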

Thoughts / suggestions?

13 Likes

Same experience here. The chat/browser version (meaning chat(dot)openai(dot)com) and the API give very different results for the same queries. Many queries that the chat/browser version handles fine, the API can’t even respond to.
For example, when asked to enrich a list of company data (stored in a JSON object) with additional metadata (all publicly available), the chat/browser version returns the object in the same format, completed with the missing data to good quality.
The API responds that, since it is a language model, it doesn’t have access to this type of data. I am using gpt-3.5-turbo but have also tried text-davinci-003 with the same result.
I am using temperature 0.7, which ChatGPT suggests is closest to what the chat/browser version uses.
I am surprised there isn’t more information on this subject. Out of a list of queries the chat/browser version could handle in full, the API could handle none.

2 Likes

API doesn’t have access to browsing at the moment. You’ll need to include the data in your prompt.

1 Like

This isn’t about browsing access; it’s about relying on the model’s internal knowledge. I’m not asking it anything that would require going external. E.g., in the browser chat I have no plugins enabled, so it can’t go elsewhere for the information. The assumption here is that the API and browser versions use the same model with the same dataset, but apparently they don’t.

The browsing comment was aimed at @Felix3, and may have been my misinterpretation: since they mentioned publicly available data, I read it as the browsing model.

Can you share an example prompt? Also, see if you get better results putting your system message content into a user message; gpt-3.5-turbo gives user messages more weight.
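Concretely, moving the system text into the user message would look something like this (a sketch with the 0.x openai package, reusing the taxGPT wording from the original post):

```python
import openai

openai.api_key = "sk-..."  # placeholder

# gpt-3.5-turbo (especially the 0301 snapshot) pays less attention to the
# system message, so fold the role instructions into the first user message.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0.7,
    messages=[
        {
            "role": "user",
            "content": (
                "You are taxGPT - an IRS tax assistant model trained by OpenAI, "
                "very familiar with tax questions about small businesses.\n\n"
                "Question: Can I deduct a home office as a sole proprietor?"
            ),
        },
    ],
)
print(response.choices[0].message.content)
```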

I’ve had the same experience with the performance difference between the ChatGPT interface and the API. My use case: I’m submitting text copied from a PDF, with instructions on how to interpret it, so that it returns the described data in JSON format. The chat interface does a great job; the API results are unreliable.

I’m using the ChatGPT 3.5 interface and the gpt-3.5-turbo API model.
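In case it helps anyone reproduce, my calls look roughly like this (a sketch; the field names and document text are simplified placeholders, not my real data):

```python
import openai

openai.api_key = "sk-..."  # placeholder

pdf_text = "...text copied out of the PDF..."  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0,  # keep extraction output as stable as possible
    messages=[
        {
            "role": "user",
            "content": (
                "Extract the fields below from the document and reply with JSON only, "
                'in this shape: {"invoice_number": "", "total": "", "due_date": ""}.\n\n'
                "Document:\n" + pdf_text
            ),
        },
    ],
)
print(response.choices[0].message.content)
```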

2 Likes

I no longer have the original query that worked in the chat interface but not in the API. However, after a lot of trial and error I managed to come up with a version (below) that also worked in the API. Then, after roughly 10 runs, the model again started replying “I am sorry, as a language model…”.
Note this was the exact same query that first worked and then didn’t.
Here is the latest version. Should one assume that the API’s reply is quite random, even for the exact same query?
I could accept that the model can’t answer this type of query, but the point is that the chat interface does provide an accurate answer. Even when I tweak the query in many different ways, the chat interface still consistently provides a complete answer.

Query/prompt:
Here is a list of companies stored as objects in an array. Can you try to find publicly available information and add it to the following object keys: ‘Main industry’, ‘Sub industry’, ‘Description’. Please return the original object with the added data. Here is the original data:
{
  "0": {"cik_str": 1318605, "ticker": "TSLA", "title": "Tesla, Inc.", "Main industry": "", "Sub industry": "", "Description": ""},
  "1": {"cik_str": 1046179, "ticker": "TSM", "title": "TAIWAN SEMICONDUCTOR MANUFACTURING CO LTD", "Main industry": "", "Sub industry": "", "Description": ""},
  "2": {"cik_str": 1403161, "ticker": "V", "title": "VISA INC.", "Main industry": "", "Sub industry": "", "Description": ""}
}
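One variable I haven’t ruled out is sampling randomness: at temperature 0.7 the same prompt can legitimately produce different replies. Setting temperature to 0 should at least make repeated runs comparable (a sketch with the 0.x openai package):

```python
import openai

openai.api_key = "sk-..."  # placeholder

prompt = "Here is a list of companies stored as objects in an array. ..."  # the full prompt above

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0,  # greedy decoding: identical inputs give near-identical outputs
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```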

1 Like

Same for us with GPT-4 via the API vs chat.openai.com.
Dramatic differences in response quality,
despite an empty system message, an empty conversation history, and the exact same input.

3 Likes

I have the following experience with ChatGPT, and I am very confused.
I started with the ChatGPT 3.5 browser version and gave it a prompt to list companies matching several criteria. It gave me two good lists of 5 companies (in response to two prompts).
This was about a month ago.
Last week, I used the same prompts, and it responded with: “As a language model I don’t have access…” (and so on).
So I tried the API, and after tuning it I received good replies.
But now, using the same prompts that got me the good replies, it again gives me the “as a language model…” nonsense.
How can I make it reliable?
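The only workaround I’ve come up with so far is to detect the boilerplate refusal and retry (a sketch, not a real fix; the marker strings are just ones I’ve seen):

```python
import openai

openai.api_key = "sk-..."  # placeholder

REFUSAL_MARKERS = ("as a language model", "don't have access")

def ask_with_retry(messages, attempts=3):
    """Re-ask when the reply is the generic refusal boilerplate."""
    text = ""
    for _ in range(attempts):
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            temperature=0.7,
        )
        text = response.choices[0].message.content
        if not any(marker in text.lower() for marker in REFUSAL_MARKERS):
            return text
    return text  # give up and return the last reply
```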

1 Like

I have a similar experience with NL-to-SQL: ChatGPT on GPT-3.5 gives correct answers, while the API mostly gives wrong answers.
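For reference, the API side of my comparison looks roughly like this (a sketch; the schema is a made-up example, not my real one):

```python
import openai

openai.api_key = "sk-..."  # placeholder

schema = "CREATE TABLE orders (id INT, customer_id INT, total DECIMAL, created_at DATE);"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0,  # SQL generation benefits from deterministic decoding
    messages=[
        {
            "role": "user",
            "content": (
                "Given this schema:\n" + schema + "\n"
                "Write a SQL query that returns total revenue per customer for 2023. "
                "Reply with the SQL only."
            ),
        },
    ],
)
print(response.choices[0].message.content)
```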

1 Like

Same experience here. The web interface gives much better results.

OpenAI team, if you are monitoring this, please guide us on how to replicate the web results via the API.

2 Likes

Same issue here; the API is not usable for us. The web interface is 100x better.

2 Likes

This is very frustrating, as people have started integrating this into their applications, and inconsistencies in the answers (especially if the ones from the API are incorrect) could cause serious business issues.

5 Likes

Totally agree.
I used the web interface to validate a POC for a customer. The API performs far worse and gives erratic results even for the same input.

3 Likes

I have the same experience. In the chat, using GPT-3.5, the results I get are totally satisfactory, but when I use the same model through the API, even though I give it far more context, the results are simply bad. It even seems to ignore things I ask for. A difference of this magnitude can’t just be configuration.

3 Likes

I’m getting the same experience trying to get the API to review Java code. The ChatGPT 3.5 interface gives really nice results; the API gives back nonsense. So I tried something like “explain cohesion for a 5 year old” in both. Again, the 3.5 web interface gives a good example and analogy to use with a 5-year-old, and the API gives back almost nothing. What am I doing wrong? Response examples below:
ChatGPT3.5:
Sure! Imagine water molecules as tiny friends holding hands. They really like to stick together and stay close, just like friends do. This sticking together is called cohesion.
Water molecules have a special way of holding hands because they have something called “hydrogen bonds.” These bonds make them stick to each other really well, like they’re playing a game of holding hands and don’t want to let go.
So, because of cohesion, water stays together and forms puddles, rivers, and even big oceans. It’s like a big team of water molecules working together, always holding hands, and not wanting to be apart.

API Response:
If asked what cohesive force is, you can explain that it is the attraction of water molecules that holds water in the liquid state. so experimenting with those fundamentals may be the best forward for your five yo.
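One thing I’m still ruling out is max_tokens: if it’s set low, the API silently truncates the completion, which can read as a lazy answer. A sketch of what I’m checking (0.x openai package):

```python
import openai

openai.api_key = "sk-..."  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    max_tokens=512,  # a small value here cuts the reply short
    messages=[
        {"role": "user", "content": "Explain cohesion for a 5 year old."},
    ],
)
print(response.choices[0].message.content)
print(response.choices[0].finish_reason)  # "length" means the reply was truncated
```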

1 Like

I am experiencing this as well: 1) gpt-3.5-turbo via the API gives far lower quality results than the web interface, 2) it flat-out ignores some of the instructions I give it, and 3) it is much slower, taking 50 s to respond to something the web interface answers in under 20 s.

Since I’m paying for this compute, it matters to me; why would I use the cheaper model if it’s slower, and therefore ends up costing me more than the expensive model?
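For what it’s worth, streaming doesn’t reduce the total generation time, but it gets the first tokens back within a second or two instead of making you wait for the whole completion (sketch; the user prompt is a placeholder):

```python
import openai

openai.api_key = "sk-..."  # placeholder

# Print tokens as they are generated instead of waiting for the full reply.
for chunk in openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    stream=True,
    messages=[{"role": "user", "content": "Review this Java method: ..."}],
):
    delta = chunk.choices[0].delta
    print(delta.get("content", ""), end="", flush=True)
print()
```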

2 Likes

Same issue here. Is this an alignment issue? Some guidance here would be appreciated. At least let us know whether this is intentional and here to stay.

2 Likes

Agreed. Or if we’re “using it wrong,” it would be nice to know that too!

It’s just so strange that I run my prompts on the web interface and it’s 100% accurate 100% of the time, while with the API I cannot get it to give me non-garbage answers, and yet I pay for that.

@OpenAI, some guidance here please? :slightly_smiling_face:

2 Likes

Have you tried the API Playground? What model/settings/system message are you using? In my experience, the difference usually comes down to the API needing better prompting…
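For example, the web interface is not running with an empty system message; something along these lines (the wording is my own approximation, not OpenAI’s actual prompt) often closes much of the gap:

```python
import openai

openai.api_key = "sk-..."  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0.7,
    messages=[
        {
            "role": "system",
            # Approximation of a ChatGPT-style system prompt, not the official one.
            "content": (
                "You are ChatGPT, a large language model trained by OpenAI. "
                "Answer as helpfully and thoroughly as possible, and explain your reasoning."
            ),
        },
        {"role": "user", "content": "Explain cohesion for a 5 year old."},
    ],
)
print(response.choices[0].message.content)
```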

1 Like