Chat+ GPT4 less smart than chat+ GPT3.5

ludopencil · September 23, 2023, 11:41am

I’m working on some informations extraction from a scientifc text with the CHAT+ gpt4.
with this sentence: “The first two dimensions concern evaluation novelty and valence”

GPT 4 recognize a single dimension :
“Assessment of novelty and valence”

And GPT 3.5, the two dimensions:
“-novelty: …
-valence:…”

That’s a significant downgrade ( compared to version i used a few month ago) and I wonder about the ability of open-AI to guarantee a certain level of service over a long time. Especially since I’m starting to integrate the API into some of my products in development.

curt.kennedy · September 23, 2023, 2:15pm

You could try using a pinned model in the API to prevent any breaking changes.

So for example, instead of calling gpt-4, you would call gpt-4-0613.

You would write your code around a pinned model, which would ensure stability.

Then before the pinned model gets deprecated, you hop to one of the later pinned models that won’t be deprecated anytime soon. So over time, expect to integrate newer models, which may require you to prompt differently (per model).

If the new model won’t work for you, then post about it here. If there’s no workaround, and OpenAI acknowledges this, they may choose to keep the older pinned model around longer.

If you still have these issues with a pinned model (after you’ve adjusted to it), then it’s something in your data, or a prompting issue. As the pinned models aren’t supposed to change. But, this in theory, would prevent the model unexpectedly shifting on you, and you can focus on any issues being a prompting/data issue.

_j · September 23, 2023, 2:43pm

They are the same thing, merely aliases. Unfortunately there is currently no alternate AI model with function calling, and there are three months of alterations in performance, past what the date reflects.

Call gpt-4, see gpt-4-0613 in your response.

The “pinned” model unlikely to get significant changes is gpt-4-0314.

This is contrary to versioning expectations. Better would be a dated permanent stable model, spun off with the monthly frequency like in the initial March announcement of chat endpoints, and a gpt-4-preview.

The largest change I notice since two week ago is of following a long list of system instructions in gpt-3.5-turbo-now. One might postulate that if improvements are made on the quality of user instructions in ChatGPT format with a simple system prompt, this reinforces for only simple system prompts. Or that efficiencies were necessary.

gpt-3.5 within ChatGPT can both improve upon and significantly reduce the quality of replaying past conversations, depending on the type. You also get diverse output from ChatGPT - it’s going to have a different type of output every time from the same input, like when you give a thumbs-down. gpt-4, though? It was nice to have known you.

curt.kennedy · September 23, 2023, 3:24pm

I am aware of this, but I don’t assume the model quoted in the response is correct.

You can only assume the model in your request is what you are running.

Let me give you a hypothetical real world example:

Suppose you build a giant AI company called GoodAI. You have a requirements document that states that all responses are filled with non-empty strings. Your engineering team rolls out a change, which could be a simple weight swap. It turns out, that no-where in the weights was a “model version”, as it’s just a big blob of binary data. But when a request comes in, with model version “good-4”, it uses this $LATEST set of weights, and reports back “good-4-0613” as the static value of the latest known model.

However, when you request “good-4-0613”, you are channeled towards those servers that only have the fixed “good-4-0613” weights.

So the point is, trust the request, not the response.

If you can show, via trending over time, that when using a pinned model in the request gives varying behavior, then I think you are onto something.

_j · September 23, 2023, 3:36pm

Of course I can show - but then I also can’t show, because the model has already been changed. Here’s about as clear as I can re-present

Real Model Name	Description
gpt-4-0314	Initial release (some refinements past date possible)
gpt-4-0613 (In June)	As used in June? No longer exists
gpt-4-0613 (Today)	Current, getting continued revisions

Undisclosed/auto-selected model	Description
gpt-4-0613 (with functions)	Model trained on using functions

Alias	Description
gpt-4	Points to currently recommended model
gpt-4 before June	Name for a model that no longer exists

source: (blog releases and continued use)

ludopencil · September 23, 2023, 5:38pm

@curt.kennedy Yep that’s what i planned to do.
But i mean, can’t we expect to have at least the same level of interaction quality between each iteration of the same model?
You see what i mean?
With the api or with the chat+ service.

If i take any software, the 1.10 version work “usualy” with the same quality level Minimum compared the 1.0. ( even if i have some example of the opposite)

Imagine if i was a car constructor, and chatgpt a wheel provider. Any quality loss of this importance, inside the production ,will be dangerous. That’s some kind of trust issue.

curt.kennedy · September 23, 2023, 5:50pm

I would have trust issues too!

These models are shifting over time, and over time they gain functionality, but can also lose functionality.

The only way to guarantee consistency, for long durations, without worrying much about deprecation, is to run your own private model.

I wish that OpenAI would have models laying static for years and years, for you to use for decades, but this space moves so fast and I doubt it’s cost effective.

curt.kennedy · September 23, 2023, 7:56pm

What you are showing is that the response is aliasing the floating update model with the pinned model.

So the only way to get the pinned model is to call it out in the request.

_j · September 23, 2023, 8:43pm

A “pinned model” version of gpt-4 beyond gpt-4-0314 doesn’t exist. That non-OpenAI language doesn’t accurately describe model invocation by API.

Stable model name = recommended model => -0613 <= continuing updates

I’ll let OpenAI’s text be the last word as we’ve both illustrated our notions and aren’t making progress. Back to “dumber AIs” .

New models

GPT-4

gpt-4-0613 includes an updated and improved model with function calling.

gpt-3.5-turbo-0613 includes the same function calling as GPT-4 as well as more reliable steerability via the system message, two features that allow developers to guide the model’s responses more effectively.

Model deprecations

Today, we’ll begin the upgrade and deprecation process for the initial versions of gpt-4 and gpt-3.5-turbo that we announced in March. Applications using the stable model names (gpt-3.5-turbo, gpt-4, and gpt-4-32k) will automatically be upgraded to the new models listed above on June 27th.

Developers who need more time to transition can continue using the older models by specifying gpt-3.5-turbo-0301, gpt-4-0314, or gpt-4-32k-0314 in the ‘model’ parameter of their API request.

(They’re trying to make it better - and I don’t know if they really want the feedback of which of these bots with attitude are best.)

curt.kennedy · September 23, 2023, 9:16pm

Well, that sucks. They should release multiple “stable” pinned models periodically IMO. It only makes sense for developers who rely on system stability.

Topic		Replies	Views
Gpt-3.5 / 4 model documentation Nov 7 2023 still has inaccuracies about "snapshots" Documentation api	2	4701	November 10, 2023
Has the reasoning ability of the GPT 3.5 API dropped recently? API chatgpt , api	9	1023	December 25, 2023
I got the GPT-4 API, but I the model version is still the snapshot version of GPT-4 API	12	2547	December 17, 2023
Just got access to GPT-4 but it responds like 3.5 API gpt-4	13	8053	July 8, 2023
Major Issues in new GPT 3.5 : DO NOT DEPRECATE OLD ONE API gpt-35-turbo , api	24	7484	January 21, 2024

Chat+ GPT4 less smart than chat+ GPT3.5

New models

GPT-4

Model deprecations

Related topics