I’m working on some information extraction from a scientific text with ChatGPT Plus and GPT-4.
with this sentence: “The first two dimensions concern evaluation novelty and valence”
GPT-4 recognizes only a single dimension:
“Assessment of novelty and valence”
And GPT-3.5 recognizes the two dimensions:
That’s a significant downgrade (compared to the version I used a few months ago), and I wonder about OpenAI’s ability to guarantee a consistent level of service over the long term. Especially since I’m starting to integrate the API into some of my products in development.
You could try using a pinned model in the API to prevent any breaking changes.
So for example, instead of calling gpt-4, you would call a dated snapshot such as gpt-4-0613. You would write your code around a pinned model, which would ensure stability.
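As a sketch of what pinning looks like in practice (the helper function, prompt, and temperature choice here are illustrative, not from this thread; the model names are OpenAI’s):

```python
# Sketch: pin a dated snapshot instead of the floating alias.
PINNED_MODEL = "gpt-4-0613"  # dated snapshot: behavior should stay put until deprecation
FLOATING_ALIAS = "gpt-4"     # alias: silently upgraded to newer snapshots over time

def build_request(prompt: str, model: str = PINNED_MODEL) -> dict:
    """Build the payload for POST /v1/chat/completions; the model string is the pin."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # reduce sampling variance so any drift is easier to spot
    }

request = build_request("Extract the evaluation dimensions from: ...")
```

When the snapshot is eventually deprecated, swapping to a later pinned model is then a one-line change, after which you re-test your prompts against the newer snapshot.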
Then before the pinned model gets deprecated, you hop to one of the later pinned models that won’t be deprecated anytime soon. So over time, expect to integrate newer models, which may require you to prompt differently (per model).
If the new model won’t work for you, then post about it here. If there’s no workaround, and OpenAI acknowledges this, they may choose to keep the older pinned model around longer.
If you still have these issues with a pinned model (after you’ve adjusted to it), then it’s something in your data or a prompting issue, since the pinned models aren’t supposed to change. In theory, this prevents the model unexpectedly shifting on you, and you can focus on any issues as prompting/data issues.
They are the same thing, merely aliases. Unfortunately there is currently no alternate AI model with function calling, and there are three months of alterations in performance, past what the date reflects.
Call gpt-4, see gpt-4-0613 in your response.
The “pinned” model unlikely to get significant changes is gpt-4-0314.
This is contrary to versioning expectations. Better would be a dated permanent stable model, spun off at a monthly frequency as in the initial March announcement of chat endpoints, plus a gpt-4-preview.
The largest change I’ve noticed since two weeks ago is in following a long list of system instructions in gpt-3.5-turbo now. One might postulate that if improvements are made to the quality of following user instructions in ChatGPT format with a simple system prompt, this reinforces only simple system prompts. Or that efficiencies were necessary.
gpt-3.5 within ChatGPT can both improve upon and significantly reduce the quality of replaying past conversations, depending on the type. You also get diverse output from ChatGPT: it will produce a different type of output every time from the same input, such as when you give a thumbs-down. gpt-4, though? It was nice to have known you.
I am aware of this, but I don’t assume the model quoted in the response is correct.
You can only assume the model in your request is what you are running.
Let me give you a hypothetical real-world example:
Suppose you build a giant AI company called GoodAI. You have a requirements document that states that all responses are filled with non-empty strings. Your engineering team rolls out a change, which could be a simple weight swap. It turns out that nowhere in the weights was a “model version”; it’s just a big blob of binary data. But when a request comes in with model version “good-4”, it uses this $LATEST set of weights and reports back “good-4-0613” as the static value of the latest known model.
However, when you request “good-4-0613”, you are channeled towards those servers that only have the fixed “good-4-0613” weights.
So the point is, trust the request, not the response.
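The hypothetical above can be sketched as follows (all names invented for illustration):

```python
# GoodAI routing sketch: the reported model name is static metadata,
# not something baked into the weights themselves.
LATEST_WEIGHTS = {"reported_name": "good-4-0613"}  # $LATEST blob, swapped freely
FROZEN_WEIGHTS = {"reported_name": "good-4-0613"}  # servers holding the fixed snapshot

def route(requested_model: str) -> dict:
    if requested_model == "good-4":
        weights = LATEST_WEIGHTS   # alias floats to whatever $LATEST is today
    elif requested_model == "good-4-0613":
        weights = FROZEN_WEIGHTS   # dated name is channeled to the frozen servers
    else:
        raise ValueError(f"unknown model: {requested_model}")
    return {"model": weights["reported_name"]}  # both report the same static string

# Same response field either way, so the response can't tell you which weights ran:
assert route("good-4")["model"] == route("good-4-0613")["model"]
```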
If you can show, via trending over time, that when using a pinned model in the request gives varying behavior, then I think you are onto something.
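One way to build that trend (a sketch; the fixture responses and digest scheme are mine, not anything from OpenAI): replay the same fixed prompts against the pinned model on a schedule and diff the outputs.

```python
import hashlib

def fingerprint(response_text: str) -> str:
    """Short, stable digest so responses from different days compare cheaply."""
    return hashlib.sha256(response_text.encode("utf-8")).hexdigest()[:12]

# Imagine these were collected on different days from the same pinned model at
# temperature 0 (note: even temperature 0 is not perfectly deterministic in
# practice, so a real check would compare semantic content, not exact bytes):
runs = {
    "2023-07-01": "Dimension 1: novelty. Dimension 2: valence.",
    "2023-07-15": "Dimension 1: novelty. Dimension 2: valence.",
}

digests = {day: fingerprint(text) for day, text in runs.items()}
drifted = len(set(digests.values())) > 1  # diverging digests over time = evidence of a shift
```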
Of course I can show - but then I also can’t, because the model has already been changed. Here’s about as clear as I can re-present it:
|Real Model Name|Initial release (some refinements past date possible)| |
|---|---|---|
|gpt-4-0613 (in June)|As used in June? No longer exists|Current, getting continued revisions|
|gpt-4-0613 (with functions)|Model trained on using functions|Points to currently recommended model|
|gpt-4 before June|Name for a model that no longer exists| |

source: (blog releases and continued use)
@curt.kennedy Yep, that’s what I planned to do.
But I mean, can’t we expect at least the same level of interaction quality between each iteration of the same model?
You see what I mean?
With the API or with the ChatGPT Plus service.
If I take any software, version 1.10 usually works with at least the same level of quality as 1.0 (even if I have some examples of the opposite).
Imagine if I were a car manufacturer and ChatGPT a wheel supplier. Any quality loss of this magnitude inside production would be dangerous. That’s a kind of trust issue.
I would have trust issues too!
These models are shifting over time, and over time they gain functionality, but can also lose functionality.
The only way to guarantee consistency, for long durations, without worrying much about deprecation, is to run your own private model.
I wish that OpenAI would have models laying static for years and years, for you to use for decades, but this space moves so fast and I doubt it’s cost effective.
What you are showing is that the response is aliasing the floating update model with the pinned model.
So the only way to get the pinned model is to call it out in the request.
A “pinned model” version of gpt-4 beyond gpt-4-0314 doesn’t exist. That non-OpenAI language doesn’t accurately describe model invocation by API.
Stable model name = recommended model => -0613 <= continuing updates
I’ll let OpenAI’s text be the last word, as we’ve both illustrated our notions and aren’t making progress. Back to “dumber AIs”.
gpt-4-0613 includes an updated and improved model with function calling.
gpt-3.5-turbo-0613 includes the same function calling as GPT-4 as well as more reliable steerability via the system message, two features that allow developers to guide the model’s responses more effectively.
Today, we’ll begin the upgrade and deprecation process for the initial versions of gpt-4 and gpt-3.5-turbo that we announced in March. Applications using the stable model names (gpt-3.5-turbo, gpt-4, and gpt-4-32k) will automatically be upgraded to the new models listed above on June 27th. Developers who need more time to transition can continue using the older models by specifying gpt-3.5-turbo-0301, gpt-4-0314, or gpt-4-32k-0314 in the ‘model’ parameter of their API request.
(They’re trying to make it better - and I don’t know if they really want feedback about which of these bots with attitude is best.)
Well, that sucks. They should release multiple “stable” pinned models periodically IMO. It only makes sense for developers who rely on system stability.