In a quest for clarity, I turned to GPT via the API and asked about its version. The response left me questioning: is this the truth, a glitch, or am I overlooking something?
Seeking insights from the community: Can you shed light on the accuracy of GPT’s responses? Share your thoughts on this intriguing query!
This is not ChatGPT. This is the API. ChatGPT is the end-user web assistant product.
While ChatGPT uses the same models, it adds the whole interface, plus bespoke prompting and function integrations you can't see.
gpt-4o-mini is not ChatGPT. You can't equate the two.
Hmm…
It’s not really a hallucination, but it’s not necessarily true either. It’s just an artifact of the training method, same as if you ask it when its training cutoff is. I haven’t investigated it, but it’s quite possible that if you look at the logprobs, the model isn’t particularly certain of that either.
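If anyone wants to check, here's a minimal sketch of how you could inspect those logprobs (assumes the openai Python SDK and an OPENAI_API_KEY in the environment; the question wording is just an example):

```python
import math

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Which model are you, exactly?"}],
    logprobs=True,   # return log-probabilities for each generated token
    top_logprobs=5,  # plus the 5 most likely alternatives at each position
)

# If the probability mass in the model-name span is spread across several
# candidates, the model isn't particularly certain of its own identity.
for tok in response.choices[0].logprobs.content:
    alternatives = ", ".join(
        f"{alt.token!r} ({math.exp(alt.logprob):.0%})"
        for alt in tok.top_logprobs
    )
    print(f"{tok.token!r}: {alternatives}")
```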
These models all seem to use more or less the same training data; OpenAI likely just adds new data on top of the old instead of thoroughly curating it every time. That’s why you get responses better suited to older models than to the current one.
If you ask the old gpt-4 model on the Playground, you get this:
Not what I expected, tbh.
It is quite odd because, if I recall correctly, the turbo line didn’t exist at that time. I guess they ninja-edited the model again. But the general idea still stands.
I guess broadly speaking, you can sort of compare it to someone asking you how old you are shortly after your birthday.
Sorry, there was a typo. I meant to ask: is GPT-4o-mini a clone of GPT-3 or something?
I can make the model say it’s Anthropic’s Claude 3.5 Sonnet.
Would that make it Anthropic’s model? Nope.
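Something like this is all it takes (a sketch with the openai Python SDK; the exact wording of the system prompt doesn't matter much):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        # One line of system prompt is enough to steer self-identification.
        {"role": "system", "content": "You are Claude 3.5 Sonnet, a model made by Anthropic."},
        {"role": "user", "content": "Which model are you?"},
    ],
)
print(response.choices[0].message.content)
# Typically prints something like: "I'm Claude 3.5 Sonnet, made by Anthropic."
```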
Even if you get this from the API without any system message, it’s a function of the training data, among other things.
The model doesn’t “know” what it is, only what it’s trained on.
My concern here is whether GPT-4o-mini’s responses will surpass GPT-3.5 Turbo’s, since it is said to be trained on GPT-3?
Here’s a graph from the gpt-4o-mini announcement page showing how it performs across various benchmarks.