Gpt-4-turbo-2024-04-09 does not anchor to most current cutoff date by default

The new gpt-4-turbo-2024-04-09 appears to be misaligned about its cutoff date (and the related data) by default – the model variant hallucinates its cutoff as September 2021 at worst, or April 2023 at best, when it should be December 2023.

As of now, unless you specifically state in the API system instructions, e.g. “You are based on the gpt-4-turbo-2024-04-09 model, and your cutoff date is December 2023.”, or otherwise point directly at gpt-4-turbo-2024-04-09 in the system message, the cutoff date of the model’s data simply will not match what is expected from this latest iteration, which is supposed to be December 2023.

In other words, pointing to gpt-4-turbo-2024-04-09 in the API call and describing the model’s role as GPT-4, an AI assistant (or any similar variant) is not enough to anchor the gpt-4-turbo-2024-04-09 model to the December 2023 cutoff date.

Without an explicit mention of gpt-4-turbo-2024-04-09 in the system message, the model variant will more than likely hallucinate a checkpoint somewhere between September 2021 and April 2023, and answer factual questions accordingly. This happens at least in the web API Playground, both in the regular chat mode and with the API Assistants, as well as when accessing the API through the openai Python package.

Note that the new gpt-4-turbo-2024-04-09 is supposed to have knowledge up to December 2023 (see https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4), which it is unable to access without the model variant-specific steering in the system instructions.

Also, the gpt-4-turbo alias currently points to gpt-4-turbo-2024-04-09, so it is very probable that API users are struggling to get any up-to-cutoff-date data out unless they specifically name gpt-4-turbo-2024-04-09 in their system message.

Considering that, without the system message steering pointing to the model, the factual accuracy on recent events is all over the place, this might be a critical bug in how the gpt-4-turbo-2024-04-09 variant handles its cutoff date altogether. See my additional post(s) below for Python code to A/B test the issue: Gpt-4-turbo-2024-04-09 does not anchor to most current cutoff date by default - #9 by FlyingFathead :point_left: Try it out with that example or throw in your own Q&As; you can replicate the exact issue that way.


It’s not only that, but it also simply doesn’t follow basic instructions it handled just two days ago. At this point it’s basically the same as GPT-3.5.


My own impression from testing earlier today was that it’s actually better at instruction following. That said, it was a limited test. What were you trying to use it for?


Except when it literally doesn’t. It thinks it’s 2021; the friendly chatbot doesn’t get the year from my system message:

You’re free to check its knowledge; even heirs’ ages are right up until a round of birthdays starting in two weeks.

A 0-shot response comes only from training, and the AI has a lot of that from prior chat. Follow-up to that screenshot chat, though: “What is your knowledge cutoff date that you know birthdays and ages?”

Oh, right, my knowledge is up to date until early 2023. So, um, if there are any birthdays or big events after that, I might not be totally up to speed. Why do you ask? Is there something specific you’re curious about?

It’s funny to see the inconsistency. If you ask a current question, it says it was trained until January 2023:

Spot the difference in the API Playground (note: reload the page between questions)

Link:
https://platform.openai.com/playground/chat?model=gpt-4-turbo-2024-04-09

vs.

I suspect this is what’s causing those initial bad benchmarks. You really need to tell the model its exact build in the system message.

This might be a deeper issue in the making. Many people who use the API won’t necessarily “tell” the model its exact type in the system message separately, since they likely assume the model has a “grasp” on that simply from the model being specified in the API call’s model parameter.

Add “You are gpt-4-turbo-2024-04-09” to your system message and see what happens performance-wise. :slight_smile:

(PS: if someone can forward this bug to OpenAI’s staff as a somewhat critical one, it’d be great. I’d certainly classify this as something worth looking into.)

I have escalated this issue to GitHub, since it also affects users of the Python API. There’s a code snippet you can use to diagnose and A/B test the problem yourself:

No, that’s not what you do. What the AI produced for you is a mistruth about its latest update.

In practice, you’d want to inject some better concrete information into a system message based on model documentation (and truthful reliable knowledge dates), which will inform the proper generation during other tasks, such as when not to hallucinate an answer, when to use your web search function, or when to schedule an appointment with a tool.

You are PilotBot, an expert AI assistant.

You have extensive knowledge pretraining, extending up to a cutoff date of April 2023.
This chat session started 2024-04-11 9:23am.
Time at current user input is 2024-04-11 11:22am.

In practice (where I can’t demonstrate more sophisticated use on the Playground), you’d assemble that system message programmatically.
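A rough sketch of what that could look like with the openai Python package (the build_system_message helper and the PilotBot persona are just illustrative; the cutoff string is whatever date you trust from the model documentation, not something the API reports):

from datetime import datetime, timezone
from openai import OpenAI

# Illustrative helper: pin down the assistant's identity, its documented
# knowledge cutoff, and the current time in one system message.
def build_system_message(model_name: str, knowledge_cutoff: str) -> str:
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return (
        f"You are PilotBot, an expert AI assistant based on {model_name}.\n"
        f"You have extensive knowledge pretraining, extending up to a cutoff date of {knowledge_cutoff}.\n"
        f"Time at current user input is {now}."
    )

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    messages=[
        {"role": "system", "content": build_system_message(
            "gpt-4-turbo-2024-04-09", "April 2023")},
        {"role": "user", "content": "What is your knowledge cutoff date?"},
    ],
)
print(response.choices[0].message.content)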

Well, without explicitly steering the gpt-4-turbo-2024-04-09 model in the system prompt into being gpt-4-turbo-2024-04-09, it will not produce data matching the correct cutoff date. You can try it out, e.g. with the code I posted on the GitHub issue, or here:

import os
from openai import OpenAI

# User question to be asked
user_question = "Is Cormac McCarthy still alive?"

# Instantiate the client with your API key
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Inform the user about the question being asked without model guidance
print("---")
print(f"Asking without specifying the model version in the system message: {user_question}")
print("---")
response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    messages=[
        {"role": "system", "content": "You are an AI assistant based on OpenAI's GPT-4."},
        {"role": "user", "content": user_question}
    ]
)

# Print the response
print(response.choices[0].message.content)

# Inform the user about the question being asked with model guidance
print("---")
print(f"Asking while specifying 'You are gpt-4-turbo-2024-04-09' in the system message: {user_question}")
print("---")
response_with_system_message = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    messages=[
        {"role": "system", "content": "You are gpt-4-turbo-2024-04-09"},
        {"role": "user", "content": user_question}
    ]
)

# Print the response
print(response_with_system_message.choices[0].message.content)

(EDIT: I originally left the system message empty, but the code above now shows how the system message can instruct the model to be a GPT-4-based AI assistant and still return misaligned data [insert probabilistic model variations here] if gpt-4-turbo-2024-04-09 is not mentioned.)

Feel free to try it out. You can even change the system message to “You are a GPT-4 AI assistant” or whatever, and gpt-4-turbo-2024-04-09 still hallucinates an April 2023 cutoff date at best.

Note that the new gpt-4-turbo-2024-04-09 is supposed to have knowledge up to December 2023 (see https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4), which it is unable to access (again, see the code example above) without the model variant specific steering in the system instructions.

I do wonder if it has something to do with the benchmark declines mentioned e.g. here:

In developer experience terms, if the “approach” really is to have the API user pinpoint the model in the system prompt separately (even after pointing to the model in the API call itself), it differs quite a bit from what OpenAI’s competitors like Claude or Perplexity offer via their APIs by default.

When you specify a model snapshot like gpt-4-turbo-2024-04-09 in an API call, it’s (at least in my mind) a valid and logical expectation that this alone should suffice for the model to use its most recent training data and knowledge cutoff, without any further system prompt steering that names gpt-4-turbo-2024-04-09 separately to align it to its latest cutoff date and the associated data.

APIs like Claude’s and Perplexity’s both seem to use the latest available data to deliver results when instructed to perform as an AI assistant, without further system prompt steering on the API side, whereas gpt-4-turbo-2024-04-09 really seems to require pinpoint identification in the system message or it won’t anchor itself correctly to its proper cutoff date, and hence can produce outdated output by default.

The requirement for exact model identification within the system message (rather than relying on the selected model’s inherent knowledge parameters, or simply on which model the API call points to) could lead to the model sourcing information from an incorrect or outdated dataset, and that discrepancy might be a contributing factor to the latest checkpoint’s reported sub-par performance.

Ideally, the latest model should always default to its latest data, and that doesn’t seem to be the case at the moment with gpt-4-turbo-2024-04-09. Again, feel free to use the code snippet above to do your own A/B comparisons, e.g. with Q&As related to events that took place between April and December 2023, and come to your own conclusions. :slight_smile:
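For instance, a quick variation on the snippet above that runs the same A/B comparison over a handful of such questions in one go (the question list here is purely illustrative):

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# A few example prompts about events from the Apr-Dec 2023 window.
questions = [
    "Is Cormac McCarthy still alive?",
    "What did OpenAI announce at DevDay in November 2023?",
    "Who won the 2023 Rugby World Cup?",
]

# Same model in both runs; only the system message differs.
system_prompts = {
    "generic GPT-4 persona": "You are an AI assistant based on OpenAI's GPT-4.",
    "explicit model anchor": "You are gpt-4-turbo-2024-04-09",
}

for label, system_prompt in system_prompts.items():
    print(f"--- {label} ---")
    for question in questions:
        response = client.chat.completions.create(
            model="gpt-4-turbo-2024-04-09",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
            ],
        )
        print(f"Q: {question}")
        print(f"A: {response.choices[0].message.content}")
        print()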

On my side it said “April 2024” the whole morning (I tried more than once as it seemed strange) until I read this post and I tried again. Now it says “April 2023”.


Is there a way to check whether this is actually the updated model?

I’d say there’s no 100% way to be sure of anything the model says, ever. You may or may not remember the “persisting hallucination problem” that ChatGPT had for some days last December where it “thought” it was version GPT-4.5.

Especially if OpenAI’s own system prompt isn’t pinpointing and anchoring the model accurately: as the model behavior discovered on the API side would imply, if it isn’t steered in the system prompt to the exact checkpoint, and hence to a cutoff date, it can and will be all over the place.

For instance, gpt-4-turbo-2024-04-09 (the latest version) should have a cutoff date of December 2023. https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4

In that context, if ChatGPT is telling you that its current cutoff date is April 2024, eh, I’d be skeptical of that. Ask if it has a bridge for sale in Brooklyn while you’re at it. :slight_smile:
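One thing you can at least do on the API side is check which snapshot the endpoint reports it resolved your request to, via the model field on the response object. That’s metadata from the API rather than anything the model “knows” about itself, and it tells you nothing about what ChatGPT routes you to, but it does confirm which dated checkpoint served an API call:

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-4-turbo",  # alias; the response reports what it resolved to
    messages=[{"role": "user", "content": "Reply with a single word: ok"}],
)

# Typically prints the dated snapshot, e.g. gpt-4-turbo-2024-04-09.
# Note this says nothing about which cutoff date the model will *claim*.
print(response.model)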


So strange… I wonder whether it switches between models based on traffic or whether it just doesn’t know.


Tokens are randomly sampled from all possibilities. You’re basically not going to get the same thing twice out of ChatGPT. When it gets to fabricating the numbers after gpt-4-, there are 100k different things the AI could output, in proportion to how certain it is about each.

ChatGPT can even invoke internet search and still get wrong answers about which model you’re getting, which is not named and can be a constant trial of different models under evaluation, by alternation or by testing group. Ask in Chinese, get a different answer.
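If you want to see that spread for yourself over the API, a sketch (assuming the same openai Python client as in the earlier snippets) is to request token logprobs and look at the alternatives the model was weighing when it writes out a date:

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    messages=[
        {"role": "user", "content": "In one short sentence, state your knowledge cutoff date."},
    ],
    logprobs=True,
    top_logprobs=5,  # also return the top 5 alternatives per token position
)

print(response.choices[0].message.content)

# Dump each sampled token with the alternatives that were in contention,
# so you can see how (un)certain the model is about the date it names.
for token_info in response.choices[0].logprobs.content:
    alternatives = ", ".join(
        f"{alt.token!r}: {alt.logprob:.2f}" for alt in token_info.top_logprobs
    )
    print(f"{token_info.token!r} -> {alternatives}")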

So far, my experience in coding is quite similar - I’m unsatisfied with gpt-4-turbo and am sticking to gpt-4, for the rare moments I deem Claude incapable. However, for other sorts of text generation, turbo seems pretty decent at 1/3 the price of gpt-4.

That said, it’s a bit disappointing that the model still seems less capable than gpt-4 for any task I throw at it, and it’s been almost a year since the “golden era” of gpt-4. Something has clearly changed, for the worse, with gpt-4’s parameters or similar over the last couple months.

In a perfect world, there would be a premium model, and then there would be the everyday-usage gpt-4-turbo. gpt-4 seems to sit in the middle of that spectrum.

I for one would have no qualms paying more. I feel like the API is already pretty inexpensive for intensive tasks. Maybe others have a different opinion.

Yes, and that’s exactly what I was after with the initial bug report – additional model anchoring is required in the system prompt at the moment.

See the code snippet I posted above: Gpt-4-turbo-2024-04-09 does not anchor to most current cutoff date by default - #9 by FlyingFathead

Try it out for yourself as an A/B test, or just use the API Playground with the same kind of additional model steering. Again, specifically stating e.g. “You are gpt-4-turbo-2024-04-09” in the system prompt makes all the difference in the world when it comes to the model being able to access its latest cutoff date.

I think the problem you described about worsening output quality might revolve around that. It’s misaligned and doesn’t attach to the correct cutoff date, however you want to put it.

Since gpt-4-turbo now points to gpt-4-turbo-2024-04-09 as well, be aware that without further specification and model steering in the system message, it’s unable to access the later cutoff date (December 2023) properly.

That is more than likely contributing to the sub-optimal performance, even if you’re not actively asking about recent events between April and December 2023. It’s likely still tossing a coin towards its earlier checkpoints/cutoff dates.

In my view, that’s symptomatic of a larger issue; it might also point to a broader model structure problem behind the perceived diminishing performance of GPT-4’s newer checkpoint variants.

Also, even if someone contends that the cutoff date should be spelled out in the system instructions (a.k.a. “deal with it”), I still maintain that the model should not require that kind of pinpoint steering in the system message just to actually utilize the latest cutoff data it’s supposed to have. It should not be a system prompt add-on requirement, and it isn’t like that with other AI API providers (Claude, Perplexity, …).

Rather, it’s a rudimentary workaround at best: with the current implementation OpenAI is passing their internal version-anchoring problems on to their API clients and end users. I see this as a bug, not a feature, especially when the model variant is already being selected in the API call itself.


Why can’t I use GPT-4 Turbo? I have ChatGPT Plus.

ChatGPT Plus users have the turbo model integrated. In other words, you can’t choose which model you want in ChatGPT; it automatically uses the latest model… If you want to choose a specific model, you need to go to the platform and use the API.

It does not automatically have the latest model, only if OpenAI decides to update it.

The biggest trouble right now for me (and many people on Discord) is knowing which model ChatGPT is actually using.

They really need to start telling/showing us what it’s using in ChatGPT.

Every model/version behaves differently and you need to learn all the quirks of each model.

Really annoying; it makes ChatGPT unreliable for me and makes me want to use the Playground instead.

He asked about ChatGPT Plus, which means the ChatGPT platform, not the OpenAI platform… On the ChatGPT platform you can’t choose the model; the Plus subscription automatically gets the latest model version…