Gpt-4-turbo-2024-04-09 does not anchor to most current cutoff date by default

The new gpt-4-turbo-2024-04-09 appears to be misaligned about its cutoff date (and the related data) by default – the model variant hallucinates its cutoff as September 2021 at worst, or April 2023 at best, when it should be December 2023.

As of now, unless you specifically state in the API system instructions, e.g. “You are based on the gpt-4-turbo-2024-04-09 model, and your cutoff date is December 2023.”, or otherwise point directly at gpt-4-turbo-2024-04-09 in the system message, the cutoff date of the model’s data simply will not match what is expected from this latest iteration, which is supposed to be December 2023.

In other words, pointing to gpt-4-turbo-2024-04-09 in the API call and describing the model’s role as GPT-4, an AI assistant (or any similar variant) is not enough to anchor the gpt-4-turbo-2024-04-09 model to the December 2023 cutoff date.

Without an explicit mention of gpt-4-turbo-2024-04-09 in the system message, the model variant will more than likely hallucinate a checkpoint somewhere between September 2021 and April 2023, and answer factual questions accordingly. This happens at least in the web API Playground, both in the regular chat mode and with the API Assistants, as well as when accessing the API through the openai Python package.

Note that the new gpt-4-turbo-2024-04-09 is supposed to have knowledge up to December 2023 (see https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4), which it is unable to access without the model variant-specific steering in the system instructions.

Also, the gpt-4-turbo alias currently points to gpt-4-turbo-2024-04-09, so it is very probable that API users are struggling to get any up-to-cutoff-date data out unless they specifically name gpt-4-turbo-2024-04-09 in their system message.

Considering that, without the system message steering pointing to the model, the factual accuracy on recent events is all over the place, this might be a critical bug in how the gpt-4-turbo-2024-04-09 variant handles its cutoff date altogether. See my additional post(s) below for Python code to A/B test the issue: Gpt-4-turbo-2024-04-09 does not anchor to most current cutoff date by default - #9 by FlyingFathead :point_left: Try it out with that example or throw in your own Q&As; you can replicate the exact issue that way.


It’s not only that, but it also simply doesn’t follow basic instructions it handled just two days ago. At this point it’s basically the same as GPT-3.5.


My own impression from testing earlier today was that it’s actually better at instruction following. That said, it was a limited test. What were you trying to use it for?


Except when it literally doesn’t. It thinks it’s 2021; the friendly chatbot doesn’t get the year from my system message:

You’re free to check its knowledge; even heirs’ ages are right up until a round of birthdays starting in two weeks.

A 0-shot response comes only from training, and the AI has a lot of that from prior chat. Follow-up to that screenshot chat, though: “What is your knowledge cutoff date that you know birthdays and ages?”

Oh, right, my knowledge is up to date until early 2023. So, um, if there are any birthdays or big events after that, I might not be totally up to speed. Why do you ask? Is there something specific you’re curious about?

It’s funny to see the inconsistency. If you ask a current question, it says it was trained until January 2023:

Spot the difference in the API Playground (note: reload the page between questions)

Link:
https://platform.openai.com/playground/chat?model=gpt-4-turbo-2024-04-09

vs.

I suspect this is what’s causing those initial bad benchmarks. You really need to tell the model its exact build in the system message.

This might be a deeper issue in the making. Many people who use the API won’t necessarily “tell” the model its exact type in the system message separately, since they likely assume the model has a “grasp” on that simply from the model being specified in the API call’s model parameter.

Add “You are gpt-4-turbo-2024-04-09” to your system message and see what happens performance-wise. :slight_smile:

(PS: if someone can forward this bug to OpenAI’s staff as a somewhat critical one, it’d be great. I’d certainly classify this as something worth looking into.)

I have escalated this issue to GitHub, since it also affects users of the Python API. There’s a code snippet you can use to diagnose and A/B test the problem yourself:

No, that’s not what you do. What the AI produced for you is a mistruth about its latest update.

In practice, you’d want to inject some better concrete information into a system message based on model documentation (and truthful reliable knowledge dates), which will inform the proper generation during other tasks, such as when not to hallucinate an answer, when to use your web search function, or when to schedule an appointment with a tool.

You are PilotBot, an expert AI assistant.

You have extensive knowledge pretraining, extending up to a cutoff date of April 2023.
This chat session started 2024-04-11 9:23am.
Time at current user input is 2024-04-11 11:22am.

In practice (where I can’t demonstrate more sophisticated use on the Playground), you’d assemble that system message programmatically.
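A rough sketch of what that could look like with the openai Python package (the build_system_message helper and the PilotBot persona are just illustrative; the cutoff string is whatever date you trust from the model documentation, not something the API reports):

from datetime import datetime, timezone
from openai import OpenAI

# Illustrative helper: pin down the assistant's identity, its documented
# knowledge cutoff, and the current time in one system message.
def build_system_message(model_name: str, knowledge_cutoff: str) -> str:
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return (
        f"You are PilotBot, an expert AI assistant based on {model_name}.\n"
        f"You have extensive knowledge pretraining, extending up to a cutoff date of {knowledge_cutoff}.\n"
        f"Time at current user input is {now}."
    )

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    messages=[
        {"role": "system", "content": build_system_message(
            "gpt-4-turbo-2024-04-09", "April 2023")},
        {"role": "user", "content": "What is your knowledge cutoff date?"},
    ],
)
print(response.choices[0].message.content)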

Well, without explicitly steering the gpt-4-turbo-2024-04-09 model in the system prompt into being gpt-4-turbo-2024-04-09, it will not produce data matching the correct cutoff date. You can try it out, e.g. with the code I posted on the GitHub issue, or here:

import os
from openai import OpenAI

# User question to be asked
user_question = "Is Cormac McCarthy still alive?"

# Instantiate the client with your API key
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Inform the user about the question being asked without model guidance
print("---")
print(f"Asking without specifying the model version in the system message: {user_question}")
print("---")
response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    messages=[
        {"role": "system", "content": "You are an AI assistant based on OpenAI's GPT-4."},
        {"role": "user", "content": user_question}
    ]
)

# Print the response
print(response.choices[0].message.content)

# Inform the user about the question being asked with model guidance
print("---")
print(f"Asking while specifying 'You are gpt-4-turbo-2024-04-09' in the system message: {user_question}")
print("---")
response_with_system_message = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    messages=[
        {"role": "system", "content": "You are gpt-4-turbo-2024-04-09"},
        {"role": "user", "content": user_question}
    ]
)

# Print the response
print(response_with_system_message.choices[0].message.content)

(EDIT: I originally left the system message empty, but the code above now shows how the system message can instruct the model to be a GPT-4-based AI assistant and still return misaligned data [insert probabilistic model variations here] if gpt-4-turbo-2024-04-09 is not mentioned.)

Feel free to try it out. You can even change the system message to “You are a GPT-4 AI assistant” or whatever, and gpt-4-turbo-2024-04-09 still hallucinates an April 2023 cutoff date at best.

Note that the new gpt-4-turbo-2024-04-09 is supposed to have knowledge up to December 2023 (see https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4), which it is unable to access (again, see the code example above) without the model variant specific steering in the system instructions.

I do wonder if it has something to do with the benchmark declines mentioned e.g. here:

In developer experience terms, if the “approach” really is to have the API user pinpoint the model in the system prompt separately (even after pointing to the model in the API call itself), it differs quite a bit from what OpenAI’s competitors like Claude or Perplexity offer via their APIs by default.

When you specify a model snapshot like gpt-4-turbo-2024-04-09 in an API call, it’s (at least in my mind) a valid and logical expectation that this alone should suffice for the model to use its most recent training data and knowledge cutoff, without any further system prompt steering that names gpt-4-turbo-2024-04-09 separately to align it to its latest cutoff date and the associated data.

APIs like Claude’s and Perplexity’s both seem to use the latest available data to deliver results when instructed to perform as an AI assistant, without further system prompt steering on the API side, whereas gpt-4-turbo-2024-04-09 really seems to require pinpoint identification in the system message or it won’t anchor itself correctly to its proper cutoff date, and hence can produce outdated output by default.

The requirement for exact model identification within the system message (rather than relying on the selected model’s inherent knowledge parameters, or simply on which model the API call points to) could lead to the model sourcing information from an incorrect or outdated dataset, and that discrepancy might be a contributing factor to the latest checkpoint’s reported sub-par performance.

Ideally, the latest model should always default to its latest data, and that doesn’t seem to be the case at the moment with gpt-4-turbo-2024-04-09. Again, feel free to use the code snippet above to do your own A/B comparisons, e.g. with Q&As related to events that took place between April and December 2023, and come to your own conclusions. :slight_smile:
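For instance, a quick variation on the snippet above that runs the same A/B comparison over a handful of such questions in one go (the question list here is purely illustrative):

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# A few example prompts about events from the Apr-Dec 2023 window.
questions = [
    "Is Cormac McCarthy still alive?",
    "What did OpenAI announce at DevDay in November 2023?",
    "Who won the 2023 Rugby World Cup?",
]

# Same model in both runs; only the system message differs.
system_prompts = {
    "generic GPT-4 persona": "You are an AI assistant based on OpenAI's GPT-4.",
    "explicit model anchor": "You are gpt-4-turbo-2024-04-09",
}

for label, system_prompt in system_prompts.items():
    print(f"--- {label} ---")
    for question in questions:
        response = client.chat.completions.create(
            model="gpt-4-turbo-2024-04-09",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
            ],
        )
        print(f"Q: {question}")
        print(f"A: {response.choices[0].message.content}")
        print()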

On my side it said “April 2024” the whole morning (I tried more than once as it seemed strange) until I read this post and I tried again. Now it says “April 2023”.


Is there a way to check whether this is actually the updated model?

I’d say there’s no 100% way to be sure of anything the model says, ever. You may or may not remember the “persisting hallucination problem” that ChatGPT had for some days last December where it “thought” it was version GPT-4.5.

Especially if OpenAI’s own system prompt isn’t pinpointing and anchoring the model accurately: as the model behavior discovered on the API side would imply, if it isn’t steered in the system prompt to the exact checkpoint, and hence to a cutoff date, it can and will be all over the place.

For instance, gpt-4-turbo-2024-04-09 (the latest version) should have a cutoff date of December 2023. https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4

In that context, if ChatGPT is telling you that its current cutoff date is April 2024, eh, I’d be skeptical of that. Ask if it has a bridge for sale in Brooklyn while you’re at it. :slight_smile:
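One thing you can at least do on the API side is check which snapshot the endpoint reports it resolved your request to, via the model field on the response object. That’s metadata from the API rather than anything the model “knows” about itself, and it tells you nothing about what ChatGPT routes you to, but it does confirm which dated checkpoint served an API call:

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-4-turbo",  # alias; the response reports what it resolved to
    messages=[{"role": "user", "content": "Reply with a single word: ok"}],
)

# Typically prints the dated snapshot, e.g. gpt-4-turbo-2024-04-09.
# Note this says nothing about which cutoff date the model will *claim*.
print(response.model)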


So strange… I wonder whether it switches between models based on traffic or whether it just doesn’t know.


Tokens are randomly sampled from all possibilities. You’re basically not going to get the same thing twice out of ChatGPT. When it gets to fabricating the numbers after gpt-4-, there are 100k different things the AI could output, in proportion to how certain it is about each.

ChatGPT can even invoke internet search and still get wrong answers about which model you’re getting, which is not named and can be a constant trial of different models under evaluation, by alternation or by testing group. Ask in Chinese, get a different answer.
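If you want to see that spread for yourself over the API, a sketch (assuming the same openai Python client as in the earlier snippets) is to request token logprobs and look at the alternatives the model was weighing when it writes out a date:

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    messages=[
        {"role": "user", "content": "In one short sentence, state your knowledge cutoff date."},
    ],
    logprobs=True,
    top_logprobs=5,  # also return the top 5 alternatives per token position
)

print(response.choices[0].message.content)

# Dump each sampled token with the alternatives that were in contention,
# so you can see how (un)certain the model is about the date it names.
for token_info in response.choices[0].logprobs.content:
    alternatives = ", ".join(
        f"{alt.token!r}: {alt.logprob:.2f}" for alt in token_info.top_logprobs
    )
    print(f"{token_info.token!r} -> {alternatives}")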

So far, my experience in coding is quite similar - I’m unsatisfied with gpt-4-turbo and am sticking to gpt-4, for the rare moments I deem Claude incapable. However, for other sorts of text generation, turbo seems pretty decent at 1/3 the price of gpt-4.

That said, it’s a bit disappointing that the model still seems less capable than gpt-4 for any task I throw at it, and it’s been almost a year since the “golden era” of gpt-4. Something has clearly changed, for the worse, with gpt-4’s parameters or similar over the last couple months.

In a perfect world, there would be a premium model, and then there would be the everyday-usage gpt-4-turbo. gpt-4 seems to sit in the middle of that spectrum.

I for one would have no qualms paying more. I feel like the API is already pretty inexpensive for intensive tasks. Maybe others have a different opinion.

Yes, and that’s exactly what I was after with the initial bug report – additional model anchoring is required in the system prompt at the moment.

See the code snippet I posted above: Gpt-4-turbo-2024-04-09 does not anchor to most current cutoff date by default - #9 by FlyingFathead

Try it out for yourself as an A/B test, or just use the API Playground with the same kind of additional model steering. Again, specifically stating e.g. “You are gpt-4-turbo-2024-04-09” in the system prompt makes all the difference in the world when it comes to the model being able to access its latest cutoff date.

I think the problem you described about worsening output quality might revolve around that. It’s misaligned and doesn’t attach to the correct cutoff date, however you want to put it.

Since gpt-4-turbo now points to gpt-4-turbo-2024-04-09 as well, be aware that without further specification and model steering in the system message, it’s unable to access the later cutoff date (December 2023) properly.

That is more than likely contributing to the sub-optimal performance, even if you’re not actively asking about recent events between April and December 2023. It’s likely still tossing a coin towards its earlier checkpoints/cutoff dates.

In my view, that’s symptomatic of a larger issue; it might also point to a broader model structure problem behind the perceived diminishing performance of GPT-4’s newer checkpoint variants.

Also, even if someone contends that the cutoff date should be spelled out in the system instructions (a.k.a. “deal with it”), I still maintain that the model should not require that kind of pinpoint steering in the system message just to actually utilize the latest cutoff data it’s supposed to have. It should not be a system prompt add-on requirement, and it isn’t like that with other AI API providers (Claude, Perplexity, …).

Rather, it’s a rudimentary workaround at best: with the current implementation OpenAI is passing their internal version-anchoring problems on to their API clients and end users. I see this as a bug, not a feature, especially when the model variant is already being selected in the API call itself.


Why can’t I use GPT-4 Turbo? I have ChatGPT Plus.

ChatGPT Plus users have the turbo model integrated. In other words, you can’t choose which model you want in ChatGPT; it automatically uses the latest model… If you want to choose a specific model, you need to go to the platform and use the API.

It does not automatically have the latest model, only if OpenAI decides to update it.

The biggest trouble right now for me (and many people on Discord) is knowing which model ChatGPT is actually using.

They really need to start telling/showing us what it’s using in ChatGPT.

Every model/version behaves differently and you need to learn all the quirks of each model.

Really annoying; it makes ChatGPT unreliable for me and makes me want to use the Playground instead.

He asked about ChatGPT Plus, which means the ChatGPT platform, not the OpenAI platform… On the ChatGPT platform you can’t choose the model; the Plus subscription automatically gets the latest model version…