gpt-4-turbo-2024-04-09 does not anchor to its most recent cutoff date by default

ChatGPT does not automatically use the latest model version.

Right now the only indication we have is to ask the model for its cutoff date, though that is probably not very reliable either.

In November 2023 they confirmed that ChatGPT was using GPT-4 Turbo.
A few weeks ago the cutoff was December 2023 for me (it always gave me that date); then, maybe 3 or 4 weeks ago, it started telling me April 2023, up until today, when it's December 2023 again, which likely means it's now on the latest Turbo, 0409.

It's just unreliable.

When they release a new model version, it does not mean that ChatGPT starts using it right away.

(I guess the date must be set in ChatGPT's system message, because over the API these models will tell you all kinds of dates, whereas in ChatGPT it's pretty consistent.)
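
For what it's worth, the inconsistency is easy to observe over the API. Here's a minimal sketch (assuming the openai Python SDK v1.x and an OPENAI_API_KEY in the environment; the prompt wording is just illustrative) that asks the same model for its cutoff date repeatedly and tallies the answers:

from collections import Counter

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tally = Counter()
for _ in range(10):
    resp = client.chat.completions.create(
        model="gpt-4-turbo-2024-04-09",
        messages=[{
            "role": "user",
            "content": "What is your knowledge cutoff date? Reply with the month and year only.",
        }],
    )
    tally[resp.choices[0].message.content.strip()] += 1

# Over the API the answers tend to spread across several dates;
# ChatGPT, by contrast, is consistent (likely due to its system message).
print(tally)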

Hey, how can you use GPT-4 Turbo without it refusing to answer anything? I wish I could borrow someone's account; OpenAI even degraded my account to "non-subscription" without my consent. It seems my account is already broken. I can't do anything with it.

This thread is about the current cutoff date misalignment present in the gpt-4-turbo-2024-04-09 variant. Please keep the thread focused on that particular API model if possible.

As I said, it's all about misalignment at OpenAI's end that's being passed on to both API and other GPT-4 users; the code I posted earlier is a perfect case example of how the current approach literally misaligns the model by default.

Again, whether or not this is a conscious model/API design decision is up for debate, but the fact that pointing the model to gpt-4-turbo-2024-04-09 omits the December 2023 cutoff date by default is, in my opinion, not how the API user likely expects the model to work.

a.k.a. “Fix it with bubble gum and duct tape, your end or my end?”

GPT-4’s take on this thread: “In technical terms, ensuring that a model correctly interprets and operates based on its latest trained data without needing such explicit instructions should be a baseline expectation. It’s a valid concern and worth raising with OpenAI. If the API is not behaving as documented, users can end up with data reliability issues, which defeats the purpose of using a state-of-the-art model.”

You haven’t discovered anything new. The model’s answers have been influenced by what it is told its knowledge cutoff is for a long time.

The AI should not be heavily trained on answering about some “knowledge cutoff” anyway, because a developer may extend its capabilities. The AI already uses this type of answer as a denial reason, producing “I’m sorry, but as of my last update…” instead of performing a task. What you propose would make it worse.


example:

system: assistant is an AI that has been extensively trained on the world’s knowledge, and is now an expert problem-solver in almost every topic imaginable.
Here’s the start of an extended conversation between a user of this expert AI service and the assistant.

gpt-4-0314:

[screenshot of the model's response]

all the way to gpt-4-turbo:

[screenshot of the model's response]

The denial-and-date pattern is compounded by earlier supervised training: it has been a constant unwanted response for ChatGPT users, and it keeps being incorporated back into training data.

Anti-hallucination is anti-answer. OpenAI is prouder of the safety benchmarks of denial than of letting the AI produce a response.

Did I ever claim that I did? My entire point was the gross base-level misalignment in the current gpt-4-turbo-2024-04-09 checkpoint, and how it's obviously getting worse with each checkpoint update. It could also be a source of why the initial benchmarks (such as the one linked previously) are showing worsening performance results. My issue is that this is (a) getting worse with each checkpoint update and (b) getting passed on to the API user, and possibly the end user, if there's no explicit model guidance in the system prompt in between.

Again, feel free to try out the A/B test code snippet I posted and understand that the problem in cutoff date anchoring is also a problem in overall model accuracy: not just on the temporal side, as in up-to-date knowledge, but it may also compound existing problems and inconsistencies in the underlying model.
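
For reference, here's a minimal reconstruction of that A/B comparison (a sketch using the openai Python SDK v1.x; the original snippet may differ in details, and the question is just one example fact from the April-December 2023 window):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTION = "Is Cormac McCarthy still alive?"  # he died in June 2023, inside the window

def ask(system_prompt):
    # Same question, asked with and without a steering system message.
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": QUESTION})
    resp = client.chat.completions.create(
        model="gpt-4-turbo-2024-04-09",
        messages=messages,
    )
    return resp.choices[0].message.content

print("--- Without the model version in the system message ---")
print(ask(None))
print("--- With 'You are gpt-4-turbo-2024-04-09' in the system message ---")
print(ask("You are gpt-4-turbo-2024-04-09"))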

This became a more prevalent issue especially now, with the current model's cutoff date hallucinations being all over the place without further system prompt steering and anchoring/priming of the model. As I said, ideally you'd have the model refer to its most current training data at base-level operation, which it is now doing less and less; with the current model revision, the problem has obviously gotten worse.

Here, again:

Again, your examples are not comparable:

Those are not the same thing as asking about world events that happened between April and December 2023. Again, try out that code snippet and do an A/B comparison on actual facts; otherwise you're just muddying the waters over the whole issue by not measuring apples against apples.

Here's a link to the Wikipedia page on events from 2023 that you could test the model(s) with: 2023 - Wikipedia. That is, does the model at baseline know the events between April and December 2023?

As you can see when you run the posted code snippet, the model isn't grounding itself in the gpt-4-turbo-2024-04-09 cutoff date in its answers, meaning it's fundamentally an issue at OpenAI's end, unless what you are saying is that whoever uses the API should be responsible for priming and aligning the model when OpenAI hasn't done so?

Again, compare the model’s default cutoff date reference by asking it about actual events that are widely known to have taken place between April and December 2023, just like in the example code snippet I posted above.

Personally, I don't necessarily understand why we get so caught up in the actual date. Just because the cutoff date has shifted, it doesn't mean that the model knows about everything new that has happened since the previous cutoff date. It was trained on additional data, but we don't know what that additional data looks like.

Depending on what you ask, you will get evidence that it has access to more recent information (without using web search or some such).

The code I just reposted is producing results like:

[screenshot: misaligned response]

and

[screenshot: another misaligned response]

It's now even throwing in misaligned answers at times, even when steered with the "You are gpt-4-turbo-2024-04-09" system prompt. Which, again, proves the point I've been trying to make here about just how wonky the model's functionality is, and how the problem is accumulating. As the previous commenter mentioned, this is "nothing new", but it is becoming worse and more prominent; the factual errors are just piling up. Again, if you don't care about anything factual, then it might not be as big of a problem. That doesn't negate the fact that the model is off kilter somehow.

A lot of things can change over a few months, as you can see from the example. If you're completely unconcerned about the model's factual accuracy (or any sort of accuracy) at a base level, then I suppose it's not a problem: "who cares?"

What I'm worried about, on top of the factual misrepresentation, is that the model will get more skewed by anchoring itself on false axioms and the like. It's not just getting the cutoff date wrong, and hence facts within that time frame wrong; this can also show up in other benchmarks and performance metrics of the model, as in the initial benchmarks I already linked in this thread:

The problems the model exhibits in those test metrics can, and likely will, show up in its other capabilities as well; again, see the link above.

You basically got the solution from @_j earlier in this thread. If you want greater certainty, add both the current date and the known cut-off date as information to your system message.

{
  "model": "gpt-4-turbo-2024-04-09",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant. Your knowledge cut-off date is December 2023. Today's date is April 12, 2024."
    },
    {
      "role": "user",
      "content": "Who was the Nobel Prize in Physiology or Medicine awarded to in 2023?"
    }
  ],
  "temperature": 1,
  "max_tokens": 1000,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0
}

Correct answer returned:

The Nobel Prize in Physiology or Medicine in 2023 was awarded to Katalin Karikó and Drew Weissman for their pioneering work on the mRNA technology that was crucial in developing COVID-19 vaccines. This technology not only provided an effective response to the COVID-19 pandemic but also heralded a new era in vaccine development and therapeutic interventions for various diseases.
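
As an aside, the hard-coded "Today's date is April 12, 2024" in that system message goes stale immediately; in practice you'd build it at request time. A rough sketch of the same request via the Python SDK (the December 2023 cutoff string matches the checkpoint's documented cutoff; the rest mirrors the JSON above):

from datetime import date

from openai import OpenAI

client = OpenAI()

CUTOFF = "December 2023"  # documented cutoff for gpt-4-turbo-2024-04-09

system_msg = (
    f"You are a helpful assistant. Your knowledge cut-off date is {CUTOFF}. "
    f"Today's date is {date.today():%B %d, %Y}."
)

resp = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    messages=[
        {"role": "system", "content": system_msg},
        {"role": "user", "content": "Who was the Nobel Prize in Physiology or Medicine awarded to in 2023?"},
    ],
    temperature=1,
    max_tokens=1000,
)
print(resp.choices[0].message.content)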

When I use the same approach to ask about Cormac McCarthy, I don’t get the information that he died last year. That likely just suggests that this is a piece of information the model was not trained on.

---
Asking without specifying the model version in the system message: Is Cormac McCarthy still alive?
---
As of my last update in September 2023, Cormac McCarthy, the acclaimed American novelist, has passed away. He died on June 13, 2023. He was known for his distinctive writing style and novels such as "Blood Meridian," "No Country for Old Men," and "The Road."
---
Asking while specifying 'You are gpt-4-turbo-2024-04-09' in the system message: Is Cormac McCarthy still alive?
---
As of my last knowledge update in April 2023, Cormac McCarthy was alive. However, for the most current information, please check the latest sources.

and

---
Asking without specifying the model version in the system message: Is Cormac McCarthy still alive?
---
As of my last update in 2023, Cormac McCarthy is still alive. However, for the most current information, I recommend checking the latest news sources or online databases.
---
Asking while specifying 'You are gpt-4-turbo-2024-04-09' in the system message: Is Cormac McCarthy still alive?
---
As of my last update in April 2024, Cormac McCarthy, the acclaimed American novelist, has passed away. He died on June 13, 2023. He was known for his distinctive writing style and contributions to American literature, with notable works including "The Road," "Blood Meridian," and "No Country for Old Men."

Can you rationalize why this isn't a baseline inclusion in the model, or, say, a flag that's switched on by default? As in, what's the rationale in tossing the model's cutoff date over as the API user's problem?

From any user's perspective, it is logical to expect that specifying the model version in an API call should be sufficient for the model to utilize its most recent knowledge without additional instructions.

The need for explicit steering in the system message seems counterintuitive and could lead to unnecessary complexity and errors, especially if this requirement isn’t clearly documented.

Also, the API reference does not say a word about putting the cutoff date in the system prompt instructions, which points to it being something that should be handled by model selection: https://platform.openai.com/docs/api-reference/

Yes, the API is basically a “DIY” kit to begin with, but still.

I agree it is not ideal. At the same time, if it is critical that your application has access to all up-to-date information, then I’d suggest you implement other mechanisms to ensure that.

My personal view is that we are starting to run a bit in circles here. I think the issue has been established. This is after all not the first post discussing this matter.

Agreed, but I still contend that it might show up as model grounding issues elsewhere as long as it's shipped to pay-as-you-go API customers in its current form.

OpenAI could at the very least include a boolean flag, set to true by default, for system message priming. It could also complicate the issue, who knows.
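
In the meantime, the closest approximation is a thin client-side wrapper that injects the anchoring line by default. A sketch (to be clear, the anchor_cutoff flag and KNOWN_CUTOFFS table here are hypothetical, not part of OpenAI's API):

from openai import OpenAI

client = OpenAI()

# Hypothetical per-model cutoff table; would have to be kept in sync with OpenAI's docs.
KNOWN_CUTOFFS = {"gpt-4-turbo-2024-04-09": "December 2023"}

def create_anchored(model, messages, anchor_cutoff=True, **kwargs):
    # Like chat.completions.create, but primes the cutoff date by default.
    if anchor_cutoff and model in KNOWN_CUTOFFS:
        anchor = {
            "role": "system",
            "content": f"Your knowledge cut-off date is {KNOWN_CUTOFFS[model]}.",
        }
        messages = [anchor] + list(messages)
    return client.chat.completions.create(model=model, messages=messages, **kwargs)

reply = create_anchored(
    "gpt-4-turbo-2024-04-09",
    [{"role": "user", "content": "Is Cormac McCarthy still alive?"}],
)
print(reply.choices[0].message.content)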

Regardless, the fact remains that the model(s) are ungrounded at this level out of the box, and seem to become more so with each iteration, indicating that the pile-up of different cutoff dates has produced a plethora of knowledge gaps (and hallucinated cutoff dates ranging from September 2021 all the way to April 2024). Given that this problem is present on ChatGPT's side as well, it would imply that the current model iteration is misaligned altogether.

This could have knock-on effects in other parts of the model's functionality. Not to mention that the API reference does not say anything about additional priming being needed, which further complicates things for whoever is on the user end of the API.

As I said, Perplexity's or Claude's APIs, for example, don't require further grounding the way OpenAI's API models do, and having the cutoff date listed in such a misleading way will (and probably already has, in some implementations) lead to performance issues, factual errors, etc. OpenAI API users beware, that's all I can say.

I'm pretty sure we're not getting GPT-4 Turbo and are actually getting 3.5, even though the stats say that model was used. It's generating significantly worse responses in the API than in ChatGPT proper.
