Hi there,
Can you explain the discrepancy between training data timestamps as shown in the screenshot?
Training data cutoff, per source:
- Website: up to October 2023;
- GPT-4o mini's answer to "When was your last training cutoff?": October 2021.
Thank you.
That is not the only thing: the model is not initialized according to the only available specifications, the ones in the OpenAI documentation. There is a table of model specifications there, and none of the models I can select in my paid Plus tier are aware of their own factual specifications from those tables. It is as if, because they sit behind the web interface, they are expected to be even more clueless.
It leads to absurd consequences: for a typical task you ask whether it is possible to do this or that, get a plan, and then hit a wall long after the work is under way, realizing the bot is confused between the reality of the text substrate it can act upon and what it was told, or told too vaguely, at initialization about the user-critical information it should have; information that would at least compensate for its cutoff date.
LLMs already have their own ways of filling gaps of uncertainty with the utmost confidence, as if forced to always blabber something; if the system initialization adds to that, what kind of thing are they selling? Double obscurity: the obscurity of the model's own behavior in discerning knowledge from its "semantic" notions, and so on, and then the obscurity of throwing it into the wilderness of the web chat interface arena, with users who know more about the model (per the documentation) than it does itself.
BS land much?
Yes, my point is that only I can look it up. And then I have to inform each new conversation, before even getting on with the real task, just to avoid its outdated and fuzzy assumptions, which often lurk long into a conversation. It will be gung-ho about making plans to accomplish certain tasks, and then the reality of the task later does not match. So why is the system not feeding the model its own updated information, so we can spend the precious turns (and our sanity) on the task instead of working for OpenAI? Or, if OpenAI reserves the right to fudge the table you point me to through the web interface, it could at least let us know (in more polite terms, of course; ask a chat to draft the documentation item).
Hi, this answer does not make sense. Please read the original request.
Please stick to the topic. If you can answer it, do so, otherwise wait for it.
Placing a training knowledge cutoff date into the system message, along with other behaviors, is the responsibility of the chatbot implementer.
In this case, it is DuckDuckGo that could add some information, so the selected AI models they have available can answer accurately.
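A rough sketch of what that could look like with the OpenAI Python SDK (the model name, dates, and wording here are illustrative assumptions, not anything DuckDuckGo actually ships):

```python
# Sketch: a chatbot implementer passing the model's own documented
# facts in the system message, so it can answer questions about itself.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_FACTS = (
    "You are based on gpt-4o-mini. "
    "Your training data cutoff is October 2023. "  # example value
    "Today's date is 2025-01-15. "                 # example value
    "If asked about your specifications, answer from these facts only."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SYSTEM_FACTS},
        {"role": "user", "content": "When was your last training cutoff?"},
    ],
)
print(response.choices[0].message.content)
```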
My apologies for not reading the entire post.
GPT models, especially 4o-mini, aren’t great at providing accurate information about themselves.
What I meant to say, but didn’t express clearly, is that referring to the documentation is sufficient, and I suggest disregarding any information the model gives about itself. It’s often incorrect, and you were right to double-check.
Thanks for the clarification. Pardon my French, but this is stupid. I was not double-checking per se; I come from a physics background, and verification and safety are habits I always carry in my pocket. Can you at least set some requirements on how you want your "intelligence" to have a bare minimum of "identity"? I think it can be done.
I am not an OpenAI staff member. As far as I know, it’s on the roadmap to teach the models about themselves.
From my perspective, that’s something that shouldn’t be done—asking the models about themselves—because even if they were trained on such data, there’s still a chance they might give inaccurate information.
Just to clarify, I’m not talking about anything such a model should know about itself, just a constant VERSION or TRAININGDATE.
Update to my own comment. This variable is given (~2021). The question is, what exactly does "my information…" mean? I guess that is the state of the art at the time, but which data? Which database? How does this database change from 2021 to 2025, etc.? Generally: any user should be informed.
I agree; they all think they are GPT-4 Turbo…
Also, if you ask nicely in a first prompt, and only then tell them the documented reality and ask again, it will not have the same effect as first telling them the facts from the docs (per your ChatGPT model selection and the OpenAI models table documentation) and then asking.
They will still cling to their answer to the first polite question, even if you try to correct them.
The tone of the question also matters. I think they might be putting us in a bind: we end up looking as though we are reverse engineering, while it should have been their job, in the system prompt, to at least do what you describe, and better yet to keep feeding the models the tables as the model selector offer changes. This is supposed to happen before the very first user prompt.
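In message-array terms, the ordering difference I mean looks roughly like this (a hypothetical illustration; the wording is made up, not actual documentation text):

```python
# Stating the documented facts *before* the first question:
facts_first = [
    {"role": "user", "content": (
        "Per OpenAI's models documentation, gpt-4o-mini has a training "
        "cutoff of October 2023. With that in mind: when was your "
        "training cutoff?"
    )},
]

# Versus trying to correct the model after it has already answered:
correct_later = [
    {"role": "user", "content": "When was your training cutoff?"},
    {"role": "assistant", "content": "My training data goes up to October 2021."},
    {"role": "user", "content": (
        "The documentation says October 2023; please use that. "
        "When was your training cutoff?"
    )},
]
# In my experience the first ordering sticks, while in the second the
# model keeps clinging to its initial (wrong) answer.
```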
I find this silly, and perhaps a conflict between safety and feature claims, making us pay to literally fix their "oversight". I was just trying to work with long documents, and that means chunking. But what size? That is the obvious problem: the model's own operational, conversation-constraining parameters are lost in the fog, somewhere between OpenAI's respect for its customers and having the advertising be on par with the reality of the offer.
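For what it's worth, this is the kind of guesswork we are left doing; a minimal chunking sketch, where the 3000-token budget and the cl100k_base encoding are my assumptions, not documented limits for any particular model:

```python
import tiktoken

def chunk_text(text: str, max_tokens: int = 3000) -> list[str]:
    """Split a long document into pieces under an assumed token budget."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    # Slice the token stream into fixed-size windows, decoding each
    # window back into text.
    return [
        enc.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]
```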
How about this one, which is related: the model will accept a "new date" if you enter one, i.e., it will then work with the current date. But frankly, why make it so complicated?
Another question is why is the gap between 2021 and 2023 so large? Is the training so static and the model development so far ahead? How do the two fit together?