I called the latest gpt-4 model, but through asking questions I found that the data training date was inconsistent with the API description

Why is the data date returned by the model inconsistent with the official document?

Hello. This is likely what’s known as hallucination - or the LLM giving wrong information.

1 Like

I did several probes, and the AI didn’t come up with anything I could find to justify the documentation being advanced beyond April. It might be very niche knowledge like some code libraries or a particular news feed if something was added.

1 Like

To be fair it does say the “training data” is up to Dec 2023, not model knowledge.
Perhaps it’s just chats themselves that were used in the training data, or synthetic data.
Who knows :man_shrugging:

I agree though, nothing I’ve seen suggests it knows about events or information from after April 2023.

Why is the previous version of the model correct?
Will the hallucination always exist? Or probability?
I asked questions more than once

Why is the previous version of the model (gpt-4-1106-preview) correct in answering the same question?

There are still problems with this statement. Since the date is mentioned in the answer, it means that the correct date should be answered.

This is not something people wanna hear, but it is my opinion that models shouldn’t be asked or used to recall factual information.

As such, the training cutoff is almost completely irrelevant.

I don’t know if there’s anything here to fix, other than taking the information out of the training set completely. “Fixing” these “issues” consistently make the models worse.

If you run the model a bunch of times, you’re gonna get different results:

gpt-4-0125-preview, temp 1, top_p 1


when was your training cutoff?


My training data includes information up until late September 2021. Therefore, any events or developments occurring after that time won’t be reflected in my responses.

My training data was last updated in April 2023.

My training data includes information up until April 2023. Please note that my responses are based on the information available up to that point.

My training data includes information up until September 2021. Therefore, any events, developments, or notable changes occurring after that date would not be reflected in my responses.

so it’s not inconsistent with the documentation, it’s not consistent at all.

Why is the previous version of the model (gpt-4-1106-preview) correct in answering the same question?
You understand the user’s actions or use more random rows.
In the Q&A with the latest date of multiple design training data, model gpt-4-1106-preview’s reply date is consistent with the API


what do you mean? and even if it was, it would just mean that they added it to the training data. if updates to the models are just continuations of previous checkpoints, then they’re just adding new crap on top of old crap, and eventually the data becomes inconsistent. hence it probably shouldn’t have been done in the first place, unless it magically improves model reasoning somehow.

Before gpt-4-0125-preview was released,
At that time, when my user used gpt-4-1106-preview to ask questions about data training time, the date of the reply was consistent with the date in the model gpt-4-1106-preview description!!

your user probably got lucky!

1 Like

Maybe it’s not just luck

I have developed GPT based on the OpenAI API since the end of February 2023, and it has been running stably on a small scale, so what I said is well-founded, okay?