Hallucinations are persistent and non-random

I asked ChatGPT to write an article about ISO in film photography. It made some statements about color film lacking sharpness that are not exactly correct, but good enough. It then said something completely wrong about slide film: that it has superior dynamic range. In fact, slide film has a very restricted dynamic range. This is both wrong as a matter of fact and something that almost all human writers on the subject are going to get right (you don’t start talking about the dynamic range of film unless you know what you’re doing).

Wanting to see if it could identify the problem, I opened up a new chat window, said I knew there was a mistake in the paragraph, and asked it to locate it. ChatGPT correctly stated that some of the last-generation color films are pretty good, but it doggedly stuck to the mistake about dynamic range, even when directly questioned.

Which is really interesting. The first “error”, dealing with the lack of sharpness in color film, goes to the weight of its training data. That color films lack sharpness is a common opinion, and one that would be prevalent if older data were used in its training. When asked, it corrects to the most recent data. Given that film photography went into a steep decline with digital, the bulk of the training data likely contained an outdated concept (though I’m not sure low-ISO Tri-X wouldn’t have higher effective resolution than a color film).

The statement that slide film has a large dynamic range is simply wrong and bizarre. It does not, and almost nobody writing on the subject would ever say that it did. ChatGPT even acknowledges the functional definition of a small dynamic range (you have to get the exposure right) while clinging to the erroneous concept.

My initial thought is that the math producing the wrong result here is also used in the math producing the right result for something more important or common. If that were the case, then retraining on the same data under a different random seed might produce different results. (With the ethical disclaimer that I’m not at all sure creating a new AI is a good idea.)

ChatGPT does not evaluate any math to produce a completion to a prompt. Where are you getting that idea?

There are a few places where randomness can be used in training, such as picking different batches of data or picking dropout sections. But a single random seed producing such a change in outcome is not something I have seen in a paper; care to cite a reference for that?
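To make the batch-picking point concrete, here is a toy sketch (not a claim about how any real model is trained): shuffling the same dataset under different seeds yields different batch orderings, while reusing a seed reproduces them exactly. The function name and setup are illustrative, not from any library.

```python
import random

def make_batches(data, batch_size, seed):
    """Shuffle a copy of the data with the given seed, then slice into batches."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]

data = list(range(8))

# The same seed always reproduces the same batch order...
assert make_batches(data, 4, seed=42) == make_batches(data, 4, seed=42)

# ...while different seeds will generally shuffle the examples into
# different batches, changing the order in which a model sees them.
print(make_batches(data, 4, seed=0))
print(make_batches(data, 4, seed=1))
```

Whether a different ordering (or different dropout masks) would meaningfully change which facts a large model gets right is exactly the open question being debated above.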