I asked ChatGPT a question. I’m not sure I can trust the answer.
For that reason, I asked whether it could publish the sources it used to answer my question. It replied that it cannot publish its training data.
Please consider adding support for this feature: if asked, ChatGPT should publish the training data relevant to answering a question.
I can imagine that technically and legally this is challenging.
The number of relevant sources could be enormous; some of them might not be public or legally shareable; and my (vague) understanding of neural networks is that, after training, nothing in the network links back to the training data.
Yet I feel at least the first two issues could be solved: e.g., publish only the most “relevant” (solving the size issue) public (solving the legal issue) training data. As for the third, even if the neural network itself doesn’t support this, the overall ChatGPT system need not comprise only the neural network (I assume it already doesn’t); ancillary components could implement it.
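To make the suggestion concrete, here is a minimal sketch of what such an ancillary component might look like: a retrieval step that ranks a (purely hypothetical) corpus of public documents by relevance to the user’s question and returns the top few as citations. The corpus, the scoring function, and the function names are all my own illustrative assumptions, not a claim about how ChatGPT actually works.

```python
import math
import re
from collections import Counter

def _tokens(text):
    # Simple word tokenizer; a real system would use something far richer.
    return re.findall(r"\w+", text.lower())

def score(query, doc):
    """Cosine similarity between bag-of-words vectors — a stand-in for
    whatever relevance measure a real attribution component would use."""
    q, d = Counter(_tokens(query)), Counter(_tokens(doc))
    num = sum(q[w] * d[w] for w in set(q) & set(d))
    denom = math.sqrt(sum(v * v for v in q.values())) * \
            math.sqrt(sum(v * v for v in d.values()))
    return num / denom if denom else 0.0

def cite_sources(query, corpus, top_k=2):
    """Return the top_k most relevant documents for a query.
    Restricting `corpus` to public, shareable documents addresses the
    legal concern; `top_k` addresses the size concern."""
    ranked = sorted(corpus.items(), key=lambda kv: score(query, kv[1]),
                    reverse=True)
    return [title for title, text in ranked[:top_k]
            if score(query, text) > 0]

# Hypothetical public corpus (titles and texts invented for illustration).
corpus = {
    "encyclopedia/photosynthesis": "Photosynthesis converts light energy "
                                   "into chemical energy in plants.",
    "news/sports": "The local team won the championship game last night.",
    "textbook/biology": "Plants use chlorophyll to capture light "
                        "for photosynthesis.",
}
print(cite_sources("how does photosynthesis work in plants", corpus))
# → ['encyclopedia/photosynthesis', 'textbook/biology']
```

Of course, real relevance would have to come from the model’s actual behavior, not word overlap; this only illustrates that a citation layer can sit beside the network rather than inside it.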
I’m an academic researcher, and I can’t take ChatGPT’s answers at face value. If it published its sources, it would make my life much easier (and that of every other researcher, in every field), and it would make your product even more impactful (for the greater good).
I was using ChatGPT 3.5, the free version.