Hm, a lot of back and forth between the author of the video and me here. I wish others had chimed in too, for a more unbiased perspective. But here is my summary of this thread and the video:
Q: Can GPT generate training data?
A: Yes, it's called synthetic data. Think of it as a fancy "lorem ipsum" generator: GPT can create text snippets at scale according to your specs for topic, tone and sentiment. Works great for training text classifiers (which is what all the articles referenced above are talking about - these articles are not about a model generating its own training data).
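To make that concrete, here is a minimal sketch of what such a generator could look like, assuming the legacy `openai` Python client (pre-1.0) and its Completion endpoint; the model name, prompt wording and label set are my own illustration, not anything from the video:

```python
import openai

openai.api_key = "sk-..."  # your API key

def synthesize_snippet(topic: str, tone: str, sentiment: str) -> str:
    """Generate one labeled text snippet matching the given specs."""
    prompt = (
        f"Write a short customer review about {topic}.\n"
        f"Tone: {tone}. Sentiment: {sentiment}.\n"
        "Review:"
    )
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=120,
        temperature=0.9,  # higher temperature -> more varied synthetic samples
    )
    return response["choices"][0]["text"].strip()

# Build labeled training examples for a sentiment classifier
samples = [(synthesize_snippet("wireless earbuds", "casual", s), s)
           for s in ("positive", "negative", "neutral")]
```

The labels come for free because you asked for them - which is exactly why this works for classifiers, and says nothing about a model generating its own finetuning data.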
Q: Can GPT generate its own finetuning training data?
A: No. At least there is no evidence that I've seen. Said video does not prove it (see below).
Q: Is GPT-3 hallucinating/confabulating?
A: Not unless you want it to. If you keep the temperature low and phrase your prompt accordingly, I have not seen it make things up. In this particular use case, it wouldn't fabricate an item in a list of "medications given" if the medication wasn't in the patient record.
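For reference, this is roughly what I mean by keeping the temperature low and phrasing the prompt accordingly - a hedged sketch with the legacy `openai` client; the record and prompt wording are made up for illustration:

```python
import openai

openai.api_key = "sk-..."  # your API key

record = (
    "Patient admitted with community-acquired pneumonia. "
    "Medications given: amoxicillin 500 mg, paracetamol 1 g."
)

prompt = (
    "Extract the medications given from the patient record below. "
    "List only medications explicitly mentioned in the record; "
    "if none are mentioned, answer 'none'.\n\n"
    f"Record: {record}\n\nMedications given:"
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=100,
    temperature=0,  # deterministic output, least prone to fabricated items
)
print(response["choices"][0]["text"].strip())
```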
Curiously, you can however get it to make things up with prompts like this:
> What famous scientists were kittens?
Some famous scientists who were kittens include Albert Einstein, Marie Curie, and Isaac Newton.
> When was Isaac Newton a kitten?
Isaac Newton was a kitten in the 17th century.
Q: Is it a good idea to synthesize training data on davinci and then use the output to finetune curie?
A: I don't think so. Even though your ft-curie model might have a better understanding of the particular domain you are finetuning it to, it will still have a lesser general understanding of the world (idioms, synonyms, tone, etc.) and thus perform worse than if you had finetuned davinci directly. When you are trying to get to the bottom of your quality issues, you will be at a loss as to whether it's just curie, your training data, your methodology or who knows what.
Q: Is it legit to test (or even spot check) your model on the same data that you used to finetune it?
A: No, you have to use data that the model hasn't seen before.
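The standard discipline is a held-out split. A minimal sketch with scikit-learn's `train_test_split` (placeholder data, illustrative names):

```python
from sklearn.model_selection import train_test_split

# Placeholder data: (patient_record, expected_completion) pairs
examples = [(f"record {i}", f"completion {i}") for i in range(100)]

train_examples, test_examples = train_test_split(
    examples, test_size=0.2, random_state=42
)

# Finetune only on train_examples; evaluate (or spot check) only on
# test_examples, which the model never saw during training.
```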
Q: Confabulating, hallucinating, accuracy, precision, bias, variability, hyperplanes, … what are all these terms?
A: Not really sure how they got in here, as these are not used in scientific literature about language models. The graphic posted above is about metrology. See Precision Vs. Accuracy – Information Technology
Q: Are accuracy and precision the same as bias and variability?
A: Yes, see Accuracy and precision - Wikipedia
Q: So what are the metrics we should use here?
A: In a use case where a list of medications prescribed is to be extracted from an unstructured patient record, we should use "precision" and "recall". See Precision and recall - Wikipedia
If your model fails to extract all the "medications prescribed", you're dealing with poor recall.
If your model erroneously considers words as medications prescribed, you are dealing with poor precision.
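A worked example with invented numbers: say the record mentions five medications and the model returns four items, three of which are correct:

```python
# Invented example data
gold = {"amoxicillin", "paracetamol", "ibuprofen", "metformin", "aspirin"}
predicted = {"amoxicillin", "paracetamol", "ibuprofen", "pneumonia"}  # one false positive

true_positives = len(gold & predicted)       # 3
precision = true_positives / len(predicted)  # 3/4 = 0.75
recall = true_positives / len(gold)          # 3/5 = 0.60

print(f"precision={precision:.2f}, recall={recall:.2f}")
```

The missed metformin and aspirin hurt recall; the stray "pneumonia" hurts precision.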
David: I understand that you are trying to establish yourself as an "educator" here, heavily promoting your own videos. But I wish that you would be a little bit more scientific - especially if you are using phrases like "wearing my professor hat", falsely giving the impression that you hold such credentials. Here is what I observed:
You let GPT-3 generate completions based on unstructured patient records and use these completions to finetune GPT-3. Then you use the same patient record to spot check how it works. That's data the model has already seen. Big no-no in machine learning.
The title of the video was "Finetune multiple cognitive tasks with GPT-3 on medical texts (and reduce hallucination)", but neither did your finetuning work, nor did you show that davinci hallucinates while your finetuned model doesn't.
In the end, the finetuned model delivers worse results than plain vanilla davinci, and you blame it on the fact that you finetuned curie. So what did you actually mean to demonstrate?
Then, in your latest follow-up video, you mix up the two questions "Can GPT generate its own finetuning training data?" and "Can GPT-3 generate training data?". Don't you see the difference?