What would happen if AI were trained on AI-generated content itself?

AI is currently trained on content created by humans, or on facts narrated by humans. But with AI being so prolific, it is very likely that online content will be dominated by AIGC. If that happens, the content that AI has access to and is trained on would itself be AIGC. It's like AIGC built from AIGC.
Would it cause problems?
Would AIGC become better or worse in that scenario?
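To make the worry concrete, here is a toy illustration (not a claim about real training dynamics): a "model" that just fits the mean and spread of its corpus, generates new samples from that fit, and, like real models, favors its most probable outputs. When each generation trains only on the previous generation's output, the diversity of the corpus shrinks rapidly. All the numbers and the 80% cutoff are arbitrary choices for the sketch.

```python
import random
import statistics

def generate_from(samples, n_out):
    # Fit a toy "model" (just the mean and stdev) to the corpus, then
    # "generate" new content by sampling from that fit. Like real models,
    # it favors high-probability output: we keep only the 80% of drafts
    # closest to the mean, discarding rare or unusual "content".
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    drafts = [random.gauss(mu, sigma) for _ in range(n_out)]
    drafts.sort(key=lambda x: abs(x - mu))
    return drafts[: int(n_out * 0.8)]

random.seed(42)
corpus = [random.gauss(0.0, 1.0) for _ in range(1000)]  # the human-written corpus
first = statistics.stdev(corpus)
for _ in range(6):
    corpus = generate_from(corpus, 1000)  # each new model trains only on AIGC
last = statistics.stdev(corpus)
print(f"diversity (stdev): start {first:.2f} -> after 6 generations {last:.2f}")
```

In this toy setup the spread collapses toward zero within a few generations, which is one precise version of "would AIGC become worse": not noisier, but narrower.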

I personally love OpenAI, and I am so grateful that the team has built such amazing products and has invited people from different backgrounds, even those without programming skills, to get involved in creating in some way, for example through the Playground.
I just have the concerns above, and I wish to discuss them openly with everyone.

Even if the AI uses AI-generated content to learn, that content is still curated by actual humans who think, "This is cool, and exactly what I was looking for!" So the AI would still improve. And just because AI can do these things really well now doesn't mean it can ever truly replace the freedom of traditional art or writing. So running dry of resources won't be a problem, but a learning plateau might still be a thing.

So in conclusion, I don't think it will be a problem: it won't make AIGC worse, but I also don't think it will make it better :slight_smile:


I don't think it's wise - yet - to use AI-generated text as training data. There is a risk of reinforcing errors and exacerbating the problem of hallucination. We need a way to give feedback on the first iteration of generated text: feedback regarding truth and falsehood. If we build a reliable way to do that (perhaps with Stack Overflow-style upvoting and downvoting of the first iteration of text, combined with source-checking and other methods), then the second iteration of text could be superior, and so on. Would love to hear what others think.
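The vote-gated pipeline described above could be sketched roughly like this. Everything here is hypothetical (the `Passage` class, `accept_for_training`, and the thresholds are all made up for illustration); the idea is just that generated text only re-enters the training corpus once enough human feedback marks it as reliable.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    # A hypothetical unit of first-iteration generated text with its votes.
    text: str
    upvotes: int = 0
    downvotes: int = 0

def accept_for_training(p: Passage, min_votes: int = 5, min_ratio: float = 0.8) -> bool:
    # Require enough total feedback AND a strong approval ratio, so that
    # unreviewed or disputed text never re-enters the training corpus.
    total = p.upvotes + p.downvotes
    if total < min_votes:
        return False
    return p.upvotes / total >= min_ratio

drafts = [
    Passage("well-sourced summary", upvotes=9, downvotes=1),
    Passage("plausible but hallucinated claim", upvotes=2, downvotes=8),
    Passage("brand-new, unreviewed text", upvotes=1, downvotes=0),
]
training_set = [p.text for p in drafts if accept_for_training(p)]
print(training_set)  # only the well-reviewed passage survives the gate
```

Source-checking and other signals could simply become additional conditions in the gate, alongside the vote threshold.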


Thanks for replying. I agree with your concern, and in addition to the risks you mentioned, I am also concerned about the diversity of the content.