Well, this is one reason why OpenAI requires human input in the generation workflow: the model can generate a lot of plausible-sounding but fake information.
You only control the prompt and the settings, so try playing with those. For example, a lower temperature and a sufficiently strict prompt help produce more consistent and predictable results.
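As a rough sketch of what "lower temperature and a strict prompt" looks like in practice, here is how one might build a chat-completion request payload. The model name and prompt wording are placeholders, not recommendations; the point is the `temperature` setting and the restrictive system message.

```python
# Sketch: biasing generation settings toward consistent, predictable output.
# Model name and prompts are illustrative placeholders.

def build_request(question: str) -> dict:
    """Build a chat-completion payload biased toward consistency."""
    return {
        "model": "gpt-4o-mini",  # assumption: any chat model works here
        "temperature": 0,        # low temperature -> less random sampling
        "messages": [
            {
                "role": "system",
                "content": (
                    "Answer only from well-established facts. "
                    "If you are not sure, say 'I don't know'. "
                    "Do not invent names, dates, or citations."
                ),
            },
            {"role": "user", "content": question},
        ],
    }

payload = build_request("When was Wikidata launched?")
```

The same payload shape works whether you send it through the official client library or a plain HTTP POST.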
I’m not an ML engineer, but I can imagine this issue could be mitigated by comparing the output against a set of facts from a fact-based knowledge base (a knowledge graph?) and classifying each claim as true or not:
1. Get the model's output
2. Extract the entities and the connections between them
3. Identify the claims the text makes (based on the connections between the entities)
4. Check each claim against the knowledge base and flag the unsupported ones
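The steps above can be sketched as a toy pipeline. Here the knowledge base is a hand-written set of (subject, relation, object) triples and the extraction step is a stub; in a real system that step would be an NER and relation-extraction model, not hard-coded patterns.

```python
# Toy sketch: check claims extracted from model output against a
# knowledge base of (subject, relation, object) triples.

KNOWLEDGE_BASE = {
    ("Paris", "capital_of", "France"),
    ("Wikidata", "operated_by", "Wikimedia Foundation"),
}

def extract_claims(text: str) -> list:
    # Stub for entity/relation extraction: recognizes only a couple of
    # hard-coded patterns, purely for illustration.
    claims = []
    if "Paris" in text and "France" in text:
        claims.append(("Paris", "capital_of", "France"))
    if "Paris" in text and "Germany" in text:
        claims.append(("Paris", "capital_of", "Germany"))
    return claims

def classify(claims):
    """Label each claim as supported (True) or unsupported (False)."""
    return {claim: claim in KNOWLEDGE_BASE for claim in claims}

verdicts = classify(extract_claims("Paris is the capital of France."))
```

Anything classified as unsupported isn't necessarily false, of course; it may just be missing from the knowledge base, which is the hard part of this approach.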
Wikidata - a structured knowledge base and API connected to Wikipedia and the other Wikimedia projects. The data isn't always verified, but in some cases it can still be useful as a first-pass check.
Wikidata Query Service: https://query.wikidata.org
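For reference, the query service speaks SPARQL. A minimal sketch, only constructing the request so the example stays offline; sending it with `requests.get(url, params=params)` would return JSON. Q142 is France's Wikidata item and P36 is the "capital" property.

```python
# Sketch: a SPARQL request against the Wikidata Query Service.
# The request is built but not sent, to keep the example self-contained.

url = "https://query.wikidata.org/sparql"

# Ask for the capital of France (item Q142, property P36 = "capital").
query = """
SELECT ?capitalLabel WHERE {
  wd:Q142 wdt:P36 ?capital .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

params = {"query": query, "format": "json"}
```

Note that the service asks clients to set a descriptive User-Agent header and rate-limit themselves, so any real fact-checking loop should batch and cache these lookups.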