Joint Retrieval and Generation Training for Grounded Text Generation

Recent advances in large-scale pre-training such as GPT-3 allow seemingly high-quality text to be generated from a given prompt. However, such generation systems often suffer from problems of hallucinated facts, and are not inherently designed to incorporate useful external information. Grounded generation models appear to offer remedies, but their training typically relies on rarely-available parallel data where corresponding documents are provided for context. We propose a framework that alleviates this data constraint by jointly training a grounded generator and document retriever on the language model signal. The model learns to retrieve the documents with the highest utility in generation and attentively combines them in the output. We demonstrate that by taking advantage of external references our approach can produce more informative and interesting text in both prose and dialogue generation. [Source]
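The joint training idea in the abstract can be sketched as a marginal likelihood over retrieved documents: the retriever's document probabilities weight the generator's likelihood of the target, so one language-model loss trains both components. The toy retriever and probabilities below are illustrative assumptions, not the paper's actual architecture.

```python
import math

def retrieval_scores(query, docs):
    # Toy retriever: score each document by word overlap with the query.
    q = set(query.lower().split())
    return [len(q & set(d.lower().split())) for d in docs]

def softmax(xs):
    # Turn raw scores into a probability distribution over documents.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def marginal_log_likelihood(p_target_given_doc, doc_probs):
    # Joint objective: log sum_d p(d | query) * p(target | query, d).
    # Gradients through both factors would update retriever and
    # generator together; here we only compute the scalar value.
    return math.log(sum(pd * pt for pd, pt in zip(doc_probs, p_target_given_doc)))

docs = ["grounded generation uses documents", "cats sleep a lot"]
doc_probs = softmax(retrieval_scores("grounded text generation", docs))
# Hypothetical generator likelihoods of the target under each document.
loss = -marginal_log_likelihood([0.9, 0.1], doc_probs)
```

Minimizing this loss pushes the retriever toward documents under which the generator assigns the target high probability, which is how the "language model signal" can supervise retrieval without parallel data.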


This is important and good news. Is there a linkage with OpenAI?

The /answers endpoint is our current approach to solving this problem.


Are any of the techniques used similar? I seem to recall reading that answers = uploaded data + crafted prompts.

Crafted prompts that are automatically constructed on the fly. More concretely, we use search to select relevant information from the uploaded documents and add it to the prompt. This provides a way to supply external knowledge to GPT-3. That prompt is then passed on to completions.
While people outside train a retriever and generator together for a specific application, with GPT-3 we have a general-purpose model that can work with novel context without requiring fine-tuning for that specific application. /answers basically exploits this aspect to make GPT-3 more accurate by providing relevant information in the prompt.
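The search-then-prompt flow described above can be sketched as follows. The lexical retriever and the prompt template are hypothetical stand-ins; the real /answers endpoint's search and formatting are internal details not shown here.

```python
def retrieve(query, documents, k=1):
    # Toy lexical search: rank documents by word overlap with the query.
    q = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, documents):
    # Prepend the retrieved context, then the question; the resulting
    # prompt would be sent to a completions-style API. The template
    # below is illustrative, not the actual /answers format.
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "The Eiffel Tower is in Paris.",
    "Mount Fuji is the tallest mountain in Japan.",
]
prompt = build_prompt("Where is the Eiffel Tower?", docs)
```

The key property is that no fine-tuning happens: the general-purpose model sees the retrieved text only through the prompt, so the same mechanism works for any uploaded document collection.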

I actually think ‘hallucinated facts’ is a bit unkind. I like GPT-3’s current ability to treat all equally probable wordings as equally real. It’s like a tool for exploring alternate universes.