Is RAG really only for factual recall?

I’ve built a daily comedy podcast where all content is generated by a mix of OpenAI and open-source models, with the final script as a JSON doc. It features multiple characters, each with their own prose style.

Currently this is done with few-shot prompting: passing a pseudo chat history with each request. That works pretty well but gets expensive. Since I now have a pile of generated scripts, I’m curious about setting up a round-robin whereby generated scripts are embedded and retrieved during subsequent runs.
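For what it’s worth, here’s a minimal sketch of that round-robin idea: past script lines get embedded and indexed per character, then pulled back as few-shot style exemplars. Everything here is hypothetical — `embed` is a toy stand-in for a real embedding call (e.g. an embeddings API), and the archive entries are made up.

```python
import math

def embed(text: str) -> list[float]:
    # Placeholder: a real system would call an embedding model here.
    # This toy version just hashes characters into a small normalized vector.
    vec = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch) / 1000.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Index of previously generated script lines, tagged by character.
archive = [
    {"character": "A", "line": "Deadpan observation about the weather."},
    {"character": "B", "line": "Manic pun-laden rant about toast."},
    {"character": "A", "line": "Dry aside about municipal politics."},
]
for entry in archive:
    entry["vec"] = embed(entry["line"])

def retrieve_style_examples(character: str, topic: str, k: int = 2) -> list[str]:
    # Filter to the target character first, then rank by similarity to the
    # topic, so retrieval returns exemplars that are both on-style and on-topic.
    qvec = embed(topic)
    candidates = [e for e in archive if e["character"] == character]
    candidates.sort(key=lambda e: cosine(qvec, e["vec"]), reverse=True)
    return [e["line"] for e in candidates[:k]]

examples = retrieve_style_examples("A", "weather report")
```

The retrieved lines would then be prepended to the prompt in place of the fixed pseudo chat history, so you only pay for the exemplars most relevant to each run.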

All the RAG stuff I’ve seen is about topical recall, rather than style reference. But could it work for my purposes? I know there’s also fine-tuning, but I’m not ready for that yet.


RAG for assistants is shrouded in a bit of mystery, but here’s what Bing Chat has to say on the topic.

> Retrieval Augmented Generation (RAG) uses embeddings to encode the input query and the retrieved documents. The encoded vectors are then used to generate a response. Embeddings allow the model to understand the semantic similarity between different pieces of text, which is crucial for effective information retrieval and response generation.

So essentially it doesn’t read the document the way it would a normal prompt; it’s more like using Ctrl+F to find relevant passages within a document.
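To make the “semantic Ctrl+F” concrete, retrieval boils down to comparing embedding vectors with cosine similarity. The vectors below are toy 3-dimensional stand-ins (real embedding models return hundreds or thousands of dimensions), but the ranking logic is the same.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # 1.0 means the vectors point in the same direction (semantically close);
    # values near 0 mean the texts are unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings.
query = [0.9, 0.1, 0.0]
doc_same_topic = [0.8, 0.2, 0.1]
doc_other_topic = [0.0, 0.1, 0.9]

# The on-topic document scores much higher, so it would be retrieved first.
assert cosine_similarity(query, doc_same_topic) > cosine_similarity(query, doc_other_topic)
```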

If you’re generating a significant number of responses, fine-tuning is the way to go for imitating a style. Otherwise, just refine your system prompt so the model writes in the style you want with the fewest tokens.

I think it is possible. Consider this scenario:

1. User: “tell a joke in the style of persona A.”
2. Call the API with function calling; the model returns `get_style { style: "persona A", prompt: "tell a joke" }`.
3. Run an embedding search to get persona A’s description, or past conversation history of them cracking jokes.
4. Append the result to the system prompt with an instruction to write in that style.
5. Call the API again with the updated system prompt (maybe without function calling), using either the original prompt or just “tell a joke”.
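The steps above could be wired together roughly like this. It’s a hedged sketch of the control flow only: `chat` is a stub standing in for a real chat-completions call, and `get_style` does a simple keyed lookup where a real system would run an embedding search.

```python
import json

# Pretend store of style material that an embedding search would return.
STYLE_STORE = {
    "persona A": "Persona A tells dry, deadpan one-liners with long pauses.",
}

def chat(system_prompt: str, user_prompt: str, tools=None) -> dict:
    # Placeholder for the real API call. First pass: simulate the model
    # choosing the get_style tool; second pass: simulate a styled reply.
    if tools:
        return {"tool_call": {
            "name": "get_style",
            "arguments": json.dumps({"style": "persona A",
                                     "prompt": "tell a joke"}),
        }}
    return {"content": f"[styled reply written under: {system_prompt[:50]}...]"}

def get_style(style: str) -> str:
    # In a real system: embed `style`, search the vector store, return top hits.
    return STYLE_STORE.get(style, "")

# Steps 1-2: first call; the model decides to fetch style material.
first = chat("You are a comedy writer.",
             "tell a joke in the style of persona A",
             tools=["get_style"])
args = json.loads(first["tool_call"]["arguments"])

# Steps 3-4: retrieve the style material and fold it into the system prompt.
style_text = get_style(args["style"])
system = f"You are a comedy writer. Imitate this voice:\n{style_text}"

# Step 5: second call with the updated system prompt, no tools this time.
final = chat(system, args["prompt"])
```

The key design point is the two-pass structure: the first call only decides *what* style material to fetch, and the second call does the actual writing with that material injected into the system prompt.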