I’ve built a daily comedy podcast where all content is generated by a mix of OAI and open-source models, with the final script output as a JSON doc. It features multiple characters, each with their own prose style.
Currently this is done with few-shot prompting: passing a pseudo chat history with each request. That works pretty well but gets expensive. Since I now have a pile of generated scripts, I’m curious about setting up a loop whereby generated scripts are embedded and retrieved during subsequent runs.
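For context, the few-shot setup described here might look roughly like this: past script excerpts replayed as a fake chat history so the model imitates the persona. This is a sketch assuming an OpenAI-style chat message format; the persona name, excerpts, and helper are made up for illustration.

```python
# Build a pseudo chat history from past script excerpts so the model
# sees "its own" prior answers in the target persona's voice.
def build_fewshot_messages(persona: str, examples: list[str], prompt: str) -> list[dict]:
    messages = [{"role": "system",
                 "content": f"You write dialogue in the voice of {persona}."}]
    for ex in examples:
        # Each exemplar is replayed as if the model already produced it.
        messages.append({"role": "user", "content": f"Write a line as {persona}."})
        messages.append({"role": "assistant", "content": ex})
    messages.append({"role": "user", "content": prompt})
    return messages

msgs = build_fewshot_messages(
    "persona A",
    ["Well, butter my circuits!", "I've seen toasters with better timing."],
    "Tell a joke about Mondays.",
)
```

The downside is exactly the one noted above: every one of those exemplar tokens is billed on every request.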
All the RAG material I’ve seen is about topical recall rather than style reference. Could it work for my purposes? I know there’s also fine-tuning, but I’m not ready for that yet.
So essentially RAG doesn’t read the document the way it reads a normal prompt; it’s more like using Ctrl+F to find information within a document.
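A toy illustration of that point: retrieval just ranks chunks by vector similarity to the query, it never "reads" the corpus as prose. The 3-d vectors below are made up stand-ins; a real system would get them from an embedding model.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Fake embedding index: chunk text -> vector (values invented for the demo).
chunks = {
    "persona A joke about cats":    [0.9, 0.1, 0.2],
    "persona B rant about weather": [0.1, 0.8, 0.3],
}
query = [0.85, 0.15, 0.25]  # pretend embedding of "persona A, tell a joke"

# Retrieval = nearest chunk by similarity, i.e. the "Ctrl+F" step.
best = max(chunks, key=lambda c: cosine(chunks[c], query))
```

Whether that helps for style depends on what you store: chunks of raw dialogue retrieved this way can still serve as style exemplars, even though the retrieval itself is topical.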
If you’re generating a significant number of responses, fine-tuning is the way to go for imitating a style. Otherwise, just refine your system prompt so the model writes in the style you want using the fewest tokens.
1. User: "tell a joke in style of persona A."
2. Call the API with a function `get_style { style: "persona A", prompt: "tell a joke" }`.
3. Run an embedding search to retrieve the "persona A" description, or past conversation history of that persona cracking jokes.
4. Append the result to the system prompt, with an instruction to imitate it.
5. Call the API with the updated system prompt (maybe without function calling), using either the original prompt or just "tell a joke".