How to know whether RAG data (json files, no db) was used?

I am adding some text examples to my API call. That is, I ask the LLM to create some short English texts, and I send a few example texts along with my prompt.
I want to see whether the LLM actually uses them.

With books etc. I would just ask the model to state the page it took the information from. However, in my case I don't ask the LLM to answer questions but to create texts, so it is very difficult for me to tell how much my example texts influence the LLM.
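
For reference, this is roughly how I build the request (a simplified sketch; the model name, example texts, and prompt wording are placeholders, not my real data):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder example texts; my real examples are longer.
examples = [
    "Example text one: a short English paragraph in the style I want.",
    "Example text two: another short sample.",
]

# The examples are simply appended to the instruction.
prompt = (
    "Write a short English text in the same style as these examples.\n\n"
    + "\n\n".join(f"EXAMPLE {i + 1}:\n{t}" for i, t in enumerate(examples))
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```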

LLMs are not capable of telling you specifically where they got their information from. However, if you put something in your context/query, you can be sure it was indeed processed, although a badly formatted context may not leverage the model as well as a better-formatted one would. Hope that makes sense. RAG is really all about figuring out what to put in your context.
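
For example (just a sketch with made-up delimiters and wording, not an official format), clearly labeling each example so the model can tell where one ends and the next begins tends to help:

```python
# Sketch only: the delimiters and instructions here are illustrative,
# just to show what "formatting your context" can look like.
examples = ["First sample text...", "Second sample text..."]

# Wrap each example in clear delimiters.
context = "\n\n".join(f"<example>\n{text}\n</example>" for text in examples)

prompt = (
    "Below are style examples wrapped in <example> tags. "
    "Write a new short English text that matches their tone and length.\n\n"
    + context
)
```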

Hi @eivu, the only option here is to add one more step: a second model that rates, say from 0 to 10, how closely the generated output relates to each of the provided examples. You would then have two outputs: the generated text and a ranking that identifies the most relevant example.
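
A minimal sketch of that extra step, assuming the OpenAI Python SDK; the judge model name, the 0-to-10 scale wording, and the prompt are illustrative:

```python
from openai import OpenAI

client = OpenAI()

def rate_similarity(generated: str, example: str) -> str:
    # Second "judge" call that rates how closely the generated text
    # follows one of the provided examples (0 = unrelated, 10 = very close).
    judge_prompt = (
        "On a scale from 0 to 10, how strongly does TEXT follow the style "
        "and content of EXAMPLE? Reply with the number only.\n\n"
        f"EXAMPLE:\n{example}\n\nTEXT:\n{generated}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{"role": "user", "content": judge_prompt}],
    )
    return response.choices[0].message.content.strip()

# scores = [rate_similarity(generated_text, ex) for ex in examples]
# The highest score points to the most relevant example.
```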

Thanks! Okay, understood, that is a pity.
Is there some logging I can enable that would tell me whether my context format is bad?

Thanks! Oh, clever, that makes sense. I'll try that!

All I really meant by "context format" was whether you have a sensible prompt that the LLM can handle well, because of course prompts are sometimes not phrased that clearly. There's no specific technical meaning beyond that.