Keeping the document source / link in a RAG & GPT workflow

Hi everyone,

Quick question here. I have a simple workflow with a text file that I use as a DB. I use RAG to retrieve information from that DB and GPT to refine the retrieved information.

My text DB is basically composed of multiple documents with different sources / origins.

Here is my challenge. When I retrieve a piece of text from my DB, I’d like to indicate its source (each piece comes from an original document on a certain website).

I don’t think that indicating those sources directly in my text DB would help (e.g. putting the source after each paragraph).

Currently my text DB is split into chunks that are then vectorized and stored in a Chroma DB. Instead of having one global DB that I split into chunks, should I use separate documents that I vectorize and to which I add metadata before putting them into Chroma? (So basically have a repo of separate documents that I load into Chroma instead of a single central document?)
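To make that second option concrete, here is a rough sketch of what I have in mind, in plain Python with no Chroma calls. Everything in it is an assumption for illustration: the `SourceDoc` class, the `chunk_text` and `build_records` helpers, the metadata keys (`source`, `title`, `chunk_index`), and the example URLs are all hypothetical names, and the paragraph-based chunking is just a placeholder for whatever splitter is actually used.

```python
from dataclasses import dataclass


@dataclass
class SourceDoc:
    """One original document plus where it came from (hypothetical structure)."""
    text: str
    source_url: str
    title: str


def chunk_text(text, max_chars=200):
    """Greedily pack paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_records(docs, max_chars=200):
    """Chunk each document separately so every chunk keeps its own source."""
    ids, texts, metadatas = [], [], []
    for doc_idx, doc in enumerate(docs):
        for chunk_idx, chunk in enumerate(chunk_text(doc.text, max_chars)):
            ids.append(f"doc{doc_idx}-chunk{chunk_idx}")
            texts.append(chunk)
            metadatas.append(
                {
                    "source": doc.source_url,  # assumed metadata key
                    "title": doc.title,        # assumed metadata key
                    "chunk_index": chunk_idx,
                }
            )
    return ids, texts, metadatas


# Made-up example documents
docs = [
    SourceDoc(
        "First paragraph about topic A.\n\nSecond paragraph about topic A.",
        "https://example.com/a",
        "Doc A",
    ),
    SourceDoc("Only paragraph about topic B.", "https://example.com/b", "Doc B"),
]

ids, texts, metadatas = build_records(docs, max_chars=40)
for chunk_id, meta in zip(ids, metadatas):
    print(chunk_id, meta["source"])
```

As far as I understand, these three parallel lists line up with the `ids`, `documents` and `metadatas` arguments of a Chroma collection's `add` method, and query results return the stored metadata next to each matching chunk, so the source could be passed on to GPT or shown to the user. Please correct me if that is not how it is meant to be used.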

What would be the best way to organize the DB so that each retrieved chunk can be traced back to its source? Are there any best practices for that?

Thanks a lot for your help!