Retrieval plugin: metadata and large documents

I have been using the plugin with several vector databases, with good results.
But I would like to share and discuss some ideas for improving results with large documents.

The plugin’s standard method of splitting documents into 200-token chunks can be useful, but how well it works depends on the structure of the documents. In some very long texts, a chunk can end up separated from the title and subtitles it belongs under, and that connection is lost.
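
One way to keep that connection is to carry each chunk's heading path along with it. The heading path can be prepended to the chunk text, so it is embedded and retrieved together with the chunk, and it can also be stored as metadata for filtering. This is a minimal sketch under stated assumptions. It assumes headings are marked with Markdown-style `#` prefixes. It also counts whitespace-separated words as a stand-in for tokens. `Chunk` and `chunk_with_headings` are hypothetical names and are not part of the plugin.

```python
import re
from dataclasses import dataclass, field

# Hypothetical chunk record; NOT the retrieval plugin's own model.
@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

# Assumption: headings look like Markdown "#", "##", ... lines.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def chunk_with_headings(doc: str, max_words: int = 200) -> list[Chunk]:
    """Split doc into chunks of at most max_words body words, tagging each
    chunk with its heading path. Assumption: words approximate tokens."""
    chunks: list[Chunk] = []
    path: list[str] = []    # current heading stack, e.g. ["Guide", "Install"]
    buffer: list[str] = []  # words of the chunk being built

    def flush() -> None:
        if buffer:
            title = " > ".join(path)
            body = " ".join(buffer)
            # Prefix the heading path so it is embedded along with the text.
            text = f"{title}\n{body}" if title else body
            chunks.append(Chunk(text=text, metadata={"section": title}))
            buffer.clear()

    for line in doc.splitlines():
        m = HEADING.match(line.strip())
        if m:
            flush()  # never let one chunk span two sections
            level = len(m.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(m.group(2).strip())
            continue
        for word in line.split():
            buffer.append(word)
            if len(buffer) >= max_words:
                flush()
    flush()
    return chunks
```

For example, a call such as `chunk_with_headings(text, max_words=200)` returns chunks whose text starts with a line like `Guide > Install`. Note that `max_words` bounds only the body, so the heading prefix adds a few words on top of it.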

Can adding metadata such as notes or titles improve the model’s responses?

In other words, does adding metadata give the model more context for building its response to the user, or is metadata only useful for filtering?

It depends on your implementation. You could certainly include metadata with the relevant chunks if you find that works better.
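
The hypothetical helper below shows one minimal way to include metadata with the relevant chunks. It renders each retrieved chunk as a text block, with selected metadata fields in a header line. The default fields (`title`, `section`, `notes`) are an assumption. The input shape, a list of dicts with `text` and an optional `metadata` dict, is also an assumption and is not necessarily what a given vector database returns.

```python
def render_results(
    results: list[dict],
    fields: tuple[str, ...] = ("title", "section", "notes"),  # assumed field names
) -> str:
    """Render retrieved chunks as numbered text blocks, with the selected
    metadata fields (when present) in a header line above each chunk."""
    blocks = []
    for i, r in enumerate(results, start=1):
        meta = r.get("metadata") or {}  # chunks without metadata still render
        header = [f"[{i}]"]
        for key in fields:
            if meta.get(key):
                header.append(f"{key}: {meta[key]}")
        blocks.append(" | ".join(header) + "\n" + r["text"])
    return "\n\n".join(blocks)
```

With a result like `{"text": "Set the API key.", "metadata": {"title": "Manual", "section": "Setup"}}`, the header line reads `[1] | title: Manual | section: Setup`. Only the listed fields are shown, which keeps noisy metadata out of the context.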

In my tests, the model appears to read the response JSON when generating its answers, and it also takes the metadata into account (if the metadata is well structured).
Perhaps what remains to explore is how much weight the model gives that metadata in its answer.
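
One way to probe that weight is a simple ablation. Ask the same question twice over the same retrieved chunks, once with metadata and once without, and compare the answers. The sketch below only builds the two prompts. The JSON layout, the prompt wording, and the helper names are all assumptions. Sending the prompts to a model is left out.

```python
import json

def build_payload(results: list[dict], include_metadata: bool) -> str:
    """Serialize retrieved chunks as JSON, optionally dropping metadata.
    Assumption: each result is a dict with "text" and optional "metadata"."""
    items = []
    for r in results:
        item = {"text": r["text"]}
        if include_metadata and r.get("metadata"):
            item["metadata"] = r["metadata"]
        items.append(item)
    return json.dumps({"results": items}, ensure_ascii=False, sort_keys=True)

def ablation_pair(question: str, results: list[dict]) -> dict[str, str]:
    """Return two prompts that differ only in whether metadata is present.
    The prompt wording is a hypothetical placeholder."""
    template = "Answer using only these search results:\n{payload}\n\nQuestion: {q}"
    return {
        variant: template.format(payload=build_payload(results, flag), q=question)
        for variant, flag in (("with_metadata", True), ("without_metadata", False))
    }
```

Keeping everything else identical (chunks, order, wording) means any difference between the two answers can be attributed to the metadata. Repeating the comparison over several questions gives a less noisy picture than a single pair.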