Document Sections: Better rendering of chunks for long documents

stevenic · September 5, 2023, 8:03pm

So in a lot of cases order doesn’t matter, but if you’re asking the model “what are the steps to do XYZ?”, then order can matter a lot.
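One simple way to respect order is to pick the most relevant chunks by score and then re-sort only those chunks by their position in the source document before rendering them into the prompt. The sketch below assumes each chunk carries a character offset (`start`) and a similarity `score`. The `Chunk` type and `render_in_document_order` helper are hypothetical names used for illustration, not part of any particular library.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    start: int    # character offset of the chunk in the source document (assumed available)
    text: str
    score: float  # similarity score from the vector search (hypothetical field)

def render_in_document_order(chunks: list[Chunk], top_k: int) -> str:
    """Keep the top_k most relevant chunks, then emit them in the order they
    appear in the source document instead of in relevance order."""
    best = sorted(chunks, key=lambda c: c.score, reverse=True)[:top_k]
    ordered = sorted(best, key=lambda c: c.start)
    return "\n\n".join(c.text for c in ordered)

chunks = [
    Chunk(start=200, text="Step 3: Deploy.", score=0.91),
    Chunk(start=0, text="Step 1: Install.", score=0.85),
    Chunk(start=100, text="Step 2: Configure.", score=0.40),
    Chunk(start=300, text="Appendix.", score=0.10),
]
print(render_in_document_order(chunks, top_k=3))
```

Relevance still decides *which* chunks make it in. Position only decides the order they are shown in, so "Step 3" no longer appears before "Step 1" just because it scored higher.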

One of the bigger issues is that even when things like steps are numbered, you may not get all of the steps back. The approach I’m taking is to generally ignore the text returned with the chunks and instead use the chunks to build a heatmap of sorts that identifies the most relevant parts of the document. The goal is then to retrieve full spans of text centered around those spots.
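The heatmap idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's actual implementation:

- Each matched chunk is assumed to carry character offsets (`start`, `end`) and a positive similarity `score`.
- Every chunk adds its score to each character it covers, so regions matched by several chunks get hotter.
- Spans are then chosen greedily: take a fixed-size window centered on the hottest position, zero that window out, and repeat.
- Overlapping spans are merged and returned in document order.

The names `Hit`, `build_heatmap`, `hot_spans`, `render_spans`, `window`, and `max_spans` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    start: int    # character offset where the matched chunk begins (assumed available)
    end: int      # exclusive end offset of the matched chunk
    score: float  # similarity score from the vector search (assumed > 0)

def build_heatmap(doc_len: int, hits: list[Hit]) -> list[float]:
    """Add each chunk's score to every character it covers, so regions matched
    by several overlapping chunks get 'hotter' than isolated matches."""
    heat = [0.0] * doc_len
    for h in hits:
        for i in range(max(0, h.start), min(doc_len, h.end)):
            heat[i] += h.score
    return heat

def hot_spans(heat: list[float], window: int, max_spans: int) -> list[tuple[int, int]]:
    """Greedily pick up to max_spans windows of `window` (>= 1) characters, each
    centered on the current hottest position, then merge overlaps and return
    the spans in document order."""
    heat = list(heat)  # work on a copy so already-covered regions can be zeroed
    spans = []
    for _ in range(max_spans):
        if not heat or max(heat) <= 0.0:
            break
        peak = max(range(len(heat)), key=heat.__getitem__)
        # Center the window on the peak, clamped to the document bounds.
        start = max(0, min(peak - window // 2, len(heat) - window))
        end = min(len(heat), start + window)
        spans.append((start, end))
        for i in range(start, end):
            heat[i] = 0.0
    merged: list[tuple[int, int]] = []
    for s, e in sorted(spans):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

def render_spans(doc: str, spans: list[tuple[int, int]]) -> str:
    """Return the full text of each span, in document order."""
    return "\n...\n".join(doc[s:e] for s, e in spans)

doc = "x" * 100
hits = [Hit(10, 20, 0.9), Hit(15, 25, 0.8), Hit(70, 80, 0.5)]
heat = build_heatmap(len(doc), hits)
print(hot_spans(heat, window=20, max_spans=3))  # [(5, 25), (60, 80)]
```

Because the spans come from the document itself rather than from the chunk text, a numbered list that straddles several chunks is more likely to come back whole. The window size and span count would need tuning against your own documents and token budget.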
