How do you handle text embedding ranking?

gedean.dias · June 23, 2023, 8:19am

I’m currently studying laws, and as part of this I need to understand and query references. To do so, I split each document into units called “sections” and store each law in a separate YAML file. I then use ChatGPT to summarize each section and generate a title for it. The “title”, the “summarized text”, and the “original text content” are each embedded, and the embeddings are stored in the YAML file.
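To make the layout concrete, here is a rough sketch of what one section record might look like before it is written out. The field names (`section_id`, `title`, `summary`, `text`, `embeddings`), the `SectionRecord` class, and the placeholder values are all assumptions for illustration, not the actual schema; the resulting mapping could then be handed to a YAML library (for example PyYAML's `yaml.safe_dump`).

```python
from dataclasses import dataclass, field, asdict


@dataclass
class SectionRecord:
    """Hypothetical layout for one section of a law (names are assumptions)."""
    section_id: str
    title: str      # title generated by the summarizer
    summary: str    # summarized text
    text: str       # original text content
    # One embedding vector per field, keyed by "title", "summary", "text".
    embeddings: dict[str, list[float]] = field(default_factory=dict)


record = SectionRecord(
    section_id="art-1",
    title="(generated title)",
    summary="(generated summary)",
    text="(original section text)",
)

# Placeholder vectors; in practice these would come from an embedding model.
for key in ("title", "summary", "text"):
    record.embeddings[key] = [0.0, 0.0, 0.0]

# Plain dict with one entry per law, ready to be serialized to a YAML file.
law_file = {"law": "example-law", "sections": [asdict(record)]}
```

Keeping the three vectors under one `embeddings` key keeps the readable text and the numeric data apart, which can make the YAML easier to scan by hand.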

Whenever a prompt is entered (by me, for instance), the prompt text is embedded and its similarity is computed against the “title”, “summarized text”, and “text itself” embeddings of each section. I then take the mean of those three similarities, which gives me a single average similarity score for each section.
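The scoring step described above can be sketched roughly as follows. This assumes cosine similarity as the similarity measure and embeddings that are already available as plain lists of floats; the names `Section`, `cosine`, `mean_similarity`, and `rank_sections` are hypothetical, and the two-dimensional vectors are made-up toy data rather than real model output.

```python
import math
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical record: one law section with its three stored embeddings."""
    name: str
    title_vec: list[float]
    summary_vec: list[float]
    text_vec: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_similarity(query: list[float], section: Section) -> float:
    """Unweighted mean of the query's similarity to title, summary, and text."""
    vectors = (section.title_vec, section.summary_vec, section.text_vec)
    scores = [cosine(query, v) for v in vectors]
    return sum(scores) / len(scores)


def rank_sections(query: list[float], sections: list[Section], top_k: int = 3):
    """Return (score, name) pairs for the top_k sections, best first."""
    scored = [(mean_similarity(query, s), s.name) for s in sections]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data: the query points along the first axis.
sections = [
    Section("art-1", [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]),
    Section("art-2", [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]),
    Section("art-3", [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]),
]
query = [1.0, 0.0]
ranked = rank_sections(query, sections)
for score, name in ranked:
    print(f"{name}: {score:.3f}")
```

Note that an unweighted mean treats the three fields as equally informative: in the toy data, "art-3" matches on title and text but not on summary, so it lands between the other two sections.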

I then select text chunks according to their similarity ranking. Could you suggest any improvements to this process?

