Searching 300 embeddings to create top 10 list

chinmay1 · December 27, 2022, 12:35am

Here is an example: I have 300 embeddings on a topic (let's say Chemistry). I want to create a summary of the top 10 points covered by those 300 embeddings. How do I do that?

I have one solution in mind (below), but I'm looking for better ones.

I create a 1-2 sentence summary of the text behind each of the 300 embeddings and pass all of those summaries at once as context, with a prompt asking for a 10-point summary. The assumption is that 300 summaries of 1-2 lines each would fit within the token limit (4000 tokens?).
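To make the token assumption concrete, here is a minimal sketch of packing the short summaries into one prompt while staying under a budget. Everything in it is illustrative: the function names, the prompt wording, the 500 tokens reserved for the answer, and the "about 4 characters per token" estimate are all assumptions, and a real tokenizer would give more accurate counts. Note that 4000 tokens spread over 300 summaries leaves only about 13 tokens each, so full 1-2 sentence summaries may not all fit.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (an assumption): about 4 characters per token for English text.
    return max(1, len(text) // 4)


def build_summary_prompt(summaries, token_limit=4000, reserved_for_answer=500):
    """Pack as many summaries as fit into one prompt.

    Returns the prompt text and the number of summaries that were included.
    """
    instruction = (
        "Using only the notes below, write the 10 most important points "
        "as a numbered list.\n\nNotes:\n"
    )
    # Leave room for the instruction itself and for the model's answer.
    budget = token_limit - reserved_for_answer - estimate_tokens(instruction)

    lines = []
    used = 0
    for i, summary in enumerate(summaries, start=1):
        line = f"{i}. {summary.strip()}\n"
        cost = estimate_tokens(line)
        if used + cost > budget:
            break  # stop before exceeding the budget
        lines.append(line)
        used += cost

    return instruction + "".join(lines), len(lines)
```

If the returned count is lower than 300, the summaries did not all fit, and they would need to be shortened or summarized in batches first.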

Any other thoughts?

raymonddavey · December 27, 2022, 1:08am

If the summary of any individual embedding doesn’t change on a case-by-case basis, you could keep a one- or two-sentence summary of each embedding alongside the full text you are using for the search.

Then, when you find the top 10 records based on the semantic search of the big text field, you can output your pre-saved summaries from the small text field.

You could get GPT to summarize each embedding as a one-off exercise and save the summaries locally - or you could get a human to write them in advance.

This will only work if the summary for any given embedding doesn’t need to change based on the question.
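As a rough sketch of that lookup, assuming each record is a dict with hypothetical `embedding` and `summary` keys (the full text would sit next to them), and that the question has already been embedded with the same model:

```python
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 0.0 if either has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_summaries(query_embedding, records, k=10):
    """Rank records by similarity to the query and return their pre-saved summaries."""
    ranked = sorted(
        records,
        key=lambda r: cosine_similarity(query_embedding, r["embedding"]),
        reverse=True,
    )
    return [r["summary"] for r in ranked[:k]]


# Toy 2-D vectors stand in for real embeddings here.
records = [
    {"summary": "A", "embedding": [1.0, 0.0]},
    {"summary": "B", "embedding": [0.0, 1.0]},
    {"summary": "C", "embedding": [0.7, 0.7]},
]
closest = top_k_summaries([1.0, 0.1], records, k=2)
```

With only 300 records, a plain sort like this is likely fast enough; no vector database is needed for the search itself.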

If a single embedding covers a large block of text that can answer lots of different questions (so the summary changes with the question), then I think your idea might be the best option.

Again, the pre-saved approach only works if the summary of a given embedding doesn’t change.
