Generate embedding for a collection using individual embeddings

Hi,
Problem statement:
I have a collection of chats per customer, and I already have an individual OpenAI embedding for each chat in the collection. I need to generate a single embedding for the whole collection, using the individual embeddings as input.
Please share any thoughts you have on how to do this.

Besides just embedding the whole collection as one text (Ada-002 has a massive 8k-token context), you could average the embedding vectors and renormalize the result by dividing it by its own length (its L2 norm), so you get a unit vector back.
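
Here is a minimal sketch of the average-and-renormalize idea, assuming NumPy and that each chat embedding is a plain list of floats with the same dimension. The function name `pool_embeddings` is made up for illustration.

```python
import numpy as np

def pool_embeddings(embeddings):
    """Average equal-length embedding vectors and rescale to unit length."""
    matrix = np.asarray(embeddings, dtype=np.float64)  # shape (n_chats, dim)
    mean = matrix.mean(axis=0)                         # element-wise average
    norm = np.linalg.norm(mean)                        # L2 length of the average
    if norm == 0.0:
        raise ValueError("mean embedding is the zero vector; cannot normalize")
    return mean / norm

# Toy 2-dimensional vectors standing in for real chat embeddings
collection_vec = pool_embeddings([[1.0, 0.0], [0.0, 1.0]])
```

A plain mean weights every chat equally. If longer chats should count for more, `np.average` with a `weights` argument (token counts, say) is a drop-in change before the normalization step.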

The lossless version is to create a matrix of all the embeddings for the conversation, one row per chat.
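
A rough sketch of the matrix version, again assuming NumPy. How you then use the matrix is up to you; the `best_match` helper below (a hypothetical name) is just one possibility, scoring a query vector against every row by cosine similarity and returning the closest chat.

```python
import numpy as np

def build_matrix(embeddings):
    """Stack per-chat embeddings into one (n_chats, dim) matrix."""
    return np.vstack([np.asarray(e, dtype=np.float64) for e in embeddings])

def best_match(matrix, query):
    """Return (row index, cosine similarity) of the row closest to query."""
    q = np.asarray(query, dtype=np.float64)
    # Cosine similarity of the query against every row at once
    sims = (matrix @ q) / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q))
    best = int(np.argmax(sims))
    return best, float(sims[best])

# Toy 2-dimensional vectors standing in for real chat embeddings
chat_matrix = build_matrix([[1.0, 0.0], [0.0, 1.0]])
row, score = best_match(chat_matrix, [0.1, 0.9])
```

Nothing is averaged away here, but storage and comparison cost grow with the number of chats in the collection.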

The fun animated version is to take the embeddings along with their timestamps, resample them to a constant, smaller time interval using Slerp (spherical linear interpolation), and project the resulting path to 2D (using PCA?) for human visualization. Maybe one dot per second in the animation.

I think that would look super cool, especially with the right dataset.
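
A sketch of how the resampling and projection could go, assuming NumPy, at least two chats with numeric timestamps in seconds, and no pair of consecutive embeddings pointing in exactly opposite directions (Slerp is undefined there). The names `slerp`, `resample_path` and `project_2d` are made up for illustration, and the PCA is done directly with an SVD rather than through a library.

```python
import numpy as np

def slerp(a, b, t):
    """Spherical linear interpolation between unit vectors a and b, t in [0, 1]."""
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))  # angle between a and b
    if omega < 1e-8:
        out = (1.0 - t) * a + t * b  # nearly identical: plain lerp is fine
    else:
        out = (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
    return out / np.linalg.norm(out)

def resample_path(timestamps, embeddings, step=1.0):
    """Resample timestamped embeddings onto a regular time grid via Slerp."""
    ts = np.asarray(timestamps, dtype=np.float64)
    order = np.argsort(ts)
    ts = ts[order]
    vecs = np.asarray(embeddings, dtype=np.float64)[order]
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit vectors
    grid = np.arange(ts[0], ts[-1] + step / 2, step)
    # Index of the segment [ts[i], ts[i+1]] that each grid time falls into
    idx = np.clip(np.searchsorted(ts, grid, side="right") - 1, 0, len(ts) - 2)
    frames = []
    for t, i in zip(grid, idx):
        span = ts[i + 1] - ts[i]
        frac = 0.0 if span == 0 else min((t - ts[i]) / span, 1.0)
        frames.append(slerp(vecs[i], vecs[i + 1], frac))
    return grid, np.array(frames)

def project_2d(frames):
    """Project the path onto its first two principal components (PCA via SVD)."""
    centered = frames - frames.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Toy example: two 3-dimensional "embeddings" two seconds apart, one frame per second
grid, frames = resample_path([0.0, 2.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], step=1.0)
xy = project_2d(frames)  # one (x, y) dot per frame
```

Each row of `xy` is one dot in the animation. Drawing them in order with any plotting library gives the path over time.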
