Clustering Vectors from VectorDB

When we upload a large file to ChatGPT, it gets chunked and indexed, and the model receives a few snippets from the document that should give it a good overview of the contents.

I believe these snippets are chosen by running a clustering method on the chunk vectors and then taking, for each cluster, the snippet closest to that cluster's centroid.

In this way the model gets an overview of what the document is about.
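I don't know what ChatGPT actually does internally, but the idea can be sketched in a few lines. This minimal sketch assumes the chunk embeddings are already computed (for example with the embeddings endpoint) and uses plain k-means as the clustering method. Because a centroid is an average vector rather than a piece of text, it returns the chunk nearest to each centroid. `kmeans` and `representative_chunks` are hypothetical helpers, not part of any OpenAI API.

```python
import numpy as np


def _assign(vectors: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    # Squared Euclidean distance from every vector to every centroid.
    dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(vectors: np.ndarray, k: int, iters: int = 50, seed: int = 0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct chunk vectors picked at random.
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = _assign(vectors, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = vectors[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Recompute labels so they match the returned centroids.
    return centroids, _assign(vectors, centroids)


def representative_chunks(chunks, vectors, k, seed=0):
    """Return one chunk per cluster: the chunk nearest its cluster centroid."""
    vectors = np.asarray(vectors, dtype=float)
    # Normalise to unit length so Euclidean distance ranks like cosine similarity.
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    centroids, labels = kmeans(vectors, k, seed=seed)
    picks = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue
        d = ((vectors[idx] - centroids[j]) ** 2).sum(axis=1)
        picks.append(int(idx[d.argmin()]))
    # Return snippets in document order.
    return [chunks[i] for i in sorted(picks)]
```

Here `k` is simply the number of snippets you want to hand to the model, so it can be chosen from the context budget rather than tuned for cluster quality.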

It would be cool if there were a way to do this through the API.

We can do this manually today using the text embeddings API, but things get complicated if we also need to embed images.