Hi there, from what I've learned, OpenAI embedding models use average pooling to produce sentence embeddings. Is there any way to get the token-level embeddings so we can test different pooling methods ourselves? Thank you.
[To answer your direct question: you can use tiktoken to split the text into individual tokens and send each token to the API on its own.]
I’ve been trying to get that information.
Where did you learn that they use average pooling? My quick test points towards a slightly weighted average rather than a plain mean.
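In case it helps, here is a rough sketch of how a pooling comparison like this could be run offline once you have one vector per token plus the vector for the full sentence. A few assumptions to flag: the API returns one vector per input, so the "token vectors" here would come from embedding each token in isolation, which is not the same as the model's contextual token-level states. The helper names (`mean_pool`, `weighted_pool`, `max_pool`, `compare_pooling`) are my own, and the numbers at the bottom are toy values, not real embeddings.

```python
import math


def mean_pool(vectors):
    """Plain (unweighted) average of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def weighted_pool(vectors, weights):
    """Weighted average; weights are normalised by their sum."""
    total = sum(weights)
    return [
        sum(w * x for w, x in zip(weights, col)) / total
        for col in zip(*vectors)
    ]


def max_pool(vectors):
    """Element-wise maximum across vectors."""
    return [max(col) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compare_pooling(token_vectors, sentence_vector, weights=None):
    """Score each pooling method by cosine similarity to the sentence vector."""
    scores = {
        "mean": cosine(mean_pool(token_vectors), sentence_vector),
        "max": cosine(max_pool(token_vectors), sentence_vector),
    }
    if weights is not None:
        scores["weighted"] = cosine(
            weighted_pool(token_vectors, weights), sentence_vector
        )
    return scores


if __name__ == "__main__":
    # Toy stand-ins: replace with vectors returned by the embeddings API.
    toy_token_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    toy_sentence_vector = [0.9, 0.6]
    toy_weights = [2.0, 1.0, 1.0]  # hypothetical, e.g. position-based weights
    print(compare_pooling(toy_token_vectors, toy_sentence_vector, toy_weights))
```

Whichever method scores closest to 1.0 against the real sentence embedding is the better guess, though a high score from isolated-token vectors would still only be suggestive, since the model sees tokens in context.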