What tokenizer does OpenAI's text-embedding-3-large use?

It uses `cl100k_base`. From the embeddings guide:

https://platform.openai.com/docs/guides/embeddings/how-can-i-tell-how-many-tokens-a-string-has-before-i-embed-it
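If you want to count tokens locally before embedding, here is a minimal sketch using the `tiktoken` package (third-party, `pip install tiktoken`). The helper names `count_tokens` and `fits_limit` are my own for illustration, and the 8191-token default is an assumption about the per-input limit, so check the model page for the current value.

```python
def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Return the number of tokens in `text` under the given encoding."""
    # Imported lazily so fits_limit() below is usable without tiktoken installed.
    import tiktoken

    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))


def fits_limit(num_tokens: int, max_tokens: int = 8191) -> bool:
    """Check a token count against an input limit.

    The 8191 default is an assumed limit, not taken from this thread.
    """
    return num_tokens <= max_tokens
```

Usage would be something like `fits_limit(count_tokens(my_text))`. Note that the first call to `tiktoken.get_encoding` may need to download the encoding file.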

Performance thread: It looks like 'text-embedding-3' embeddings are truncated/scaled versions from higher dim version

TLDR: :thinking:

I don’t know if there’s a TLDR yet, it’s complicated. They’re certainly different. I do recommend you check out the thread! :laughing:

edit: another eval thread: New OpenAI Announcement! Updated API Models and no more lazy outputs
