What is the tokenizer used for openai text-embedding-3-large?

The tokenizer used for text-embedding-ada-002 was cl100k_base. What is the tokenizer used for the new OpenAI embedding model, text-embedding-3-large?

Also, does anyone have any feedback on its performance so far?

Performance thread: It looks like 'text-embedding-3' embeddings are truncated/scaled versions from higher dim version

TLDR: :thinking:

I don’t know if there’s a TLDR yet; it’s complicated. The new embeddings are certainly different from ada-002’s. I do recommend you check out the thread! :laughing:
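If the hypothesis in that thread holds, a shorter text-embedding-3 vector is just the first k dimensions of the full embedding, L2-renormalized. A minimal sketch in plain Python (the vector below is a toy example, not a real embedding):

```python
import math

def truncate_embedding(vec, k):
    """Keep the first k dimensions and L2-renormalize, mirroring how the
    'dimensions' parameter is believed to work for text-embedding-3 models."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.6, 0.8, 0.0, 0.0]          # toy unit vector for illustration
short = truncate_embedding(full, 2)  # truncated to 2 dims, unit length again
print(short)
```

Cosine similarities computed on the truncated vectors then approximate those of the full-dimension embeddings, which is consistent with the scaling behavior people observed in that thread.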

edit: another eval thread: New OpenAI Announcement! Updated API Models and no more lazy outputs