The tokenizer used for text-embedding-ada-002 was cl100k_base. What tokenizer does the new OpenAI embedding model text-embedding-3-large use?
Also, does anyone have any feedback on its performance so far?
It's cl100k_base, the same tokenizer used by text-embedding-ada-002.
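You can confirm this yourself with tiktoken, assuming a release recent enough that its model map knows about the text-embedding-3 models:

```python
import tiktoken

# Look up the encoding registered for the model (requires a tiktoken
# version whose model table includes the text-embedding-3 family).
enc = tiktoken.encoding_for_model("text-embedding-3-large")
print(enc.name)  # cl100k_base

# Equivalent: load the encoding directly by name.
enc = tiktoken.get_encoding("cl100k_base")
print(enc.encode("hello world"))  # token ids, e.g. [15339, 1917]
```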
Performance thread: It looks like 'text-embedding-3' embeddings are truncated/scaled versions from higher dim version
TLDR:
I don't know if there's a TLDR yet; it's complicated. The new embeddings are certainly different from ada-002's. I do recommend checking out the thread!
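If you want to poke at the truncation claim yourself, here's a minimal sketch (assuming the openai v1 Python client and an OPENAI_API_KEY in your environment). It fetches the full 3072-dim embedding, manually truncates and L2-renormalizes it to 256 dims, and compares that against what the API's native `dimensions` parameter returns:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
text = "The quick brown fox jumps over the lazy dog."

# Full-size embedding (3072 dims for text-embedding-3-large).
full = client.embeddings.create(model="text-embedding-3-large", input=text)
full_vec = np.array(full.data[0].embedding)

# Native shortened embedding via the `dimensions` parameter.
short = client.embeddings.create(
    model="text-embedding-3-large", input=text, dimensions=256
)
short_vec = np.array(short.data[0].embedding)

# Manually truncate the full vector and L2-renormalize it.
manual = full_vec[:256]
manual /= np.linalg.norm(manual)

# Cosine similarity should be ~1.0 if the shortened embedding really
# is just a truncated + renormalized slice of the full one.
print(float(manual @ short_vec))
```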
edit: another eval thread: New OpenAI Announcement! Updated API Models and no more lazy outputs