Could not automatically map text-embedding-3-small to a tokeniser

pknerd · February 13, 2024, 8:53pm

Calling encoding = tiktoken.encoding_for_model("text-embedding-3-small") gives this error:

KeyError: 'Could not automatically map text-embedding-3-small to a tokeniser. Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect.'

This is quite strange because, per the docs, it is a supported model.

_j · February 13, 2024, 9:02pm

A new version of tiktoken was released a few days ago.

Try running pip install --upgrade tiktoken in your Python environment. The current model-to-encoding mapping includes the new embedding models:

MODEL_TO_ENCODING: dict[str, str] = {
    # chat
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "gpt-3.5": "cl100k_base",  # Common shorthand
    "gpt-35-turbo": "cl100k_base",  # Azure deployment name
    # base
    "davinci-002": "cl100k_base",
    "babbage-002": "cl100k_base",
    # embeddings
    "text-embedding-ada-002": "cl100k_base",
    "text-embedding-3-small": "cl100k_base",
    "text-embedding-3-large": "cl100k_base",
    # DEPRECATED MODELS
    # text (DEPRECATED)
    "text-davinci-003": "p50k_base",
    "text-davinci-002": "p50k_base",
    "text-davinci-001": "r50k_base",
    "text-curie-001": "r50k_base",
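
If upgrading is not an option, the error message itself points at a workaround: ask for the encoding by name. A minimal sketch, assuming the cl100k_base entry shown in the mapping above; the helper name encoding_for is made up for illustration and is not part of tiktoken:

```python
def encoding_for(model: str, fallback: str = "cl100k_base"):
    """Return tiktoken's encoding for `model`, or the `fallback` encoding by name."""
    # Local import keeps this helper definable even where tiktoken is not installed yet.
    import tiktoken

    try:
        # Works when the installed tiktoken has `model` in its mapping.
        return tiktoken.encoding_for_model(model)
    except KeyError:
        # Older releases raise KeyError for models they do not know about,
        # so request the encoding explicitly instead.
        return tiktoken.get_encoding(fallback)


# Example use (needs tiktoken installed):
#   enc = encoding_for("text-embedding-3-small")
#   token_count = len(enc.encode("hello world"))
```

The fallback is only right for models that really use cl100k_base, so it is safer to pass the fallback explicitly for anything outside the mapping above.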

pknerd · February 14, 2024, 6:18am

Thanks. I just realized that the new version of the library is not picked up in the notebook unless I restart it.
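
One way to spot this situation from inside a notebook is to compare what pip has installed on disk with what the running session has already imported. This is a rough sketch using only the standard library; the function names are hypothetical, and it assumes the pip package name and the import name are the same, as they are for tiktoken:

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Version of `package` recorded on disk, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def session_status(package: str) -> str:
    """Describe what the running interpreter currently sees for `package`."""
    on_disk = installed_version(package)
    if on_disk is None:
        return f"{package} is not installed in this environment"
    if package in sys.modules:
        # The module was imported earlier in this session, so an upgrade done
        # since then stays invisible until the interpreter (kernel) restarts.
        return f"{package} {on_disk} is on disk, but the module is already imported"
    return f"{package} {on_disk} is on disk and not yet imported"
```

Calling session_status("tiktoken") after an upgrade and seeing "already imported" is the cue to restart the notebook before retrying.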
