Could not automatically map text-embedding-3-small to a tokeniser

encoding = tiktoken.encoding_for_model("text-embedding-3-small") gives error:

KeyError: 'Could not automatically map text-embedding-3-small to a tokeniser. Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect.'

This is quite strange, because the docs list it as a supported model.

A new version of tiktoken was released a few days ago.

Try running `pip install --upgrade tiktoken` in your Python environment.

MODEL_TO_ENCODING: dict[str, str] = {
    # chat
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "gpt-3.5": "cl100k_base",  # Common shorthand
    "gpt-35-turbo": "cl100k_base",  # Azure deployment name
    # base
    "davinci-002": "cl100k_base",
    "babbage-002": "cl100k_base",
    # embeddings
    "text-embedding-ada-002": "cl100k_base",
    "text-embedding-3-small": "cl100k_base",
    "text-embedding-3-large": "cl100k_base",
    # DEPRECATED MODELS
    # text (DEPRECATED)
    "text-davinci-003": "p50k_base",
    "text-davinci-002": "p50k_base",
    "text-davinci-001": "r50k_base",
    "text-curie-001": "r50k_base",

Thanks. I just realized that the new version of the library does not take effect in a notebook unless I restart the kernel.
