pknerd
February 13, 2024, 8:53pm
1
encoding = tiktoken.encoding_for_model("text-embedding-3-small")
raises this error:
KeyError: 'Could not automatically map text-embedding-3-small to a tokeniser. Please use `tiktoken.get_encoding` to explicitly get the tokeniser you expect.'
This is quite strange, because according to the docs it is a supported model.
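The error message itself points at `tiktoken.get_encoding` as a workaround. Here is a minimal sketch of that fallback, assuming `cl100k_base` is the right encoding for this model. The helper name `get_embedding_encoding` and its `fallback` parameter are hypothetical, not part of tiktoken.

```python
def get_embedding_encoding(model: str, fallback: str = "cl100k_base"):
    """Return a tiktoken encoding for `model`, or an explicitly named one.

    `fallback` is an assumption: it must name the encoding the model really uses.
    """
    # Imported lazily so the helper can be defined even if tiktoken is missing.
    import tiktoken

    try:
        # Works when the installed tiktoken knows the model name.
        return tiktoken.encoding_for_model(model)
    except KeyError:
        # Older releases cannot map newer model names; ask for the encoding by name.
        return tiktoken.get_encoding(fallback)
```

Calling `get_embedding_encoding("text-embedding-3-small")` should then return an encoding object either way, but the fallback only produces correct token counts if the guessed encoding name matches the model.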
_j
February 13, 2024, 9:02pm
2
A new version of tiktoken was released a few days ago.
Try running pip install --upgrade tiktoken in your Python environment.
The updated model-to-encoding mapping includes the new embedding models:
MODEL_TO_ENCODING: dict[str, str] = {
    # chat
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "gpt-3.5": "cl100k_base",  # Common shorthand
    "gpt-35-turbo": "cl100k_base",  # Azure deployment name
    # base
    "davinci-002": "cl100k_base",
    "babbage-002": "cl100k_base",
    # embeddings
    "text-embedding-ada-002": "cl100k_base",
    "text-embedding-3-small": "cl100k_base",
    "text-embedding-3-large": "cl100k_base",
    # DEPRECATED MODELS
    # text (DEPRECATED)
    "text-davinci-003": "p50k_base",
    "text-davinci-002": "p50k_base",
    "text-davinci-001": "r50k_base",
    "text-curie-001": "r50k_base",
    # ... (excerpt truncated)
}
pknerd
February 14, 2024, 6:18am
3
Thanks. I just realized that the new version of the library is not picked up in the notebook until I restart the kernel.
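For anyone hitting the same thing, here is a rough way to check whether a notebook is still running a stale copy. It compares the version recorded in the installed package metadata with what the already-imported module knows. The function name `tiktoken_status` is hypothetical; the sketch assumes the loaded `tiktoken.model` module exposes the `MODEL_TO_ENCODING` dict quoted earlier in the thread.

```python
import sys
from importlib import metadata


def tiktoken_status(model: str = "text-embedding-3-small") -> dict:
    """Report the installed tiktoken version and what the loaded module knows."""
    try:
        # Read from the installed package metadata, not from the loaded module.
        installed = metadata.version("tiktoken")
    except metadata.PackageNotFoundError:
        installed = None

    # Present only if tiktoken has already been imported in this process.
    loaded = sys.modules.get("tiktoken.model")
    if loaded is None:
        knows_model = None
    else:
        knows_model = model in getattr(loaded, "MODEL_TO_ENCODING", {})

    return {
        "installed_version": installed,
        "imported": loaded is not None,
        "loaded_knows_model": knows_model,
    }


print(tiktoken_status())
```

If `installed_version` shows the upgraded release but `loaded_knows_model` is `False`, the running kernel is most likely still holding the old module in memory, and restarting it should fix the `KeyError`.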