I can see that I can select up to the GPT-4o and 4o mini models. Does anyone know whether this still applies to the latest models, or whether there is a newer page that lists the current models?
Thanks for taking the time to report this. It seems that, for now, it only applies to the models listed there plus any models that use the o200k_base encoding, which is what gpt-4o uses.
I will keep this post updated in case of any developments.
Every chat model from gpt-4o-2024-05-13 through the latest has used o200k_base. So “gpt-4o” is the option to pick on OpenAI’s page, unless you are using a model on the older cl100k_base encoding (gpt-4-turbo-2024-04-09 and earlier).
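# `model` is the model name string, e.g. "gpt-4o" or "ft:gpt-3.5-turbo-0125:...".
# Older models get cl100k_base; everything else falls through to o200k_base.
token_encoder = (
    "cl100k_base"
    if (
        model == "gpt-4"
        or model.removeprefix("ft:").startswith(
            ("gpt-3", "gpt-4-turbo", "gpt-4-")
        )
    )
    else "o200k_base"
)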
“text-embedding…” models use cl100k_base for measuring how much you can send, at least until a newer embedding model is released.
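For anyone who wants the chat rule and the embedding note in one place, here is a minimal sketch. The function names `encoding_name_for_model` and `count_tokens` are my own, not part of any library, and the fallback to o200k_base for unrecognised names is an assumption that may not hold for future models. It needs Python 3.9+ for `str.removeprefix`, and the counting helper needs the third-party `tiktoken` package.

```python
def encoding_name_for_model(model: str) -> str:
    """Guess the tiktoken encoding name for an OpenAI model name.

    Assumption: any name not recognised as an older model uses o200k_base.
    """
    # Fine-tuned model names carry an "ft:" prefix before the base model name.
    base = model.removeprefix("ft:")
    # Embedding models are measured with cl100k_base.
    if base.startswith("text-embedding"):
        return "cl100k_base"
    # gpt-3.5 and gpt-4 / gpt-4-turbo predate o200k_base.
    # "gpt-4-" does not match "gpt-4o", so gpt-4o falls through.
    if base == "gpt-4" or base.startswith(("gpt-3", "gpt-4-")):
        return "cl100k_base"
    return "o200k_base"


def count_tokens(text: str, model: str) -> int:
    """Count the tokens in `text` using the encoding guessed for `model`."""
    # Imported here so the name lookup above works without tiktoken installed.
    import tiktoken

    encoding = tiktoken.get_encoding(encoding_name_for_model(model))
    return len(encoding.encode(text))
```

Calling `count_tokens("some text", "gpt-4o")` would then count with o200k_base. tiktoken also ships its own `encoding_for_model` lookup, but as far as I know it only recognises models that existed when your installed version was released, which is why a fallback rule like the one above can be handy.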
The alternate site I linked makes the token encoder clear by name and also shows token numbers (although the numbers it shows for special tokens used internally are supposedly wrong).