Which encoder does the GPT-4-Turbo family use for tokenization? Specifically, I am interested in the correct encoder for gpt-4-1106-preview. It appears to be cl100k_base, the same as for GPT-4, but I can't find it documented anywhere.
Help would be greatly appreciated.
Hi,
It uses cl100k_base as the encoder.
Thanks Foxabilo,
do you have a source by any chance?
