Is it possible to add new tokens before the embedding for Gen 2 Models?

Hello, my friends. I need to do some classification tasks between words, and for this I have to tell the embedding which words are special tokens. For example:

“This is an example of [ESPECIAL_1] and [ESPECIAL_2] tokens to be tokenized.”

I saw that GPT-2 on Hugging Face gives me the possibility of adding new tokens. But for the second-generation models that use the “cl100k_base” encoding, I haven’t found anything about it, not even in OpenAI’s “tiktoken” library.
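For reference, this is roughly what I mean on the Hugging Face side. It is just a minimal sketch using the `transformers` library; the model name and the special-token names are placeholders taken from my example above:

```python
from transformers import GPT2Tokenizer, GPT2Model

# Load the GPT-2 tokenizer and model from Hugging Face
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

# Register the new special tokens so the tokenizer never splits them
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["[ESPECIAL_1]", "[ESPECIAL_2]"]}
)

# Grow the embedding matrix so the new token ids get their own vectors
model.resize_token_embeddings(len(tokenizer))

ids = tokenizer(
    "This is an example of [ESPECIAL_1] and [ESPECIAL_2] tokens to be tokenized."
)["input_ids"]
print(ids)  # each [ESPECIAL_*] maps to a single new token id
```

This is what I would like to be able to do with the newer models that use “cl100k_base”.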

Does anyone know if this is possible? Or is it not available in the API at the moment?
