Is it possible to add new tokens before the embedding for Gen 2 Models?

Hello, my friends. I need to do some classification tasks between words, and for this I have to tell the embedding which words are special tokens. For example:

“This is an example of [ESPECIAL_1] and [ESPECIAL_2] tokens to be tokenized.”

I saw that GPT-2 on Hugging Face gives me the possibility of adding new tokens. But for the second-generation models that use the “cl100k_base” encoding, I haven’t found anything about it, not even in OpenAI’s “tiktoken” library.
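For reference, this is roughly what I mean on the Hugging Face side. It is just a minimal sketch using the `transformers` library; the model name and the special-token names are placeholders taken from my example above:

```python
from transformers import GPT2Tokenizer, GPT2Model

# Load the GPT-2 tokenizer and model from Hugging Face
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

# Register the new special tokens so the tokenizer never splits them
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["[ESPECIAL_1]", "[ESPECIAL_2]"]}
)

# Grow the embedding matrix so the new token ids get their own vectors
model.resize_token_embeddings(len(tokenizer))

ids = tokenizer(
    "This is an example of [ESPECIAL_1] and [ESPECIAL_2] tokens to be tokenized."
)["input_ids"]
print(ids)  # each [ESPECIAL_*] maps to a single new token id
```

This is what I would like to be able to do with the newer models that use “cl100k_base”.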

Does anyone know if this is possible? Or is it not available in the API at the moment?
