Embeddings for tokens used by GPT models?

Where can I see the embeddings for the tokens that a given GPT model uses? (Not the embedding of an arbitrary prompt returned by the embeddings endpoint, using the text-embedding-ada-... models.)

Or are the embeddings I get back from the API (if I were to supply a single token rather than an arbitrary string) the ones learned by GPT?

I’d guess the answer is that one cannot retrieve the embeddings used by GPT (since they are a valuable part of what the model learns), but that raises the question: what are the embeddings returned by the embeddings endpoint? In what sense are they relevant to GPT?

You could try piecing together a theory by looking at open-source transformer models, or at your own models.
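For example, here's a minimal sketch that pulls the learned token-embedding matrix straight out of an open-source model, using GPT-2 via Hugging Face transformers as a stand-in for GPT (the prompt and the printed slice are just illustrative):

```python
# A sketch of inspecting learned token embeddings in an open-source model,
# using GPT-2 (via Hugging Face transformers) as a stand-in for GPT.
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

# The input embedding matrix: one learned vector per vocabulary token.
embeddings = model.get_input_embeddings().weight  # shape: (50257, 768)

for token_id in tokenizer.encode("Hello world"):
    vector = embeddings[token_id].detach()
    print(token_id, repr(tokenizer.decode([token_id])), vector[:5])
```

That weight matrix is exactly the per-token embedding table the model learned during training, i.e. the artifact the question is asking about.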

But since the latest GPT models are all private, who besides the OpenAI engineers knows?

The API doesn’t transmit the internal buffers and weight arrays of these private models. It’s a black box.

But for fun, you could embed each token, build a vector library covering the whole vocabulary, and then spin up your own model on top of that mapping from tokens to vectors.
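A hedged sketch of that idea, assuming tiktoken for the vocabulary and the openai Python client for the endpoint (the token-id range is arbitrary, chosen only to keep the example cheap). Note the caveat: these would be ada-002's embeddings of each token's *text*, not the embedding table GPT itself learned.

```python
# A sketch of building a token -> vector library with the embeddings
# endpoint. tiktoken supplies the vocabulary; the openai client does the
# embedding. The id range is arbitrary, just to keep the example cheap.
import tiktoken
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
enc = tiktoken.encoding_for_model("text-embedding-ada-002")

token_ids = list(range(1000, 1010))          # the full vocab is ~100k tokens
token_strings = [enc.decode([i]) for i in token_ids]

response = client.embeddings.create(
    model="text-embedding-ada-002",
    input=token_strings,
)
vector_library = {i: d.embedding for i, d in zip(token_ids, response.data)}
print(len(vector_library), len(vector_library[1000]))  # 10 vectors, 1536 dims
```

Embedding the whole vocabulary this way is feasible (batched calls over ~100k token strings), but whatever model you train on those vectors would be learning from ada-002's view of each token, not from GPT's internal representation.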