Could anyone explain how OpenAI embeddings work? That is, how is a stream of tokens transformed into one single vector? And more importantly, how can this vector be converted back into the original text? I know this is in principle doable because there is literature on gradient-based adversarial attacks (e.g., https://aclanthology.org/2021.emnlp-main.464.pdf) on NLP tasks, but I have not found a clear explanation yet.
Embeddings: send text, receive a multi-dimensional vector.
Compare that vector against previously stored vectors using a similarity metric (cosine similarity or dot product) to get a score of how similar they are.
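The comparison step can be sketched in a few lines. This is illustrative only: the 3-D vectors below are toy values, not real API output (real embeddings are much higher-dimensional, e.g. 1536 values).

```python
# Minimal sketch of comparing embedding vectors, assuming you already have
# vectors back from an embeddings call. The vectors here are toy values.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score in [-1, 1]; higher means the texts are more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.1, 0.8, 0.3])    # pretend: vector for the new text
stored = np.array([0.2, 0.7, 0.4])   # pretend: vector stored earlier
score = cosine_similarity(query, stored)
```

For unit-length vectors (as many embedding APIs return), cosine similarity and the dot product give the same ranking.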
Similarity is based on the AI's understanding of language acquired during training, through internal mechanisms that are hard to describe.
They cannot be reversed into the original text. Even trying to determine what a single one of the 1536 values represents would take probing with immeasurably many texts to find optimum activations or negations.
Researching AI internals on a model hundreds of times smaller: Language models can explain neurons in language models
OpenAI microscope, to research what is represented in image AI: Zoom In: An Introduction to Circuits
I see. But intuitively I think it is possible to convert the embedding vectors for individual tokens back to tokens (i.e., before positional embedding)?
You can think what you want.
There aren’t “positional embeddings” here. The whole text is treated as a single language entity, the AI is put into a corresponding internal state, and a single vector is returned.
You could send each of the 100,000 BPE tokens of the trained language model individually. Each would return a unique vector. Individual tokens could be reversed using that dictionary on the same model, but 100,000^2 trials for merely a two-token sequence?
Many single-token vectors found to be similar would have non-obvious meaning. “aub” similar to “egg” and “veget”? Because aubergine?
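That dictionary-lookup idea can be sketched as a brute-force nearest-neighbor search. Everything here is a toy: the random vectors stand in for “embed every vocabulary token once and keep the results”, which a real setup would do via the embeddings API.

```python
# Illustrative sketch of inverting a *single-token* embedding by nearest-
# neighbor lookup over a precomputed token->vector dictionary. Toy vectors
# stand in for real API output; the tiny vocabulary is hypothetical.
import numpy as np

def nearest_token(query: np.ndarray, table: dict) -> str:
    """Return the token whose stored vector is most cosine-similar to query."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(table, key=lambda tok: cos(query, table[tok]))

# Toy stand-in for embedding every vocabulary token once.
rng = np.random.default_rng(0)
vocab = ["egg", "veget", "aub", "cat"]
table = {tok: rng.normal(size=8) for tok in vocab}
```

This works for one token at a time; the combinatorial cost mentioned above is exactly why it does not extend to sequences.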
Vectors aren’t exposed; they are used internally. Do you have some ultimate question?
In my last reply, I did not imply I wanted to find the original sentence anymore. In the adversarial attack example, I believe they also did it token-wise. My goal is to generate tokens that follow a specific distribution of my interest in embedding space.
The adversarial “attack” doesn’t seem to have usefulness or merit, nonsense like “we assume the attacker has access to the model’s training data”.
Input token modification is exactly what a user of a LLM is supposed to be able to do, and they get the results they deserve, whether from a language model or its embeddings engine that may serve internal tasks:
Tell me a story please
Tell me a story a**hole
zhan has a beautiful wife
zhan has a murderous wife
The “results” shown as examples change one part of the output, which can be explained simply by the non-deterministic nature of an LLM.
“generate tokens in embedding space”? Are you currently trying to break entailment?
While the question is definitely interesting I would put the answer in different terms.
When looking at reconstructing the text from a vector, the right comparison is to a (de-)compression method, not to a transformation into a semantic representation of the meaning. And while vector embeddings have been around for a while, I am not aware of any archiving app that uses vectors to compress text. That is a good hint that the task is definitely not trivial.
Next, yes, training data and even complete models with weights, parameters, and whatnot are available in the open-source space. Thus, yes, researchers have tried to recreate the embedded text, for example using AI models trained especially for this task. They wrote nice and somewhat intriguing papers, but these approaches come with a lot of strings attached, which leads to results like “in theory it should be possible”.
But from the perspective of practically developing actual apps that make use of such a technology the answer is a lot closer to a hard “no”.
You can take a look at this previous discussion where the intricacies are spelled out in more detail:
That is perhaps the original use of it in the context of the adversarial attack setting. That is not what I intend to do, I just used it to make a point that such conversion is possible. I am interested in observing certain properties of LLMs.
My guess is the compression part happens at the positional encoding stage? Given my rough understanding of how it works, I can see why it is such a lossy compression and not reversible. So from the second reply onwards, I relaxed the setting to discuss whether it is possible to convert the vectors of the individual tokens back.
I think inverting embeddings to recover language can result in weird tokens showing up in the output, especially for LLMs, but it might do OK on your own custom models with super-limited vocabularies.
For example, here I am embedding a series of images to a 2D vector space. I can also pick a random 2D vector and get back an image. So the embedding is invertible!
But look at the images below, especially where the shirt morphs into the shoe in the lower image. This is the crud you can get with words, and you can get it really bad, since words generally don’t nicely “morph” into each other.
To put it another way … What is the word between “vector” and “GitHub” … what is it? It’s likely not even a word! Just as the images above morph into each other, and while they are morphing, they are undefined things trapped between states of existence.
So inverting would look a lot like putting the LLM at temp = 2 and watching the garbage flow out.
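The decoder mechanics behind those morphing images can be sketched with a toy stand-in. Assumption made explicit: a real VAE decoder is a trained neural network; here a fixed random linear map plays its role, purely to show that *any* latent point decodes to some image, including the undefined in-between points.

```python
# Toy stand-in for a VAE decoder: a fixed linear map from a 2-D latent space
# to 28x28 "images". The weights are random, not learned; this only
# illustrates that every latent vector, including midpoints, yields an image.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(28 * 28, 2))  # pretend these are learned decoder weights

def decode(z: np.ndarray) -> np.ndarray:
    return (W @ z).reshape(28, 28)  # every latent point yields *some* image

z_shirt = np.array([1.0, 0.0])   # hypothetical latent code for a shirt
z_shoe = np.array([0.0, 1.0])    # hypothetical latent code for a shoe
midpoint = decode((z_shirt + z_shoe) / 2)  # the "morph" between the two
```

With words instead of images, that midpoint is the “undefined thing trapped between states of existence” described above.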
Hi, thanks for sharing. May I clarify how you did the image embedding in your setting? I presume it is not a transformer, so is it something like a VAE’s latent space? I am okay with a limited vocab size and non-readable outputs since I am just trying to make some observations.
It’s the VAE latent space.
Here is how it could work for you in the model: You invert your own limited vocabulary custom model. It might make sense given the limited vocabulary.
Here is how it works for everyone currently (including, probably, you): you embed a bunch of texts, pick a random vector, find the closest item in your collection, and call that the inverse.
Since the embedding engine doesn’t have the “/decoder” endpoint, you are essentially left with a hash function, and have to “guess” to get close to something.
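That “practical inverse” can be sketched as a nearest-neighbor search over texts you embedded yourself. The `fake_embed` function below is a deterministic placeholder of my own, not a real API call; a real system would call an embeddings endpoint and cache the vectors.

```python
# Sketch of approximate inversion without a decoder endpoint: embed a corpus
# yourself, then map any query vector to the closest corpus text. fake_embed()
# is a hash-based placeholder standing in for a real embeddings call.
import hashlib
import numpy as np

corpus = ["tell me a story", "map water temperatures", "vehicle damage pattern"]

def fake_embed(text: str) -> np.ndarray:
    seed = int.from_bytes(hashlib.md5(text.encode()).digest()[:4], "little")
    v = np.random.default_rng(seed).normal(size=16)
    return v / np.linalg.norm(v)          # unit vector, like many real APIs

matrix = np.stack([fake_embed(t) for t in corpus])  # rows are unit vectors

def approximate_inverse(vector: np.ndarray) -> str:
    scores = matrix @ (vector / np.linalg.norm(vector))  # cosine similarities
    return corpus[int(np.argmax(scores))]
```

Any random vector will map to *something* in the corpus, which is exactly why this is a guess against a hash-like function rather than a true inverse.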
You could try spinning your own, and see what comes out of the latent space though! Just not through OpenAI’s models.
Cool, I’ll try and see what happens. Thanks for sharing!
I was listening to Mr. Wolfram the other day and he said that the vectors we humans find interesting are 1 in every 10^600 out there. That was the “Ahh” moment for me, at least if I understood his point correctly.
Curt’s "what is the word halfway between “Vector and Github” sort of clicked at that point, sure, in a 10^600 solution space… I bet there is a pretty accurate block of a few hundred quadrillion “words” that are for all intents and purposes “exactly” half way
Humans are bad at large numbers and exponentials, the fact we even contemplate reversing an embedding vector demonstrates this. Not a criticism of the OP, just an observation on the self importance humans place on their exploration of less than nothing.
Yeah, I also saw one of his interviews on a podcast about his Ruliad space idea. I can agree that that space is infinitely larger than human natural language (or even all conceivable abstract ideas), but I think it still depends on whether we constrain ourselves to a tiny subspace of it. From an information-theoretic viewpoint, if the compression is not lossy, we can in principle convert it back to natural language. But I am not familiar enough with the inner workings of transformers to say whether that is the case before the positional encoding stage (positional encoding itself is unlikely to be lossless).
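A rough counting argument suggests the compression must be lossy over the whole input space. These are my own back-of-envelope numbers, assuming a 1536-dimensional float32 vector, a ~100k-token vocabulary, and an 8191-token input limit:

```python
# Pigeonhole-style estimate: can a 1536-dim float32 vector losslessly encode
# every possible 8191-token input? All figures below are stated assumptions.
import math

embedding_bits = 1536 * 32              # ~49k bits of state in the vector
bits_per_token = math.log2(100_000)     # ~16.6 bits to identify one token
max_input_bits = 8191 * bits_per_token  # ~136k bits of input identity

# More distinct possible inputs than distinct vectors, so the mapping
# cannot be injective (lossless) over the full input space.
lossless_possible = embedding_bits >= max_input_bits
```

Short inputs could in principle fit, which matches the intuition that single tokens are recoverable while long texts are not.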
Receiving “vehicle damage pattern”
Answer similarity to: police chase, intoxicated, tornado, hurricane, microbursts, orographic lift, squall, lightning
Reverse to input:
Map daily water temperatures of the Atlantic.