Is it possible to convert a vector to text?

I think you missed my point.

There is a nearly infinite number of text sequences that map to every possible embedding vector. So if you picked at random a text sequence that maps to a particular embedding, you would nearly always get a nonsense sequence.

Definitely! But you’re not choosing at random; you’re using a model that can infer which of the embedding-sharing sequences is most likely, and that is not going to be something random. (These results are backed up by data in the paper I linked.)

I found some existing work on GitHub: jxmorris12/vec2text
It is rudimentary, but useful as a reference.

Newbie question: how do traditional word vectors convert between a word and its embedding? (Is “embedding” the right word?) Example: the GloVe project from Stanford NLP (I can’t post links in my first post).

Is it just a vastly smaller space where they do the same kind of “brute force search”, or is it fundamentally different in a way that makes this possible?

EDIT: in the word-vector case, is it literally just (1) each word has an associated vector, and (2) you search the dataset for the word(s) closest to the given vector?
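For static word vectors such as GloVe, the answer to the EDIT question is essentially yes: each word has one fixed vector, and going from a vector back to a word is usually a brute-force nearest-neighbour scan over the vocabulary, typically ranked by cosine similarity. A minimal sketch, assuming a made-up five-word vocabulary with 3-dimensional vectors (`VOCAB`, `word_to_vec` and `nearest_words` are hypothetical names, not part of the GloVe release):

```python
import numpy as np

# Hypothetical toy vocabulary with made-up 3-d vectors; real GloVe vectors
# are read from the downloaded text files and have 50-300 dimensions.
VOCAB = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([-0.7, 0.2, 0.2]),
}

def word_to_vec(word):
    """Step (1): encoding is a plain table lookup."""
    return VOCAB[word]

def nearest_words(vec, k=3):
    """Step (2): decoding ranks every vocabulary word by cosine similarity to vec."""
    words = list(VOCAB)
    matrix = np.stack([VOCAB[w] for w in words])
    sims = matrix @ vec / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(vec))
    order = np.argsort(-sims)[:k]  # indices of the k most similar words
    return [(words[i], float(sims[i])) for i in order]

# A slightly perturbed "queen" vector still decodes to "queen" first.
print(nearest_words(word_to_vec("queen") + 0.05))
```

With a real vocabulary of hundreds of thousands of words the scan is the same, just larger; libraries often add approximate nearest-neighbour indexes to speed it up.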

I read the replies below and understand that there is no good way to decode or reconstruct the text that went into an embedding.

However, I have a hunch, and I may try to code it up and see what I get.

If you average the vectors of King and Queen, or you extend the difference between King and Queen beyond Queen, what would that vector mean?
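Both operations in the question are plain vector arithmetic: the average is the midpoint (king + queen) / 2, and extending the difference beyond Queen means queen + alpha * (queen - king). A sketch with made-up 3-d vectors; the vector values and the `alpha` step size are illustrative assumptions:

```python
import numpy as np

# Hypothetical toy vectors; real embeddings have hundreds of dimensions.
king = np.array([0.9, 0.8, 0.1])
queen = np.array([0.9, 0.1, 0.8])

def midpoint(a, b):
    """Average of two vectors: the point halfway between them."""
    return (a + b) / 2

def extrapolate(a, b, alpha=1.0):
    """Continue past b along the direction a -> b by alpha times their difference."""
    return b + alpha * (b - a)

print(midpoint(king, queen))     # halfway between "king" and "queen"
print(extrapolate(king, queen))  # one full step past "queen"
```

Neither result is guaranteed to sit near any real word; one way to interpret it is to look up its nearest neighbours in the vocabulary.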

To solve this problem, what if you used the word vectors to find the closest single word, then added or rearranged words to get progressively closer to the target vector? I wonder what the closest approximation to the vector would contain.
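The add-words-until-closer idea is essentially a greedy hill-climbing search. A minimal sketch, assuming a toy vocabulary and assuming that a text’s embedding is simply the average of its word vectors (a deliberate simplification; real text-embedding models are not a plain average). The names `embed` and `greedy_reconstruct` are hypothetical:

```python
import numpy as np

# Hypothetical toy vocabulary; a real experiment would use an actual model's vectors.
VOCAB = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([-0.7, 0.2, 0.2]),
}

def embed(words):
    """ASSUMPTION: a text's embedding is the mean of its word vectors."""
    return np.mean([VOCAB[w] for w in words], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def greedy_reconstruct(target, max_len=4):
    """Repeatedly append the word that brings the embedding closest to target."""
    words, best = [], -1.0
    for _ in range(max_len):
        # Try every possible next word and keep the best-scoring one.
        score, word = max((cosine(embed(words + [w]), target), w) for w in VOCAB)
        if score <= best:
            break  # no added word improves the match, so stop
        words, best = words + [word], score
    return words, best

target = embed(["king", "woman"])
print(greedy_reconstruct(target))
```

In this toy setup, starting from the average of “king” and “woman”, the search returns “queen” + “man”: a different set of words with a nearly identical embedding, which is a small-scale version of the many-texts-per-vector problem raised earlier in the thread. Because the assumed embedding is an average, word order has no effect here; rearranging words only matters with an order-sensitive encoder.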

Has anyone tried this? Am I hallucinating?

Basically, what you’re proposing is very loosely related to the idea of identifying the basis vectors for the embedding space and decomposing an arbitrary vector into a linear combination of these basis vectors.

Then you’d be able to say this vector represents this set of concepts in these proportions.

Now, this is a gross oversimplification and in practice it doesn’t really work this way, but you’re not hallucinating either.

The core conceit of your idea is rather sound; it’s just not very meaningful in practice.
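The “linear combination of basis vectors” idea can be sketched with ordinary least squares: stack candidate concept vectors as columns and solve for the weights that best reconstruct a given vector. This assumes you already have meaningful concept vectors; the concept names and numbers below are made up, and real embedding spaces are not expected to decompose this cleanly:

```python
import numpy as np

def decompose(vec, concept_matrix):
    """Least-squares weights w minimizing ||concept_matrix @ w - vec||."""
    weights, *_ = np.linalg.lstsq(concept_matrix, vec, rcond=None)
    residual = vec - concept_matrix @ weights  # part the concepts cannot explain
    return weights, float(np.linalg.norm(residual))

# Hypothetical concept vectors (one per column) in a made-up 4-d space.
concept_names = ["royalty", "male", "female"]
concept_matrix = np.array([
    [1.0, 0.0, 0.0],
    [0.2, 1.0, 0.0],
    [0.2, 0.0, 1.0],
    [0.5, 0.3, 0.3],
])

vec = concept_matrix @ np.array([0.7, 0.2, 0.1])  # built from known proportions
weights, residual = decompose(vec, concept_matrix)
for name, w in zip(concept_names, weights):
    print(f"{name}: {w:.2f}")
print(f"unexplained part: {residual:.3f}")
```

Here the vector was constructed from the concepts, so the proportions come back exactly; with a real embedding the residual would typically be large, which is one way to see why this is only a loose analogy.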

It might be interesting to see what concept lies halfway between a dog and a cat. Thanks for the feedback.

Well, for a demo, OpenAI created a huge tranche of embeddings from Wikipedia:

https://web.archive.org/web/20231211005422*/http://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip

You could probably use that as a rough starting point.

Find the midpoint between cat and dog, then find the ten closest embeddings to that point.
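That midpoint search amounts to averaging two embeddings and ranking every row by cosine similarity to the result. A sketch assuming the embeddings are already loaded into a NumPy matrix (one row per article) with a parallel list of titles; the loading step is omitted because the archive’s exact layout is not described here, and random vectors stand in for the real data. `top_k_near_midpoint` is a hypothetical helper:

```python
import numpy as np

def top_k_near_midpoint(emb_a, emb_b, matrix, labels, k=10):
    """Return the k labels whose rows are most cosine-similar to the midpoint of a and b."""
    mid = (emb_a + emb_b) / 2
    sims = matrix @ mid / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(mid))
    top = np.argsort(-sims)[:k]  # highest similarity first
    return [(labels[i], float(sims[i])) for i in top]

# Hypothetical stand-in data: random unit vectors instead of the Wikipedia embeddings.
rng = np.random.default_rng(0)
labels = [f"article_{i}" for i in range(1000)]
matrix = rng.normal(size=(1000, 64))
matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)

cat, dog = matrix[0], matrix[1]  # pretend rows 0 and 1 are the "Cat" and "Dog" articles
for label, sim in top_k_near_midpoint(cat, dog, matrix, labels):
    print(label, round(sim, 3))
```

The two query rows will usually rank at the top, so you may want to drop them before looking at what sits “between” cat and dog.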

I’m pretty sure these are ada embeddings though, so just be aware of that.
