I think you missed my point.
There are nearly infinite numbers of text sequences for every possible embedding vector. So, choosing at random a text sequence which maps to a particular embedding, you would nearly always choose a nonsense sequence.
Definitely! But you're not choosing at random, you're using a model which can infer which of the embedding-sharing sequences is most likely, which is not going to be something random. (These results are backed up by data in the paper that I linked.)
I found some existing work on GitHub: jxmorris12/vec2text
It is rudimentary, but useful as a reference.
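For anyone who wants to try it, here is a minimal sketch of how vec2text is used based on its README; I haven't re-verified the exact API recently, so treat load_pretrained_corrector and invert_embeddings as assumptions and check them against the current repo.

```python
# Rough sketch of inverting ada-002 embeddings back to text with vec2text.
# load_pretrained_corrector / invert_embeddings are the names shown in the
# project's README; verify against the current repo before relying on them.
import torch
import vec2text
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> torch.Tensor:
    """Embed with text-embedding-ada-002, the model the pretrained corrector targets."""
    response = client.embeddings.create(input=texts, model="text-embedding-ada-002")
    return torch.tensor([item.embedding for item in response.data])

corrector = vec2text.load_pretrained_corrector("text-embedding-ada-002")

embeddings = embed(["The quick brown fox jumps over the lazy dog."])
recovered = vec2text.invert_embeddings(
    embeddings=embeddings.cuda(),  # the README runs this on GPU; adjust for your setup
    corrector=corrector,
    num_steps=20,                  # more correction steps -> closer reconstruction, slower
)
print(recovered)
```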
Newbie question: how do traditional word vectors convert between the word & the embedding (is "embedding" the right word?) Example: GloVe from Stanford NLP (can't link in my first post).
Is it just a vastly smaller space, and they're doing the same kind of "brute force search", or is it fundamentally different in a way that makes this possible?
EDIT: in the word vectors case, is it literally just (1) each word has an associated vector, and (2) you search the dataset for whichever word(s) are closest to the given vector?
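If it really is just that, I'd picture something like this brute-force lookup over the GloVe text files. This is only a sketch, assuming the standard glove.6B.300d.txt download from the Stanford NLP site:

```python
# Sketch of "each word has a vector; search for the closest" over GloVe.
# Assumes the standard glove.6B.300d.txt file from the Stanford NLP site.
import numpy as np

def load_glove(path: str) -> dict[str, np.ndarray]:
    """Parse the plain-text GloVe format: one word followed by its values per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.array(values, dtype=np.float32)
    return vectors

def nearest_words(query: np.ndarray, vectors: dict[str, np.ndarray], k: int = 5):
    """Brute-force cosine-similarity search over the whole vocabulary."""
    words = list(vectors)
    matrix = np.stack([vectors[w] for w in words])
    sims = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    return [(words[i], float(sims[i])) for i in np.argsort(-sims)[:k]]

glove = load_glove("glove.6B.300d.txt")
print(nearest_words(glove["king"] - glove["man"] + glove["woman"], glove))
```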
I read below and understand that there is no good way to decode or reconstruct the text that went into the encoding.
However, I have a hunch that I may try to code it anyway and see what I get.
If you average the vectors of King and Queen, or you extend the difference between King and Queen beyond Queen, what would that vector mean?
To explore this, what if you took the vectors of candidate words to find the closest vector, then added or rearranged words to get a closer match? I wonder what the closest approximation to the vector would contain.
Has anyone tried this? Am I hallucinating?
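For concreteness, here is a rough sketch of that greedy "keep adding words while the average vector gets closer" idea, reusing load_glove from the GloVe snippet earlier in the thread. With a plain average the word order doesn't matter, so it only covers the add-words part; the candidate set and word limit are arbitrary choices.

```python
# Greedy sketch: add words one at a time so the mean of their GloVe vectors
# moves closer (by cosine similarity) to a target vector.
# Reuses load_glove() from the GloVe snippet earlier in the thread.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def greedy_approximate(target, vectors, candidates, max_words=5):
    chosen, best = [], -1.0
    for _ in range(max_words):
        best_word = None
        for word in candidates:
            if word in chosen:
                continue
            trial = np.mean([vectors[w] for w in chosen + [word]], axis=0)
            score = cosine(trial, target)
            if score > best:
                best, best_word = score, word
        if best_word is None:  # no single addition improves the match any further
            break
        chosen.append(best_word)
    return chosen, best

glove = load_glove("glove.6B.300d.txt")
target = (glove["king"] + glove["queen"]) / 2        # the "average of King and Queen" case
words, score = greedy_approximate(target, glove, candidates=list(glove)[:5000])
print(words, score)
```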
Basically… what you're proposing is very loosely related to the idea of identifying the basis vectors of the embedding space and decomposing an arbitrary vector into a linear combination of those basis vectors.
Then you'd be able to say that this vector represents this set of concepts in these proportions.
Now, this is a gross oversimplification and in practice it doesn't really work this way, but you're not hallucinating either.
The core conceit of your idea is rather sound; it's just not very meaningful in practice.
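If you want to play with that decomposition idea anyway, a least-squares projection onto a handful of hand-picked "concept" vectors is about the simplest version. Sketch only: the concept words below are arbitrary examples, and glove is the dict loaded in the GloVe snippet above.

```python
# Least-squares sketch: express a vector as a linear combination of a few
# hand-picked "concept" vectors. The concept words are arbitrary examples.
import numpy as np

def decompose(target: np.ndarray, concepts: dict[str, np.ndarray]) -> dict[str, float]:
    names = list(concepts)
    basis = np.stack([concepts[n] for n in names], axis=1)   # shape: (dim, n_concepts)
    coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)  # ordinary least squares
    return {name: round(float(c), 3) for name, c in zip(names, coeffs)}

# "dog" as a mix of a few concept directions (glove comes from the snippet above).
concepts = {w: glove[w] for w in ["animal", "pet", "wolf", "loyalty"]}
print(decompose(glove["dog"], concepts))
```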
It might be interesting to see what concept is halfway between a dog and a cat … Thanks for the feedback.
Well, OpenAI created a huge tranche of embeddings from Wikipedia for a demo. You could probably use that as a rough starting point: find the midpoint between cat and dog, then find the ten closest embeddings to that point. I'm pretty sure those are ada embeddings though, so just be aware of that.
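Something like the following sketch, assuming you've already loaded that Wikipedia embedding dump into corpus_texts (a list of strings) and corpus_vecs (a NumPy array with one ada-002 embedding per row); both names are placeholders, not part of any official demo code.

```python
# Midpoint-of-cat-and-dog sketch against a precomputed embedding corpus.
# corpus_texts / corpus_vecs are placeholders for however you load the dump.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str, model: str = "text-embedding-ada-002") -> np.ndarray:
    response = client.embeddings.create(input=[text], model=model)
    return np.array(response.data[0].embedding)

midpoint = (embed("cat") + embed("dog")) / 2

# Cosine similarity of the midpoint against every row of the corpus.
sims = corpus_vecs @ midpoint / (
    np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(midpoint)
)
for i in np.argsort(-sims)[:10]:                      # ten closest entries
    print(round(float(sims[i]), 3), corpus_texts[i][:80])
```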