Lossy vector to text as compression

Hi,
I understand that the text-to-vector operation is one-way and not reversible. However, is it possible to convert an embedding into a shorter (possibly gibberish) text that could be used in a system prompt to infuse information or context without taking up too many tokens?

In other words, given a text-to-vector operation:
text_to_vec(“The quick brown fox jumps over the lazy dog”) → [31, 19, …, 62]

Is it possible to do a reverse vector-to-text operation that would yield an arbitrary string:
vec_to_text([31, 19, …, 62]) → some gibberish string like “fjeGUNjef5nJFN”

Where the text-to-vector operation on that string would yield the same (or a very similar) vector to the original one:
text_to_vec(“fjeGUNjef5nJFN”) → [31, 19, …, 62]

The purpose would be to replace a longer string with a shorter one (containing fewer tokens).
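For concreteness, here is a rough sketch of the check I have in mind, using the sentence-transformers library and the all-MiniLM-L6-v2 model purely as an example (any embedding model would do, and the candidate string is just a placeholder):

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def text_to_vec(text: str) -> np.ndarray:
    # Embed a single string into a dense vector.
    return model.encode(text)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

original = "The quick brown fox jumps over the lazy dog"
candidate = "fjeGUNjef5nJFN"  # hypothetical shorter replacement string

# The question: can a much shorter candidate be found whose similarity is close to 1.0?
print(cosine_similarity(text_to_vec(original), text_to_vec(candidate)))
```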

Does this make any sense?

The best inverse you can do, i.e. mapping a vector back to text, is to take the vector, compare it against previously stored vectors, find the closest one, and return the text behind that closest vector.

This is a pseudo inverse, not a real inverse.
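A minimal sketch of that lookup, assuming the whole corpus of previously embedded texts fits in memory as plain numpy vectors, and reusing a text_to_vec embedding call like the one sketched in the question:

```python
import numpy as np

# Texts that were embedded earlier, paired with their vectors.
corpus_texts = [
    "The quick brown fox jumps over the lazy dog",
    "Pack my box with five dozen liquor jugs",
]
corpus_vecs = np.stack([text_to_vec(t) for t in corpus_texts])

def vec_to_text(query_vec: np.ndarray) -> str:
    # Cosine similarity of the query against every stored vector.
    sims = corpus_vecs @ query_vec / (
        np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    # Return the text behind the closest stored vector.
    return corpus_texts[int(np.argmax(sims))]
```

It can only ever return texts it has already seen, which is why it is a pseudo inverse rather than a real one.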

As for shortening text, why not just have an LLM attempt it? Or you could try extracting keywords (with or without an LLM) and use those as your shortened text.
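For the keyword route, even something this naive can serve as a starting point (the stopword list and token count are arbitrary):

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "over"}

def extract_keywords(text: str, k: int = 5) -> str:
    # Keep the k most frequent non-stopword tokens as the shortened text.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return " ".join(word for word, _ in counts.most_common(k))

print(extract_keywords("The quick brown fox jumps over the lazy dog"))
# -> "quick brown fox jumps lazy"
```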


This spawned an idea…

Have a model generate a shortened summary, compare its embedding to the embedding of the original text, and repeat until you reach a chosen similarity threshold (rough sketch below).

Repeat for a few thousand different bits of text, then fine-tune a model for this type of text compression.
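Something like this as a rough sketch of that loop, assuming the openai Python SDK (v1+) with illustrative model names and an arbitrary similarity threshold; the (original text, compressed text) pairs it yields would then be the training data for the fine-tune:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def compress(text: str, threshold: float = 0.9, max_rounds: int = 5) -> str:
    target = embed(text)
    best = text
    for _ in range(max_rounds):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"Shorten this text as much as possible while keeping its meaning:\n{best}",
            }],
        )
        candidate = resp.choices[0].message.content
        # Keep the candidate only while it still embeds close to the original.
        if cosine(target, embed(candidate)) < threshold:
            break
        best = candidate
    return best
```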
