Can someone make embeddings make sense? (Not what you think, more in post, let's discuss!)


So based on my understanding, when you use a vector database to store user messages and then retrieve them, you’re stuffing your prompt with tons of retrieved text, which makes your token usage skyrocket, and that text gets added to the prompt every time a message is sent, as opposed to just once.

How is this in any way good for the developer? I’ve realized I’m doing the same thing in-house with my bot: I use Google’s sentiment analysis to preprocess the user’s message, then inject prompts based on keywords and the sentiment score. E.g., if a user expresses that they don’t like their coworker, go to X array and inject X prompt.

The catch? It only adds onto THAT message, so the next prompt will not have that added information. You’re only adding one sentence when it’s relevant to the prompt, not every single time, which is much better and doesn’t cost you or your users more money.
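A minimal sketch of the keyword-plus-sentiment injection approach described above. The prompt table, function names, and sentiment score are all invented for illustration; a real setup might get the score from an API like Google Cloud Natural Language.

```python
# Hypothetical sketch: inject an extra instruction only when a keyword
# and sentiment match, and only for the current message.

COWORKER_PROMPTS = {
    "negative": "The user is frustrated with a coworker. Respond with empathy.",
    "positive": "The user likes a coworker. Mirror their enthusiasm.",
}

def build_prompt(user_message: str, sentiment_score: float) -> str:
    """Prepend an instruction only when it's relevant to THIS message."""
    extra = ""
    if "coworker" in user_message.lower():
        key = "negative" if sentiment_score < 0 else "positive"
        extra = COWORKER_PROMPTS[key] + "\n"
    # The injected sentence applies to this turn only; the next turn
    # starts from a clean prompt, so nothing accumulates across messages.
    return extra + user_message

prompt = build_prompt("I can't stand my coworker.", sentiment_score=-0.8)
```

Because nothing carries over between turns, the per-message overhead stays at one sentence rather than growing with the conversation.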

I just wanted to discuss better ways to go about doing embeddings. What if you stored user info in local storage on the browser? Is that a security risk? Why use these vector databases, which aren’t entirely reliable anyway, when you can just store and retrieve information with your own in-house software?

Sorry if this is a stupid discussion. I’m just trying to say I think we’re all being had with this whole “embeddings” thing, AT LEAST in terms of a chatbot. Not necessarily for businesses with specific documents or anything; it’s great for that.

But in terms of a chatbot, I just don’t see how embeddings, the way they want you to do it, make any sense. There has to be a better way to save token usage than passing a ton of retrieved context based on whether the user mentioned their coworker Kevin before. It just doesn’t make sense.

I recognize this might be a ridiculous sentiment.

Knowledge you want the AI to have in order to answer must be given to the AI.

Embeddings and a vector database are just a way to retrieve your information.

You are talking about the AI knowing what you were talking about before? Then that context must be given to it, every call.

Using embeddings to recall even older chat history gives you a management strategy that avoids an instant point in time where everything is forgotten. And there are plenty of techniques less naive than the one you describe.

“They” don’t want you to do embeddings in any particular way as a solution. Conversation is hardly mentioned at all in the documentation. Passing any or no prior conversation as a way to give the illusion of memory is up to your inventiveness.


The way that made sense to me was this:

Vector databases allow things that are similar to be put in a similar place. Imagine you had piles of things that were like a fish: a squid would be sort of close? Not super close, but in the same rough area. A goldfish would be very close to the general topic of fish, same with a tuna. That is what vectors allow: they let the underlying similarity of things, in a much more complex and nuanced way, be stored by location, so things that are very different are further away than things that are similar. If you then generate a vector for a particular group of words you are interested in, you can ask the database for all the things that are “close” to your words.

It allows you to search not by keyword but by meaning, which makes it super powerful for finding information related to the things you are interested in. When used in combination with AI to find contextually relevant material to use when prompting… you can see how useful that would be. With experimentation, you can supply the AI with lots of useful contextual information about whatever the user is asking, and pull all of that from your documentation.
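The "stored by location" idea above can be sketched with toy coordinates. The 2-D points here are invented for the example; real embeddings have hundreds or thousands of dimensions learned by a model, but the "nearest things first" query works the same way.

```python
import math

# Toy illustration of "similar things end up close together".
# Coordinates are made up; a real model assigns them automatically.
points = {
    "fish":     (1.0, 1.0),
    "goldfish": (1.1, 0.9),
    "tuna":     (1.2, 1.1),
    "squid":    (1.8, 1.4),   # same rough area, but not super close
    "bicycle":  (9.0, 8.0),   # a very different concept, far away
}

def distance(a: str, b: str) -> float:
    return math.dist(points[a], points[b])

# Ask for everything "close" to fish, nearest first:
nearest = sorted((w for w in points if w != "fish"),
                 key=lambda w: distance("fish", w))
print(nearest)  # goldfish and tuna first, then squid, then bicycle
```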


I think one way to understand the power of embeddings is this: if you have no budget constraints and your user/chatbot dialogue context is smaller than the model’s max context window, then there’s no reason to use embeddings.

However, if you want to eliminate extraneous content from the chat history to save cost or fit into a limited context window, then using embeddings to pull only the most relevant chat history into the context window along with the current prompt can be an effective way to manage those constraints.
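A sketch of that idea: rank past messages by relevance to the current prompt, then keep only as many as fit a budget. The `similarity` function here is a crude word-overlap stand-in; in practice you would compare embeddings of each past message against the current prompt, and the budget numbers are invented.

```python
import re

def similarity(a: str, b: str) -> float:
    """Crude word-overlap score, standing in for embedding similarity."""
    wa = set(re.findall(r"[a-z']+", a.lower()))
    wb = set(re.findall(r"[a-z']+", b.lower()))
    return len(wa & wb) / max(len(wa | wb), 1)

def select_history(history, current_prompt, token_budget):
    """Keep the most relevant past messages that fit the budget."""
    ranked = sorted(history, key=lambda m: similarity(m, current_prompt),
                    reverse=True)
    chosen, used = [], 0
    for msg in ranked:
        cost = len(msg.split())  # crude word-count "token" estimate
        if used + cost <= token_budget:
            chosen.append(msg)
            used += cost
    return chosen

history = [
    "My coworker Kevin keeps interrupting me in meetings.",
    "What's a good pasta recipe?",
    "Kevin took credit for my project again.",
]
context = select_history(history, "How should I deal with Kevin?",
                         token_budget=18)
```

With the budget set to 18 crude tokens, the two Kevin-related messages make the cut and the irrelevant pasta question is dropped.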

Ignore the embeddings part for now. Imagine you want to ask an LLM to summarize everything about the Bank of Japan in the past month. You have access to Bloomberg’s articles for the past month, about 150,000 of them. The naive approach would be to stuff every article into the prompt and ask the LLM to filter just the BoJ articles and summarize them. That would not work, because the prompt has a length limit.

Imagine you had an oracle that could retrieve, say, just the 20 most important articles from Bloomberg about the BoJ. That might actually fit into the prompt. Your prompt would be: “Here are the 20 articles from Bloomberg about the BoJ in the past month; write me a 500-word summary.”

The way you implement that oracle is to index all 150,000 articles, then have some mechanism to retrieve just the relevant 20 articles for your search. You could start by searching for keywords; that’s what early search engines did, and it would probably solve the problem pretty decently. To get fancier, you’d create an embedding for each article (or even better, for every paragraph of every article), and then you’d be able to look up those articles/snippets by a phrase like “Bank of Japan” through a vector database. You might pick up articles that don’t mention the key phrase “Bank of Japan” but do mention related concepts like “Japanese inflation”.
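A tiny keyword-index baseline for that oracle, with invented articles. Note how the exact-phrase search finds article 1 but misses article 2, which discusses the same topic without the phrase "Bank of Japan"; that gap is exactly what an embedding-based lookup would close.

```python
# Minimal keyword-search baseline. Articles are made up for illustration.
articles = {
    1: "The Bank of Japan held interest rates steady this month.",
    2: "Japanese inflation cooled, easing pressure on the central bank.",
    3: "A new smartphone launched in Europe to strong reviews.",
}

def keyword_search(query: str):
    """Return article IDs whose text contains the query phrase verbatim."""
    q = query.lower()
    return [i for i, text in articles.items() if q in text.lower()]

hits = keyword_search("Bank of Japan")  # finds article 1, misses article 2
```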

Here’s how I’d explain embeddings, in the simplest terms:

An “embedding” is an array of numbers. Full stop. That’s the actual content of a single embedding. For any input sequence (characters, words, paragraphs, etc.), an embedding model can give you the embedding for that sequence.

The ‘magical’ thing about embeddings that makes them useful is the ‘semantic similarity’ aspect. There’s a function called “cosine similarity” (essentially a normalized dot product in high-dimensional space, but you don’t need to know anything about that). What this means is that any two embeddings have a mathematical ‘cosine similarity’ that can be used to see how similar the concepts are. So, for example, ‘cats, dogs, pets’ will be more similar to ‘fido, kitty, pet store’ than to ‘car, bus, motorcycle’. I used single words rather than sentences or paragraphs for clarity.

So you can take, for example, a large corpus of text, break it into ‘paragraphs’, generate the ‘embedding’ for each paragraph, and store all of those embeddings in a database. Vector databases are capable of automating the ‘cosine similarity’ computation and doing ‘nearest match’ searches through that database for whatever ‘search embedding’ you want to look up.

So you can simply take some user input query, run a nearest-match search in the vector DB to identify the most closely associated paragraphs, and then take, say, the top 10 paragraphs of text and insert those into a prompt to send to GPT. That’s the whole process in a nutshell. I’m new to this myself, so someone correct me if I got any of that wrong.
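The nutshell above, end to end, as a self-contained sketch. The `embed` function here is a fake bag-of-words embedding over a tiny fixed vocabulary, standing in for a real embedding model, and the list-of-tuples `store` stands in for a real vector database; all the text and names are invented.

```python
import math

# Fake embedding: count occurrences of a tiny fixed vocabulary.
VOCAB = ["coworker", "kevin", "pasta", "recipe", "meeting"]

def embed(text):
    words = text.lower().split()
    return [sum(w.strip(".,?!") == v for w in words) for v in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# 1. Break the corpus into paragraphs and store their embeddings.
paragraphs = [
    "My coworker Kevin interrupted the meeting again.",
    "Here is a simple pasta recipe with garlic.",
    "The meeting with Kevin ran long.",
]
store = [(p, embed(p)) for p in paragraphs]  # the "vector database"

# 2. Embed the user query and run a nearest-match search.
def top_k(query, k=2):
    qv = embed(query)
    ranked = sorted(store, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [p for p, _ in ranked[:k]]

# 3. Insert the top matches into the prompt.
context = top_k("problems with coworker Kevin")
prompt = "Context:\n" + "\n".join(context) + "\n\nAnswer the user's question."
```

Steps 1–3 map directly onto the description above: embed and store the paragraphs, nearest-match on the query, then stuff only the winners into the prompt.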
