Embedding - Usage for all GPT text-to-number applications?

Hello, I came across this link: https://platform.openai.com/docs/guides/embeddings. I wish to know what these embeddings are used for (is it for RAG, or some other application?). Are they also used for feeding text into the building of GPT models?

4 Likes

Embeddings are generally used for semantic similarity search.

By algorithmically comparing the vectors returned for two different text passages, you get a score for how similar they are in topic or subject, or at finer levels of the AI's understanding.
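
For instance, here is a minimal sketch of that comparison, assuming the openai Python SDK and NumPy (the model name and sample texts are only illustrative):

```python
# Compare two text passages via cosine similarity of their embeddings.
# Assumes: `pip install openai numpy` and OPENAI_API_KEY set in the environment.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    """Return the embedding vector for one text passage."""
    response = client.embeddings.create(
        model="text-embedding-3-small",  # illustrative; any embedding model works
        input=text,
    )
    return np.array(response.data[0].embedding)

a = embed("How do I reset my router?")
b = embed("Steps to restart a home wifi modem")

# Cosine similarity: near 1.0 = same topic, near 0 = unrelated.
score = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"similarity: {score:.3f}")
```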

1 Like

Hi @rao.ranganaths !

You can use embeddings for various applications:

  • Similarity Search: For example, let’s say you have a product description, and you would like to find other “similar products” based on their descriptions. You encode your source product description (we call this the “query”) using the embedding model, and we also encode all the other product descriptions using the same embedding model. We then perform some kind of vector/trigonometric calculation (e.g. cosine similarity) between the query embedding, and the embedding of each of the other products. We retrieve the one(s) that have the smallest geometric distance, or closest similarity. We can also use a nearest neighbour algorithm.
  • Information Retrieval (including Question Answering): In fact, this is the “R” part of “RAG”. Same as above, except you take documents, “chunk” them in some uniform manner (e.g. per-sentence, per-paragraph, per-page) and do the cosine similarity or nearest neighbour between your query/question and the embeddings of the different “chunks”
  • Clustering and Topic Modelling: You embed all your texts and run a clustering algorithm like K-Means and plot the clusters, or use the clusters for some inferential purpose. Texts here don’t have to be just documents; they can be data about your customers/users, expressed in textual format, so this can be applied to user clustering/segmentation too! (See the sketch after this list.)
  • Classification Models: You use the embedding as a first layer/representation in some classification model (e.g. simple neural network with softmax, or logistic regression model). The classifier can be for example for classifying product descriptions into some pre-defined categories, or for sentiment analysis.
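
Here is the clustering sketch promised above, assuming the openai SDK and scikit-learn; the texts, model, and cluster count are illustrative, not a definitive recipe:

```python
# Cluster texts by embedding them and running K-Means.
# Assumes: `pip install openai numpy scikit-learn`.
import numpy as np
from openai import OpenAI
from sklearn.cluster import KMeans

client = OpenAI()

texts = [
    "Refund for a damaged blender",
    "My toaster arrived broken",
    "How do I track my shipment?",
    "Where is my package right now?",
]

# One API call can embed a whole batch of inputs at once.
response = client.embeddings.create(model="text-embedding-3-small", input=texts)
vectors = np.array([item.embedding for item in response.data])

# Two clusters: roughly "damaged goods" vs. "shipping status".
kmeans = KMeans(n_clusters=2, random_state=0).fit(vectors)
for text, label in zip(texts, kmeans.labels_):
    print(label, text)
```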
3 Likes

When you have a huge database, it is more flexible to use embeddings than raw text; they are easier to handle and manipulate.

But when you have something smaller, you can use the text directly (in some cases splitting it into chunks and using multiple function tools).

I would recommend FAISS with OpenAI embeddings (they are amazing → and they are the one thing I can’t complain about from OpenAI). :trophy:
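
A minimal sketch of that combination, assuming the faiss-cpu package and the openai SDK (the index type, documents, and model are illustrative, not a definitive setup):

```python
# Build a FAISS index over OpenAI embeddings and query it.
# Assumes: `pip install faiss-cpu openai numpy`.
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed_batch(texts: list[str]) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data], dtype="float32")

docs = ["FAISS is a library for similarity search", "Bananas are rich in potassium"]
vectors = embed_batch(docs)
faiss.normalize_L2(vectors)                  # in-place; now inner product == cosine

index = faiss.IndexFlatIP(vectors.shape[1])  # exact inner-product index
index.add(vectors)

query = embed_batch(["vector search tools"])
faiss.normalize_L2(query)
scores, ids = index.search(query, 1)         # top-1 nearest neighbour
print(docs[ids[0][0]], scores[0][0])
```

IndexFlatIP with L2-normalized vectors makes the inner product equal to cosine similarity; for larger collections, FAISS also offers approximate indexes (e.g. IVF, HNSW) that trade a little accuracy for speed.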

@rao.ranganaths If you want to see more, click on @platypus’s profile and follow his link (he has some nice articles there :wink:, I’m sure they will help you)

1 Like

Thanks for the reply. I know a bit about embeddings being used for the applications mentioned. What I wish to know is in the context of GPT. In the RAG use case, the document is converted into embeddings along with the query text.

  1. Is it for this that we use embeddings, which are then passed on to ChatGPT?
  2. In the link provided (https://platform.openai.com/docs/guides/embeddings), are these the only embeddings allowed for RAG when using ChatGPT?
  3. Is this the same embedding used to convert text into vectors when building the base LLM, like GPT?

Hi, hope you’re still with us, and sorry for the inattention.

First, let’s let AI get us on the same wavelength by clearing up some of your terminology.


Extracted AI Terms and OpenAI Names:

  • Embedding
  • RAG (Retrieval-Augmented Generation)
  • ChatGPT
  • Vectors
  • Base LLM (Large Language Model)
  • GPT

Clarifying Glossary:

  1. Embedding:
    How used: Some kind of universal term for converting text to data in any AI use case.
    What it actually is: Embeddings are numerical representations of text (or other data types) that encode meaning in a lower-dimensional space. They’re used for tasks like semantic search or clustering, but they don’t build the “Base LLM” or the model itself. Embeddings are more like a tool you use with an LLM, not to create it. (Technically, embeddings are a lower-level layer of AI)

  2. RAG (Retrieval-Augmented Generation):
    How used: Just some AI process that takes embeddings and uses them in ChatGPT.
    What it actually is: RAG is a technique that enhances large language models by fetching relevant external information (documents, databases) at query time to improve response quality. It’s separate from the training or construction of an LLM—it’s a clever trick that boosts how models handle queries by retrieving up-to-date or domain-specific info.

  3. ChatGPT:
    How used: The LLM itself, or some endpoint that uses embeddings directly.
    What it actually is: ChatGPT is an AI chatbot web interface, OpenAI’s consumer product, powered by models like GPT-4 or GPT-4o. It’s not the model itself but the product built around the LLM, designed for conversations. The embeddings they mention aren’t “passed” to ChatGPT directly; the system might use embeddings internally for tasks like semantic search, but it’s not some plug-and-play process.

  4. Vectors:
    How used: A magical intermediary between text and model training.
    What it actually is: Vectors are arrays of numbers (representing embeddings) that LLMs use to process information. These aren’t used to “build” the model but rather to represent data during tasks like search or clustering. (The short sketch after this glossary shows what one looks like.)

  5. Base LLM (Large Language Model):
    How used: A model that’s built using embeddings.
    What it actually is: The Base LLM is the core model (e.g., GPT-4) that has been pre-trained on vast amounts of text data. Embeddings aren’t used to build this model—pre-training is done through tokenization and backpropagation across billions of text samples. Embeddings might be used after the model’s trained for certain applications, but they’re not foundational to creating the LLM. (A true “base model” is a completions engine such as davinci-002, not trained to chat).

  6. GPT (Generative Pre-trained Transformer):
    How used: A model that somehow runs on embeddings.
    What it actually is: GPT refers to the architecture of the model. It’s a transformer-based model trained using large datasets to predict and generate text. It doesn’t “use” embeddings to generate its base architecture; it’s trained through token sequences and a lot of math, using attention mechanisms—not embeddings, which are just one possible tool used with an LLM after the fact.
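
To make the vector point concrete, here is a minimal sketch (model choice illustrative) that prints what an embedding actually is, just a long array of floats:

```python
# An embedding is simply a long list of floating-point numbers.
# Assumes: `pip install openai` and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-small",   # illustrative model choice
    input="A single sentence to embed",
)
vector = response.data[0].embedding
print(len(vector))   # dimensionality, e.g. 1536 for this model
print(vector[:5])    # the first few numbers in the vector
```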


If you are discussing ChatGPT: all of this is transparent to you in OpenAI’s chatbot on the web. You can upload a file, and ChatGPT does use document extraction, chunking, and semantic search technology that the AI can invoke as a knowledge search, instead of the entire document being provided to the AI model.

In ChatGPT, you don’t have a choice of embeddings (except in the edge case of creating a “GPT” in ChatGPT Plus, a misuse of the original name for a custom-prompted agent, which can use actions (functions) to call an internet API you provide).


For Retrieval-Augmented Generation (RAG), the pipeline looks like this (a code sketch follows the steps):

  1. Source Knowledge:
    You start with a body of information, often in the form of documents, articles, or any structured data that you want the AI to reference. This data acts as your knowledge base.

  2. Breaking it into Snippets:
    The knowledge is broken down into smaller, manageable chunks—like sections of a document or web pages. These smaller pieces make it easier for the AI to retrieve relevant information later, rather than dealing with an entire document at once.

  3. Embedding the Text:
    Each chunk of text is passed through an embedding model. The embedding process converts the text into a vector—a numerical representation that captures the semantic meaning of the text. These vectors are essentially lists of values in a multi-dimensional space, where similar texts end up closer to each other.

  4. Storing in a Vector Database:
    The resulting vectors, along with their corresponding text snippets, are stored in a vector database. This specialized database is designed to efficiently store and retrieve these vectors.

  5. Similarity Search via Embeddings:
    When you need to retrieve relevant information, a new piece of text—such as a user query or input—is also embedded, generating its own vector. The vector database then compares this new vector with all the stored vectors, quickly identifying the most similar pieces of text based on their proximity in the vector space.

  6. Top Results for AI Context:
    The top-ranked results from this similarity search are fed to the AI, expanding its context window with the most relevant information. This allows the AI to answer queries with more precise, informed responses, as it now has access to external, domain-specific knowledge that supplements its base training.
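
Tying steps 1 through 6 together, here is a minimal end-to-end sketch, assuming the openai SDK, with a plain NumPy array standing in for a real vector database (the documents, model names, and question are all illustrative):

```python
# Minimal RAG loop: chunk -> embed -> store -> search -> answer.
# Assumes: `pip install openai numpy` and OPENAI_API_KEY in the environment.
import numpy as np
from openai import OpenAI

client = OpenAI()
EMBED_MODEL = "text-embedding-3-small"
CHAT_MODEL = "gpt-4o-mini"

# Steps 1-2. Source knowledge, already split into snippets.
chunks = [
    "Our support line is open 9am-5pm on weekdays.",
    "Refunds are processed within 14 days of a return.",
    "Premium members get free shipping on all orders.",
]

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts and L2-normalize the vectors."""
    response = client.embeddings.create(model=EMBED_MODEL, input=texts)
    vectors = np.array([item.embedding for item in response.data])
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

# Steps 3-4. Embed the chunks and "store" them (here, an in-memory array).
store = embed(chunks)

# Step 5. Embed the query and rank chunks by cosine similarity.
question = "How long do refunds take?"
query = embed([question])[0]
scores = store @ query
top = scores.argsort()[::-1][:2]          # take the two best-matching chunks

# Step 6. Feed the top results to the chat model as added context.
context = "\n".join(chunks[i] for i in top)
reply = client.chat.completions.create(
    model=CHAT_MODEL,
    messages=[
        {"role": "system", "content": f"Answer using this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(reply.choices[0].message.content)
```

In production you would swap the in-memory array for a vector database and tune the chunking, but the flow is exactly the six steps above.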

Putting It All Together:

This setup—using RAG—enables you to create an AI-powered system that combines general language understanding with specialized, up-to-date knowledge. You can use various APIs and models for this process, choosing an embedding provider that fits your needs, along with the vector database software or service of your choice to handle efficient storage and retrieval. The flexibility in your choices allows you to tailor the system for specific applications or domains.