Preparing the dataset for embeddings

Process:

  1. Break the document into chunks (embedding models have token limits).
  2. Create an embedding for each chunk with the OpenAI API.
  3. Store the embeddings and their source text in a vector database.
  4. Build an application to query the data.
  5. Create embeddings for incoming queries in real time.
  6. Display the matching results.
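Steps 2 through 6 can be sketched in a few lines of Python. This is a minimal sketch, not the poster's code. It assumes the `openai` Python SDK (v1.x) and the `text-embedding-3-small` model. It also uses a hypothetical `InMemoryVectorStore` that does brute-force cosine similarity as a stand-in for a real vector database. The embedder is passed in as a function, so the store runs without network access when given a fake embedder.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# An embedder maps a batch of texts to one vector per text.
Embedder = Callable[[List[str]], List[List[float]]]


def openai_embedder(model: str = "text-embedding-3-small") -> Embedder:
    """Steps 2 and 5: embed texts with the OpenAI API (openai>=1.0 SDK assumed)."""
    from openai import OpenAI  # imported lazily so the rest works without the SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def embed(texts: List[str]) -> List[List[float]]:
        resp = client.embeddings.create(model=model, input=texts)
        return [item.embedding for item in resp.data]

    return embed


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database (step 3)."""

    embed: Embedder
    texts: List[str] = field(default_factory=list)
    vectors: List[List[float]] = field(default_factory=list)

    def add(self, chunks: List[str]) -> None:
        # Steps 2 + 3: embed each chunk once and keep its text next to its vector.
        if not chunks:
            return
        self.vectors.extend(self.embed(chunks))
        self.texts.extend(chunks)

    def query(self, question: str, k: int = 3) -> List[Tuple[float, str]]:
        # Step 5: embed the query at request time, then rank stored chunks by similarity.
        [q] = self.embed([question])
        scored = [(cosine(q, v), t) for v, t in zip(self.vectors, self.texts)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

For step 6, a query application might do something like `store = InMemoryVectorStore(embed=openai_embedder())`, `store.add(chunks)`, then `for score, text in store.query("How do I reset my password?"): print(f"{score:.3f}  {text}")`. A real vector database replaces the linear scan with an approximate nearest-neighbour index, but the shape of the calls stays the same.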

The most challenging part is preparing the data. My advice: pay attention to token limits when creating chunks.
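One way to stay under token limits is to count tokens rather than characters when chunking. The sketch below uses a hypothetical `chunk_by_tokens` helper that slides a fixed-size token window with some overlap. It takes the tokenizer's encode/decode functions as arguments. The `tiktoken` library and its `cl100k_base` encoding (used by OpenAI's recent embedding models) are assumptions. The `max_tokens=500` and `overlap=50` defaults are arbitrary illustrative values.

```python
from typing import Callable, List, Sequence, Tuple


def chunk_by_tokens(
    text: str,
    encode: Callable[[str], List[int]],
    decode: Callable[[Sequence[int]], str],
    max_tokens: int = 500,
    overlap: int = 50,
) -> List[str]:
    """Split text into chunks of at most max_tokens tokens, overlapping by `overlap`."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    tokens = encode(text)
    step = max_tokens - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(decode(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


def tiktoken_codec(encoding_name: str = "cl100k_base") -> Tuple[Callable, Callable]:
    """Return (encode, decode) from tiktoken; assumes tiktoken is installed."""
    import tiktoken

    enc = tiktoken.get_encoding(encoding_name)
    return enc.encode, enc.decode
```

Usage might look like `encode, decode = tiktoken_codec()` followed by `chunks = chunk_by_tokens(doc_text, encode, decode)`. A fixed window can cut a sentence in half, which is why the sketch overlaps consecutive chunks. Splitting on paragraph or sentence boundaries first and then enforcing the token cap is a common refinement.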
