I assume this is how embeddings work, and that OpenAI is only needed to embed text and to generate the final answer from the matched chunk - i.e. steps 2, 4 and 8 below. Is that right?
1. Take the blog posts and cut them into smaller parts of roughly 512 tokens at most - these parts are the "chunks" of natural language.
2. Embed each natural language chunk via OpenAI's embeddings API - this produces an "embedding vector", a list of numbers that represents the chunk.
3. Insert each embedding vector into a vector database, with the related natural language chunk indexed against it (see the first sketch after this list).
4. Create a front-end app that takes a user prompt and embeds it, again via OpenAI's embeddings API.
5. Have the front-end app search the user prompt's embedding vector against all embedding vectors in the vector database.
6. Use a cosine similarity function to measure how similar the user prompt is to each natural language chunk in the database - the comparison is done on the embedding vectors, not on the natural language text (see the second sketch below).
7. When the stored vector most similar to the user prompt's vector is found, have the front-end app look up the natural language chunk indexed against it.
8. Have the front-end app send the retrieved natural language chunk to OpenAI via the GPT-3.5 chat model - the request includes a system message, the original user prompt, and the relevant blog post chunk (see the third sketch below).
9. Have the front-end take the output of the GPT-3.5 chat model and deliver it back to the user.
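
To check my understanding, here is a minimal sketch of what I picture for steps 1-3, assuming the openai Python library (v1 client) with an API key in the environment, text-embedding-ada-002 as the embedding model, and a plain Python list standing in for a real vector database - the names (chunk_text, embed, vector_store) are just mine:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chunk_text(text: str, max_words: int = 350) -> list[str]:
    # Crude word-based splitter; ~350 words stays roughly under a 512-token budget.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text: str) -> list[float]:
    # One embedding vector (a list of floats) per input string.
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return response.data[0].embedding

# Stand-in for a vector database: each entry pairs an embedding vector with its chunk.
vector_store = []
for post in ["...blog post one...", "...blog post two..."]:
    for chunk in chunk_text(post):
        vector_store.append((embed(chunk), chunk))
```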
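
And for steps 4-7, a sketch of the cosine similarity search over the stored vectors, reusing embed() and vector_store from the sketch above (a real vector database would run this search for you):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity = dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def best_chunk(user_prompt: str) -> str:
    # Embed the prompt, then compare its vector against every stored vector;
    # return the natural language chunk paired with the closest vector.
    query_vector = embed(user_prompt)  # embed() and vector_store come from the first sketch
    _, chunk = max(vector_store, key=lambda pair: cosine_similarity(query_vector, pair[0]))
    return chunk
```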
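
Finally, steps 8-9 as I picture them: pass the retrieved chunk plus the original user prompt to gpt-3.5-turbo and return its output. The system message wording, the prompt layout, and the example question are just illustrative:

```python
def answer(user_prompt: str) -> str:
    chunk = best_chunk(user_prompt)  # best_chunk() and client come from the earlier sketches
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            # Illustrative system message; the exact wording is a design choice.
            {"role": "system", "content": "Answer using only the provided blog post excerpt."},
            {"role": "user", "content": f"Excerpt:\n{chunk}\n\nQuestion: {user_prompt}"},
        ],
    )
    return response.choices[0].message.content

print(answer("How do I set up a custom domain?"))
```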