How can I use embeddings with GPT-3.5 Turbo?

Hello everyone,

Using the Embeddings API with Davinci was straightforward: you simply added the embeddings results to the prompt parameter along with the chat history, the user's question, and so on. However, with the new format of the Chat Completions API, I'm having trouble figuring out how to do it. I tried adding a new object like this: {"role": "context", "content": <embeddings results>} in the 'messages' parameter, but this causes the request to fail with a 400 error.

Does anyone know how to properly use embeddings to expand the model’s knowledge with GPT-3.5 Turbo?

Thank you.


I am also struggling with this. If I find a solution, I will post it here.


I guess one way could be to add it as a user request of the sort:
{"role": "user", "content": "Context: <embedding results>\n\n\n\nQuestion: <user's last question>"}

But this feels a bit hacky, and I'm wondering if there is a better way of doing it.


This is how I did it:

{"role": "system", "content": "You are a virtual tourism advisor, these are your notes: <here comes the text that helps the chatbot to create answers>"},

Yea, I thought about adding it there as well. However, the API documentation states that "In general, gpt-3.5-turbo-0301 does not pay strong attention to the system message, and therefore important instructions are often better placed in a user message."

Everything comes down to the definition of important instructions I guess. In any case, if there is no formal way of doing it I might use a variation of the above. Thanks


The best resource I have found so far is the OpenAI Cookbook. I am checking whether the method described in this post will work with some tweaking: Question answering using embeddings-based search | OpenAI Cookbook

The idea is not to send the embeddings to GPT-3.5 Turbo. Instead, use the embeddings to find the closest text from the document and then send that as part of the prompt.
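That retrieval step can be sketched roughly as follows (the toy 3-d vectors and the chunk texts are made up; real code would obtain the vectors from the Embeddings API):

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def closest_chunk(query_vec: list, chunks: list) -> str:
    """chunks: list of (text, embedding_vector) pairs, precomputed once
    for the document. Returns the text most similar to the query vector."""
    return max(chunks, key=lambda c: cosine_similarity(query_vec, c[1]))[0]

# Toy example with made-up 3-d vectors standing in for real embeddings:
chunks = [("Shipping costs $5.", [0.9, 0.1, 0.0]),
          ("Returns accepted within 30 days.", [0.1, 0.9, 0.0])]
best = closest_chunk([0.8, 0.2, 0.1], chunks)  # → "Shipping costs $5."
```

The selected text (not its vector) is then what goes into the prompt.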


Yea yea, my question is where exactly you put the context (i.e., the closest text found via the embeddings search) in the messages parameter of the Chat Completions API. There are two options so far:

  1. as a user role
  2. as a system role (alongside the instructions)

I am not sure whether either of these options is the official way of doing it. Hell, I am not even sure there is an official way of doing it.

I tried writing it with the "system" role and it worked. On the other hand, I think that if you write it with the "user" role, ChatGPT may respond with "You are right…", and if you write it with the "assistant" role it may reply "As I already said…".
But these are just my thoughts; sooner or later I will find the right solution.


I think I’ll follow your way until a better solution comes up. Thanks

Check this post: ChatGPT API 101 — A Beginner’s Guide | by Skanda Vivek | Mar, 2023 | Towards AI

That's interesting. They pass everything (instructions, context, history) under the "user" role. I wonder if this performs better than also using the "system" and "assistant" roles.

I haven't tried it, but I assume we can use the API the same way we use ChatGPT. It would then look something like this:

{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Read this content and confirm with \"...\" that you understand. Answer all my questions based on this text. TEXT: \"The new Chat API calls gpt-3.5-turbo, the same model used in the ChatGPT product. It's also our best model for many non-chat use cases; we've seen early testers migrate from text-davinci-003 to gpt-3.5-turbo with only a small amount of adjustment needed to their prompts. Learn more about the Chat API in our documentation. Pricing: It's priced at $0.002 per 1K tokens, which is 10x cheaper than the existing GPT-3.5 models.\""},
{"role": "assistant", "content": "I read"},
{"role": "user", "content": "how much does it cost to use the chatgpt api?"}

It works in the Playground, so I assume it can also be used via the API.

Have you found a different way to use embeddings than the role method? What is the best way you have found? I'm having the same problem and am looking for a solution.


Currently I am using the "system" way. I am passing both the instructions and the context through the system message, like so:

{"role": "system", "content": "<Instructions>\n\n\n Context: <context>"}

Although I have not tried all the methods mentioned above, this approach has been consistently effective for my needs.
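For illustration, a minimal sketch of this system-message approach (the instruction text and context string are invented examples; the resulting list is what you would pass as the `messages` parameter):

```python
def make_system_prompt(instructions: str, context: str) -> str:
    """Combine instructions and retrieved context into one system
    message, matching the format shown above."""
    return f"{instructions}\n\n\n Context: {context}"

messages = [
    {"role": "system", "content": make_system_prompt(
        "You are a virtual tourism advisor. Answer only from the context.",
        "The Eiffel Tower is open daily from 9:00 to 23:45.")},
    {"role": "user", "content": "When does the Eiffel Tower open?"},
]
# `messages` is then sent to the Chat Completions API as usual.
```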

It would be nice to receive an official response from someone at OpenAI regarding this question, rather than having to resort to what seem to be non-standard methods.


"In general, gpt-3.5-turbo-0301 does not pay strong attention to the system message, and therefore important instructions are often better placed in a user message." Since the documentation linked above says so, it made more sense for me to use the user role, and it seems I will need to use it exactly like this:

{"role": "user", "content": "Context: <embedding result>\n\n\n Question: <user's last question>"}

If anyone finds a newer method, I would be very glad if they posted it here.


Thanks for the ideas discussed here. The way I understood it is that:

  1. Use embeddings to break knowledge into context-chunks
  2. Find the most relevant context-chunk that corresponds to a query
  3. Pass the context-chunk to gpt-3.5-turbo to generate a human-sounding answer

However, don’t we lose a key feature of 3.5 when we go down this path, i.e. remembrance of the conversation-context? Example: if the first question was “How much is this Adidas shoe?”, the gpt-3.5 will answer: “This shoe costs $35”. But if the follow-on question is “Is it available in blue?”, won’t step #2 fail (i.e. when we want to “find the most relevant context-chunk corresponding to the query”)?

Natively, gpt-3.5 will know what “is it available” means. But once we take on the task of “providing context” to gpt-3.5, doesn’t it become our problem?


It only fails if the software developer fails to properly manage the array of messages: tracking and storing the prior messages in an array, updating that array with new messages, and pruning the entire array based on some pruning strategy.
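One naive pruning strategy (of many possible; real code would usually prune by token count rather than message count) can be sketched like this:

```python
def prune_messages(messages: list, max_turns: int = 6) -> list:
    """Keep the system message(s) plus the most recent `max_turns`
    non-system messages, dropping older conversation turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]

# Build a toy history: one system message plus ten user turns.
history = [{"role": "system", "content": "You are a shoe-store assistant."}]
history += [{"role": "user", "content": f"q{i}"} for i in range(10)]

pruned = prune_messages(history, max_turns=3)
# pruned keeps the system message and the last three user turns (q7, q8, q9)
```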



The issue is not in passing the right array. As part of the system/user message content that we pass to gpt-3.5, we also need to pass the embedding-search output.

For a question like "Is it available in blue?", the embedding-search output may be of poor quality, because the question (which contains only a pronoun) may not match anything relevant in the document (or may match several passages). Passing that to gpt-3.5-turbo might result in GIGO (garbage in, garbage out).


This I do not understand. Please enlighten me.

Embeddings are created for the purpose of performing linear algebra (like the dot product) with another embedding vector.

These language models cannot perform linear algebra, so why would you send embedding vectors to a language model?
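To make the point concrete: the linear algebra happens entirely on your side, and only the retrieved text goes to the model. A toy dot product between embedding-like vectors (4-d vectors standing in for real 1536-dimensional text-embedding-ada-002 vectors):

```python
def dot(a: list, b: list) -> float:
    """Dot product of two embedding vectors; for unit-normalized
    vectors this equals the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

query = [0.5, 0.5, 0.5, 0.5]
doc_a = [0.5, 0.5, 0.5, 0.5]    # same direction as the query
doc_b = [0.5, -0.5, 0.5, -0.5]  # orthogonal to the query

print(dot(query, doc_a))  # 1.0 — very similar
print(dot(query, doc_b))  # 0.0 — unrelated
```

The vectors themselves never appear in the prompt; they only decide which text does.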

