Is there a good way to reduce token usage with a distributed prompting structure? I'm building a professional customer service bot with GPT-4, and I can't teach the model everything it needs to know on every single input, because the API apparently has no memory. Is anyone familiar with this kind of structure?
I hope I understand the point right, but my reading is that you are looking to retrieve information in order to respond to a customer query?
This is where you'd use embeddings-based Q&A. You store your knowledge in the form of embedding vectors. When you get a user query, you convert the query into a vector as well and perform a vector search against the stored knowledge vectors, retrieving the ones most closely related to the query, which are therefore the most likely to contain the answer. You then inject the information associated with those vectors as context into the API call to produce a response to the user.
You should have a look at this OpenAI cookbook example to better understand the overall logic of the approach, and also read up on OpenAI's embeddings documentation.
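To make the flow concrete, here's a minimal sketch in Python. It assumes the openai v1.x client and numpy; the knowledge strings, the `embed`/`top_matches`/`answer` helpers, and the choice of `text-embedding-3-small` are illustrative assumptions on my part, not a definitive implementation:

```python
import numpy as np
from openai import OpenAI  # assumes the openai python library, v1.x

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Pre-compute embeddings for your knowledge base (done once, offline).
#    These example strings are placeholders for your real support content.
knowledge = [
    "Our support hours are 9am-5pm EST, Monday through Friday.",
    "Refunds are processed within 5 business days of approval.",
    "Premium plans include priority phone support.",
]

def embed(texts):
    """Return one embedding vector per input string."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

knowledge_vectors = embed(knowledge)

# 2. At query time: embed the query and find the closest knowledge entries.
def top_matches(query, k=2):
    q = embed([query])[0]
    # OpenAI embeddings are unit-normalized, so a dot product
    # gives cosine similarity directly.
    scores = knowledge_vectors @ q
    return [knowledge[i] for i in np.argsort(scores)[::-1][:k]]

# 3. Inject only the retrieved context into the chat call,
#    instead of sending the entire knowledge base every time.
def answer(query):
    context = "\n".join(top_matches(query))
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content

print(answer("How long do refunds take?"))
```

In production you'd typically swap the in-memory dot product for a proper vector database, but the retrieve-then-inject pattern stays the same, and it's what keeps your token usage down.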
Thank you sooooo much~ Great to know~