Chat Completions or Completions endpoint for multi-turn conversations?

Hi Everyone!

This post is about how to overcome memory loss across multiple turns of Q&A while still being able to pull in external context.

There are two ways of doing this:

  1. Chat Completions API, where you send the full list of messages with every request. Each user message contains a {Context} and a {Question}.
  2. Completions API, where you keep the history as one concatenated string, formatted along the lines of History : {Role} : {Content}. (A rough sketch of both is below.)
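To make the two options concrete, here is a minimal Python sketch of both using the OpenAI SDK. The model names and the way {Context} is obtained are placeholders for illustration, not my actual setup:

```python
from openai import OpenAI

client = OpenAI()

# Option 1: Chat Completions API - resend the whole message list every turn.
# Each user turn is a template combining retrieved context and the question.
history = [{"role": "system", "content": "Answer using the provided context."}]

def ask_chat(context: str, question: str) -> str:
    history.append({"role": "user", "content": f"Context: {context}\nQuestion: {question}"})
    response = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

# Option 2: Completions API - keep the history as one concatenated string.
transcript = ""

def ask_completion(context: str, question: str) -> str:
    global transcript
    transcript += f"History : user : Context: {context} Question: {question}\n"
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=transcript + "History : assistant :",
        max_tokens=300,
    )
    answer = response.choices[0].text.strip()
    transcript += f"History : assistant : {answer}\n"
    return answer
```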

For my scenario, I tried option 1 first, since it is the newer approach. But I ran into memory loss depending on the wording of my latest question. For example, if I ask it to summarize the conversation, GPT remembers the conversation. But if I ask "what was my last message?", it doesn't know.
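In case it helps anyone compare, my guess (only an assumption, not a confirmed diagnosis) is that the difference comes down to which earlier turns actually make it into the request. Illustrative payloads:

```python
# Assumption: only the latest templated turn gets sent, so earlier user turns are gone.
# "Summarize the conversation" can still work if a summary is packed into {Context},
# but "what was my last message?" has nothing verbatim to point at.
latest_only = [
    {"role": "user",
     "content": "Context: <retrieved summary of the chat>\nQuestion: What was my last message?"},
]

# If the full message list is resent each turn, the last user message is literally there.
full_history = [
    {"role": "user", "content": "Context: <docs>\nQuestion: How do I reset my password?"},
    {"role": "assistant", "content": "Go to Settings > Security and choose Reset password."},
    {"role": "user", "content": "What was my last message?"},
]
```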

I am not trying to find the single best formula here, just curious whether anyone has come across this before, and what your opinion is.

For more details, see the semantic-kernel repo, issue 4707. (Don't know why I could not add links here…)