Saving API cost in a back-and-forth conversational chatbot



I have built a virtual assistant chatbot for casual conversation. Every time the user utters a statement, it is appended to the prompt and the GPT-3 API is asked to provide the bot’s response.

However, as the conversation progresses, the whole history is resent with every request, so the cost per statement grows linearly and the total cost of the conversation grows as O(n^2). This method is not viable for longer conversations, say 5 minutes. I can think of shortening the prompt by summarizing the history, but I am wondering if there is a native way to do that, because with every utterance I am passing the same prompt GPT-3 has already seen. It would be more efficient to save that embedding and continue by adding only the new text.
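To make the growth concrete, here is a rough back-of-the-envelope sketch. It assumes every utterance is about the same length (`tokens_per_utterance` is a made-up figure) and it ignores the tokens in the generated replies, so treat the numbers as illustrative only.

```python
def prompt_tokens_per_turn(turns, tokens_per_utterance):
    """Prompt size sent at each turn when the full history is resent.

    Assumption: every utterance has roughly the same token count.
    """
    return [k * tokens_per_utterance for k in range(1, turns + 1)]


def total_prompt_tokens(turns, tokens_per_utterance):
    """Total prompt tokens billed over the whole conversation."""
    return sum(prompt_tokens_per_turn(turns, tokens_per_utterance))


# Each turn resends everything so far: 20, 40, 60, ... tokens.
per_turn = prompt_tokens_per_turn(3, 20)

# The running total is t * n * (n + 1) / 2, i.e. quadratic in the number of turns.
total = total_prompt_tokens(4, 10)
```

Doubling the number of turns roughly quadruples the total prompt tokens, which is where the O(n^2) comes from.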

Welcome to the community!

I’m no dev, but I do agree that, since embedding is so much more efficient, somehow caching and embedding the history might be a way to save resources?

Also, you might get better responses to posts like this in the #feedback category

From my experience thus far, creating a chatbot via the API with fine-tuning and embeddings is a bit of a hill to climb. For a start, you only have, at best, the Davinci base model to work with; you can’t add on to the 003 model (which is the base model but fine-tuned with gahoomabytes of info).

For my application I am in the same boat. The best approach I’ve found is to summarise the conversation on my side rather than repeat it verbatim. A lengthy thread where John wants to know the price of milk, I summarise into a one-shot prompt. You can achieve this either by coding it yourself or by using a side application of AI and asking it to summarise the chat in n sentences. Then pass the summary as a one-shot prompt. Is it perfect? No. Does it save on tokens? Yes.
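A minimal sketch of that summarise-on-my-side idea, in case it helps. Everything here is an assumption for illustration: `RollingMemory`, `keep_last`, and the `summarize` callable are hypothetical names, and `summarize` stands in for whatever you use to condense text (your own code, or a separate model call asking for an n-sentence summary). It is not an official API feature.

```python
from typing import Callable, List


class RollingMemory:
    """Keeps the last few turns verbatim and folds older ones into a summary.

    `summarize` is a hypothetical callable supplied by the caller.
    """

    def __init__(self, summarize: Callable[[str], str], keep_last: int = 4):
        self.summarize = summarize
        self.keep_last = keep_last
        self.summary = ""
        self.recent: List[str] = []

    def add(self, speaker: str, text: str) -> None:
        self.recent.append(f"{speaker}: {text}")
        cut = len(self.recent) - self.keep_last
        if cut > 0:
            # Turns that fall out of the verbatim window get summarised,
            # together with the existing summary so nothing is dropped.
            overflow = self.recent[:cut]
            self.recent = self.recent[cut:]
            pieces = ([self.summary] if self.summary else []) + overflow
            self.summary = self.summarize("\n".join(pieces))

    def prompt(self) -> str:
        """Builds the prompt: summary first, then recent turns, then the cue."""
        parts = []
        if self.summary:
            parts.append("Summary of the conversation so far: " + self.summary)
        parts.extend(self.recent)
        parts.append("Bot:")
        return "\n".join(parts)


# Stand-in summariser so the sketch runs without any API call.
def fake_summarize(text: str) -> str:
    return f"[{len(text.splitlines())} lines summarised]"


memory = RollingMemory(fake_summarize, keep_last=2)
memory.add("User", "How much is milk?")
memory.add("Bot", "About a dollar a litre.")
memory.add("User", "And eggs?")
print(memory.prompt())
```

The prompt then stays roughly the same size no matter how long the chat runs, at the price of one extra summarisation step whenever old turns are folded in.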

Good luck