I’m trying to set up a ChatGPT-like bot, but the cost of each API call adds up really fast, since I need to send the full chat history back to the API so the model has the context of its own conversation.
What are the best practices to make less expensive API calls?
Welcome to the OpenAI community @Gabriel.ZO
To achieve that, you can summarize the previous conversations, store them, and then send only the parts that are contextually relevant to the user’s message, using embeddings to find them.
This greatly reduces the token count, which otherwise grows with every message.
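Here’s a rough Python sketch of the retrieval half of that idea. The embedding vectors below are made up for illustration (real ones come back from the embeddings endpoint and have ~1536 dimensions); the point is just that you compare the user message’s embedding against your stored ones with cosine similarity and keep the closest matches:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def most_relevant(query_embedding, stored, top_k=2):
    """Return the top_k stored texts whose embeddings are closest to the query."""
    scored = sorted(
        stored,
        key=lambda item: cosine_similarity(query_embedding, item["embedding"]),
        reverse=True,
    )
    return [item["text"] for item in scored[:top_k]]

# Toy 3-dimensional embeddings, purely for illustration:
history = [
    {"text": "The butler mentioned the locked study.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "We discussed the weather at length.",    "embedding": [0.0, 0.2, 0.9]},
    {"text": "A key was found under the study rug.",   "embedding": [0.8, 0.3, 0.1]},
]
query = [0.85, 0.2, 0.05]  # pretend this is the embedding of "tell me about the study"
print(most_relevant(query, history))  # the two study-related lines come back first
```

You then prepend only those retrieved snippets (or a summary of them) to the prompt instead of the whole transcript.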
Sorry to piggyback, but this is something I’m working on as well. Can you elaborate on when/how to summarize? And is there some sort of dummies’ guide to embeddings? They seem to be the go-to answer around here, but apparently their use is way over my head. All the examples are in Python (which I do know a bit), but I’m using the orhanerday PHP library, and I’m just not sure what to put in the input, what to do with all the numbers it gives me back, or how that helps in any way. I looked at the classification example, and it involves CSVs and comes back with a graph; how does that tell me whether the user is asking for a search warrant or has found out about the secret affair? (I’m making a murder mystery game.) I’ve currently got that part handled, probably not in the most cost-efficient way, but it will do for now.
Like the OP, though, I’m sending the whole conversation history in the messages array. I start with a system message that outlines the personality and constraints of the character, plus some backstory, and then it’s just user message, assistant message, alternating, so every call gets steadily more expensive as the history grows. The context is important, as it allows the character to respond appropriately based on things it has already said or things the user has previously asked.
I’m already storing every message in the database with the role and content.
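For what it’s worth, since the messages are already stored as role/content pairs, one stopgap I could imagine (before doing proper summarization) is trimming what gets sent to a token budget: always keep the system message, then walk backwards from the newest message until the budget is full. This is only a sketch; the token count here is a crude length-based approximation, and a real tokenizer would be more accurate:

```python
def approx_tokens(message):
    """Very rough token estimate: ~4 characters per token, plus per-message overhead."""
    return len(message["content"]) // 4 + 4

def trim_history(messages, budget=3000):
    """Keep the system message plus the most recent messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(approx_tokens(m) for m in system)
    kept = []
    # Walk backwards so the newest messages survive.
    for m in reversed(rest):
        cost = approx_tokens(m)
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))

# Example: a system message plus five long user messages, with a tiny budget.
msgs = [{"role": "system", "content": "You are the butler."}]
msgs += [{"role": "user", "content": str(i) * 40} for i in range(5)]
print(trim_history(msgs, budget=40))  # system message + the newest messages that fit
```

It loses old context (which is exactly what the summarize-and-retrieve approach is meant to fix), but it does cap the per-call cost.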