I’m wondering how everyone sends the prompt to the server along with the chat history.
What I did was concatenate each question and answer, separated by \n.
The first prompt is sent as is; each following prompt looks like this:
Q\nA\nQ\nA\n … Q
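A minimal Python sketch of that concatenation, assuming the history is kept as a list of (question, answer) pairs. `build_prompt` is a hypothetical helper name, not part of any API.

```python
def build_prompt(history: list[tuple[str, str]], question: str) -> str:
    """Join prior (question, answer) pairs and the new question with
    newlines, giving the Q\nA\nQ\nA\n ... Q layout.
    With an empty history the question is returned as is."""
    parts: list[str] = []
    for q, a in history:
        parts.append(q)
        parts.append(a)
    parts.append(question)
    return "\n".join(parts)


history = [("What is 2+2?", "4"), ("And times 3?", "12")]
print(build_prompt(history, "Minus 5?"))
```

Every earlier pair is resent on each call, so the prompt grows with every turn.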
If we need to pass the previous prompts into the next API call (to preserve the context of the conversation),
doesn't that mean we will quickly hit the token limit?
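A rough estimate of how fast the limit arrives, assuming every question-and-answer pair costs about $t$ tokens, each new question about $q$ tokens, and the limit is $L$ tokens (all three are assumptions for illustration). The prompt for the $n$-th question then carries $n-1$ earlier pairs:

```latex
T_n \approx (n-1)\,t + q \le L
\quad\Longrightarrow\quad
n \le \frac{L - q}{t} + 1
```

For example, with $t = 200$, $q = 50$ and $L = 4096$, that gives $(4096 - 50)/200 + 1 \approx 21$ questions before the resent history alone fills the window, and fewer if the reply has to fit in the same budget.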
Second question: is this how we break through the single-call token limit?
You will need to write code as a developer to manage your “conversation” messages and to trim or summarize your content as you approach the max_tokens limit.
Yes, this is a current limitation (4096 tokens), which I guess will be raised at some point in the future.
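One way to do the trimming, as a sketch: drop the oldest question/answer pairs until the joined prompt fits a token budget. `estimate_tokens` and `trim_history` are hypothetical helpers, and the 4-characters-per-token rule is only a rough assumption; a real tokenizer (for example OpenAI's tiktoken library) gives exact counts. Summarizing old turns instead of dropping them is the other option mentioned above and is not shown here.

```python
import math


def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token. Not exact;
    # swap in a real tokenizer for precise counts.
    return math.ceil(len(text) / 4)


def trim_history(history: list[tuple[str, str]], question: str,
                 max_prompt_tokens: int) -> list[tuple[str, str]]:
    """Drop the oldest (question, answer) pairs until the newline-joined
    prompt (history plus new question) fits within max_prompt_tokens.
    The new question itself is never dropped."""
    trimmed = list(history)
    while trimmed:
        parts = [text for pair in trimmed for text in pair] + [question]
        if estimate_tokens("\n".join(parts)) <= max_prompt_tokens:
            break
        trimmed.pop(0)  # discard the oldest exchange first
    return trimmed
```

If the limit covers the prompt and the reply together, pass a budget below 4096, e.g. `trim_history(history, question, 4096 - 1000)` to leave roughly 1000 tokens for the answer.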
I’m off to the gym soon, but I’ve almost finished a first draft of adding this chat method to my lab. Here is how I manage it:
Assign a “chatid” to each new chat session, or continue from a prior chatid.
Save each chat completion as a new row in the DB, creating a new chatid for a new chat or reusing the existing chatid if an existing chat was selected.
Include a “description” column in the DB row so we can remember what each chat is about, and use it along with the chatid to select a prior chat session.
Add token-usage columns to each DB row (for the chat content), total them for each session, and use that number to decide how to trim or manage the session as you get close to the max_tokens limit.
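A minimal sketch of those DB steps using Python's built-in sqlite3 module. The table name, column names and helper functions are hypothetical. It also assumes each row's `usage_tokens` holds the tokens for that row's own question and answer; if you store the API's full per-call usage instead, each call already includes the resent history, so summing across rows would overcount and the latest row's value is the better estimate.

```python
import sqlite3
import uuid
from typing import Optional

# Hypothetical schema: one row per chat completion, grouped by chatid.
SCHEMA = """
CREATE TABLE IF NOT EXISTS chat_rows (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    chatid       TEXT NOT NULL,
    description  TEXT,
    question     TEXT NOT NULL,
    answer       TEXT NOT NULL,
    usage_tokens INTEGER NOT NULL DEFAULT 0  -- tokens for this row's Q + A only
)
"""


def save_completion(conn: sqlite3.Connection, question: str, answer: str,
                    usage_tokens: int, chatid: Optional[str] = None,
                    description: Optional[str] = None) -> str:
    """Store one completion as a new row; start a new chat if no chatid is given."""
    if chatid is None:
        chatid = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO chat_rows (chatid, description, question, answer, usage_tokens) "
        "VALUES (?, ?, ?, ?, ?)",
        (chatid, description, question, answer, usage_tokens),
    )
    conn.commit()
    return chatid


def session_usage(conn: sqlite3.Connection, chatid: str) -> int:
    """Total the stored token usage for one chat session (0 if none)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(usage_tokens), 0) FROM chat_rows WHERE chatid = ?",
        (chatid,),
    ).fetchone()
    return row[0]


def session_history(conn: sqlite3.Connection, chatid: str) -> list:
    """Return the session's (question, answer) pairs in insertion order."""
    return conn.execute(
        "SELECT question, answer FROM chat_rows WHERE chatid = ? ORDER BY id",
        (chatid,),
    ).fetchall()


def list_chats(conn: sqlite3.Connection) -> list:
    """List (chatid, description) pairs for picking a prior session."""
    return conn.execute(
        "SELECT chatid, MAX(description) FROM chat_rows GROUP BY chatid"
    ).fetchall()
```

Usage: `conn = sqlite3.connect("chats.db")`, `conn.execute(SCHEMA)`, then `cid = save_completion(conn, q, a, n, description="...")` for the first turn and `save_completion(conn, q, a, n, chatid=cid)` afterwards; compare `session_usage(conn, cid)` against your limit before the next call.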
To be honest, I won’t finish this last piece until later tonight or tomorrow, as I’m heading to the gym soon to pump up my muscles, haha.
I’m not sure about this, as I have not implemented the stream option for chat yet, but I will later on.
HTH
So far today, only this part (below) is done. I need to tweak it a bit and finish the dialog/session management part later. Off to the gym…