An empty list was declared:
prompt_messages = []
Then, when the method process_gpt() was invoked
to make the first API call (perhaps to have the bot say hello automatically), two list items were added, equivalent to:
prompt_messages = [
    {"role": "system", "content": system_msg},
    {"role": "user", "content": file_prompt}
]
The method then appends the AI response as another message, giving the list a total of three items:
prompt_messages = [
    {"role": "system", "content": system_msg},
    {"role": "user", "content": file_prompt},
    {"role": "assistant", "content": nlp_results}
]
This means we can use list indexes to access them, as shown in a Python shell:
>>> prompt_messages[0]
{"role": "system", "content": system_msg}
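For concreteness, here's a small runnable sketch of that growth; the placeholder strings are assumptions standing in for the real system_msg, file_prompt, and nlp_results values, which aren't shown in the original code:

```python
# Toy reconstruction of how the history grows, with placeholder
# strings in place of the real system_msg / file_prompt / nlp_results.
system_msg = "You are a helpful assistant."
file_prompt = "Summarize this file."
nlp_results = "Here is the summary..."

prompt_messages = []
prompt_messages.append({"role": "system", "content": system_msg})
prompt_messages.append({"role": "user", "content": file_prompt})
prompt_messages.append({"role": "assistant", "content": nlp_results})

print(prompt_messages[0]["role"])  # prints system
print(len(prompt_messages))        # prints 3
```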
After calling a different method, bot(), which sends in much the same manner, another user message and AI reply are appended:
prompt_messages = [
    {"role": "system", "content": system_msg},
    {"role": "user", "content": file_prompt},
    {"role": "assistant", "content": nlp_results},
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": nlp_results}
]
and so on.
That’s all well and good, but what if we want to trim off some old messages? Then the system message disappears too, unless we first remove or copy it, or do some extra bookkeeping. And what if a user input failed to get a result? The method has already added it to the history as if it had succeeded.
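To illustrate the first pitfall: a naive slice that keeps only recent messages throws away the system message at index 0 (placeholder contents for illustration):

```python
# A history with a system message followed by two conversation turns.
prompt_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "turn 1"},
    {"role": "assistant", "content": "reply 1"},
    {"role": "user", "content": "turn 2"},
    {"role": "assistant", "content": "reply 2"},
]
trimmed = prompt_messages[-2:]  # keep only the two newest messages
# the system message at index 0 did not survive the cut
print(any(m["role"] == "system" for m in trimmed))  # prints False
```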
Therefore it makes much more sense to keep the persistent system prompt separate from the message history, and the history separate from the provisional user input. A single API-calling method can then serve every case.
Let’s define the system message globally, and give the past chat a better name than “prompt messages”, since it also contains replies and even function calls:
import openai  # import the whole library, for error types, etc.

system = [{"role": "system", "content":
           "You are ChatExpert, a large language model AI assistant"}]
chat = []
Now, how about a single function that can take a user input, instead of two functions that are almost the same? Notice that where the messages are sent below, the separate lists are simply concatenated: the system message, the past chat limited by the slice chat[-10:]
(a start position measured from the end, capping the number of past turns), and finally the newest input.
def chat_with_ai(user_input):
    user = [{"role": "user", "content": user_input}]
    client = openai.Client()
    try:
        response = client.chat.completions.with_raw_response.create(
            messages=system + chat[-10:] + user,  # chat history is limited
            model="gpt-3.5-turbo", max_tokens=256,
            temperature=0.5, top_p=0.5, stream=True)
        reply = ""
        for chunk in response.parse():
            if not chunk.choices[0].finish_reason:
                word = chunk.choices[0].delta.content or ""
                reply += word
                print(word, end="")
                # here you'd also collect tool or function call chunks
        chat.append({"role": "user", "content": user_input})
        chat.append({"role": "assistant", "content": reply})
        return True
    except Exception as e:
        print(f"\nAn error occurred: {e}")
        return False
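A side note on the slice: chat[-10:] is safe even early in the conversation, because a negative-start slice on a list shorter than ten items simply returns the whole list rather than raising an error:

```python
# A chat history with only one completed turn.
chat = [{"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"}]

print(chat[-10:] == chat)  # prints True: short lists come back whole
print(len(chat[-10:]))     # prints 2
```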
So we have a simple solution that maintains the full chat but only sends part of it when it grows too long. You could change the length sent at any time. It doesn’t, however, check whether the messages add up to more tokens than the AI model can handle.
Streaming is used, and the raw response also gives access to the headers. Where the function prints, that output could instead be streamed to a UI or client app.
The new user input and assistant output are only appended to the chat history upon success. Let’s use that success status creatively in our main chatbot loop (not pictured in the original post) so you can even retry a failed request.
first_success = chat_with_ai("Welcome the user to the chatbot")
last_input = ""
while True:
    user_input = input("\nPrompt: ")
    if user_input == "exit":
        break
    if user_input == "Y" and last_input:
        user_input = last_input
    success = chat_with_ai(user_input)
    last_input = user_input  # remember it so 'Y' can resend, even after failure
    if not success:
        print("Enter only 'Y' to resend, or type a new prompt")
The original poster wanted the AI to say something first, so a single line before the loop does that - or reports an error before anything is typed.
If there were a user manual, it would tell you that typing exit
as your prompt exits the loop and the program, and that entering only an upper-case Y
resends the previous input at any time (and the AI only sees you sending the same thing twice when the prior attempt succeeded, since failed inputs never enter the history).
Improvements for you (since this isn’t the “learn to code” forum):
Where chat[-10:]
is now, a function call such as chat_to_send(chat, budget, turns)
would let better or changing chat-limit settings be applied each turn.
You can have a much smarter function that prepares the portion of the total chat history to send.
- It can count the tokens of each message with the tiktoken
library, account for an overhead of about 4 extra tokens per message, and store that metadata along with the message.
- It can keep a token budget of the maximum tokens you want to send, also allowing for the user input, the system message, and any reservation for the response, adding chat history turns further back until the budget would be exceeded.
- It can ensure that pairings of user inputs and outputs stay intact when tailing the chat.
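Here is one possible sketch of such a helper. The chat_to_send name and signature come from the suggestion above; the count_tokens function is a crude characters-per-token stand-in that you would replace with real tiktoken counts:

```python
def count_tokens(text):
    # Rough estimate: ~4 characters per token. For real counts, use
    # tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text).
    return max(1, len(text) // 4)

def chat_to_send(chat, budget, turns=10):
    """Walk backwards through the history, adding whole user/assistant
    pairs until the token budget or the turn limit would be exceeded."""
    selected = []
    spent = 0
    pairs = 0
    i = len(chat)
    while i >= 2 and pairs < turns:
        pair = chat[i - 2:i]
        # +4 tokens of per-message overhead, as noted above
        cost = sum(count_tokens(m["content"]) + 4 for m in pair)
        if spent + cost > budget:
            break
        selected = pair + selected  # prepend, preserving order
        spent += cost
        pairs += 1
        i -= 2
    return selected

# Three turns of 40-character messages; each pair costs 28 tokens
# under the crude counter, so a 60-token budget keeps only the
# two most recent pairs.
history = [
    {"role": "user", "content": "a" * 40},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
    {"role": "assistant", "content": "d" * 40},
    {"role": "user", "content": "e" * 40},
    {"role": "assistant", "content": "f" * 40},
]
to_send = chat_to_send(history, budget=60)
print(len(to_send))  # prints 4
```

The call site in chat_with_ai would then use system + chat_to_send(chat, budget, turns) + user instead of the fixed slice.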