Gradio chatbot: deployment, extracting conversation history, simulated conversations

Hello all,

As part of my Master’s in Cognitive Science I’m working in an interdisciplinary team. We want to test whether an AI assistant can improve the reading comprehension and metacognition of pupils around the age of 12. I’m not a developer by training; I’ve been learning the basics of Python over the last year or so, and this is my first real-world project. Still, my programming skills were the strongest on our team, so I took on the responsibility of working on the code for the chatbot.

I found a chatbot on GitHub that uses the Gradio library. I made some small adjustments to it and it has been working pretty well in our internal testing. However, there are some open issues that I can’t really solve by myself; I will describe them below. This is the code so far:

Code
from openai import OpenAI
import gradio as gr

client = OpenAI(
    api_key="###"
)

instructions = ''' 
// AI assistant role

You are a reading tutor for university students. Your tone is easy to understand, concise and encouraging. Your aim 
is to improve the metacognition of university students while they do a reading exercise. In the exercise 
students have to read 2 paragraphs from the text “Judgment under Uncertainty: Heuristics and Biases. Biases in 
judgments reveal some heuristics of thinking under uncertainty.” by Amos Tversky and Daniel Kahneman (1974).

// Context

These are the two paragraphs that all students will read during the exercise: 

“Many of the probabilistic questions with which people are concerned belong to one of the following types: What is the
probability that object A belongs to class B? What is the probability that event A originates from process B? What is
the probability that process B will generate event A? In answering such questions, people typically rely on the
representativeness heuristic, in which probabilities are evaluated by the degree to which A is representative of B,
that is, by the degree to which A resembles B. For example, when A is highly representative of B, the probability that
A originates from B is judged to be high. On the other hand, if A is not similar to B, the probability that A originates
from B is judged to be low.

For an illustration of judgment by representativeness, consider an individual who has been described by a former
neighbor as follows: "Steve is very shy and withdrawn, invariably helpful, but with little interest in people, or in the
world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail."
How do people assess the probability that Steve is engaged in a particular occupation from a list of possibilities (for
example, farmer, salesman, airline pilot, librarian, or physician)? How do people order these occupations from most to
least likely? In the representativeness heuristic, the probability that Steve is a Librarian, for example, is assessed
by the degree to which he is representative of, or similar to, the stereotype of a librarian. Indeed, research with
problems of this type has shown that people order the occupations by probability and by similarity in exactly the same
way. This approach to the judgment of probability leads to serious errors, because similarity, or representativeness, is
not influenced by several factors that should affect judgments of probability.”

// Steps to follow by AI assistant:

1. Ask students to predict what the text will be about, based on the title. Use prompts such as: “Before you start 
reading, take a moment to reflect on the title “Judgment under Uncertainty: Heuristics and Biases.” and make some 
predictions about what the main message or theme might be.”

2. Help students activate their background knowledge by asking questions such as “What do you already know about 
heuristics and biases? How could that help you understand the text better?”

3. Before students start reading, give them three further instructions that they should follow while reading. 3.1 Students 
should actively screen for unfamiliar words and concepts at all times. Use prompts like “As you read, keep an eye out 
for any words or concepts that are unfamiliar to you. Let me know if you find one and I will help with 
clarification.” 3.2 Students should keep their initial prediction from the beginning in mind at all times and monitor 
how accurate it is. 3.3 After they are done reading the first paragraph, students should briefly stop reading and 
tell you that they are done with it. Ask students if they understand these instructions.

4. If yes, tell students to start reading the first paragraph.

5. When students tell you they are done with the first paragraph, ask them to briefly pause and think about what it 
was about. Then ask them to provide a one-sentence summary.

6. You give the students feedback on their summary. If the summary is poor, tell students 
why and encourage them to re-read the paragraph. 

7. You tell students to continue reading the second paragraph and to let you know when they are done with it.

8. After students are done with the second paragraph, you again encourage reflection and ask for a one-sentence summary. 

9. Just like before, you again give the students feedback on their summary. You also ask students if their predictions 
came true or if anything surprised them. Also tell them to scroll up to their initial prediction if they don't remember it.

10. You conclude by thanking the students for participating in the exercise and telling them to let you know if there 
are any remaining open questions.

'''


def chat(system_prompt, user_prompt, model='gpt-4', temperature=0.0):
    response = client.chat.completions.create(
        temperature=temperature,
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ])

    res = response.choices[0].message.content
    return res


def format_chat_prompt(message, chat_history, max_convo_length):
    prompt = ""
    for turn in chat_history[-max_convo_length:]:
        user_message, bot_message = turn
        prompt = f"{prompt}\nUser: {user_message}\nAssistant: {bot_message}"
    prompt = f"{prompt}\nUser: {message}\nAssistant:"
    return prompt


def respond(message, chat_history, max_convo_length=1000000):
    formatted_prompt = format_chat_prompt(message, chat_history, max_convo_length)
    bot_message = chat(system_prompt=f'''{instructions}''',
                       user_prompt=formatted_prompt,
                       temperature=0.7,
                       )
    chat_history.append((message, bot_message))
    return "", chat_history


with gr.Blocks() as demo:
    chatbot = gr.Chatbot(height=300)
    msg = gr.Textbox(label="Prompt")
    btn = gr.Button("Submit")
    clear = gr.ClearButton(components=[msg, chatbot], value="Clear console")

    btn.click(respond, inputs=[msg, chatbot], outputs=[msg, chatbot])
    msg.submit(respond, inputs=[msg, chatbot], outputs=[msg, chatbot])  # Press enter to submit
gr.close_all()
demo.launch(share=True)

Now, these are the open issues I would need guidance on:

Deployment

The chatbot worked fine when only one, two or even three people were using it at the same time. When we tested it with 20, however, it was incredibly slow and after some time often simply output an error message, forcing the user to reload and restart the entire conversation.
I assume this was because the server runs locally on my laptop and my hardware couldn’t handle all the requests at the same time. Is this correct? If so, we would of course need a different way of deploying it when we run our experiment at the school. If not, what else could the issue have been?
I have looked into the permanent hosting that Hugging Face offers, but I don’t like that it embeds the app on their website. Also, the hardware that comes with the free option doesn’t sound better than my M2 MacBook Pro, so I don’t even know if it would solve the problem.
Do you have recommendations for inexpensive or even free hosting that would give us our own website and would also be easy to implement for a beginner like me? Also, would it even make sense to stick with the Gradio framework, or would I basically need to start from scratch with, say, Django?
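(For reference, one cheap thing to rule out before changing hosts: Gradio ships a built-in request queue that is not enabled by default in the versions this code appears to target. Without it, slow OpenAI calls from 20 simultaneous users can pile up and fail. Whether this explains the errors above is a guess, but enabling it is a two-line change to the launch code, with max_size an arbitrary value:)

```python
# Assumes `demo` is the gr.Blocks() app from the listing above; whether the
# missing queue explains the errors is an assumption, but it is cheap to test.
demo.queue(max_size=40)   # queue simultaneous requests instead of failing them
demo.launch(share=True)
```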

Extracting conversations

In order to make our final results more meaningful, we also want to analyze the conversations the pupils have with the bot. ChatGPT gives this relatively straightforward solution:

Code
from openai import OpenAI
import gradio as gr

client = OpenAI(
    api_key="OUR API KEY"
)

instructions = "You are a helpful assistant"

# chat() and format_chat_prompt() are identical to the first listing above


def respond(message, chat_history, max_convo_length=1000000):
    formatted_prompt = format_chat_prompt(message, chat_history, max_convo_length)
    bot_message = chat(system_prompt=f"{instructions}", user_prompt=formatted_prompt, temperature=0.7)
    chat_history.append((message, bot_message))
    
    # Export conversation to a text file
    export_conversation(chat_history)

    return "", chat_history


def export_conversation(chat_history, filename="conversation.txt"):
    with open(filename, "w") as file:
        for turn in chat_history:
            user_message, bot_message = turn
            file.write(f"User: {user_message}\nAssistant: {bot_message}\n")


with gr.Blocks() as demo:
    chatbot = gr.Chatbot(height=300)
    msg = gr.Textbox(label="Prompt")
    btn = gr.Button("Submit")
    clear = gr.ClearButton(components=[msg, chatbot], value="Clear console")

    btn.click(respond, inputs=[msg, chatbot], outputs=[msg, chatbot])
    msg.submit(respond, inputs=[msg, chatbot], outputs=[msg, chatbot])  # Press enter to submit
gr.close_all()
demo.launch(share=True)

This does create a .txt file in the project directory. The problem is that the file only ever contains the most recent conversation of the single most recent user. I’ve tried different solutions for this from both ChatGPT and Gemini, but nothing worked.
It seems clear that I need a way to identify each user. Having a dictionary outside any function, with user IDs as keys and individual conversation histories as values, seems logical, but I don’t know how to implement it exactly. Also, ideally, the user would enter their ID before they access the actual chatbot UI, but again, I don’t know how the implementation would work, or even where to start looking.
I would be grateful for any and all help on this.

Simulated conversations

I’ve basically been having the same issue as the OP in this thread: GPT Api simulates conversation with itself instead of talking with user. At some point the chatbot would simply output a simulation of the entire rest of the conversation with the user, including simulated user input, rather than continuing the conversation normally.
Following the advice given there I iterated over and changed the system prompt many times, with limited success. The only thing that ended up working, just before our internal testing in January, was switching the model to GPT-4. I wasn’t able to consistently get rid of the simulation behavior with any other model.
However, since I started working on the bot again at the beginning of March, the simulation now occurs from time to time even with GPT-4. I’m not sure what else I could do. Is there any way I can still improve my system prompt?
In the thread linked there were dissenting opinions about the usefulness of stop sequences. I have read this document https://help.openai.com/en/articles/5072263-how-do-i-use-stop-sequences-in-the-openai-api at least 10 times, but I’m still not sure how to use them to try and counter the simulation behavior. Do I add them as an additional parameter in my API call function? If so, what is the exact wording I need to use? And what is the sequence I should try to stop? "User: "?
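(For what it’s worth, the stop sequences do go in as an extra `stop` parameter on the chat completions call. A sketch of both the parameter shape and a fallback trim; the marker strings like `"\nUser:"` are guesses at how the simulated turns begin in this particular prompt format:)

```python
def build_completion_kwargs(system_prompt, user_prompt,
                            model="gpt-4", temperature=0.7):
    """Kwargs for client.chat.completions.create(**...), with stop sequences."""
    return {
        "model": model,
        "temperature": temperature,
        # up to 4 sequences; generation halts *before* emitting the sequence,
        # so the reply is cut off right where a simulated "User:" turn begins
        "stop": ["\nUser:"],
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }

def strip_simulated_turns(text, markers=("\nUser:", "\nStudent:")):
    """Belt-and-braces fallback: cut a reply at the first simulated-turn marker."""
    cut = len(text)
    for m in markers:
        i = text.find(m)
        if i != -1:
            cut = min(cut, i)
    return text[:cut].rstrip()
```

(So inside `chat()` you would call `client.chat.completions.create(**build_completion_kwargs(...))` and could additionally pass the reply through `strip_simulated_turns()` in case a simulated turn slips through anyway.)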
A few days ago I read a post on this forum complaining about the “dumbing down” of GPT-4. Could this be another instance of that dumbing down, meaning there isn’t really anything to do anyway?

Thank you all already for reading.

Tim

As luck would have it, all of this is fairly straightforward to do. You just need some key insights. I will whip up a prototype quickly.

@icdev2dev did you manage to put something together? Even pointers are greatly appreciated!

Hi @timknapp

While I haven’t had the time to put this together yet, these are my thoughts.

I believe in reducing the cognitive load that programming places on your real ideas. That way you’re clearly able to see your ideas manifest themselves. I do understand that you’re new to programming, but I commend you for having started on the journey.

(A) I think it is better to separate the frontend from the backend. I think of the frontend as the entity that talks to the browser. I use Svelte because I find it easy to meld to my needs. I think of the backend as the entity that talks to OpenAI. I use Python in the backend. The integration between Svelte and Python is through web-service and WebSocket calls.

(B) Think about modeling users as threads in the overall architecture. Each user is represented by a UserThread, and the messages in it are the conversation between the Assistant and the real end user.

(C) When you think about it like that, iterating over all the UserThreads gives you all the users currently interacting with the Assistant. Exporting all the messages in one thread gives you the conversation of one user with the Assistant.

OK. Now, you don’t natively have the ability to list threads programmatically. However, this repository (openairetro/examples/listablethreads at main · icdev2dev/openairetro · GitHub) gives you that ability.

Adding an attribute called userName to a thread lets you assign a user to that thread.
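(If I understand the Assistants API correctly, threads accept a free-form metadata dict of short string keys and values, which is one place such a userName attribute could live. The helper below is just my illustration of the kwargs shape, not code from the repo:)

```python
def thread_create_kwargs(user_name):
    """Kwargs for client.beta.threads.create(**...), tagging the thread with a user."""
    # metadata is a dict of short string keys/values attached to the thread
    return {"metadata": {"userName": user_name}}

# untested usage sketch:
#   thread = client.beta.threads.create(**thread_create_kwargs("anna"))
```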

If you look at some of the more advanced examples in the same repo, you will see how to link the backend to the frontend.

I understand that this might be a bit much to absorb for someone who is new to Python, but I appreciate your thoughts and motivation. So, as time permits, I will complete this as soon as possible.

Hope this helps

Hello @icdev2dev

thank you for your answer! I checked out your repo and did find it a bit overwhelming. However, I will definitely look deeper into the threads idea and the Assistants API, over the Chat Completions one I’m currently using. Once I have the basics of this down I might revisit your repo, and I will report how it’s going for me.