Multiple messages per turn in Chat Completions API

Hello!

I’m building a chatbot using the Chat Completions API.

The format of a conversation is usually like this:

System: [system prompt]
User: Can you write an email for me?
Assistant: Sure! What do you want to talk about?
...

However, I’m looking to model more natural conversations - such as the ones you’d have when you text a friend. Something like this:

System: [system prompt]
User: Hey!
User: How are you?
Assistant: It's a great day today
Assistant: I've been doing this and that
Assistant: What about you?
...

The challenge I’m facing is that the Chat Completions API generates only one completion at a time. Even if I call the endpoint multiple times, when should I stop generating new assistant messages?

Some ideas I had:

  • Getting the API to generate completions as a user too. If the next message says “user”, then I know there’s a turn change and I simply discard that message (a rough sketch follows this list).
  • Training an encoder model to predict turn changes. It would get the whole conversation as an input and output True/False to indicate whether there’s a turn change (=don’t generate more assistant messages - wait for the user’s response).
  • Sentiment analysis. If the assistant’s response is an enquiry/question, then assume there’s a turn change.
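
For the first idea, something like the sketch below could work, assuming the system prompt instructs the model to prefix every message with its speaker (“User:” / “Assistant:”); the model name and loop bound here are just placeholders:

    from openai import OpenAI

    client = OpenAI()

    # Hypothetical sketch: keep generating assistant messages until the model
    # predicts a user turn, then discard that last completion. Assumes the
    # system prompt tells the model to prefix each message with its speaker.
    def generate_assistant_turn(history, max_messages=5):
        replies = []
        for _ in range(max_messages):
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model name
                messages=history,
            )
            text = resp.choices[0].message.content.strip()
            if text.lower().startswith("user:"):
                break  # turn change predicted: discard this message and stop
            replies.append(text)
            history = history + [{"role": "assistant", "content": text}]
        return replies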

Those are some workarounds, but maybe there’s another (simpler) way to do it? I’d love to hear your ideas!

Welcome to the community!

Sounds like a cool project.

My question is, how would this scenario

System: [system prompt]
User: Hey!
User: How are you?
Assistant: It's a great day today
...

ever be able to happen? (the user getting two turns) :thinking:

does your agent have an artificial cold start time?


All that said, multiple agent turns seem pretty straightforward: just fake them.

you can insert some control sequences into the output. that would even allow you to fine-tune the behavior if you wanted to.

It's a great day today
¬<<sleep 2s>>
I've been doing this and that
¬<<sleep 1s>>
What about you?

Including a rare symbol like ¬ will let your parser know when to stop streaming and when to start parsing.
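
To make that concrete, a minimal parser sketch for that marker format (the ¬<<sleep Ns>> syntax and the function names are just illustrations):

    import re
    import time

    # Matches markers like ¬<<sleep 2s>> and captures the number of seconds.
    MARKER = re.compile(r"¬<<sleep\s+(\d+(?:\.\d+)?)s>>")

    def split_messages(completion):
        """Split one completion into (message, pause_seconds) pairs."""
        parts = MARKER.split(completion)
        # re.split with a capture group alternates: message, pause, message, ...
        messages = [m.strip() for m in parts[::2]]
        pauses = [float(p) for p in parts[1::2]] + [0.0]
        return list(zip(messages, pauses))

    def send_with_pauses(completion, send):
        for message, pause in split_messages(completion):
            if message:
                send(message)
            time.sleep(pause)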

something along those lines.

what do you think?


Thanks for the smart ideas!

To answer your first question, the user will write any number of messages and then press a button/write a special command to “trigger completion”. This could also be enhanced by using a cold start time, as you say.


On the second proposal, I do think that’s a nice way to solve it - the only issue is that I would need to fine-tune the model (again) to have it output <new-message> tokens. Essentially, we’d be turning the system into a one-message-per-turn architecture, which would work.

If there are no better possibilities, I’ll probably end up doing what you suggested:

System: [system prompt]
User: Hey! <new-message> How are you?
Assistant: It's a great day today <new-message> I've been doing this and that <new-message> What about you?
...
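
Client-side, the splitting itself would then be trivial (a sketch, assuming the fine-tuned model reliably emits the marker):

    # Split one completion into individual chat messages on the marker.
    def split_turn(completion):
        return [m.strip() for m in completion.split("<new-message>") if m.strip()]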

My only additional concern would be the <new-message> marker confusing the model. In other models, it’s possible to define custom tokens that can be added to the foundation model’s vocabulary (maybe that’d have been a cleaner alternative). However, I can’t do this here.

So I’ll need to introduce some kind of special token like the one you suggested, or the one I’ve used in the example. It’s probably best to stick to a single token (instead of mine, which would likely be broken down into several tokens), both for cost reasons and because I’d imagine a single token would cause a smaller drop in performance when starting the fine-tuning process. What do you think about this?
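
For what it’s worth, you can check how a candidate separator tokenizes with the tiktoken library before committing to it:

    import tiktoken

    # cl100k_base is the encoding used by the gpt-3.5-turbo/gpt-4 family.
    enc = tiktoken.get_encoding("cl100k_base")
    for sep in ["<new-message>", "¬"]:
        ids = enc.encode(sep)
        print(repr(sep), len(ids), "tokens:", ids)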

I’d skip the fine-tuning and try to work with prompts. It was just an idea.

And if you don’t plan on using instant streaming, then we can skip the special token too and opt for using a schema.

you are chatterbot. you are simulating realistic, natural chat conversations. A participant can send multiple messages in a row, in the following schema:
{string|number}[]

the string represents a message, the number a pause (in seconds).

here’s an example:

user message:
["hello."]
your response:
["Hi! How are you?", 3, "It's a great day today, isn't it?"]

Your answer always needs to be JSON compliant. Always start your answer with [

something like that.
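
On the client side, handling that schema could look something like this (a sketch; assumes the model sticks to the array format):

    import json
    import time

    def play_response(raw, send):
        # Expects a JSON array of strings (messages) and numbers (pauses).
        try:
            items = json.loads(raw)
        except json.JSONDecodeError:
            send(raw)  # fall back: treat the whole completion as one message
            return
        for item in items:
            if isinstance(item, (int, float)):
                time.sleep(item)   # a number is a pause, in seconds
            else:
                send(str(item))    # anything else is a chat message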

There are a couple of issues here from a generic use case perspective.

The first: in general, the architecture would be smarter if it weren’t limited to a user trigger (“press button”). In your use case that might be legitimate, but imagine porting it to WhatsApp, where you have no control over when the user decides to enter new text.

The second: in my opinion, chats should be plain old text, not metadata embedded within text that then has to be parsed out. Needless complexity.

The solution is debouncing (simply put: delay sending to chat completion for a short duration, so that you can check whether additional messages have arrived). Debouncing WAS difficult until the advent of the Assistants API. You can use the Assistants API to convert to chat completion with debouncing. I will whip up a quick PoC in a day or so.
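
Roughly, the idea looks like this (a minimal asyncio sketch; class and callback names are illustrative, not the promised PoC):

    import asyncio

    # Minimal debouncing sketch: buffer incoming user messages and only fire
    # the chat-completion call after `quiet_seconds` of silence.
    class Debouncer:
        def __init__(self, on_flush, quiet_seconds=2.0):
            self.on_flush = on_flush        # async callback(list_of_messages)
            self.quiet_seconds = quiet_seconds
            self.buffer = []
            self._task = None

        def add_message(self, text):
            self.buffer.append(text)
            if self._task is not None:      # a new message resets the timer
                self._task.cancel()
            self._task = asyncio.create_task(self._wait_and_flush())

        async def _wait_and_flush(self):
            try:
                await asyncio.sleep(self.quiet_seconds)
            except asyncio.CancelledError:
                return                      # superseded by a newer message
            messages, self.buffer = self.buffer, []
            await self.on_flush(messages)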

In the meantime, here’s my high-level post: (Switching from Assistants API to Chat Completion? - #2 by icdev2dev)