Ok, a bit of a complex question here - but I’ll do my best to explain.
So, we have a simple proof-of-concept WhatsApp bot that relays all of the user’s messages to our OpenAI assistant and sends all of the assistant’s replies back to the user. This works very nicely, except the bot always sends back just plaintext messages.
It would be really nice if we could have the bot receive and also send back the types of rich messages that are supported in WhatsApp and other platforms, like:
- A location
- A list of options to select from
- An image
- etc.
I’m not quite sure how to best add these capabilities in a reliable way, but here are a few thoughts:
- Receiving rich messages from the user
This part should be quite easy to achieve - we can always just make a text representation of these. For example, if the user sends a location, we can turn it into text like “Location: lat: {lat}, lng: {lng}”, add that to the thread, and the bot can probably handle that quite nicely.
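To make the text-representation idea above a bit more concrete, here is a rough sketch of what I have in mind. The message shape (a dict with a `type` key plus fields like `lat`, `lng`, `title`, `caption`) and the function name `rich_message_to_text` are made up for illustration; the real WhatsApp webhook payload looks different and would need to be mapped into something like this first.

```python
def rich_message_to_text(message: dict) -> str:
    """Turn an incoming rich message into plain text for the assistant thread.

    The dict layout here is hypothetical, not the actual WhatsApp payload.
    """
    kind = message.get("type")
    if kind == "text":
        return message["text"]
    if kind == "location":
        return f"Location: lat: {message['lat']}, lng: {message['lng']}"
    if kind == "list_reply":
        # The user picked one entry from a list of options.
        return f"Selected option: {message['title']}"
    if kind == "image":
        caption = message.get("caption") or "no caption"
        return f"Image received ({caption})"
    # Fall back to something the assistant can at least acknowledge.
    return f"Unsupported message type: {kind}"
```

The returned string would then be added to the thread as a normal user message, e.g. `rich_message_to_text({"type": "location", "lat": 60.17, "lng": 24.94})`.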
- Having the bot reply with rich messages
I assume part of the solution is to add functions for the assistant, like sendLocation, which it can call when it decides it wants to send a location instead of a regular text reply. The function handler can then send a rich message to the WhatsApp user, via the WhatsApp API.
However, this means that we are adding messages to the WhatsApp conversation, but not to the OpenAI thread which represents the discussion between the user and the bot - is this a problem, or is the Assistant able to reason about the fact that it called the sendLocation function, which triggered an extra message to be sent?
Let’s say the user asks something about the location that the bot sent to them - is the bot even aware that it just sent that location? I would assume it could be, since it does have knowledge that it called that function in the previous run - and I also assume it considers that kind of context when formulating its responses.
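For reference, this is roughly how I picture the sendLocation piece. It is only a sketch: the tool definition follows the usual function-tool JSON schema shape, but the parameter names (`lat`, `lng`, `name`), the `handle_send_location` helper, and the `send_to_whatsapp` callback are all my own placeholders, and the actual WhatsApp API call is left out entirely. The idea is that the handler returns a string describing exactly what was sent, and that string is what gets handed back as the function’s output, so the run at least has a record of the rich message. Whether the assistant reliably uses that record later is the part I’m unsure about.

```python
import json
from typing import Callable

# Function-tool definition to register with the assistant.
# Parameter names are assumptions made for this sketch.
SEND_LOCATION_TOOL = {
    "type": "function",
    "function": {
        "name": "sendLocation",
        "description": "Send a location pin to the user instead of a text reply.",
        "parameters": {
            "type": "object",
            "properties": {
                "lat": {"type": "number"},
                "lng": {"type": "number"},
                "name": {"type": "string"},
            },
            "required": ["lat", "lng"],
        },
    },
}


def handle_send_location(
    arguments_json: str,
    send_to_whatsapp: Callable[[dict], None],
) -> str:
    """Send the rich message and return the text to report back as tool output.

    `send_to_whatsapp` is a hypothetical callback wrapping the WhatsApp API.
    """
    args = json.loads(arguments_json)
    payload = {
        "type": "location",
        "lat": args["lat"],
        "lng": args["lng"],
        "name": args.get("name", ""),
    }
    send_to_whatsapp(payload)
    # Describe what the user actually received, so the function output
    # carries the same information as the WhatsApp message.
    return json.dumps({"status": "sent", "sent_to_user": payload})
```

In a test, `send_to_whatsapp` can simply be `some_list.append`, which makes it easy to check what would have been sent without touching WhatsApp at all.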
Thoughts? If anyone has played around with these sorts of implementations, would be very cool to hear your experiences!