Assistants + rich messages - discussion/ideas/tips

Ok, a bit of a complex question here - but I’ll do my best to explain.

So, we have a simple proof-of-concept for a WhatsApp bot which simply relays all of the user’s messages to our OpenAI Assistant, and sends all of the Assistant’s replies back to the user. This works very nicely, except the bot always sends back just plain-text messages.

It would be really nice if we could have the bot receive and also send back the types of rich messages that are supported in WhatsApp and other platforms, like:

  • A location
  • A list of options to select from
  • An image
  • etc.

I’m not quite sure how to best add these capabilities in a reliable way, but here are a few thoughts:

  1. Receiving rich messages from the user

This part should be quite easy to achieve - we can always just make a text representation of these. For example, if the user sends a location, we can turn it into text like “Location: lat: {lat}, lng: {lng}”, add that to the thread, and the bot can probably handle that quite nicely.
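As a minimal sketch, assuming the payload shape of the WhatsApp Cloud API webhook (the `location`/`latitude`/`longitude` field names come from that API; the function name is my own), the conversion could look like:

```python
def location_to_text(message: dict) -> str:
    """Render an incoming WhatsApp location message as plain text
    before appending it to the Assistant thread."""
    loc = message["location"]
    return f"Location: lat: {loc['latitude']}, lng: {loc['longitude']}"

# Example payload, shaped like a Cloud API webhook location message:
incoming = {
    "type": "location",
    "location": {"latitude": 60.1699, "longitude": 24.9384},
}
print(location_to_text(incoming))
# Location: lat: 60.1699, lng: 24.9384
```

The same pattern extends to other rich types, e.g. rendering a list selection as “User selected: {title}”.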

  2. Having the bot reply with rich messages

I assume part of the solution is to add functions for the assistant, like sendLocation which it can call when it decides it wants to send a location instead of a regular text reply. The function handler can then send a rich message to the WhatsApp user, via the WhatsApp API.
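A sendLocation function could be declared as a tool when creating the Assistant, something like the sketch below (the function name and parameter schema are my own choice, not an official spec). When the model calls it, the run enters `requires_action`, the handler sends the rich message via the WhatsApp API, and then submits a tool output so the run can continue:

```python
# Hypothetical tool definition for the Assistants API; the name
# "sendLocation" and its parameters are illustrative assumptions.
send_location_tool = {
    "type": "function",
    "function": {
        "name": "sendLocation",
        "description": "Send a location pin to the user instead of a text reply.",
        "parameters": {
            "type": "object",
            "properties": {
                "latitude": {"type": "number"},
                "longitude": {"type": "number"},
                "name": {"type": "string", "description": "Optional label for the pin"},
            },
            "required": ["latitude", "longitude"],
        },
    },
}
```

The handler would then submit something like `{"status": "location sent"}` as the tool output, which also leaves a trace of the call in the run.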

However, this means that we are adding messages to the WhatsApp conversation, but not to the OpenAI thread which represents the discussion between the user and the bot - is this a problem, or is the Assistant able to reason about the fact that it called the sendLocation function, which triggered an extra message to be sent?

Let’s say the user asks something about the location that the bot sent to them - is the bot even aware that it just sent that location? I would assume it could be, since it does have knowledge that it called that function in the previous run - and I also assume it considers that kind of context when formulating its responses.

Thoughts? If anyone has played around with these sorts of implementations, would be very cool to hear your experiences!


You could do that with function calling or you could mention a custom syntax within the instructions to the Assistant to insert content within messages.

Then you can simply write a middleware that detects and parses the content from the assistant messages, and finally publishes the message with the rich content.

e.g. [location:latitude,longitude]

How reliable is this in your experience? I guess the prompt would have to mention that in these cases the bot should respond with ONLY this special syntax in the message body; otherwise it might get a bit hard to parse if it’s e.g. somewhere in the middle.

Or maybe not, I guess it’s quite easy to split a message when detecting special syntax like this.
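The splitting really is straightforward with a regex. A sketch of the middleware idea, using the `[location:latitude,longitude]` tag suggested above (everything else here is an assumption), which also handles tags appearing mid-message:

```python
import re

# Matches the custom [location:lat,lng] tag anywhere in a message.
LOCATION_RE = re.compile(r"\[location:(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)\]")

def parse_message(text: str) -> list[dict]:
    """Split an assistant message into text parts and location parts,
    preserving their order, so each part can be published separately."""
    parts, pos = [], 0
    for match in LOCATION_RE.finditer(text):
        if match.start() > pos:
            parts.append({"type": "text", "text": text[pos:match.start()].strip()})
        parts.append({
            "type": "location",
            "latitude": float(match.group(1)),
            "longitude": float(match.group(2)),
        })
        pos = match.end()
    if pos < len(text):
        parts.append({"type": "text", "text": text[pos:].strip()})
    # Drop empty text fragments left over after stripping.
    return [p for p in parts if p.get("text") != ""]

parse_message("Here you go: [location:60.1699,24.9384] enjoy!")
# [{'type': 'text', 'text': 'Here you go:'},
#  {'type': 'location', 'latitude': 60.1699, 'longitude': 24.9384},
#  {'type': 'text', 'text': 'enjoy!'}]
```

Each part can then be sent as a separate WhatsApp message, so the tag doesn’t need to be the only thing in the message body.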

Markdown is supported ‘out of the box’, and there are a lot of ways to convert the markdown to HTML.
Here’s an example of how I format to HTML:

    import markdown  # pip install markdown markdown-link-attr-modifier

    def markdownResponse(self):
        '''Returns the response with markdown formatting - convenient
        for rendering in chat-like responses.
        '''
        if self.response is None:
            return None
        extension_configs = {
            'markdown_link_attr_modifier': {
                'new_tab': 'on',
                'no_referrer': 'external_only',
                'auto_title': 'on',
            }
        }
        return markdown.markdown(
            self.response,
            extensions=['tables', 'markdown_link_attr_modifier'],
            extension_configs=extension_configs,
        )

Yes, and it’ll also be part of messages in the thread.