My 2 cents on the state of plugins

I’m posting this in the community forum because 1) I don’t use the public-facing ChatGPT service, so I’m not a plugin user, and 2) this is purely my opinion on the value of plugins in general, based on my initial observations (and 9 years of building chatbots).

  1. If we think of plugins as the LLM equivalent of phone apps, I think they can be made to work. An individual user needs to hand-curate the plugins they use in much the same way you hand-curate the apps installed on your phone. There will be 20 different plugins that know how to talk to a PDF, so you can’t just let the system pick one at random. The user needs to select their PDF plugin of choice.
  2. The promise being promoted, that you can take any API and map it into a plugin by simply adding a manifest, is just that: a promise, and a false one at that. Your plugin API needs to be hand-crafted for whatever experience you’re exposing to the user. Simple GET-based APIs might result in an OK experience, but these models don’t understand multi-turn experiences where you need to collect more information before calling a given API (see the sketch after this list). These 1st-gen plugin experiences are exactly that: 1st gen. They have a long way to go to be what I would consider truly conversational.
  3. If we’re thinking about plugins as being the chat equivalent of phone apps, then what we should really be talking about are multi-turn agents, not plugins. You want to launch an interactive conversation with an Agent that you’ve hand-curated in the same way you hand-curate the apps on your phone. Plugins aren’t that experience. I’m sorry, they’re not…
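To make point 2 concrete, here’s a minimal sketch of the part a bare manifest plus OpenAPI spec doesn’t cover: something has to notice which required parameters are still missing and go back to the user before any API call can happen. The `search_flights` endpoint and its parameters are made up for illustration.

```python
# Minimal sketch of the slot-filling step a bare manifest + OpenAPI spec doesn't give you.
# The endpoint name and required parameters are made up for illustration.

REQUIRED_PARAMS = ["origin", "destination", "depart_date"]

def next_action(collected: dict) -> dict:
    """Decide whether we can call the API yet, or must go back to the user first."""
    missing = [p for p in REQUIRED_PARAMS if p not in collected]
    if missing:
        # Multi-turn: we have to ask the user a question before any API call happens.
        return {"type": "ask_user", "prompt": f"What is your {missing[0]}?"}
    # Single-turn: only now is a simple GET-style call possible.
    return {"type": "call_api", "endpoint": "/search_flights", "params": collected}

print(next_action({"destination": "Seattle"}))  # -> asks for the origin
print(next_action({"origin": "DEN", "destination": "SEA", "depart_date": "2024-06-03"}))
```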

I guess what I’m suggesting is that plugins are meh… And asking: can we quickly move past the plugin phase of things and get on to Agents which can actually do useful shit for you?

The only plugin that I am aware of that can access multiple apps is Zapier. I’m not a ChatGPT user either, but it does seem like you have to select the plugin, and then it talks to that plugin. Maybe plugin power users can chime in here?

I’m sure there are plans to scale the plugins to at least intelligently select from what the user has “installed”. If not, it should be on the plugin roadmap!

One thing I forgot to mention was the possibility of a Plugin API. If they create it in a way that API users (like us) can intelligently decide which plugin to send the data to, it could be awesome! But yeah, just a stub off of ChatGPT makes it limiting for us API builders.

It’s less about the ability to call multiple plugins, and more about the single-turn nature of the current crop of plugins. That’s only going to fly for a minute or two. These plugins will need to be multi-turn to stick, and then you’re going to need support for features like interruption and everything we’ve learned building Alexa, Google Home, Siri, and Cortana. The current approach isn’t going to work for that. I’m sorry, it’s just not.

Ok, so are you saying you can only send one thing to a plugin, and then you need to “re-select” it to send it another thing? Not sure I’m following.

Yes… Single-turn is easy. You send the model a request, it maps the user’s intent to the appropriate plugin, you dispatch that intent. Easy.

Multi-turn is super hard. You need to start a conversation with a skill/agent, and then for each turn of the conversation you need to remember the skill/agent you’re talking to. You need to know when the user is done talking to the skill/agent, you need to track where the user is within the conversation with the skill/agent (conversation history), and you need to detect when the user wants to switch skills/agents. Multi-turn is infinitely more complex.
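Roughly, the per-conversation state you end up carrying looks something like this (field names are illustrative, not taken from any particular framework):

```python
# Rough sketch of the per-conversation state needed to route multi-turn traffic.
# Field names are illustrative, not taken from any particular framework.
from dataclasses import dataclass, field

@dataclass
class ConversationState:
    conversation_id: str
    active_agent: str | None = None     # which skill/agent owns the next turn
    agent_done: bool = False            # has that skill/agent signaled completion?
    history: list[dict] = field(default_factory=list)  # turn-by-turn transcript

    def remember_turn(self, role: str, text: str) -> None:
        self.history.append({"role": role, "text": text})
```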

If you can send a request, and it maps the user’s intent to the appropriate plugin, then why do I need a deeper history?

This system is “stateless”, which is good, right? Why is state important here? Just trying to understand. What is a good reason why we need state?

Let’s use a concrete example… You want to book a flight from Denver to Seattle. You tell GPT, “I want to book a flight from Denver to Seattle.” It selects the “BookFlight” plugin. That plugin doesn’t have enough information to go on, so it needs to ask additional questions. It asks, “When do you want to leave?” You don’t want to direct the response to that at a different plugin, so you first need to remember that you’re in the middle of booking a flight and should direct future requests to the BookFlight plugin. When do you stop forwarding messages to that plugin? You need to know when the conversation has ended, so that’s issue #1.
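As a rough sketch of that routing rule (the plugin and the model’s intent mapping are stubbed out for illustration), once a plugin is active you keep forwarding turns to it until it signals it’s done:

```python
# Sketch of the routing rule: once a plugin is active, keep forwarding turns to it
# until it signals completion. The plugin and intent mapping are stand-in stubs.

def pick_plugin_via_model(message: str) -> str:
    # Stand-in for the model's intent -> plugin mapping (the easy, single-turn part).
    return "BookFlight" if "flight" in message.lower() else "Fallback"

def dispatch_to_plugin(name: str, message: str) -> tuple[str, bool]:
    # Stand-in plugin: asks one follow-up question, then finishes.
    if "monday" in message.lower():
        return ("Booked! Anything else?", True)   # an EndOfConversation-style signal
    return ("When do you want to leave?", False)

active_plugin, done = None, True

def route_turn(message: str) -> str:
    global active_plugin, done
    if active_plugin is None or done:
        active_plugin = pick_plugin_via_model(message)
    reply, done = dispatch_to_plugin(active_plugin, message)
    if done:
        active_plugin = None                      # stop forwarding turns to it
    return reply

print(route_turn("I want to book a flight from Denver to Seattle"))
print(route_turn("Monday"))   # still routed to BookFlight, not re-classified
```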

Let’s say the user then says, in the middle of booking a flight, “Oh, I’m also going to need a car.” That’s not something the BookFlight plugin is going to understand how to do, and it’s what we call an interruption. You need a mechanism for first detecting interruptions and then deferring them to be handled later. Modern chatbot frameworks support this, but it’s non-trivial.
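One possible shape for that deferral, reusing the stub `dispatch_to_plugin()` from the sketch above (the interruption detector and handoff here are crude stand-ins for illustration):

```python
# Sketch of interruption handling, reusing dispatch_to_plugin() from the sketch above.
# The interruption detector and handoff are crude stand-ins for illustration.

pending_intents: list[str] = []

def is_interruption(message: str) -> bool:
    return "car" in message.lower()        # stand-in for real out-of-scope detection

def start_new_conversation(message: str) -> str:
    return f"(handing '{message}' off to a BookCar plugin)"

def handle_turn(active_plugin: str, message: str) -> str:
    if is_interruption(message):
        pending_intents.append(message)    # defer it; don't forward it to BookFlight
        return "Sure, I can help you with that once we get your flight booked."
    reply, finished = dispatch_to_plugin(active_plugin, message)
    while finished and pending_intents:    # flight done: replay the parked requests
        reply += " " + start_new_conversation(pending_intents.pop(0))
    return reply
```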

These are all solvable issues but I don’t see the current approaches to plugins as solving them. They’re missing so many features that I don’t know where to start with my critique.

I don’t think you can build a multi-turn system that’s stateless and have it work. It doesn’t seem possible to me, or at least in 9+ years I haven’t worked out how to do it.

Plugins, with very few exceptions, are kinda trash.

The biggest complaint I have, though, is that there is no information associated with them.

No documentation detailing how they can be used, no examples illustrating their benefits, zero information associated with any privacy policies or lack thereof.

There’s no ability to sort, rate, review, hide, or filter plugins.

There’s no indication which plugins you can use out of the box or which require a user account on the plugin site.

There’s no information available about the plugin developer.

Plugins are a great idea but without some stringent controls on the part of OpenAI they will become a desert of abundance.

There will be so many pointless, narrow, single use plugins it will make it impossible for new, interesting, novel plugins to surface.

All-in-all, I’ve been extremely disappointed in plugins. We don’t need seven different “chat with your PDF” plugins which all fail to meaningfully differentiate themselves, and none of which can effectively, accurately, and reliably parse technical PDFs with equations, algorithms, tables, or plots.

Nor do I see the utility of a plugin specifically for earthquake information in the Philippines.

We certainly don’t need 5+ real estate plugins built around individual websites (including one specifically for Ontario). Why not just have one plugin that aggregates all the sites so you don’t need to? They’re literally just recreating the search functionality of the real estate websites, but worse.

Nor do we need so many “SEO” plugins.

The most problematic (and the ones most likely to land OpenAI in hot water) though are the investment related plugins. Honestly, I’m shocked OpenAI hasn’t put the kibosh on them.

I get it. That is, indeed, the future of plugins. They currently bring something new, but there are more things, like memory (to capture user information), that should be accessible on our side in order to build more powerful plugins.

I do not think this will come with ChatGPT (soon), but with Bing…

I mean the model itself is stateless. They just built some scaffolding around it to mimic memory.

I’m aware of this problem, and I’m trying to solve it. But it’s hard; the plugin ecosystem is moving pretty slowly these weeks. For now, it’s time to learn what we have, share what we see, and wait until we get access to more capabilities…

You need to provide the model with its memory, which is totally achievable… The way ChatGPT is calling plugins as stateless services isn’t going to work. There are fundamental missing concepts. Adding a simple “conversation id” to the request would go a long way towards making things tractable, but then you need memory on the calling side to know which plugin you’re talking to, and the plugin needs to send GPT an “EndOfConversation” message to tell it when it’s done… Ironically, all the constructs needed to make this work exist but are being ignored.
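For illustration, the missing pieces are as small as a couple of fields; a sketch of the request/response shape (field names are mine, not anything OpenAI has specified):

```python
# Sketch of the two missing protocol pieces: a conversation id on every request and
# an explicit end-of-conversation flag on the response. Field names are illustrative.

request_to_plugin = {
    "conversation_id": "abc-123",       # lets the plugin look up its own state
    "user_message": "Monday",
}

response_from_plugin = {
    "conversation_id": "abc-123",
    "reply": "Great, and where are you flying from?",
    "end_of_conversation": False,       # when True, the caller stops routing turns here
}
```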

Again this is all just my 2 cents… This will work itself out over the long run.

I mean, if plugin developers want to maintain state, I’m sure they’d be able to implement it.

I imagine it would be as simple as building a key server into the API.

When the model initially decides to trigger a call to the API, it checks if there is a conversation ID in context. If there isn’t, then the first call to the API requests an ID, which would then be included in all future JSON strings sent to that API.
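Something like this on the plugin side, as a sketch (the in-memory store and field names are made up for illustration):

```python
# Sketch of the "key server" idea: the first call mints a conversation ID, and later
# calls must carry it. The in-memory store and field names are made up for illustration.
import uuid

sessions: dict[str, dict] = {}

def handle_plugin_request(payload: dict) -> dict:
    conv_id = payload.get("conversation_id")
    if conv_id is None:                      # first call: issue an ID for this conversation
        conv_id = str(uuid.uuid4())
        sessions[conv_id] = {"history": []}
        return {"conversation_id": conv_id, "reply": "Session started."}
    session = sessions[conv_id]              # later calls: load the prior state
    session["history"].append(payload["user_message"])
    return {"conversation_id": conv_id, "reply": f"Turn {len(session['history'])} noted."}

first = handle_plugin_request({})
print(handle_plugin_request({"conversation_id": first["conversation_id"],
                             "user_message": "Monday"}))
```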

Maybe I’m under-thinking it or I have a fundamental misunderstanding of how ChatGPT is triggering plugins.

I don’t know if having a conversation ID included in requests by default is the right solution as it eats up tokens unnecessarily for those plugins which don’t require it.

Let me give an example conversation flow and I’ll leave it to others to work out how the state and transfer of control should work. I have an opinion… This is a flow between two agents, a TravelAgent and a BookFlightAgent, and a User.

TravelAgent: Hi how can I help you?
User: I need to book a flight to Denver next week for 3 days.
BookFlightAgent: I can help with that so what day next week do you want to leave?
User: Monday, but I am also going to need a car
TravelAgent: Sure, I can help you book a car once we get your flight booked.
BookFlightAgent: So I have you leaving for Denver next Monday. Where are you flying from?

The user doesn’t see the handoff happening between the various agents but this is a very believable conversation flow. There’s a lot going on here…

The TravelAgent needs to interrupt the BookFlightAgent when the car is requested and then it needs to remember to pass control to the BookCarAgent once the BookFlightAgent is done.

Also, after an interruption occurs, the TravelAgent needs to pass control back to the BookFlightAgent so that it can re-prompt the user for the information it’s waiting on.
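One way to model that handoff is a simple agent stack owned by the TravelAgent, plus a queue of deferred tasks. A rough sketch (agent names mirror the example above; the control flow is illustrative only):

```python
# Rough sketch of the handoff: an agent stack owned by the TravelAgent, plus a queue
# of deferred tasks. Agent names mirror the example above; the control flow is illustrative.

agent_stack: list[str] = ["TravelAgent"]
deferred: list[str] = []

def on_new_task(agent: str) -> None:
    agent_stack.append(agent)               # TravelAgent hands control down

def on_interruption(task_agent: str) -> str:
    deferred.append(task_agent)             # e.g. "BookCarAgent"
    return f"Sure, I can help with that once {agent_stack[-1]} is done."

def on_agent_finished() -> None:
    agent_stack.pop()                       # control returns to the TravelAgent...
    if deferred:
        agent_stack.append(deferred.pop(0)) # ...which starts the deferred task

on_new_task("BookFlightAgent")              # "I need to book a flight..."
print(on_interruption("BookCarAgent"))      # "...I'm also going to need a car"
print(agent_stack[-1])                      # BookFlightAgent re-prompts the user
on_agent_finished()                         # flight booked
print(agent_stack[-1])                      # BookCarAgent takes over
```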

I’d just have the TravelAgent generate some UID on the first invocation, inject it into the context, and have that UID included in the JSON sent to the plugin API on each subsequent call.

That would help you remember the tasks you need to perform (we call this UID a ConversationID in the Bot Framework), but you still need mechanisms to pass control temporarily from the BookFlightAgent to the TravelAgent and then back. And you need a mechanism to know when the BookFlightAgent is finished.

I have had (some) success with having a plugin return instructions for GPT to follow, and have implemented an agent using that model. Couldn’t that solve some of @stevenic’s problems?
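For what it’s worth, that pattern is just the plugin’s response carrying a directive the model is expected to follow next; a sketch of what such a response might look like (field names are mine, not from any real plugin):

```python
# Sketch of the "plugin returns instructions" pattern: the response carries not just
# data but a directive the model is expected to follow next. Field names are mine.

plugin_response = {
    "results": ["Denver -> Seattle, Mon 8:05am, $182"],
    "assistant_instructions": (
        "Ask the user whether they also need a rental car before confirming. "
        "If they do, call this plugin again with action='add_car'."
    ),
}
```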

Having said that, I basically just use my search plugin and little/nothing else (well, ScholarAI).

IMHO, the biggest problem with plugins is that OpenAI wants to ‘own’ the user. As long as that is the case, we have a problem. Hence I eagerly await the Python port of alphawave (hi @stevenic :slight_smile:), and will start working on a crude version of it myself after I get a couple of other things off my plate.
