The plugin framework does not support the full feature set of the OpenAPI standard. So far, I have identified two bugs.
Bug 1: If the openapi.yaml references other .yaml files via $ref, the plugin won’t crash, but it will silently ignore the referenced files. In other words, the plugin supports only a single yaml file.
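Roughly what I mean; a minimal sketch with made-up paths and schema names:

```yaml
# openapi.yaml (root file)
openapi: 3.0.3
info:
  title: Example plugin API
  version: "1.0.0"
paths:
  /orders/{id}:
    get:
      operationId: getOrder
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
      responses:
        "200":
          description: A single order
          content:
            application/json:
              schema:
                # cross-file reference -- this is what the plugin silently ignores
                $ref: "./schemas/order.yaml#/Order"
```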
Bug 2: “allOf” is a very useful OpenAPI feature for composing complex data structures. Again, the plugin ignores it.
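For anyone unfamiliar, allOf lets you define shared fields once and compose them into other schemas instead of copy-pasting them everywhere; a minimal sketch (schema names are made up):

```yaml
components:
  schemas:
    BaseItem:          # shared fields, defined once
      type: object
      properties:
        id:
          type: string
        createdAt:
          type: string
          format: date-time
    Order:             # Order = BaseItem plus order-specific fields
      allOf:
        - $ref: "#/components/schemas/BaseItem"
        - type: object
          properties:
            total:
              type: number
```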
I don’t know whether OpenAI developers read this forum. If you know a better way to report bugs, feel free to pass this along to them.
These issues matter. As plugin apps grow bigger and bigger, it is obviously bad practice to put everything in a single yaml file.
I have a medium-sized plugin whose OpenAPI spec will soon exceed 10,000 lines. That is not easy to manage in a single file, so I split it into 30-50 yaml files and bundle them together for production. Bundling is a workaround for bug 1 for now, but bug 2 has no workaround: without “allOf”, the bundled file bloats to a ridiculous size, full of redundant data structures (see the sketch below).
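To illustrate the bloat: with allOf ignored, every schema that shares the base fields has to repeat them inline, roughly like this (made-up names again):

```yaml
components:
  schemas:
    Order:             # base fields copied in
      type: object
      properties:
        id:
          type: string
        createdAt:
          type: string
          format: date-time
        total:
          type: number
    Invoice:           # the same base fields copied in again
      type: object
      properties:
        id:
          type: string
        createdAt:
          type: string
          format: date-time
        amountDue:
          type: number
```

Multiply that duplication across dozens of schemas and the bundled file grows far faster than the API itself.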
I don’t know if this is regarded as a bug, but based on your description it might even be a feature: having a plugin load ~10k lines of spec would fill the context window several times over.
If you want to report it, though, there’s a designated place for that over here:
They may even pay you if your bug report is accepted.
You definitely reminded me to run a stress test before proceeding with my ambitious plans. I tried 250 endpoints (about 4,000 lines) and it worked. But when the spec reached 5,000 lines, the system didn’t crash; what happened was worse: the responses read like the gibberish of our long-lost village idiot. That worries me. The mere presence of a large openapi.yaml distorts ChatGPT’s own behavior, even though ChatGPT never called those endpoints. I really hope OpenAI people on this forum notice this.
An API of 5,000 lines is not big for any serious application! The OpenAI plugin platform isn’t intended only for toy apps, is it? If OpenAI developers are reading this, it would be very helpful if you published some performance benchmark results.
Right now, I don’t know whether it was the 250+ endpoints or the sheer number of lines that hit the limit.