OK, nobody is going to want to do this, but here is a solution I just created.
I developed a RAG system using the OpenAI Chat Completions API (the Assistants API would work just as well). It lets you chat with a knowledge base of documents I've built on a particular subject, and I designed it to receive (and respond to) questions via a REST API.
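To make that concrete, here's a minimal sketch of the kind of endpoint I mean, assuming FastAPI and the openai Python SDK. The `/ask` route, `retrieve()`, and the in-memory document list are placeholders for illustration, not my actual code:

```python
from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI()  # reads OPENAI_API_KEY from the environment

# In a real system these chunks would live in a vector store;
# a plain list keeps the sketch self-contained.
DOCUMENTS: list[str] = []

class Question(BaseModel):
    question: str

def retrieve(question: str, k: int = 3) -> list[str]:
    """Placeholder retrieval step -- swap in your own embedding search."""
    return DOCUMENTS[:k]

@app.post("/ask")
def ask(q: Question) -> dict:
    # Stuff the retrieved chunks into the system prompt, then answer.
    context = "\n\n".join(retrieve(q.question))
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-16k",
        messages=[
            {"role": "system",
             "content": "Answer using only this context:\n" + context},
            {"role": "user", "content": q.question},
        ],
    )
    return {"answer": response.choices[0].message.content}
```

A GPT Action then just POSTs the user's question to `/ask` and relays the answer.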
So, I created a GPT that accesses this knowledge base API via an Action – and it works as expected.
However, because the knowledge base was using gpt-4-turbo-preview, its responses to questions were slow. Very slow, often beyond the GPT Action timeout of 45 seconds. So I solved the problem by making the knowledge base's model swappable: I designed it so it can also use gpt-3.5-turbo-16k, mistral-medium, and claude-2.
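For what it's worth, the model swap is just a thin dispatch layer. A rough sketch of the idea follows; the non-OpenAI branches are stubbed out, since which SDK you call there is up to you, and none of this is my exact code:

```python
from openai import OpenAI

openai_client = OpenAI()

def complete(model: str, messages: list[dict]) -> str:
    """Route one chat request to whichever provider serves `model`."""
    if model.startswith("gpt-"):
        resp = openai_client.chat.completions.create(
            model=model, messages=messages
        )
        return resp.choices[0].message.content
    if model.startswith("claude"):
        # Call the Anthropic SDK here.
        raise NotImplementedError
    if model.startswith("mistral"):
        # Call the Mistral SDK here.
        raise NotImplementedError
    raise ValueError(f"unknown model: {model}")
```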
Theoretically, you could solve your GPT speed problem with an external knowledge base running gpt-3.5-turbo-16k, accessed via Actions. Of course, you'd need to build the knowledge base yourself using the Assistants or Chat Completions API and then expose it through an API of its own, and that extra API hop will add some latency.
But it would solve the problem if you absolutely, positively must use a GPT.
Again, it wasn’t a problem I was initially trying to solve, but a benefit I discovered as part of the solution I created.