We’ve built our product on top of OpenAI Assistants (RAG), and our customers and prospects are happy with what we’re offering. However, we’ve been waiting a long time for the Assistants beta to end. Given how often the API fails and how inconsistent its performance is (particularly response times), we’re feeling pressure to reimplement our solution on something more reliable and faster.
That means rolling our own assistants on top of lower-level APIs (chat completions, vector search, thread history, guardrails, etc.).
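For context, one turn of what we'd be hand-rolling looks roughly like this (a minimal sketch assuming the official `openai` Python SDK; `search_chunks`, the model name, and the prompt are placeholders for our own vector store and prompting):

```python
# Rough sketch of one RAG turn on the lower-level APIs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def search_chunks(query: str) -> list[str]:
    # Placeholder: query your own vector store (pgvector, Pinecone, ...).
    return []


def answer(question: str, history: list[dict]) -> str:
    # Retrieve context, then ground the completion on it.
    context = "\n\n".join(search_chunks(question))
    messages = (
        [{"role": "system", "content": f"Answer using this context:\n{context}"}]
        + history
        + [{"role": "user", "content": question}]
    )
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content
```

So we'd own the vector store, the thread history, and the guardrails ourselves instead of leaning on the Assistants abstraction.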
On top of that, we’re looking at implementing parallel solutions against both the lower-level OpenAI APIs and Google Gemini. That would give us a more robust setup: we could tolerate OpenAI outages and failures, and route traffic to whichever provider is performing better.
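The routing layer we have in mind is roughly this (a minimal sketch assuming the official `openai` and `google-generativeai` SDKs; the model names, timeout, and blanket exception handling are placeholders for a real failover policy):

```python
# Rough sketch of an OpenAI-first / Gemini-fallback completion layer.
import os

from openai import OpenAI
import google.generativeai as genai

openai_client = OpenAI()  # OPENAI_API_KEY from the environment
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
gemini = genai.GenerativeModel("gemini-1.5-pro")


def complete(prompt: str) -> str:
    """Try OpenAI first; fall back to Gemini on errors or slow responses."""
    try:
        resp = openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            timeout=15,  # seconds; treat a slow response like a failure
        )
        return resp.choices[0].message.content
    except Exception:
        return gemini.generate_content(prompt).text
```

A real version would presumably track per-provider latency and health rather than always preferring one side, but that's the basic shape.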
Has anyone else taken this parallel approach? Any words of wisdom?