Failover system when the vector store API is not responding

Uploading a file to the vector store gets stuck at 'in progress'. This has already happened twice, and each time it took OpenAI hours to recognize the issue and a few more hours to fix it. Because of this, we have decided to start exploring the implementation of a failover system.

Has anyone done this with minimal changes to their implementation? We currently use OpenAI storage for managing files and vector stores, and the Responses API to handle chat.
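
To make the question concrete, here is a rough sketch of the shape we have in mind, not something we have running. It polls the upload status with a deadline and moves on to the next backend if the file never leaves 'in progress'. Everything named here is hypothetical: `VectorBackend`, `upload_with_failover`, the `"in_progress"` / `"completed"` status strings, and the timeout values are all assumptions, and each backend would have to be a thin wrapper we write around the real client.

```python
import time
from typing import Protocol, Sequence, Tuple


class VectorBackend(Protocol):
    """Hypothetical wrapper interface; not part of any real SDK."""
    name: str

    def upload(self, path: str) -> str: ...      # returns a file id
    def status(self, file_id: str) -> str: ...   # assumed: "in_progress", "completed", ...


class UploadStuck(Exception):
    """The file never left 'in_progress' before the deadline."""


def wait_for_indexing(backend: VectorBackend, file_id: str,
                      timeout_s: float, poll_s: float) -> str:
    """Poll until the status changes, or raise UploadStuck at the deadline."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = backend.status(file_id)
        if status != "in_progress":
            return status
        if time.monotonic() >= deadline:
            raise UploadStuck(f"{file_id} still in_progress after {timeout_s}s")
        time.sleep(poll_s)


def upload_with_failover(backends: Sequence[VectorBackend], path: str,
                         timeout_s: float = 120.0,
                         poll_s: float = 2.0) -> Tuple[str, str]:
    """Try each backend in order; return (backend name, file id) of the first success."""
    errors = []
    for backend in backends:
        try:
            file_id = backend.upload(path)
            status = wait_for_indexing(backend, file_id, timeout_s, poll_s)
            if status == "completed":
                return backend.name, file_id
            errors.append(f"{backend.name}: ended with status {status!r}")
        except Exception as exc:  # stuck upload, network error, etc.
            errors.append(f"{backend.name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

The returned backend name would have to be stored with the file so the chat side knows where to search. That is the part we are least sure how to do with minimal changes, since a fallback store would need its own retrieval path.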