Hi everyone!
I am building an AI startup with a very high volume of generated messages, upwards of 1,000 messages per minute at ~2,000 tokens per input. Given that background: OpenAI's API has not been able to keep up with this volume, and has had more downtime over the past few months than we'd like to see from an LLM provider. That is not sustainable for a growing company, so I was wondering which alternatives you'd recommend so we can keep up this high rate of message generation?
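For context on how we're thinking about the downtime side: whichever provider we end up on, we're planning a thin failover layer so one provider's outage doesn't stall the pipeline. A minimal sketch of the idea (the provider stubs are hypothetical placeholders, not real SDK calls):

```python
import time

# Hypothetical provider stubs -- in practice these would wrap the actual
# OpenAI / Mistral / etc. SDK calls. Names are illustrative only.
def call_primary(prompt):
    raise RuntimeError("simulated outage")  # pretend the primary is down

def call_fallback(prompt):
    return "fallback says: " + prompt

def generate_with_failover(prompt, providers, retries=2, backoff=0.1):
    """Try each provider in order, retrying transient failures with
    exponential backoff, so one provider's downtime doesn't block us."""
    for call in providers:
        for attempt in range(retries):
            try:
                return call(prompt)
            except RuntimeError:
                time.sleep(backoff * (2 ** attempt))
    raise RuntimeError("all providers failed")

print(generate_with_failover("hello", [call_primary, call_fallback]))
```

Curious whether people here just do this client-side or use a gateway in front of multiple providers.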
I was looking into Mistral and it seemed promising, but I'd love to know if anyone has other suggestions!