If you want easy, use a managed service like Pinecone for RAG, while still using OpenAI LLMs and embeddings.
I don't expect the Assistants API to be ready for production use cases in the next few months. If you want predictability, build it yourself.
If you want cheap and effective, use Qdrant and Mistral. You can start with their managed services and move to your own hosting once you’re ready. Build the conversation memory and the RAG yourself. It’s not that complicated, and it lets you build to your own specs rather than depend on a black box.
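To give a rough idea of how small the "build it yourself" part can be, here is a minimal sketch of retrieval plus rolling conversation memory. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model (Mistral, OpenAI, etc.), the in-memory list stands in for a vector store like Qdrant, and `TinyRAG`, `build_prompt`, and `record` are hypothetical names. The prompt it builds is what you would send to your LLM of choice.

```python
import math
import re
from collections import Counter, deque


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyRAG:
    """Hypothetical helper: in-memory retrieval plus rolling chat memory."""

    def __init__(self, documents, max_messages=6):
        # In a real setup this index would live in a vector store.
        self.index = [(doc, embed(doc)) for doc in documents]
        # deque(maxlen=...) silently drops the oldest messages when full.
        self.memory = deque(maxlen=max_messages)

    def retrieve(self, query, k=2):
        """Return the k documents most similar to the query."""
        q = embed(query)
        ranked = sorted(self.index, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]

    def build_prompt(self, question):
        """Combine retrieved context, chat history, and the new question."""
        context = "\n".join(self.retrieve(question))
        history = "\n".join(f"{role}: {text}" for role, text in self.memory)
        return f"Context:\n{context}\n\nHistory:\n{history}\n\nUser: {question}"

    def record(self, question, answer):
        """Store the finished exchange so later prompts can see it."""
        self.memory.append(("user", question))
        self.memory.append(("assistant", answer))


docs = [
    "Refunds are processed within 5 business days.",
    "Our support desk is open Monday to Friday.",
    "Shipping to Europe takes about one week.",
]
rag = TinyRAG(docs, max_messages=4)
prompt = rag.build_prompt("How long do refunds take?")
# Send `prompt` to your LLM here, then store the exchange:
rag.record("How long do refunds take?", "About 5 business days.")
```

Swapping in real embeddings and a real vector store changes `embed` and `retrieve`, but the shape stays the same, which is why owning this layer keeps the behaviour predictable.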
Best of luck,
PS: Feel free to DM me if you’d like to talk more. Our company, Integrait, helps startups with their product buildout and builds assistants for SMBs.