We have a bunch of Azure credits as a startup.
As we ramp up our use of the Assistants API, our costs have skyrocketed. I was wondering whether it is possible now, or will be in the near future, to use Azure-hosted models with the Assistants API.
Azure does not offer that currently, but it could arrive quite soon. In fact, they just announced that GPT-4 with Vision is now available on Azure.
You might also want to look at the Microsoft for Startups Founders Hub, where it's possible to get OpenAI API credits directly as well.
Thanks for sharing this @louis030195. Looks like a good workaround until the Assistants API is made available on Azure.
However, I suspect the costs would be higher with a paid LLM setup (e.g. on Azure), since all previous messages in the thread have to be resent on every turn and are billed as new input tokens (if I'm not mistaken?).
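To make the concern concrete, here's a minimal sketch of why resending the full history matters: billed input tokens grow roughly quadratically with turn count. The per-message token count is a made-up round number for illustration, not a real measurement.

```python
# Sketch: when every request resends the whole thread, cumulative input
# tokens grow roughly quadratically with the number of turns.
# tokens_per_message is an assumed round number, not real data.

def cumulative_input_tokens(tokens_per_message: int, turns: int) -> int:
    """Total input tokens billed when each turn resends all prior messages."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_message  # new user message joins the history
        total += history               # the whole history is billed as input
        history += tokens_per_message  # assistant reply joins the history
    return total

# 10 turns of ~100-token messages: 10,000 input tokens billed in total,
# versus only 1,000 if history were somehow not resent.
print(cumulative_input_tokens(100, 10))
```

The same growth applies whether you manage the thread yourself or the Assistants API does it server-side, which is part of why its pricing is hard to pin down.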
I am hopeful Microsoft will release public REST API access to Assistants (built on Azure) by end of January or mid-February at the latest.
It is not yet clear what the pricing structure for Assistants is. At the time of writing, token usage (particularly for the internal RAG that Assistants perform) is neither controllable nor documented.
Therefore, you may not actually spend more with your own setup, since you can control the context window and model choice more granularly (an Assistant cannot change models mid-conversation, whereas the stateless chat API doesn't care).
So it's not such a clear-cut comparison. For one specific Assistant setup we run, we spend about 15 cents per message on average, for conversations averaging 4 user messages, 5 assistant messages, and 2 retrieval interactions, using gpt-4-1106-preview.
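For context on where a figure like that can come from, here's a back-of-envelope cost estimator assuming gpt-4-1106-preview list pricing of $0.01 per 1K input tokens and $0.03 per 1K output tokens. The token counts in the example are hypothetical round numbers, not our actual benchmarks.

```python
# Back-of-envelope cost estimator. Pricing defaults assume gpt-4-1106-preview
# list prices ($0.01 / 1K input, $0.03 / 1K output); the example token
# counts below are illustrative, not measured values.

def request_cost(input_tokens: int, output_tokens: int,
                 in_per_1k: float = 0.01, out_per_1k: float = 0.03) -> float:
    """Dollar cost of a single completion request."""
    return (input_tokens / 1000) * in_per_1k + (output_tokens / 1000) * out_per_1k

# A request carrying ~12K input tokens (long thread plus retrieved chunks)
# and ~1K output tokens lands right around the 15-cent mark.
print(round(request_cost(12_000, 1_000), 2))
```

The point is that input tokens dominate once threads and retrieved context get long, which is exactly the part the Assistants API manages opaquely.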
We have more benchmarks, but the numbers vary a lot with the Assistant setup and conversation profile, and are particularly difficult to predict. Custom RAG setups are predictable, though, and sometimes predictability matters more to the business model when the price delta is (or appears) quite small, or in fact favors the custom setup.
RAG is the way.
I am very curious to see whether Azure AI will support the Assistants API, specifically the storage/stateful part. Not sure they want to hide storage and vector-embedding computation behind that API.
So far they have been selling those capabilities explicitly via Azure AI Search and Cosmos DB.
I'm really very curious about this.
Very likely it won't be offered until the Assistants API delivers clear new value that Microsoft cannot already provide with its suite of search tools. At the moment it is actually better to use those tools, at least until the Assistants API is out of beta and its various issues are resolved.
In particular, the lack of flexibility in how data is indexed and retrieved currently drives all serious implementations to other stacks.
The Assistants API is now available on Azure.
But I can't seem to get it working through the REST version of their API. Seems like there's a gap between marketing and engineering.
It may very well be. Are you seeing a specific issue? Are you using their docs for this test?