9 months of using the OpenAI Assistants API
As I celebrate the first anniversary of our OpenAI-powered Vic application, I thought it would be good to share a few thoughts from nine months of using the OpenAI Assistants API. Here are some observations:
Increased use — decreased cost.
In July we ran up 10 million tokens, and our API bill was just $55! Every month since March we have seen usage go up and cost go down. Pretty insane. It makes the $20 per month for a ‘personal’ GPT seem very high by comparison. Just this year we went from running the bulk of our processing on GPT-4, to some on GPT-3.5, and then, since May, on GPT-4o.
The value we receive from the work done with those 10 million tokens is truly astonishing.
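For anyone sanity-checking their own bill, the arithmetic is simple. A minimal sketch in Python, assuming illustrative mid-2024 per-million-token list prices (check the current pricing page before relying on them; the 9M/1M input/output split below is made up):

```python
# Illustrative mid-2024 list prices in USD per 1M tokens; check the
# current pricing page before relying on these numbers.
PRICE_PER_1M = {
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one month of usage of a single model."""
    p = PRICE_PER_1M[model]
    return (input_tokens / 1_000_000) * p["input"] \
         + (output_tokens / 1_000_000) * p["output"]

# e.g. 10M tokens split 9M input / 1M output on gpt-4o:
print(monthly_cost("gpt-4o", 9_000_000, 1_000_000))  # → 60.0
```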
Batch processing — even cheaper for bulk
The Batch API that was introduced a few months ago allows Assistant runs to be sent as batches that are processed within a few hours, instead of ‘now’. The savings are 50%! We are currently not using the Batch API because most of our requests are one-offs, but we have had a few situations (like a new ‘AI Score’ that we needed to calculate across all 10,000 existing records) where it could have been useful. We simply have not had the time to build the infrastructure to use it. It’s definitely on the list for my Django/OpenAI processor.
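For reference, a minimal sketch of what feeding the Batch API looks like: each line of the input file is an independent JSON request tagged with a `custom_id` for matching results back. The `batch_request_line` helper and the scoring prompt are hypothetical; the actual upload and job creation are only hinted at in the comments:

```python
import json

def batch_request_line(custom_id: str, model: str, prompt: str) -> str:
    """One line of a Batch API input file: an independent JSON request
    tagged with a custom_id so the result can be matched back later."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

# Build one request per record; then (not shown) write the lines to a
# .jsonl file, upload it with client.files.create(purpose="batch"), and
# start the job with client.batches.create(..., completion_window="24h").
lines = [
    batch_request_line(f"record-{i}", "gpt-4o", f"Compute the AI Score for record {i}.")
    for i in range(10_000)
]
```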
Performance / reliability is much better today than earlier this year
Around April, when Assistants usage started to pick up, performance was noticeably slower than it is today; gpt-4o seems to be making a huge difference.
Reliably generating JSON is a breeze now!
In the early days of the Assistants, it seemed like you had to beg via prompting to get JSON output generated reliably. A few months ago, with the introduction of gpt-4o, JSON output became a standard option, and just this week a new feature was introduced that enables a JSON Schema for function input and response output, solving this issue once and for all.
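The new schema feature boils down to a `response_format` payload. A sketch of its shape based on the Structured Outputs announcement (the `email_summary` schema is a made-up example; verify the field names against the current API reference):

```python
# The strict-JSON request payload, sketched as a plain dict. The
# "email_summary" schema is a made-up example; "strict": True together
# with "additionalProperties": False is what enforces conformance.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "email_summary",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "subject": {"type": "string"},
                "priority": {"type": "string", "enum": ["low", "normal", "high"]},
            },
            "required": ["subject", "priority"],
            "additionalProperties": False,
        },
    },
}
# Passed as the response_format argument when creating a run or a
# chat completion.
```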
Vector store(s) are now built in
The original Assistants always allowed uploading files to the Assistant or the Thread, but in the latest iteration you can create sets of files (i.e., your own vector stores) and attach those to Assistants or Threads; there is no need to use an external vector store. These work great for what you would expect: query and retrieval. At the moment they are not very good for things like ‘summarize the text in this document’. (We use LlamaParse at the moment to create a Markdown version of every type of incoming document and store the Markdown together with the file. The ever-increasing context window sizes make it fairly easy, for most tasks, to simply use the Markdown versions of documents and add them to the thread as is.) The implementation also hints at an upcoming feature where you will be able to indicate the purpose of a file or library (i.e., search vs. content) to solve this problem.
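Attaching a built-in vector store comes down to a small `tool_resources` payload on the Assistant or Thread. A sketch, assuming the Assistants v2 `file_search` tool (the helper function is my own illustration):

```python
def file_search_resources(vector_store_ids: list[str]) -> dict:
    """tool_resources payload that points the built-in file_search tool
    at one or more vector stores."""
    return {"file_search": {"vector_store_ids": vector_store_ids}}

# Typical use (not run here), after creating a store with
# client.beta.vector_stores.create(...) and adding files to it:
#   client.beta.assistants.update(
#       assistant_id,
#       tools=[{"type": "file_search"}],
#       tool_resources=file_search_resources([store.id]),
#   )
```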
Vision is amazing!
Vision built into the models is so helpful; my email processor can now do the most amazing things with uploaded photos. One of the best built-in features: when you provide it with graphs or charts, your output will be Markdown tables, automatically. We use this for processing pitch decks!
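For completeness, a sketch of the multimodal message shape the vision-capable models accept: a content list mixing text parts and image parts (the slide URL is a placeholder):

```python
def vision_message(prompt: str, image_url: str) -> dict:
    """A user message mixing a text part and an image part, in the
    multimodal content format the vision-capable models accept."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

# e.g. turning a pitch-deck chart into a table:
msg = vision_message(
    "Transcribe this chart as a Markdown table.",
    "https://example.com/pitch-deck-slide-7.png",  # placeholder URL
)
```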
Streaming
Streaming has been available in the Assistants API for six months or so and is needed for all those chat applications. In our situation most interactions happen through email, so this was less of a thing for us. But …
… I can’t wait for streaming voice!
This is going to be such a game changer. My users already send a lot of email that the Assistant then handles, but in terms of requests, talking in real time (especially back and forth) will take things to a whole new level, I believe.
Model updates / Changes are sometimes challenging
While most of the model updates have been positive, there are always slight changes: gpt-4o, for example, can be much more verbose than gpt-4-turbo in ‘explaining’ what it is doing. Across the more than 30 different Assistants we run, keeping track of those subtle differences is an increasingly important task that screams for automation I have not figured out yet.
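One low-tech way to start automating this: run a fixed prompt set through each model and flag answers that break simple style budgets. The budgets below are purely illustrative, and the model calls themselves are not shown:

```python
# Illustrative style budgets for catching "the new model got chattier":
# flag answers that are too long or open with a chatty preamble.
PREAMBLES = ("certainly", "sure,", "let me explain")

def style_violations(answer: str, max_words: int = 120) -> list[str]:
    """Return a list of budget violations for one model answer."""
    problems = []
    if len(answer.split()) > max_words:
        problems.append("too verbose")
    if answer.strip().lower().startswith(PREAMBLES):
        problems.append("chatty preamble")
    return problems
```

Running the same prompts through gpt-4-turbo and gpt-4o and diffing the violation counts would give an early warning before a model update quietly changes the tone of 30 Assistants.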
Conclusion
Nine months of the Assistants API have brought mostly great improvements, and I don’t regret switching early (in December 2023) at all. I’m very excited for the next updates, especially streaming voice. Performance has increased, cost has tumbled, and reliability has overall been great.
What are your experiences with using the Assistants API so far?