I’ve set up an Assistant with one book (a few MB) and one other document (a few KB) as reference files. My instruction set isn’t that large. Now that OpenAI has exposed token counts on Threads, I’m realizing how expensive this has become: some threads with only a few messages back and forth are coming to tens of thousands of tokens. I’ve guesstimated that the cost per message (each way) comes to somewhere between $0.07 and $0.10. That’s crazy.
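For reference, here’s my back-of-the-envelope math (the per-token rates below are my assumption for gpt-4-turbo-class pricing at the time; check the current price sheet before trusting the exact figures):

```python
# Rough per-run cost estimate. The rates are assumed
# gpt-4-turbo-era pricing ($0.01 / 1K input, $0.03 / 1K output);
# verify against OpenAI's current price sheet.
INPUT_PER_1K = 0.01
OUTPUT_PER_1K = 0.03

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one run, given the token counts shown on the thread."""
    return (input_tokens / 1000) * INPUT_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PER_1K

# A run that re-sends ~7K tokens of thread context plus a ~1K-token reply:
print(f"${run_cost(7000, 1000):.2f}")  # roughly $0.10 per message
```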
Anyone else having issues? Am I doing something wrong here?
That’s one of the downsides of the current Assistants API (and why I don’t personally use it): there’s no way to control cost, and you could easily get to a point where you’re paying $0.50 per request.
Normally, this data shouldn’t be handed directly to the Assistants API; it should be embedded into a vector database and retrieved selectively at query time.
One of the benefits of the Assistants API is that it handles the vector/search functionality for you. Otherwise, using Chat Completions with your own vector/search solution would be the way to go…
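For what it’s worth, the query side of a roll-your-own setup is short. A minimal sketch using the `openai` Python SDK; `search_index` and its `query` method are hypothetical stand-ins for whatever vector store you pick:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str, search_index) -> str:
    # `search_index` is a stand-in for your vector store; a `query`
    # method returning the top-k matching text chunks is an assumed interface.
    chunks = search_index.query(question, top_k=4)
    context = "\n\n".join(chunks)

    response = client.chat.completions.create(
        # With RAG carrying the knowledge, a cheaper model often suffices.
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context.\n\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```

The cost difference comes down to control: you decide exactly how many context tokens go into each call, instead of letting the Assistant decide for you.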
It’s easy to split models when you have distinct, complex tasks. For a conversational bot, though, you’re either going to get good performance with GPT-4 or relatively poor performance with GPT-3.5…
The Assistants API determines that on its own. Again, that’s supposedly one of the benefits of the service. There’s no way that I know of to control how much context the Assistant maintains or how much of the vector database it pulls into the thread…
It’s a good suggestion! But I’m only uploading a few MB, and the Assistants API can (supposedly) handle GBs’ worth of data. I can’t really limit it any further.
As others have suggested, a less expensive model such as GPT-3.5 is an option: because you’re relying on RAG, you don’t need as much raw capability from the LLM.
IMHO, the most effective option would be to chunk the documents and import them into a Pinecone vector DB (a dataset this small fits in the free tier). That would substantially reduce costs while potentially increasing accuracy, depending on how well the document’s structure lends itself to chunking and embedding. A rough ingestion sketch follows.
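A minimal sketch, assuming the current `openai` and `pinecone` Python SDKs and an existing index (named `book` here, hypothetically) whose dimension matches the embedding model; the chunk size and overlap are arbitrary starting points to tune:

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()
pc = Pinecone(api_key="YOUR_PINECONE_KEY")
index = pc.Index("book")  # assumes a 1536-dim index for text-embedding-3-small

def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    # Naive fixed-size character chunking; aligning chunks with the
    # book's structure (chapters, sections) usually retrieves better.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def ingest(text: str) -> None:
    pieces = chunk(text)
    # Embed the chunks (batch this for a whole book; the API caps input
    # size) and upsert each vector with its source text as metadata.
    embeddings = openai_client.embeddings.create(
        model="text-embedding-3-small", input=pieces
    )
    index.upsert(vectors=[
        (f"chunk-{i}", item.embedding, {"text": pieces[i]})
        for i, item in enumerate(embeddings.data)
    ])
```

At query time you embed the question the same way, query the index for the top matches, and pass only those chunks to the model. That’s what keeps the per-message token count (and cost) bounded.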
The whole point of the Assistants API was to cut down on all the infrastructure and coding. I had a great system working with Chat Completions, but moved over to Assistants thinking I’d save so much time on dev. I guess nothing comes for free: save time on dev, spend more money, and end up with a slower system.
I’ve been using it since it first launched, and it performed better initially; performance has degraded since. I assumed a new Assistant would need time to “warm up” before reaching peak performance, but in my experience it seems to be the opposite.
Let me know if you figure anything out. Thanks again for the responses.