Whoa, 35k tokens in one go?!

My document is 3 pages long. I asked for something that isn't in the document, and it took 31k TOKENS!

It's happened 2 more times. Is this normal?

Wow, that seems unexpected. Have you uploaded some files as knowledge?

Yeah, I just have a tiny file, only 3-4 pages long.
Even if it were 100 pages, I don't think it should cost this much. It's happened 3 times in a row.

I'm so glad I didn't use the Assistants API with any of my clients; I would've had $500 gone in an hour.

Is the uploaded document in a language other than English?

All of it is in English. I don't know what's causing the problem.

The bot also doesn't reply.

I think OpenAI is having some problems.

If other people are experiencing this, it's gonna make the Assistants API look sooooooo bad.

There is a partial outage for the API and ChatGPT right now.

I think it's probably not exactly expected, but it seems to be somewhat “normal.”

I'm tagging in @_j since they have vastly more experience with assistants, but my understanding is that it's very easy for token use with the Assistants API to spiral out of control very quickly.

In this case, what might have happened is that it read the entire contents of your 3-page file, which might be ~1,500 tokens, and put it into context. Then it made a series of calls to the model, and every single call included that context along with everything else.
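
To make that arithmetic concrete, here's a rough back-of-the-envelope sketch; every number in it is an illustrative assumption, not a measured value:

```python
# Illustrative math only: each model call in a run re-sends the whole
# context, so billed input tokens grow with every step.
file_context = 1_500   # ~3 pages of file text (assumption)
instructions = 500     # system prompt + tool definitions (assumption)
history = 300          # prior conversation (assumption)
per_call_input = file_context + instructions + history   # 2,300 tokens

calls_in_run = 10      # a run can loop over many tool calls (assumption)
print(per_call_input * calls_in_run)   # 23,000 input tokens, before output
```

A handful of tool-call iterations like this is already in the same ballpark as the 31k you saw.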

At this point, because you don't have any direct control over how the Assistants API manages context, or over the number of calls it will make in a single run, you don't really have much control over your API costs.

Well described, except that retrieval has an additional token-burning feature.

Not only is the maximum amount of text from your files placed into the AI context regardless of relevance, retrieval also gives the AI a search function, with no message saying “using this is pointless, because you already received all of the uploaded file text.” The myfiles_browser tool has independent controls like “see search result part”, “scroll”, …, and the AI even has to emit a back() call, costing another full-context model invocation, just to switch to a different file ID or search result.

On top of that, your conversation includes every one of those function calls and returns, so it keeps growing too.
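
Here's a sketch of why the transcript balloons; the step names mirror the browser-style controls described above but are purely illustrative, not the actual tool schema:

```python
# Illustrative only: each tool round-trip appends both the call and its
# result to the conversation, and the whole transcript is re-sent on the
# next model call.
messages = [
    {"role": "system", "content": "instructions + full uploaded file text"},
    {"role": "user", "content": "the question"},
]
for call in ["search('query')", "scroll()", "back()"]:  # hypothetical steps
    messages.append({"role": "assistant", "content": f"tool call: {call}"})
    messages.append({"role": "tool", "content": "tool result chunk"})
# By the final call, every earlier call/result pair is being billed again.
```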

The first control they could offer is limiting the 125k model context length to a smaller number, which the algorithms would then adapt to. I replied to an OpenAI staffer with a more extensive list of possible API controls, describing by example schema what they don't document…

Solution: use chat completions. Inject via RAG only the documentation that is contextually relevant to the present need. You can then even make additional cheap model calls against past chat and injected text, asking “should I summarize this?” or “should I discard this?”, by writing your own chat context manager that combines embeddings, language AI, and token counts.
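
Here's a minimal sketch of that pattern, assuming the openai Python SDK (v1) and numpy; the model names, chunks, and helper names are placeholders, not a definitive implementation:

```python
# Minimal RAG over chat completions: embed document chunks once, then
# inject only the most relevant ones into each request.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Your document, split into chunks (placeholder content)
chunks = ["chunk one...", "chunk two...", "chunk three..."]
chunk_vecs = embed(chunks)

def answer(question, top_k=2):
    q_vec = embed([question])[0]
    # OpenAI embeddings are unit-length, so a dot product is cosine similarity
    scores = chunk_vecs @ q_vec
    context = "\n\n".join(chunks[i] for i in np.argsort(scores)[-top_k:])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model; placeholder
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```

Unlike an assistants run, here you decide exactly how many tokens of context go into every call.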

A simple, no-code solution would be to use a third-party service like dify.ai.

It has a lot of cool features and offers better control.

Wow man, you always reply to me and wow me each time. How can I thank you?

I'm gonna stick to using LlamaIndex / LangChain for the time being; the Assistants API is milking my API keys.
