Assistant API - Run won't cancel, keeps billing usage

Below are two screenshots taken 1 minute apart. In that time I’ve used 50c of GPT-4 Turbo, and it appears to be due to a thread that is stuck after I requested cancellation (thread_Wpd3llJKfEbno4lNuZ4vDVd3).

Here is the run JSON confirming that the request was cancelled. The cancellation doesn’t seem to have been picked up by the UI or by billing: it kept spinning for several minutes and continues to bill.

      "id": "run_gITrJ0bqR6mfNWR14Xq0AuyQ",
      "object": "",
      "created_at": 1699308938,
      "assistant_id": "asst_aoQqfTDFAKdxPCDciylDjnh9",
      "thread_id": "thread_Wpd3llJKfEbno4lNuZ4vDVd3",
      "status": "cancelled",
      "started_at": 1699308938,
      "expires_at": null,
      "cancelled_at": 1699309538,
      "failed_at": null,
      "completed_at": null,
      "last_error": null,
      "model": "gpt-4-1106-preview",
      "instructions": "<prompt>",
      "tools": [
          "type": "retrieval"
      "file_ids": [
      "metadata": {}
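
The timestamps in that JSON already say something about the run. Here is a rough sketch (plain Python, no API call) that works out how long the run was alive before the cancellation was recorded. The field names are copied from the run object above; the helper names `run_duration_seconds` and `to_utc` are made up for illustration and are not part of any SDK.

```python
from datetime import datetime, timezone
from typing import Optional

# Fields copied from the run object above (Unix timestamps, in seconds).
run = {
    "status": "cancelled",
    "started_at": 1699308938,
    "cancelled_at": 1699309538,
    "failed_at": None,
    "completed_at": None,
}


def run_duration_seconds(run: dict) -> Optional[int]:
    """Seconds between started_at and whichever end timestamp is set.

    Hypothetical helper. Returns None if the run never started or has
    no end timestamp yet.
    """
    start = run.get("started_at")
    if start is None:
        return None
    for key in ("completed_at", "cancelled_at", "failed_at"):
        end = run.get(key)
        if end is not None:
            return end - start
    return None


def to_utc(ts: int) -> str:
    """Render a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    print("started:  ", to_utc(run["started_at"]))
    print("cancelled:", to_utc(run["cancelled_at"]))
    print("duration: ", run_duration_seconds(run), "seconds")
```

For this run the difference is 600 seconds, so ten minutes passed between the start and the recorded cancellation.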


1 minute later:

Based on the “new pricing”, and given that I don’t have anything running now and still didn’t get an answer from the assistant, this doesn’t seem right…

The same here. I hope OpenAI refunds the money and explains the assistant charges better.

It likely got stuck iterating, trying to call a function in an alternating loop of code failures, or you just gave it a huge document to be set loose on.

With no “max_tokens” specified, and with their own conversation management possibly filling the context or omitting failed iterations, this could be the “I put my API key into AutoGPT or LangChain and it emptied my account” forum scenario all over again.

You can possibly go in and edit the assistant and any code it is calling, and that might make it stop. If not, just do a big “delete”.

Also set the hard limit of your account to 0 and that should stop the billing.
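
On the "make it stop" point: after requesting a cancel, it is worth polling the run until it reaches a terminal status, with a deadline so the polling itself can't run away. A minimal sketch, under assumptions: `fetch_status` is a hypothetical callable you supply (with the openai Python SDK v1 it would presumably wrap `client.beta.threads.runs.retrieve(...).status`), and the timeout values are arbitrary. It confirms the status the API reports; it says nothing about what billing does.

```python
import time
from typing import Callable

# Run statuses after which the API reports the run as finished.
TERMINAL_STATUSES = {"cancelled", "completed", "failed", "expired"}


def wait_until_terminal(
    fetch_status: Callable[[], str],
    timeout_s: float = 60.0,
    poll_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until it returns a terminal status.

    fetch_status is supplied by the caller (hypothetical here). Raises
    TimeoutError instead of looping forever if the run never settles.
    sleep and clock are injectable so the loop can be tested offline.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run still '{status}' after {timeout_s}s")
        sleep(poll_interval_s)


if __name__ == "__main__":
    # Simulated status sequence standing in for real API responses.
    fake = iter(["in_progress", "cancelling", "cancelled"])
    print(wait_until_terminal(lambda: next(fake), sleep=lambda s: None))
```

If this raises `TimeoutError`, the run is still reported as active and the next step is deleting the assistant or lowering the hard limit, as suggested above.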

It would be interesting to see the by-the-minute request usage report…

– edit, the daily view and individual in-browser hourly queries are gone. The only thing you can see is a daily bar graph by model, and you hover to see the daily tokens. One must do a monthly export, and the export is terrible: it also shows only the daily total per model for the entire period.

The default view and also the activity view hide models. One must select the models to be seen in a drop-down.

But I believe that’s not the issue; the first time I used it, it returned successfully. However, subsequently, with the same assistant and the same files, it only returned an error, five times. The problem is that it’s not working, and yet it’s still charging.

If you didn’t get an API response, and in the daily usage view and the five-minute view you still see calls and tokens, something is still going, because it hasn’t hit an iteration limit (one that isn’t in your code or prompt) and you can’t see how many times it has gone around. Maybe an API error of “hard limit, check your plan” would make it give up.

– edit - there’s no more minute or hour view WTF. There’s just daily token reports and nothing with more resolution.
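
For what it's worth, the "iteration limit" idea looks roughly like this when the loop lives in your own code. A minimal sketch under assumptions: `step` is a hypothetical callable that performs one iteration (one function-call round trip, say) and returns `(done, tokens_used)`; the caps are arbitrary. It cannot stop whatever the Assistants backend does internally, which is the problem in this thread.

```python
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when the loop hits its step cap or its token cap."""


def run_with_caps(
    step: Callable[[], Tuple[bool, int]],
    max_steps: int = 10,
    max_tokens: int = 20_000,
) -> Tuple[int, int]:
    """Call step() until it reports done, or stop at a cap.

    step is hypothetical: it does one iteration and returns
    (done, tokens_used). Returns (steps_taken, total_tokens) on success.
    """
    total = 0
    for n in range(1, max_steps + 1):
        done, used = step()
        total += used
        if done:
            return n, total
        if total >= max_tokens:
            raise BudgetExceeded(f"{total} tokens used after {n} steps")
    raise BudgetExceeded(f"no result after {max_steps} steps")


if __name__ == "__main__":
    # A step that never finishes and burns 5,000 tokens per call.
    try:
        run_with_caps(lambda: (False, 5_000))
    except BudgetExceeded as exc:
        print("stopped:", exc)
```

Capping on both steps and tokens matters because a loop of cheap failures and a loop of expensive ones run away in different ways.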

Luckily I keep my API soft and hard limits low to avoid a runaway like this, but the first few documents were PDFs of at most 4 MB, with a simple query.

With these, the thread would just keep running until I cancelled it, and these ran up the first few dollars.

With a second agent, the PDFs are anywhere from 14 MB to 250 MB, and it was nowhere near as bad: those threads would just refuse to run due to overload.

I hope in this case it was just a first-few-hours glitch due to the traffic and I won’t really see it again.

But it would be useful to get an idea of how much the documents cost - and is it per request or only on first load, etc.
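
Until there is an official breakdown, the only thing you can do is back-of-the-envelope arithmetic from token counts. A rough sketch: the default prices ($0.01 per 1K input tokens and $0.03 per 1K output tokens for gpt-4-1106-preview) are an assumption based on the launch list prices, so check the current pricing page; the function names are made up. It does not model any retrieval or per-document charge, because that is exactly what is unclear.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float = 0.01,   # assumed price, USD per 1K input tokens
    output_price_per_1k: float = 0.03,  # assumed price, USD per 1K output tokens
) -> float:
    """Token cost in USD for one request or one usage total."""
    return (
        input_tokens / 1000 * input_price_per_1k
        + output_tokens / 1000 * output_price_per_1k
    )


def tokens_for_spend(spend_usd: float, price_per_1k: float) -> int:
    """How many tokens a given spend buys at a given per-1K price."""
    return round(spend_usd / price_per_1k * 1000)


if __name__ == "__main__":
    # The 50c-in-a-minute figure from the first post, read as input tokens only.
    print(tokens_for_spend(0.50, 0.01), "input tokens")
    print(f"${estimate_cost_usd(50_000, 500):.3f} for 50,000 in / 500 out")
```

At the assumed input price, 50c in one minute works out to about 50,000 input tokens, which would fit a large document being fed into the context repeatedly.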

Same thing happened to me. I tried to let it read a 100 MB PDF (OpenAI says the limit is around 500 MB). It waited for 5 minutes, then said “Run cancelled”.

However, an interesting thing is that in the dashboard I see a 500-token output for the API call (I only used it once, so it must be the call I just made), but no output is shown.

Another thing is that I didn’t see any embedding model calls. Based on my PDF size, I was expecting large usage for embedding models, since it:

performs a vector search for longer documents

And that requires an embedding model call.

I have no idea what has happened.

Yep, I just had a thread run cancelled by OpenAI, but I was still charged for the thread running, even though I got no reply.

The problem I see is the pricing is too opaque.

For an agent it would really be nice to know the estimated price before running. And if the thread times out with no response, or is cancelled from the server side, it should not be charged.

Facing a similar issue. It was working fine this morning in my time zone (India), then the runs started getting cancelled. I can understand the server load, but the API is still charging for hundreds of thousands of tokens that were never sent back to me. That’s just bad integration.

The assistant pages aren’t even loading for me lol.

Same here: it charges the tokens used as input for GPT-4 Turbo but doesn’t charge for the output. Seems fair :)

Same here, but it specifically gets stuck when trying to call a function. Maybe this context can help with that?