Function Call Loop burned through my budget

Hi Guys,
My assistant got stuck in a function_call loop and burned $100 in a couple of hours. I’ve got logs from AWS and sent them to OpenAI. Hopefully they will reimburse me.
However, the story didn’t finish there. I stopped my ECS task, added more specific guardrails, increased my limit, tested, and deployed. I thought it was resolved, but by the next morning it had burned another $160, this time with no logs, since no ECS tasks were running. It seems the same assistant run was still stuck, but now I had no access to any trace of it.
Anyway, $260 was wasted on some function-call bug, and I really hope OpenAI will return it to me, at least in the form of credits. We are also presenting the product to a prospective investor in a few days, so it’s quite a nightmare that this happened after a few months of testing and running.

PS: I’ve just realized that I can find my thread_id in the logs and delete it, which I just did. So I’ll put in another $100 and see where it takes me.

1 Like

I’m sorry that happened to you, hopefully OpenAI is understanding and can identify exactly what happened.

I suppose one of the lessons here is to develop safeguards to ensure your assistant can’t go rogue or get stuck in a loop before you take your eyes off of it.

It could have been worse though. While $260 is nothing to sneeze at, I imagine under the right circumstances you could have burned a lot more cash. I’ll share my own tangentially related tale of woe…

Many moons ago I was working for a small startup that was hired to do the backend of a promotional website for Windows ME. Some electronic version of a scratch-off ticket where you could win things like Windows ME T-shirts, hats, other merch, copies of the OS, and even some cash prizes.

Long story short, I pushed the wrong backend version to the live site before going home for the night: the version with substantially higher odds of winning, which I’d been using to test that all of the assets loaded properly and that the page redirected winners to the proper pages to enter their information.

I got the panicked call around midnight from my boss: in under 3 hours we’d apparently awarded on the order of 5x as many prizes as had been allocated for the entire month-long promotion.

It was like a three-minute fix to pull the site offline, push the proper version, and bring it back up, but it felt like forever.

Walking into work the next day was possibly the scariest thing I’d ever done in my life up to that point.

I was absolutely floored when my boss didn’t fire me or throw me out of a window. What he said instead has stayed with me all these years later.

“Why would I fire you? I just paid $60,000 to teach you to never push the wrong thing to production and to never leave the office until you’ve triple-checked everything is running perfectly.”

And he was absolutely right. To this day I remain ever vigilant against the possibility of runaway systems.

So, :clinking_glasses: and welcome to the club.

5 Likes

wow that sucks… I can see that happening though and it’s totally on them.

I have guard rails that control both how many steps and how long an assistant is allowed to run.
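
Roughly along these lines, if it helps. This is a simplified sketch rather than my exact code, using the 1.x Python client; MAX_RUN_SECONDS is an arbitrary placeholder, and the step limit is just a bounded counter in the polling loop itself:

    import time

    from openai import OpenAI

    client = OpenAI()

    MAX_RUN_SECONDS = 120  # placeholder budget for how long a run may stay alive

    def enforce_time_limit(thread_id: str, run_id: str):
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        # Run objects carry a created_at unix timestamp, so a run that outlives
        # the budget can be cancelled no matter where it got stuck.
        too_old = time.time() - run.created_at > MAX_RUN_SECONDS
        if run.status in ("queued", "in_progress", "requires_action") and too_old:
            run = client.beta.threads.runs.cancel(thread_id=thread_id, run_id=run_id)
        return run

Calling something like this from the polling loop (or from a separate watchdog) means a stuck run gets cancelled even if the loop itself misbehaves.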

Are you OK sharing both your prompt and the response you returned from the function it called? I’m assuming your response was a bit ambiguous to the model.

1 Like

I once woke up to find an email I had sent on the cover of USA Today… I thought for sure I was fired. My boss simply said, “Steve, I think you’re about to learn a very valuable lesson.”

3 Likes

Great call.

I highly recommend that everyone use a FOR loop and NOT a WHILE loop, for BOTH polling run status and re-submitting your function output after a tool call.
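
Something like this is the shape I mean, sketched with the 1.x Python client. build_tool_outputs() stands in for your own handler, and the caps are arbitrary illustrative numbers:

    import time

    from openai import OpenAI

    client = OpenAI()

    def drive_run(thread_id: str, run_id: str, max_polls: int = 120, max_tool_rounds: int = 3):
        tool_rounds = 0
        # A bounded for loop cannot spin forever the way `while run.status != "completed"` can.
        for _ in range(max_polls):
            run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
            if run.status in ("completed", "failed", "cancelled", "expired"):
                return run
            if run.status == "requires_action":
                tool_rounds += 1
                if tool_rounds > max_tool_rounds:
                    break  # the model keeps requesting tools; stop feeding it
                calls = run.required_action.submit_tool_outputs.tool_calls
                client.beta.threads.runs.submit_tool_outputs(
                    thread_id=thread_id,
                    run_id=run_id,
                    tool_outputs=build_tool_outputs(calls),  # placeholder for your own handler
                )
            time.sleep(1)
        # Either cap was hit: cancel explicitly instead of leaving the run to keep burning tokens.
        return client.beta.threads.runs.cancel(thread_id=thread_id, run_id=run_id)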

2 Likes

I have been very concerned about these types of issues myself. I have experienced some squirrely things like this, but I just made sure my reserve was never more than I was willing to lose and not get back. OpenAI should have a no-questions-asked attitude until this is ready for prime time. It definitely is not as robust as they want us to believe.

PS: Please ease off the heavy-handed content filtering and let us do what we do! I don’t need someone else telling me how to think!

Thank you! Happens to the best of us, I assume!

1 Like

Sure!
Here is the relevant part of the code; it now includes the guardrail:

            tool_output = []
            for call in calls:
                function_name = call.function.name
                # Guardrail: never let the run request the same function more than MAX_LOOP_GUARD times.
                # The check sits outside the try block so LoopStuckError is not swallowed
                # by the generic except below.
                function_calls_counter[function_name] = function_calls_counter.get(function_name, 0) + 1
                if function_calls_counter[function_name] > MAX_LOOP_GUARD:
                    raise LoopStuckError()
                try:
                    # Tool-call arguments arrive as a JSON string; json.loads is safer than eval
                    # and handles true/false/null correctly.
                    args = json.loads(call.function.arguments)
                    if args.get("justification"):
                        logger.info(f"Justification: {args.pop('justification')}")
                    res = getattr(config.db, function_name)(**args)
                except Exception as ex:
                    # Report the failure back to the model instead of crashing the polling loop.
                    logger.error(f"Error calling {function_name}: {ex}")
                    res = f"Error calling {function_name}: {ex}"
                tool_output.append(
                    {
                        "tool_call_id": call.id,
                        "output": json.dumps(res),
                    }
                )
            run = openai.beta.threads.runs.submit_tool_outputs(
                thread_id=thread.id,
                run_id=run.id,
                tool_outputs=tool_output,
            )

The function_calls_counter dictionary is the guardrail I added after this happened.

Also, I can confirm that after I found and deleted that thread, the burning stopped.

1 Like

Update! It happened again, twice! Something burned $76 on my billing on 27/Dec and 13/Dec, even though I reset all my API keys, and my logs show that I only used 130k tokens on 27/Dec, 99.9% of which were input tokens. It’s also $60 higher than my expense limit.
I haven’t got a single reply from OpenAI’s support to any of the emails I have sent. It’s absolutely the worst customer service I have ever experienced, and, obviously, I am not going to pay my next bill.

2 Likes

Sorry that happened, and thank you for sharing your experience.
I’ll be more careful with function calling, since we need to run it in a polling loop on the Assistants API.

There are many of us who have this problem of uncontrolled consumption.
I have been trying to contact OpenAI for two weeks, but there has been no response.
We cannot scale our app or offer better models with this kind of consumption.

Hi guys,

A brief update here. I’ve got a reply from OpenAI support, which is great! They applied credits for all the unexpected usage, which, although delayed, means they do care! With this in mind, I think we (me and anyone else experiencing this issue) should really bring it to their attention. It’s not only about getting the credits but about having a working and reliable service that we can use for our applications.
For me, the unexpected usage started when I moved my old chat-completions-based code to Assistants and Threads.
What about you?

3 Likes

I truly think everyone out there should hear about your experience.
I also explicitly define the max number of function calls the assistant is allowed to execute. But having the threshold hard-coded can be bad sometimes, especially when we set it too low.

Another budget-burning parameter I am looking at is input tokens. Some of my functions are API calls that pull data from other services or scrape web content from other sites. They can flood the context window in just a few calls and end up costing a lot of input tokens.
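
One thing I’m sketching for that is capping each tool result by tokens before it goes back to the model. A rough sketch only; MAX_TOOL_OUTPUT_TOKENS is an arbitrary number and tiktoken’s cl100k_base encoding is assumed:

    import tiktoken

    MAX_TOOL_OUTPUT_TOKENS = 2000  # arbitrary budget per tool result
    _enc = tiktoken.get_encoding("cl100k_base")

    def cap_tool_output(text: str) -> str:
        tokens = _enc.encode(text)
        if len(tokens) <= MAX_TOOL_OUTPUT_TOKENS:
            return text
        # Keep only the first part and tell the model the rest was cut,
        # so scraped pages cannot flood the context window.
        return _enc.decode(tokens[:MAX_TOOL_OUTPUT_TOKENS]) + "\n[output truncated]"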

100%! But the funny thing is that I have a blocker that kills a repetitive function call after more than 2 attempts. The issue was that the run nevertheless kept going in the thread, and it only stopped when I ran out of budget. Once I increased the budget, it started running again. The only way to kill it was to find the thread ID in the logs and delete the thread. Well, that or revoke the API key.
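
For anyone who hits the same thing: both kill switches can also be done through the API (sketch with the 1.x Python client; the IDs below are placeholders you would take from your own logs):

    from openai import OpenAI

    client = OpenAI()

    # Cancelling the stuck run is worth trying first...
    client.beta.threads.runs.cancel(thread_id="thread_abc123", run_id="run_abc123")

    # ...and deleting the whole thread is what finally stopped the burning for me.
    client.beta.threads.delete("thread_abc123")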

1 Like

I’ve solved an ‘input tokens’ problem similar to yours by first summarizing larger context with cheaper models (3.5). You can also summarize the summarizations, and so on, until you’re happy with the remaining bullet points, which you then feed into your smart and pricey model.
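
In case it helps, a minimal sketch of what I mean, assuming the 1.x Python client; the model name and word limit are just illustrative choices:

    from openai import OpenAI

    client = OpenAI()

    def summarize(text: str, max_words: int = 200) -> str:
        # One cheap pass that compresses a large chunk of context into bullet points.
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system",
                 "content": f"Summarize the user's text as bullet points, at most {max_words} words."},
                {"role": "user", "content": text},
            ],
        )
        return resp.choices[0].message.content

    def summarize_recursively(text: str, passes: int = 2) -> str:
        # "Summarize the summarizations" until the remaining bullet points are short enough.
        for _ in range(passes):
            text = summarize(text)
        return text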

2 Likes

This is good advice and similar to what I was going to write.

The only thing I would add is to be aggressive in culling previous context. If you’re pulling in a lot of data from outside sources, don’t keep it in context any longer than necessary; if the subsequent messages don’t rely on that data, there’s no need to keep it around.
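
As a sketch of the culling idea in chat-completions terms (the keep-only-the-latest policy is just an example):

    KEEP_LAST_TOOL_RESULTS = 1  # example policy: only the newest tool result stays verbatim

    def cull_old_tool_output(messages: list[dict]) -> list[dict]:
        # Walk from newest to oldest, keep the most recent tool results intact,
        # and replace older ones with a short stub so the conversation still reads coherently.
        kept = 0
        culled = []
        for msg in reversed(messages):
            if msg.get("role") == "tool":
                kept += 1
                if kept > KEEP_LAST_TOOL_RESULTS:
                    msg = {**msg, "content": "[tool output removed to save tokens]"}
            culled.append(msg)
        return list(reversed(culled))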

2 Likes

However, this approach is hard to tune, because designing a robust prompt for content summarization is hard.

I imagine that when the assistant agent has accumulated context of length N, we dump all of its context to a cheaper chat-completions agent, then spawn a new assistant agent with the summarized context.
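
Something along these lines is what I have in mind, sketched with the 1.x Python client; ASSISTANT_ID is a placeholder and the summarization prompt is only an example:

    from openai import OpenAI

    client = OpenAI()

    ASSISTANT_ID = "asst_abc123"  # placeholder

    def respawn_with_summary(old_thread_id: str) -> str:
        # Pull the old thread's messages and flatten them to plain text...
        messages = client.beta.threads.messages.list(thread_id=old_thread_id, order="asc")
        transcript = "\n".join(
            f"{m.role}: {part.text.value}"
            for m in messages.data
            for part in m.content
            if part.type == "text"
        )
        # ...compress it with the cheaper model...
        summary = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user",
                       "content": f"Summarize this conversation as brief bullet points:\n{transcript}"}],
        ).choices[0].message.content
        # ...then start a fresh thread seeded with the summary and kick off a new run.
        new_thread = client.beta.threads.create(
            messages=[{"role": "user", "content": f"Summary of the conversation so far:\n{summary}"}]
        )
        client.beta.threads.runs.create(thread_id=new_thread.id, assistant_id=ASSISTANT_ID)
        # Optionally delete the old thread so nothing can keep running in it.
        client.beta.threads.delete(old_thread_id)
        return new_thread.id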

There is a way to tune it, or at least to significantly increase observability. Consider using a knowledge graph, or at least a graph-style way of summarizing.
Basically, your summary is not a paragraph of text with high variability, but rather a tree of nodes, each holding a minimal critical amount of information.
For example, assume you have an article, which has chapters, which have paragraphs, which have sentences. What you are trying to achieve here is to reduce the dimensionality: a paragraph becomes a sentence, a chapter becomes a paragraph (or rather a set of bullet points), and the article becomes a set of these new chapters.
What is cool about graphs is that you can still preserve the original knowledge and use it when required, e.g. create a “more_details” callable or similar that provides the original context.
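
A tiny sketch of what such a node tree could look like, with a more_details tool on top (all names here are illustrative):

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        node_id: str
        summary: str                 # reduced form: a sentence for a paragraph, bullets for a chapter, ...
        original: str                # full source text, not sent to the model by default
        children: list["Node"] = field(default_factory=list)

    NODE_INDEX: dict[str, Node] = {}

    def register(node: Node) -> None:
        # Index every node so the assistant can ask for any of them by id.
        NODE_INDEX[node.node_id] = node
        for child in node.children:
            register(child)

    def more_details(node_id: str) -> str:
        """Tool exposed to the assistant: return the original text behind a summary node."""
        node = NODE_INDEX.get(node_id)
        return node.original if node else f"Unknown node: {node_id}"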

1 Like

Aren’t startups just lovely? Mistakes are still allowed, while huge organizations like to pretend they already know everything (although somehow they still keep accumulating more best practices over time). I’ve been part of a €50k mistake at a startup as well, and nobody started the blame game, because it was quite obvious we simply lacked firm enough procedures to prevent that kind of mistake, and we could have blamed at least 3 different people depending on how we looked at it.

Multiple Bitcoin ATMs were accidentally left at 0% commission for months :slight_smile:

1 Like

Hello @pedantic_geek, I created another topic where I expressed my concern about the Assistants API being token-expensive. In that conversation I was advised to use the Chat Completions API instead, precisely to avoid the overcharging issues you are having. So I have a simple question for you: since you’ve likely looked at the Chat Completions API already, what’s making you stay with the Assistants API (assuming you are still using it)? I know what each one does, but I’m curious what makes you stick with the Assistants API. Thanks!