I disabled all my API keys, and I’m watching my usage go to the moon for “text-embedding-3-small” - when I look up the API key making the calls, it’s none of mine, and all mine are disabled.
Check here for the legacy API keys you forgot:
https://platform.openai.com/settings/profile/api-keys
Then here for the other organization members that can bill services to you:
It’s my personal account, there are no other organization members. There are no legacy keys. All projects are disabled. All keys are deleted, my usage is still increasing. I’m freaking out right now.
Sorry to be that guy, but have you gotten in touch with support yet so they can look into your account? https://help.openai.com/en/articles/6614161-how-can-i-contact-support
Yeah I tried that chatbot thing and it told me it escalated the ticket to live support, and it will take 2 to 3 days for a response. It looks like I got it under control by setting my budget to a dollar.
Yeah this is pretty nuts. I set up the code-context MCP server the other day, so I think this is somehow related to that. I used it with Claude Code for most of the day today, but the API is saying there were over 7000 calls made, totaling 700 million tokens. There’s no way that all came from me using Claude Code. Claude only made a handful of calls with it. Part of setting up the code-context MCP requires an OpenAI API key. I could see my usage keep going up even after I disabled the key I used.
Yeah what really throws me off here is that the API key I see under usage is not any key that I created, but all the tokens are input tokens going to embedding, which is where I would expect code-context to be sending my codebase. I’m working on a side project that’s only got ~28,000 lines of code so far. Even if it was reindexed 100 times that’s only like 20-25 million tokens. I wonder if that thing was indexing EVERYTHING - like node modules, next, binary files, hell my whole hard drive. That would explain the insane volume, but not the API key that I never created.
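As a rough sanity check on that estimate (the ~8 tokens per line figure below is just my own ballpark for typical source code, not a measured value):

```python
lines_of_code = 28_000
tokens_per_line = 8     # rough ballpark for source code; not a measured figure
reindex_count = 100     # "even if it was reindexed 100 times"

total_tokens = lines_of_code * tokens_per_line * reindex_count
print(f"{total_tokens:,}")  # 22,400,000 - right in that 20-25 million range
```

So 700 million tokens is roughly 30x what even a hundred full reindexes of the project itself should cost.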
We must consider this an API key compromise, which can come from insecure software you’ve written. Any key rotation without fixing the problem is just offering a different delicious snack to the API consumer hacker.
“There are no other organization members” - you must still check: an organization credential compromise means someone can get into the dashboard and add themselves.
The first thing to secure is the platform site itself. If you are using an authentication service to log in to OpenAI, change that account’s password. If you are using an OpenAI user/pass, reset it by going through the login process (such as in a private browser window where you aren’t already logged in); then, at the screen to enter your password, say you forgot it, and a reset message will be sent to your registered login email.
All project-based keys should be listed at this link below, but each project also has its own key page, along with limits:
https://platform.openai.com/settings/organization/api-keys
- (What are embeddings? It is the model type that powers vector stores and semantic search, such as for document knowledge queries. Check whether any bring-your-own-key cloud service you use is offering you such a thing.)
Identify the source of usage
You will need to go to the platform site, and investigate deeper.
https://platform.openai.com/settings/organization/usage
What we want to find is the project that is making the calls: choose “Group by Project”.
The main display will now have a bargraph stack, and at the bottom, a list of projects by color as a legend. If there is just one making the large number of embeddings requests, we can target that specifically. There are further details also below.
You can then select a single project usage next to the selector for a date range to confirm you’ve identified the project being used.
Select that project at upper-left next to your organization name.
Locking down a project
Our goal here is to get the organization “shut off” as much as possible, now that OpenAI has destroyed the direct ability to do so by taking away the org-wide hard limit.
Go to your project at the lower-left navigation menu, and pick “limits”
You’ll have this kind of screen:
- Edit the budget, and set it lower than your incurred monthly use so far.
- Go to “allow or block models”, pick “allow” (blocked by default; only selected models are allowed), and ensure all model IDs are then unselected. Save.
- Rate limits: set the model being abused to 1 TPM and 1 RPM, making it nearly useless.
That should be everything you can throw at a project to stop its use.
Then check the lower “API keys” menu option, which is project-specific, and ensure they are all deleted (there is no “disable” option like you report having used). To be safer in the future, pick “allow” on each API key, so you can select just the endpoints it can be used with.
Unfortunately, the usage site seems ridiculously degraded. Even picking a single day and a single project, I cannot get any view other than “Daily”, and there is no more legacy usage page (that actually didn’t stink). The only way you can see more detail is specifically to click on the word “embeddings” above its sub-usage graph, to see the usage of only one endpoint. Then you can see if this usage is ongoing, in 15 minute increments.
Thanks for that thorough write-up. I did get it under control by setting my billing to a dollar. I also deleted all keys, logged out everywhere, turned on logging, blocked all models, etc.
Is there a long lag in reporting to the usage panel? I could see my usage keep climbing after deleting all keys. It stopped climbing after I set my budget to a dollar, but I wonder if that was just the delay in reporting.
In usage, pick the endpoint name with the sub-graphs, then pick the “1m” graph.
The by-the-minute view, once you are looking at individual endpoints in the usage page, should be relatively responsive. There have been ongoing issues in the past where usage keeps rolling in hours after the actual API calls, but that hasn’t been a recent complaint.
Additionally, after confirming that you’ve reset any passwords that would allow someone into the platform site (where they could make a project, create tons of usage, then archive it so it doesn’t show up in the selector), the link below has a “log out of all devices” option (along with the option for authentication-service multi-factor).
https://platform.openai.com/settings/profile/security
The highest chance here is insecure key use, such as hard-coding keys in application software, putting keys in client apps that make user API calls directly to OpenAI, or committing code with keys in files - basically having keys on any untrusted platform where another party can view them outside of the secure HTTPS request to OpenAI.
Environment variables or a secrets manager must be used.
As a general guide to API key hygiene:
- Never put an API key in client-side code, e.g. .exe, .dmg, etc. Even if encrypted, this is trivial to hack.
- Better yet, never put an API key in any code, even server-side. Use of a versioning system like git may lead to your keys becoming public if the repo is discovered by an attacker.
- Use either local .env files, system environment variables, or key vaults offered by Azure, AWS, Google Cloud, DigitalOcean, etc. to store your API keys, and refer to them indirectly by name, e.g. OPENAI_API_KEY.
e.g.
.env file
OPENAI_API_KEY=sk-abc123yourapikeyhere
import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables from .env
load_dotenv()

# Get the API key from the environment and create the client
api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(api_key=api_key)
Note that the OpenAI Python library (I haven’t tested the Node.js one, but I imagine the same is true) will automatically make use of an environment variable called OPENAI_API_KEY, so you don’t have to set it explicitly when creating your client object, but you can if you wish.
Thanks, I’m looking at the minute by minute usage now. It’s averaging like 1.5 million tokens per minute throughout the period I had code-context turned on. 700 million tokens all input going to embedding. I can see the usage ended exactly at midnight UTC, which is around the time I disabled the API.
I’m beginning to think this is just a VERY poorly designed MCP that’s indexing the entire project folder’s contents every couple of minutes. I didn’t embed my keys, share my keys, put them in git, or anything like that. And if it was a hack, why would they dump millions of input tokens to embedding?
I don’t understand why the usage key in reporting doesn’t match the key I used for code-context though. - this is what really freaked me out before I figured out how to get it under control.
I have never had any embedding usage that did not correspond to the key I was using, so that is quite odd.
On the MCP code-context plugin, I’m assuming that’s the Zilliz “Vector Database for Enterprise-grade AI and LLM applications”, or something very similar, in which case it all depends on where it considers its root folder. If the root is your project folder, then it will indeed index every file down the tree from there, so that could be pretty huge.
That would explain it. I think it was indexing everything in the project folder. But let’s say it was doing that, once it’s indexed, why would it continuously be sending 1.5 million tokens per minute for practically the entire day until I shut it down?
My only guess is things like __pycache__ directories with a bazillion tiny files in there, or some database temp-file thing that keeps updating file attributes, causing the system to re-evaluate the files. File-touch processes can do it too; sometimes code agents open and close files by saving them before closing, and that can update the attributes as well.
You could try looking at file-attribute updates and logging which ones get affected, or it could just be a pretty garbage indexer that is crawling the same files over and over…
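A minimal sketch of that kind of attribute logging in Python (stdlib only; `snapshot` and `changed_files` are my own throwaway helpers, not part of any indexer):

```python
import os

def snapshot(root):
    """Map every file under root to its last-modified time (nanoseconds)."""
    mtimes = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                mtimes[path] = os.stat(path).st_mtime_ns
            except OSError:
                pass  # file vanished between walk() and stat()
    return mtimes

def changed_files(before, after):
    """Paths that are new, or whose mtime moved, between two snapshots."""
    return [p for p, m in after.items() if before.get(p) != m]
```

Take one snapshot, let the indexer run for a few minutes, take another, and diff them; whatever shows up repeatedly in `changed_files` is what keeps tripping the re-index.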
Lesson: Don’t run code you don’t understand.
It seems this “code-context”-“indexing” made free use of the environment variable API key, and started building a vector database on its own using embedding API models. The app never made apparent its use of a particular AI model, and how bonkers it was going to go.
Embeddings jobs that large should be batched for a 50% discount, and the results permanently stored against file hashes so you never have to call the embeddings model (or run your own local model overnight) again on the same files.
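A minimal sketch of that hash-based caching idea (stdlib only; `embed_fn` stands in for whatever batched embeddings call you would use, and all names here are my own, not from code-context):

```python
import hashlib

def file_hash(path):
    """Content hash, so unchanged files are never re-embedded."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def embed_changed(paths, embed_fn, cache):
    """Call embed_fn only for files whose content hash isn't already cached.

    cache maps path -> {"hash": ..., "vector": ...}; persist it between runs
    (e.g. as JSON) so a restart never re-embeds unchanged files.
    """
    newly_embedded = []
    for path in paths:
        h = file_hash(path)
        if cache.get(path, {}).get("hash") == h:
            continue  # unchanged since the last index: skip the API call
        with open(path, "r", errors="ignore") as f:
            cache[path] = {"hash": h, "vector": embed_fn(f.read())}
        newly_embedded.append(path)
    return newly_embedded
```

Run this on every index pass and the embeddings model is only ever called for files whose content actually changed, no matter how often the indexer wakes up.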
For example, don’t run this code unless you’d like to attempt a bill for 16 million embeddings tokens…
import openai
print(openai.embeddings.create(model="text-embedding-3-small",input=["!@"*4095]*2047).usage)
(I won’t be as evil as also putting a while loop in there…)
Thanks for sharing the details. From what I understand, this may not be a system-side “ghost key” or compromise 🤞 What we’ve seen in similar cases is that some MCP setups (like code-context) may end up indexing large portions of a project directory and continue sending embedding requests in the background, even after a key is deleted from the dashboard. If the process has already cached the key in memory, usage can keep increasing until it’s stopped.
The best safeguard is to fully stop the MCP process itself, and in the future set project budgets or block models as a quick way to contain usage. Keeping keys in a vault or .env file is also good practice when experimenting with third-party tools.
concur. That’s what the usage pattern suggests.
Apparently you didn’t read the part on help.openai.com :
https://help.openai.com/en/articles/9186755-managing-projects-in-the-api-platform
Setting a monthly budget allows you to establish soft spending thresholds for your project. When usage exceeds this limit within a given calendar month (UTC), API requests will continue to be processed without interruption.
You are not the only confused and deceived person. “budget” is essentially a mistruth. It must be renamed. (monthly check-your-inbox threshold).
Your suggestion is the opposite of current practice: an org isn’t given any off switch for a project, and the actual “limits” were purposefully removed from projects and the organization, almost deliberately making costs (and the cost of hacks and account leaks) unmanageable until an account goes far negative.
There should be no “API key cache” which works like described. An API key being deleted should absolutely turn off API calls within minutes at the edge, and we’ve verified this does work in short order, under 15 minutes in the past.
Poor usage UI
The fault here is likely not that API calls continued after all the API keys were zapped, but that the usage page is so impenetrable - consolidating and grouping requests instead of showing every call per five-minute period (as it used to, until the same hour Assistants was released in 2023, with “retrieval” going nuts and no usage reporting by the API whatsoever) - that it is easy to assume calls are still being made, especially when backend faults deliver increasing billing totals for usage that happened many hours before. This slow trickle of usage reporting also delays discovery of issues.
But yes, opaque environment-variable-scraping code, asked to create a semantic database at user expense, seems to be the origin of the usage, when embeddings use was never anticipated. An MCP client-server is actually a place where such a tool can be completely walled off from the calling code’s API credentials and environment.
Just to share: I experienced the same issue.
The strange thing is when I check https://platform.openai.com/settings/organization/usage everything seems normal. There is not much in logs either, just a few of my requests.
However, when I click on “Chat Completions” there is a huge spike of 122.507K input tokens on 1 Aug, and the API returns an error message: “message: ‘You exceeded your current quota’”.
I have no idea how it was fully used.
Also, my API key is “sk-proj-XXX”; however, the key which was used is “key_XXX”. I did not create it, and cannot find it anywhere.