Since yesterday, my costs have inexplicably exploded. I am categorizing job titles, and each call costs approximately 1.1k context tokens (I am tracking the tokens myself). I am using GPT-3.5 Turbo.
Today, however, I categorized around 1,500 jobs, which cost a total of 5.8 million context tokens. It should be roughly 1.5 million, so I was billed almost four times what I expected.
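For reference, this is roughly how I track tokens on my side: a minimal sketch with an illustrative prompt and function name, comparing my own tiktoken estimate against the `usage` the API reports per call.

```python
# Sketch: cross-check my own token estimate against what the API bills.
# The prompt and categorize() are illustrative, not my exact code.
import tiktoken
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def categorize(job_title: str) -> None:
    prompt = f"Categorize this job title: {job_title}"
    estimated = len(enc.encode(prompt))  # my own prompt-token estimate

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    billed = response.usage.prompt_tokens  # what OpenAI says it billed

    # Flag calls where the billed count diverges wildly from the estimate
    if billed > estimated * 2:
        print(f"Suspicious call: estimated {estimated}, billed {billed}")
```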
Is this all purely internal work, as in you have no other employees or anyone else with access to your API keys? And the keys are not used in any public application?
Do you use Assistants or any of the new features, or is this all based on traditional API calls?
I had the same experience yesterday, with roughly 40 million tokens burned in just a few minutes. Multiple threads were generating messages in a loop. You might want to check your runs for the messages and see whether they were re-created multiple times.
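If it helps, this is roughly how I checked mine: a sketch that lists the messages in one Assistants thread and counts duplicated assistant outputs. The thread ID is a placeholder for one of your own threads.

```python
# Sketch: look for assistant messages that were re-created in a loop.
# "thread_abc123" is a placeholder thread ID.
from collections import Counter
from openai import OpenAI

client = OpenAI()

messages = client.beta.threads.messages.list(thread_id="thread_abc123")
contents = [
    block.text.value
    for m in messages.data
    if m.role == "assistant"
    for block in m.content
    if block.type == "text"
]

# If the same content shows up many times, the run generated it repeatedly
for text, count in Counter(contents).most_common(5):
    if count > 1:
        print(f"{count}x: {text[:80]}...")
```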
I switched to Azure a few hours ago and it works as expected. Doing exactly the same request, but on Azure, the token count looks normal again.
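For comparison, this is essentially the same call routed through Azure instead; endpoint, key, API version, and deployment name below are placeholders you would swap for your own.

```python
# Sketch: the identical request via Azure OpenAI, where the billed token
# counts matched my expectations. All credentials here are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR-AZURE-KEY",
    api_version="2023-12-01-preview",
)

response = client.chat.completions.create(
    model="my-gpt-deployment",  # your Azure deployment name
    messages=[{"role": "user", "content": "Categorize this job title: Data Engineer"}],
)
print(response.usage.prompt_tokens, response.usage.completion_tokens)
```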
So there seems to be an issue with the way OpenAI counts the tokens.
I wonder how this could even fly under the radar while OpenAI simply ignores support messages. To me, it looks like they are not taking their business seriously. It feels like a scam, or at least a rip-off.
I am having a similar issue and am very upset about it, although luckily I have usage caps set, so it is not costing me that much money. Still, it is very concerning. I am using a WordPress plugin on my website to make requests to GPT-4 Turbo, and the tokens used do not at all match the actual usage reported by the plugin (it is like 7 or 8 times what I expect). I have been discussing this with the plugin creator, but now I'm not sure it is their fault; it seems this may be an OpenAI issue?
Yeah, it really does seem to land consistently in the range of 7 to 8 times the normal token usage. Some messages work normally and some are generated 20 times… and the average then comes out around 8x. That's what I have observed. A team around @nikunj will be investigating the issue, so it's good to see that it's acknowledged as an issue.
Glad to see they are looking into it. Just to add more clarity: the model version I am seeing this issue with is listed as "GPT-4-1106-preview" on the https://platform.openai.com/usage page.
Hello @nikunj, is there any way I can be added to any updates relating to this issue? I have had to pause all my API usage for the moment because of it, so I would like to know as soon as it's resolved. Thanks.
We've seen this happen before when someone expects the API to return JSON and has retry code for when it doesn't: the model started replying with the JSON wrapped in ```json fences, resulting in the retry logic being hit far more often than before.
So check whether this is the case; you can try using JSON mode to avoid this issue.
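Something like this, as a rough sketch (model name and prompt illustrative): request JSON mode where the model supports it, and strip a stray fence before parsing so the retry path only fires on genuine failures.

```python
# Sketch: JSON mode plus a defensive fence-strip, so a ```json-wrapped
# reply is parsed instead of triggering a retry loop.
import json
from openai import OpenAI

client = OpenAI()

def get_json(prompt: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4-1106-preview",
        response_format={"type": "json_object"},  # JSON mode
        messages=[
            {"role": "system", "content": "Reply with a JSON object."},
            {"role": "user", "content": prompt},
        ],
    )
    text = response.choices[0].message.content.strip()

    # Fallback: tolerate a fenced reply rather than retrying the call
    if text.startswith("```"):
        text = text.strip("`").removeprefix("json").strip()
    return json.loads(text)
```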
They "only" started looking into it this week. So far I have not heard back from them, but they made me aware that it could take some weeks to understand the underlying issue.
We do, in fact, output JSON, and we were having these issues with content being re-generated in a loop. Sadly, JSON mode is only available in the Chat Completions API and not in the Assistants API.