I am using a WordPress plugin that makes GPT-4 Turbo API requests. The plugin logs every chat, and the number of chats matches (or is at least close to) the total number of API requests, yet the number of context tokens is far higher than I would expect. For example, the chat logs for the last day, covering both requests and responses, total about 6,000 words. I would expect that to be roughly 8,000 tokens, and only about half of those should count as context, since the other half were responses. However, the usage page says I have used over 78,000 context tokens.
At this point I have had to put very tight caps on usage because I keep hitting my previous limit. This is very concerning; it seems something is very wrong with this API right now.
I am happy to provide more information about the specific chat requests made, etc.; I may just need to scrub the data a bit first.
Hi, are you making use of the Assistants endpoint or just the standard completions one? Assistants may be making use of stored information and/or building message threads of prior messages that compound.
I’m unsure how the WordPress plugin works; it is not made by OpenAI, so you would need to go to its developers and ask.
This is the version it is using: GPT-4-1106-preview.
According to the page “New models and developer products announced at DevDay”, this model is “GPT-4 Turbo with 128K context”, so I don’t see how this relates to the Assistants API. I have had no intention of using that and, as far as I can tell, I am not using a version that uses Assistants.
I saw in another thread that an issue was acknowledged and is being investigated, so it seems I am not the only one who has this problem.
The WordPress plugin creator has been adamant that it is not them making the extra requests, and that what I see in their logs is exactly what was sent over, nothing else.
Also, again, the number of API requests seems to match what is expected, so it isn’t making extra requests; it seems a huge amount of extra context is somehow being added to every request.
This is a very serious issue, so I ask that you please continue to investigate.
Do you happen to know the thread where that issue was acknowledged? I am not currently aware of a token-overuse issue that is not related to the Assistants system; that one is recognised in the documentation, so it does not concern me as an undetected issue.
Presumably a fairly simple test could be constructed: send a prompt of a given length with a predictable response, run it 10 times, then check that the reported token usage matches the tiktoken prediction within +/- 20%.
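Something like this minimal sketch would do it (this assumes the openai v1 Python SDK and tiktoken are installed and OPENAI_API_KEY is set in the environment; the model name and prompt are just placeholders):

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
prompt = "Reply with the single word: OK"

# cl100k_base is the encoding used by the GPT-4 family of models.
enc = tiktoken.get_encoding("cl100k_base")
predicted_prompt_tokens = len(enc.encode(prompt))

for i in range(10):
    resp = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=5,
    )
    usage = resp.usage
    print(
        f"run {i}: predicted ~{predicted_prompt_tokens} prompt tokens, "
        f"billed {usage.prompt_tokens} prompt + {usage.completion_tokens} completion "
        f"= {usage.total_tokens} total"
    )
```

If the billed prompt tokens come back an order of magnitude above the prediction, something other than the visible prompt is being sent.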
I will perform that test and get back with the results.
I have made an API call to the GPT-4 preview model with a roughly 25-token prompt designed to produce 1 token of output. That would typically create 35-45 tokens of usage once internal boundary markers and such are counted, and this is the API billing system page:
Thanks for the info. 3.5 is useless to me; I am using this to write code and I need GPT-4 for that. It’s not a tool for mass use, but rather an internal tool for my team, who ask complex technical questions related to writing code. I understand what you’re saying about the context, and that is something I will investigate. I had thought each request was standing on its own (not including content from prior questions) with the tool I’m using, but perhaps it’s not. As far as I can tell, it is not using Assistants.
Interesting, thanks for that. I’ll keep an eye out over the next 24 hours to see whether the usage suddenly spikes as the OP mentioned. I’ll make sure this account does not get used for anything else in the meantime, as a control.
Thanks, I’ll see what the WordPress plugin creator says and will also check to make sure it is not sending extra context in. Each request should stand on its own; otherwise I can see how this snowballs into a huge number of context tokens being used.
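To put rough numbers on that snowball (these are hypothetical figures, not from my logs): if each exchange adds about 400 prompt tokens and 400 response tokens to the stored history, and the plugin re-sends the full history with every request, the billed context grows roughly quadratically:

```python
# Hypothetical numbers for illustration only.
per_turn_prompt = 400   # tokens in each new question
per_turn_reply = 400    # tokens in each response kept in history

history_tokens = 0
total_context_tokens = 0
for turn in range(1, 11):
    # Each request's context is everything sent so far plus the new question.
    request_tokens = history_tokens + per_turn_prompt
    total_context_tokens += request_tokens
    history_tokens += per_turn_prompt + per_turn_reply

print(total_context_tokens)  # 40000 billed context tokens from ~8,000 tokens of text
```

Ten exchanges of roughly 8,000 tokens of actual text would then bill around 40,000 context tokens, and with a system prompt or a few more turns it could easily reach the 78,000 I’m seeing.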
I see, the other poster @devve is saying it happens with GPT-4-preview, so I need to get a handle on what is occurring. For the next 24 hours I will only be looking at the GPT-4-preview issue; I’ll take a look into yours after that, should it still be a problem.
Can I suggest that we split this thread into two, as there are now two very different issues being discussed here? I’ll do the splitting if you agree, @adaptiv.
I could be wrong, but I don’t think GPT-3.5 was trained to function well with the Assistants framework. Maybe it had some “last-minute” training. I have seen it numerous times enter a self-loop and max out the token context (which I think is what you’re describing as well).
More speculation, but with Retrieval I am guessing they use embeddings to return pockets of data, then run them through GPT to verify that they answer the question; if not, it tries again.
In JSON mode, if it cannot properly create a JSON object it turns insane and enters an infinite loop.
All this speculation because they have told us nothing about how any of it works.
More simply: GPT-4 has a higher chance of success than GPT-3.5, and these Assistants tools tend to “force” results by churning tokens until satisfied.
It can also be that Assistants, when given access to functions, is not built with a quality conversation history that lets the AI see which functions it has called and what results came back, causing it to loop. That is on top of attempts to use the even-further-degraded 1106 model with Assistants, which can output max-context garbage instead of a function call.
You also aren’t shown, in a “thread” you retrieve, the actual interactions the model has been having with retrieval functions they don’t even describe.
Assistants are uncontrolled iterations where OpenAI’s failed programming and implementation abuse your account balance.
That’s some very valuable insight and I learned a lot. But the cost of GPT-4 is still too high for various applications, so we need cheaper models, and for me the 16k context window should be enough. You did hit on something with the JSON point, though: we do instruct it to output JSON… Will discuss this with them.
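One guard worth considering is keeping JSON mode but capping max_tokens, so a looping generation gets cut off cheaply rather than burning the whole output budget. A rough sketch, assuming the openai v1 Python SDK; the model name and prompt are placeholders, not our actual setup:

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    response_format={"type": "json_object"},  # JSON mode on the 1106 models
    messages=[
        {"role": "system", "content": "Answer only with a JSON object."},
        {"role": "user", "content": 'Give the capital of France as {"capital": ...}'},
    ],
    max_tokens=200,  # hard cap so a runaway/looping generation is cut off cheaply
)
print(resp.choices[0].message.content)
print(resp.usage)  # check billed prompt/completion tokens per call
```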
@Foxalabs thank you for your help; this issue is resolved on my end as far as I can tell. From talking to the plugin creator, it turns out the plugin was passing in a huge amount of previous context with each request. The setting that restricts the number of previous messages passed in is named in a confusing way, so I had thought it was for something else. After talking to them I adjusted that setting and tested again, and it is no longer passing in any previous context as far as I can tell, so I think the issue is resolved for me.
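For anyone else who hits this, the fix conceptually amounts to limiting how much prior history gets sent back in with each request. A minimal sketch of the idea (the names here are hypothetical, not the plugin’s actual code):

```python
MAX_PREVIOUS_MESSAGES = 4  # e.g. keep only the last two user/assistant exchanges

def build_messages(system_prompt, history, new_user_message):
    # Only the last N prior messages are re-sent, instead of the whole conversation.
    trimmed = history[-MAX_PREVIOUS_MESSAGES:] if MAX_PREVIOUS_MESSAGES > 0 else []
    return (
        [{"role": "system", "content": system_prompt}]
        + trimmed
        + [{"role": "user", "content": new_user_message}]
    )
```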