Something seems to be very wrong: GPT-4 Turbo context token usage is much higher than expected

I am using a WordPress plugin that makes GPT-4 Turbo requests. The plugin logs every chat, and the log count matches (or is at least close to) the total number of API requests. However, the number of context tokens is far higher than I would expect. For example, the chat logs for the last day, including both requests and responses, came to about 6,000 words. I would expect that to be roughly 8,000 tokens, and only about half of those should count as context, since the other half were responses. Yet the usage page says I have used over 78,000 context tokens.

At this point I have had to put very tight caps on usage because I keep hitting my previous limit. This is very concerning; it seems something is very wrong with this API right now.

I am happy to provide more information about the specific chat requests made, etc.; I may just need to scrub the data a bit.

Hi, are you making use of the Assistants endpoint or just the standard completions one? Assistants may be making use of stored information and/or building message threads of prior messages that compound.

I’m unsure how the WordPress plugin works; it is not made by OpenAI, so you would need to go to them and ask.

This is the version it is using: GPT-4-1106-preview.

According to this page: New models and developer products announced at DevDay, this version is "GPT-4 Turbo with 128K context", so I don’t see how this relates to the Assistants API. I have had no intention of using that and, as far as I can tell, I am not using a version that uses Assistants.

I saw in another thread that an issue was acknowledged and is being investigated, it seems I am not the only one that has this issue.

The WordPress plugin creator has been adamant that it is not them making the extra requests, and that what I see in their logs is exactly what was sent over, nothing else.

Also, again, the number of API requests seems to match what is expected, so it isn’t making extra requests; it seems a huge amount of extra context is somehow being added to every request.

This is a very serious issue so I ask that you please continue to investigate.

Do you happen to know the thread where the issue was acknowledged? I am not currently aware of a token overuse issue that is not related to the Assistants system; that one is recognised in the documentation, so it does not concern me as an undetected issue.

Presumably a fairly simple test could be constructed: send a prompt of a given length that produces a predictable response, run it 10 times, and then check that the reported token usage matches what a tiktoken counter predicts, +/- 20%.
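Something along these lines would do it; a minimal sketch, assuming the openai>=1.0 Python client and tiktoken (the prompt text and the exact model name here are just placeholders):

```python
# Sketch: predict prompt tokens with tiktoken, then compare against the
# usage numbers the API reports back for the same request.
import tiktoken
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4-1106-preview"
PROMPT = "Reply with the single word: OK"

# The 1106 preview uses the same cl100k_base encoding as gpt-4
enc = tiktoken.encoding_for_model("gpt-4")
predicted = len(enc.encode(PROMPT))  # plus a few tokens of chat message framing

for i in range(10):
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=5,
    )
    usage = response.usage
    print(f"run {i}: predicted ~{predicted} prompt tokens, "
          f"billed {usage.prompt_tokens} prompt / {usage.completion_tokens} completion")
```

If the billed prompt tokens come back close to the prediction, the metering itself is fine and any extra context is being added before the request is sent.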

I will perform that test and get back with the results.

I have made an API call to the GPT-4 preview model containing around 25 tokens of prompt designed to produce 1 token of output. Typically that would create 35-45 tokens of usage, with boundary markers and such counted internally, and this is the API billing system page:

Which matches expectation. So I am confident that there is no systemic issue causing the high token count.

The creator of the plugin is welcome to visit and we can discuss possible issues if they are willing.

[EDIT]

I have now run the test 10 times, and the resulting token usage report is now:


Thanks for the info. 3.5 is useless to me; I am using it to write code, and I need to use 4 for that. It’s not a tool for mass use but rather an internal tool for my team, who are asking complex technical questions related to writing code. I understand what you’re saying about the context, and that is something I will investigate: I had thought each request was standing on its own (not including content from prior questions) with the tool I’m using, but perhaps it isn’t. As far as I can tell, it is not using Assistants.

Thank you for the update, this is the thread I was referring to: # of tokens used and costs randomly exploded over night - #14 by adaptiv

I have an email thread with the creator of the plugin so I can see if they are able to provide any other information.


Interesting, thanks for that. I’ll keep an eye out over the next 24 hours to see if the usage suddenly spikes as the OP mentioned. I’ll make sure that this account does not get used for anything else in the meantime as a control.


Yup, all the previous history. With a 128k context window it can add up quickly!

So it’s not OpenAI’s problem, really. The plugin developer should be providing you options for context control if they are using ChatCompletions.

Assistants is a whole different ball game though. You may be using it without knowing.
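To illustrate the ChatCompletions case: if a plugin replays its whole stored history on every call, the prompt tokens grow with every turn. A rough sketch of what context control looks like (the `history` list and the `MAX_PRIOR_MESSAGES` cap are purely illustrative, not anything this particular plugin is known to expose):

```python
# Illustrative only: replaying the full history inflates prompt tokens every turn;
# capping how many prior messages are replayed keeps each request small.
from openai import OpenAI

client = OpenAI()
history = []             # every prior user/assistant message the plugin has stored
MAX_PRIOR_MESSAGES = 4   # hypothetical setting limiting replayed context

def ask(question: str) -> str:
    # Without the slice below, the entire history would be sent and grow each turn.
    context = history[-MAX_PRIOR_MESSAGES:]
    messages = context + [{"role": "user", "content": question}]
    response = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=messages,
    )
    answer = response.choices[0].message.content
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": answer})
    print("prompt tokens this turn:", response.usage.prompt_tokens)
    return answer
```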


Thanks, I’ll see what the WordPress plugin creator says and will also check further to make sure it is not sending extra context in. Each request should stand on its own; otherwise I can see how this snowballs into a huge number of context tokens being used.


It would be my guess that for one reason or another prior messages are being appended to allow for conversational context.


The issue we do see (not related to your plugin) is with the GPT-3.5 model. With GPT-4 all seems fine.

3.5 with Assistants? Seems like a bad idea tbh

I see, the other poster @devve is saying that it is with GPT-4-preview, so I need to get a handle on what is occurring. For the next 24 hours I will only be looking at the GPT-4-preview issue, I’ll take a look into yours after that should it still be a problem.

Why do you think that? Because of the context window? Both are available to the Assistants API, and I would argue that is intentional on OpenAI’s part.

Can I suggest that we split this thread into two, as there are now two very different issues being talked about here? I’ll do the splitting if you agree @adaptiv

I could be wrong, but I don’t think GPT-3.5 was trained to function well with the Assistants framework. Maybe it had some “last-minute” training. I have seen it, numerous times, enter a self-loop and max out the token context (which I think is what you’re saying as well).

More speculation, but with Retrieval I am guessing that they use embeddings to return pockets of data and then run the result through GPT to verify that it answers the question; if not, it tries again.

For JSON mode if it cannot properly create a JSON object it turns insane and enters an infinite loop.

All this speculation because they have told us nothing about how any of it works.

More simply: GPT-4 has a higher chance of success than GPT-3.5. These Assistant tools tend to “force” results by churning tokens until satisfied


It can also be that the Assistants framework does not build a quality conversation history that lets the AI see which functions it has called and what results came back when it is given access to functions, causing it to loop. On top of that, attempts to use the even-further-degraded 1106 model with Assistants can produce garbage output up to the maximum context length instead of the function call.

In a “thread” you retrieve, you are also not shown the actual interactions the model has been having with retrieval functions they don’t even describe.

Assistants are uncontrolled iterations where OpenAI’s failed programming and implementation abuses your account balance.

That’s some very valuable insight and I learned a lot. But the cost of GPT-4 is still too high for various applications, so we need cheaper models, and for me the 16k context window should be enough. You were ticking a box with the JSON point, though: we do have the instruction that it should output JSON… Will discuss this with them.

@Foxalabs thank you for your help; this issue is resolved on my end as far as I can tell. From talking to the plugin creator, it turns out the plugin was passing in a huge amount of previous context with each request. The setting in the plugin that restricts the number of previous messages passed in is named in a confusing way, so I had thought it was for something different. After talking to them I adjusted that setting and tested again, and it is no longer passing in any previous context from what I can tell, so I think the issue is resolved for me.
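For anyone else debugging the same thing, one quick way to confirm no prior context is being sent is to compare the billed prompt tokens for a single question against a tiktoken count of just that question; if they are close, no earlier history is being replayed. A small sketch of that check against the API directly (the Python client and the question text are just placeholders, this is not something the plugin itself exposes):

```python
# Sketch: confirm a request carries no prior context by comparing billed
# prompt tokens against a tiktoken count of only the current question.
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.encoding_for_model("gpt-4")

question = "How do I reverse a linked list in PHP?"  # placeholder question
response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": question}],
)
expected = len(enc.encode(question))  # plus a few tokens of message framing
print(f"question is ~{expected} tokens, billed prompt_tokens = {response.usage.prompt_tokens}")
# A prompt_tokens value close to the question size means no history is being replayed.
```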
