Since September 24th, my app has been consuming more tokens than usual

Hello,

I’m experiencing a significant issue related to token usage in my app. Since around September 24th, it has been consuming more tokens than usual, despite no changes to the code.

Here’s an example from a screenshot:

[screenshot]

While debugging the interactions, I noticed responses that consume a very high number of completion tokens, yet the choice message content is essentially empty: nothing but whitespace and newline characters.

The response should be in JSON format, but it’s not.

Can anyone assist with this issue? It’s consuming a lot of tokens and money, and it’s not due to any error on our side.

Example:

{
    "choices": [
        {
            "message": {
                "content": "\n\n\n \n\n \n\n \n\n\n \n\n \n\n \n\n\n \n\n \n\n \n\n\n \n\n \n\n\n \n\n \n\n \n\n \n\n .... [cut but it had 16k token of \n\n\n]",
                "refusal": null,
                "role": "assistant"
            },
            "logprobs": null,
            "finish_reason": "length",
            "index": 0
        }
    ],
    "model": "gpt-4o-mini-2024-07-18",
    "created": 1727290866,
    "object": "chat.completion",
    "system_fingerprint": "fp_3a215618e8",
    "id": "chatcmpl-ABRY2y5lIR6vvrnCFsyDx3hWhX6u7",
    "message": "ChatAI_v2 ChatAI.getChatResponse responseData",
    "usage": {
        "completion_tokens": 16384,
        "total_tokens": 17991,
        "completion_tokens_details": {
            "reasoning_tokens": 0
        },
        "prompt_tokens": 1607
    }
}
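
In the meantime I’ve put a guard on our side so a runaway like this fails fast instead of silently burning tokens. A rough sketch of the idea (Python with the official openai SDK; the prompt, the max_tokens cap, and the error handling are placeholders, not our real code):

```python
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Respond with a single JSON object."},
        {"role": "user", "content": "..."},  # placeholder for the real prompt
    ],
    response_format={"type": "json_object"},
    max_tokens=1024,  # hard cap so a runaway can't reach 16k tokens again
)

choice = resp.choices[0]
content = choice.message.content or ""

# This is exactly the failure above: finish_reason == "length" while the
# content is nothing but whitespace, i.e. we paid for thousands of "\n" tokens.
if choice.finish_reason == "length" and not content.strip():
    raise RuntimeError(
        "Runaway whitespace completion: "
        f"{resp.usage.completion_tokens} completion tokens, no usable content"
    )

data = json.loads(content)  # still raises if the reply isn't valid JSON
```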

Hi @victortavernari!

Most of us here are enthusiasts/volunteers, so I suggest you go to https://help.openai.com/en/ and use the little chat icon in the bottom right to file a ticket. This looks very strange!


It looks like “mini” is getting more mini on you.

Are you trying to use json_object response mode?

That mode tries to enforce JSON through model training, but when it fails, it fails badly. You need to over-specify the JSON format when using it, not relax your description of the JSON. The response_format parameter shouldn’t be used unless the AI can already produce the desired JSON format 100% reliably without it. Irony?
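
As a rough sketch of what I mean by over-specifying (Python with the official openai SDK; the schema in the system message is invented for illustration):

```python
from openai import OpenAI

client = OpenAI()

# Over-specify the shape you want: name every key, give a literal example,
# and cover the failure case. json_object mode only pushes the model toward
# *some* JSON; the prompt has to pin down *which* JSON.
system = (
    "Respond with a single JSON object and nothing else.\n"
    'Schema (all keys required): {"answer": string, "confidence": number 0-1}\n'
    'Example: {"answer": "Paris", "confidence": 0.95}\n'
    'If you cannot answer, return {"answer": "", "confidence": 0}.'
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    response_format={"type": "json_object"},
)
print(resp.choices[0].message.content)
```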

You can also increase the frequency_penalty so a repetition loop like this doesn’t run on so long, along with setting a max_tokens value that covers any legitimate response size and no more.
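
For example (again just a sketch; the values are starting points to tune against your own traffic, not recommendations):

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Respond with a single JSON object."},
        {"role": "user", "content": "Summarize this ticket in one sentence: ..."},
    ],
    response_format={"type": "json_object"},
    frequency_penalty=0.5,  # each repeated "\n" token gets progressively more expensive
    max_tokens=500,         # enough for any legitimate reply; a runaway stops here
)
print(resp.usage.completion_tokens)  # sanity-check the spend
```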