Does gpt-4o-mini-search-preview have a completion token limit of around 1530?

Hi everyone,

I’ve been testing gpt-4o-mini-search-preview and gpt-4o-search-preview, and whenever I try to generate a longer completion, the completion tokens always seem to be capped at around 1530-1532, which causes the content to be cut off. I’ve tried this with both a normal text response and a json_schema response.

Using the exact same options with gpt-4o-mini does not have this issue.

Any ideas if this is the expected behaviour?

Hi @herman.schutte
I am still facing the same issue. Did you ever figure it out?

Did you try changing the search context size? It’s set via web_search_options (there’s a minimal sketch after the list below).

Available values:

  • high: Most comprehensive context, highest cost, slower response.
  • medium (default): Balanced context, cost, and latency.
  • low: Least context, lowest cost, fastest response, but potentially lower answer quality.
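
For reference, a minimal sketch of where that option goes on a Chat Completions request (the prompt string here is just a placeholder):

from openai import OpenAI

client = OpenAI()

# Swap "low" / "medium" / "high" here to compare output length, cost, and latency.
response = client.chat.completions.create(
  model="gpt-4o-mini-search-preview",
  messages=[{"role": "user", "content": "placeholder prompt"}],
  web_search_options={"search_context_size": "high"},
)
print(response.usage.completion_tokens)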

After some experimenting, I can confirm that it seems to be truncating the results even with search_context_size = high.

It is difficult to get it to produce more than ~1500 completion tokens; it keeps summarizing the results aggressively to fit within that limit.

Request example

from openai import OpenAI

client = OpenAI()

# Placeholder for the actual prompt (a question that should produce a long answer).
input_text = "..."

response = client.chat.completions.create(
  model="gpt-4o-mini-search-preview",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": input_text,
        }
      ]
    },
  ],
  response_format={
    "type": "text"
  },
  web_search_options={
    "search_context_size": "high",
    "user_location": {
      "type": "approximate",
      "approximate": {
        "country": "US"
      }
    }
  },
  # Deliberately generous budget; the completion is still cut off well below it.
  max_completion_tokens=15000,
)

print(response.choices[0].message.content)
print(response.usage)

Completion tokens: 1935. Response length (text): 6765 characters.

Completion response details
{
    "id": "chatcmpl-61afc24c-6f71-42ae-a192-904065637863",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "(redacted)......**[Alignment faking in large language models](https://www.anthropic.com/news/alignment-f ",  <---- truncated
                "refusal": null,
                "role": "assistant",
                "annotations": []
            }
        }
    ],
    "created": 1749161600,
    "model": "gpt-4o-mini-search-preview-2025-03-11",
    "object": "chat.completion",
    "system_fingerprint": "",
    "usage": {
        "completion_tokens": 1935,
        "prompt_tokens": 105,
        "total_tokens": 2040,
        "completion_tokens_details": {
            "accepted_prediction_tokens": 0,
            "audio_tokens": 0,
            "reasoning_tokens": 0,
            "rejected_prediction_tokens": 0
        },
        "prompt_tokens_details": {
            "audio_tokens": 0,
            "cached_tokens": 0
        }
    }
}
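
One thing worth noting in the response above: finish_reason still comes back as "stop" even though the text ends mid-URL, so the truncation can’t be detected from finish_reason alone. A rough heuristic for flagging suspect responses might look like the sketch below (the 50% threshold and the ending markers are just assumptions based on the output shown in this thread, not anything guaranteed by the API):

# Flag responses that stop far below the requested budget while ending on an
# obviously unfinished fragment (e.g. the open markdown link in the response above).
def looks_truncated(response, requested_max: int) -> bool:
    text = (response.choices[0].message.content or "").rstrip()
    used = response.usage.completion_tokens
    # Ends on something other than normal terminal punctuation.
    unfinished_ending = not text.endswith((".", "!", "?", ")", '"'))
    return used < requested_max * 0.5 and unfinished_ending

print(looks_truncated(response, requested_max=15000))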

I’m also facing this issue. The output seems to be truncated at a certain limit, and increasing the output token limit is not respected. Any potential fix here?