After some experimenting, I can confirm that it seems to be truncating the results even with search_context_size = "high".
It is difficult to push the output beyond ~1500 completion tokens: the model summarizes its findings aggressively to stay within what looks like a fixed output budget, no matter how much room the context window actually has.
Request example
from openai import OpenAI

client = OpenAI()

input_text = "..."  # the actual prompt (omitted here)

response = client.chat.completions.create(
    model="gpt-4o-mini-search-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": input_text,
                }
            ],
        },
    ],
    response_format={"type": "text"},
    web_search_options={
        # largest available search context
        "search_context_size": "high",
        "user_location": {
            "type": "approximate",
            "approximate": {"country": "US"},
        },
    },
    # far above the ~1500 tokens the model actually produces
    max_completion_tokens=15000,
)

print(response.choices[0].message.content)
print(response.usage)
Total tokens: 1935
Response length (text): 6765 characters
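For what it's worth, the cutoff is not caused by the max_completion_tokens cap. A quick sanity check against the same response object (names as in the request example above) confirms this:

# The response was not cut by the token limit:
# finish_reason would be "length" if max_completion_tokens had been hit.
assert response.choices[0].finish_reason == "stop"
assert response.usage.completion_tokens < 15000  # only 1935 of 15000 used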
Completion response details
{
  "id": "chatcmpl-61afc24c-6f71-42ae-a192-904065637863",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "(redacted)......**[Alignment faking in large language models](https://www.anthropic.com/news/alignment-f ",  <---- truncated
        "refusal": null,
        "role": "assistant",
        "annotations": []
      }
    }
  ],
  "created": 1749161600,
  "model": "gpt-4o-mini-search-preview-2025-03-11",
  "object": "chat.completion",
  "system_fingerprint": "",
  "usage": {
    "completion_tokens": 1935,
    "prompt_tokens": 105,
    "total_tokens": 2040,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0
    }
  }
}
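A rough workaround I've been considering, not fully validated: detect the dangling output (here the content ends mid markdown link) and ask the model to continue in a follow-up turn. The looks_truncated helper below is a hypothetical heuristic of my own, and whether the search-preview models handle assistant history like this is an assumption on my part:

import re

def looks_truncated(text: str) -> bool:
    # Heuristic: an unclosed markdown link, or a final line with no
    # terminal punctuation, suggests the content was cut mid-stream.
    if re.search(r"\[[^\]]*\]\([^)]*$", text.rstrip()):
        return True
    return not text.rstrip().endswith((".", "!", "?", ")", "`"))

content = response.choices[0].message.content
if looks_truncated(content):
    followup = client.chat.completions.create(
        model="gpt-4o-mini-search-preview",
        messages=[
            {"role": "user", "content": input_text},
            {"role": "assistant", "content": content},
            {"role": "user", "content": "Your previous answer was cut off mid-sentence. Please continue from exactly where it stopped."},
        ],
        web_search_options={"search_context_size": "high"},
        max_completion_tokens=15000,
    )
    content += followup.choices[0].message.content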