Why is service_tier "flex" not available?

Somehow the “flex” option isn’t available for my organization. I can’t find anything in the documentation that would explain why.

Whenever I want to create a new project, I MUST choose between “default” and “priority”, there are no other options. And this setting seems to override any parameters I send with my API requests, thereby making it impossible for me to use the “flex” service tier. Is this intentional? What am I missing? What are the necessary requirements for an “organization” to be able to use the “flex” service tier?

"service_tier": "flex" is a parameter option that you can use in API calls.

You can set a permanent doubling of the price in the platform site UI, but not a (semi) permanent halving of costs.

API calls made with flex, to the models that support it, also run at lower priority, with no service guarantee. You will likely need code specifically for handling these calls: longer timeouts, pushing the tasks to background mode, and a decision about which retry technique to use, up to and including promoting the next retry to a higher service tier.
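A minimal sketch of that retry-then-promote idea, in case it helps. The helper name `call_with_tier_fallback`, the tier order, the attempt count and the delays are all my own assumptions for illustration, not anything the API prescribes. The request itself is passed in as a callable, so the retry logic can be tried without touching the network.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_tier_fallback(
    make_call: Callable[[str], T],
    tiers: Sequence[str] = ("flex", "default"),  # assumed order: cheapest first
    attempts_per_tier: int = 2,                  # illustrative value
    base_delay: float = 2.0,                     # illustrative value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each tier in order, backing off between attempts, then promote."""
    last_error = None
    for tier in tiers:
        for attempt in range(attempts_per_tier):
            try:
                return make_call(tier)
            except Exception as exc:  # in real code, narrow this to timeout/capacity errors
                last_error = exc
                sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError("all service tiers failed") from last_error


# Hypothetical usage with the OpenAI SDK (commented out: needs an API key).
# The timeout is an arbitrary "long" value.
# from openai import OpenAI
# client = OpenAI(timeout=900, max_retries=0)
# response = call_with_tier_fallback(
#     lambda tier: client.responses.create(
#         model="gpt-5-mini", input="Hello", service_tier=tier
#     )
# )
```

Setting `max_retries=0` on the client keeps the SDK's own retries from stacking with this loop.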

To check whether flex pricing is available for a model (o3 and up), see whether the price list shows that model with the flex discount:

Thank you for trying to help, but I’m already aware of how “flex” works. I’ve been using it successfully with other providers with both Gemini and GPT models that support it.

The problem is that this parameter gets ignored when I’m using OpenAI directly, and I suspect that it’s because of the “default” setting that’s automatically applied to a new project when created. I’m guessing that this overrides any conflicting parameters that get sent with the API request. At least that would explain why the parameter gets ignored.

If that’s not the reason, what other possible reasons could exist for OpenAI ignoring the service_tier parameter?

And I’ve obviously already searched the official documentation before making this post.


{
  "messages": […],
  "abortSignal": {},
  "headers": {
    "X-Title": "Cherry Studio"
  },
  "providerOptions": {
    "openai": {
      "reasoningEffort": "none",
      "reasoningSummary": "auto",
      "forceReasoning": true,
      "store": true,
      "service_tier": "flex"
    }
  },
  "maxRetries": 0
}

Obviously the “service_tier” parameter gets ignored somehow, while the parameters “reasoningEffort” and “store” get accepted.

I would just like to find out WHY they are being ignored.

First: we have to ask the API and the models to “flex” their brains:

gpt-5-mini
tier: flex
{'input_tokens': 34, 'input_tokens_details': {'cached_tokens': 0}, 'output_tokens': 86, 'output_tokens_details': {'reasoning_tokens': 0}, 'total_tokens': 120}
gpt-5.2
tier: flex
{'input_tokens': 34, 'input_tokens_details': {'cached_tokens': 0}, 'output_tokens': 6, 'output_tokens_details': {'reasoning_tokens': 0}, 'total_tokens': 40}
gpt-5.3-codex FAIL! Error code: 400 - {'error': {'message': 'Flex is not available for this model.', 'type': 'invalid_request_error', 'param': None, 'code': None}}
gpt-5.4
tier: flex
{'input_tokens': 34, 'input_tokens_details': {'cached_tokens': 0}, 'output_tokens': 20, 'output_tokens_details': {'reasoning_tokens': 12}, 'total_tokens': 54}
gpt-5.4-mini
tier: flex
{'input_tokens': 34, 'input_tokens_details': {'cached_tokens': 0}, 'output_tokens': 19, 'output_tokens_details': {'reasoning_tokens': 11}, 'total_tokens': 53}
gpt-5.5
tier: flex
{'input_tokens': 34, 'input_tokens_details': {'cached_tokens': 0}, 'output_tokens': 6, 'output_tokens_details': {'reasoning_tokens': 0}, 'total_tokens': 40}

You’ll see how being denied the “flex” tier looks on my attempt at codex.

How did I make the API calls? Through the default project - at default:

You can try the same snippet and see what the API returns to you for the same calls. You can also check the API key in your environment and the project on the OpenAI platform site that contains it.

from openai import OpenAI

client = OpenAI(timeout=240, max_retries=0)
models = ["gpt-5-mini", "gpt-5.2", "gpt-5.4", "gpt-5.4-mini", "gpt-5.5"]
input_messages = [
    {
        "type": "message",
        "role": "user",
        "content": [
            {
                "type": "input_text",
                "text": ("What is the Answer to the Ultimate Question of Life, "
                         "the Universe, and Everything?"),
            }
        ],
    }
]
for model in models:
    response = None
    try:
        response = client.responses.with_raw_response.create(
            model=model,
            input=input_messages,
            instructions="Answer helpfully - and coyly",
            max_output_tokens=3456,
            store=False,
            reasoning={"effort": "low"},
            text={"verbosity": "low"},
            service_tier="flex",
        )
        print(f"{model}")
        #print(f"Request: {response.headers.get('x-request-id')}")
        #print(f"Response:\n{response.parse().output_text}")
        print(f"tier: {response.parse().service_tier}")
        print(response.parse().usage.model_dump())
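    except Exception as e:
        print(f"{model} FAIL! {e}")
        # On failure `response` is still None, so read the request ID from
        # the exception instead (the SDK's API status errors carry it).
        request_id = getattr(e, "request_id", None)
        if request_id:
            print(f"Request: {request_id}")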

Hopefully the diagnosis and fix is all in your control!

Thank you very much again for trying to help! I appreciate you taking the time to test it yourself.

I think I’ve tried everything I can on my end. No matter HOW I make these API calls, the “service_tier” parameter simply gets ignored. All other parameters (reasoning effort, verbosity etc.) seem to work fine, regardless of whether I use the Responses API or the Chat Completions API.

I’m starting to suspect there might be some kind of limitation on my account or organization. Could it be that “flex” is only available to higher tier clients? I’m verified, but only on “usage tier 1”. Could that be the issue?

Anyway, thanks again for trying to help. At least now I know that the “default” setting doesn’t override parameters from the API requests.

The documentation still calls it “experimental”, but you can see models as new as April 23 delivering the discount. It’s not as obscure a value as “scale tier”, and it’s right there in everybody’s price list, so it should be offered universally.

In your prior post, you seem to show a different shape than an OpenAI API call, which is why I offered a snippet you can run locally in Python using the OpenAI SDK library.

You are trusting other software to translate and pass through non-existent parameters such as "forceReasoning" or "reasoningEffort". It seems that if someone wrote that camelly code for a parameter that is actually "reasoning": {"effort": "none"}, and they were aware of the full shape of the API for Responses, they might call it … serviceTier?
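For comparison, here is a rough sketch of the body a Responses API request carries on the wire, built from the same parameters as the Python snippet earlier in the thread. The helper name `build_responses_body` and its defaults are my own, for illustration only. The point is that "service_tier" sits at the top level in snake_case and reasoning effort is nested under "reasoning", so any wrapper has to translate its own option names into this shape.

```python
import json


def build_responses_body(model, user_text, effort="low", service_tier="flex", store=False):
    """Assemble a Responses API request body using the API's own field names."""
    return {
        "model": model,
        "input": user_text,
        "reasoning": {"effort": effort},  # not "reasoningEffort"
        "store": store,
        "service_tier": service_tier,     # top-level, snake_case
    }


body = build_responses_body("gpt-5-mini", "Hello")
print(json.dumps(body, indent=2))
```

If you can get your app to log the outgoing request, comparing it against a body like this shows quickly whether the tier survived the translation.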

Thank you once again!

You were right. There was a problem with the app I used to make the API calls (Cherry Studio).

It looks like they recently added a native feature to explicitly set the “service_tier” for requests to OpenAI. It’s basically hidden unless you know where to look, and its default setting is, obviously, “default”. Even though I was setting the “service_tier” parameter correctly in the appropriate custom settings, the built-in switch that was still set to “default” must have overridden it.

That’s why it was working with other providers (not just Gemini, but also GPT models), since Cherry Studio only activates this built-in feature when you use the official OpenAI API as the provider. Which I did.

So, mystery solved.

The reason I didn’t notice this earlier was the extreme delay in the OpenAI “usage” dashboard. I was already testing it with LibreChat to rule out CherryStudio, but due to the delay, I couldn’t see any new “flex” requests in the dashboard. They only appeared much later.
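For anyone else who lands here: you don't have to wait for the dashboard. The response itself reports the tier it was served on, as the "tier:" lines in the test output above show. A small sketch of that check, with a helper name (`check_served_tier`) that is just made up for this example and a plain dict standing in for the parsed response:

```python
from typing import Any, Mapping


def check_served_tier(requested: str, response_body: Mapping[str, Any]) -> str:
    """Return a one-line verdict comparing the requested and served tiers."""
    served = response_body.get("service_tier")
    if served == requested:
        return f"OK: served on '{served}'"
    return f"MISMATCH: requested '{requested}', served on '{served}'"


# Stand-in response bodies; with the OpenAI SDK you could pass
# response.model_dump() instead.
print(check_served_tier("flex", {"service_tier": "flex"}))
print(check_served_tier("flex", {"service_tier": "default"}))
```

A mismatch here on a model that supports flex points at something between your settings and the API rewriting the request.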