Can the Batch API be used together with Scale Tier?

We currently subscribe to Scale Tier, but it sits idle at night. We’d like to make use of that downtime by running jobs through the Batch API.

If we call the Batch API with the same API key we use for Scale Tier, will the requests automatically run under our Scale Tier plan?
Or do we need to add something like extra_body when we create the batch? (I’m not even sure how to specify that.)

Any insights would be greatly appreciated.

Hello and welcome to the community!

You can use Flex Processing to get the same cost savings as the Batch API at slower response times, with the additional benefit that prompt caching is available.

https://platform.openai.com/docs/guides/flex-processing
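
For illustration, a flex request is just a normal call with service_tier set to "flex" and a much longer client timeout; a minimal sketch (the exact timeout value is up to you):

from openai import OpenAI

client = OpenAI()

# Flex trades latency for cost, so raise the client timeout well above
# the default before sending the request.
response = client.with_options(timeout=900.0).chat.completions.create(
    model="o3",                # flex is limited to o3 and o4-mini
    service_tier="flex",
    messages=[{"role": "user", "content": "Label this record: ..."}],
)
print(response.choices[0].message.content)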

Flex pricing should be available to most organizations. You just submit requests and accept higher latency, which means cranking your timeouts way up. It is also only applicable to o3 and o4-mini. That is not what is being asked, though.


The API parameter is service_tier. Since a batch is made up of normal RESTful API calls in the JSONL, it should also accept the parameter without error (a sketch of where it would go in the JSONL follows the list below).

  • If set to ‘auto’, and the Project is Scale tier enabled, the system will utilize scale tier credits until they are exhausted.
  • If set to ‘auto’, and the Project is not Scale tier enabled, the request will be processed using the default service tier with a lower uptime SLA and no latency guarentee.
  • If set to ‘default’, the request will be processed using the default service tier with a lower uptime SLA and no latency guarentee.
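
If you wanted to experiment, each line of the batch input file carries a full request body, so service_tier would go there; a sketch (whether the batch backend actually honors it is exactly the open question):

import json

# One request line of the batch input file; service_tier sits inside "body"
# next to the usual Chat Completions parameters.
line = {
    "custom_id": "request-1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-4o",
        "service_tier": "auto",  # untested whether batch honors this
        "messages": [{"role": "user", "content": "Label this record: ..."}],
    },
}

with open("batch_input.jsonl", "a") as f:
    f.write(json.dumps(line) + "\n")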

Would using the parameter actually employ scale tier credits when sent to batch? That is certainly an unknown; you’ve hit an indeterminate state in the docs.

It does seem that when a batch is run, the model is mapped to an internal model name like “gpt-4o-mini-batch” that provides the service and discounting (discovered via errors returning such a model name). That makes an answer even more ambiguous.

(Note the misspelling of guarantee in the API reference.)

That’s probably an “ask the go-to-market account manager that sold it to you.”

Thanks a lot for the clarification!

Flex Processing does sound interesting, but its model lineup is a bit limited, so I’ll probably explore it for other use cases.

What I actually need is to label a huge dataset with gpt-4o through the Batch API. The Batch pricing is already good, but I was hoping Scale Tier might push the cost down even further—that’s why I asked.

Welcome to the dev forum, @Mr.chick.

Requests in the batch API will be processed in accordance with the value of the service_tier for the respective request.

As @_j pointed out, based on the documentation, you need to set its value to default to make sure requests in the batch API don’t eat up your scale tier quota.

If you’re using o3 or o4-mini models, the flex service tier, as shared by @vb, seems like a better way to maximize your savings.

I wonder if it can be used on batch API requests.
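
One way to find out would be to inspect the batch output file: chat completion objects echo back a service_tier field on synchronous calls, so, assuming that field also appears in batch output, something like this would show which tier actually served each request:

import json

# Each line of the batch output file wraps the chat completion under
# response.body; print the echoed service_tier per request, if present.
with open("batch_output.jsonl") as f:
    for row in map(json.loads, f):
        body = (row.get("response") or {}).get("body", {})
        print(row["custom_id"], body.get("service_tier"))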

Or maybe you do want to use scale tier, considering an organization would have already purchased blocks of scale tier service in increments of several thousand dollars per input and output unit.

Then:

  • is it accepted?
  • is it applied correctly if employed as an API parameter?
  • is it properly discounted?

We have seen very little engagement here from Enterprise customers, so we don’t even know how they are utilizing the SLA service or whether it is of value. Maybe an NDA is the first thing you are handed…

Thanks.

Looks like it’s still undocumented and a bit murky.
I tried explicitly setting service_tier, but it blew up:

# Attempt to pass service_tier at the batch level via extra_body;
# this is the call that errored out.
response = self._openAI_client.batches.create(
    input_file_id=input_file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"description": "test"},
    extra_body={"service_tier": "auto"},
)

On Azure OpenAI you have to deploy a separate “batch”-specific model that isn’t eligible for Scale Tier credits, so the two can’t be combined. I’m starting to think the same restriction applies to OpenAI’s own Batch API as well.

Per your recommendation, I’ll reach out to our GTM account manager and see what they can confirm.

When using the scale tier plan, an organization is using pre-paid input and output (token) units.

Scale Tier lets you purchase a set number of API input and output tokens per minute (known as “token units”) upfront for access to one dedicated model snapshot.

The goal is to realize low latency, high reliability and large scale.

When using the Batch API, a job instead sits in a queue waiting for any server to become available.

As such, you most likely don’t want your scale tier token units to be spent on batch jobs.

You have discovered the nomenclature I employed earlier.

You overlooked the intent behind the desired off-hours usage in the first post, though.

The determination depends on how oversubscribed one is. One may simply want the utility of the batches feature, having plenty of overhead left to fund the calls even if not discounted.

As an analogy, I could recommend that batches use a project that has opted out of complimentary tokens for training data, saving the complimentary tokens for use that wouldn’t otherwise be discounted. However, if you are not making adequate use of the free tokens in the 24-hour window, you lose nothing by batching to consume them.

The key point is that it seems contradictory to pay for reserved resources and then combine them with a service that waits for any available server, as that undercuts the purpose for which token units are purchased.

This issue isn’t fully documented, so the suggestion to contact the account manager is valid.
