Batch API custom_id does not support UUID?

I tried to use UUID for the “custom_id” of Batch API, but I got “null” for all the responses when the batch job completed. It did work with simple numerical IDs.
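A minimal sketch of the kind of input line involved, with a UUID custom_id (the endpoint, model, and body here are illustrative, not the exact request from the report):

```python
import json
import uuid

# Build one Batch API input line whose custom_id is a UUID string --
# the case reported to come back as null in the output file.
line = {
    "custom_id": str(uuid.uuid4()),
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-3.5-turbo-0125",
        "messages": [{"role": "user", "content": "Say hi"}],
    },
}

# Each request is one JSON object per line in the uploaded .jsonl file.
print(json.dumps(line))
```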

Is there any limit on the custom_id?

There does seem to be a limit: a user found that UUIDs do not work. :smile:


But there is no such limit in the YAML spec:

BatchRequestInput:
  type: object
  description: The per-line object of the batch input file
  properties:
    custom_id:
      type: string
      description: >-
        A developer-provided per-request id that will be used to match
        outputs to inputs. Must be unique for each request in a batch.
    method:
      type: string
      enum:
        - POST
      description: The HTTP method to be used for the request. Currently only POST is supported.

And we can’t actually see what schema the batch processor validates against.

It’s also not generating a validation error, it’s proceeding with the LLM batch generation and returning blank custom_ids.

Another topic recently reported the same thing, although without an example of what format was attempted for custom_id, or verification that the correct key name was actually used.

The OpenAI staff member who introduced batch has not been seen on the forum in May. @nikunj might be someone to tag to ensure that:

  • Valid and invalid custom_id string is documented;
  • The uploader performs validation similar to fine-tune, checking custom_id strings and endpoint specs, with the ‘create’ submission blocked until this file status is ready;
  • The batch processor also performs job validation, not submitting if custom_id cannot be read or recorded;
  • A robust deployment plan is followed in the future to prevent these mistakes, and everyone with empty custom_ids is credited for at least the call.
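Until the uploader does this server-side, a client-side check can catch problems before submission. A sketch along those lines; the rules enforced here (non-empty unique string custom_id, POST only, a known endpoint) are assumptions based on the published YAML spec, not an official list of constraints:

```python
import json

# Hypothetical pre-upload validator for a Batch API input .jsonl file.
# The endpoint list below is an assumption, not exhaustive.
ALLOWED_ENDPOINTS = {"/v1/chat/completions", "/v1/completions", "/v1/embeddings"}

def validate_batch_lines(lines):
    """Return a list of human-readable error strings (empty if all lines pass)."""
    errors = []
    seen_ids = set()
    for n, raw in enumerate(lines, start=1):
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"line {n}: invalid JSON ({exc})")
            continue
        cid = obj.get("custom_id")
        if not isinstance(cid, str) or not cid:
            errors.append(f"line {n}: custom_id missing or not a non-empty string")
        elif cid in seen_ids:
            errors.append(f"line {n}: duplicate custom_id {cid!r}")
        else:
            seen_ids.add(cid)
        if obj.get("method") != "POST":
            errors.append(f"line {n}: method must be POST")
        if obj.get("url") not in ALLOWED_ENDPOINTS:
            errors.append(f"line {n}: unsupported url {obj.get('url')!r}")
    return errors
```

Running this over the file before `files.create` at least rules out local mistakes before a batch is submitted.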

Now: did you get double-billed the regular price also, or did they at least fix that…

It cost about 8 cents for around 100,000 1-3 sentence embeddings. I don’t know what pricing model that is but seems pretty cheap to me.

Here’s an example of a line from one of my batch input files:

{"custom_id": "1171863263651299388_content", "method": "POST", "url": "/v1/embeddings", "body": {"model": "text-embedding-3-large", "input": "paal is my favourite"}}

I see now there is an underscore. However, the documentation does not mention anything about limiting custom IDs to alphanumeric strings only.
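If you want to rule out character-set issues without losing your original IDs, you can map each custom_id to a plain sequential alias and keep a local lookup table. A sketch; the idea that non-alphanumeric characters cause nulls is a guess, not anything documented:

```python
import json

# Replace arbitrary custom_ids with safe sequential aliases ("req0",
# "req1", ...) and keep a local mapping so outputs can still be matched
# back to the original IDs after download.
def alias_custom_ids(lines):
    mapping = {}
    out = []
    for n, raw in enumerate(lines):
        obj = json.loads(raw)
        alias = f"req{n}"
        mapping[alias] = obj["custom_id"]
        obj["custom_id"] = alias
        out.append(json.dumps(obj))
    return out, mapping
```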

I have no current batch jobs, so I submitted them again without underscores. Hilariously, they were rejected because my queue is full -_-

Batch seems buggy af at least for embeddings rn.

I have just seen a similar problem. I submitted a batch of 50,000 gpt-3.5-turbo-0125 text completion prompts, which validated successfully. When it ran, it processed the first 4,200 or so and then started failing all subsequent ones. I cancelled the job and downloaded the results to date. They contained successfully generated results, but all the custom_id fields in the returned JSONL were null.

I just had a similar issue.

Before sending a huge batch, I sent a small one (~30 requests) containing chat completions with custom_ids of pattern {number}_{number}_{number} - yes, with underscores. With the small batch, it worked just fine.

Since everything was fine, I sent a huge batch over (~25k requests) that I had to cancel due to billing limits; it stopped at 7k.
With this batch, all custom_ids are null - even those that matched the requests of the test batch. The responses are all valid and like I requested, but I can’t match them to the inputs now.

I can’t tell if it’s due to the cancellation or what else this depends on. It would be great if anyone had some insight.
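One way to quantify the damage is to scan the downloaded output .jsonl and count null custom_ids. A sketch, assuming the documented output shape where each line carries a top-level "custom_id" field:

```python
import json

# Count null vs. populated custom_ids in a downloaded batch output
# .jsonl, assuming each output line has a top-level "custom_id" key.
def count_null_custom_ids(lines):
    null_count = 0
    ok_count = 0
    for raw in lines:
        obj = json.loads(raw)
        if obj.get("custom_id") is None:
            null_count += 1
        else:
            ok_count += 1
    return null_count, ok_count
```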

I’m encountering null for custom_id today as well, even in an extremely simple case for a completed batch job:

  1. custom_id was request-0, not a UUID;
  2. Only one data point exists in this batch job.

I do have a lot of successful batch jobs previously, including those containing only one data point. All those jobs use the same style of custom_id. This behaviour is very strange.

I had tried the same thing, assigning UUIDs to the custom_ids, and it worked.
The only thing is that the results come back in random order.