Batch API custom_id does not support UUID?

I tried to use UUID for the “custom_id” of Batch API, but I got “null” for all the responses when the batch job completed. It did work with simple numerical IDs.
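A minimal sketch of the kind of input line involved, with a UUID custom_id (the endpoint, model, and body here are illustrative, not the exact request from the report):

```python
import json
import uuid

# Build one Batch API input line whose custom_id is a UUID string --
# the case reported to come back as null in the output file.
line = {
    "custom_id": str(uuid.uuid4()),
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-3.5-turbo-0125",
        "messages": [{"role": "user", "content": "Say hi"}],
    },
}

# Each request is one JSON object per line in the uploaded .jsonl file.
print(json.dumps(line))
```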

Is there any limit on the custom_id?

There does seem to be a limit: a user found that UUIDs do not work. :smile:


But there is no such limit in the YAML spec:

BatchRequestInput:
  type: object
  description: The per-line object of the batch input file
  properties:
    custom_id:
      type: string
      description: >-
        A developer-provided per-request id that will be used to match
        outputs to inputs. Must be unique for each request in a batch.
    method:
      type: string
      enum:
        - POST
      description: The HTTP method to be used for the request. Currently only POST is supported.

And we can’t actually see what schema the batch processor validates against.

It’s also not generating a validation error, it’s proceeding with the LLM batch generation and returning blank custom_ids.

Another topic recently reported the same thing, although without an example of what format was attempted for custom_id, or verification that the correct key name was actually used.

The OpenAI staff member who introduced batch has not been seen on the forum in May. @nikunj might be someone to tag to ensure that:

  • Valid and invalid custom_id string is documented;
  • The uploader performs validation similar to fine-tune, checking custom_id strings and endpoint specs, with the ‘create’ submission blocked until this file status is ready;
  • The batch processor also performs job validation, not submitting if custom_id cannot be read or recorded;
  • A robust deployment plan is followed in the future to prevent these mistakes, and everyone with empty custom_ids is credited for at least the call.
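Until the uploader does this server-side, a client-side check can catch problems before submission. A sketch along those lines; the rules enforced here (non-empty unique string custom_id, POST only, a known endpoint) are assumptions based on the published YAML spec, not an official list of constraints:

```python
import json

# Hypothetical pre-upload validator for a Batch API input .jsonl file.
# The endpoint list below is an assumption, not exhaustive.
ALLOWED_ENDPOINTS = {"/v1/chat/completions", "/v1/completions", "/v1/embeddings"}

def validate_batch_lines(lines):
    """Return a list of human-readable error strings (empty if all lines pass)."""
    errors = []
    seen_ids = set()
    for n, raw in enumerate(lines, start=1):
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"line {n}: invalid JSON ({exc})")
            continue
        cid = obj.get("custom_id")
        if not isinstance(cid, str) or not cid:
            errors.append(f"line {n}: custom_id missing or not a non-empty string")
        elif cid in seen_ids:
            errors.append(f"line {n}: duplicate custom_id {cid!r}")
        else:
            seen_ids.add(cid)
        if obj.get("method") != "POST":
            errors.append(f"line {n}: method must be POST")
        if obj.get("url") not in ALLOWED_ENDPOINTS:
            errors.append(f"line {n}: unsupported url {obj.get('url')!r}")
    return errors
```

Running this over the file before `files.create` at least rules out local mistakes before a batch is submitted.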

Now: did you get double-billed the regular price also, or did they at least fix that…

It cost about 8 cents for around 100,000 1-3 sentence embeddings. I don’t know what pricing model that is but seems pretty cheap to me.

Here’s an example of a line from one of my batch input files:

{"custom_id": "1171863263651299388_content", "method": "POST", "url": "/v1/embeddings", "body": {"model": "text-embedding-3-large", "input": "paal is my favourite"}}

I see now there is an underscore. However, the documentation does not mention anything about limiting custom IDs to alphanumeric strings only.
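If you want to rule out character-set issues without losing your original IDs, you can map each custom_id to a plain sequential alias and keep a local lookup table. A sketch; the idea that non-alphanumeric characters cause nulls is a guess, not anything documented:

```python
import json

# Replace arbitrary custom_ids with safe sequential aliases ("req0",
# "req1", ...) and keep a local mapping so outputs can still be matched
# back to the original IDs after download.
def alias_custom_ids(lines):
    mapping = {}
    out = []
    for n, raw in enumerate(lines):
        obj = json.loads(raw)
        alias = f"req{n}"
        mapping[alias] = obj["custom_id"]
        obj["custom_id"] = alias
        out.append(json.dumps(obj))
    return out, mapping
```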

I have no current batch jobs, so I submitted them again without underscores. Hilariously, they were rejected because my queue is full -_-

Batch seems buggy af at least for embeddings rn.

I have just seen a similar problem. I submitted a batch of 50,000 gpt-3.5-turbo-0125 text completion prompts, which validated successfully. When it ran, it processed the first 4,200 or so and then started failing all subsequent ones. I cancelled the job and downloaded the results to date. They contained successfully generated results, but all the custom_id fields in the returned JSONL were null.

I just had a similar issue.

Before sending a huge batch, I sent a small one (~30 requests) containing chat completions with custom_ids of pattern {number}_{number}_{number} - yes, with underscores. With the small batch, it worked just fine.

Since everything was fine, I sent a huge batch over (~25k requests) that I had to cancel due to billing limits; it stopped at 7k.
With this batch, all custom_ids are null - even those that matched the requests of the test batch. The responses are all valid and like I requested, but I can’t match them to the inputs now.

I can’t tell if it’s due to the cancellation or what else this depends on. It would be great if anyone had some insight.
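One way to quantify the damage is to scan the downloaded output .jsonl and count null custom_ids. A sketch, assuming the documented output shape where each line carries a top-level "custom_id" field:

```python
import json

# Count null vs. populated custom_ids in a downloaded batch output
# .jsonl, assuming each output line has a top-level "custom_id" key.
def count_null_custom_ids(lines):
    null_count = 0
    ok_count = 0
    for raw in lines:
        obj = json.loads(raw)
        if obj.get("custom_id") is None:
            null_count += 1
        else:
            ok_count += 1
    return null_count, ok_count
```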

I’m encountering null for custom_id today as well, even in an extremely simple case for a completed batch job:

  1. custom_id was request-0, not a UUID;
  2. Only one data point exists in this batch job.

I do have a lot of successful batch jobs previously, including those containing only one data point. All those jobs use the same style of custom_id. This behaviour is very strange.

I had tried the same thing, assigning UUIDs to the custom_ids, and it worked.
The only thing is that the results come back in random order.