I tried to use UUID for the “custom_id” of Batch API, but I got “null” for all the responses when the batch job completed. It did work with simple numerical IDs.
Is there any limit on the custom_id?
There does seem to be some limit: a user found that UUIDs do not work.
But no limit appears in the YAML spec:
BatchRequestInput:
  type: object
  description: The per-line object of the batch input file
  properties:
    custom_id:
      type: string
      description: >-
        A developer-provided per-request id that will be used to match outputs to inputs. Must be unique
        for each request in a batch.
    method:
      type: string
      enum:
        - POST
      description: The HTTP method to be used for the request. Currently only POST is supported.
And we can’t actually see what schema the batch processor validates against.
It’s also not generating a validation error; it proceeds with the LLM batch generation and returns blank custom_ids.
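Since the published spec imposes no format, one defensive option is to pre-validate custom_ids locally before submitting. This is a sketch under my own assumptions: the `SAFE_CUSTOM_ID` pattern is a conservative self-imposed rule, not a documented limit.

```python
import json
import re

# Conservative, self-imposed rule (an assumption, not a documented limit):
# alphanumerics, hyphens and underscores, 1-64 characters.
SAFE_CUSTOM_ID = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

def check_custom_ids(jsonl_lines):
    """Return the custom_ids that look risky under the conservative rule."""
    risky = []
    for line in jsonl_lines:
        cid = json.loads(line).get("custom_id", "")
        if not SAFE_CUSTOM_ID.match(cid):
            risky.append(cid)
    return risky
```

Running this over a batch input file before upload at least rules out the exotic-ID theory if the nulls still appear.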
Another topic recently reported the same thing, although without an example of what format was attempted for custom_id, or verification that the correct key name was actually used.
The OpenAI staff member who introduced Batch has not been seen on the forum in May. @nikunj might be someone to tag so this gets attention.
Now: did you get double-billed at the regular price as well, or did they at least fix that…
It cost about 8 cents for around 100,000 1-3 sentence embeddings. I don’t know what pricing model that is but seems pretty cheap to me.
Here’s an example of a line from one of my batch input files:
{"custom_id": "1171863263651299388_content", "method": "POST", "url": "/v1/embeddings", "body": {"model": "text-embedding-3-large", "input": "paal is my favourite"}}
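For what it's worth, a line like the one above can be built with `json.dumps` rather than by hand, which guarantees straight quotes and proper escaping (the helper name here is mine; the field names follow the Batch API input format):

```python
import json

def make_batch_line(custom_id: str, text: str) -> str:
    # Build one JSONL line for a /v1/embeddings batch request.
    # json.dumps guarantees straight ASCII quotes and correct escaping.
    request = {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/embeddings",
        "body": {"model": "text-embedding-3-large", "input": text},
    }
    return json.dumps(request)

line = make_batch_line("1171863263651299388_content", "paal is my favourite")
```

That rules out smart-quote or encoding problems creeping in from copy-pasted templates.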
I see now there is an underscore. However, the documentation does not mention anything about limiting custom IDs to alphanumeric strings only.
I have no current batch jobs, so I submitted them again without underscores. Hilariously, they were rejected because my queue is full -_-
Batch seems buggy af at least for embeddings rn.
I have just seen a similar problem. I submitted a batch of 50,000 gpt-3.5-turbo-0125 text completion prompts which successfully validated. When I ran it, it processed the first 4200 or so of them and then started failing all subsequent ones. I cancelled the job and downloaded the results to date. They contained successfully generated results, but all the custom_id fields in the returned jsonl were null.
I just had a similar issue.
Before sending a huge batch, I sent a small one (~30 requests) containing chat completions with custom_ids of pattern {number}_{number}_{number} - yes, with underscores. With the small batch, it worked just fine.
Since everything was fine, I sent over a huge batch (~25k requests) that I had to cancel due to billing limits - it stopped at 7k.
With this batch, all custom_ids are null - even those that matched the requests of the test batch. The responses are all valid and like I requested, but I can’t match them to the inputs now.
I can’t tell if it’s due to the cancellation or what else it depends on. It would be great if anyone had some insight.
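Until the null-custom_id bug is fixed, the matching step itself can at least be made explicit, so the orphaned responses are quarantined instead of silently mismatched. A minimal sketch, assuming both files are JSONL with a "custom_id" field per the Batch API format (the function name is mine):

```python
import json

def match_outputs(input_lines, output_lines):
    """Pair batch output lines back to input lines by custom_id.

    Returns (matched, orphaned): matched maps custom_id -> output line;
    orphaned collects output lines whose custom_id is null or unknown.
    """
    inputs = {json.loads(l)["custom_id"]: l for l in input_lines}
    matched, orphaned = {}, []
    for line in output_lines:
        cid = json.loads(line).get("custom_id")
        if cid in inputs:
            matched[cid] = line
        else:
            orphaned.append(line)  # null or unrecognised custom_id
    return matched, orphaned
```

When custom_ids come back null there is no reliable way to recover the pairing, but this at least tells you exactly how many responses were lost.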
I’m encountering null for custom_id today as well, even in an extremely simple case for a completed batch job:
custom_id was request-0, not a UUID. I have a lot of previously successful batch jobs, including some containing only one data point, and all of them use the same style of custom_id. This behaviour is very strange.
I tried the same thing, assigning UUIDs to the custom_ids, and it worked.
The only thing is that the results come back in random order.