Incorrect API jsonl parsing

bonfiglio.sergio · November 11, 2024, 9:43am

After many tests, I think I have found a probable bug in the API validation process.
I’m testing the API for batch purposes, so I created a test file with JSON lines in it. When I try to upload this file, the API replies with error 400. I validated the file content with an external JSONL validator, which reports that the content is valid JSON.
The line lengths (2 lines, for testing) are:
Line 1: 9614 characters
Line 2: 9625 characters
Frankly, I don’t know what to do now.
I’m using Postman to send the file. I double-checked each parameter, but I still receive this message:

{
    "error": {
        "message": "Invalid file format for Batch API. Must be .jsonl",
        "type": "invalid_request_error",
        "param": null,
        "code": null
    }
}

I have a very large project to run, but I don’t know if I can go ahead.
It should be easy for OpenAI to provide a JSONL validator that explains the possible errors in batch request JSONL files.
A response like the one I reported above is completely useless. Is there anyone who can help? Thank you in advance!

One update that may interest API people:
I replaced the content of the test file with the examples found at
https://platform.openai.com/docs/guides/batch
So now the test file contains only the two example lines:

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}

You know what? This is the reply from the API:

{
    "error": {
        "message": "Invalid file format for Batch API. Must be .jsonl",
        "type": "invalid_request_error",
        "param": null,
        "code": null
    }
}

I think that someone should look at this buggy API…

bonfiglio.sergio · November 11, 2024, 8:11pm

To be of some help to the community, I am answering my own question, since I managed to find the problem.
The file containing the batch requests must be encoded in UTF-8
BUT IT MUST NOT CONTAIN THE BOM AT THE BEGINNING!
The BOM (byte order mark) is the three-byte sequence EF BB BF that some editors write at the start of a file to mark its content as UTF-8. For some reason unknown to me, the file with the requests MUST NOT CONTAIN THE BOM.
Once I stripped the BOM from the request file, everything started to work.
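
For anyone who wants to check a file before uploading it, here is a minimal Python sketch of the fix described above. It only uses the standard library. The function names (strip_utf8_bom, check_jsonl) are made up for this example and are not part of any OpenAI tool, and the check only covers the BOM and per-line JSON syntax, not the Batch API's own rules for fields such as custom_id or url.

```python
import codecs
import json


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def check_jsonl(data: bytes) -> list[str]:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: it checks for a BOM and that each non-empty
    line parses as JSON, nothing more.
    """
    problems = []
    if data.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM")
    text = strip_utf8_bom(data).decode("utf-8")
    # Split on "\n" only, so unusual separators inside JSON strings are left alone.
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # ignore blank lines, e.g. after the final newline
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: {exc.msg}")
    return problems
```

To clean a file, read it in binary mode, pass the bytes through strip_utf8_bom, and write the result back in binary mode. If the file is generated from Python in the first place, opening it with encoding="utf-8" rather than "utf-8-sig" avoids writing the BOM at all.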

PaulBellow · November 11, 2024, 8:13pm

Thanks for coming back to let us know. Hopefully this helps someone in the future!
