Error after creating batch job "This line is not parseable as valid JSON"

Hi,

I am trying to use the batch API, but I get the error “This line is not parseable as valid JSON”, which I do not think is the “real” error. At least, I don’t know what is wrong.

This is what the first line of my batch.jsonl file looks like:

{"custom_id": "request-0", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4-0613", "messages": [{"role": "system", "content": "You will be provided with a piece of text delimited by triple quotes (\"\"\") as input.  The input should contain a name (only given name, only family name or both).  Based purely on the name, can you tell me whether the origin of the name is \"African\", \"European\", or \"Other\". If the input does not contain a name, reply instead with \"Not a name\". Do not reply with anything else.  Example: Name: \"\"\"Oduro\"\"\" Origin: African  Name: \"\"\"31.12.1998\"\"\" Origin: No name  Name: \"\"\"Zhang Wei\"\"\" Origin: Other  Name: \"\"\"Gjengedal\"\"\" Origin: European"}, {"role": "user", "content": "Name: \"\"\"Kraft\"\"\"Origin:"}]}}

I created the JSON using the python json.dumps(...) function. I use the same fields as described in requestInput here.

My script works like this:

  1. Create the JSON Lines file
with open(batch_file_path, mode="w", encoding="utf-8-sig") as batch_file:
    # line is a dict
    for line in lines:
        batch_file.write(f"{json.dumps(line)}\n")
  2. Upload the file
client.files.create(file=open(batch_file_path, "rb"), purpose="batch")

The file is uploaded, I get the status “processed” and a file ID.

  3. Create the batch job
client.batches.create(
        input_file_id=input_file_id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )

I use the same file ID as the one that was returned in step 2. The batch job is successfully created, and I get the status “validated”.

  4. Check the status
batches = client.batches.list()
batch = [b for b in batches if batch_id == b.id][0]
pp(batch)

This is where I get the error. My batch object now looks like this:

Batch(
    id="batch_hQ71KCmV6qmap8kpCJKK0Hrk",
    completion_window="24h",
    created_at=1714119194,
    endpoint="/v1/chat/completions",
    input_file_id="file-fkUnQ87TkhobLCkWOUAQQcnv",
    object="batch",
    status="failed",
    cancelled_at=None,
    cancelling_at=None,
    completed_at=None,
    error_file_id=None,
    errors=Errors(
        data=[
            BatchError(
                code="invalid_json_line",
                line=1,
                message="This line is not parseable as valid JSON.",
                param=None,
            )
        ],
        object="list",
    ),
    expired_at=None,
    expires_at=1714205594,
    failed_at=1714119194,
    finalizing_at=None,
    in_progress_at=None,
    metadata=None,
    output_file_id=None,
    request_counts=BatchRequestCounts(completed=0, failed=0, total=0),
)

I am stuck. Copying and pasting the first line into a code editor and formatting it works, so the line is definitely parseable as JSON. The issue seems to be somewhere else, but I just cannot find it.

Thanks in advance for reading.

Kind Regards,
Rafael

I now get this error when trying to upload the batch file:

Traceback (most recent call last):
  File "C:\Users\rb\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\241.14494.241\plugins\python\helpers\pydev\pydevd.py", line 1535, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\rb\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\241.14494.241\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:\git\OFB_Code\scripts\classify_names_batch.py", line 146, in <module>
    upload_batch_file()
  File "C:\git\OFB_Code\scripts\classify_names_batch.py", line 100, in upload_batch_file
    response = client.files.create(file=open(file_path, "rb"), purpose="batch")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\git\OFB_Code\venv\Lib\site-packages\openai\resources\files.py", line 109, in create
    return self._post(
           ^^^^^^^^^^^
  File "C:\git\OFB_Code\venv\Lib\site-packages\openai\_base_client.py", line 1232, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\git\OFB_Code\venv\Lib\site-packages\openai\_base_client.py", line 921, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "C:\git\OFB_Code\venv\Lib\site-packages\openai\_base_client.py", line 1012, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': 'Invalid file format for Batch API. Must be .jsonl', 'type': 'invalid_request_error', 'param': None, 'code': None}}

I am using the newest version of the openai package (1.23.6). The file that I am uploading has the extension .jsonl, and only has one JSON object per line. How is that check implemented, because it is clearly not working correctly?

Same issue over here, valid JSON getting flagged as unparseable: This line is not parseable as valid JSON

See if GPT-4-0314 has come up with the solution for passing a valid JSON within the valid JSON…


The JSON object you provided has some issues with escaping double quotes and formatting. To make the “body” value pass JSON validation independently, you need to properly escape the double quotes and ensure the correct formatting. Here’s the corrected JSON object:

{
  "custom_id": "request-0",
  "method": "POST",
  "url": "/v1/chat/completions",
  "body": "{\"model\": \"gpt-4-0613\", \"messages\": [{\"role\": \"system\", \"content\": \"You will be provided with a piece of text delimited by triple quotes (\\\"\\\"\\\") as input.  The input should contain a name (only given name, only family name or both).  Based purely on the name, can you tell me whether the origin of the name is \\\"African\\\", \\\"European\\\", or \\\"Other\\\". If the input does not contain a name, reply instead with \\\"Not a name\\\". Do not reply with anything else.  Example: Name: \\\"\\\"\\\"Oduro\\\"\\\"\\\" Origin: African  Name: \\\"\\\"\\\"31.12.1998\\\"\\\"\\\" Origin: No name  Name: \\\"\\\"\\\"Zhang Wei\\\"\\\"\\\" Origin: Other  Name: \\\"\\\"\\\"Gjengedal\\\"\\\"\\\" Origin: European\"}, {\"role\": \"user\", \"content\": \"Name: \\\"\\\"\\\"Kraft\\\"\\\"\\\"Origin:\"}]}"
}

Now, when you extract and parse the “body” value, it will be a valid JSON object:

{
  "model": "gpt-4-0613",
  "messages": [
    {
      "role": "system",
      "content": "You will be provided with a piece of text delimited by triple quotes (\"\"\") as input.  The input should contain a name (only given name, only family name or both).  Based purely on the name, can you tell me whether the origin of the name is \"African\", \"European\", or \"Other\". If the input does not contain a name, reply instead with \"Not a name\". Do not reply with anything else.  Example: Name: \"\"\"Oduro\"\"\" Origin: African  Name: \"\"\"31.12.1998\"\"\" Origin: No name  Name: \"\"\"Zhang Wei\"\"\" Origin: Other  Name: \"\"\"Gjengedal\"\"\" Origin: European"
    },
    {
      "role": "user",
      "content": "Name: \"\"\"Kraft\"\"\"Origin:"
    }
  ]
}

Here’s the code that demonstrates reading the JSON body from a string, validating it, and then creating the final API request with proper escaping. The variable here is already escaped twice for the sake of showing it in Python without single quotes, but if you were reading this body string as JSON from individual request files, that body would only have the quote characters within contents escaped.

import json

def create_json(custom_id, body_string):
    # Validate the input body_string JSON
    body_dict = json.loads(body_string)

    # Create the final API request with proper escaping
    data = {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": body_dict
    }
    json_string = json.dumps(data)
    return json_string

# Example usage
custom_id = "request-0"
json_body = "{\"model\": \"gpt-4-0613\", \"messages\": [{\"role\": \"system\", \"content\": \"You will be provided with a piece of text delimited by triple quotes (\\\"\\\"\\\") as input.  The input should contain a name (only given name, only family name or both).  Based purely on the name, can you tell me whether the origin of the name is \\\"African\\\", \\\"European\\\", or \\\"Other\\\". If the input does not contain a name, reply instead with \\\"Not a name\\\". Do not reply with anything else.  Example: Name: \\\"\\\"\\\"Oduro\\\"\\\"\\\" Origin: African  Name: \\\"\\\"\\\"31.12.1998\\\"\\\"\\\" Origin: No name  Name: \\\"\\\"\\\"Zhang Wei\\\"\\\"\\\" Origin: Other  Name: \\\"\\\"\\\"Gjengedal\\\"\\\"\\\" Origin: European\"}, {\"role\": \"user\", \"content\": \"Name: \\\"\\\"\\\"Kraft\\\"\\\"\\\"Origin:\"}]}"

json_output = create_json(custom_id, json_body)
print(json_output)

This code first validates the input json_body by loading it into a dictionary using json.loads(). Then, it creates the final API request with proper escaping by constructing a dictionary with the required keys and values, and converting it to a JSON string using json.dumps().


I had to instruct the AI a few times before it understood, and I didn’t actually run the code.

OK, I figured it out. If data.jsonl is encoded as UTF-8 with a BOM (which is what encoding="utf-8-sig" in my script above produces), you get the error Invalid file format for Batch API. Must be .jsonl when uploading, which is clearly a bug in OpenAI's file handling. If it is not a bug, this should be clearly stated in the documentation.

→ MAKE SURE THE FILE ENCODING IS UTF-8 WITHOUT A BOM.
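
For anyone who wants to check an existing file, here is a minimal sketch that looks at the first three bytes for the UTF-8 BOM (EF BB BF) and removes it if present. The helper names has_utf8_bom and strip_utf8_bom and the demo file are my own, not part of the openai package, and I am only assuming the BOM is what the server-side check trips over, based on the behaviour described above.

```python
import codecs
import json
import os
import tempfile


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def strip_utf8_bom(path):
    """Rewrite the file in place without a leading BOM; other bytes are untouched."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        with open(path, "wb") as f:
            f.write(data[len(codecs.BOM_UTF8):])


# Demo: reproduce the situation from the original post with a throwaway file.
demo_path = os.path.join(tempfile.mkdtemp(), "batch.jsonl")
with open(demo_path, "w", encoding="utf-8-sig") as f:  # "utf-8-sig" writes a BOM
    f.write(json.dumps({"custom_id": "request-0"}) + "\n")

print(has_utf8_bom(demo_path))  # BOM present before stripping
strip_utf8_bom(demo_path)
print(has_utf8_bom(demo_path))  # BOM gone afterwards
```

This probably also explains why pasting the line into an editor looked fine: the BOM is invisible there, and Python's json.loads only complains about it when it is handed a str that still starts with the BOM character.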

Other things I also changed, but I am not sure whether they are necessary:

  • Make sure the file uses Unix line separators (LF instead of CR LF)
  • Make sure the file does not contain any empty lines
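
Putting the points above together, this is roughly how the file could be written and checked locally before uploading. It is a sketch under my own assumptions: write_jsonl and check_jsonl are hypothetical helper names, and the checks only mirror the three things listed here (no BOM, LF line endings, no empty lines), not whatever validation OpenAI actually runs.

```python
import json
import os
import tempfile


def write_jsonl(path, records):
    """Write one JSON object per line as UTF-8 without a BOM, using LF endings."""
    # encoding="utf-8" (not "utf-8-sig") writes no BOM; newline="\n" stops
    # Windows from translating "\n" into "\r\n".
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def check_jsonl(path):
    """Return a list of problems found in the file (empty list if none)."""
    with open(path, "rb") as f:
        data = f.read()
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 BOM")
    if b"\r" in data:
        problems.append("file contains CR characters (CR LF line endings?)")
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # a single trailing newline is not an empty line
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {number} is empty")
            continue
        try:
            json.loads(line.decode("utf-8"))
        except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
            problems.append(f"line {number} is not valid JSON: {exc}")
    return problems


# Demo with a shortened request in the same shape as the one in the first post.
records = [
    {
        "custom_id": "request-0",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4-0613",
            "messages": [{"role": "user", "content": 'Name: """Kraft"""Origin:'}],
        },
    }
]
batch_file_path = os.path.join(tempfile.mkdtemp(), "batch.jsonl")
write_jsonl(batch_file_path, records)
print(check_jsonl(batch_file_path))
```

The only change compared to the script in the first post is the open(...) call: encoding="utf-8" instead of "utf-8-sig", plus newline="\n".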