Batch mode takes a JSONL file. https://jsonlines.org/
"Lines" means no raw linefeeds within a record: newlines inside string values must be escaped, and the whitespace that pretty-printing adds between JSON tokens must be stripped.
Let a JSON library do this rather than editing the text by hand.
Starting with a typical dictionary that you'd pass as **kwargs to the SDK, here's some bot talk:
Explanation of Questions
- How to handle multi-line JSON?
  Multi-line strings and pretty-formatted JSON with linefeeds need to be normalized for JSONL format. JSONL requires each JSON object to be a single line, so we need to strip out extra whitespace and ensure all newline characters (\n) in strings are properly escaped.
- Does Python need a specific JSONL library?
  No, Python's json module is sufficient to create JSONL files. The key is to:
  - Use json.dumps() to serialize each dictionary to a JSON string.
  - Write these strings line by line to the file.
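As a minimal sketch of the two answers above, the following uses only the standard json module. The pretty-printed input string and the variable names are illustrative assumptions, not part of the original answer.

```python
import json

# A pretty-printed JSON document (illustrative input) spanning several lines.
# The doubled backslash puts the two characters \n into the JSON text itself.
pretty = """{
    "role": "user",
    "content": "Hello,\\nworld!"
}"""

# Parse it, then re-serialize. Without an `indent` argument json.dumps()
# emits a single line, and the newline inside the string value is written
# as an escape sequence rather than a raw linefeed.
obj = json.loads(pretty)
line = json.dumps(obj)

print(line)
```

The parsed object still holds a real newline in its content value; only the serialized text is flattened to one line.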
Example: Converting Multi-line JSON into JSONL
Here's how to achieve this in Python:
Step 1: Create Input Data
Start with two dictionaries representing API parameters. Assume that the user inputs in messages contain multi-line strings.
# Input dictionaries with multi-line user content
request_1 = {
    "custom_id": "request-1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-3.5-turbo-0125",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello,\nworld!\nHow are you today?"}
        ],
        "max_tokens": 1000
    }
}
request_2 = {
    "custom_id": "request-2",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-3.5-turbo-0125",
        "messages": [
            {"role": "system", "content": "You are an unhelpful assistant."},
            {"role": "user", "content": "Goodbye,\ncruel\nworld!"}
        ],
        "max_tokens": 1000
    }
}
# Place the requests in a list
requests = [request_1, request_2]
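If you already have the keyword arguments you would pass to the SDK, the envelope around them can be built with a small helper. This is only a sketch: make_batch_request, chat_kwargs, and example_requests are hypothetical names introduced here, and the envelope fields simply mirror the two dictionaries above.

```python
def make_batch_request(custom_id, kwargs, url="/v1/chat/completions"):
    """Wrap SDK-style keyword arguments in the per-line request envelope."""
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": url,
        "body": dict(kwargs),  # shallow copy so the caller's dict is not shared
    }


# The same kind of dictionary you would otherwise pass as **kwargs.
chat_kwargs = {
    "model": "gpt-3.5-turbo-0125",
    "messages": [{"role": "user", "content": "Hello,\nworld!"}],
    "max_tokens": 1000,
}

# One envelope per request, each with its own custom_id.
example_requests = [
    make_batch_request(f"request-{i}", chat_kwargs) for i in (1, 2)
]
```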
Step 2: Generate JSONL File
To create a JSONL file:
- Serialize each dictionary with json.dumps().
- Escape newlines within strings (json.dumps() does this for you).
- Write each serialized object as a line in the file.
import json
# File to write to
jsonl_file_path = "batch_requests.jsonl"
# Open the file for writing
with open(jsonl_file_path, "w", encoding="utf-8") as jsonl_file:
    for request in requests:
        # Convert dictionary to a compact, single-line JSON string
        json_line = json.dumps(request, separators=(',', ':'))
        # Write JSON string as a line in the file
        jsonl_file.write(json_line + '\n')
print(f"JSONL file created at {jsonl_file_path}")
Output JSONL File
The contents of batch_requests.jsonl will be:
{"custom_id":"request-1","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-3.5-turbo-0125","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Hello,\nworld!\nHow are you today?"}],"max_tokens":1000}}
{"custom_id":"request-2","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-3.5-turbo-0125","messages":[{"role":"system","content":"You are an unhelpful assistant."},{"role":"user","content":"Goodbye,\ncruel\nworld!"}],"max_tokens":1000}}
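To check a file like this before uploading it, you can read it back and confirm that each line parses on its own. The sketch below is self-contained and writes its own small sample to a temporary directory; read_jsonl, sample, and check.jsonl are hypothetical names used only for illustration.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Parse a JSONL file into a list of objects, one per non-empty line."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Round-trip one sample record through a temporary file.
sample = {"custom_id": "request-1", "body": {"content": "Hello,\nworld!"}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "check.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        f.write(json.dumps(sample, separators=(",", ":")) + "\n")
    with open(path, "r", encoding="utf-8") as f:
        raw_text = f.read()
    records = read_jsonl(path)
```

The raw text contains exactly one linefeed, the record terminator, while the parsed record has its multi-line content restored.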
Key Notes
- Escaping Newlines: Python's json.dumps() automatically escapes newline characters (\n) inside string values, ensuring that the JSONL file remains valid.
- Compact Formatting: The separators=(',', ':') argument in json.dumps() removes the spaces after commas and colons, making each line more compact. It is optional: without an indent argument, json.dumps() already emits a single line.
- File Handling: Always open files in text mode with UTF-8 encoding to handle special characters properly.
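The notes above can be checked directly. In this sketch the record is an illustrative assumption, and ensure_ascii=False is an extra json.dumps() option not used in the answer above; it is shown only because it interacts with the UTF-8 note.

```python
import json

record = {"role": "user", "content": "Café\nmenu"}

# Default separators are ", " and ": "; the output is still a single line.
default_line = json.dumps(record)

# Compact separators drop the spaces between tokens.
compact_line = json.dumps(record, separators=(",", ":"))

# By default non-ASCII characters are escaped; ensure_ascii=False keeps them
# as-is, which is why the file should be opened with encoding="utf-8".
unicode_line = json.dumps(record, ensure_ascii=False)

for text in (default_line, compact_line, unicode_line):
    print(text)
```

All three variants parse back to the same dictionary, so the choice between them affects only file size and readability.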