Newlines in batch mode prompts

How do I convert newlines in a prompt when using the batch api?

I have been submitting prompts to the synchronous chat completion api and now I would like to use the async batch api. I have been constructing prompts with paragraph structure, for example:

prompt = f"""
Given the following transcript:

{transcript_text}

Write an outline summarizing the content. The outline should be in time order and follow a pattern similar to: 

I) Agenda
   A) Agenda item 1:
      1) Time allocation
      2) Discussion topics
      3) Decision made
   B) Agenda item 2:
      1) Time allocation
      2) Discussion topics
      3) Decision made
  ...
"""

When I submit this to the batch API in JSONL format, I need to remove the newlines from the prompt. I see two options: replace the newlines with a space, or replace them with an escaped newline ‘\n’. I am experimenting with both but was hoping for a definitive answer here.

Thank you,

Matt

Batch mode takes a JSONL file. https://jsonlines.org/

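“Lines” means each request is one JSON object on a single line, with no literal linefeeds anywhere inside it. Newlines inside string values must be escaped as \n, and any linefeeds between JSON tokens (pretty-printing) must be stripped.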

A JSON library handles both for you; you don't need to edit the prompt text by hand.
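As a minimal sketch for the prompt in the question: build the request as an ordinary Python dict, leave the newlines in the prompt alone, and let json.dumps() do the escaping. The transcript text and custom_id below are placeholders, and the model name is only an example.

```python
import json

# Placeholder transcript; in practice this is your real multi-line text.
transcript_text = "00:00 Alice: Welcome.\n00:05 Bob: First item is the budget."

# The prompt keeps its paragraph structure; no manual replacing of newlines.
prompt = f"""
Given the following transcript:

{transcript_text}

Write an outline summarizing the content.
"""

request = {
    "custom_id": "request-1",  # hypothetical ID
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-3.5-turbo-0125",  # example model name
        "messages": [{"role": "user", "content": prompt}],
    },
}

# json.dumps() escapes each newline as the two characters \ and n,
# so the serialized request is a single physical line.
line = json.dumps(request)

# Parsing the line gives back the original prompt, newlines included.
restored = json.loads(line)["body"]["messages"][0]["content"]
```

Replacing newlines with spaces would also give valid JSONL, but it changes the text the model sees. Escaping preserves the prompt exactly.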

Starting with a typical dictionary that you’d pass as **kwargs to the SDK, here’s some bot talk:


Explanation of Questions

  1. How to handle multi-line JSON?
    Multi-line strings and pretty-formatted JSON with linefeeds need to be normalized for JSONL format. JSONL requires each JSON object to be a single line, so we need to strip out extra whitespace and ensure all newline characters (\n) in strings are properly escaped.

  2. Does Python need a specific JSONL library?
    No, Python’s json module is sufficient to create JSONL files. The key is to:

    • Use json.dumps() to serialize each dictionary to a JSON string.
    • Write these strings line by line to the file.
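If the starting point is pretty-printed JSON text rather than a Python dict, one way to normalize it is a parse-and-reserialize round trip. This is only a sketch; the JSON content below is a made-up example.

```python
import json

# Pretty-printed JSON, for example from a hand-edited file (hypothetical content).
# The \\n here is a JSON escape sequence inside the string value.
pretty = """{
    "custom_id": "request-1",
    "body": {
        "messages": [
            {"role": "user", "content": "Hello,\\nworld!"}
        ]
    }
}"""

# Parse to a Python object, then serialize again without indent.
# The layout linefeeds disappear; the escaped newline in the value stays escaped.
obj = json.loads(pretty)
one_line = json.dumps(obj)
```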

Example: Converting Multi-line JSON into JSONL

Here’s how to achieve this in Python:

Step 1: Create Input Data

Start with two dictionaries representing API parameters. Assume that the user messages contain multi-line strings.

# Input dictionaries with multi-line user content
request_1 = {
    "custom_id": "request-1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-3.5-turbo-0125",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello,\nworld!\nHow are you today?"}
        ],
        "max_tokens": 1000
    }
}

request_2 = {
    "custom_id": "request-2",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-3.5-turbo-0125",
        "messages": [
            {"role": "system", "content": "You are an unhelpful assistant."},
            {"role": "user", "content": "Goodbye,\ncruel\nworld!"}
        ],
        "max_tokens": 1000
    }
}

# Place the requests in a list
requests = [request_1, request_2]

Step 2: Generate JSONL File

To create a JSONL file:

  • Serialize each dictionary with json.dumps().
  • Let json.dumps() escape newlines within strings (this happens automatically).
  • Write each serialized object as a line in the file.

import json

# File to write to
jsonl_file_path = "batch_requests.jsonl"

# Open the file for writing
with open(jsonl_file_path, "w", encoding="utf-8") as jsonl_file:
    for request in requests:
        # Convert dictionary to JSON string
        json_line = json.dumps(request, separators=(',', ':'))
        
        # Write JSON string as a line in the file
        jsonl_file.write(json_line + '\n')

print(f"JSONL file created at {jsonl_file_path}")

Output JSONL File

The contents of batch_requests.jsonl will be:

{"custom_id":"request-1","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-3.5-turbo-0125","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Hello,\nworld!\nHow are you today?"}],"max_tokens":1000}}
{"custom_id":"request-2","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-3.5-turbo-0125","messages":[{"role":"system","content":"You are an unhelpful assistant."},{"role":"user","content":"Goodbye,\ncruel\nworld!"}],"max_tokens":1000}}

Key Notes

  1. Escaping Newlines:
    Python’s json.dumps() automatically escapes newline characters (\n) inside string values, ensuring that the JSONL file remains valid.

  2. Compact Formatting:
    The separators=(',', ':') argument in json.dumps() removes the spaces after commas and colons, making each line more compact. It is optional: json.dumps() without indent already produces single-line output, which is all that JSONL requires.

  3. File Handling:
    Always open files in text mode with UTF-8 encoding to handle special characters properly.
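The notes above can be checked with a short comparison of json.dumps() options. This is an illustrative sketch; the record is made up.

```python
import json

# "Caf\u00e9" is "Café"; the escape keeps this source file pure ASCII.
record = {"role": "user", "content": "Caf\u00e9\nmenu"}

# Default: single line, spaces after ',' and ':', non-ASCII escaped as \uXXXX.
default_line = json.dumps(record)

# Compact: same content, no spaces after separators.
compact_line = json.dumps(record, separators=(",", ":"))

# ensure_ascii=False keeps non-ASCII characters as-is, so the file
# must then be written with a UTF-8 encoding.
utf8_line = json.dumps(record, ensure_ascii=False)

# indent=... inserts real linefeeds and is NOT suitable for JSONL.
pretty_text = json.dumps(record, indent=2)
```

Only the indent variant contains literal linefeeds. The other three are all valid JSONL lines that decode to the same record.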
