GPT3.5 JSON output format

Hello Community,
I have a prompt and I want it to always respond with JSON, but with the gpt-3.5 model it outputs some extra words in addition to the JSON object.
Is there a reliable way to force GPT models to output only a JSON object?
Note that the prompt works fine with GPT-4, but with GPT-3.5 it sometimes causes problems.

Consider including an example of the desired output format in your instructions and explicitly direct the model to always use that format; this often improves results. If you still run into issues, adding a response-validation step that checks the output against your criteria can improve reliability further.
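A minimal sketch of that first suggestion, assuming Python; the prompt text and field names are made up for illustration:

# Hypothetical system prompt that embeds an example of the expected output,
# giving the model a concrete format to imitate.
SYSTEM_PROMPT = (
    "Extract the requested information and reply with JSON only.\n"
    "Use exactly this format, with no text before or after it:\n"
    '{"summary": "<short summary>", "keywords": ["<keyword>", "..."]}'
)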

This is supposed to guarantee the JSON format response: https://platform.openai.com/docs/api-reference/chat/create#chat-create-response_format

This is posted at that link:
Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length" , which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.

Yep, that’s also needed to be sure what you get back is valid JSON; never trust external sources in your code.
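A minimal sketch of that validation step; the function name is made up for illustration:

import json

def parse_model_json(text):
    # Treat the model like any external service: parse defensively rather than
    # assuming the string is valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"Model did not return valid JSON: {err}") from err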

I tried this but it is not working; I think they have removed this option.

It should be available for the “turbo” models, but I haven’t tried it.

Put in your system prompt something like: “You will reply only with the JSON itself, and no other descriptive or explanatory text”, and also set the JSON format option on the API request object. That has never failed for me, although I’m always using the GPT-4 models and never the GPT-3.5 ones.
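For reference, a minimal sketch of that combination using the current openai Python SDK; the model name is only an example of one that supports JSON mode, and the user message is a placeholder:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",  # any model that supports response_format
    response_format={"type": "json_object"},
    messages=[
        {"role": "system",
         "content": "You will reply only with the JSON itself, and no other descriptive or explanatory text."},
        {"role": "user", "content": "Summarize the following text as JSON: ..."},
    ],
)
print(response.choices[0].message.content)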

What is the solution to tackle this issue, @darcschnider, so that we can get all the data in JSON without the response getting cut off?

Understanding the complexity of your query is essential.

It entails evaluating the scale of the data retrieval and determining the maximum token lengths for input and output. This evaluation is crucial for estimating the data volume per transaction.

Next, a chunking mechanism needs to be devised so that data retrieval stays within these constraints: it compares the size of each data pull against the maximum allowable limit and divides it into appropriately sized chunks. Make sure each chunk ends at the conclusion of a coherent dataset to maintain data integrity. (In some cases you may want overlap to preserve context.)

When dealing with JSON data, split at the End of Transaction (EOT) markers (depending on your structure, you can also give the AI a barebones outline of the message structure so it understands it better). This ensures that each chunk represents a complete dataset, which keeps processing straightforward. After each chunking operation, reassess the data size. If it is still too large, you can refine it iteratively, for example by selectively removing words or using more compact data structures, or you can reprocess the combined results again to narrow down the summary. NLP techniques or vector representations can also be used to reduce data size along the way.
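To illustrate the EOT idea, a tiny sketch that assumes the transactions arrive as raw text with an explicit marker between them ("<EOT>" is just an assumed delimiter, not something from the API):

def split_on_eot(raw_text, marker="<EOT>"):
    # Split on the end-of-transaction marker so every piece is a complete record;
    # size-based chunking can then be applied on top of these pieces.
    return [part.strip() for part in raw_text.split(marker) if part.strip()]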

Furthermore, parallel AI processing via asynchronous calls is an effective way to speed things up. With asynchronous processing, each chunk can be handled concurrently, maximizing throughput within the limits of the AI model, whether per transaction, per hour, or otherwise. This is what keeps large data responses fast: processing all chunks at the same time minimizes the total response time.
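A minimal sketch of that concurrent pattern with the async OpenAI client; the model name and prompts are placeholders, and real code would also need rate-limit handling:

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def process_chunk(chunk_text):
    # One request per chunk; JSON mode keeps each reply a single JSON object.
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": "Reply only with a JSON object."},
            {"role": "user", "content": chunk_text},
        ],
    )
    return response.choices[0].message.content

async def process_all(chunks):
    # All chunk requests run concurrently, so total latency is roughly that of the slowest call.
    return await asyncio.gather(*(process_chunk(c) for c in chunks))

# results = asyncio.run(process_all(chunks))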

Hope that gives you one way to do it. I’ve been playing with so many variations of handling data, and as I learn and research along the way, new approaches keep coming up. I love to optimize things when I can. If there is a better way, I’m all ears :slight_smile: @kzainul22


This is a really useful response. Can you possibly provide an example of how you’ve done this?

I’m trying to do exactly this but I don’t even know where to look (a link to the right reference would also be appreciated).

Thanks :slight_smile:

Hey Nick,

When dealing with JSON data from a retrieval-augmented generation (RAG) system, the key is ensuring you split the data at logical boundaries, such as at closing brackets or specific transaction markers, to maintain the integrity of the JSON structure.

Here’s how I would approach it:

  1. Identify Logical Break Points: You want to split at natural boundaries like closing curly braces } or square brackets ]. These are safe points where the data structure concludes without breaking the validity of the JSON format.
  2. Buffering Data: Ensure you have enough content in each chunk to avoid breaking the JSON structure. This involves reading enough of the data before splitting to ensure validity.
  3. Maintaining Context: Overlapping chunks or passing metadata with each chunk might be necessary so the system keeps context across multiple chunks.
  4. Validation: After splitting, it’s important to validate the JSON structure in each chunk to prevent issues when reassembling the data.

Example with Size Limits

In this example, we’ll handle a situation where a chunk reaches the size limit, demonstrating how the data is split and what happens when limits are reached. This version assumes the JSON is in the form of a list of objects:

import json

def split_json(data, max_size):
    chunks = []
    current_chunk = []
    current_size = 0

    for item in data:
        # Calculate the size of the next item as a string
        item_size = len(json.dumps(item))  # Estimate the size of the current item
        
        # Check if adding the item would exceed the max_size
        if current_size + item_size <= max_size:
            current_chunk.append(item)
            current_size += item_size
        else:
            # If the chunk size limit is reached, save the current chunk and reset
            print(f"Chunk reached limit: {current_size} bytes")
            chunks.append(current_chunk)  # Save the current chunk
            
            # Start a new chunk with the current item
            current_chunk = [item]
            current_size = item_size

    # Add any remaining items as the last chunk
    if current_chunk:
        print(f"Final chunk size: {current_size} bytes")
        chunks.append(current_chunk)

    return chunks

# Example usage:
json_data = [
    {"id": 1, "name": "John"}, 
    {"id": 2, "name": "Jane"},
    {"id": 3, "name": "Doe"}, 
    {"id": 4, "name": "Alice"},
    {"id": 5, "name": "Bob"}
]

# Setting a low max size for demonstration purposes (in bytes)
max_chunk_size = 50

# Splice the JSON into chunks that respect the size limit
json_chunks = split_json(json_data, max_chunk_size)

# Display chunks and their sizes
for i, chunk in enumerate(json_chunks):
    chunk_str = json.dumps(chunk, indent=2)
    print(f"Chunk {i + 1} (size: {len(chunk_str)} bytes):\n{chunk_str}\n")

Explanation:

  1. Logic: The function splits the JSON data into chunks by estimating the size of each item (in bytes). If adding the next item would exceed the specified max_chunk_size, the current chunk is saved, and a new chunk is started.
  2. Handling Limits: The function checks whether adding another item would exceed the size limit and prints a message when the limit is reached. It then processes the next chunk.
  3. Final Chunk: Any remaining items that don’t exceed the size limit are added to the final chunk.

Output Example:

Chunk reached limit: 50 bytes
Chunk reached limit: 50 bytes
Final chunk size: 24 bytes
Chunk 1 (size: 84 bytes):
[
  {
    "id": 1,
    "name": "John"
  },
  {
    "id": 2,
    "name": "Jane"
  }
]

Chunk 2 (size: 84 bytes):
[
  {
    "id": 3,
    "name": "Doe"
  },
  {
    "id": 4,
    "name": "Alice"
  }
]

Chunk 3 (size: 42 bytes):
[
  {
    "id": 5,
    "name": "Bob"
  }
]

Considerations:

  • The system effectively splits the data while respecting size constraints, ensuring each chunk is a valid JSON structure.
  • This method maintains integrity and context within each chunk, ensuring no partial or invalid JSON data is produced.
  • You can customize this method to fit your needs, such as adapting it for more complex JSON structures or adding metadata to help reassemble the chunks later.

This approach should help you handle JSON in your RAG system effectively while ensuring that the data remains intact and within the required size limits. Let me know how it works out!
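On the last consideration above (adding metadata so the chunks can be reassembled later), a small sketch of what that wrapper might look like; the field names are just illustrative:

def wrap_chunks_with_metadata(chunks, source_id):
    # Attach ordering information so the pieces can be stitched back together later.
    return [
        {"source": source_id, "chunk_index": i, "chunk_count": len(chunks), "data": chunk}
        for i, chunk in enumerate(chunks)
    ]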

This does not show how I tie it into the AI processing or how I handle the summaries, validators, etc.; that is a whole other part, but it will lead you in the right direction. This was a simple, GPT-generated example. Also, GPT-4o mini is what you want to use now, as GPT-3.5 is discontinued, and the new models have 128k context, so you can make your chunks a lot larger and do fewer calls.

PS: this is not exactly how I do it, but it will get you started. Also, don’t be afraid to ask ChatGPT 4 or 4o about these matters; it can help you flesh out a better method based on your own structures, so you don’t need to figure out the whole plan yourself if you have complex systems. GPT-4o is a great coder compared to 2021 lol.


That is incredibly helpful. Many thanks.

This is what I’ve done to implement this in a crude way based upon lines of text rather than chunk sizes (helpful for others in the same position) and it’s working.

import json
from openai import OpenAI

client = OpenAI()  # client setup assumed elsewhere; requires OPENAI_API_KEY in the environment

def getConceptsFromFile(transcript):
    
    #split the transcript into chunks of chunk_size lines (5 lines per chunk here).
    lines=transcript.splitlines()

    chunk_size = 5

    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    
    combined_data = []

    #sends each chunk of the textfile to gpt to get concept:sentence json, combines responses, and outputs
    for chunk_set in chunks:
        chunk = " ".join(chunk_set)
        print("Sending chunk: " + chunk)
        json_data = get_JSON_from_string(chunk)
        data = json.loads(json_data)  # loads(), since get_JSON_from_string returns a string
        # If the data is a list, extend the combined_data list
        if isinstance(data, list):
            combined_data.extend(data)
        # If the data is a dictionary, append it as an element of the combined_data list
        elif isinstance(data, dict):
            combined_data.append(data)

    with open('combined_output.json', 'w') as output_file:
        json.dump(combined_data, output_file, indent=4)

    return combined_data

#note that we need to chunk our file
def get_JSON_from_string(chunk):

    response = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        response_format={"type": "json_object"},
        temperature=0,
        messages=[{"role": "system", "content": "You are helping me to find all of the concepts in a transcript."},
            {"role": "user", "content": "This is a transcript from a design session. Please read it then read it a second time and a third time."},
            {"role": "user", "content": chunk},
            {"role": "user", "content": 'My query.'}],
        seed=1335,
        top_p=0.0001
    )

    message = response.choices[0].message.content

    print(message)

    return message