Assistant Response is sometimes an invalid JSON Array

I’m using the Assistants v2 API to extract structured data from free-form text input.

I’ve created an assistant with the necessary instructions. I use a GPT-4 model with temperature = 0.001 and top_p = 1. I’ve configured the output format to be JSON, and the instructions also specify that I want JSON output.

Normally it works fine.

But sometimes I see invalid JSON output that looks like this:

[
{ obj1 },
…
{ obj2 }
]

Notice the “…” after the comma?

It’s not easily reproducible: if I run the same input again in a new thread, the output is usually valid.
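Since a rerun usually gives valid output, one pragmatic workaround is to validate the reply and retry when it fails to parse. Here is a minimal sketch of that idea. The `fetch` callable is hypothetical: it stands in for whatever code creates a new thread, runs the assistant, and returns the reply text, and it is not part of any SDK.

```python
import json
from typing import Any, Callable


def get_valid_json(fetch: Callable[[], str], max_attempts: int = 3) -> Any:
    """Call fetch() until its result parses as JSON, up to max_attempts times.

    fetch is a hypothetical stand-in for "run the assistant in a new thread
    and return the reply text".
    """
    last_error = None
    for _ in range(max_attempts):
        raw = fetch()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err  # remember the failure and try again
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error
```

This only hides the problem, of course. It is still worth logging the raw reply from each failed attempt so you can see how often it happens.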

Interesting output. I always add a JSON schema to the instructions, plus output examples with several objects in the array. That works 100% of the time for me.

The instructions include “If the input contains information about multiple events, aggregate all the event JSON objects into an array of JSON objects. This array should be the final output.” and it works correctly over 99% of the time.

I also just saw a JSON response like this with backticks (note the " ```json "):

[
  {
    "property": value
  }
]

The backticks did not appear correctly in my previous post. Here it is again:

```json
[
  {
    "property1": value1,
    "property2": "value2"
  }
]
```
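If the markdown wrapper shows up only occasionally, it can also be tolerated on the client side. The following is a rough sketch, assuming the reply is either bare JSON or JSON wrapped in a single markdown code fence. The helper name `parse_model_json` is made up for illustration.

```python
import json
import re
from typing import Any

# Matches a reply that is wrapped in one markdown code fence, with an
# optional language tag such as "json" after the opening backticks.
_FENCE = re.compile(r"^\s*`{3}[a-zA-Z]*\s*\n(.*?)\n?\s*`{3}\s*$", re.DOTALL)


def parse_model_json(raw: str) -> Any:
    """Strip one surrounding code fence, if present, then parse the JSON."""
    match = _FENCE.match(raw)
    if match:
        raw = match.group(1)
    return json.loads(raw)  # still raises json.JSONDecodeError on bad JSON
```

Anything that is still not valid JSON after the fence is removed, such as the elided array above, raises `json.JSONDecodeError` as before.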

I would just keep working on your prompt.

Here is an example of a schema I add at the end of my prompt, after the examples.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "reviews": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "review_id": {
            "type": "string",
            "description": "Unique identifier for the review"
          },
          "text": {
            "type": "string",
            "description": "The review text"
          },
          "review_rating": {
            "type": "integer",
            "description": "Overall rating given by the reviewer"
          },
          "sentiment": {
            "type": "integer",
            "description": "Sentiment score of the review"
          },
          "dishes": {
            "type": "object",
            "description": "Specific dishes mentioned in the review along with their ratings",
            "properties": {
              "wine pairing events": {
                "type": "integer"
              },
              "selection": {
                "type": "integer"
              },
              "quick bite": {
                "type": "integer"
              },
              "Spicy Pilgrim": {
                "type": "integer"
              }
            }
          },
          "value": {
            "type": "integer",
            "description": "Rating for the value of the experience"
          },
          "food_quality": {
            "type": "integer",
            "description": "Rating for the quality of food"
          },
          "food_taste": {
            "type": "integer",
            "description": "Rating for the taste of food"
          },
          "atmosphere": {
            "type": "integer",
            "description": "Rating for the atmosphere"
          },
          "service": {
            "type": "integer",
            "description": "Rating for the service"
          },
          "journey": {
            "type": "object",
            "description": "Ratings for different stages of the dining journey",
            "properties": {
              "Booking": {
                "type": "integer"
              },
              "Arrival": {
                "type": "integer"
              },
              "Ordering": {
                "type": "integer"
              },
              "Dining": {
                "type": "integer"
              },
              "Payment": {
                "type": "integer"
              },
              "PostVisit": {
                "type": "integer"
              }
            }
          }
        },
        "required": ["review_id", "text", "review_rating", "sentiment", "dishes", "value", "food_quality", "food_taste", "atmosphere", "service", "journey"]
      }
    }
  },
  "required": ["reviews"]
}
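A schema in the prompt only guides the model, so it can help to check the parsed reply against the same shape before using it. Below is a minimal standard-library sketch that checks only the top-level `reviews` array and the `required` keys from the schema above. The function name `check_reviews` is made up for illustration, and a full validator such as the third-party `jsonschema` package would cover types and nested objects as well.

```python
from typing import Any, List

# The "required" list copied from the schema above.
REQUIRED_KEYS = [
    "review_id", "text", "review_rating", "sentiment", "dishes", "value",
    "food_quality", "food_taste", "atmosphere", "service", "journey",
]


def check_reviews(payload: Any, required_keys: List[str] = REQUIRED_KEYS) -> List[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    if not isinstance(payload, dict) or not isinstance(payload.get("reviews"), list):
        return ['top level must be an object with a "reviews" array']
    problems = []
    for i, review in enumerate(payload["reviews"]):
        if not isinstance(review, dict):
            problems.append(f"reviews[{i}] is not an object")
            continue
        for key in required_keys:
            if key not in review:
                problems.append(f"reviews[{i}] is missing {key!r}")
    return problems
```

A non-empty result can be treated the same way as a parse failure, for example by rerunning the request.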

I think you should reverse the sampling parameters: constrain top_p rather than temperature. Just top_p: 0.0001 is needed.
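To show why a tiny top_p is enough, here is a toy sketch of top-p (nucleus) filtering. It illustrates the general technique, not the API's actual implementation, and the token probabilities are made up.

```python
from typing import Dict, List


def nucleus(probs: Dict[str, float], top_p: float) -> List[str]:
    """Return the tokens kept by top-p filtering, most likely first.

    Tokens are added in order of decreasing probability until their
    cumulative probability reaches top_p; sampling then happens only
    among the kept tokens.
    """
    kept: List[str] = []
    cumulative = 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


# Made-up next-token probabilities after the comma that ends the first object.
toy = {"{": 0.90, "...": 0.07, "[": 0.03}

print(nucleus(toy, 0.0001))  # ['{']
print(nucleus(toy, 1.0))     # ['{', '...', '[']
```

With top_p = 1 every token stays in the pool, so an unlikely "..." can still be drawn now and then even at a low temperature. With top_p = 0.0001 only the single most likely token survives the filter.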

Elision within the output suggests the AI assumes it is producing examples for a chat rather than production data.

The correct mindset:

“You are an automated backend processor assistant, where assistant writes a response sent directly to a validating RESTful API that only accepts JSON. Chat and markdown formatting is prohibited, and only the keys and data types specified in this description (or your schema) will be accepted. Any deviation from specification will produce an error…”