Enum restrictions in strict JSON schemas for responses API

I’m getting an error when using the responses API:

Error: 400 Invalid schema for response_format ‘navu-question-answer-v1’: In context=(‘properties’, ‘topic’), " is not allowed in string literals for structured outputs (strict=true).

In this case, I’m requesting a JSON response and providing a schema. One of the fields in the response is a string, and the schema for that field specifies the allowed values via an “enum” specification. One of those enum values contains a double-quote character (“).

I cannot find any documentation about what restrictions are imposed on enum fields in a JSON response. This one seems like a bad one to try to guess about. Which punctuation is acceptable?

I can guess at the reason quotes are prohibited in enum strings:

The context-free grammar that enforces output by masking logits likely has no state to track whether an escape character was used inside an enum value. It only sees tokens drawn from the reduced set of logits that are currently allowed.

In an ordinary string (which is freeform and lets the AI go all wrong), a quote character produced by the AI simply closes the string, and enforcement returns to the grammar of JSON.

An enum, however, is enforced as a strict token run, and things start to get hairy when you allow a quote that may or may not be a closing quote, depending on which token the AI predicts.

Observe the varying joining of byte pairings:

or then quotes with AI-produced escapes even:

In the last case, you’ve got a JSON string that should have been closed by a logit that was both the string content and the container.
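To make the escaping problem concrete, here is a minimal Python sketch. It is my own illustration of the ambiguity, not a description of how the API enforces output. The value `say "hi"` is a made-up example. It shows how a value containing a double quote must be serialized, and where a reader with no memory of the preceding backslash would wrongly end the string.

```python
import json

# Hypothetical enum value containing double quotes.
value = 'say "hi"'

# Serialize it the way it has to appear inside a JSON document.
encoded = json.dumps(value)
print(encoded)  # "say \"hi\""

# A naive reader that treats the first quote after the opening one as the
# closing delimiter stops early, because it does not track the backslash.
naive_end = encoded.index('"', 1)
naive_content = encoded[1:naive_end]
print(repr(naive_content))  # 'say \\'

# A real JSON parser tracks the escape and recovers the full value.
assert json.loads(encoded) == value
```

The same quote character is string content in one position and the closing delimiter in another, which is the state an enforcement layer would have to track.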


Yes. I can’t complain about restrictions with the enums. I just wish they were documented. Based on your answer, I guess it makes sense that double-quote is special because the value is surrounded by double-quotes in the JSON itself. Hopefully the restrictions they’ve imposed at the schema level are no more aggressive than that. (In my case, the enumerated values are provided by my customer rather than in my code. So if I need to impose equivalent restrictions, I’d like to know what they all are.)

The AI must also be able to make sense of the choices it has to select from, and each choice needs enough “writeability” and probability to be as likely as the others when it is the correct one.

One can immediately imagine an application where desired enums can’t be used:

“You are an AI that reports on string characters that can’t be used as key values in an OpenAI structured JSON schema”;
enum = ["\x00", "\x0a", '"', '\"'] # python

What, you don’t want to send a bunch of unicode code points and see what fails as an enum yourself? Okay, I’ll do a few hundred.

Send a schema

Vary the enum.

    # Construct the json_schema with the current test_value as the enum
    json_schema = {
        "name": "ascii_test",
        "description": "A basic structured output response schema for an ASCII test with a fixed value",
        "strict": True,
        "schema": {
            "type": "object",
            "required": [
                "key"
            ],
            "properties": {
                "key": {
                    "type": "string",
                    "description": "The enum value under test which constrains the AI to one possible response",
                    "enum": [
                        test_value
                    ]
                }
            },
            "additionalProperties": False
        }
    }
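A schema like this can be wrapped in a helper and looped over candidate characters, roughly as sketched below. This is a sketch under assumptions, not the script actually used: `build_schema`, `probe`, and `fake_send` are hypothetical names, and `fake_send` is a stand-in that only mimics the double-quote rejection from the error message. A real `send` would submit the schema to the API and return False when it gets a 400 "Invalid schema" error.

```python
from typing import Callable, Iterable, List


def build_schema(test_value: str) -> dict:
    """Build a strict schema whose only enum entry is test_value."""
    return {
        "name": "ascii_test",
        "strict": True,
        "schema": {
            "type": "object",
            "required": ["key"],
            "properties": {
                "key": {"type": "string", "enum": [test_value]},
            },
            "additionalProperties": False,
        },
    }


def probe(values: Iterable[str], send: Callable[[dict], bool]) -> List[dict]:
    """Try each single character as the sole enum entry and record the outcome."""
    results = []
    for value in values:
        results.append({
            "character": repr(value),
            "byte_value": ord(value),
            "supported": send(build_schema(value)),
        })
    return results


def fake_send(json_schema: dict) -> bool:
    """Stand-in for the real API call (assumption): rejects double quotes only."""
    enum_value = json_schema["schema"]["properties"]["key"]["enum"][0]
    return '"' not in enum_value


report = probe((chr(i) for i in range(32, 36)), fake_send)
for row in report:
    print(row)
```

Passing the sender in as a parameter keeps the loop testable offline; swap `fake_send` for a function that makes the real request when you want actual results.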

Iterate over all the single-byte Unicode code points (double quotes skipped):


  {
    "character": "' '",
    "byte_value": 32,
    "supported": true
  },
  {
    "character": "'!'",
    "byte_value": 33,
    "supported": true
  },
  {
    "character": "'\"'",
    "byte_value": 34,
    "supported": false
  },
  {
    "character": "'#'",
    "byte_value": 35,
    "supported": true
  },

Confirm there is no issue with sending the characters for bytes 128-255, whether as latin1 or cp1252 representations:


[
  {
    "character": "'}'",
    "byte_value": 125,
    "supported": true
  },
  {
    "character": "'~'",
    "byte_value": 126,
    "supported": true
  },
  {
    "character": "\\x7f",
    "byte_value": 127,
    "supported": true
  },
  {
    "character": "\\x80",
    "byte_value": 128,
    "supported": true
  },...(successes continue)

The only “gotcha” seems to be a quote of any kind, or an unescaped linefeed (“\x0a”).

Quotes of any kind in an enum: Ew! 🤢

Use a localisation layer if you need quotes on output but keep the enums themselves “symbolic”.
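A localisation layer of that kind could look something like the sketch below. The names `TOPIC_LABELS`, `schema_enum`, and `display_label`, and the example labels, are all made up for illustration. Only the symbolic keys go into the schema; the quoted text is looked up after the model answers.

```python
# Symbolic enum values (sent in the schema) mapped to display text (never sent).
TOPIC_LABELS = {
    "BILLING": 'Billing ("invoices")',
    "SIX_INCH_PIPE": '6" pipe fittings',
}


def schema_enum() -> list:
    """Return the quote-free symbols to place in the schema's enum list."""
    return sorted(TOPIC_LABELS)


def display_label(symbol: str) -> str:
    """Translate the symbol the model returned into the user-facing label."""
    return TOPIC_LABELS[symbol]


print(schema_enum())                   # ['BILLING', 'SIX_INCH_PIPE']
print(display_label("SIX_INCH_PIPE"))  # 6" pipe fittings
```

The trade-off is that the model sees only the symbol, so the symbols (or the field description) need to carry enough meaning for it to choose correctly.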

When you use the word “enum”, you are thinking about its counterpart in the programming world. Who would ever use a double-quote within an enum that they declare? In this case, though, we’re talking about a JSON schema. Don’t think about it from the perspective of programming enums but, rather, as specifying in the schema that a field is only allowed to take certain values. In my case, these values are something that my customer gets to configure. I don’t mind restricting what my customers are allowed to choose for those values, but without any documentation it was difficult to know how much I should restrict them. As it stands now, we are going to prohibit quotes within those values. If we needed to restrict them more aggressively, I wanted to know that.
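For customer-configured values, a pre-flight check along these lines could catch problems before the schema is ever sent. This is a sketch based only on what this thread observed, not on any official list: `SUSPECT_CHARS` and `find_rejected` are hypothetical names, and the single-quote entry in particular is an assumption taken from the "any quote" remark above rather than from a result shown here.

```python
from typing import Iterable, List, Tuple

# Characters this thread found, or suspects, to be rejected in strict enums.
# An assumption drawn from the experiments above, not a documented rule.
SUSPECT_CHARS = {
    '"': "double quote",
    "'": "single quote",
    "\n": "line feed",
}


def find_rejected(values: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (value, reason) pairs for values containing a suspect character."""
    problems = []
    for value in values:
        for char, name in SUSPECT_CHARS.items():
            if char in value:
                problems.append((value, name))
    return problems


candidates = ["pricing", 'the "big" plan', "line one\nline two"]
for value, reason in find_rejected(candidates):
    print(f"{value!r}: contains a {reason}")
```

Keeping the character set in one dictionary makes it easy to tighten or relax the rule if the restrictions are ever documented.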
