Output format from Tool randomly changing in C#

velocedge · May 12, 2024, 3:20pm

I’m trying to create questions and multiple choice answers from text content using the OpenAI API in C#. I’m using a Tool to format the output, but it keeps changing the JSON structure for the questions and answers from something like:

  "questions": [
    [
      "How does expressing gratitude and appreciation impact your well-being and life according to the content?",
      "It makes you feel more energetic and fuller of life",
      "It makes you more anxious",
      "It has no effect on your well-being",
      "It makes you feel tired and drained",
      "It makes you feel more energetic and fuller of life. Being grateful and expressing gratitude leads to better self-care and a sense of well-being."
    ]
  ]

To something like this:

 "questions": [
  {
   "question": "What is emphasized as a crucial factor in achieving elite athletic status according to the text?",
   "answers": [
    "Physical strength and agility",
    "Envy and jealousy",
    "Development of personal traits",
    "Comparing oneself to others"
   ]
  }
 ]

The tool definition is unchanged between runs and I'm using "QuestionGeneration" in my tool-choice:

                var tool = new List<Tool>
                {
                new Function(
                    "QuestionGeneration",
                    "Generate a question from text",
                     new JsonObject
                     {
                         ["type"] = "object",
                         ["properties"] = new JsonObject
                         {
                             ["questions"] = new JsonObject
                             {
                                 ["type"] = "array",
                                 ["description"] = "An array of each individual question",
                                 ["items"] = new JsonObject
                                 {
                                     ["question"] = new JsonObject
                                     {
                                         ["type"] = "string",
                                         ["description"] = "The question derived from the provided text"
                                     },
                                     ["answers"] = new JsonObject
                                     {
                                         ["type"] = "array",
                                         ["description"] = "An array of possible answers that could answer the question",
                                         ["items"] = new JsonObject
                                         {
                                             ["name"] = "answers",
                                             ["type"] = "string",
                                             ["description"] = "One of the possible answers to the question."
                                         }
                                     },
                                     ["correctAnswer"] = new JsonObject
                                     {
                                         ["type"] = "number",
                                         ["description"] = "the index number identifying which of the items in the answers array is correct"
                                     },
                                     ["question_type_int"] = new JsonObject
                                     {
                                     ["type"] = "string",
                                     ["description"] = "The type of question that was created.",
                                     ["enum"] = new JsonArray { "multiple choice", "true or false", "yes or no", "fill in the blank", "multiple selection", "numeric" }
                                     },
                                     ["feedback"] = new JsonObject
                                     {
                                         ["type"] = "string",
                                         ["description"] = "a statement telling why the answer was correct"
                                     },
                                     ["competency"] = new JsonObject
                                     {
                                         ["type"] = "string",
                                         ["description"] = "the category or summary of the type of question that was generated"
                                     }
                                 }
                             }
                         },
                         ["required"] = new JsonArray { "questions", "question", "answers", "correctAnswer", "feedback" }
                     })
                };

So, what is wrong with my tool definition and how can I fix it so it returns a consistent format?

_j · May 12, 2024, 5:19pm

Here’s what it looks like if you don’t make it so complicated and just construct the JSON directly, imagining how your ‘Function’ might work. The AI had to imagine it too: it took several goes, and I finally had to show it what a function specification actually looks like for inference.

toolspec.extend([{
        "type": "function",
        "function": {
            "name": "QuestionGeneration",
            "description": "Generate a question from text",
            "parameters": {
                "type": "object",
                "properties": {
                    "questions": {
                        "type": "array",
                        "description": "An array of each individual question",
                        "items": {
                            "question": {
                                "type": "string",
                                "description": "The question derived from the provided text"
                            },
                            "answers": {
                                "type": "array",
                                "description": "An array of possible answers that could answer the question",
                                "items": {
                                    "name": "answers",
                                    "type": "string",
                                    "description": "One of the possible answers to the question."
                                }
                            },
                            "correctAnswer": {
                                "type": "number",
                                "description": "the index number identifying which of the items in the answers array is correct"
                            },
                            "question_type_int": {
                                "type": "string",
                                "description": "The type of question that was created.",
                                "enum": ["multiple choice", "true or false", "yes or no", "fill in the blank", "multiple selection", "numeric"]
                            },
                            "feedback": {
                                "type": "string",
                                "description": "a statement telling why the answer was correct"
                            },
                            "competency": {
                                "type": "string",
                                "description": "the category or summary of the type of question that was generated"
                            }
                        }
                    }
                },
                "required": ["questions", "question", "answers", "correctAnswer", "feedback"]
            }
        }
    }]
)

I think you have a misunderstanding of what an array is. It is not a container for more properties. It just means the AI can make a list of things. This is the text the AI might be receiving as a tool specification:

# Tools

## functions

namespace functions {

// Generate a question from text
type QuestionGeneration = (_: {
// An array of each individual question
questions: any[],
}) => any;

} // namespace functions

## multi_tool_use

// This tool serves as a wrapper for utilizing multiple tools....(blabla)

The “object” JSON data type is what provides nesting. You should also make the names non-repeating and very descriptive. You cannot have an arbitrary-length tool call that grows; you must still tell the AI, within the schema specified, how to make long strings or singly nested lists that grow in length.
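
To make the nesting point concrete, here is a minimal Python sketch (Python only to match the `toolspec` snippets above) that checks a schema for the two problems in the original definition: `required` names that are not properties at the same level, and array `items` that lack a `type`. The function name `find_schema_problems` and the abbreviated `original` and `fixed` schemas are illustrative assumptions, not part of any OpenAI library, and the check covers only these two cases rather than the full JSON Schema specification.

```python
def find_schema_problems(schema, path="parameters"):
    """Return a list of readable problems found in a JSON-schema-like dict."""
    problems = []
    schema_type = schema.get("type")
    if schema_type is None:
        problems.append(f"{path}: missing 'type'")

    properties = schema.get("properties", {})
    # Every name in 'required' must be a property declared at this same level.
    for name in schema.get("required", []):
        if name not in properties:
            problems.append(f"{path}: required name '{name}' is not a property here")

    # Recurse into nested properties.
    for name, sub_schema in properties.items():
        problems.extend(find_schema_problems(sub_schema, f"{path}.{name}"))

    # An array must describe its elements with a schema under 'items'.
    if schema_type == "array":
        items = schema.get("items")
        if items is None:
            problems.append(f"{path}: array has no 'items'")
        else:
            problems.extend(find_schema_problems(items, f"{path}[]"))
    return problems


# Abbreviated form of the schema from the first post (hypothetical shortening).
original = {
    "type": "object",
    "properties": {
        "questions": {
            "type": "array",
            "items": {
                "question": {"type": "string"},
                "answers": {"type": "array", "items": {"type": "string"}},
            },
        }
    },
    "required": ["questions", "question", "answers"],
}

# Same content with each array element declared as an object with properties.
fixed = {
    "type": "object",
    "properties": {
        "questions": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "question": {"type": "string"},
                    "answers": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["question", "answers"],
            },
        }
    },
    "required": ["questions"],
}

for problem in find_schema_problems(original):
    print(problem)
```

Run against `original`, the sketch flags the two misplaced `required` names and the untyped `items`; `fixed` produces no findings. Whether the model copes well with the nested `answers` array is a separate question.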

velocedge · May 12, 2024, 6:57pm

I’m sure I have a misunderstanding about a lot of this, but I’m confused by your example. Everywhere that references the Tool definition (previously Function until that was deprecated… and worked perfectly well for my needs) it looks more like this:

{
  "name": "MyFunction",
  "description": "This is a sample function",
  "parameters": [
    {
      "name": "array",
      "type": "array",
      "description": "Two-dimensional array",
      "items": {
        "type": "array",
        "items": {
          "type": "number"
        }
      }
    }
  ],
  "output": {
    "type": "array",
    "items": {
      "type": "array",
      "items": {
        "type": "number"
      }
    }
  }
}

Obviously, this is for a two-dimensional array but they all have a similar format. So, I’m really confused by your example now.

Macha · May 12, 2024, 7:56pm

Hey there!

So, if you squint, you’ll actually recognize that this:

velocedge:

{
  "name": "MyFunction",
  "description": "This is a sample function",
  "parameters": [
    {
      "name": "array",
      "type": "array",
      "description": "Two-dimensional array",
      "items": {
        "type": "array",
        "items": {
          "type": "number"
        }
      }
    }
  ],
  "output": {
    "type": "array",
    "items": {
      "type": "array",
      "items": {
        "type": "number"
      }
    }
  }
}

is actually embedded inside the example code given to you.

This is what is called a JSON schema. It’s a format that you can use across all different kinds of things, like a set standard for what contents should be expected inside a JSON object.

You can see that _j’s example closely resembles what you’ve given by looking here:

_j:

            "name": "QuestionGeneration",
            "description": "Generate a question from text",
            "parameters": {
                "type": "object",
                "properties": {
                    "questions": {
                        "type": "array",
                        "description": "An array of each individual question",
                        "items": {
                            "question": {
                                "type": "string",
                                "description": "The question derived from the provided text"
                            },
                            "answers": {
                                "type": "array",
                                "description": "An array of possible answers that could answer the question",
                                "items": {
                                    "name": "answers",
                                    "type": "string",
                                    "description": "One of the possible answers to the question."
                                }
                            },
                            "correctAnswer": {
                                "type": "number",
                                "description": "the index number identifying which of the items in the answers array is correct"
                            },
                            "question_type_int": {
                                "type": "string",
                                "description": "The type of question that was created.",
                                "enum": ["multiple choice", "true or false", "yes or no", "fill in the blank", "multiple selection", "numeric"]
                            },
                            "feedback": {
                                "type": "string",
                                "description": "a statement telling why the answer was correct"
                            },
                            "competency": {
                                "type": "string",
                                "description": "the category or summary of the type of question that was generated"
                            }
                        }
                    }
                },
                "required": ["questions", "question", "answers", "correctAnswer", "feedback"]
            }

_j · May 12, 2024, 8:12pm

What I provided wasn’t a “correct” example; it was an attempt to produce the JSON that your code might generate.

Here is a 2D array, in that the second level has more than one entry, but they are named and not also of arbitrary length:

toolspec.extend([{
        "type": "function",
        "function": {
            "name": "produce_quiz",
            "description": "The AI generates a multiple choice quiz as a JSON array (list) of questions and answers",
            "parameters": {
                "type": "object",
                "properties": {
                    "quiz_items": {
                        "type": "array",
                        "description": "an array of questions, answers, and answer key in the strict format specified",
                        "items": {
                            "type": "object",
                            "properties": {
                                "question": {
                                    "type": "string",
                                    "description": "The question the quiz item is about",
                                },
                                "answers": {
                                    "type": "string",
                                    "description": "three possible answers, one randomly correct",
                                },
                                "key": {
                                    "type": "string",
                                    "description": "The correct answer",
                                },
                            },
                            "required": ["question", "answers", "key"]
                        },
                    },
                },
                "required": ["quiz_items"]
            },
        }
    }]
)

This is how it appears once it is placed into the AI’s language context, so the AI can understand what kind of JSON output to write. Unlike yours, it is not damaged:

## functions

namespace functions {

// The AI generates a multiple choice quiz as a JSON array (list) of questions and answers
type produce_quiz = (_: {
// an array of questions, answers, and answer key in the strict format specified
quiz_items: Array<
{
// The question the quiz item is about
question: string,
// three possible answers, one randomly correct
answers: string,
// The correct answer
key: string,
}
>,
}) => any;

} // namespace functions

And that’s the crux: how well the AI can understand, especially when the language model is predicting a token at a time from the entire context of input.

Add another array (such as a list of multiple choice answers at a length the AI can choose) at your own peril.
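
Even with a well-formed schema, it can help to check the returned arguments before using them, so that a shape change like the one in the first post fails loudly instead of silently. The following is a minimal sketch under stated assumptions: `parse_quiz_arguments` is a hypothetical helper, and `arguments_json` stands for the JSON string of arguments from the `produce_quiz` tool call.

```python
import json


def parse_quiz_arguments(arguments_json):
    """Parse tool-call arguments and verify each quiz item has the expected shape."""
    data = json.loads(arguments_json)
    if not isinstance(data, dict):
        raise ValueError("arguments are not a JSON object")

    items = data.get("quiz_items")
    if not isinstance(items, list):
        raise ValueError("'quiz_items' is missing or not a list")

    for index, item in enumerate(items):
        # Catches the list-of-lists shape shown in the first post.
        if not isinstance(item, dict):
            raise ValueError(
                f"item {index} is a {type(item).__name__}, expected an object"
            )
        for field in ("question", "answers", "key"):
            if not isinstance(item.get(field), str):
                raise ValueError(f"item {index}: '{field}' is missing or not a string")
    return items


# Made-up sample payloads for illustration only.
good = '{"quiz_items": [{"question": "Q?", "answers": "A; B; C", "key": "B"}]}'
bad = '{"quiz_items": [["Q?", "A", "B", "C"]]}'

print(len(parse_quiz_arguments(good)))
try:
    parse_quiz_arguments(bad)
except ValueError as error:
    print("rejected:", error)
```

A caller could retry the request when the check raises, rather than trying to handle every shape the model might produce.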

velocedge · May 13, 2024, 12:44pm

Thanks guys! I definitely need some time to soak all this in. Lots-o-meetings today but I’ll be on it when I can.

velocedge · May 14, 2024, 6:17pm

OK, that works pretty well… I still need to massage it a bit here and there, but it’s much better. Here’s what I ended up using:

string parm = @"
{
    ""type"": ""object"",
    ""properties"": {
        ""questions"": {
            ""type"": ""array"",
            ""description"": ""an array of questions, answers, feedback, and answer index in the strict format specified"",
            ""items"": {
                ""type"": ""object"",
                ""properties"": {
                    ""question"": {
                        ""type"": ""string"",
                        ""description"": ""The question the quiz item is about""
                    },
                    ""answers"": {
                        ""type"": ""string"",
                        ""description"": ""${qtext}""
                    },
                    ""correctAnswer"": {
                        ""type"": ""string"",
                        ""description"": ""The sequence number identifying which of the items in the answers is correct, start numbering with 0.""
                    },
                    ""feedback"": {
                        ""type"": ""string"",
                        ""description"": ""A description of why the correct answer is in fact correct.""
                    },
                    ""Competency"": {
                        ""type"": ""string"",
                        ""description"": ""The grouping or category that this question and its answers belong to""
                    }
                },
                ""required"": [""question"",""answers"",""correctAnswer"",""feedback""]
            }
        }
    },
    ""required"": [""questions""]
}
";

parm = parm.Replace("${qtext}", qtext);

var tool = new List<Tool>
{
    new OpenAI.Function("QuestionGeneration","The AI generates a multiple choice quiz as a JSON array (list) of questions and answers", parm)
};

The qtext variable has text based on the type of question being asked for: multiple choice, true/false, multiple-selection, etc.

Appreciate the help.
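
One caveat about the `${qtext}` substitution above: if `qtext` ever contains a double quote or a backslash, a plain string replace produces invalid JSON. A minimal sketch of the alternative, building the structure first and serializing it afterwards, is shown here in Python to match the earlier `toolspec` snippets. `build_parameters` and the sample `qtext` value are hypothetical, and the schema is abbreviated. The same idea applies to the `JsonObject` builder used in the first post.

```python
import json


def build_parameters(qtext):
    """Build the tool parameters as a dict so the serializer handles escaping."""
    return {
        "type": "object",
        "properties": {
            "questions": {
                "type": "array",
                "description": "an array of questions, answers, feedback, and answer index",
                "items": {
                    "type": "object",
                    "properties": {
                        "question": {
                            "type": "string",
                            "description": "The question the quiz item is about",
                        },
                        # qtext is inserted as data, never spliced into JSON text.
                        "answers": {"type": "string", "description": qtext},
                    },
                    "required": ["question", "answers"],
                },
            }
        },
        "required": ["questions"],
    }


# Sample text containing quotes, which would break a naive string replace.
qtext = 'Four possible answers separated by "|", exactly one correct'
parameters_json = json.dumps(build_parameters(qtext), indent=2)
print(parameters_json)
```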
