I am using the run_grader API function to test my graders. Let’s look at what we can do according to the documentation:
curl -X POST https://api.openai.com/v1/fine_tuning/alpha/graders/run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "grader": {
      "type": "score_model",
      "name": "Example score model grader",
      "input": [
        {
          "role": "user",
          "content": "Score how close the reference answer is to the model answer. Score 1.0 if they are the same and 0.0 if they are different. Return just a floating point score\n\nReference answer: {{item.reference_answer}}\n\nModel answer: {{sample.output_text}}"
        }
      ],
      "model": "gpt-4o-2024-08-06",
      "sampling_params": {
        "temperature": 1,
        "top_p": 1,
        "seed": 42
      }
    },
    "item": {
      "reference_answer": "fuzzy wuzzy was a bear"
    },
    "model_sample": "fuzzy wuzzy was a bear"
  }'
The documentation states:

model_sample
string
Required
The model sample to be evaluated. This value will be used to populate the sample namespace. See the guide for more details. The output_json variable will be populated if the model sample is a valid JSON string.
And indeed, I got that to work. When I supply a plain string, output_text is populated. When I supply valid JSON as a string, the output_json variable is populated in addition to output_text. So far, so good. But what string do I have to supply to model_sample so that output_tools will be populated? I naively tried supplying a stringified array of tool calls conforming to the Chat Completions API schema, but that didn't work.
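To make the behavior I am observing concrete, here is my mental model of how the sample namespace seems to be populated from model_sample. This is purely illustrative Python of my own, not actual OpenAI code, and it captures only the output_text/output_json behavior I have verified; output_tools is exactly the part I cannot reproduce:

```python
import json

# My mental model of how the "sample" namespace appears to be populated
# from model_sample (illustrative only -- not actual OpenAI code).
def build_sample_namespace(model_sample: str) -> dict:
    namespace = {"output_text": model_sample}  # always set to the raw string
    try:
        # If the string parses as JSON, output_json is set as well.
        namespace["output_json"] = json.loads(model_sample)
    except json.JSONDecodeError:
        pass  # plain strings only populate output_text
    # Open question: what input would populate "output_tools" here?
    return namespace

print(build_sample_namespace("fuzzy wuzzy was a bear"))
print(build_sample_namespace('{"answer": 42}'))
```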