I’m using a very large response model with many nullable fields, and I want the model to skip generating the fields that are null. I listed the mandatory fields under “required”, and this correctly skips the null fields when the function returns a single object. The problem appears when I try to return a list of those objects: the “required” setting is no longer respected, and every field is returned whether it is null or not. This is a problem not only because of the cost of generating all the unnecessary null fields, but also because of the time it takes to generate them.
With gpt-3.5-turbo-16k-0613, 90% of the time it explicitly returns all the empty fields (as empty strings or empty arrays), like so:
"{\n \"tickets\": [\n {\n \"ticketId\": \"1\",\n \"offenderName\": \"{redacted}\",\n \"lawEnforcementAgency\": \"Alameda County\",\n \"violationDate\": \"03032018\",\n \"docketNumber\": \"WWM0001451534\",\n \"primaryViolationDescription\": \"CVC 216555b I\",\n \"caseNumber\": \"\",\n \"violationIds\": [],\n \"totalFineAmount\": \"\",\n \"dueDate\": \"\",\n \"status\": \"\",\n \"disposition\": [],\n \"ocPayNumber\": \"\",\n \"pleaDate\": \"\",\n \"lawEnforcementOfficer\": \"\",\n \"retainedAttorney\": \"\",\n \"sentenceDate\": \"\",\n \"bail\": \"\",\n \"bonds\": \"\",\n \"caseReport\": \"\",\n \"nextCourtDate\": \"\",\n \"agency\": \"\",\n \"drNumber\": \"\",\n \"arrestDate\": \"\",\n \"charge\": \"\",\n \"custodyStatus\": \"\",\n \"citationFilingType\": \"\",\n \"citationFilingDate\": \"\",\n \"orderedBail\": \"\",\n \"postedBail\": \"\",\n \"nextAction\": \"\",\n \"warrantType\": \"\",\n \"probationType\": \"\",\n \"sentenceConvictedDate\": \"\",\n \"fineAndPenalty\": \"\",\n \"restitutionFine\": \"\",\n \"chargeSeverity\": \"\",\n \"chargeDescription\": \"\",\n \"probationStatus\": \"\",\n \"relatedCases\": \"\",\n \"otherCases\": [],\n \"actions\": [],\n \"fineInformation\": []\n },\n {\n \"ticketId\": \"2\",\n \"offenderName\": \"{redacted}\",\n \"lawEnforcementAgency\": \"Alameda County\",\n \"violationDate\": \"12152022\",\n \"docketNumber\": \"FHJ0002159428\",\n \"primaryViolationDescription\": \"CVC 22349a I\",\n \"caseNumber\": \"\",\n \"violationIds\": [],\n \"totalFineAmount\": \"\",\n \"dueDate\": \"\",\n \"status\": \"\",\n \"disposition\": [],\n \"ocPayNumber\": \"\",\n \"pleaDate\": \"\",\n \"lawEnforcementOfficer\": \"\",\n \"retainedAttorney\": \"\",\n \"sentenceDate\": \"\",\n \"bail\": \"\",\n \"bonds\": \"\",\n \"caseReport\": \"\",\n \"nextCourtDate\": \"\",\n \"agency\": \"\",\n \"drNumber\": \"\",\n \"arrestDate\": \"\",\n \"charge\": \"\",\n \"custodyStatus\": \"\",\n \"citationFilingType\": \"\",\n \"citationFilingDate\": \"\",\n 
\"orderedBail\": \"\",\n \"postedBail\": \"\",\n \"nextAction\": \"\",\n \"warrantType\": \"\",\n \"probationType\": \"\",\n \"sentenceConvictedDate\": \"\",\n \"fineAndPenalty\": \"\",\n \"restitutionFine\": \"\",\n \"chargeSeverity\": \"\",\n \"chargeDescription\": \"\",\n \"probationStatus\": \"\",\n \"relatedCases\": \"\",\n \"otherCases\": [],\n \"actions\": [],\n \"fineInformation\": []\n }\n ]\n}"
With GPT-4 I get what I want (only the non-null fields):
"{\n \"tickets\": [\n {\n \"offenderName\": \"{redacted}\",\n \"ticketId\": \"DN35168\",\n \"docketNumber\": \"WWM0001451534\",\n \"violationDate\": \"03032018\",\n \"violationIds\": [\"CVC 216555b\"],\n \"status\": \"I\"\n },\n {\n \"offenderName\": \"{redacted}\",\n \"ticketId\": \"JQ73467\",\n \"docketNumber\": \"FHJ0002159428\",\n \"violationDate\": \"12152022\",\n \"violationIds\": [\"CVC 22349a\"],\n \"status\": \"I\"\n }\n ]\n}"
Is there something I can change in my config so that gpt-3.5-turbo consistently returns only the non-null fields?
Here’s my config:
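import openai  # legacy (pre-1.0) SDK, which exposes openai.ChatCompletion

# htmlText (defined elsewhere) holds the text to extract tickets from.
messages = [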
    {
        "role": "system",
        "content": "You are a text extraction machine. You will be given a large body of text and I need you to parse the given text and extract the traffic ticket information. There may be one or many tickets. Each field is nullable. Do not return null fields/arguments.",
    },
    {
        "role": "user",
        "content": htmlText,
    },
]
response = openai.ChatCompletion.create(
    model="gpt-4-0613",  # with "gpt-3.5-turbo-16k-0613" the empty fields are returned as shown above
    messages=messages,
    functions=[
        {
            "name": "get_traffic_ticket_data",
            "description": "return list of traffic tickets",
            "parameters": {
                "properties": {
                    "tickets": {
                        "items": {
                            "properties": {
                                "ticketId": {"type": "string", "description": "ID of the ticket."},
                                "offenderName": {
                                    "anyOf": [{"type": "string"}, {"type": "null"}],
                                    "description": "Name of the offender.",
                                },
                                "lawEnforcementAgency": {"type": "string", "description": "Name of the law enforcement agency."},
                                "court": {"type": "object", "description": "Court information."},
                                "violationDate": {"type": "string", "description": "Date of the violation."},
                                "dueDate": {"type": "string", "description": "Due date for the ticket."},
                                "status": {"type": "string", "description": "Status of the ticket."},
                                "disposition": {"type": "array", "items": {"type": "string"}, "description": "array of dispositions."},
                                "docketNumber": {"type": "string", "description": "Docket number."},
                                "violationIds": {"type": "array", "items": {"type": "string"},
                                                 "description": "array of violation IDs."},
                                "totalFineAmount": {"type": "string", "description": "Total amount of the fine."},
                                "primaryViolationDescription": {"type": "string",
                                                                "description": "Description of the primary violation."},
                                "caseNumber": {"type": "string", "description": "Case number."},
                                "ocPayNumber": {"type": "string", "description": "OC Pay number."},
                                "pleaDate": {"type": "string", "description": "Date of the plea."},
                                "lawEnforcementOfficer": {"type": "string", "description": "Name of the law enforcement officer."},
                                "retainedAttorney": {"type": "string", "description": "Name of the retained attorney."},
                                "sentenceDate": {"type": "string", "description": "Date of the sentence."},
                                "bail": {"type": "string", "description": "Amount of bail."},
                                "bonds": {"type": "string", "description": "Details about bonds."},
                                "caseReport": {"type": "string", "description": "Case report details."},
                                "nextCourtDate": {"type": "string", "description": "Next date for court hearing."},
                                "agency": {"type": "string", "description": "Name of the agency."},
                                "drNumber": {"type": "string", "description": "DR Number."},
                                "arrestDate": {"type": "string", "description": "Date of arrest."},
                                "charge": {"type": "string", "description": "Details about the charge."},
                                "custodyStatus": {"type": "string", "description": "Status of custody."},
                                "citationFilingType": {"type": "string", "description": "Type of citation filing."},
                                "citationFilingDate": {"type": "string", "description": "Date of citation filing."},
                                "orderedBail": {"type": "string", "description": "Amount of ordered bail."},
                                "postedBail": {"type": "string", "description": "Amount of posted bail."},
                                "nextAction": {"type": "string", "description": "Details of the next action."},
                                "warrantType": {"type": "string", "description": "Type of warrant."},
                                "probationType": {"type": "string", "description": "Type of probation."},
                                "sentenceConvictedDate": {"type": "string", "description": "Date of sentence conviction."},
                                "fineAndPenalty": {"type": "string", "description": "Details about fine and penalty."},
                                "restitutionFine": {"type": "string", "description": "Amount of restitution fine."},
                                "chargeSeverity": {"type": "string", "description": "Severity of the charge."},
                                "chargeDescription": {"type": "string", "description": "Description of the charge."},
                                "probationStatus": {"type": "string", "description": "Status of probation."},
                                "relatedCases": {"type": "string", "description": "Details about related cases."},
                                "otherCases": {
                                    "type": "array",
                                    "items": {
                                        "type": "object",
                                        "properties": {
                                            "caseNumber": {"type": "string"},
                                            "filedDate": {"type": "string"},
                                            "charges": {"type": "string"},
                                            "nextHearing": {"type": "string"},
                                            "jurisdiction": {"type": "string"},
                                            "status": {"type": "string"}
                                        }
                                    },
                                    "description": "array of other related cases."
                                },
                                "actions": {
                                    "type": "array",
                                    "items": {
                                        "type": "object",
                                        "properties": {
                                            "actionDate": {"type": "string"},
                                            "actionText": {"type": "string"},
                                            "disposition": {"type": "string"},
                                            "hearingType": {"type": "string"}
                                        }
                                    },
                                    "description": "array of actions related to the ticket."
                                },
                                "fineInformation": {
                                    "type": "array",
                                    "items": {
                                        "type": "object",
                                        "properties": {
                                            "dateToPay": {"type": "string"},
                                            "firstPayment": {"type": "string"},
                                            "priorNSF": {"type": "string"},
                                            "paymentAmount": {"type": "string"},
                                            "lastPayment": {"type": "string"},
                                            "fineNumber": {"type": "string"},
                                            "fineType": {"type": "string"},
                                            "fineDescription": {"type": "string"},
                                            "originalAmount": {"type": "string"},
                                            "paidToDate": {"type": "string"},
                                            "currentDueTotal": {"type": "string"}
                                        }
                                    },
                                    "description": "array of fine information related to the ticket."
                                }
                            },
                            "description": "properties of a given traffic ticket. Only include non null fields.",
                            "title": "TicketItem",
                            "type": "object",
                            "required": []
                        },
                        "title": "Tickets",
                        "type": "array",
                        "required": []
                    }
                },
                "title": "TrafficTicketData",
                "type": "object",
                "required": []
            }
        }
    ],
    function_call={
        "name": "get_traffic_ticket_data",
    },
    stream=False
)
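
One workaround that does not depend on how the model behaves is to strip the empty values after the response comes back. The sketch below is only a minimal illustration under a few assumptions: `prune_empty` is a hypothetical helper (not part of the OpenAI SDK), and `raw_arguments` is a shortened stand-in for the function-call arguments string, which with the pre-1.0 SDK would be read from `response["choices"][0]["message"]["function_call"]["arguments"]`. It only cleans up the result; the model still generates the empty fields, so this does not reduce the cost or the generation time.

```python
import json

# Values treated as "empty" and dropped; 0 and False are deliberately kept.
EMPTY_VALUES = (None, "", [], {})


def prune_empty(value):
    """Recursively drop None, "", [] and {} from parsed JSON data.

    Hypothetical helper for illustration; not part of any library.
    """
    if isinstance(value, dict):
        pruned = {key: prune_empty(item) for key, item in value.items()}
        return {key: item for key, item in pruned.items() if item not in EMPTY_VALUES}
    if isinstance(value, list):
        pruned = [prune_empty(item) for item in value]
        return [item for item in pruned if item not in EMPTY_VALUES]
    return value


# Assumption: a shortened stand-in for the arguments string returned by the model.
raw_arguments = (
    '{"tickets": [{"ticketId": "1", "caseNumber": "", '
    '"violationIds": [], "otherCases": [{"status": ""}]}]}'
)

tickets = prune_empty(json.loads(raw_arguments))
print(json.dumps(tickets, indent=2))
```

Nested containers are pruned before their parent is checked, so an `otherCases` entry that contains only empty strings disappears together with the list that held it.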