BadRequestError: Invalid schema for function

Hi all,

I am trying to take a resume (as a text string) and turn it into structured data with the OpenAI API.
For example, I want an array entry for each work experience so I can use that for further processing.
I am combining a hard-coded prompt with the resume text and trying to use function calling to get a structured JSON response (see code below). However, I am getting a bad request error:

Error code: 400 - {'error': {'message': "Invalid schema for function 'extract_values_from_resume': In context=('properties', 'opleiding'), array schema missing items", 'type': 'invalid_request_error', 'param': None, 'code': None}}

Here is my code:

from openai import OpenAI
from dotenv import load_dotenv
import os

dotenv_path = '/config/settings/.env' 
load_dotenv(dotenv_path) 
api_key = os.getenv("OPENAI_API_KEY")
hardcoded_prompt = """
   Retrieve specified values from the source text. Indicate the absence of information with '#####'. Handle multiple data occurrences as arrays. Return answer as JSON object. Here is the source text:
   
   """

def prompt_open_ai(extracted_text):
    api_key = os.getenv("OPENAI_API_KEY")
    client = OpenAI(api_key=api_key)

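    # Field names are Dutch: voornaam = first name, achternaam = last name,
    # woonplaats = place of residence, profielomschrijving = profile summary,
    # motivatie = motivation, opleiding = education, certificering = certification,
    # werkervaring = work experience. Nested: naam = name, instituut = institution,
    # startjaar/eindjaar = start/end year, functietitel = job title,
    # bedrijf = company, plaats = location, functieomschrijving = job description,
    # 'heden' = 'present'.
    tools = [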
        {
            "type": "function",
            "function": {
                "name": "extract_values_from_resume",
                "description": "Retrieve specified values from the source curriculum vitae and export according to the JSON schema",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "voornaam": {
                            "type": "string",
                            "description": "voornaam"
                        },
                        "achternaam": {
                            "type": "string",
                            "description": "achternaam"
                        },
                        "woonplaats": {
                            "type": "string",
                            "description": "woonplaats"
                        },
                        "profielomschrijving": {
                            "type": "string",
                            "description": "Stuk tekst waarin de persoon zichzelf beschrijft aan het begin van het cv"
                        },
                        "motivatie": {
                            "type": "string",
                            "description": "motivatie voor een specifieke baan of opdracht"
                        },
                        "opleiding": {
                            "type": "array",
                            "patternProperties": {
                                "een gevolgde opleiding": {
                                    "items": {
                                        "type": "object",
                                        "properties": {
                                            "naam": {
                                                "type": "string",
                                                "description": "naam van gevolgde opleiding"
                                            },
                                            "instituut": {
                                                "type": "string",
                                                "description": "naam van de school waar de opleiding is gevolgd"
                                            },
                                            "startjaar": {
                                                "type": "integer",
                                                "description": "eerste jaar van gevolgde opleiding"
                                            },
                                            "eindjaar": {
                                                "type": "integer",
                                                "description": "laatste jaar van gevolgde opleiding"
                                            }
                                        },
                                        "required": ["naam", "instituut", "startjaar", "eindjaar"]
                                    }
                                }
                            }
                        },
                        "certificering": {
                            "type": "array",
                            "patternProperties": {
                                "Beschrijving van een behaald certificaat": {
                                    "items": {
                                        "type": "object",
                                        "properties": {
                                            "naam": {
                                                "type": "string",
                                                "description": "naam van certificaat"
                                            },
                                            "instituut": {
                                                "type": "string",
                                                "description": "naam van de instantie waar het certificaat is behaald"
                                            },
                                            "eindjaar": {
                                                "type": "integer",
                                                "description": "jaar waarin certificering is behaald"
                                            }
                                        },
                                        "required": ["naam", "instituut", "eindjaar"]
                                    }
                                }
                            }
                        },
                        "werkervaring": {
                            "type": "array",
                            "patternProperties": {
                                "Beschrijving van een specifieke werkervaring": {
                                    "items": {
                                        "type": "object",
                                        "properties": {
                                            "startjaar": {
                                                "type": "integer",
                                                "description": "startjaar van werkervaring"
                                            },
                                            "eindjaar": {
                                                "type": ["integer", "string"],
                                                "description": "eindjaar van werkervaring. Zet 'heden' neer als de persoon hier momenteel nog werkzaam is."
                                            },
                                            "functietitel": {
                                                "type": "string",
                                                "description": "functietitel van werkervaring"
                                            },
                                            "bedrijf": {
                                                "type": "string",
                                                "description": "naam van bedrijf of organisatie waar deze werkervaring is opgedaan"
                                            },
                                            "plaats": {
                                                "type": "string",
                                                "description": "locatie van bedrijf of organisatie waar deze werkervaring is opgedaan"
                                            },
                                            "functieomschrijving": {
                                                "type": "string",
                                                "description": "Omschrijving van werkervaring, taken, verantwoordelijkheden, resultaten en overige informatie van deze werkervaring"
                                            }
                                        },
                                        "required": ["startjaar", "eindjaar", "functietitel", "bedrijf", "plaats", "functieomschrijving"]
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    ]
    response = client.chat.completions.create(
        model="gpt-4-1106-preview",
        tools=tools,
        tool_choice={"type": "function", "function": {"name": "extract_values_from_resume"}},
        #temperature=2,
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": "You are a machine that extracts specific bits of text and exports the exact quotes in JSON format."},
            {"role": "user", "content": hardcoded_prompt + extracted_text},
        ],
    )

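    # With a forced tool call, message.content is usually None; the structured
    # output is the JSON string in the tool call's arguments.
    return response.choices[0].message.tool_calls[0].function.arguments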

Any help would be appreciated, thanks.

@IntelliJJ I’ll do my best to provide feedback. Your schema is complex and it’s in Dutch, which I do not understand.

  1. Missing "required" property keyword:
    You should specify which, if any, parameter properties are “required”.
    In the example below, I’ve made all of them required. If none are required, then “required” should be an empty list.

  2. Verify implementation of "patternProperties":
    Your implementation of "patternProperties" looks odd. What exactly are you trying to achieve?
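    The error you provided points to this as the source of the bug.
    Quoting the error: In context=('properties', 'opleiding'), array schema missing items
    "patternProperties" is an object keyword (it maps regular expressions to schemas for matching property names), so it does nothing for an array. When you define opleiding as type array, you must also give it "items", the schema that every element of the array has to match. See this example from JSON Schema’s docs: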

{
  "type": "array",
  "items": {
    "type": "number"
  }
}

  3. Redundant response_format:
    When you force a tool call, the structured output comes back as the tool call's JSON arguments, so response_format={"type": "json_object"} adds nothing here. You can remove this line of code: response_format={"type": "json_object"},.

  4. Provide example text:
    Could you provide an example input text you’re trying to extract data from?
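The rule behind the error can be checked locally before calling the API. The sketch below uses a hypothetical helper (not part of the OpenAI SDK) that walks a schema and reports every "type": "array" without "items", using the same path format as the error's context=. It only handles "type" given as a plain string. It also shows the opleiding fragment rewritten with "items", with the text that was used as a patternProperties key moved into a "description".

```python
from typing import Any, Iterator


def find_arrays_missing_items(schema: Any, path: tuple = ()) -> Iterator[tuple]:
    """Yield the path of every array schema that lacks an "items" keyword.

    Paths look like the API error's context, e.g. ('properties', 'opleiding').
    """
    if isinstance(schema, dict):
        if schema.get("type") == "array" and "items" not in schema:
            yield path
        for key, value in schema.items():
            yield from find_arrays_missing_items(value, path + (key,))
    elif isinstance(schema, list):
        for index, value in enumerate(schema):
            yield from find_arrays_missing_items(value, path + (index,))


# Trimmed version of the original fragment: patternProperties instead of items.
broken = {
    "type": "object",
    "properties": {
        "opleiding": {
            "type": "array",
            "patternProperties": {
                "een gevolgde opleiding": {"items": {"type": "object"}}
            },
        }
    },
}

# Rewritten fragment: "items" describes each element of the array.
fixed = {
    "type": "object",
    "properties": {
        "opleiding": {
            "type": "array",
            "items": {
                "type": "object",
                # The former patternProperties key becomes a plain description.
                "description": "een gevolgde opleiding",
                "properties": {
                    "naam": {"type": "string", "description": "naam van gevolgde opleiding"},
                    "instituut": {"type": "string"},
                    "startjaar": {"type": "integer"},
                    "eindjaar": {"type": "integer"},
                },
                "required": ["naam", "instituut", "startjaar", "eindjaar"],
            },
        }
    },
    "required": ["opleiding"],
}

print(list(find_arrays_missing_items(broken)))  # [('properties', 'opleiding')]
print(list(find_arrays_missing_items(fixed)))   # []
```

The same fix applies to certificering and werkervaring, which use patternProperties in the same way.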

Not directly addressing your question, but it seems that most people generate their schemas from Pydantic models; see the Pydantic docs page "JSON Schema".

Some people then remove the 'title' fields from these schemas (see "Schemas for OpenAI functions parameters").
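A minimal sketch of that title-stripping step, assuming the schema is already a plain dict (for example the output of Pydantic's model_json_schema(); the example dict below is hand-written to resemble such output, and strip_titles is a hypothetical helper). It skips the keys of name-to-schema maps such as "properties", so a field that is itself called "title" survives.

```python
from typing import Any

# Keywords whose keys are names (of properties or definitions), not keywords.
NAME_MAPS = {"properties", "patternProperties", "$defs", "definitions"}


def strip_titles(schema: Any) -> Any:
    """Return a copy of a JSON Schema with every "title" keyword removed."""
    if isinstance(schema, list):
        return [strip_titles(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    cleaned = {}
    for key, value in schema.items():
        if key == "title":
            continue  # drop the keyword itself
        if key in NAME_MAPS and isinstance(value, dict):
            # Keep every name, but clean the schema it points to.
            cleaned[key] = {name: strip_titles(sub) for name, sub in value.items()}
        else:
            cleaned[key] = strip_titles(value)
    return cleaned


example = {
    "title": "Opleiding",
    "type": "object",
    "properties": {"naam": {"title": "Naam", "type": "string"}},
    "required": ["naam"],
}
print(strip_titles(example))
```

The function builds a new dict instead of deleting keys in place, so the original schema (e.g. one you still want to use for validation) is left untouched.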

Hi @jacob3, thanks for the thoughts; I will try my best to answer.

For context, I am trying to extract 4 types of data from the provided resume:
- key-value pairs for personal info like names and phone numbers
- for each education, some details
- for each certificate, some details
- for each work experience, some details
For the last 3 I specify which details are required (the main identifiers for each).
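These four kinds of data map onto one object schema: string properties for the personal info and an array of objects for each repeated section. The sketch below uses a hypothetical helper, array_of_objects, with the Dutch field names from the question and shortened descriptions; it gives every array the "items" keyword the API expects and reuses the former patternProperties keys as item descriptions.

```python
from typing import Any


def array_of_objects(item_description: str, fields: dict[str, tuple[Any, str]]) -> dict[str, Any]:
    """Build an array schema whose items are objects with the given fields.

    `fields` maps each field name to (JSON type, description). Every field
    is marked required, matching the "everything required" approach.
    """
    return {
        "type": "array",
        "items": {
            "type": "object",
            "description": item_description,
            "properties": {
                name: {"type": json_type, "description": text}
                for name, (json_type, text) in fields.items()
            },
            "required": list(fields),
        },
    }


def string_field(description: str) -> dict[str, str]:
    # Shorthand for a plain string property.
    return {"type": "string", "description": description}


parameters: dict[str, Any] = {
    "type": "object",
    "properties": {
        # 1. Key-value pairs for personal info.
        "voornaam": string_field("voornaam"),
        "achternaam": string_field("achternaam"),
        "woonplaats": string_field("woonplaats"),
        "profielomschrijving": string_field("profielomschrijving"),
        "motivatie": string_field("motivatie"),
        # 2-4. One object per education, certificate and work experience.
        "opleiding": array_of_objects(
            "een gevolgde opleiding",
            {
                "naam": ("string", "naam van gevolgde opleiding"),
                "instituut": ("string", "naam van de school"),
                "startjaar": ("integer", "eerste jaar van gevolgde opleiding"),
                "eindjaar": ("integer", "laatste jaar van gevolgde opleiding"),
            },
        ),
        "certificering": array_of_objects(
            "Beschrijving van een behaald certificaat",
            {
                "naam": ("string", "naam van certificaat"),
                "instituut": ("string", "naam van de instantie"),
                "eindjaar": ("integer", "jaar waarin certificering is behaald"),
            },
        ),
        "werkervaring": array_of_objects(
            "Beschrijving van een specifieke werkervaring",
            {
                "startjaar": ("integer", "startjaar van werkervaring"),
                "eindjaar": (["integer", "string"], "eindjaar, of 'heden'"),
                "functietitel": ("string", "functietitel"),
                "bedrijf": ("string", "naam van bedrijf of organisatie"),
                "plaats": ("string", "locatie van bedrijf of organisatie"),
                "functieomschrijving": ("string", "taken, verantwoordelijkheden, resultaten"),
            },
        ),
    },
}
# Mark every top-level field as required too.
parameters["required"] = list(parameters["properties"])
```

The resulting dict can be passed as the "parameters" value of the function definition in tools.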

Now, as to your questions:

  1. I made everything required and want the LLM to put in a placeholder to avoid empty arrays.
  2. My reasoning was to use patternProperties (e.g. every job experience is a pattern) to tell the LLM that I want the specified information for every education, certificate, and so on. In hindsight I am probably mixing up instructions for the LLM with the schema definition, which breaks the code.
    Thanks for the link to the JSON Schema docs, I'll look into that.
  3. Right, I didn't know that. Thanks.
  4. Unfortunately I can't (it's personal data). For input I upload a Word document CV and extract all its text; that string is my input.
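Because the prompt asks for '#####' whenever information is missing, the parsed result may contain that marker in place of real values. A minimal post-processing sketch, assuming the JSON string comes from response.choices[0].message.tool_calls[0].function.arguments (the arguments value below is made up for illustration, and normalize_placeholders is a hypothetical helper): it parses the string and recursively turns the placeholder into None.

```python
import json
from typing import Any

PLACEHOLDER = "#####"  # the "missing information" marker from the prompt


def normalize_placeholders(value: Any) -> Any:
    """Recursively replace the placeholder string with None."""
    if isinstance(value, dict):
        return {key: normalize_placeholders(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize_placeholders(item) for item in value]
    return None if value == PLACEHOLDER else value


# Hypothetical tool-call arguments, for illustration only.
arguments = (
    '{"voornaam": "Jan", "motivatie": "#####", '
    '"werkervaring": [{"functietitel": "Developer", "eindjaar": "heden", "plaats": "#####"}]}'
)
data = normalize_placeholders(json.loads(arguments))
print(data)
```

Other values, such as 'heden' for a current job, pass through unchanged.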

Thanks for your time; I think the answer might lie in #2.

@zzbbyy I'll check it out, thanks for the link.