Generating synthetic logs while constrained to function call

I am trying to generate synthetic data. Here's my code:

import csv
import json
import openai
from pydantic import BaseModel

user_id = "1234"
age = "24"
job = "construction worker"


class Location(BaseModel):
    dt: str
    user_id: str
    location_category: str
    subtype: str
    minutes: int


messages = [
    {
        "role": "system",
        "content": """You are an AI assistant that follows instruction extremely well. Help as much as you can. Respect the desired output above all else.
    Generate believable logs for a user, based on their age and job, incorporating a variety of location categories, subtypes and minutes spent.
    These dates (`dt`) should span between the dates given, but you can miss entire days randomly.
    Each day should have 2-3 records ranging from 15 to 240 minutes but always try to include a "Home" location category for the day.

    Possible options for location categories and subtypes are: 
    {
    "Home": ["House", "Apartment", "Dormitory"],
    "Work/School": ["Office", "School", "University", "Workplace"],
    "Outdoor Locations": ["Parks", "Streets", "Wilderness", "Beaches"],
    "Commercial Places": ["Malls", "Shops", "Restaurants", "Cinemas"],
    "Public Institutions": ["Libraries", "Museums", "Government Buildings"],
    "Transportation": ["Buses", "Trains", "Airplanes", "Cars"],
    "Leisure Facilities": ["Gyms", "Clubs", "Sports Arenas", "Theaters"],
    "Healthcare Facilities": ["Hospitals", "Clinics", "Nursing Homes"],
    "Religious Places": ["Churches", "Temples", "Mosques", "Synagogues"]
    }

    """,
    },
    {
        "role": "user",
        "content": f"Please produce a DAILY log for `dt` between 2023-11-01 and 2023-12-01 about the following user: user_id: {user_id}, age: {age}, job: {job} ",
    },
]


response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
    functions=[
        {
            "name": "get_json_response",
            "description": "get exact json output",
            "parameters": Location.model_json_schema(),
        }
    ],
    function_call={"name": "get_json_response"},
)

print(response)

output = json.loads(response.choices[0]["message"]["function_call"]["arguments"])
print(output)

Is there a way to output multiple records that fit the schema?

What might work well here is adding a few sample exchanges to the message history before the real question (multi-shot prompting). The model is very good at producing new responses that resemble the samples.
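A minimal sketch of what that history could look like. The sample user, the sample records, the `location_logs` wrapper key, and the helper name `build_few_shot_messages` are all made up for illustration; the assistant turn uses the `function_call` message shape of the legacy `functions` API used in the question.

```python
import json

# Hypothetical sample records; the values are illustrative, not real data.
sample_records = [
    {"dt": "2023-10-01", "user_id": "0001", "location_category": "Home",
     "subtype": "Apartment", "minutes": 180},
    {"dt": "2023-10-01", "user_id": "0001", "location_category": "Work/School",
     "subtype": "School", "minutes": 240},
    {"dt": "2023-10-02", "user_id": "0001", "location_category": "Home",
     "subtype": "Apartment", "minutes": 120},
]


def build_few_shot_messages(system_prompt, real_request, function_name):
    """Return a message list with one sample exchange before the real request."""
    example_user = {
        "role": "user",
        "content": (
            "Please produce a DAILY log for `dt` between 2023-10-01 and "
            "2023-10-02 about the following user: user_id: 0001, age: 30, "
            "job: teacher"
        ),
    }
    # The sample answer is an assistant turn that "called" the function.
    # `arguments` must be a JSON string, not a dict.
    example_assistant = {
        "role": "assistant",
        "content": None,
        "function_call": {
            "name": function_name,
            "arguments": json.dumps({"location_logs": sample_records}),
        },
    }
    return [
        {"role": "system", "content": system_prompt},
        example_user,
        example_assistant,
        {"role": "user", "content": real_request},
    ]
```

The returned list can be passed as `messages=` in place of the original one. The sample only helps if its `arguments` match the shape of the function schema actually sent with the request.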

I actually ended up getting it to work! I updated the function to:

function = {
        "name": "locations_output",
        "description": "A function that takes in a list of arguments related to logs and extracts it",
        "parameters": {
            "type": "object",
            "properties": {
                "location_logs": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "user_id": {
                                "type": "string",
                                "description": "The user_id provided in the call",
                            },
                            "dt": {
                                "type": "string",
                                "description": "The date of the log, which must be in the format of YYYY-mm-dd",
                            },
                            "location_category": {
                                "type": "string",
                                "description": "the location category from one of options listed",
                            },
                            "subtype": {
                                "type": "string",
                                "description": "the subtype from one of options listed",
                            },
                            "minutes": {
                                "type": "integer",
                                "description": "the minutes spent at each location",
                            },
                        },
                    },
                }
            },
            "required": ["location_logs"],
        },
    }
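With this schema the records come back under the `location_logs` key of the function-call arguments. Here is a rough sketch of unpacking them and turning them into CSV text; the helper names `parse_location_logs` and `records_to_csv` are my own, and it assumes every record contains a `minutes` value.

```python
import csv
import io
import json

FIELDS = ["dt", "user_id", "location_category", "subtype", "minutes"]


def parse_location_logs(arguments: str) -> list[dict]:
    """Parse the function-call `arguments` JSON string into a list of records."""
    payload = json.loads(arguments)
    records = []
    for item in payload.get("location_logs", []):
        record = {field: item.get(field) for field in FIELDS}
        # Coerce minutes to int so this works whether the model returns "45" or 45.
        # Raises if a record has no minutes value at all.
        record["minutes"] = int(record["minutes"])
        records.append(record)
    return records


def records_to_csv(records: list[dict]) -> str:
    """Render the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

In the script above, the input would be `response.choices[0]["message"]["function_call"]["arguments"]`, the same string that was passed to `json.loads` in the original code.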

But now I run into a situation where it fails to follow instructions for something that seems so basic, haha. I want to generate records across a date range, but it only seems to produce them for a small subset.
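One thing that might be worth trying (a guess, nothing confirmed here) is to check which dates actually come back and to request the range in smaller windows, so each call only has to produce about a week of records. Note that the system prompt allows skipping entire days, so some gaps are expected. The helper names `date_chunks` and `missing_dates` are hypothetical.

```python
from datetime import date, timedelta


def date_chunks(start: date, end: date, days_per_chunk: int = 7):
    """Yield (chunk_start, chunk_end) pairs covering start..end inclusive."""
    current = start
    while current <= end:
        chunk_end = min(current + timedelta(days=days_per_chunk - 1), end)
        yield current, chunk_end
        current = chunk_end + timedelta(days=1)


def missing_dates(records, start: date, end: date):
    """Return ISO dates in start..end that have no record (records use the `dt` key)."""
    seen = {record["dt"] for record in records}
    total_days = (end - start).days + 1
    all_days = [start + timedelta(days=i) for i in range(total_days)]
    return [d.isoformat() for d in all_days if d.isoformat() not in seen]
```

Each `(chunk_start, chunk_end)` pair could be formatted into the user message in place of the fixed 2023-11-01 to 2023-12-01 range, with the records from all calls concatenated afterwards.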