Assistants API: Best way to retrieve data from GPT-4 in a fixed JSON format?

Hi, I want to use the GPT-4 API to extract key information from a large number of documents. The documents all have roughly the same format.

I want to get the results in a fixed JSON format. What is the best way to do this?

I have read the docs on function calling. With it I can predefine the JSON format I want, but it requires me to submit tool output back to the API. I don’t have anything to submit (I just want the result in a fixed format), so can I submit a random value and ask GPT to ignore it?
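
On the Assistants API, a function call pauses the run in the `requires_action` status, and at that point the structured JSON is already available in the call's `arguments` string, so a placeholder output is enough to let the run finish. A minimal sketch, assuming the official `openai` Python SDK v1 (`client.beta.threads.runs`); the helper names and the `"ok"` placeholder are illustrative, and the client and run objects are passed in so the parsing logic can be tried without a network call:

```python
import json
from typing import Any, Dict, List


def extract_tool_arguments(run: Any) -> List[Dict[str, Any]]:
    """Parse the JSON arguments of every function call a paused run is waiting on.

    `run` is assumed to be a Run object returned by the openai SDK's
    Assistants (beta) endpoints, e.g. client.beta.threads.runs.retrieve(...).
    """
    if run.status != "requires_action" or run.required_action is None:
        return []
    calls = run.required_action.submit_tool_outputs.tool_calls
    return [json.loads(call.function.arguments) for call in calls]


def acknowledge_tool_calls(client: Any, thread_id: str, run: Any) -> None:
    """Answer every pending call with a placeholder so the run can finish.

    No function is actually executed; "ok" only tells the run the call was handled.
    """
    calls = run.required_action.submit_tool_outputs.tool_calls
    client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread_id,
        run_id=run.id,
        tool_outputs=[{"tool_call_id": call.id, "output": "ok"} for call in calls],
    )
```

Submitting the placeholder lets the run continue, which usually costs one more model turn that you can ignore. Cancelling the run instead (client.beta.threads.runs.cancel(thread_id=..., run_id=...)) skips that turn.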

There are two approaches you can use. The more straightforward one is Pydantic. If you show ChatGPT your desired output, it can write the matching Pydantic models for you, and you can then use those models to get output in the right format from the assistant.

Here’s an example:

import json
from enum import Enum
from typing import Dict, List, Literal

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

class IntentType(str, Enum):
    GET_CURRENT_WEATHER = "get_current_weather"
    GET_FORECAST = "get_forecast"
    GET_WEATHER_ALERTS = "get_weather_alerts"
    GET_TEMPERATURE = "get_temperature"

class Intent(BaseModel):
    intent_type: IntentType
    message: str

class GetCurrentWeatherIntent(Intent):
    intent_type: Literal[IntentType.GET_CURRENT_WEATHER] = IntentType.GET_CURRENT_WEATHER

class GetForecastIntent(Intent):
    intent_type: Literal[IntentType.GET_FORECAST] = IntentType.GET_FORECAST

class GetWeatherAlertsIntent(Intent):
    intent_type: Literal[IntentType.GET_WEATHER_ALERTS] = IntentType.GET_WEATHER_ALERTS

class GetTemperatureIntent(Intent):
    intent_type: Literal[IntentType.GET_TEMPERATURE] = IntentType.GET_TEMPERATURE

class MessageIntentClassifier(BaseModel):
    message: str

    def classify(self) -> List[Dict[str, str]]:
        # get_last_dialogues() and get_current_weather() are app-specific helpers
        # (not shown) returning the recent conversation and the current weather.
        previous_messages = get_last_dialogues()
        current_weather = str(get_current_weather())
        system_message = {
            "role": "system",
            "content": (
                f"You have access to the current weather - {current_weather}, "
                f"and also the previous conversation - ({previous_messages}). "
                "Use that and the user message to divide the message into multiple intents. "
                "It can be get current weather, get forecast, get weather alerts, or get temperature. "
                "Organize messages into these categories. Each category can have one or multiple messages. "
                "Reproduce the messages as is. "
                "Output JSON with the keys get_current_weather, get_forecast, "
                "get_weather_alerts and get_temperature, each mapped to a list of messages. "
                "The queries are in natural language, and it is your job as an expert "
                "to figure out the intent of the user."
            ),
        }

        # JSON mode makes the model emit valid JSON (the prompt must mention JSON),
        # though a reply cut off by max_tokens can still be incomplete.
        response = client.chat.completions.create(
            messages=[system_message, {"role": "user", "content": self.message}],
            temperature=0.5,
            model="gpt-4-0125-preview",
            response_format={"type": "json_object"},
            max_tokens=300,
        )

        # Drop categories the model left empty
        intents = json.loads(response.choices[0].message.content)
        filtered_intents = {key: value for key, value in intents.items() if value}
        print(f"filtered_intents are {filtered_intents}")

        # Flatten {category: [messages]} into [{"intent_type": ..., "message": ...}]
        intent_list = []
        for intent_type, intent_messages in filtered_intents.items():
            for message in intent_messages:
                intent_list.append({"intent_type": intent_type, "message": message})
        return intent_list
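
The Pydantic models in this example are never actually applied to the reply: json.loads accepts any JSON, so a renamed key or a string where a list was expected slips through. One way to enforce the fixed format, sketched here assuming Pydantic v2, is to describe the whole reply as a model and parse it with model_validate_json. IntentResponse and parse_intents are illustrative names, not part of the original code:

```python
from typing import Dict, List

from pydantic import BaseModel, ConfigDict, Field


class IntentResponse(BaseModel):
    """Expected shape of the JSON reply: one list of messages per intent."""

    # Reject any key other than the four intent categories
    model_config = ConfigDict(extra="forbid")

    get_current_weather: List[str] = Field(default_factory=list)
    get_forecast: List[str] = Field(default_factory=list)
    get_weather_alerts: List[str] = Field(default_factory=list)
    get_temperature: List[str] = Field(default_factory=list)


def parse_intents(raw_json: str) -> List[Dict[str, str]]:
    """Validate the model's reply, then flatten it the way classify() does.

    Raises pydantic.ValidationError if the reply is not in the fixed format.
    """
    parsed = IntentResponse.model_validate_json(raw_json)
    intent_list = []
    # model_dump() keeps field order; empty categories contribute nothing
    for intent_type, messages in parsed.model_dump().items():
        for message in messages:
            intent_list.append({"intent_type": intent_type, "message": message})
    return intent_list
```

Inside classify(), parse_intents(response.choices[0].message.content) could replace the json.loads and flattening steps. IntentResponse.model_json_schema() also returns a JSON Schema that can be pasted into the system prompt, so the model sees exactly which keys are expected.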

The other option is a library called instructor, which wraps the OpenAI client so that responses come back as validated Pydantic objects.
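
Roughly the pattern instructor automates can also be done by hand on Chat Completions: turn a Pydantic model into a function-tool schema, force the model to call that function with tool_choice, and validate the returned arguments back into the model. This is a rough sketch of that pattern, not instructor's own API; it assumes Pydantic v2, and DocumentFields, tool_from_model and the other names are hypothetical. On Chat Completions nothing has to be sent back for the tool call unless the conversation continues, which sidesteps the tool-output question from the original post.

```python
from typing import Any, Dict, Type, TypeVar

from pydantic import BaseModel, Field

M = TypeVar("M", bound=BaseModel)


# Hypothetical example schema for the fields to pull out of each document
class DocumentFields(BaseModel):
    """Key fields extracted from one document."""

    title: str
    date: str = Field(description="Document date in ISO 8601 format")
    total_amount: float


def tool_from_model(model: Type[BaseModel]) -> Dict[str, Any]:
    """Build a Chat Completions function-tool definition from a Pydantic model."""
    return {
        "type": "function",
        "function": {
            "name": model.__name__,
            "description": model.__doc__ or "",
            "parameters": model.model_json_schema(),
        },
    }


def forced_tool_choice(model: Type[BaseModel]) -> Dict[str, Any]:
    """tool_choice value that forces the model to call this model's function."""
    return {"type": "function", "function": {"name": model.__name__}}


def parse_tool_arguments(model: Type[M], arguments: str) -> M:
    """Validate a tool call's JSON arguments string into the model."""
    return model.model_validate_json(arguments)


# Usage with a real client (needs the openai package and an API key; not run here):
# response = client.chat.completions.create(
#     model="gpt-4-0125-preview",
#     messages=[{"role": "user", "content": document_text}],
#     tools=[tool_from_model(DocumentFields)],
#     tool_choice=forced_tool_choice(DocumentFields),
# )
# call = response.choices[0].message.tool_calls[0]
# fields = parse_tool_arguments(DocumentFields, call.function.arguments)
```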

Thanks, I’ll try the Pydantic approach. Your example uses the Chat Completions API, so I’ll have to convert it to the Assistants API first.

BTW, instructor does not support the Assistants API yet.