I would like to specify partial responses when using the Structured Outputs API.
To show what I mean, I created a mock example of my use case. I suspect it might not be possible to do exactly what I am looking for, but I hope there is a way to improve on my current approach.
Basically, let’s say you have a prompt which involves answering a series of Yes/No questions on a document. You might code something like this:
```python
from typing import List, Literal

from pydantic import BaseModel
from openai import OpenAI

TEMPERATURE = 0
MODEL = 'gpt-4o-mini-2024-07-18'


class DocumentQuestion(BaseModel):
    """
    A question about a given company's job description, accompanied by the corresponding answer.
    """
    question_label: int
    question_text: str
    answer: Literal['Yes', 'No', 'N/A']


class Analysis(BaseModel):
    questions: List[DocumentQuestion]


fake_job_description = """Job Title: Marketing Analyst
Company: Brightwave Solutions
Location: Remote
Brightwave Solutions is seeking a Marketing Analyst to join our growing team. The ideal candidate will analyze market trends, manage digital marketing campaigns, and optimize customer acquisition strategies. Responsibilities include data-driven decision-making, preparing reports, and collaborating with cross-functional teams. Applicants should have experience with Google Analytics, SEO, and CRM platforms.
Requirements:
Bachelor's degree in Marketing, Business, or related field
2+ years of experience
Strong analytical and communication skills
Salary: $55,000 - $75,000 annually
Apply by: October 31, 2024"""

questions = [
    {"question_label": 1, "question_text": "Does the job description specify a salary expectation?"},
    {"question_label": 2, "question_text": "If the job description specifies a salary expectation: Is this a salary range?"},
    {"question_label": 3, "question_text": "If the job description specifies a salary expectation: Is this an exact salary?"},
    {"question_label": 4, "question_text": "Does the job offer a hybrid working arrangement?"},
]

user_input_template = (
    "Your task is to read through the document provided below, and answer the provided questions.\n"
    "<document>{document}</document>\n"
    "<questions>{questions}</questions>"
)

# Create a user message
user_input = user_input_template.format(
    document=fake_job_description,
    questions=questions,
)

messages = [{"role": "user", "content": user_input}]

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model=MODEL,
    temperature=TEMPERATURE,
    messages=messages,
    response_format=Analysis,
)

parsed_completion = completion.to_dict()['choices'][0]['message']['parsed']['questions']
```
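For reference, `parsed_completion` ends up as a plain list of dicts. For this mock document it comes out roughly like the following (the answers shown are what I would expect the model to return, not guaranteed):

```python
# Illustrative contents of parsed_completion for the mock document above
# (answer values are illustrative, not guaranteed).
[
    {"question_label": 1, "question_text": "Does the job description specify a salary expectation?", "answer": "Yes"},
    {"question_label": 2, "question_text": "If the job description specifies a salary expectation: Is this a salary range?", "answer": "Yes"},
    {"question_label": 3, "question_text": "If the job description specifies a salary expectation: Is this an exact salary?", "answer": "No"},
    {"question_label": 4, "question_text": "Does the job offer a hybrid working arrangement?", "answer": "No"},
]
```

Note how most of those output tokens are just the question text being echoed back.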
This approach seems to have a few issues, in my estimation:
- It relies on the model picking out all the questions from the `questions` list, which is not guaranteed (hallucination risk).
- It relies on the model quoting these questions verbatim, which is also not guaranteed (hallucination risk); see the sanity-check sketch after this list.
- It wastes output tokens (which, of course, are more expensive than input tokens), since the questions are identical for every single document, so the model has to re-generate the same question text on every single API call.
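To make the first two points concrete, here is a minimal sketch (an illustration, not code from my real pipeline) of the post-hoc check this approach seems to require, since nothing in the schema guarantees the model echoed the questions back faithfully:

```python
# Sketch of a post-hoc sanity check: verify the model reproduced every
# question label and its exact text (nothing in the schema enforces this).
expected = {q["question_label"]: q["question_text"] for q in questions}
returned = {q["question_label"]: q["question_text"] for q in parsed_completion}

missing = set(expected) - set(returned)
reworded = {label for label in expected.keys() & returned.keys()
            if expected[label] != returned[label]}

if missing or reworded:
    raise ValueError(f"Model dropped questions {missing} and re-worded questions {reworded}")
```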
To further illustrate this, let’s say I continue with the same approach and build a second model: an evaluation model that checks whether the output of the first model is correct (which you could include in the production roll-out of the first model to monitor its performance in an automated way):
```python
class DocumentQuestionEvaluation(BaseModel):
    """
    A question about a given company's job description, accompanied by the corresponding answer and an evaluation thereof.
    """
    question_label: int
    question_text: str
    answer: Literal['Yes', 'No', 'N/A']
    evaluation_explanation: str
    evaluation: Literal['Consistent', 'Inconsistent']


class AnalysisEvaluation(BaseModel):
    questions: List[DocumentQuestionEvaluation]


user_input_template = (
    "Your task is to evaluate the analysis provided below, by checking whether it is consistent with the document provided below.\n"
    "<analysis>\n{analysis}\n</analysis>\n"
    "<document>\n{document}\n</document>"
)

# Create a user message
user_input = user_input_template.format(
    analysis=parsed_completion,
    document=fake_job_description,
)

evaluation_messages = [{"role": "user", "content": user_input}]

completion = client.beta.chat.completions.parse(
    model=MODEL,
    temperature=TEMPERATURE,
    messages=evaluation_messages,
    response_format=AnalysisEvaluation,
)

parsed_completion_evaluation = completion.to_dict()['choices'][0]['message']['parsed']['questions']
```
This amplifies the issues mentioned above, since the evaluation model now has to correctly reproduce both the list of questions and the answers from the first pass. That creates a hallucination risk which, in theory, structured outputs could completely eliminate.
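Concretely, each item the evaluation model has to generate now looks roughly like this (the evaluation values are made up for illustration), so the duplicated question text and the original answer are both paid for again as output tokens:

```python
# Illustrative single item from parsed_completion_evaluation: the question
# text and the original answer are regenerated verbatim alongside the two
# fields I actually want (evaluation values are made up).
{
    "question_label": 1,
    "question_text": "Does the job description specify a salary expectation?",
    "answer": "Yes",
    "evaluation_explanation": "The document lists a salary range of $55,000 - $75,000, so the 'Yes' answer is consistent.",
    "evaluation": "Consistent",
}
```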
So, to take the above example, is there a way to hard-code the list of questions and answers, such that the evaluation model only has to generate the `evaluation_explanation` and `evaluation` fields?
I imagine that instead of asking for a “partial response” to a given schema, you could reframe the partial response as a separate response schema or Pydantic model, but I’m having trouble picturing what that would look like.
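The closest I can get is something like the sketch below (class names are just placeholders, not from my actual code): the evaluation model only generates the evaluation fields keyed by `question_label`, and I join the hard-coded questions and answers back on in Python. But this still relies on the model echoing the labels correctly, which feels like the same problem in miniature, so I’d love to know if there is a better-supported way to do this.

```python
from typing import List, Literal

from pydantic import BaseModel


# Placeholder sketch: an evaluation-only schema, merged back onto the
# hard-coded questions/answers by label after parsing.
class EvaluationItem(BaseModel):
    question_label: int  # the model still has to echo this correctly
    evaluation_explanation: str
    evaluation: Literal['Consistent', 'Inconsistent']


class AnalysisEvaluationOnly(BaseModel):
    evaluations: List[EvaluationItem]


# After parsing, something like:
# by_label = {item["question_label"]: item for item in parsed_evaluations}
# merged = [{**qa, **by_label[qa["question_label"]]} for qa in parsed_completion]
```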