Best practices for Structured Inputs?

amirbakarov · October 3, 2024, 5:19pm

Hi everyone! I’m building a simple text classifier using OpenAI API, and I was wondering if there is a way to explicitly define an input data structure?

For example, my input contains a list of texts to classify (texts) and a list of available labels (labels), and I want the API to match each text with one of the labels from the list. I’m defining the output format through the following data structure:

from pydantic import BaseModel

class LabeledText(BaseModel):
    text: str
    label: str

class LabeledTexts(BaseModel):
    texts: list[LabeledText]

And then I pass this structure through the response_format parameter:

response = await async_client.beta.chat.completions.parse(messages=messages, model=model, response_format=LabeledTexts)

But I’m not sure how can I define an input structure texts and labels in the same fashion (so I can explicitly separate those variables from the prompt instructions). My current solution is to pass the lists without any formatting:

prompt = f"""
Act like a text classifier. You will be given a list of texts and a list of labels. Your task is to match each text with a label. Return the results in a JSON format where each item contains the original text and the corresponding label.

Texts: {texts}
Labels: {labels}
 """

But the classification quality is not that great, and I feel like the input structure might be the main bottleneck here. Does anyone know some best practices to structure the inputs for API in similar scenarios?

Thanks!

platypus · October 3, 2024, 6:20pm

Hi @amirbakarov !

So how I have solved this in the past, is by specifying the instruction and the labels in the system prompt, and the text to be classified in the user prompt. However, what would drive your performance even higher, is if you created a few-shot context, i.e. by giving at least one example per correct label. Again, you would do this in the system prompt.

So assuming zero-shot prompt, your system prompt would look like this:

**Instructions**
You are an expert text classifier.
You will be given text samples, one per line.
Your task is to match each text with a label.
Return the results in a JSON format where each item contains the original text and the corresponding label.
The following are the labels you need to choose from when assigning to your text samples:

- Label 1
- Label 2
- ...
- Label N

Your user prompt is then simply text samples, i.e.

**Text Data**
Text 1
Text 2
...

Topic		Replies	Views
Few-Shot Prompting with Structured Outputs Prompting gpt-4 , chatgpt , api	1	2545	December 7, 2024
Single pre-prompt with multiple prompts for classification task API	3	3598	February 26, 2024
Recommended approach for few shot examples in structured output Prompting api , structured-output	0	364	April 28, 2025
Looking for advice on prompt engineering + API setup Community project , api	2	231	December 9, 2024
Keywords and meta description API	2	509	December 23, 2024

Best practices for Structured Inputs?

Related topics