I am fairly new to working with the API, and I am facing an issue while using GPT-4o-mini for a classification task.
My problem is as follows: I want to classify scraped text into categories so that I do not have to do it manually.
There are many categories—around 1500.
The scraped text varies from 1 to 50 words. Sometimes, it directly matches a category, but other times the information is more implicit and needs to be inferred. For example:
The main problem is that the model invents new categories even when I set the temperature to 0, even when the correct match seems very straightforward…
Also, the output is sometimes incorrectly formatted: the model uses quotation marks or writes the answer in a full sentence, even though I specified not to do so.
This is the prompt for each request, maybe it could help you understand why the model does not obey:
messages = [
    {"role": "system", "content": "You are an assistant that determines which reference from this list (1589 references): <list of the 1589 references> matches a given reference. Answer with the corresponding element in the list only."},
    {"role": "user", "content": "What is the corresponding reference for this given reference?: "},
]
Do you have any recommendations for solving this problem? Fine-tuning did not help either… Thank you!
The model does not obey because it is simply too dumb to evaluate 1600 choices equally against its pretrained knowledge and the rest of the instructions you give. Mini means less quality and less attention.
If you want to constrain the output so the AI cannot write anything else, you could send four anyOf sub-schemas in a response_format. Each sub-schema gets a different name (perhaps a category) that the AI has to write correctly first, followed by an enum string with about 400 options (500 max per enum). Check whether the total object stays within the schema size limit.
Remember: structured outputs with an enum will force the AI to pick one of those values regardless of the match or its quality, so it should also have a "get out" schema for no matches.
Otherwise, you’ll have to make multiple requests with sub-lists, then have a final AI decide which of those is the best.
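To make the anyOf idea concrete, here is a sketch that builds such a response_format from a placeholder list of 1,589 strings, using chunks of 400 per enum as suggested above plus the "get out" sub-schema. Note that strict mode requires the schema root to be an object, so the anyOf is nested under a `result` property here; the `group_N` names and the `reference_match` schema name are my own placeholders, and you should verify the enum and schema size limits against the current docs:

```python
# Placeholder for the real list of 1,589 reference strings.
references = [f"reference {i}" for i in range(1589)]

# Split the enum into chunks of 400 so no single enum gets too large.
chunks = [references[i:i + 400] for i in range(0, len(references), 400)]

sub_schemas = [
    {
        "type": "object",
        "properties": {
            # A constant name the model has to write correctly first.
            "group": {"type": "string", "const": f"group_{n}"},
            # The enum constrains the answer to one of these 400 strings.
            "reference": {"type": "string", "enum": chunk},
        },
        "required": ["group", "reference"],
        "additionalProperties": False,
    }
    for n, chunk in enumerate(chunks)
]

# "Get out" sub-schema so the model can report that nothing matches.
sub_schemas.append({
    "type": "object",
    "properties": {"no_match": {"type": "string", "const": "no_match"}},
    "required": ["no_match"],
    "additionalProperties": False,
})

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "reference_match",
        "strict": True,
        # Strict mode wants an object at the root, so the anyOf is nested.
        "schema": {
            "type": "object",
            "properties": {"result": {"anyOf": sub_schemas}},
            "required": ["result"],
            "additionalProperties": False,
        },
    },
}
```

The `response_format` dict would then be passed as-is in the chat completion request.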
Agree with OP. I have a similar use case with fewer categories. The answer has been to use structured output + pydantic, with 100% compliance from GPT even though I use temperature=0.3. You can check the documentation for structured outputs. This is the option I use in the request:
Your label space is too large for the model to handle effectively. One good approach here is to set this up as a hierarchical classification task and recursively call the model across the hierarchy levels.
This way you reduce the number of label options you send to the model in each call, which tends to significantly improve performance, reduce the hallucination rate, and make validation easier. You could also create a dynamic JSON schema for each level in your hierarchy and use it with the API's structured-output functionality, further reducing hallucinations.
One drawback is that you need to create a label hierarchy (very similar to a taxonomy), but I believe these models are quite adept at helping you create one. (Or you could try a mix of clustering and using LLMs to guide/correct the clustering process.)
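The recursive idea above can be sketched with a stub in place of the real model call; `ask_model`, the toy taxonomy, and its labels are all hypothetical, and a real implementation would replace the stub with a chat-completion request whose structured-output enum contains only the children of the current node:

```python
# Hypothetical two-level taxonomy: top-level category -> leaf labels.
TAXONOMY = {
    "electronics": ["phone", "laptop", "camera"],
    "clothing": ["shirt", "shoes", "jacket"],
}


def ask_model(text: str, options: list[str]) -> str:
    # Stub for the real API call; a real implementation would send
    # `options` as a structured-output enum so only one of them can
    # come back. Here we just pick the first option mentioned in the text.
    for option in options:
        if option in text:
            return option
    return options[0]


def classify(text: str, tree) -> str:
    # Leaf level: a plain list of labels, one call decides.
    if isinstance(tree, list):
        return ask_model(text, tree)
    # Internal node: pick a branch among the keys, then recurse, so each
    # call only ever sees a small set of options.
    branch = ask_model(text, list(tree))
    return classify(text, tree[branch])


print(classify("a review of a new laptop from the electronics section", TAXONOMY))
# → laptop
```

Each level is a separate, small request, so the enum per call stays tiny even when the full label space has 1,500+ entries.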