What's wrong with my Structured Output response format?

I’m trying to use Structured Outputs and cannot get it to work. I have been debugging for a long time and still have no idea why this is happening. I have used Structured Outputs before and they worked, but this one does not.

class FieldRule(BaseModel):
    selector_type: str
    selectors: List[str]
    attribute: Optional[str]

class Rules(BaseModel):
    title: FieldRule
    description: Optional[FieldRule]
    link: FieldRule

class RuleSet(BaseModel):
    name: str
    rules: Rules

class ExtractionRules(BaseModel):
    rules_sets: List[RuleSet]
    has_news: bool

def extract_xpaths(html_content):
    try:
        completion = openai_client.beta.chat.completions.parse(
            model="gpt-4o-2024-08-06",
            messages=[
                {"role": "system", "content": system_instruction},
                {"role": "user", "content": html_content}
            ],
            response_format=ExtractionRules
        )
        if completion and completion.choices and len(completion.choices) > 0:
            return completion.choices[0].message.parsed
        else:
            logger.error("Invalid completion response from OpenAI")
            return None
    except Exception as e:
        logger.error(f"Error in extract_xpaths: {str(e)}")
        return None

I get the following error:
Object of type Tag is not JSON serializable

I would appreciate any help in finding the issue.

Can you show your full code? This doesn’t have the bug in it.

Run this in your Python environment:

import pydantic_core, pydantic
print(pydantic_core.__version__, pydantic.__version__)

Output matching the latest supported versions:
2.20.1 2.8.2

If your Python 3.9-3.11 environment for OpenAI API requests has older or newer versions, run this forced upgrade from a user account that has permission to modify those installations, or inside the venv:

pip install --upgrade --upgrade-strategy eager regex "charset-normalizer<4" "idna" "urllib3<3" "certifi" "requests" "anyio<5" "distro<2" "sniffio" "h11<0.15" "httpcore==1.*" "httpx<1" "annotated-types" "typing-extensions<5" "pydantic-core==2.20.1" "pydantic<3" "jiter<1" "tqdm" "colorama" "openai" "tiktoken"

If the environment also serves other applications, verify that these requirements are compatible with the other software you run; otherwise you may need a separate venv for your API calls.

Pydantic is the most likely suspect here, since different versions can send unanticipated output to the API. The most reliable code across runtime platforms simply uses an HTTPS-capable library plus your own code for sending JSON to the API endpoint URL.

Explanation: the openai package declares broad version ranges for its requirements, and these may be looser than what is needed for compatibility with the platform the latest SDK version is auto-built for. Later explicit upgrades of individual libraries may also ignore constraints that require other packages to stay at lower versions.
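
To illustrate the "plain HTTPS library plus your own JSON" approach mentioned above, here is a minimal sketch using only the standard library. The endpoint URL, the `build_request` helper, the schema name, and the abbreviated hand-written schema are assumptions for illustration; they are not taken from the original poster's code, and a real schema would need to cover every field of `ExtractionRules`.

```python
import json
import urllib.request

# Assumed public chat completions endpoint.
API_URL = "https://api.openai.com/v1/chat/completions"

# Abbreviated, hand-written stand-in for the ExtractionRules schema
# (hypothetical: only one field is shown to keep the sketch short).
SCHEMA = {
    "type": "object",
    "properties": {"has_news": {"type": "boolean"}},
    "required": ["has_news"],
    "additionalProperties": False,
}


def build_request(api_key, system_instruction, html_content):
    """Build (but do not send) a POST request with a JSON schema response format."""
    payload = {
        "model": "gpt-4o-2024-08-06",
        "messages": [
            {"role": "system", "content": system_instruction},
            {"role": "user", "content": html_content},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "extraction_rules",  # hypothetical schema name
                "strict": True,
                "schema": SCHEMA,
            },
        },
    }
    # json.dumps raises TypeError here if any value is not a plain JSON type,
    # so serialization problems show up before anything is sent.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request would then be a matter of passing the result to `urllib.request.urlopen` and decoding the JSON response. Building the body yourself has one practical benefit for debugging: any value that cannot be serialized fails at the `json.dumps` line, in your own code.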

That’s basically all of the relevant code.
There is no Tag object anywhere, and the exception is reported by this line:
logger.error(f"Error in extract_xpaths: {str(e)}")

You can try it out yourself using my classes.

I’m using Python 3.12.6, so naturally my pydantic versions were much newer.
I force-upgraded using your command and verified that the versions are now 2.20.1 and 2.8.2:

>>> import pydantic_core, pydantic
>>> print(pydantic_core.__version__, pydantic.__version__) 
2.20.1 2.8.2

However, I still got the same error:
ERROR - Error in extract_xpaths: Object of type Tag is not JSON serializable

In case anyone is wondering, these are my imports:

from pydantic import BaseModel
from typing import Optional, List
from openai import OpenAI
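
One thing worth checking, offered as a guess because the code that calls extract_xpaths is not shown: the wording "Object of type Tag is not JSON serializable" is the message Python's own json encoder produces when it meets an object it cannot encode. If html_content were a parsed-HTML node rather than a plain string, building the request body could fail in exactly this way. The sketch below reproduces the message with a hypothetical stand-in class named Tag (not the real class from any HTML library) and shows that converting to str first avoids it. `serialize_messages` is also a made-up helper for illustration.

```python
import json


class Tag:
    """Hypothetical stand-in for a parsed-HTML node; not a real library class."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


def serialize_messages(html_content):
    # Mimics the step where the message list is turned into a JSON body.
    messages = [{"role": "user", "content": html_content}]
    return json.dumps(messages)


node = Tag("<div class='news'>Headline</div>")

try:
    serialize_messages(node)  # passing the object itself
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # Object of type Tag is not JSON serializable

# Converting to a plain string before building the messages succeeds.
fixed = serialize_messages(str(node))
print(fixed)
```

If this is the cause, logging type(html_content) at the top of extract_xpaths should show something other than str, and passing str(html_content) in the user message would be the fix.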