Web search on Responses API breaks inline citations when passed a pydantic data model as text_format

Bharat_Sharma · July 10, 2025, 1:10pm

Code to reproduce:

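import os

from openai import OpenAI
from pydantic import BaseModel, Field

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # read the key from the environment
openai_client = OpenAI(api_key=OPENAI_API_KEY)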

class Company(BaseModel):
    name: str = Field(..., description="Name of the company")
    company_summary: str = Field(..., description="Summary of the company")

def openai_web_search(
    prompt: str,
    model: str,
):
    response = openai_client.responses.parse(
        model=model,
        tools=[
            {
                "type": "web_search_preview",
                "search_context_size": "high",
            }
        ],
        input=[{"role": "user", "content": prompt}],
        text_format=Company,  # structured output: parse the reply into Company
    )

    return response

if __name__ == "__main__":
    result = openai_web_search(
        prompt="Which company was the first one to create reusable rockets?",
        model="o3",
    )
    print(result)

This returns the parsed attribute as:
parsed=Company(name='SpaceX', company_summary='While NASA’s government-run Space Shuttle program (first flown in 1981) demonstrated partial reuse, the first company – i.e., privately-owned commercial entity – to design, fly, land and refly a rocket stage was SpaceX. On 22 December 2015 its Falcon 9 Flight 20 mission launched 11 ORBCOMM satellites to orbit and then brought the 15-story first stage back to Landing Zone 1 at Cape Canaveral, marking the first successful recovery of an orbital-class booster that was later reflown. \ue200cite\ue202turn1search12\ue201 (Blue Origin had landed the sub-orbital New Shepard booster a month earlier on 23 November 2015, but Falcon 9 was the first reusable orbital-class rocket.) Therefore, the first company to create a truly reusable rocket capable of reaching orbit was SpaceX.')

The company_summary contains the string \ue200cite\ue202turn1search12\ue201, which I assume is the placeholder token where the inline citation was supposed to be added.
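To look at these tokens programmatically, here is a minimal sketch that finds them and pulls out the reference ids. It assumes the format seen in this one example: a U+E200 start character, a kind such as "cite", one or more ids each preceded by U+E202, and a U+E201 end character. That layout is inferred from a single sample, not from any documentation, and the function name is hypothetical.

```python
import re

# Assumption (inferred from one sample): U+E200 + kind + (U+E202 + id)* + U+E201
CITATION_MARKER_RE = re.compile(r"\ue200(\w+)((?:\ue202[^\ue200\ue201\ue202]+)*)\ue201")


def find_citation_markers(text: str) -> list[tuple[str, list[str]]]:
    """Return (kind, [reference ids]) for every marker found in text."""
    results = []
    for match in CITATION_MARKER_RE.finditer(text):
        kind = match.group(1)
        # group(2) looks like "\ue202turn1search12"; split on the U+E202 separator
        refs = [ref for ref in match.group(2).split("\ue202") if ref]
        results.append((kind, refs))
    return results


summary = "...later reflown. \ue200cite\ue202turn1search12\ue201 (Blue Origin ..."
print(find_citation_markers(summary))
# [('cite', ['turn1search12'])]
```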

This is the content attribute, and its annotations list is empty:
content=[ParsedResponseOutputText[Company](annotations=[], text='{"name":"SpaceX","company_summary":"While NASA’s government-run Space Shuttle program (first flown in 1981) demonstrated partial reuse, the first company – i.e., privately-owned commercial entity – to design, fly, land and refly a rocket stage was SpaceX. On 22 December 2015 its Falcon 9 Flight 20 mission launched 11 ORBCOMM satellites to orbit and then brought the 15-story first stage back to Landing Zone 1 at Cape Canaveral, marking the first successful recovery of an orbital-class booster that was later reflown. \ue200cite\ue202turn1search12\ue201 (Blue Origin had landed the sub-orbital New Shepard booster a month earlier on 23 November 2015, but Falcon 9 was the first reusable orbital-class rocket.) Therefore, the first company to create a truly reusable rocket capable of reaching orbit was SpaceX."}

If I remove text_format, the request works as intended.
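For comparison, without text_format the citations should come back as annotations on the output text rather than as inline tokens. A minimal sketch of reading them follows. It assumes the attribute names used by the openai Python SDK's Responses objects: output items of type "message", content parts of type "output_text", and annotations of type "url_citation" with url, title, start_index and end_index. The helper collect_url_citations and the SimpleNamespace stand-ins are hypothetical, used so the sketch runs without an API call.

```python
from types import SimpleNamespace
from typing import Any


def collect_url_citations(response: Any) -> list[dict]:
    """Gather url_citation annotations from a Responses API result."""
    citations = []
    for item in response.output:
        # web_search_call items also appear in output; only messages carry text
        if getattr(item, "type", None) != "message":
            continue
        for part in item.content:
            if getattr(part, "type", None) != "output_text":
                continue
            for ann in part.annotations:
                if getattr(ann, "type", None) == "url_citation":
                    citations.append({
                        "url": ann.url,
                        "title": ann.title,
                        # the slice of text the annotation points at
                        "span": part.text[ann.start_index:ann.end_index],
                    })
    return citations


# Stand-in objects mimicking the SDK's attribute names (illustration only)
fake = SimpleNamespace(output=[
    SimpleNamespace(type="web_search_call"),
    SimpleNamespace(type="message", content=[SimpleNamespace(
        type="output_text",
        text="SpaceX landed Falcon 9 in 2015.",
        annotations=[SimpleNamespace(type="url_citation", url="https://example.com/f9",
                                     title="Example source", start_index=0, end_index=6)],
    )]),
])
print(collect_url_citations(fake))
# [{'url': 'https://example.com/f9', 'title': 'Example source', 'span': 'SpaceX'}]
```

With text_format set, the same loop returns an empty list, which matches the empty annotations shown above.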

kristoph · July 11, 2025, 9:31pm

I am also struggling with this issue and have raised it with OpenAI. Do you have a workaround that removes these citation markers?
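One possible stopgap, sketched under the assumption that every marker starts with U+E200 and ends with U+E201 (as in the single example above), is to strip the markers from every string field after parsing. The function name is hypothetical, and it works on plain dicts and lists so it does not depend on Pydantic internals.

```python
import re
from typing import Any

# Assumption: markers run from U+E200 to U+E201; the whitespace before one is dropped too
MARKER_SPAN_RE = re.compile(r"\s*\ue200[^\ue201]*\ue201")


def strip_citation_markers(value: Any) -> Any:
    """Recursively remove citation markers from strings inside dicts and lists."""
    if isinstance(value, str):
        return MARKER_SPAN_RE.sub("", value)
    if isinstance(value, dict):
        return {key: strip_citation_markers(item) for key, item in value.items()}
    if isinstance(value, list):
        return [strip_citation_markers(item) for item in value]
    return value


data = {
    "name": "SpaceX",
    "company_summary": "...later reflown. \ue200cite\ue202turn1search12\ue201 (Blue Origin ...",
}
print(strip_citation_markers(data)["company_summary"])
# ...later reflown. (Blue Origin ...
```

With Pydantic v2, something like Company(**strip_citation_markers(parsed.model_dump())) should give a clean model back, where parsed is the Company instance from the response. Note that this discards the citation entirely; it hides the bug rather than restoring the annotations.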
