PDF input file is not seen by gpt-5, gpt-5-mini but sometimes by gpt-5-nano

stephen29 · August 7, 2025, 10:08pm

I’m exploring how well gpt-5, gpt-5-mini and gpt-5-nano can understand input PDFs. No matter what I try, I can’t get gpt-5 or gpt-5-mini to ever recognise the PDF input_file content item in the message. Oddly, gpt-5-nano sees it about one time in five.

This is broadly consistent with my finding with gpt-4.1 two months ago in this forum question:

Can anyone indicate what I might be doing wrong (see code snippet below)? Or shed some light on what the actual capabilities of these models are for direct PDF inputs?

Many thanks

Stephen

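    import base64
    from typing import Any

    import filetype  # third-party: pip install filetype
    import xxhash    # third-party: pip install xxhash


    def calculate_file_digest(file_bytes: bytes) -> str:
        # Stand-in for the author's helper, which the post does not show;
        # assumed here to be an xxh3 hash like the one used for the overall digest.
        return xxhash.xxh3_64_hexdigest(file_bytes)


    # Method excerpted from a class; `self` is not used in the body.
    def build_user_content(
        self,
        input_text: str | None = None,
        input_files: bytes | list[bytes] | None = None,
    ) -> tuple[dict[str, Any], str]:
        """
        Build a Responses API user message and return (message, content_digest).
        Images become input_image items; PDFs become input_file items.
        """
        content: list[dict[str, Any]] = []
        # A plain {"role", "content"} message; an explicit "type": "message" is optional.
        message: dict[str, Any] = {"role": "user", "content": content}
        blobs: list[bytes] = []

        # Accept a single file as well as a list of files.
        if isinstance(input_files, bytes):
            input_files = [input_files]

        if isinstance(input_files, list):
            blobs.extend(input_files)
            for file_bytes in input_files:
                file_digest = calculate_file_digest(file_bytes)
                if file_type := filetype.guess(file_bytes):  # type: ignore
                    file_name = f"{file_digest}.{file_type.extension}"
                    # Inline the file as a base64 data URL.
                    file_data = f"data:{file_type.mime};base64,{base64.b64encode(file_bytes).decode()}"
                    if file_type.extension in ('jpg', 'jpeg', 'webp', 'png', 'gif'):
                        item = {"type": "input_image", "image_url": file_data, "detail": "auto"}
                    elif file_type.extension in ('pdf',):
                        item = {"type": "input_file", "filename": file_name, "file_data": file_data}
                    else:
                        raise ValueError(f"Unsupported file type: {file_type.extension} for file with digest {file_digest}")
                else:
                    raise ValueError(f"Cannot determine file type for file with digest {file_digest}")
                content.append(item)

        # The text prompt goes after the attached files.
        if input_text:
            blobs.append(input_text.encode())
            item = {"type": "input_text", "text": input_text}
            content.append(item)

        # One digest over all file bytes and the prompt text, e.g. for caching.
        digest = xxhash.xxh3_64_hexdigest(b''.join(blobs))
        return message, digest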

photogsy · November 13, 2025, 12:36pm

I know this is a bit old now, but I have run into a similar issue with gpt-5-mini ignoring files in the vector store.

There are a few PDF files in the vector store. I ask about them, passing in the correct vector store ID (Responses API), but it ignores them and tells me it can’t find anything in the files.

The logs show that it didn’t actually search any files.
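For reference, here is a sketch of the request body for a Responses API call that searches a vector store. The helper name `file_search_request` and the ID "vs_example" used below are hypothetical. The `file_search` tool, its `vector_store_ids` field, and `include=["file_search_call.results"]` follow OpenAI's Responses API documentation. Forcing the search with `tool_choice={"type": "file_search"}` is an assumption about a possible workaround and is not confirmed anywhere in this thread.

```python
from typing import Any


def file_search_request(
    model: str,
    vector_store_id: str,
    question: str,
    force_search: bool = True,
) -> dict[str, Any]:
    """Build keyword arguments for a Responses API call using file_search.

    Hypothetical helper: the returned dict is meant to be unpacked into
    client.responses.create(**request) with the official openai SDK.
    """
    request: dict[str, Any] = {
        "model": model,
        "input": question,
        # Attach the vector store through the hosted file_search tool.
        "tools": [{"type": "file_search", "vector_store_ids": [vector_store_id]}],
        # Ask for the retrieved chunks in the output so you can check
        # whether a search actually ran.
        "include": ["file_search_call.results"],
    }
    if force_search:
        # Assumption: require the model to call file_search instead of
        # letting it decide on its own.
        request["tool_choice"] = {"type": "file_search"}
    return request
```

With `force_search=False`, the model is left to decide whether to search. That matches the behaviour described above, where it sometimes skips the search.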

OpenAI_Support · December 3, 2025, 1:34am

Hello! I haven't been able to reproduce this yet. Are you still seeing these issues with PDFs?

_j · January 19, 2026, 8:47pm

You are the opposite of correct.

Have a read of the example, the “build_pdf_contents()” docstring, or the code itself over here, which works:

In this topic’s top post, the MIME discovery is kind of redundant for the input_file path, because the only file you can send that way is a PDF, so the data URL is always:

    f"data:application/pdf;base64,{b64}"
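Here is a minimal sketch of that simpler approach, assuming the raw PDF bytes are already in memory. The function names `pdf_input_file_item` and `user_message` are hypothetical. The check for the `%PDF-` header replaces the `filetype` MIME lookup. It is a rough heuristic, since the PDF spec tolerates a few leading bytes before the header.

```python
import base64
from typing import Any


def pdf_input_file_item(pdf_bytes: bytes, filename: str = "document.pdf") -> dict[str, Any]:
    """Build a Responses API input_file content item from raw PDF bytes."""
    # Well-formed PDFs normally begin with the "%PDF-" header, so no MIME
    # sniffing library is needed for this single supported type.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("input does not look like a PDF (missing %PDF- header)")
    b64 = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "type": "input_file",
        "filename": filename,
        # The data URL prefix is fixed for PDFs.
        "file_data": f"data:application/pdf;base64,{b64}",
    }


def user_message(pdf_bytes: bytes, question: str) -> dict[str, Any]:
    """Wrap one PDF plus a text prompt into a single user message."""
    return {
        "role": "user",
        "content": [
            pdf_input_file_item(pdf_bytes),
            {"type": "input_text", "text": question},
        ],
    }
```

With the official openai Python SDK, the resulting message would be passed as one element of `input=[...]` to `client.responses.create(...)`.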
