Chat Completions vs Responses and PDF files (new PDF vision upload modality added to Chat Completions)

I recently started testing the OpenAI GPT API and came across a strange situation while experimenting in the Playground.

When I send a file for summarization, the Chat Completions API consumes significantly fewer input tokens compared to the Responses API.

Why is that?

Chat Completions
In: 2569 t
Out: 343 t

Responses
In: 77200 t
Out: 367 t

Code snippet from Playground:

Chat Completions

In: 2569 t
Out: 343 t

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {
      "role": "system",
      "content": [
        {
          "text": "You receive a file with an article.\n Summarize it in bullet points.",
          "type": "text"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": ""
        },
        {
          "type": "file",
          "file": {
            "filename": "file.pdf",
            "file_data": "data:application/pdf;base64,JVBE...."
          }
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "..."
        }
      ]
    }
  ],
  response_format={
    "type": "text"
  },
  temperature=1,
  max_completion_tokens=2048,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)


----
Responses

In: 77200 t
Out: 367 t

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
  model="gpt-4o-mini",
  input=[
    {
      "role": "system",
      "content": [
        {
          "type": "input_text",
          "text": "You receive a file with an article.\n Summarize it in bullet points."
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "input_file",
          "filename": "file.pdf",
          "file_data": "data:application/pdf;base64,JVBE..."
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "..."
        }
      ]
    }
  ],
  text={
    "format": {
      "type": "text"
    }
  },
  reasoning={},
  tools=[],
  temperature=1,
  max_output_tokens=2048,
  top_p=1,
  store=True
)

Chat Completions does not support “file upload”, you say. You’d now be wrong.

Whatever the Playground is doing, it should not be doing it without a dialog or warning, for the benefit of developers who might be expecting vector store behavior.

Starting fresh at the Chat Completions Prompts Playground and attaching a 6MB PDF: 81000 input tokens.


The body sent by the Playground is the whole file at 4/3 its size in bytes:

{"messages":[{"role":"system","content":[{"type":"text","text":"(system message)"}]},{"role":"user","content":[{"type":"text","text":"(my user message)"},{"type":"file","file":{"file_data":"data:application/pdf;base64,JVBERi0xLjUKJY8KMTgxIDAgb2JqCjw8IC9GaWx0ZXIgL0ZsYXRlRGVjb2RlIC9MZW5ndGggMT.. (continues for the full file, sent in the "type":"file" part.)
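That 4/3 growth is just base64 overhead: every 3 raw bytes become 4 ASCII characters. A minimal sketch of building such a data URL, with a hypothetical file path:

```python
import base64

def pdf_to_data_url(path: str) -> str:
    """Wrap a PDF in a data: URL the way the Playground does."""
    with open(path, "rb") as f:
        raw = f.read()
    # base64 encodes every 3 raw bytes as 4 ASCII characters (~4/3 growth)
    b64 = base64.b64encode(raw).decode("ascii")
    return f"data:application/pdf;base64,{b64}"
```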


With no big announcement, a file modality has been added to Chat Completions user-role messages as well, just like images.

And the playground is sending them.

The Chat Completions API reference showing this file part still says "Currently, only functions are supported as a tool." So we have to go to the docs to see what is being done without vector stores; there is now a non-prominent drop-down to switch to a near duplicate-looking version of the documentation for Chat Completions.

OpenAI models with vision capabilities can also accept PDF files as input. PDFs can be provided either as Base64-encoded data or via file IDs obtained after uploading files to the /v1/files endpoint through the API or dashboard.

and:

To help models understand PDF content, we put into the model’s context both extracted text and an image of each page—regardless of whether the page includes images. Before deploying your solution at scale, ensure you understand the pricing and token usage implications of using PDFs as input.

The token implication: $0.20 for a single test message, plus an unknowable amount of context-filling on top of whatever you sent as context yourself in messages.

This is the same mechanism as the Responses file upload feature: not targeting a vector store, accepting only PDFs, and filling the same context. The Responses input_file part takes filename and file_data fields.
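For completeness, the file-ID route the docs mention (upload to /v1/files once, then reference by ID instead of inlining base64) would look roughly like this. The content-part shape follows the docs quoted above; the `purpose="user_data"` value and the surrounding SDK calls are my assumptions, so verify against the current reference:

```python
def pdf_file_part(file_id: str) -> dict:
    """Chat Completions content part referencing an uploaded PDF by ID
    rather than inlining base64 file_data."""
    return {"type": "file", "file": {"file_id": file_id}}

# Usage sketch (needs an API key, so not executed here):
# from openai import OpenAI
# client = OpenAI()
# uploaded = client.files.create(file=open("file.pdf", "rb"), purpose="user_data")
# response = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": [
#         {"type": "text", "text": "Summarize it in bullet points."},
#         pdf_file_part(uploaded.id),
#     ]}],
# )
```

Either way the PDF still lands in the model's context; the file ID just avoids re-sending megabytes of base64 with every request.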


I checked my PDF file and it is an image (a page printed to PDF from Firefox), 1MB in size.

I'm attaching the file using the attach-file button in the user message box.

I still don't understand why Chat Completions with a file uses ~2500 input tokens while Responses uses ~70000.

The billing for gpt-4o-mini is likely also being switched on you, or they have an issue.

Images given to gpt-4o-mini are billed at roughly twice the actual cost and counted at about 33x the token consumption of the same image on gpt-4o.

This PDF feature loads the pages as images into the model's context.
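To make the 33x concrete, here is a sketch of the image-token arithmetic. The tile constants (85 base / 170 per tile for gpt-4o, 2833 / 5667 for gpt-4o-mini), the resize rules, and the $/1M prices are assumptions taken from OpenAI's published vision pricing, so treat the results as estimates:

```python
import math

def image_tokens(width: int, height: int, base: int, per_tile: int) -> int:
    """Estimate high-detail vision tokens: fit the image within 2048x2048,
    scale the shortest side down to 768, then count 512x512 tiles."""
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return base + per_tile * tiles

# gpt-4o: base 85, 170 per tile; gpt-4o-mini counts ~33.33x those values
gpt4o = image_tokens(1024, 1024, 85, 170)    # 85 + 170*4 = 765
mini = image_tokens(1024, 1024, 2833, 5667)  # 2833 + 5667*4 = 25501
# With assumed input prices of $0.15/1M (mini) vs $2.50/1M (gpt-4o),
# the inflated token count nets out to about 2x the cost per image:
cost_ratio = (mini * 0.15) / (gpt4o * 2.50)  # ~2.0
```

A handful of PDF pages rendered this way easily reaches tens of thousands of input tokens, which is consistent with the ~70000-token figures reported above.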

Are you still running into this issue? I'm not able to replicate it locally. If you are, could you send me an email at nikunj [at] openai.com and I'll take a deeper look.

Sorry for the trouble