How does OpenAI charge tokens when sending PDF content in a prompt?

Hi everyone, I have a question regarding how OpenAI charges for PDF content sent in a prompt.

Here’s the context: I’m sending a PDF file (encoded in base64) as part of the messages property to extract data from it. I’m not using file storage—just embedding the content directly in the request.

The code:

from openai import OpenAI
import base64
from pydantic import BaseModel

client = OpenAI()

file_name = "my_pdf.pdf"

# Read the PDF and base64-encode it so it can be embedded inline in the request.
with open(file_name, "rb") as file:
    file_data = file.read()
    base64_data = base64.b64encode(file_data).decode('utf-8')


# Structured-output schema: we only want the email address.
class DataExtractor(BaseModel):
    email: str


response = client.beta.chat.completions.parse(
  model="gpt-4.1-mini",
  messages=[
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "You are an expert at extracting data from PDF files."
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "PDF file:\n"
        },
        {
          "type": "file",
          "file": {
            "file_data": "data:application/pdf;base64," + base64_data,
            "filename": file_name,
          }
        }
      ]
    }
  ],
  response_format=DataExtractor
)

data: DataExtractor = response.choices[0].message.parsed
print(data)
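
To see what a request actually costs, you can check the usage block on the response; it reports the billed token counts, file content included:

# The file's content counts toward prompt_tokens like any other input.
usage = response.usage
print(f"prompt tokens:     {usage.prompt_tokens}")
print(f"completion tokens: {usage.completion_tokens}")
print(f"total tokens:      {usage.total_tokens}")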

When I check the OpenAI API pricing page, I don’t see any distinction between tokens generated from regular text vs. tokens coming from a file.

My assumption is that the token pricing applies the same way—regardless of whether the content originated from a PDF or plain text—since it’s all just tokens in the end.

This question is important to me because I’ve noticed that this strategy works surprisingly well when dealing with PDFs that contain text-based images. It effectively bypasses the need for a separate OCR process—which is a big win in terms of simplicity and performance.

Is this correct? Is the content of a base64-encoded PDF treated the same as any other text tokens for billing purposes?

Thanks in advance!

https://platform.openai.com/docs/guides/pdf-files#how-it-works

How it works

To help models understand PDF content, we put into the model’s context both the extracted text and an image of each page. The model can then use both the text and the images to generate a response. This is useful, for example, if diagrams contain key information that isn’t in the text.

https://platform.openai.com/docs/guides/pdf-files#usage-considerations

Token usage

To help models understand PDF content, we put into the model’s context both extracted text and an image of each page—regardless of whether the page includes images. Before deploying your solution at scale, ensure you understand the pricing and token usage implications of using PDFs as input. More on pricing.

Answering that would probably require counting the tokens of your own text with something like tiktoken and comparing against the prompt tokens reported in the response; the difference is the file input.
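
A rough sketch of that idea (gpt-4.1 models use the o200k_base encoding; message framing and the structured-output schema also add a few tokens, so treat the result as an estimate):

import tiktoken

# Count the tokens in the plain-text parts of the prompt.
enc = tiktoken.get_encoding("o200k_base")
plain_text = "You are an expert at extracting data from PDF files." + "PDF file:\n"
text_tokens = len(enc.encode(plain_text))

# Whatever the API reports beyond this is (roughly) the file's contribution.
reported = response.usage.prompt_tokens
print(f"plain-text tokens (estimate): {text_tokens}")
print(f"reported prompt tokens:       {reported}")
print(f"approx. file contribution:    {reported - text_tokens}")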


@johnidouglas From the docs, it seems page images are included precisely so that image-based PDFs can be read without a separate OCR step. Could you compare the token usage for two versions of the same PDF, one with actual text and the other with an image of that text (which would otherwise require OCR), and let us know what you find?

For reference, the pricing for images is available here, and is capped at 85 tokens at the ‘low’ detail setting for the 4.1-mini model. My assumption, then, is that each page of the PDF carries a maximum overhead of only 85 tokens (in addition to the tokens for the actual text), which would be a really good deal.


4.1-mini doesn’t have a detail parameter. Its image cost is calculated by a “patches” formula, with a maximum of 1536 base tokens before the model’s billing multiplier. For an 8.5×11 US-letter page, the cost works out to 2424 tokens billed, $0.0009696 per image, or about $0.01 per 10 pages for the image component of a PDF.

I just grabbed a research paper PDF off my PC to make an image-only page. Its content is narrower, and turns out to be more expensive. Image calculations from passing the image at maximum size (or larger):

Input: 1304 × 2050 px → resized: 992 × 1559 px (31 × 49 patches)
Model: gpt-4.1-mini at $0.4000/1M input, billing multiplier 1.62×
Price: $0.0009844; base tokens: 1519; input tokens billed: 2461
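
For anyone who wants to reproduce those numbers, here is a sketch of the patch calculation as I understand the published formula: tile the image into 32×32 px patches, shrink it if the raw count exceeds 1536, then apply the model’s billing multiplier (1.62x for gpt-4.1-mini):

import math

def image_tokens(width, height, multiplier=1.62, cap=1536, patch=32):
    w, h = float(width), float(height)
    # Raw 32x32 patch count at native resolution.
    if math.ceil(w / patch) * math.ceil(h / patch) > cap:
        # Shrink so the image covers at most `cap` patches...
        shrink = math.sqrt(cap * patch * patch / (w * h))
        w, h = w * shrink, h * shrink
        # ...then snap the width down to a whole number of patches.
        scale = math.floor(w / patch) / (w / patch)
        w, h = w * scale, h * scale
    base = min(math.ceil(w / patch) * math.ceil(h / patch), cap)
    return math.ceil(base * multiplier)

print(image_tokens(1304, 2050))  # 2461 (992 x 1559 -> 31 x 49 = 1519 base patches)
print(image_tokens(1088, 1408))  # 2424, one plausible US-letter rendering (34 x 44 patches)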

However, running the image-only PDF on the Responses API endpoint against gpt-4.1-mini bills 1,296 tokens. Even allowing for the extracted text placed internally alongside each PDF page, that figure is inexplicable: per the billing, the image would be about 700×1100 px, which doesn’t align with any published size. Maybe 108 pixels per inch.

So no, 1248 tokens billed for a PDF image plus its PDF overhead is not 85 tokens.


I created this repository — openai-content-files — to demonstrate the results based on your suggestions.

The sample text used was:
“My name is John and my email is John@john.com.”
(Sorry for the typo in the file itself: “an” should be “and.”)

The results show that PDFs containing text-based images consume significantly more tokens than plain-text PDFs:

  • doc-plain-text.pdf used 319 tokens
  • doc-text-based-images.pdf used 1,089 tokens

I assume the cost per token is the same in both cases (for the model used in this example), but the PDF with text-based images results in much higher token usage.

I’m not entirely sure that every PDF containing images will consistently result in more tokens. I’ll run additional tests to investigate this further.
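
For anyone who wants to rerun the comparison, the repo’s test boils down to something like this (same request shape as my first post; only the reported usage is read back):

import base64
from openai import OpenAI

client = OpenAI()

def prompt_tokens_for(pdf_path: str) -> int:
    # Embed the PDF inline, exactly as before, and return the
    # prompt token count the API reports for the request.
    with open(pdf_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the email from this PDF."},
                {"type": "file", "file": {
                    "file_data": "data:application/pdf;base64," + b64,
                    "filename": pdf_path,
                }},
            ],
        }],
    )
    return response.usage.prompt_tokens

for name in ("doc-plain-text.pdf", "doc-text-based-images.pdf"):
    print(name, prompt_tokens_for(name))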
