I recently started testing the OpenAI GPT API and came across a strange situation while experimenting in the Playground.
When I send a file for summarization, the Chat Completions API consumes significantly fewer input tokens compared to the Responses API.
Why is that?
Chat Completions
In: 2569 t
Out: 343 t
Responses
In: 77200 t
Out: 367 t
Code snippet from Playground:
Chat Completions
In: 2569 t
Out: 343 t
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{
"role": "system",
"content": [
{
"text": "You receive a file with an article.\n Summarize it in bullet points.",
"type": "text"
}
]
},
{
"role": "user",
"content": [
{
"type": "text",
"text": ""
},
{
"type": "file",
"file_data": "data:application/pdf;base64,JVBE....
"file": {
"filename": "file.pdf"
}
}
]
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "..."
}
]
}
],
response_format={
"type": "text"
},
temperature=1,
max_completion_tokens=2048,
top_p=1,
frequency_penalty=0,
presence_penalty=0
)
----
Responses
In: 77200 t
Out: 367 t
from openai import OpenAI
client = OpenAI()
response = client.responses.create(
model="gpt-4o-mini",
input=[
{
"role": "system",
"content": [
{
"type": "input_text",
"text": "You receive a file with an article.\n Summarize it in bullet points."
}
]
},
{
"role": "user",
"content": [
{
"file_data": "data:application/pdf;base64,JVBE...
"type": "input_file",
"filename": "file.pdf",
}
]
},
{
"role": "assistant",
"content": [
{
"type": "output_text",
"text": "..."
}
]
}
],
text={
"format": {
"type": "text"
}
},
reasoning={},
tools=[],
temperature=1,
max_output_tokens=2048,
top_p=1,
store=True
)