Hi everyone,
I’m building a multimodal OCR/translation tool using the OpenAI Python SDK, and I want to avoid embedding large Base64 image strings in prompts because of the huge token cost. My ideal flow is:
- Upload a preprocessed image (JPEG with resizing/compression) to OpenAI.
- Get back a file/image reference ID.
- Send a chat completion request referencing that uploaded image so the model can process it (e.g., OCR + translation) without me putting the whole Base64 in the prompt.
Environment / Context
- openai Python SDK version: 1.97.1
- Models: gpt-4.1 (and variants like gpt-4.1-mini, gpt-4.1-nano)
- Python: 3.13 on macOS
- Current fallback (works): Inline Base64 of a compressed JPEG (resize to max width 1024, quality=70) embedded in the prompt, but it costs ~30k+ tokens per image because of the encoded size.
- Desired: Use “file upload + reference ID” instead, and only send a small textual prompt to the model.
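For reference, my current working fallback looks roughly like this (the placeholder bytes below stand in for my resized/compressed JPEG; the image_url data-URI content part is the shape I'm using today):

```python
import base64

# Current fallback: inline Base64 as a data URI inside the message.
# (img_bytes is a stand-in here; in the real tool it's the JPEG after
# resizing to max width 1024 at quality=70.)
img_bytes = b"\xff\xd8\xff\xe0-fake-jpeg-bytes"
data_uri = "data:image/jpeg;base64," + base64.b64encode(img_bytes).decode("ascii")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text and translate it to English."},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }
]
# client.chat.completions.create(model="gpt-4.1", messages=messages)
print(data_uri[:40])
```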
What I’m seeing
• The inline Base64 path works and returns results, but consumes tens of thousands of tokens per image.
• Attempting client.files.upload(…) fails or the method seems unavailable: in some runs the client has no upload attribute at all; in others the call raises an error.
• I have logging around the upload branch and fallback; when upload is skipped, it prints that it’s falling back to base64.
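Back-of-envelope math behind the ~30k-token figure (hedged: the tokens-per-character ratio for Base64 text is a rough rule of thumb, not an exact tokenizer count):

```python
# Rough estimate of the inline-Base64 token cost for one image.
jpeg_bytes = 90_000              # ~90 KB after resizing/compression (typical for me)
b64_chars = jpeg_bytes * 4 // 3  # Base64 inflates size by about 4/3
tokens_low = b64_chars // 4      # assuming ~3-4 characters per token for Base64 text
tokens_high = b64_chars // 3
print(f"~{tokens_low}-{tokens_high} tokens")  # roughly 30k-40k, matching what I see
```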
Specific questions
1. With openai==1.97.1 and the GPT-4.1 family, is the “file upload + reference ID” pattern supported for sending images (so the model can see/process the image) instead of inline Base64?
2. If so, what is the correct, current way to perform that in a chat.completions.create(…) call—i.e., once I have the file_id, how should I reference the image so the model uses it (without embedding the Base64)?
3. Why might client.files.upload be missing or not work in some contexts? Could it be due to:
• SDK misuse (wrong parameters / naming)?
• Using a non-vision-enabled endpoint or model configuration?
• Account/feature flags or required opt-ins?
4. What’s the semantic difference between purpose="vision" and purpose="assistants" when uploading an image to be used as input? Are there scenarios where one works and the other doesn’t?
5. If the file upload/reference path cannot be made to work in my environment, what is the best fallback strategy that minimizes token usage while still giving the model access to image content?
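For concreteness on question 2, here is the content shape I've been guessing at, pieced together from the Responses API docs (fully hedged: the input_text/input_image key names, and the idea that a file_id can be referenced this way, are my assumptions; I have not gotten this working, and the file_id is a placeholder):

```python
import json

# Hypothetical request body (assumption: mirrors the Responses API
# "input_image" content part; NOT verified against chat.completions).
file_id = "file-abc123"  # placeholder; would come from a successful upload

request_input = [
    {
        "role": "user",
        "content": [
            {"type": "input_text",
             "text": "OCR this image and translate the text to English."},
            {"type": "input_image", "file_id": file_id},
        ],
    }
]

# The call I imagine making (untested):
#   resp = client.responses.create(model="gpt-4.1-mini", input=request_input)
print(json.dumps(request_input, indent=2))
```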
Thanks in advance!
Minimal reproducible snippet I’m using to probe the upload path:
from openai import OpenAI
import io

client = OpenAI(api_key="YOUR_API_KEY")  # openai==1.97.1

# Load and prepare image
with open("test.jpg", "rb") as f:
    img_bytes = f.read()
buf = io.BytesIO(img_bytes)
buf.name = "image.jpg"  # Ensure multipart upload can infer the filename

# Debug introspection
print("[Debug] client attrs:", dir(client))
print("[Debug] has files:", hasattr(client, "files"))
if hasattr(client, "files"):
    print("[Debug] client.files attrs:", dir(client.files))

# Attempt file upload
try:
    file_obj = client.files.upload(file=buf, purpose="vision", file_name="image.jpg")
    print("Upload succeeded:", file_obj)
    file_id = getattr(file_obj, "id", None) or file_obj.get("id")
    print("Received file_id:", file_id)
except Exception as e:
    print("Upload failed:", e)
    # Fallback to the inline Base64 path here