TL;DR: application/octet-stream is the MIME type YOU are sending. The API expects you to send images labeled with an image MIME type.
The “something wrong” is likely how the SDK inspects your files and then builds the multipart/form-data request. It could also be that the image exceeds what can be sent, or that what you fetched was actually an HTML wrapper page rather than the image itself. Write your own API call code.
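For the HTML-wrapper case, a cheap guard before uploading is to look at what you actually downloaded. This is a minimal sketch; `check_download` is a hypothetical helper, and letting `application/octet-stream` through is my own assumption, since some servers label real images that way.

```python
from typing import Optional


# Hypothetical helper: reject fetched payloads that are not really image bytes
def check_download(content_type: Optional[str], data: bytes) -> None:
    """Raise ValueError if a fetched payload does not look like an image."""
    if not data:
        raise ValueError("empty payload")
    # Drop parameters such as "; charset=utf-8" and normalize case
    ctype = (content_type or "").split(";", 1)[0].strip().lower()
    # Some servers label images application/octet-stream, so let that through
    # and rely on the byte check below instead
    if ctype and not ctype.startswith("image/") and ctype != "application/octet-stream":
        raise ValueError(f"server sent {ctype!r}, not an image")
    # An HTML wrapper page starts with markup, whatever the header says
    head = data[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        raise ValueError("payload is an HTML page, not image bytes")
```

With `urllib.request`, you would call it as `check_download(resp.headers.get("Content-Type"), data)` right after `data = resp.read()`.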
For success, download any remote URLs yourself, and improve what the model sees by sizing the image for the model's input (a 512x512 base for edits, even though larger outputs are possible). If you are using the input detail parameter, an undocumented resolution increase might be possible.
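Sizing for the model input mostly comes down to the target dimensions. Here is a small sketch, assuming you want the longer side capped at 512 px (the figure above, not an official constant) with the aspect ratio kept and no upscaling; `fit_within` is a hypothetical name.

```python
from typing import Tuple


def fit_within(width: int, height: int, box: int = 512) -> Tuple[int, int]:
    """Return (w, h) scaled down so the longer side is at most `box`, keeping aspect ratio.

    Never upscales. The 512 default is an assumption taken from the post.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= box:
        return width, height  # already small enough; leave it alone
    scale = box / longest
    # round() stays closest to the original ratio; max(1, ...) avoids a zero-pixel side
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With Pillow 9.1 or newer you could apply it as `im.resize(fit_within(*im.size), Image.Resampling.LANCZOS)`. If you don't need the numbers yourself, `im.thumbnail((512, 512))` does roughly the same fit-and-never-upscale in place.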
Here is just an idea of what taking control of the object-oriented contents of your API request could look like:
```python
import io
from dataclasses import dataclass
from pathlib import Path

from PIL import Image, ImageOps, UnidentifiedImageError


@dataclass
class InMemImage:
    """A single in-memory image with filename, buffer, MIME, format, dims, alpha."""
    filename: str
    data: io.BytesIO
    mime: str
    fmt: str
    width: int
    height: int
    alpha: bool

    @classmethod
    def from_path(cls, path: Path) -> "InMemImage":
        """Load bytes, sniff real format with Pillow, correct extension & MIME."""
        raw = path.read_bytes()
        bio = io.BytesIO(raw)
        try:
            # Open the original file bytes once
            with Image.open(io.BytesIO(raw)) as im0:
                # Capture format BEFORE any transforms (transpose/convert can unset it)
                fmt = (im0.format or "").upper()
                # Do EXIF orientation fix on a derived image for accurate dims/mode
                im = ImageOps.exif_transpose(im0)
                w, h = im.size
                has_alpha = im.mode in ("RGBA", "LA") or (
                    im.mode == "P" and "transparency" in im.info
                )
        except UnidentifiedImageError as e:
            raise ValueError(f"Unrecognized/unsupported image file: {path}") from e

        # Fallback: infer format from extension if Pillow didn't provide one
        if not fmt:
            ext_guess = path.suffix.lower().lstrip(".")
            ext_to_fmt = {
                "png": "PNG", "jpg": "JPEG", "jpeg": "JPEG", "webp": "WEBP",
                "gif": "GIF", "tif": "TIFF", "tiff": "TIFF", "bmp": "BMP",
                "avif": "AVIF", "heif": "HEIF",
            }
            fmt = ext_to_fmt.get(ext_guess, "")
            if not fmt:
                # Still unknown type: bail with a helpful message
                raise ValueError(f"Unrecognized image format: {path}")

        # Pillow reports some camera JPEGs as "MPO" (multi-picture JPEG); treat them as JPEG
        fmt_to_ext = {
            "PNG": "png", "JPEG": "jpg", "JPG": "jpg", "MPO": "jpg", "WEBP": "webp",
            "GIF": "gif", "TIFF": "tiff", "BMP": "bmp", "AVIF": "avif", "HEIF": "heif",
        }
        fmt_to_mime = {
            "PNG": "image/png", "JPEG": "image/jpeg", "JPG": "image/jpeg",
            "MPO": "image/jpeg", "WEBP": "image/webp", "GIF": "image/gif",
            "TIFF": "image/tiff", "BMP": "image/bmp", "AVIF": "image/avif",
            "HEIF": "image/heif",
        }
        ext = fmt_to_ext.get(fmt)
        mime = fmt_to_mime.get(fmt)
        if not ext or not mime:
            raise ValueError(f"Unsupported image format {fmt} for {path}")

        fixed_name = f"{path.stem}.{ext}"  # correct/replace missing or wrong extension
        return cls(
            filename=fixed_name,
            data=bio,
            mime=mime,
            fmt=fmt,
            width=w,
            height=h,
            alpha=has_alpha,
        )

    def reencode_as_png(self, pil_image: Image.Image) -> None:
        """Write back as high-compression PNG; update filename/mime/dims/alpha."""
        ...  # body elided in the original post ("etc"), as is the rest of the class
```
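If you'd rather not pull in Pillow just to detect the format, a rough stdlib-only stand-in for the sniff-then-fall-back-to-extension logic in `from_path` is a magic-number check. This swaps in a different technique (leading-byte signatures instead of Pillow's decoder), covers only PNG, JPEG, GIF and WebP, which I'm assuming are what you upload, and the helper names are hypothetical.

```python
from pathlib import PurePath
from typing import Optional

# Leading "magic number" bytes for the formats most likely to be uploaded
_MAGIC = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)
# Extension fallback, used only when the bytes are not recognized
_EXT_TO_MIME = {"png": "image/png", "jpg": "image/jpeg", "jpeg": "image/jpeg",
                "gif": "image/gif", "webp": "image/webp"}


def sniff_mime(data: bytes) -> Optional[str]:
    """Return an image MIME type from the file's leading bytes, or None."""
    for prefix, mime in _MAGIC:
        if data.startswith(prefix):
            return mime
    # WebP is a RIFF container: b"RIFF" + 4-byte size + b"WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def part_mime(filename: str, data: bytes) -> str:
    """Prefer the bytes, fall back to the extension, never default to octet-stream."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    mime = sniff_mime(data) or _EXT_TO_MIME.get(ext)
    if mime is None:
        raise ValueError(f"cannot determine an image MIME type for {filename!r}")
    return mime
```

The contrast with a filename-only guess is the point: `mimetypes.guess_type("upload")` returns `(None, None)`, and a client that falls back from there typically labels the part application/octet-stream.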
Or, instead of writing all your code for you, I’ll have GPT-5 describe my edited code for you:
- How the request sends MIME types (including the input images)
  - The body is multipart/form-data, built via httpx’s `files=` argument. For each image, `InMemImage.as_multipart` returns a tuple of the form `("image[]", (filename, fileobj, content_type))`, where `content_type` is `self.mime` (for example, `image/jpeg` or `image/png`). httpx uses that as the Content-Type for that individual part.
  - httpx also sets the overall request header `Content-Type: multipart/form-data; boundary=…` automatically; you don’t need to set it yourself.
  - Non-file fields (model, prompt, size, etc.) are added as `(k, (None, str(v)))` tuples, which makes them normal text parts in the multipart payload.
  - The Accept header in `get_auth_headers` controls the expected response type, not the upload MIME. For streaming it is `text/event-stream`; otherwise it’s `application/json`.
- How the code ensures a valid MIME type via file inspection
  - It reads the raw bytes and opens them with Pillow: `with Image.open(io.BytesIO(raw)) as im0:`. `im0.format` is taken as the authoritative file type (sniffed from the bytes), not the filename extension.
  - If Pillow can’t identify the image, the code raises a clear error (`Unrecognized/unsupported image file`).
  - If `im0.format` is missing, it falls back to a small extension-to-format map (png, jpg/jpeg, webp, gif, tif/tiff, bmp, avif, heif).
  - It then maps the detected format to:
    - a canonical filename extension (`fmt_to_ext`)
    - a correct MIME type string (`fmt_to_mime`), e.g., JPEG → `image/jpeg`, PNG → `image/png`
  - If either the extension or the MIME type can’t be found, it raises an error (`Unsupported image format …`).
  - If the code resizes an image in `tweak_images`, it re-encodes to PNG via `reencode_as_png` and updates both `filename` and `mime` to PNG. This guarantees the outgoing part’s Content-Type matches the bytes actually sent.

Net effect: the MIME type attached to each multipart file part is derived from the inspected bytes (Pillow) and only falls back to the extension if Pillow didn’t provide a format, with safeguards and clear errors for unknown or unsupported formats.
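To see what httpx’s `files=` argument puts on the wire, here is a stdlib-only sketch of the same multipart/form-data layout: text fields get no filename and no per-part Content-Type, while file parts get both. `encode_multipart` and its `(name, filename, content_type, payload)` tuple are hypothetical, not httpx’s API. The sketch skips the quoting, escaping and streaming a real client handles, and assumes field names and filenames contain no quotes or line breaks.

```python
import uuid
from typing import List, Optional, Tuple

# Each part: (field name, filename or None, content type or None, payload bytes)
Part = Tuple[str, Optional[str], Optional[str], bytes]


def encode_multipart(parts: List[Part]) -> Tuple[bytes, str]:
    """Build a multipart/form-data body; return (body, Content-Type header value)."""
    # A random hex boundary is vanishingly unlikely to appear inside the payloads
    boundary = uuid.uuid4().hex
    out = bytearray()
    for name, filename, ctype, payload in parts:
        out += f"--{boundary}\r\n".encode()
        disp = f'Content-Disposition: form-data; name="{name}"'
        if filename is not None:  # file parts carry a filename...
            disp += f'; filename="{filename}"'
        out += (disp + "\r\n").encode()
        if ctype is not None:  # ...and their own per-part Content-Type
            out += f"Content-Type: {ctype}\r\n".encode()
        out += b"\r\n" + payload + b"\r\n"
    out += f"--{boundary}--\r\n".encode()
    return bytes(out), f"multipart/form-data; boundary={boundary}"
```

You could then POST `body` with `urllib.request.Request(url, data=body, headers={"Content-Type": header}, method="POST")` plus your own auth header, though letting httpx build it, as described above, is less error-prone.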