The ChatGPT version can, but the API version still has these docs:
The latest o1 model supports both text and image inputs, and produces text outputs (including Structured Outputs). o1-mini currently only supports text inputs and outputs.
Great that it can accept image inputs. I put together a solution that seems to be working pretty well as a service that can take a pdf and return base64 encoded images of each page of the PDF that can then be fed in as an image_url: github/kgn/pdf2imgs