I am trying to write my app that can send both images and pdf attachments to ChatGPT 4o.
I am aware that using the openAI Assistant feature with FileID makes reading PDF possible with ChatGPT4o. And I am also aware of using normal completion API with image_url makes reading image possible.
The annoying part is that the Assistant feature doesn’t support images, and on the contrary sending PDF as image_url with completion API results in “Invalid MIME type. Only image types are supported.”
Do I really have to check file extension first to determine to use Assistant or Completion?
I see that Assistant thread message creation has this image_url feature, but it doesn’t work quite the same way as the CompletionAPI where it can accept local image_url rather than the thread message only accepts external image_url.
I guess I could first determine if the extension is non-pdf file, then I will upload the image file to an image hosting service to provide the image_url to thread message.
The Assistants API supports file upload of images first to your API storage, and then you can attach an image to a user message placed into a thread by using the file ID received.
That method is clearly depicted in the image of the expanded API reference above.
This will not currently work though, because OpenAI broke the API and has not rectified the issue for a week.
I see their API is giving me an error message “Sorry, something went wrong.” and because of that I thought the Thread Assistance does not have image functionality yet.
Thank you for clarifying this!
Has anyone gotten this to work yet? And uploading the image to a hosting service did not work since it needs to be a trusted public image source like Wikicommons, BBC, etc.
sorry for the delay, I just noticed your question. I am using chat completions for images and assistant for pdfs. Then I am getting their answer on my main assistant. Connecting those using function calls, and sometimes calling the needed pdf-assistant/image-chat-completion directly when I am receiving a file, not URL. For the context, I am doing these for WhatsApp and Telegram chatbots