There’s an example in Python (Introduction_to_gpt4o.ipynb) where they extract the frames from a video and then use the gpt-4o model to analyse the images.
I am trying to do something similar, but on Android using Kotlin, with images from the gallery.
Here’s my encoding code:
private fun encodeImageToBase64(uri: Uri): String {
    val inputStream: InputStream? = contentResolver.openInputStream(uri)
    val bitmap = BitmapFactory.decodeStream(inputStream)
    inputStream?.close()
    // Ensure the bitmap is not null
    if (bitmap == null) {
        throw IllegalArgumentException("Failed to decode bitmap from input stream")
    }
    // Resize and compress the bitmap
    val resizedBitmap = Bitmap.createScaledBitmap(bitmap, 400, 400, true) // smaller size
    val byteArrayOutputStream = ByteArrayOutputStream()
    resizedBitmap.compress(Bitmap.CompressFormat.JPEG, 10, byteArrayOutputStream) // quality 10 = heavy compression
    val byteArray = byteArrayOutputStream.toByteArray()
    return Base64.getEncoder().encodeToString(byteArray)
}
With this, the chat completion API call using gpt-4o returns: “I’m sorry, but I can’t view images directly. However, if you describe the image or provide details about it, I can certainly help you with your math homework or any questions you have related to the image!”
If I don’t resize the image, it tells me: Failed to get answer: This model’s maximum context length is 128000 tokens. However, your messages resulted in 651934 tokens. Please reduce the length of the messages.
Does anyone know what I can do to make it work like in Python?
Thanks in advance for the help.
It sounds like you are sending the image the wrong way: the model is receiving the raw base64 string as plain language input, which is why it says it can’t view images (and why the unresized version blows past the context limit).
The image has to be sent as a special message part. This is the same content-array technique the “upload video” cookbook example uses; it works if followed closely, even though that notebook isn’t the canonical documentation for it.
Here is the format for sending it, which you can also see in the API reference: when the user message’s content is an array, the image goes in as a second object of type image_url, and it also takes a detail setting, where “detail”: “low” caps the image at 512x512.
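As a sketch of that request body (assuming the standard Chat Completions endpoint and a JPEG from the encoder above; buildVisionPayload and the sample question are my own names, not from the SDK), the base64 string is embedded in a data URL inside an image_url part rather than pasted into the text:

```kotlin
// Builds the Chat Completions request body with a text part and an image part.
// The base64 string from encodeImageToBase64() goes into a data URL;
// "detail": "low" tells the API to process the image at max 512x512.
fun buildVisionPayload(base64Image: String, question: String): String = """
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "$question" },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,$base64Image",
            "detail": "low"
          }
        }
      ]
    }
  ]
}
""".trimIndent()
```

You would then POST this string as the request body with your usual HTTP client, instead of concatenating the base64 text into a plain string content field.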