There’s an example in Python (Introduction_to_gpt4o.ipynb) where they extract the frames from a video and then use the gpt-4o model to analyse the images.
I am trying to do something similar, but on Android using Kotlin, with images from the gallery.
Here’s my encoding code:
private fun encodeImageToBase64(uri: Uri): String {
    val inputStream: InputStream? = contentResolver.openInputStream(uri)
    val bitmap = BitmapFactory.decodeStream(inputStream)
    inputStream?.close()
    // Ensure the bitmap is not null
    if (bitmap == null) {
        throw IllegalArgumentException("Failed to decode bitmap from input stream")
    }
    // Resize and compress the bitmap
    val resizedBitmap = Bitmap.createScaledBitmap(bitmap, 400, 400, true) // smaller size
    val byteArrayOutputStream = ByteArrayOutputStream()
    resizedBitmap.compress(Bitmap.CompressFormat.JPEG, 10, byteArrayOutputStream) // quality 10 = heavy compression
    val byteArray = byteArrayOutputStream.toByteArray()
    return Base64.getEncoder().encodeToString(byteArray)
}
With this, the chat completion API call using gpt-4o returns: “I’m sorry, but I can’t view images directly. However, if you describe the image or provide details about it, I can certainly help you with your math homework or any questions you have related to the image!”
If I don’t resize the image, it tells me: Failed to get answer: This model’s maximum context length is 128000 tokens. However, your messages resulted in 651934 tokens. Please reduce the length of the messages.
Does anyone know what I can do to make it work like in Python?
Thanks in advance for the help.
It sounds like you are sending the image the wrong way: the model is receiving the raw base64 string as plain language input, which is why it says it can’t view images (and why the unresized version blows past the context limit).
The image has to be sent as a special message part. This is the same content-array technique the “upload video” cookbook example uses; it works if followed closely, even though that notebook isn’t the canonical documentation for it.
Here is the format for sending it, which you can also see in the API reference: when the user message’s content is an array, the image goes in as a second object of type image_url, and it also takes a detail setting, where “detail”: “low” caps the image at 512x512.
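As a sketch of that request body (assuming the standard Chat Completions endpoint and a JPEG from the encoder above; buildVisionPayload and the sample question are my own names, not from the SDK), the base64 string is embedded in a data URL inside an image_url part rather than pasted into the text:

```kotlin
// Builds the Chat Completions request body with a text part and an image part.
// The base64 string from encodeImageToBase64() goes into a data URL;
// "detail": "low" tells the API to process the image at max 512x512.
fun buildVisionPayload(base64Image: String, question: String): String = """
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "$question" },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,$base64Image",
            "detail": "low"
          }
        }
      ]
    }
  ]
}
""".trimIndent()
```

You would then POST this string as the request body with your usual HTTP client, instead of concatenating the base64 text into a plain string content field.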