For long running conversations, we suggest passing images via URLs instead of base64
at https://platform.openai.com/docs/guides/vision/managing-images
Just why?
For long-running conversations, it may be that OpenAI caches the downloaded image data, so continuing to pass the same URL in past chat turns avoids a fresh download from the remote source. If you instead send the BASE64 every time (as you must with Chat Completions), re-sending the same data over and over adds network delay.
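For reference, here is roughly how the two approaches differ in a Chat Completions message body. This is a sketch: the URL and file bytes are placeholders, and only the payload shape is shown (no API call is made).

```python
import base64

# Hypothetical remote image URL, for illustration only
image_url = "https://example.com/photo.jpg"

def url_message(url):
    # Reference the image by URL: the payload stays tiny and the
    # provider fetches (and can potentially cache) the bytes itself.
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": url}},
        ],
    }

def base64_message(image_bytes, mime="image/jpeg"):
    # Embed the image as a data URL: the full, base64-inflated
    # bytes travel with every request that repeats this message.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }
```

With the URL form, the same small message can be resent on every turn; with the base64 form, the whole encoded image is repeated in each request.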
I’m surprised they even support base64. That’s the worst way mankind has ever invented to handle images, unless it’s a tiny icon or something. The entire internet uses the URL way to access images, and that’s all that was needed.
EDIT: as mentioned below, despite the inefficiency of base64, it may be needed when sending from localhost.
Well, no.
If I have a local image, I would prefer to send it over as b64 (as most email providers do) rather than upload it to a remote source first.
I should have said “most wasteful way” (of bandwidth), rather than “worst way” which is too vague, but you’re right I wasn’t considering cases when sending from localhost. I’ll edit that post. Thanks for pointing that out.
Fair. Yeah. Not as efficient as directly downloading the file.
Fortunately, modern HTTP takes care of any inefficiencies for you.
Let’s execute these steps.
First, let’s get the size of the uploaded image file.
import os
# Path to the uploaded file
file_path = '/mnt/data/redheadartifact-bing.jpg'
# Get file size in bytes
file_size = os.path.getsize(file_path)
file_size
Next, we’ll read the image file and encode it in BASE64.
import base64
# Read the image file and encode in BASE64
with open(file_path, 'rb') as image_file:
    base64_encoded_str = base64.b64encode(image_file.read()).decode('utf-8')
len(base64_encoded_str)
Now, we’ll compress the BASE64 encoded string using Brotli.
import brotli
# Compress the BASE64 encoded string using Brotli
compressed_data = brotli.compress(base64_encoded_str.encode('utf-8'))
compressed_size = len(compressed_data)
compressed_size
Finally, we will provide the size of the compressed data.
I’ll run this code in the Python environment to get the results.
The image file was successfully encoded and compressed. The final size after Brotli compression is 87,279 bytes.
Yeah you can take a compressed image, decompress it, then put it in a less efficient format (base64) and then recompress it again, and send that compressed output. I realize that’s sensible for sending data from localhost, but for most networked components it’s the worst way possible, when you can just use a URL. I stand by my original statement, but it needed more context.
The whole IP stack is just a gigantic layer cake; I don't know if nickel-and-diming a couple of megabytes here and there is going to break the bank on your monthly terabits.
Not just sending from localhost, any serverless app that just works classically on a device (or even web apps that are hosted statically) require this feature.
Sounds like we’re all in agreement that base64 should be avoided whenever possible, as it is a far less efficient approach than a URL.
I would say only if using URLs does not require a major change in your architecture (like introducing a server). Anyway, has someone measured the real performance impact?
We don’t need to conduct any experiments when we know for sure that base64 output is about 33% larger than the raw image (4 characters for every 3 bytes). Some people will care; some will not.
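The overhead is fixed by the encoding itself: every 3 bytes become 4 ASCII characters (plus up to 2 padding characters at the end), so the ratio is exactly 4/3. A quick check, using random bytes as a stand-in for arbitrary image data:

```python
import base64
import os

raw = os.urandom(30_000)      # stand-in for arbitrary binary image data
encoded = base64.b64encode(raw)

# 3 bytes -> 4 characters, so the encoded size is 4/3 of the original
ratio = len(encoded) / len(raw)
print(ratio)
```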
Only if network transmission or base64 encoding/decoding is the bottleneck. I would not be surprised if that was just a small share in comparison to the processing inside the model.
I just showed that the BASE64 image in transit is even smaller than the source jpg file when using Accept-Encoding: gzip,deflate,br, which you misclassified as re-encoding.
Well, it’s definitely news to me if your claim is that taking JPGs, encoding them to BASE64, and then compressing the BASE64 gives a net overall decrease in size. But if so, that’s just a measure of how inefficient JPG is at compression.
And that implies if you did take the actual raw image bytes (from an uncompressed format) and use your same compression algo, then you would get something even MORE highly compressed. Because you wouldn’t be compressing something that’s ALREADY been compressed, if that makes sense.
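A sketch of this point, using zlib (stdlib, standing in for Brotli) and random bytes as a pessimistic stand-in for already-compressed JPEG data (real JPEGs retain a little recoverable redundancy in headers and tables; pure random bytes retain none):

```python
import base64
import os
import zlib

jpeg_like = os.urandom(30_000)        # mimics already-compressed image data
b64 = base64.b64encode(jpeg_like)

# Transport compression on the base64 text (what Accept-Encoding buys you)
recompressed = zlib.compress(b64, level=9)

print(len(jpeg_like), len(b64), len(recompressed))
# The recompressed size lands near the original: compression claws back
# most of base64's ~33% bloat, but it cannot go below the entropy of
# data that was already compressed once.
```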
I was never critiquing the concept of using compression to solve the inefficiency of base64.