Issue while Fine-Tuning GPT-4o with Base64 Images

Hi everyone,

I’m encountering an issue while trying to fine-tune GPT-4o with base64 images. The preprocessing summary indicates that all 10 examples were skipped due to invalid image modes. Here are the details:

  • Error Message: status : Training file: Preprocessing Summary: File contains 10 examples with images that were skipped for the following reasons: invalid image mode (10). These examples will not be used for training. Please visit our docs to learn how to resolve these issues. The data is invalid because there are 0 valid examples in the file. Details - Samples of lines per error type: invalid image mode: Line numbers → 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Has anyone else encountered this issue? Any suggestions on how to resolve it would be greatly appreciated. I’ve checked the documentation but couldn’t find a solution.

Thanks in advance!

Here’s what I would do: send that same JSON as the body of a request to the API, without using an OpenAI SDK, and without the final assistant message.

See if you get an AI response that uses vision to respond.

“invalid image mode” - could it be something about the construction of the image object as part of a user (only) role message, and its detail parameter?
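It could also be the images themselves rather than the JSON: “invalid image mode” sounds like a complaint about the Pillow-style mode of the decoded pixels. As a guess (treating RGB/RGBA as the accepted modes is my assumption, not something the error confirms), you could inspect and convert your source files before base64-encoding them:

from PIL import Image

# Hypothetical check: "img1.png" etc. stand in for your own source files,
# and RGB/RGBA as the acceptable modes is an assumption, not confirmed.
for path in ["img1.png", "img2.png"]:
    with Image.open(path) as im:
        print(path, im.mode)
        if im.mode not in ("RGB", "RGBA"):
            # palette ("P"), grayscale ("L"), CMYK, etc. get rewritten as RGB
            im.convert("RGB").save(path)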

Hello @_j ,

Thank you for your quick response. I have tested it

Here’s what I would do: send that same JSON as the body of a request to the API, without using an OpenAI SDK, and without the final assistant message.
See if you get an AI response that uses vision to respond.

and it works!

Best regards

I’m not sure whether this deserves a “solution” mark yet, or whether it is actually solved.

Sending your example to chat completions shows that you are constructing user messages correctly with a base64 image part. It doesn’t verify that the fine-tuning validation won’t still fail with some unexpected message about invalid modes.

I whacked together some one-shot Python code for testing an example: only after the call works on chat completions do we add it to a training file, along with how the AI should be responding.

Let’s import the supporting modules and write a function that builds request headers from your environment-variable API key.

import httpx
import json
import base64
import os

def _get_headers() -> dict[str, str]:
    # Build the Authorization header from the OPENAI_API_KEY environment variable.
    if not os.getenv('OPENAI_API_KEY'):
        raise ValueError("Please set the OPENAI_API_KEY environment variable.")
    return {'Authorization': f'Bearer {os.getenv("OPENAI_API_KEY")}'}

Then some functions to make the input messages, supposing that you just want one image to be answered about. I have a set system message, but a user message function that accepts text and an image path:

def _system_message():
    return [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are a GPT-4 vision AI model. Analyze user images."
                },
            ],
        },
    ]

def _user_image_message(text, image_path):
    with open(image_path, "rb") as image_file:
        base64_encoded_image = base64.b64encode(image_file.read()).decode("utf-8")
    content_image_part = {
        "type": "image_url",
        "image_url": {
            "url": f"data:image/png;base64,{base64_encoded_image}",
            "detail": "low",
        },
    }
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            content_image_part,
        ],
    }]

Both return a list of one message, making easily-joinable lists.

Here’s making the minimal chat completions API call without using the OpenAI library, but instead just posting an object as JSON with the httpx library (which openai also uses):

def chat_completions(message_list):
    # Minimal request body: the model, the messages, and a small completion limit for testing.
    request_body = {
        "model": "gpt-4o-mini",
        "messages": message_list,
        "max_tokens": 100
    }
    response = httpx.post(
        "https://api.openai.com/v1/chat/completions",
        headers=_get_headers(),
        json=request_body,
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

So we can now input a user message with an image, and send it off to chat completions. Let’s do that:

if __name__ == "__main__":
    user_message = "What's in the image?"
    image_filename = "img1.png"
    assistant_example = "The image contains the word \"Apple\" written in a simple, black font."

    messages = _system_message() + _user_image_message(user_message, image_filename)

    try:
        # Make the test API call.
        response_content = chat_completions(messages)
        print(response_content)
        
        # Build the training data with the identical messages plus the desired assistant example.
        training_data = {
            "messages": messages + [{"role": "assistant", "content": assistant_example}]
        }
        
        # Append the training data as a single JSON line to the fine-tuning file.
        with open("mytraining.jsonl", "a", encoding="utf-8") as f:
            f.write(json.dumps(training_data) + "\n")
            
    except httpx.HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
    except Exception as err:
        print(f"An error occurred: {err}")

Was it successful as a “test call” to get a response? If so, the script appends that same input, along with how you want the AI to actually respond (a different example behavior), as a line to a JSONL file, discarding the test response.

See if that isn’t constructing the same type of JSONL that you’re currently using as your training file format for fine-tuning.
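For reference, one appended line of mytraining.jsonl would look roughly like this (a sketch with the base64 data truncated, assuming one PNG image per example):

{"messages": [{"role": "system", "content": [{"type": "text", "text": "You are a GPT-4 vision AI model. Analyze user images."}]}, {"role": "user", "content": [{"type": "text", "text": "What's in the image?"}, {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0K...", "detail": "low"}}]}, {"role": "assistant", "content": "The image contains the word \"Apple\" written in a simple, black font."}]}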

Most importantly, the minimum of ten examples shows that fine-tuning now works, but it is rarely enough to actually make a better model. Using a fine-tuned model also costs more … you could invest that per-call cost into prompting instead, and see if you can just talk your way into the results you want without a huge investment in developing an ample training set.
