Issue with Sending Image URL to GPT-4o in Unity

Hi,

I am trying to use the OpenAI GPT-4o vision capabilities in Unity through an OpenAI-Unity package.

Based on the OpenAI documentation, it seems we can pass either an image URL or a base64-encoded image to the GPT model. I have a JPG image uploaded to Firebase and tried passing its URL to the model.
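
For reference, here is a minimal sketch of how the two options could be built as content parts. The helper names image_part_from_url and image_part_from_bytes are hypothetical (they are not part of any OpenAI library), and the data-URL form for base64 images is my reading of the documentation rather than something tested in this thread:

```python
import base64


def image_part_from_url(url: str) -> dict:
    """Build an image content part that points at a hosted image."""
    return {"type": "image_url", "image_url": {"url": url}}


def image_part_from_bytes(data: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build an image content part that embeds the image as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Placeholder inputs for illustration only.
print(image_part_from_url("https://example.com/frame.jpg"))
print(image_part_from_bytes(b"abc"))
```

Either part would go into the "content" list of a user message, next to a {"type": "text", ...} part.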

When I pass the link in the Python code snippet provided by OpenAI, the model describes the image accurately. Here is the snippet:

import openai

# Set your API key
openai.api_key = ""

# Uses the pre-1.0 openai package interface (openai.ChatCompletion)
response = openai.ChatCompletion.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://firebasestorage.googleapis.com/v0/b/yoloholofirebase.appspot.com/o/frame2.jpg?alt=media&token=1be46bf7-efa4-4398-b914-c47bd777b129",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)

# Print the model's description of the image
print(response['choices'][0]['message']['content'])

This is the response it returns with the correct description:

However, in Unity, when I send the request with the same URL, the model always gives a wrong description of the image.

This is the method I use to send the message to the GPT-4o model:

public async void SendImageUrlToGPT4(string imageurl)
{
    var userMessage = new ChatMessage
    {
        Role = "user",
        Content = "[{\"type\": \"text\", \"text\": \"What do you see in this image? Limit yourself to 15 words and do not mention 'image' in your response, inform what you see. \"}, {\"type\": \"image_url\", \"image_url\": {\"url\": \"" + imageurl + "\"}}]"
    };

    messages.Add(userMessage);

    var request = new CreateChatCompletionRequest
    {
        Messages = messages,
        Model = "gpt-4o",
        MaxTokens = 300
    };

    var response = await openAI.CreateChatCompletion(request);

    if (response.Choices != null && response.Choices.Count > 0)
    {
        var chatResponse = response.Choices[0].Message;

        Debug.Log(chatResponse.Content);

        OnResponse.Invoke(chatResponse.Content);

        Debug.Log("Response Finished");
    }
    else
    {
        Debug.LogError("No response from GPT-4 Vision.");
    }
}

Here, “messages” is a List of ‘ChatMessage’ objects, each holding the Role and the Content of a message.

I am not sure why this works in Python but not in Unity. Judging by the responses, the model does not seem to receive the image. Perhaps the issue lies in how the JSON content is formatted and sent from Unity.

Any insights on how to correctly send the image URL to GPT-4o in Unity would be greatly appreciated.
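
One way to probe the formatting hypothesis is to compare the request bodies in the two cases. The sketch below is a Python illustration, not the Unity package's code. It assumes that a string-typed Content field is serialized as an ordinary JSON string (I have not verified how the package serializes it), and the URL is a placeholder. The helper content_kind is hypothetical.

```python
import json

image_url = "https://example.com/frame.jpg"  # placeholder, not the real image

parts = [
    {"type": "text", "text": "What do you see in this image?"},
    {"type": "image_url", "image_url": {"url": image_url}},
]

# Shape used by the working Python snippet: "content" is a JSON array.
as_array = {"role": "user", "content": parts}

# Assumed shape when Content is a string: the array is turned into text
# first and then embedded in the message as one JSON string.
as_string = {"role": "user", "content": json.dumps(parts)}

wire_array = json.dumps(as_array)
wire_string = json.dumps(as_string)


def content_kind(wire: str) -> str:
    """Return the type name of 'content' after decoding a message body."""
    return type(json.loads(wire)["content"]).__name__


print(wire_array)
print(wire_string)
print(content_kind(wire_array), content_kind(wire_string))
```

If the second form is what Unity sends, the receiver sees a single text string that merely looks like JSON, and the image part never arrives as an image part. Logging the serialized request body in Unity would confirm or rule this out.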

Thanks.

There is a mistake in the escaping of the JSON string following Content =.

It should be:

{\"type\": \"image_url\", \"url\": \"" + imageurl + "\"}]"

With this part replaced as shown, the URL should be passed correctly, and you should be able to get the image description from the model.

I hope this helps in some way 🙂

Hi,

I formatted the JSON as suggested, but I am still encountering the same problem: the model returns random descriptions.

Here is the final JSON after formatting:

 Content = "[{\"type\": \"text\", \"text\": \"Tell me in max 15 words what do you see in this image?\"}, {\"type\": \"image_url\", \"url\": \"" + imageurl + "\"}]"
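
For comparison, the Python snippet earlier in this thread, which did describe the image correctly, nests the URL inside an "image_url" object, whereas the Content string above puts "url" directly on the part. A minimal sketch of a check for the nested shape follows. The helper is_nested_image_part is hypothetical, the URL is a placeholder, and the check only mirrors the shape of the working snippet rather than any official schema:

```python
def is_nested_image_part(part: dict) -> bool:
    """True if the part matches {"type": "image_url", "image_url": {"url": <str>}}."""
    if part.get("type") != "image_url":
        return False
    image_url = part.get("image_url")
    return isinstance(image_url, dict) and isinstance(image_url.get("url"), str)


url = "https://example.com/frame.jpg"  # placeholder

# Shape from the working Python snippet.
nested = {"type": "image_url", "image_url": {"url": url}}

# Shape from the reformatted Content string.
flat = {"type": "image_url", "url": url}

print(is_nested_image_part(nested))
print(is_nested_image_part(flat))
```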