How to Interpret Images in OpenAI GPT-4 API with External Links?

Hello,

I’m trying to use the OpenAI GPT-4 API to interpret images provided as external URLs, specifically screenshots related to Autodesk Revit issues. My application sends a problem description and a screenshot link to the API, but the response indicates that the image cannot be processed.

Here is the code snippet I am using:

private async Task<string> SendMessageToGPT(string description, string imageUrl)
{
    using (HttpClient client = new HttpClient())
    {
        client.DefaultRequestHeaders.Add("Authorization", $"Bearer {apiKey}"); // the API expects a Bearer token

        var content = new
        {
            model = "gpt-4o-mini",
            messages = new[]
            {
                new { role = "system", content = "You are a Revit Support Assistant, solving issues related to Autodesk Revit." },
                new { role = "user", content = $"Here is a problem description: {description}. Also, take a look at this screenshot: {imageUrl}. Interpret the screenshot and provide a detailed response." }
            }
        };

        string jsonContent = JsonConvert.SerializeObject(content);
        HttpContent httpContent = new StringContent(jsonContent, Encoding.UTF8, "application/json");

        HttpResponseMessage response = await client.PostAsync("https://api.openai.com/v1/chat/completions", httpContent);
        string responseBody = await response.Content.ReadAsStringAsync();

        dynamic result = JsonConvert.DeserializeObject(responseBody);

        if (result?.choices != null && result.choices.Count > 0)
        {
            return result.choices[0].message.content;
        }
        else
        {
            return "There was an issue with the API response. Please check the response structure.";
        }
    }
}

Unfortunately, I keep receiving a message indicating that images cannot be processed.

I understand that the GPT-4 model may not directly support image analysis through a link in a text-based request. Is there a specific way to send an image (via URL or otherwise) for it to be interpreted by the API? Is there another OpenAI model or API endpoint that allows image processing or should I be using a different approach?

Any guidance on how to properly integrate image recognition within the GPT-4 API or related services would be greatly appreciated.

Thank you for your time!

Welcome to the community!

I think the answer you’re looking for is here:


https://platform.openai.com/docs/api-reference/chat/create

Note how the message content object can also be an array of content parts instead of just a string:

      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://uplo...k.jpg"
          }
        }
      ]

This allows you to interleave text and images within a single message.
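In your C# code, that could look something like this — a sketch of the request body using anonymous types with Json.NET, meant to replace the `content` object inside your `SendMessageToGPT` method (so `description` and `imageUrl` are your existing method parameters):

```csharp
var content = new
{
    model = "gpt-4o-mini",
    messages = new object[]
    {
        new { role = "system", content = "You are a Revit Support Assistant, solving issues related to Autodesk Revit." },
        new
        {
            role = "user",
            // "content" is now an array of parts, not a plain string
            content = new object[]
            {
                new { type = "text", text = $"Here is a problem description: {description}. Interpret the attached screenshot and provide a detailed response." },
                new { type = "image_url", image_url = new { url = imageUrl } }
            }
        }
    }
};
```

The rest of your method (serialization with `JsonConvert.SerializeObject` and the POST to `/v1/chat/completions`) can stay as it is — only the message shape changes.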

And yeah, you can indeed send an image URL, as long as it’s retrievable by OpenAI’s servers. Alternatively, you can embed the image directly as a base64-encoded data URL in the same `image_url` field.
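For the base64 route, you read the file yourself and wrap it in a data URL — roughly like this (a sketch assuming a PNG screenshot at a hypothetical local path; adjust the MIME type to match your image format):

```csharp
using System;
using System.IO;

// Read the screenshot and wrap it in a data URL, which the API
// accepts in place of an http(s) link in the image_url part.
byte[] imageBytes = File.ReadAllBytes(@"C:\temp\revit-error.png"); // hypothetical path
string dataUrl = $"data:image/png;base64,{Convert.ToBase64String(imageBytes)}";

// then use it exactly like a regular URL:
// new { type = "image_url", image_url = new { url = dataUrl } }
```

This is handy when the screenshot lives on the user’s machine and isn’t hosted anywhere the API could fetch it from.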

IMO they did a pretty good job with this API design.

Let us know if you hit any further snags!


PS:

If you prefer a more tutorial-style guide instead of the API reference, you can also check out the vision guide: https://platform.openai.com/docs/guides/vision