How to Interpret Images in OpenAI GPT-4 API with External Links?

Hello,

I’m trying to use the OpenAI GPT-4 API to interpret images provided as external URLs, specifically screenshots related to Autodesk Revit issues. My application sends a problem description and a screenshot link to the API, but the response indicates that the image cannot be processed.

Here is the code snippet I am using:

private async Task<string> SendMessageToGPT(string description, string imageUrl)
{
    using (HttpClient client = new HttpClient())
    {
        client.DefaultRequestHeaders.Add("Authorization", $"Bearer {apiKey}"); // the API expects a Bearer token

        var content = new
        {
            model = "gpt-4o-mini",
            messages = new[]
            {
                new { role = "system", content = "You are a Revit Support Assistant, solving issues related to Autodesk Revit." },
                new { role = "user", content = $"Here is a problem description: {description}. Also, take a look at this screenshot: {imageUrl}. Interpret the screenshot and provide a detailed response." }
            }
        };

        string jsonContent = JsonConvert.SerializeObject(content);
        HttpContent httpContent = new StringContent(jsonContent, Encoding.UTF8, "application/json");

        HttpResponseMessage response = await client.PostAsync("https://api.openai.com/v1/chat/completions", httpContent);
        string responseBody = await response.Content.ReadAsStringAsync();

        dynamic result = JsonConvert.DeserializeObject(responseBody);

        if (result?.choices != null && result.choices.Count > 0)
        {
            return result.choices[0].message.content;
        }
        else
        {
            return "There was an issue with the API response. Please check the response structure.";
        }
    }
}

Unfortunately, I keep receiving a message indicating that images cannot be processed.

I understand that the GPT-4 model may not directly support image analysis through a link in a text-based request. Is there a specific way to send an image (via URL or otherwise) for it to be interpreted by the API? Is there another OpenAI model or API endpoint that allows image processing or should I be using a different approach?

Any guidance on how to properly integrate image recognition within the GPT-4 API or related services would be greatly appreciated.

Thank you for your time!

Welcome to the community!

I think the answer you’re looking for is here:


https://platform.openai.com/docs/api-reference/chat/create

Note how the message content object can also be an array of content parts instead of just a string:

      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://uplo...k.jpg"
          }
        }
      ]

This allows you to interleave text and images within a single message.
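In your C# code, that could look something like this — a sketch of the request body using anonymous types with Json.NET, meant to replace the `content` object inside your `SendMessageToGPT` method (so `description` and `imageUrl` are your existing method parameters):

```csharp
var content = new
{
    model = "gpt-4o-mini",
    messages = new object[]
    {
        new { role = "system", content = "You are a Revit Support Assistant, solving issues related to Autodesk Revit." },
        new
        {
            role = "user",
            // "content" is now an array of parts, not a plain string
            content = new object[]
            {
                new { type = "text", text = $"Here is a problem description: {description}. Interpret the attached screenshot and provide a detailed response." },
                new { type = "image_url", image_url = new { url = imageUrl } }
            }
        }
    }
};
```

The rest of your method (serialization with `JsonConvert.SerializeObject` and the POST to `/v1/chat/completions`) can stay as it is — only the message shape changes.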

And yeah, you can indeed send an image URL, as long as it’s retrievable by OpenAI’s servers. Alternatively, you can embed the image directly as a base64-encoded data URL in the same `image_url` field.
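For the base64 route, you read the file yourself and wrap it in a data URL — roughly like this (a sketch assuming a PNG screenshot at a hypothetical local path; adjust the MIME type to match your image format):

```csharp
using System;
using System.IO;

// Read the screenshot and wrap it in a data URL, which the API
// accepts in place of an http(s) link in the image_url part.
byte[] imageBytes = File.ReadAllBytes(@"C:\temp\revit-error.png"); // hypothetical path
string dataUrl = $"data:image/png;base64,{Convert.ToBase64String(imageBytes)}";

// then use it exactly like a regular URL:
// new { type = "image_url", image_url = new { url = dataUrl } }
```

This is handy when the screenshot lives on the user’s machine and isn’t hosted anywhere the API could fetch it from.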

IMO they did a pretty good job with this API design.

Let us know if you hit any further snags!


PS:

If you prefer a more tutorial-style guide instead of the API reference, you can also check out the vision guide: https://platform.openai.com/docs/guides/vision