GPT-4 Vision API in Swift: cannot view images directly

berryroad · December 31, 2024, 10:27am

I am trying to create an application that lets GPT analyze a picture taken with the phone’s camera.
I can connect to the API successfully, but the only response I get is: “I’m sorry, but I can’t view images directly. However, if you describe the image or provide details about the food product, I can help identify it and find out more about it.”

Here is the code of the function:

 func sendImageToOpenAI(image: UIImage, prompt: String) {
        guard let imageData = image.jpegData(compressionQuality: 0.8) else {
            errorMessage = "Failed to convert image to JPEG data."
            return
        }
        
        
        let url = "https://api.openai.com/v1/chat/completions"
        let apiKey = "not shown" // Replace with your OpenAI API key
        let model = "gpt-4o" // Vision-capable model

        // Encode image to base64
        let base64Image = imageData.base64EncodedString()
        
        let imgurlstring = "data:image/jpeg,base64,\(base64Image)"

        // Prepare the request payload
        let requestBody: [String: Any] = [
            "model": model,
            "messages": [
                ["role": "system", "content": "You are trained to interpret images of food products so that you can identify the product and find out its best before date."],
                ["role": "user", "content": prompt, "image_url": imgurlstring]
            ]
        ]

        // Convert payload to JSON data
        guard let jsonData = try? JSONSerialization.data(withJSONObject: requestBody) else {
            errorMessage = "Failed to serialize JSON payload."
            return
        }

        // Set headers
        let headers: HTTPHeaders = [
            "Authorization": "Bearer \(apiKey)",
            "Content-Type": "application/json"
        ]

        isLoading = true

        // Create the request manually
        var urlRequest = URLRequest(url: URL(string: url)!)
        urlRequest.httpMethod = "POST"
        urlRequest.httpBody = jsonData
        //urlRequest.httpBody = requestBody
        urlRequest.allHTTPHeaderFields = headers.dictionary

        // Use Alamofire to send the request
        AF.request(urlRequest)
            .validate()
            .responseJSON { response in
                isLoading = false

                // Log detailed diagnostics
                print("Request URL: \(url)")
                print("Headers: \(headers)")
                print("Request Body: \(String(data: jsonData, encoding: .utf8) ?? "N/A")")
                print("Response: \(response)")

                switch response.result {
                case .success(let value):
                    if let json = value as? [String: Any],
                       let choices = json["choices"] as? [[String: Any]],
                       let message = choices.first?["message"] as? [String: Any],
                       let content = message["content"] as? String {
                        analysisResult = content
                    } else {
                        analysisResult = "Unexpected response format."
                    }
                case .failure(let error):
                    errorMessage = error.localizedDescription
                    if let data = response.data,
                       let body = String(data: data, encoding: .utf8) {
                        print("Response Body: \(body)")
                    } else {
                        print("No response body available.")
                    }
                }
            }
    }

Do you have any thoughts on what I might be doing wrong?

Diet · December 31, 2024, 10:40am

Welcome to the community!

I don’t think that’s quite correct.

This is what the API reference says: https://platform.openai.com/docs/api-reference/chat/create (click on “Image input”)

"messages": [
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "What's in this image?"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
        }
      }
    ]
  }
]

It looks like you’re trying to mix the old (simplified) completion message API with the newer multimodal content-array message API, and that doesn’t really work.

The user message object doesn’t have an “image_url” property. You need to pass content as an array of content parts, and it is the content part that carries “image_url”. It’s a bit confusing and convoluted, but that’s how the API evolved over the years…

It should be a fairly straightforward fix though! Good luck!
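To make the shape of the content parts concrete, here is a minimal Swift sketch that builds the same payload with `Encodable` types instead of nested dictionaries. The type and function names (`ContentPart`, `ChatMessage`, `ChatRequest`, `dataURL(forJPEG:)`, `makeRequestBody`) are made up for illustration and are not part of any SDK. It also assumes the image is sent as a base64 data URL, which uses `;base64,` (with a semicolon) before the payload.

```swift
import Foundation

// Hypothetical helper types for illustration only.
// One element of the "content" array: either text or an image reference.
enum ContentPart: Encodable {
    case text(String)
    case imageURL(String, detail: String?)

    private enum Keys: String, CodingKey {
        case type, text
        case imageURL = "image_url"
    }

    private struct ImageURL: Encodable {
        let url: String
        let detail: String?   // omitted from the JSON when nil
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: Keys.self)
        switch self {
        case .text(let value):
            try container.encode("text", forKey: .type)
            try container.encode(value, forKey: .text)
        case .imageURL(let url, let detail):
            try container.encode("image_url", forKey: .type)
            try container.encode(ImageURL(url: url, detail: detail), forKey: .imageURL)
        }
    }
}

struct ChatMessage: Encodable {
    let role: String
    let content: [ContentPart]   // an array of parts, not a plain string
}

struct ChatRequest: Encodable {
    let model: String
    let messages: [ChatMessage]
}

// Builds a base64 data URL for JPEG bytes; note the ";base64," separator.
func dataURL(forJPEG data: Data) -> String {
    "data:image/jpeg;base64,\(data.base64EncodedString())"
}

// Encodes a single user message with one text part and one image part.
func makeRequestBody(model: String, prompt: String, jpeg: Data) throws -> Data {
    let request = ChatRequest(
        model: model,
        messages: [
            ChatMessage(role: "user", content: [
                .text(prompt),
                .imageURL(dataURL(forJPEG: jpeg), detail: "low")
            ])
        ]
    )
    return try JSONEncoder().encode(request)
}
```

The resulting `Data` can be assigned to `urlRequest.httpBody` directly. The point of the typed version is that the message itself has no `image_url` field at all, so the mistake from the original post cannot be expressed.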

berryroad · January 3, 2025, 9:48am

Thank you @Diet !

I figured it out and now it’s working. I used the multimodal content as you suggested!

For anyone running into the same problem, here is some Swift code:

func sendImageToOpenAI(image: UIImage, prompt: String) {
        guard let imageData = image.jpegData(compressionQuality: 0.8) else {
            errorMessage = "Failed to convert image to JPEG data."
            return
        }
        
        
        let url = "https://api.openai.com/v1/chat/completions"
        let apiKey = "YOUR_API_KEY" // Replace with your OpenAI API key
        let model = "gpt-4o-mini" // Vision-capable model

        // Encode image to base64
        let base64Image = imageData.base64EncodedString()
        
        // Data URLs put ";base64," (with a semicolon) between the media type and the payload
        let imgurlstring = "data:image/jpeg;base64,\(base64Image)"

        // Prepare the request payload
        let requestBody: [String: Any] =
        [
            "model": model,
            "messages":
            [
                [
                    "role": "system",
                    "content": [
                        [
                            "type": "text",
                            "text": "YOUR SYSTEM PROMPT"
                        ]
                    ]
                ],
                [
                    "role": "user",
                    "content": [
                        [
                            "type": "text",
                            "text": prompt
                        ],
                        [
                            "type": "image_url",
                            "image_url": [
                                "url": imgurlstring,
                                "detail": "low"
                            ]
                        ]
                    ]
                ]
            ]
        ]


        // Convert payload to JSON data
        guard let jsonData = try? JSONSerialization.data(withJSONObject: requestBody) else {
            errorMessage = "Failed to serialize JSON payload."
            return
        }

        // Set headers
        let headers: HTTPHeaders = [
            "Authorization": "Bearer \(apiKey)",
            "Content-Type": "application/json"
        ]

        var urlRequest = URLRequest(url: URL(string: url)!)
        urlRequest.httpMethod = "POST"
        urlRequest.httpBody = jsonData
        urlRequest.allHTTPHeaderFields = headers.dictionary

        // Use Alamofire to send the request
        AF.request(urlRequest)
            .validate()
            .responseJSON { response in
                isLoading = false
                switch response.result {
                case .success(let value):
                    if let json = value as? [String: Any],
                       let choices = json["choices"] as? [[String: Any]],
                       let message = choices.first?["message"] as? [String: Any],
                       let content = message["content"] as? String {
                        self.parseResponse(content)
                       } else {
                           self.analysisResult = "Failed to process content."
                       }
                    
                case .failure(let error):
                    errorMessage = error.localizedDescription
                    if let data = response.data,
                       let body = String(data: data, encoding: .utf8) {
                        print("Response Body: \(body)")
                    } else {
                        print("No response body available.")
                    }
                }
            }
    }
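As an optional variation on the response handling above, the nested `as?` casts can be replaced by a small `Decodable` model. This is only a sketch: `ChatResponse` and `firstMessageContent(from:)` are hypothetical names, and the model covers just the `choices[].message.content` path that the code above reads, ignoring every other field in the response. `content` is declared optional on the assumption that the API may return it as null in some cases.

```swift
import Foundation

// Hypothetical minimal model of a chat completion response.
// Only the fields read by the code above are declared; other keys are ignored.
struct ChatResponse: Decodable {
    struct Choice: Decodable {
        struct Message: Decodable {
            let role: String
            let content: String?
        }
        let message: Message
    }
    let choices: [Choice]
}

// Returns the text of the first choice, or nil if there are no choices
// or the first message has no text content.
func firstMessageContent(from data: Data) throws -> String? {
    let response = try JSONDecoder().decode(ChatResponse.self, from: data)
    return response.choices.first?.message.content
}
```

With Alamofire this could be called on `response.data` inside the completion handler, and a thrown decoding error would then report which key was missing instead of a generic "Failed to process content." message.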
