How to Use Vision Capabilities with GPT-4 via API?

Hi,

I’m trying to use the vision capabilities of GPT-4 via the API to analyze an image and respond to a prompt, similar to how it works on the ChatGPT website. However, I’ve encountered some issues:

  1. When attempting to use the gpt-4-vision model, I get an error stating I don’t have access, despite having a paid subscription. Here’s the API request that resulted in the permissions error:

     import requests

     # API_KEY, prompt, and base64_image are defined earlier in my script

     headers = {
         "Content-Type": "application/json",
         "Authorization": f"Bearer {API_KEY}"
     }

     payload = {
         "model": "gpt-4-vision",
         "messages": [
             {
                 "role": "user",
                 "content": prompt
             }
         ],
         "image": f"data:image/jpeg;base64,{base64_image}",
         "max_tokens": 300
     }

     response = requests.post(
         "https://api.openai.com/v1/chat/completions",
         headers=headers,
         json=payload
     )
     # the response body contains the "you don't have access" error
    
  2. Using the standard gpt-4 model, I send the image as part of the messages (rough sketch below), but the responses are far less accurate than on the ChatGPT website, where the same image and prompt consistently produce correct results.
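
Roughly, that second request looks like this (same headers as above; since gpt-4 has no image parameter, the encoded image just goes into the text of the prompt):

     payload = {
         "model": "gpt-4",
         "messages": [
             {
                 "role": "user",
                 # the base64 string is appended to the text prompt
                 "content": f"{prompt}\n\nImage (base64): {base64_image}"
             }
         ],
         "max_tokens": 300
     }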

Questions:

  • Is the gpt-4-vision model available via the API, and how can I get access to it?
  • Why are the API results with gpt-4 less accurate than the website’s vision capabilities?
  • Are there best practices for working with images in the API to improve accuracy?

I’d appreciate any guidance or clarification on these issues.
Thanks!

Hey, I’m actually working on the same topic right now. Looking at the API rate limits, it seems you need to be on usage Tier 2 before you get the token allowance required for something like analyzing an image. At the moment, it looks like you can only work with GPT-4o mini for that. Have you found out anything more in the meantime?
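
For what it’s worth, from the chat completions docs the image is supposed to go inside the message content as an "image_url" part, not as a top-level field. Something along these lines should work with gpt-4o-mini (just a sketch; the file name and prompt text are placeholders):

     import base64
     import requests

     API_KEY = "sk-..."  # your API key

     # encode a local image as base64 (placeholder file name)
     with open("example.jpg", "rb") as f:
         base64_image = base64.b64encode(f.read()).decode("utf-8")

     headers = {
         "Content-Type": "application/json",
         "Authorization": f"Bearer {API_KEY}"
     }

     payload = {
         "model": "gpt-4o-mini",
         "messages": [
             {
                 "role": "user",
                 # text and image travel together as content "parts"
                 "content": [
                     {"type": "text", "text": "What is in this image?"},
                     {
                         "type": "image_url",
                         "image_url": {
                             "url": f"data:image/jpeg;base64,{base64_image}"
                         }
                     }
                 ]
             }
         ],
         "max_tokens": 300
     }

     response = requests.post(
         "https://api.openai.com/v1/chat/completions",
         headers=headers,
         json=payload
     )
     print(response.json())

As far as I can tell, the optional "detail" field on image_url ("low", "high", or "auto") also controls how many tokens the image consumes, which matters for those tier limits.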