Hi,
I’m trying to use the vision capabilities of GPT-4 via the API to analyze an image and respond to a prompt, similar to how it works on the ChatGPT website. However, I’ve encountered some issues:
- When attempting to use the `gpt-4-vision` model, I get an error stating I don't have access, despite having a paid subscription. Here's the API request that resulted in the permissions error:

  ```python
  headers = {
      "Content-Type": "application/json",
      "Authorization": f"Bearer {API_KEY}"
  }
  payload = {
      "model": "gpt-4-vision",
      "messages": [
          {"role": "user", "content": prompt}
      ],
      "image": f"data:image/jpeg;base64,{base64_image}",
      "max_tokens": 300
  }
  ```
- Using the standard `gpt-4` model, I send the image as part of the `messages`, but the responses are inaccurate compared to the ChatGPT website, where I consistently get correct results with the same image and prompt.
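For reference, here's a minimal sketch of an alternative request shape I've also been experimenting with, assuming the image belongs inside the message's `content` array rather than as a top-level `"image"` field (the model name `gpt-4-vision-preview` is a guess on my part, and the image bytes below are just a placeholder):

```python
import base64
import json

# Placeholder image bytes; in practice this would be a real JPEG read from disk.
base64_image = base64.b64encode(b"\xff\xd8\xff\xe0 placeholder jpeg bytes").decode("utf-8")
prompt = "What is in this image?"

payload = {
    "model": "gpt-4-vision-preview",  # assumed model name; please correct me if wrong
    "messages": [
        {
            "role": "user",
            # Content as a list of parts: one text part plus one image_url part
            # carrying the base64 data URL.
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
    "max_tokens": 300,
}

# Serialize to confirm the payload is valid JSON before sending.
body = json.dumps(payload)
```

Is this the intended structure, or am I still missing something?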
Questions:
- Is the `gpt-4-vision` model available via the API, and how can I access it?
- Why are the API results with `gpt-4` less accurate than the website's vision capabilities?
- Are there best practices for working with images in the API to improve accuracy?
I’d appreciate any guidance or clarification on these issues.
Thanks!