Vision API through Azure - blind, or what am I missing?

Hi all,

I’m setting up GPT-4 with Vision through the Azure OpenAI API. I’m sending the image in as base64, but the hallucinations are out of control and I’m not getting the same results: it doesn’t work at all like it does in the ChatGPT Plus UI.

I do get a response and I’m able to parse everything. I set `detail` to `high`, but it still doesn’t work as well. I suspect this needs to be paired with an OCR step to extract the text, which I’d then send in alongside the image — or is that overkill? I’ve tried different image formats, and I’ve even converted the image to PDF, but nothing gets it to see the image as clearly as the ChatGPT UI does.
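For reference, the base64 step on my side looks roughly like this (the filename is a placeholder):

```python
import base64

def encode_image(image_path):
    """Read an image file and return its raw base64 string (no data-URI prefix)."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# base64_image = encode_image("order_form.jpg")  # placeholder filename
```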

Please help, what am I missing?

Note: I did notice ChatGPT appears to use Tesseract. Is that required to get vision to work well?


How are you implementing this? More info — like the API call with its params, the response received, and other specific details — can help us understand why that’s happening.


Hi, it’s an API call using the Azure endpoint. It’s a lot like this, but I also have `detail: "high"` in the parameters:

payload = {
  "model": "gpt-4-vision-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What’s in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}"
          }
        }
      ]
    }
  ],
  "max_tokens": 300
}
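And the request itself goes to the Azure endpoint roughly like this — the resource name, deployment name, and api-version below are placeholders, not my real values:

```python
AZURE_RESOURCE = "my-resource"       # placeholder Azure OpenAI resource name
DEPLOYMENT = "my-gpt4v-deployment"   # placeholder deployment name
API_VERSION = "2023-12-01-preview"   # placeholder api-version

def build_request(api_key):
    """Return the URL and headers for an Azure OpenAI chat-completions call.

    The payload above is then POSTed as JSON, e.g. with
    requests.post(url, headers=headers, json=payload).
    """
    url = (
        f"https://{AZURE_RESOURCE}.openai.azure.com/openai/deployments/"
        f"{DEPLOYMENT}/chat/completions?api-version={API_VERSION}"
    )
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    return url, headers
```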

I know it’s receiving the image, because when I sent in a simple picture of a cat it responded that it’s a cat and gave its color. But when I give it an image of an order form, it hallucinates and returns information that is not on the order form. When I upload that same order form in the ChatGPT playground, the response is exactly correct, with the information from the form.

Basically, I’m working on an automation project with order forms that are ‘noisy’ and not easily read by other OCRs. I know this vision feature can read these types of forms — but not through this API? Do I have to pair it with a separate text-extracting OCR to make it work correctly? Or is it that the image type is JPG and it doesn’t read JPG well?
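If it helps, here is one way I could test the OCR-pairing idea without much rework — `build_form_payload` is a hypothetical helper I’d write, and the prompt wording is just a guess at something that discourages hallucination:

```python
def build_form_payload(base64_image, ocr_text=None):
    """Build a vision payload; optionally include an OCR transcript
    (e.g. from Tesseract) so the model can cross-check text and image."""
    content = [
        {"type": "text",
         "text": ("Extract the fields from this order form. "
                  "Only report values that are visible in the image.")},
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{base64_image}",
                       "detail": "high"}},
    ]
    if ocr_text:
        # Insert the OCR transcript between the instruction and the image.
        content.insert(1, {"type": "text",
                           "text": f"OCR transcript (may contain errors):\n{ocr_text}"})
    return {"model": "gpt-4-vision-preview",
            "messages": [{"role": "user", "content": content}],
            "max_tokens": 1000}
```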

Can you also share a sample image that’s resulting in the problem?