Analyse bullet questions from an image?

Hello - I'm trying to find the filled-out bullets in the attached questionnaire using the following code:

import base64
import os
from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY from the environment

def encode_image(image_path):
    # read an image file and return it base64-encoded
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# path is defined elsewhere in the script
inpModell = "gpt-4o"
fn = os.path.join(path, "questions.txt")
with open(fn, encoding="mac_roman", errors="ignore") as f:
    inpQuestion = f.read()

# join the path once; joining twice produced a duplicated path
wImg = os.path.join(path, "IMAGES", "inp.jpg")
base64_image = encode_image(wImg)
print("Answering with OpenAI...")

resp = None
for i in range(4):  # up to four attempts
    try:
        resp = client.chat.completions.create(
            model=inpModell,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": inpQuestion},
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                        },
                    ],
                }
            ],
            max_tokens=1000,
            temperature=0.5,
        )
        break
    except Exception as e:
        print(f"Error happened: {e}")

# checking resp directly avoids the stale error flag: previously a failed
# first attempt marked the whole run as an error even if a retry succeeded
wAnswer = resp.choices[0].message.content if resp else "N/A"

Using this prompt:

Read as input an image of a bullet selection for questions
There are 3 columns of questions
Every question has a specific number from 1 to 50
And each question has possible bullets with A, B, C and D in that order
Find the filled out bullet for every question and output this as an answer
Every question and the selected bullet should go on a separate line in the answer

But the answer I get is more or less completely wrong:

Based on the image, here are the filled-out bullets for each question:

1. A
2. B
3. C
4. D
5. A
6. C
7. B
8. C
9. B
10. A
11. B
12. C
13. B
14. D
15. C
16. B
17. D
18. C
19. A
20. D

Questions 21 to 50 have no filled-out bullets.

Is there any way to improve this so that I get the correct answers using OpenAI?


I think you should call François Chollet. You’ve invented a nice new benchmark :+1:

This is a tough problem. Have you tested accuracy for smaller test sets? Like only 3 questions per page?
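
For example, you could crop the page into tiles and query one tile at a time. A rough sketch, assuming Pillow is installed and the sheet is a regular grid (the rows value is a guess you would adjust to your form):

from PIL import Image  # pip install pillow

def split_into_tiles(image_path, cols=3, rows=5, out_prefix="tile"):
    # crop the answer sheet into a cols x rows grid of smaller images,
    # so each API call only has to read a handful of questions
    img = Image.open(image_path)
    w, h = img.size
    tile_w, tile_h = w // cols, h // rows
    paths = []
    for r in range(rows):
        for c in range(cols):
            box = (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
            out = f"{out_prefix}_{r}_{c}.jpg"
            img.crop(box).save(out, quality=95)
            paths.append(out)
    return paths

Sending one tile at a time, with a prompt saying which question numbers that tile should contain, would at least tell you where the model starts to fail.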

Traditional OCR might work better given this is so predictably structured.
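
For the mark detection itself, thresholding over known bubble positions with OpenCV is the standard optical-mark-recognition trick. A minimal sketch, assuming you can measure the bubble centers once from a blank template (the coordinates below are made up):

import cv2  # pip install opencv-python
import numpy as np

def filled_ratio(gray, x, y, r=12):
    # fraction of dark pixels inside the bubble centered at (x, y)
    mask = np.zeros(gray.shape, dtype=np.uint8)
    cv2.circle(mask, (x, y), r, 255, -1)
    dark = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY_INV)[1]
    return cv2.countNonZero(cv2.bitwise_and(dark, mask)) / cv2.countNonZero(mask)

gray = cv2.imread("inp.jpg", cv2.IMREAD_GRAYSCALE)
# bubble_centers: {question_number: [(x, y) for A, B, C, D]} -- hypothetical
# coordinates, measured once from a blank copy of your form
bubble_centers = {1: [(120, 80), (160, 80), (200, 80), (240, 80)]}
for q, centers in bubble_centers.items():
    ratios = [filled_ratio(gray, x, y) for x, y in centers]
    print(q, "ABCD"[int(np.argmax(ratios))])

Since the layout is fixed, measuring the grid once should carry over to every scanned page.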

Take a look at this paper:


Consider trying the OpenAI o1-preview model. While it might take a bit of time, its responses are based on logical reasoning, providing insightful and well-founded answers. Keep in mind, the efficiency and quality of responses can vary, so it’s always a good idea to test them out and see if they meet your needs.

Additionally, experimenting with the OpenAI Assistant could lead to positive results, so consider this too.
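
A minimal sketch of that Assistants route with the Python SDK's beta endpoints, assuming a vision-capable model such as gpt-4o (the instructions string is just a placeholder):

from openai import OpenAI

client = OpenAI()

# upload the scanned sheet so the assistant can see it
img = client.files.create(file=open("inp.jpg", "rb"), purpose="vision")
prompt_text = open("questions.txt", encoding="mac_roman").read()  # same prompt file as above

assistant = client.beta.assistants.create(
    model="gpt-4o",
    instructions="You read scanned answer sheets and report the marked bubble per question.",
)
thread = client.beta.threads.create(
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt_text},
            {"type": "image_file", "image_file": {"file_id": img.id}},
        ],
    }]
)
run = client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=assistant.id)
# messages are returned newest first, so the assistant's reply comes back at index 0
answer = client.beta.threads.messages.list(thread_id=thread.id).data[0].content[0].text.value
print(answer)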

Do you have evidence that o1 is better at this kind of vision recognition?

Intuitively I can’t see how “reasoning” capabilities would help here.

Yes, I’ve tried o1-preview and found it to be more effective than GPT-4o. When you send the image URL along with your query using o1-preview, you’re likely to receive the response you’re looking for. It should work seamlessly.
