Analyse bullet questions from an image?

Hello - I'm trying to find the filled-out bullets in the attached questionnaire using the following code:

import base64
import os
from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY from the environment

def encode_image(image_path):
    # read an image file and return it base64-encoded
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# path is defined elsewhere in the script
inpModell = "gpt-4o"
fn = os.path.join(path, "questions.txt")
with open(fn, encoding="mac_roman", errors="ignore") as f:
    inpQuestion = f.read()

# join the path once; joining twice produced a duplicated path
wImg = os.path.join(path, "IMAGES", "inp.jpg")
base64_image = encode_image(wImg)
print("Answering with OpenAI...")

resp = None
for i in range(4):  # up to four attempts
    try:
        resp = client.chat.completions.create(
            model=inpModell,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": inpQuestion},
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                        },
                    ],
                }
            ],
            max_tokens=1000,
            temperature=0.5,
        )
        break
    except Exception as e:
        print(f"Error happened: {e}")

# checking resp directly avoids the stale error flag: previously a failed
# first attempt marked the whole run as an error even if a retry succeeded
wAnswer = resp.choices[0].message.content if resp else "N/A"

Using this prompt:

Read as input an image of a bullet selection for questions
There are 3 columns of questions
Every question has a specific number from 1 to 50
And each question has possible bullets with A, B, C and D in that order
Find the filled out bullet for every question and output this as an answer
Every question and the selected bullet should go on a separate line in the answer

But the answer I get is more or less completely wrong:

Based on the image, here are the filled-out bullets for each question:

1. A
2. B
3. C
4. D
5. A
6. C
7. B
8. C
9. B
10. A
11. B
12. C
13. B
14. D
15. C
16. B
17. D
18. C
19. A
20. D

Questions 21 to 50 have no filled-out bullets.

Is there any way to improve this so that I get the correct answers using OpenAI?


I think you should call François Chollet. You’ve invented a nice new benchmark :+1:

This is a tough problem. Have you tested accuracy for smaller test sets? Like only 3 questions per page?
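
For example, you could crop the page into tiles and query one tile at a time. A rough sketch, assuming Pillow is installed and the sheet is a regular grid (the rows value is a guess you would adjust to your form):

from PIL import Image  # pip install pillow

def split_into_tiles(image_path, cols=3, rows=5, out_prefix="tile"):
    # crop the answer sheet into a cols x rows grid of smaller images,
    # so each API call only has to read a handful of questions
    img = Image.open(image_path)
    w, h = img.size
    tile_w, tile_h = w // cols, h // rows
    paths = []
    for r in range(rows):
        for c in range(cols):
            box = (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
            out = f"{out_prefix}_{r}_{c}.jpg"
            img.crop(box).save(out, quality=95)
            paths.append(out)
    return paths

Sending one tile at a time, with a prompt saying which question numbers that tile should contain, would at least tell you where the model starts to fail.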

Traditional OCR might work better given this is so predictably structured.
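
For the mark detection itself, thresholding over known bubble positions with OpenCV is the standard optical-mark-recognition trick. A minimal sketch, assuming you can measure the bubble centers once from a blank template (the coordinates below are made up):

import cv2  # pip install opencv-python
import numpy as np

def filled_ratio(gray, x, y, r=12):
    # fraction of dark pixels inside the bubble centered at (x, y)
    mask = np.zeros(gray.shape, dtype=np.uint8)
    cv2.circle(mask, (x, y), r, 255, -1)
    dark = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY_INV)[1]
    return cv2.countNonZero(cv2.bitwise_and(dark, mask)) / cv2.countNonZero(mask)

gray = cv2.imread("inp.jpg", cv2.IMREAD_GRAYSCALE)
# bubble_centers: {question_number: [(x, y) for A, B, C, D]} -- hypothetical
# coordinates, measured once from a blank copy of your form
bubble_centers = {1: [(120, 80), (160, 80), (200, 80), (240, 80)]}
for q, centers in bubble_centers.items():
    ratios = [filled_ratio(gray, x, y) for x, y in centers]
    print(q, "ABCD"[int(np.argmax(ratios))])

Since the layout is fixed, measuring the grid once should carry over to every scanned page.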

Take a look at this paper:


Consider trying the OpenAI o1-preview model. While it might take a bit of time, its responses are based on logical reasoning, providing insightful and well-founded answers. Keep in mind, the efficiency and quality of responses can vary, so it’s always a good idea to test them out and see if they meet your needs.

Additionally, experimenting with the OpenAI Assistant could lead to positive results, so consider this too.
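
A minimal sketch of that Assistants route with the Python SDK's beta endpoints, assuming a vision-capable model such as gpt-4o (the instructions string is just a placeholder):

from openai import OpenAI

client = OpenAI()

# upload the scanned sheet so the assistant can see it
img = client.files.create(file=open("inp.jpg", "rb"), purpose="vision")
prompt_text = open("questions.txt", encoding="mac_roman").read()  # same prompt file as above

assistant = client.beta.assistants.create(
    model="gpt-4o",
    instructions="You read scanned answer sheets and report the marked bubble per question.",
)
thread = client.beta.threads.create(
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt_text},
            {"type": "image_file", "image_file": {"file_id": img.id}},
        ],
    }]
)
run = client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=assistant.id)
# messages are returned newest first, so the assistant's reply comes back at index 0
answer = client.beta.threads.messages.list(thread_id=thread.id).data[0].content[0].text.value
print(answer)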

Do you have evidence that o1 is better at this kind of vision recognition?

Intuitively I can’t see how “reasoning” capabilities would help here.

Yes, I’ve tried o1-preview and found it to be more effective than GPT-4o. When you send the image URL along with your query using o1-preview, you’re likely to receive the response you’re looking for. It should work seamlessly.
