How to Extract Data from Images Using OpenAI API?

Hi everyone,

I’m working on a project where I need to extract structured data (like invoice numbers, dates, vendor names, etc.) from images. I initially explored using the OpenAI API, but I encountered some challenges.

I understand that GPT-4 Vision can handle image inputs, but it appears that this functionality isn’t yet available through the OpenAI API. Is there a way to extract data from images using GPT-4 via the API, or should I use an alternative approach?

Here’s what I’m thinking of doing:

  1. Extract text from the image using an OCR tool (e.g., Google Cloud Vision or Tesseract).
  2. Send the extracted text to the GPT-4 API with a prompt to format and extract the relevant data (like invoice numbers, dates, etc.).

I’d love to know if anyone has:

  • Successfully used GPT-4 for this type of task.
  • Found workarounds or alternative methods for extracting structured data from images.
  • Any updates on when GPT-4 Vision might be available via the API.

Looking forward to your suggestions and advice. Thanks in advance!

Welcome to the forum.

If I understand you correctly then I did that the other day as an example for another problem, it even used an image of an invoice.

Note: This uses ChatGPT but the same should work for the API if the same model is used. Sorry I can not give you any working API code as I do not use that often but this should show that what you seek is doable.

Also check the OpenAI cookbook: https://cookbook.openai.com/