Hi everyone,
I’m working on a project where I need to extract structured data (like invoice numbers, dates, vendor names, etc.) from images. I initially explored using the OpenAI API, but I encountered some challenges.
I understand that GPT-4 Vision can handle image inputs, but it appears that this functionality isn’t yet available through the OpenAI API. Is there a way to extract data from images using GPT-4 via the API, or should I use an alternative approach?
Here’s what I’m thinking of doing:
- Extract text from the image using an OCR tool (e.g., Google Cloud Vision or Tesseract).
- Send the extracted text to the GPT-4 API with a prompt to format and extract the relevant data (like invoice numbers, dates, etc.).
I’d love to know if anyone has:
- Successfully used GPT-4 for this type of task.
- Found workarounds or alternative methods for extracting structured data from images.
- Any updates on when GPT-4 Vision might be available via the API.
Looking forward to your suggestions and advice. Thanks in advance!