OCR of PDF and JPG documents

Snowskier · December 10, 2024, 6:27pm

My team built a POC using AWS, and I'm wondering about the pros and cons of using OpenAI instead.

We built a UI in Salesforce that lets a user upload a document; it passes the document to AWS and returns the extracted text.

How would this be done with OpenAI, and do you magicians have any thoughts on it?
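The post doesn't name the AWS service, but Textract comes up later in the thread. Assuming the POC calls Textract's synchronous `DetectDocumentText` operation through boto3, the AWS side roughly amounts to the sketch below: send the image bytes and join the returned `LINE` blocks. The helper names are hypothetical, and the Textract call needs AWS credentials; the parsing function works on any response dict.

```python
from typing import Any


def lines_from_textract(response: dict[str, Any]) -> list[str]:
    """Return the text of every LINE block in a Textract response, in order."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def textract_extract_text(image_bytes: bytes) -> str:
    """Hypothetical helper: run Textract DetectDocumentText on raw JPEG/PNG bytes."""
    import boto3  # imported lazily so the parser above works without boto3 installed

    client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return "\n".join(lines_from_textract(response))
```

Having the baseline in this shape makes it easy to compare Textract's output with a model's output on the same sample documents.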

Foxalabs · December 10, 2024, 10:03pm

You can achieve the same thing with a little front-end web code and some Python on the backend that calls the GPT-4o model; it will extract the text and pass it back to you in whatever format you specify.

Not a difficult thing to do with the OpenAI API.

Simple upload example here:

import base64
from openai import OpenAI

client = OpenAI()

# Function to encode the image
def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Please extract the text from this document image and return the result as a single string, no other output",
        },
        {
          "type": "image_url",
          "image_url": {
            "url":  f"data:image/jpeg;base64,{base64_image}"
          },
        },
      ],
    }
  ],
)

# Print only the extracted text, not the whole Choice object
print(response.choices[0].message.content)
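The reply notes the model can return the text "in whatever format you specify". A minimal sketch of one way to do that, assuming you want JSON: Chat Completions' `response_format={"type": "json_object"}` asks for a valid JSON object (the prompt itself must mention JSON). The `lines` key and the helper names are illustrative assumptions, not a fixed schema.

```python
import json
from typing import Any

# Illustrative prompt; the "lines" key is an assumption, not an OpenAI schema.
PROMPT = (
    "Extract all text from this document image. "
    'Respond only with a JSON object of the form {"lines": ["...", "..."]}, '
    "one entry per line of text, in reading order."
)


def build_messages(base64_image: str, mime: str = "image/jpeg") -> list[dict[str, Any]]:
    """Build a single user message containing the prompt and the base64 image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{base64_image}"}},
            ],
        }
    ]


def parse_lines(raw: str) -> list[str]:
    """Parse the model's JSON reply; raise ValueError if "lines" is missing or malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    lines = data.get("lines")
    if not isinstance(lines, list) or not all(isinstance(s, str) for s in lines):
        raise ValueError("expected a 'lines' list of strings")
    return lines


def extract_lines(client, base64_image: str) -> list[str]:
    """Call the model with JSON mode enabled; `client` is an openai.OpenAI instance."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=build_messages(base64_image),
        response_format={"type": "json_object"},
    )
    return parse_lines(response.choices[0].message.content)
```

Validating the reply in code matters because JSON mode guarantees syntactically valid JSON, not that the keys you asked for are present.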

_j · December 10, 2024, 10:18pm

The above describes passing an image file to an AI model to use its vision capability. You can then instruct the AI on what kind of analysis to perform on the image: describe the entities in it, provide contextual analysis, or use computer vision to read text.

This requires images, of course. A page or section of a PDF would first have to be rendered programmatically as a coherent image.
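As a rough illustration of that rendering step, here is a minimal sketch assuming the third-party PyMuPDF library (`pip install pymupdf`, imported as `fitz`) as the rasterizer; any PDF renderer would do. The 200 DPI default and the function names are assumptions.

```python
import base64
from typing import Iterator


def to_image_part(image_bytes: bytes, mime: str = "image/png", detail: str = "auto") -> dict:
    """Wrap raw image bytes as a Chat Completions image_url content part."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}", "detail": detail}}


def render_pdf_pages(pdf_path: str, dpi: int = 200) -> Iterator[bytes]:
    """Yield each page of the PDF rendered as PNG bytes (requires PyMuPDF)."""
    import fitz  # PyMuPDF; imported lazily so to_image_part works without it

    with fitz.open(pdf_path) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)
            yield pix.tobytes("png")
```

Each rendered page can be wrapped with `to_image_part` and appended to the message's `content` list next to the text prompt. Higher DPI helps with small print but makes each request larger.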

Is the solution you describe AI-based? Or is it rather an OCR method tailored for PDF text extraction, one that creates searchable text metadata even for images within PDFs (such as scanned documents)?

The OpenAI Assistants endpoint allows PDF documents to be uploaded for text extraction. This uses only the text already embedded in the document; it cannot analyze images, charts, or scanned documents. You can then interact with the content only through a search function, which uses the PDF’s text as knowledge to inform the AI.
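Since text-based extraction depends on an embedded text layer, it can help to check which pages of a PDF actually have one before choosing a route. A rough sketch, again assuming PyMuPDF for reading the text layer; the 25-character threshold is an arbitrary heuristic, not a standard.

```python
def pages_needing_ocr(page_texts: list[str], min_chars: int = 25) -> list[int]:
    """Return 0-based indices of pages whose extractable text is (nearly) empty.

    Such pages are likely scans or photos and would need OCR or a vision model.
    """
    return [i for i, text in enumerate(page_texts) if len(text.strip()) < min_chars]


def extract_page_texts(pdf_path: str) -> list[str]:
    """Return the embedded text layer of each page (requires PyMuPDF)."""
    import fitz  # PyMuPDF; imported lazily so pages_needing_ocr works without it

    with fitz.open(pdf_path) as doc:
        return [page.get_text("text") for page in doc]
```

For example, `pages_needing_ocr(extract_page_texts("upload.pdf"))` would list the pages to send to OCR, while the rest can use their existing text directly.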

AI image recognition can be part of a highly integrated product, but I would first rely on proven OCR technology, such as Adobe Acrobat batch processing, or open-source tools that understand PDF structure and can use Tesseract or another OCR engine for character recognition.
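For the conventional-OCR route, a minimal sketch using Tesseract through the third-party `pytesseract` wrapper and Pillow (both assumptions; the Tesseract binary must be installed separately). `image_to_data` returns per-word confidences, which can flag words, often handwriting, that deserve a human check. The threshold of 60 is arbitrary.

```python
def low_confidence_words(data: dict, threshold: float = 60.0) -> list[tuple[str, float]]:
    """Return (word, confidence) pairs below `threshold` from image_to_data's dict output.

    Entries with confidence -1 are layout boxes (blocks, lines), not words, and are skipped.
    """
    flagged = []
    for word, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # older pytesseract versions return confidences as strings
        if conf >= 0 and word.strip() and conf < threshold:
            flagged.append((word, conf))
    return flagged


def ocr_image(image_path: str) -> tuple[str, list[tuple[str, float]]]:
    """OCR an image with Tesseract; return the full text and low-confidence words."""
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        text = pytesseract.image_to_string(image, lang="eng")
        data = pytesseract.image_to_data(image, lang="eng", output_type=pytesseract.Output.DICT)
    return text, low_confidence_words(data)
```

Tesseract is generally tuned for printed text, so a high share of flagged words on handwritten input is a signal to try a different engine.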

Snowskier · January 3, 2025, 4:37pm

Thank you for that. It would be an image of a 4x6 inch card, taken with an iPhone. The card is structured, and we want to extract the handwriting. When I upload an image to ChatGPT, it does this perfectly, and I was hoping for similar accuracy through the API. We have devs ready to work on it; I'm just trying to educate myself before suggesting OpenAI. We do have Textract (AWS) working…
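Because the card has a fixed layout, one option is to ask the model for named fields rather than free text, and to set `"detail": "high"` on the image so small handwriting is processed at higher resolution. A sketch under assumptions: the field names are hypothetical placeholders for whatever the card actually contains, JSON mode is used for the reply, and the model name is a choice to test rather than a recommendation.

```python
import base64
import json
from typing import Sequence

# Hypothetical field names; replace with the labels printed on the actual card.
CARD_FIELDS = ("name", "date", "phone", "notes")


def build_card_request(jpeg_bytes: bytes, fields: Sequence[str] = CARD_FIELDS) -> list[dict]:
    """Build messages asking for the handwritten value of each field as JSON."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    prompt = (
        "This is a photo of a structured 4x6 inch card with handwritten entries. "
        f"Return a JSON object with exactly these keys: {', '.join(fields)}. "
        "Use the handwritten text for each value, or null if the field is blank or illegible."
    )
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}", "detail": "high"}},
            ],
        }
    ]


def parse_card_reply(raw: str, fields: Sequence[str] = CARD_FIELDS) -> dict:
    """Parse the JSON reply and check that every expected field is present."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [f for f in fields if f not in data]
    if missing:
        raise ValueError(f"reply is missing fields: {missing}")
    # Keep only the expected keys, in the expected order.
    return {f: data[f] for f in fields}


def read_card(client, jpeg_bytes: bytes) -> dict:
    """Send one card photo to the model; `client` is an openai.OpenAI instance."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=build_card_request(jpeg_bytes),
        response_format={"type": "json_object"},
    )
    return parse_card_reply(response.choices[0].message.content)
```

iPhone photos are often saved as HEIC by default, so they may need converting to JPEG before upload. Running the same batch of cards through this and through Textract, then checking both against hand-typed values, gives a direct accuracy comparison.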
