OCR of PDF and JPG documents

You can achieve the same thing with a little front-end web code and some Python on the back end. The back end calls a GPT-4o model (the example below uses gpt-4o-mini), which extracts the text and passes it back to you in whatever format you specify.

Not a difficult thing to do with the OpenAI API.

Simple upload example here:

import base64
from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

# Function to encode the image
def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Please extract the text from this document image and return the result as a single string, no other output",
        },
        {
          "type": "image_url",
          "image_url": {
            "url":  f"data:image/jpeg;base64,{base64_image}"
          },
        },
      ],
    }
  ],
)

# Print only the extracted text rather than the whole choice object
print(response.choices[0].message.content)
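
The example above hardcodes image/jpeg and a fixed prompt. If your uploads might also be PNG or another image type, or you want a different output format, something like the following could work. This is only a rough sketch: to_data_url and build_ocr_messages are names I made up for illustration, not part of the OpenAI library, and the MIME type is guessed from the file extension alone. It also rejects PDFs, on the assumption that an image_url part expects an image, so a PDF would need to be rendered to page images first.

```python
import base64
import mimetypes


def to_data_url(image_path):
    """Return a base64 data URL for an image file (hypothetical helper)."""
    # Guess the MIME type from the file extension, e.g. .jpg -> image/jpeg
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {image_path}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def build_ocr_messages(data_url, output_format="a single string"):
    """Build the messages list for the request (hypothetical helper)."""
    prompt = (
        "Please extract the text from this document image and return "
        f"the result as {output_format}, no other output"
    )
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]
```

You would then pass messages=build_ocr_messages(to_data_url(image_path)) to client.chat.completions.create in place of the inline messages list.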