OCR of PDF and JPG documents

Foxalabs · December 10, 2024, 10:03pm

You can achieve the same thing with a little front-end web code and some Python on the backend that calls a GPT-4o model (the example below uses gpt-4o-mini). The model extracts the text and passes it back to you in whatever format you specify.

Not a difficult thing to do with the OpenAI API.

Simple upload example here:

import base64
from openai import OpenAI

client = OpenAI()

# Function to encode the image
def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Please extract the text from this document image and return the result as a single string, no other output",
        },
        {
          "type": "image_url",
          "image_url": {
            "url":  f"data:image/jpeg;base64,{base64_image}"
          },
        },
      ],
    }
  ],
)

# Print only the extracted text rather than the whole choice object
print(response.choices[0].message.content)
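
The example above hardcodes image/jpeg in the data URL. If you also upload PNGs or other image types, a rough sketch like the one below could pick the MIME type from the file name instead. The helper names (image_data_url, load_image_data_url, build_ocr_messages) are made up for illustration and are not part of the OpenAI library. It rejects anything that is not an image, so I am assuming a PDF would be converted to page images first.

```python
import base64
import mimetypes
from pathlib import Path


def image_data_url(data: bytes, filename: str) -> str:
    """Build a base64 data URL, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        # Assumption: only image files are sent; PDFs are converted beforehand.
        raise ValueError(f"not a recognised image file: {filename}")
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def load_image_data_url(image_path: str) -> str:
    """Read an image from disk and return it as a data URL."""
    path = Path(image_path)
    return image_data_url(path.read_bytes(), path.name)


def build_ocr_messages(data_url: str, prompt: str) -> list:
    """Return a messages list in the same shape as the example above."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]
```

You would then pass build_ocr_messages(load_image_data_url(image_path), "your prompt") as the messages argument of client.chat.completions.create, exactly as in the example above.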
