You can achieve the same thing with a little front-end web code and some Python on the backend that calls a GPT-4o model (the example below uses gpt-4o-mini). The model will extract the text and pass it back to you in whatever format you specify.
It's not a difficult thing to do with the OpenAI API.
Here is a simple upload example:
```python
import base64
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Please extract the text from this document image and return the result as a single string, no other output",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    },
                },
            ],
        }
    ],
)

# Print only the model's reply text, not the whole Choice object
print(response.choices[0].message.content)
```
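For the "front end plus Python backend" setup, here is a minimal sketch of the backend half using only the standard library's WSGI interface. It assumes the front end POSTs the raw image bytes with a correct `Content-Type` header; `make_app`, `openai_extract`, and `SUPPORTED_TYPES` are hypothetical names. The accepted types reflect the formats OpenAI's vision guide lists (PNG, JPEG, WEBP, non-animated GIF) as far as I know, so check the current docs. Photos straight off an iPhone may be HEIC and would need converting first.

```python
import json
from collections.abc import Callable

# Assumed list of image types the vision endpoint accepts; verify against current docs.
SUPPORTED_TYPES = {"image/png", "image/jpeg", "image/webp", "image/gif"}


def make_app(extract_text: Callable[[bytes, str], str]):
    """Build a WSGI app that turns a POSTed image into extracted text.

    extract_text is injected so the OpenAI call can be swapped for a stub in tests.
    """
    def app(environ, start_response):
        def reply(status: str, payload: dict) -> list[bytes]:
            body = json.dumps(payload).encode("utf-8")
            start_response(status, [("Content-Type", "application/json"),
                                    ("Content-Length", str(len(body)))])
            return [body]

        if environ.get("REQUEST_METHOD") != "POST":
            return reply("405 Method Not Allowed", {"error": "POST an image"})

        # Strip parameters such as "; charset=..." and normalise case.
        mime = environ.get("CONTENT_TYPE", "").split(";")[0].strip().lower()
        if mime not in SUPPORTED_TYPES:
            return reply("415 Unsupported Media Type",
                         {"error": f"unsupported type: {mime or 'none'}"})

        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        image_bytes = environ["wsgi.input"].read(length)
        if not image_bytes:
            return reply("400 Bad Request", {"error": "empty body"})

        return reply("200 OK", {"text": extract_text(image_bytes, mime)})

    return app


def openai_extract(image_bytes: bytes, mime: str) -> str:
    """Same request as the upload example, but with the MIME type taken from the upload."""
    import base64
    from openai import OpenAI  # imported here so the app can be tested without the SDK

    client = OpenAI()
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "Please extract the text from this document image "
                                     "and return the result as a single string, no other output"},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ]}],
    )
    return response.choices[0].message.content
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, make_app(openai_extract)).serve_forever()` is enough; a production deployment would sit behind a real WSGI server with upload size limits and authentication.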
The above describes passing an image file to an AI model for its vision skill. You can then converse with the AI about what type of analysis to do on the image: describe entities within it, provide contextual analysis, or use computer vision to read text.
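One detail worth knowing when "conversing" about an image through the API: Chat Completions keeps no state between calls, so a follow-up question has to resend the earlier turns, including the image, along with the model's previous reply. A minimal sketch of that bookkeeping follows; `image_message` and `follow_up` are hypothetical helper names.

```python
from typing import Any

Message = dict[str, Any]


def image_message(prompt: str, data_url: str) -> Message:
    """First user turn: a text instruction plus the image as a data URL."""
    return {"role": "user", "content": [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": data_url}},
    ]}


def follow_up(history: list[Message], assistant_reply: str, question: str) -> list[Message]:
    """Return a new message list that keeps the image turn and adds one more exchange.

    The original history is not modified; the whole list (image included)
    is what gets sent on the next request.
    """
    return history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": question},
    ]
```

The result is passed as `messages=` to `client.chat.completions.create(...)`. Because the image is resent on every turn, each follow-up costs image input tokens again.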
This requires images, of course. For a PDF, each page or section would first have to be rendered to an image programmatically, in a way that keeps its content intact.
Is this an AI-based solution that you describe? Or rather, is it an OCR method that is tailored for PDF documentation extraction, creating searchable text metadata for even images within PDFs (such as scanned documents)?
The OpenAI Assistants endpoint allows PDF documents to be uploaded for document extraction. This only uses the text already embedded in the document; it cannot analyze images, charts, or scanned documents. You can then only interact with it through a search function, which uses the PDF's text as knowledge to inform the AI.
AI image recognition can be part of a highly integrated product, but I would first rely on proven OCR technology, such as Adobe Acrobat batch processing, or open-source tools that understand PDF structure and can use Tesseract or another OCR engine for character recognition.
Thank you for that. It would be an image of a 4x6-inch card, taken with an iPhone. The card is structured, and we want to extract the handwriting. When I upload an image to ChatGPT, it does this perfectly, and I was hoping for similar accuracy. We have devs ready to work on it; I'm just trying to educate myself before suggesting OpenAI. We do have Textract (AWS) working…
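Since the card is structured, one option is to ask the model for named fields as JSON rather than a single string, then validate the reply before storing it. That makes the output easy to compare side by side with Textract. This is a sketch under assumptions: the field names in `CardFields` are hypothetical placeholders for whatever the real card contains, and accuracy on actual handwriting still has to be measured on real samples.

```python
import json
from dataclasses import dataclass, fields


# Hypothetical card layout: replace these with the fields printed on your card.
@dataclass
class CardFields:
    name: str
    date: str
    notes: str


# Prompt text to send alongside the image; JSON mode expects the word "JSON" in the messages.
PROMPT = (
    "This photo shows a 4x6 inch form card with handwriting. "
    "Return JSON with exactly these keys: "
    + ", ".join(f.name for f in fields(CardFields))
    + ". Use an empty string for any field that is blank or unreadable."
)


def parse_card(raw: str) -> CardFields:
    """Validate the model's JSON reply against the expected card fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(CardFields)}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Extra keys are ignored; null becomes "" and other values are coerced to str.
    return CardFields(**{k: "" if data[k] is None else str(data[k]) for k in expected})
```

In the request, `PROMPT` would replace the text part of the message and `response_format={"type": "json_object"}` would be passed to `client.chat.completions.create(...)`; the reply is then checked with `parse_card(response.choices[0].message.content)`. A `ValueError` is a signal to retry or route the card to manual review.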