Hello everyone,
I have some PDFs containing tables that I want to extract. These tables are hard to extract with libraries like pypdf, so I thought about using OCR. I tried ChatGPT (GPT-4o) and it performed well, so I tried to do the same through the API:
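import base64
from io import BytesIO

from openai import OpenAI
from pdf2image import convert_from_path

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def encode_image(pil_image):
    # Serialize the PIL image as JPEG (lossy) and return it as a base64 string
    buffered = BytesIO()
    pil_image.save(buffered, format="JPEG")
    return base64.b64encode(buffered.getvalue()).decode("utf-8")


def gpt_extract_text_from_image(image):
    # Send the page image and the extraction prompt in a single user message
    base64_image = encode_image(image)
    response = client.responses.create(
        model="gpt-4o",
        input=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_text",
                        "text": (
                            "Extract the full table from this image with all data, including headers.\n"
                            "You are free to format the output as you see fit for clarity. Just make sure all data is readable and aligned correctly."
                        ),
                    },
                    {
                        "type": "input_image",
                        "image_url": f"data:image/jpeg;base64,{base64_image}",
                    },
                ],
            }
        ],
    )
    return response.output_text


pdf_path = "mypdf.pdf"
# Render every page of the PDF to a PIL image
images = convert_from_path(pdf_path, dpi=700)
# Note: index 1 is the second page; use images[0] for the first page
page_image = images[1]
extracted_text = gpt_extract_text_from_image(page_image)
print(extracted_text)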
However, the output contained a lot of errors. I also tried raising the resolution (a DPI of 1200, which produces a very high-resolution image). I don't understand the difference between ChatGPT and the API. Can you suggest any alternatives?
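
For scale, here is a rough sketch of the pixel arithmetic behind these DPI values. It assumes an A4 page (the actual page size of the PDF may differ), and the 2048-pixel cap on the longer side is a hypothetical number chosen only to illustrate what a downscaling step would do; whether the API applies such a limit is not something established in this post. The helper names `page_pixels` and `fit_within` are made up for this example.

```python
# Rough page-size arithmetic: pixels = inches * DPI.
# Assumption: an A4 page (8.27 x 11.69 inches); the real PDF may differ.
A4_INCHES = (8.27, 11.69)


def page_pixels(dpi, size_inches=A4_INCHES):
    """Return (width, height) in pixels for a page rendered at the given DPI."""
    width_in, height_in = size_inches
    return round(width_in * dpi), round(height_in * dpi)


def fit_within(width, height, max_side):
    """Scale (width, height) down so the longer side is at most max_side.

    The aspect ratio is kept; images that already fit are left unchanged.
    """
    scale = min(1.0, max_side / max(width, height))
    return round(width * scale), round(height * scale)


for dpi in (200, 700, 1200):
    w, h = page_pixels(dpi)
    # max_side=2048 is a hypothetical cap, used only for illustration
    print(dpi, (w, h), fit_within(w, h, 2048))
```

If a cap like this applied, the 700 DPI and 1200 DPI renders would both shrink to the same size before being read, so raising the DPI further would not add any detail.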