Using ChatGPT with the GPT-4 model, I’ve had great results manually transcribing handwritten historic documents. I’d now like to scale up and automate this process, so as a test I implemented the following code, in line with the API examples:
```python
import base64
import mimetypes

from openai import OpenAI

# Placeholder value; substitute a real API key.
client = OpenAI(api_key='apikey')


def image_to_base64(image_path):
    """Return the image file as a base64-encoded data URL."""
    # Guess the MIME type of the image
    mime_type, _ = mimetypes.guess_type(image_path)
    if not mime_type or not mime_type.startswith('image'):
        raise ValueError("The file type is not recognized as an image")

    # Read the image binary data
    with open(image_path, 'rb') as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')

    # Format the result with the appropriate prefix
    image_base64 = f"data:{mime_type};base64,{encoded_string}"
    return image_base64


def transcribe_image(image_path):
    """Submit the image to the vision model and print the raw response."""
    base64_string = image_to_base64(image_path)

    # Make an API call to submit the image for transcription
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Manually transcribe this handwriting"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": base64_string,
                            "detail": "low"
                        }
                    },
                ],
            }
        ],
        max_tokens=300,
    )

    # Print the transcription result
    print(response)


# Example usage
image_path = 'testimage.png'
transcribe_image(image_path)
```
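Since the `detail` field is the one setting in the request that controls how the image is presented to the model, it seems worth being able to vary it between runs. Here is a minimal sketch of how the request payload could be built with `detail` as a parameter. The helper names `image_to_data_url` and `build_messages` and the shortened prompt are my own for illustration and are not part of the OpenAI library; `"low"`, `"high"` and `"auto"` are the values listed in the vision documentation.

```python
import base64
import mimetypes


def image_to_data_url(image_bytes, filename):
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(filename)
    if not mime_type or not mime_type.startswith("image"):
        raise ValueError("The file type is not recognized as an image")
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def build_messages(data_url, detail="high", prompt="Transcribe this handwriting"):
    """Build the `messages` list for a vision request (hypothetical helper).

    `detail` is passed through unchanged, so the same image can be
    submitted at different settings and the results compared.
    """
    if detail not in ("low", "high", "auto"):
        raise ValueError(f"Unsupported detail setting: {detail!r}")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url, "detail": detail}},
            ],
        }
    ]


# Build two payloads for the same (dummy) image, differing only in `detail`.
data_url = image_to_data_url(b"abc", "testimage.png")
low_messages = build_messages(data_url, detail="low")
high_messages = build_messages(data_url, detail="high")
```

Either list could then be passed as `messages=` to `client.chat.completions.create(...)` in place of the inline literal.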
I used an example image of handwriting that ChatGPT had been able to transcribe very well. However, when I submit it through the API call, the text returned doesn’t even remotely resemble the text in the image. For example, the text in the image starts:
"A report of the By-law committee, and brought up by the Secty, was explained, that every brother would have a rough copy sent to him, before the next meeting "
This is the response I get from the API:
ChatCompletionMessage(content='Sure, here is the transcription of the handwritten text:\n\nThe great orchard & lawn extended to the foot of a pretty precipitous hill, from one extremity of the house, north, it being rectangular, oblong, highly cultivated, and more proper of a rich soil, and when in bloom was indeed a charming sight. On its west side was the large vegetable garden, walled in, and to the west of that, still further, was a large pasture with some fine cattle, silky kine, which you know, I always liked and enjoyed, as well as a pretty piece of water, say pond, near the root of the hill, running to the N E or nearly so; - and in the pond, the beautiful White and other swans healthily, happily, quacking for the plentiful supply of excellent food given them, from which they were never distant, but when they chose to sail twenty yards to the opposite side, beautiful to look at;\nFrom the S E corner of the house, the view was enchanting, bounded by the mountain appropriate the Helderberg, but nearer to us the fruitful, highly cultivated beauteous farm, under high cultivation, and not the least agreeable, was the neat water sawmill with the small pond before the door and the larger one back of it; and still further east the splendid water flouring mill, when, if ever, happier days should again come to us, would have been for sale, whilst to', role='assistant', function_call=None, tool_calls=None))], created=1714403295, model='gpt-4-1106-vision-preview', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=300, prompt_tokens=98, total_tokens=398))
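The raw repr is hard to read, so a small helper could pull out just the fields that matter. Note that `completion_tokens=300` matches the `max_tokens=300` in the request, so the reply was probably cut off at the limit. This is only a sketch: `summarise_completion` is a hypothetical name, and the stand-in object below mimics just the attributes visible in the output; its `finish_reason` value is an assumption, because that part of the repr is missing from what I pasted.

```python
from types import SimpleNamespace


def summarise_completion(response):
    """Collect the reply text and token counts from a chat completion."""
    choice = response.choices[0]
    return {
        "text": choice.message.content,
        # "length" means the reply stopped because max_tokens was reached.
        "truncated": choice.finish_reason == "length",
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
    }


# Stand-in for a real response, using the token counts from the output above.
# finish_reason="length" is assumed, not taken from the pasted output.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(
                content="Sure, here is the transcription of the handwritten text:"
            ),
        )
    ],
    usage=SimpleNamespace(prompt_tokens=98, completion_tokens=300),
)

summary = summarise_completion(fake_response)
print(summary["text"])
print("truncated:", summary["truncated"])
print("tokens:", summary["prompt_tokens"], "+", summary["completion_tokens"])
```

With a real client, the same function would be called on the object returned by `client.chat.completions.create(...)`.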
Am I doing something incredibly stupid here, or missing the point? This is my first time using the OpenAI API, and I would really appreciate any advice. Thank you in advance!