OCR using API for text extraction

I have a PDF. It’s all images. I want to use ChatGPT’s API to perform OCR on it and extract the text in to a .json file and a .txt file. Is the OCR only for the customers who have take the subscription ? What model should I use? If anyone knows any blogs, youtube videos or github repos about this, please let me know.

I’ve been doing some experiments for a client project: in order to do that you need to use a workflow like this.

  • extract each page of the pdf as single images. You can use some tools online or you can use python libraries.
  • check the image resolution: for hight quality scans i’ve seen good result at 2000 pixel of height. For a manual which has low quality images, i had to resize them at 3500 pixel of height.
  • Process each image with the APIs. The Vision APi will extract the text and describe illustrations (something a normal OCR can’t do!)
  • Save the resulting chat a single text file.
  • you can also ask for a structured output in json format, save it and then use python to save a text version.
  • Loop the process for each image.
  • And that’s it.

It’s not quick, for about 50 pages it took 12 minutes, but the result was excellent. The cost was about 1 dollar.