ChatGPT's OCR capabilities but in the ChatCompletions API

Armouti · September 25, 2024, 5:32am

I would like to be able to use OCR/text-extraction of the same quality available through ChatGPT’s attachment feature when using the ChatCompletions API, any advice?

Background:
I tried to build an OCR/text-extraction layer in-front of the ChatCompletion API for uploaded documents, but realised that with complex document layouts OCR/text-extraction gets exponentially hard as I have to reconstruct the document’s original layout for GPT-4o to understand the context of certain pieces of text due to their positioning in the layout.
(Even with the top tools like Google Document AI)

However when I attached these documents on ChatGPT and sent them, it seemed to have perfect understanding of the layout of the document out of the box.

This lead me to think, if I can just attach these documents in the same manner I do with ChatGPT but through the API, It would make things much simpler.

I noticed the Assistant’s API might support the attachment feature, but since the Assistant’s API does not accept fine-tuned models, I cannot use it.

Any advice on passing complex-layout documents with ChatCompletions API?

Topic		Replies	Views
Extracting Data from ChatGPT API Without Python – Alternatives for SAP Integration? API api	2	236	January 30, 2025
Best Approach For Analyzing Imagery (Written Paper) API gpt-4 , chatgpt , fine-tuning , api , assistants-api	1	115	February 2, 2025
Can an assistant help me with OCR? API gpt-4	7	4109	June 6, 2024
OCR of PDF and JPG documents Community api	3	5387	January 3, 2025
Make OpenAI Vision API Match GPT4 Vision API chatgpt	5	4174	April 24, 2026

ChatGPT's OCR capabilities but in the ChatCompletions API

Related topics