ChatGPT's OCR capabilities but in the ChatCompletions API

I would like to be able to use OCR/text-extraction of the same quality available through ChatGPT’s attachment feature when using the ChatCompletions API, any advice?

Background:
I tried to build an OCR/text-extraction layer in-front of the ChatCompletion API for uploaded documents, but realised that with complex document layouts OCR/text-extraction gets exponentially hard as I have to reconstruct the document’s original layout for GPT-4o to understand the context of certain pieces of text due to their positioning in the layout.
(Even with the top tools like Google Document AI)

However when I attached these documents on ChatGPT and sent them, it seemed to have perfect understanding of the layout of the document out of the box.

This lead me to think, if I can just attach these documents in the same manner I do with ChatGPT but through the API, It would make things much simpler.

I noticed the Assistant’s API might support the attachment feature, but since the Assistant’s API does not accept fine-tuned models, I cannot use it.

Any advice on passing complex-layout documents with ChatCompletions API?