Document understanding (extract info in json format from erp document like invoices etc..)

Hi to all,
I tried document info extraction manually from chatgpt web app and gpt4o model. I saw good result. Than I tried to replicate that result via api but didn’t find the proper way to preprocess the PDF files prior to feeding its content to llm. The question is, what kind of preprocessing the system do when I upload the pdf document via chatgpt web interface ? How can I replicate the very same result via API ?
Thank’s
Maurizio