I mean GPT4o is multi modal. It can take images as well as pdf files I guess.
So you base 64 encode the file and send it.
Of couse you should also change the prompt to something you are doing in ChatGPT normally.
But if you want to save on API cost I would suggest to use something like ghostscript to split the PDF in single tiff files and pytesseract to convert the PDF to hocr (in a loop over each tiff).
And then use GPT-3.5 with a prompt like
give me markdown from this hocr:
[hocr]