Referencing the geometry of regions of interest inside a PDF with GPT-5

I’m processing PDF documents with GPT-5 through the Responses API. I’m trying to get the LLM to reference specific regions of the PDF by outputting bounding rectangles that enclose the content of interest. However, this is not working: when mapped back onto the PDF, the rectangles are placed incorrectly. I’m also prompting the LLM to output the text that is supposed to be inside each rectangle, and this shows that there is no correspondence between the content and the rectangles.
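When debugging this kind of mismatch, it helps to rule out a coordinate-system error before blaming the model. The sketch below is illustrative only, with assumptions labeled: it assumes the model returns either pixel coordinates on a page image rendered at a known DPI, or coordinates normalized to a 0 to 1000 grid, both with a top-left origin and y growing downward. Neither convention is documented for GPT-5; both are guesses. PDF user space instead uses points (1/72 inch) with a bottom-left origin, so y must be flipped. The helper names `pixel_rect_to_pdf` and `normalized_rect_to_pdf` are hypothetical, and the sketch ignores page rotation and non-zero MediaBox origins.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72.0  # PDF user-space unit is 1/72 inch


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float


def _flip_y(page_height_pt: float, top: float, bottom: float) -> tuple[float, float]:
    # Image y is measured from the top edge; PDF y is measured from the bottom edge.
    y_a = page_height_pt - top
    y_b = page_height_pt - bottom
    return min(y_a, y_b), max(y_a, y_b)


def pixel_rect_to_pdf(rect: Rect, page_height_pt: float, dpi: float) -> Rect:
    """ASSUMPTION: rect is in pixels of a page image rendered at `dpi`,
    origin top-left. Returns the rectangle in PDF points, origin bottom-left."""
    scale = POINTS_PER_INCH / dpi
    y0, y1 = _flip_y(page_height_pt, rect.y0 * scale, rect.y1 * scale)
    return Rect(rect.x0 * scale, y0, rect.x1 * scale, y1)


def normalized_rect_to_pdf(
    rect: Rect, page_width_pt: float, page_height_pt: float, grid: float = 1000.0
) -> Rect:
    """ASSUMPTION: rect is normalized to a 0..grid range on both axes,
    origin top-left. Returns the rectangle in PDF points, origin bottom-left."""
    sx = page_width_pt / grid
    sy = page_height_pt / grid
    y0, y1 = _flip_y(page_height_pt, rect.y0 * sy, rect.y1 * sy)
    return Rect(rect.x0 * sx, y0, rect.x1 * sx, y1)


if __name__ == "__main__":
    # US Letter page (612 x 792 pt) rendered at 144 DPI (1224 x 1584 px).
    print(pixel_rect_to_pdf(Rect(144, 144, 288, 288), page_height_pt=792, dpi=144))
    print(normalized_rect_to_pdf(Rect(100, 100, 200, 200), 612, 792))
```

If rectangles land mirrored vertically or uniformly scaled, the convention assumed here does not match what the model is emitting; if they land in unrelated places even after trying both conventions, that points to the model itself rather than the mapping.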

Is this beyond the current PDF-processing capabilities?

And if not: is there any specific documentation on how PDFs are passed to the LLM, especially regarding the image-based representation that accompanies the text? This would help with debugging. The guide on file inputs does not go into these details.