Uploading files through ChatGPT - Under the Hood

When I upload a file (say, a PDF) through the ChatGPT UI (GPT-4) and ask questions about it, how does that actually work? Is RAG (retrieval-augmented generation) happening behind the scenes, or is the multimodal model simply interpreting the PDF directly? I'm curious how this works “under the hood.”