Have people tried using vision models to perform PDF rag? What is the type of accuracy you are seeing? Even the latest models arent able to quite read pdf documents without actual text provided (OCR) - or is this a prompting issue?
For some reason it does not allow me to post a link to the run - but below I tried if you want to look at the prompt and tell me if this is a prompting issue
app_promptjudy_com/public-runs?runId=vision-retrieval-augmented-generation-1631582502-gpt-4o%23VMVNNCdEXlmKSWu7uN0ZA
I Send this prompt with 4 images of the links mentioned in the prompt and pretty much all the models do hallucinate on one or more questions. On the other hand, If i send the text of the pages, they all do great… Here is the text only version of the same prompt:
https://app_promptjudy_com/public-runs?runId=retrieval-augmented-generation–1385570120-gpt-4o-mini%23j9LH1lvUmgLQmNM5B22Vo
Below is the performance of vision vs non vision: