I want to create an AI assistant which has access to a repository of PDFs and is able to reference those to provide answers AND provide working links or attachments of any PDFs which it refers to in its answers.
I’ve tried both custom GPTs and the Assistants API (Playground) to create this, and I can get them to reference the PDFs correctly, but any links they provide are incorrect. They either don’t work, point to some other resource on the internet, or return the error “File not found”.
Has anyone found a solution to this? Providing the actual PDFs as attachments would also be acceptable.
To create an AI assistant that correctly references PDFs and provides working links or attachments, consider setting up a dedicated storage solution, such as AWS S3 or Google Cloud Storage, for your PDFs. Then, ensure your AI retrieves the correct file paths or URLs from this storage. Verify that the URLs are publicly accessible or properly authenticated to avoid “File not found” errors. If attachments are preferred, configure your AI to include the PDFs directly in its responses.
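As a rough illustration of the idea above, here is a minimal Python sketch in which your own code, not the model, builds the links. It assumes the PDFs are hosted under a single base URL and that you can get the names of the files the assistant cited from whatever citation data your retrieval step returns. `BASE_URL`, `build_url` and `append_links` are hypothetical names made up for this example, not part of any OpenAI or cloud SDK.

```python
from urllib.parse import quote

# Hypothetical base URL of the bucket or web server that hosts the PDFs.
BASE_URL = "https://example.com/pdfs/"


def build_url(filename: str, base_url: str = BASE_URL) -> str:
    """Return the hosted URL for a PDF, percent-encoding spaces and similar characters."""
    return base_url + quote(filename)


def append_links(answer: str, cited_files: list[str]) -> str:
    """Append a 'Sources' list with one link per distinct cited file."""
    if not cited_files:
        return answer
    # Keep the first occurrence of each file name, preserving citation order.
    distinct = []
    for name in cited_files:
        if name not in distinct:
            distinct.append(name)
    lines = [f"- {name}: {build_url(name)}" for name in distinct]
    return answer + "\n\nSources:\n" + "\n".join(lines)
```

Because the URL is derived from the file name in code, the model never has to produce a link itself, which is where the broken links tend to come from. For private buckets you would swap `build_url` for whatever signed-URL mechanism your storage provider offers.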
Thanks for your reply! Yes, that could well be the direction I need to go. I was hoping to validate it via a PoC first (using Assistants in the Playground and manually uploading the images) before committing more to development. So I’m assuming working links are not possible via this method?
Regarding attaching files directly, would this be possible through prompting alone, based on people’s experience (I haven’t gotten it to work myself), or would it require custom development?
I’d recommend extracting the text from each PDF into its own text file and supplying those instead, especially if you want to build an efficient knowledge base for the assistant.
The reason is that PDF is a pretty complex format where data can be text, scanned text, images, or a mixture of these, which makes it very difficult to ensure that the assistant can really access the knowledge you want it to use.
I tried the Assistants API, which created a vector store from the uploaded PDF files (I did this in the Playground), and the AI now references it. I’m not sure whether this step already covers the text extraction, or whether it is still better to do that separately?
Unfortunately the reference links are still not working, but I think I might need to go with @shafique1's advice: host the files in the cloud and provide the links to the AI somehow (that will need some tinkering). I’ll have a play around over the coming days and post any updates if there is progress.
In the meantime if anyone has solved this use case themselves, would love to hear how you’ve done it!
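One possible way to "provide the links to the AI", sketched in Python under some assumptions: put a file-name-to-URL list into the assistant's instructions, then check every URL in the answer against that list so invented links are caught before they reach the user. `CATALOG`, `build_instructions` and `unknown_links` are hypothetical names for this example, and the URLs are placeholders.

```python
import re

# Hypothetical catalog: PDF file name -> URL where that file is hosted.
CATALOG = {
    "manual.pdf": "https://example.com/pdfs/manual.pdf",
    "faq.pdf": "https://example.com/pdfs/faq.pdf",
}

# Matches http(s) URLs up to whitespace or a closing bracket/quote.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def build_instructions(catalog: dict[str, str]) -> str:
    """Build instruction text that lists the only links the assistant may use."""
    rows = "\n".join(f"{name} -> {url}" for name, url in catalog.items())
    return (
        "When you cite a document, copy its link exactly from this list "
        "and never write any other URL:\n" + rows
    )


def unknown_links(answer: str, catalog: dict[str, str]) -> list[str]:
    """Return URLs found in the answer that are not in the catalog."""
    allowed = set(catalog.values())
    # Strip sentence punctuation that the pattern may have swallowed.
    found = [url.rstrip(".,;") for url in URL_PATTERN.findall(answer)]
    return [url for url in found if url not in allowed]
```

If `unknown_links` returns anything, the answer contains a link the model made up, and you could strip it or retry. This does not guarantee the model uses the list correctly; it only makes wrong links detectable.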