Hello everyone,
I am trying to use a ChatGPT Assistant to extract very precise information from PDF catalogs related to training courses.
During my tests, I noticed that when ChatGPT works directly with PDF files, it tends to invent information about the courses. To improve accuracy, I converted the PDFs into text files and placed each course on its own numbered page, with pages separated by dashed lines. This has partially solved the problem: if the course actually exists in the catalog, the system often extracts the information correctly and even indicates the exact page. However, if the course does not exist in the catalog, the system still tends to invent details.
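To make the layout concrete, here is a minimal sketch of how such a text file could be split into pages and checked for a course before the Assistant is asked anything. The "Page N" header, the dashed separator of five or more dashes, the sample catalog, and the function names `split_pages` and `find_course` are all assumptions for illustration; the real files may look different.

```python
import re

# Assumed layout (hypothetical): every page begins with a "Page N" line,
# and pages are separated by a line made of five or more dashes.
SEPARATOR = re.compile(r"^-{5,}[ \t]*$", re.MULTILINE)
PAGE_HEADER = re.compile(r"^Page[ \t]+(\d+)[ \t]*$", re.MULTILINE)


def split_pages(catalog_text):
    """Return {page_number: page_text} for every chunk that has a page header."""
    pages = {}
    for chunk in SEPARATOR.split(catalog_text):
        match = PAGE_HEADER.search(chunk)
        if match:
            # Keep only the text that follows the "Page N" header line.
            pages[int(match.group(1))] = chunk[match.end():].strip()
    return pages


def find_course(pages, course_title):
    """Return (page_number, page_text) if the title appears on a page, else None."""
    wanted = course_title.casefold()
    for number, text in sorted(pages.items()):
        if wanted in text.casefold():
            return number, text
    return None


# Made-up sample catalog, only to show the assumed format.
SAMPLE = """Page 1
Course: Excel Basics
Duration: 2 days
----------
Page 2
Course: Project Management
Duration: 3 days
"""

pages = split_pages(SAMPLE)
print(find_course(pages, "Project Management"))  # found on page 2
print(find_course(pages, "Quantum Cooking"))     # not in the catalog
```

With a check like this, a course that is not found could be reported as missing without involving the model at all, and only the single matching page would need to be passed along when it is found. Plain substring matching is a simplification; titles with typos or different wording would need a more tolerant comparison.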
I would like to know if anyone has had similar experiences or if there is a recommended approach to further improve the accuracy of information extraction in scenarios like this. What best practices could you suggest to ensure that the Assistant does not “invent” information not present in the original documents?
Thank you for your attention and any advice!
Ciro