Document Q&A on technical documents

I am trying to set up a document Q&A system with OpenAI. The documents are technical specifications of a communication protocol (e.g., IEEE protocols). ChatGPT (gpt-3.5-turbo or GPT-4) already has some knowledge of this protocol and its principles, but it does not have the latest updated specifications. I want to combine ChatGPT's knowledge with the technical documents I provide to get accurate answers.
Each PDF document is divided into sections, subsections, etc., and has a header on every page giving the document name and version. The documents also contain mathematical expressions, tables, and figures.
So far I have used ChromaDB (via LangChain) as the vector DB, OpenAI embeddings, and gpt-3.5-turbo as the Q&A model.
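
Since every page carries the document name and version in its header, and the text is organized into numbered sections, one option is to capture that information as metadata on each chunk before embedding. Below is a minimal sketch in plain Python, not tied to any particular loader. The names `Chunk`, `chunk_pages`, and `SECTION_RE` are hypothetical, and the heading pattern is an assumption (numbered headings such as "4.2.1 Frame format") that would need adapting to the real documents, since it will also match ordinary lines or table rows that start with a number.

```python
import re
from dataclasses import dataclass

# Assumed heading style: "4.2.1 Frame format". Adjust for the real documents.
SECTION_RE = re.compile(r"^(\d+(?:\.\d+)*)\s+(\S.*)$")


@dataclass
class Chunk:
    text: str
    document: str
    version: str
    section: str
    page: int


def _flush(chunks, buffer, document, version, section, page_no):
    """Store the buffered lines as one chunk, skipping empty buffers."""
    text = "\n".join(buffer).strip()
    if text:
        chunks.append(Chunk(text, document, version, section, page_no))


def chunk_pages(pages, document, version, header):
    """Split page texts into one chunk per (section, page).

    `pages` is a list of page strings in reading order and `header` is the
    exact header line repeated on every page. The header is dropped from the
    text because it is already kept as metadata.
    """
    chunks = []
    section = "front matter"  # label used until the first heading is seen
    for page_no, page_text in enumerate(pages, start=1):
        buffer = []
        for line in page_text.splitlines():
            stripped = line.strip()
            if stripped == header:
                continue
            match = SECTION_RE.match(stripped)
            if match:
                # A new section starts: close the previous chunk first.
                _flush(chunks, buffer, document, version, section, page_no)
                buffer = []
                section = f"{match.group(1)} {match.group(2)}"
            buffer.append(stripped)
        # Close the chunk at the page boundary so the page number stays exact.
        _flush(chunks, buffer, document, version, section, page_no)
    return chunks
```

Each chunk's `document`, `version`, `section`, and `page` fields could then be stored as metadata alongside its embedding, so that the reference comes from the stored record instead of from text the model generates.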

Results are good overall, but the references are often wrong: sometimes the information is correct but the document or section it cites is not. Sometimes it also does not answer the question properly. For simpler questions it is usually accurate enough.
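
One possible way to make the references more reliable is to stop asking the model to write document and section names itself. Instead, each retrieved chunk gets a short tag in the prompt, the model is asked to cite tags only, and the tags are mapped back to stored metadata afterwards. This is a minimal sketch under that assumption. The function names, the `[S1]` tag format, and the chunk dictionaries with `text`, `document`, and `section` keys are all hypothetical, and the call to the model is left out.

```python
import re


def build_context(chunks):
    """Return the tagged prompt context and a tag -> metadata lookup."""
    lines, sources = [], {}
    for i, chunk in enumerate(chunks, start=1):
        tag = f"S{i}"
        sources[tag] = {"document": chunk["document"], "section": chunk["section"]}
        lines.append(f"[{tag}] {chunk['text']}")
    return "\n\n".join(lines), sources


def resolve_citations(answer, sources):
    """Map tags cited in the answer back to metadata.

    Tags that were never supplied are ignored, so a citation the model made
    up cannot turn into a reference. Order of first appearance is kept.
    """
    cited = []
    for tag in re.findall(r"\[(S\d+)\]", answer):
        if tag in sources and tag not in cited:
            cited.append(tag)
    return [sources[tag] for tag in cited]
```

The prompt would then contain the tagged context plus an instruction such as "cite the tags of the passages you used", and the references shown to the user would come from `resolve_citations` and not from the model's own wording.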

- Any idea how to improve the model?
- Does anyone know the best way to load/embed a PDF with math equations, tables, and figures? So far I have used the PyPDF loader from LangChain.

Welcome to the forum.

Seems like you’ve made a lot of great progress.

While it’s not related to the math equations, you might find this recent thread helpful …
