I have implemented search with the OpenAI APIs.
I am opening up the source code to solicit ideas and help from the community.
- Any suggestions on building some impactful apps based on this code?
- How can I improve content extraction from documents, specifically for tables, multi-column layouts, and diagrams?
- Any other ideas or critiques of this project?
The program scans web pages (either PDF or HTML), extracts their contents while preserving the original document structure (for HTML, the h1, h2, and h3 headings), and stores them in a Pandas DataFrame as a local knowledge base. It then tries to answer user questions from the local knowledge base first. If no answer is found, it falls back to OpenAI. More details are on GitHub.
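To make the flow concrete for anyone who has not opened the repo yet, here is a rough sketch of the HTML path. It is not the code from the repository. The names `SectionParser`, `build_knowledge_base`, `answer`, and `min_overlap` are made up for illustration, the matching is a naive word-overlap score, and the OpenAI call is represented by a `fallback` callable that you would supply yourself. It uses only the standard library and keeps sections as a list of dicts.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3"}


class SectionParser(HTMLParser):
    """Collect the text under each h1/h2/h3 heading as one row per section."""

    def __init__(self):
        super().__init__()
        self.rows = []            # one dict per section, in document order
        self._in_heading = None   # heading tag name while inside a heading
        self._current = None      # the row currently being filled

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = tag
            self._current = {"level": int(tag[1]), "heading": "", "text": ""}
            self.rows.append(self._current)

    def handle_endtag(self, tag):
        if tag == self._in_heading:
            self._in_heading = None

    def handle_data(self, data):
        if self._current is None:
            return  # text before the first heading is ignored in this sketch
        key = "heading" if self._in_heading else "text"
        self._current[key] = (self._current[key] + " " + data.strip()).strip()


def build_knowledge_base(html):
    """Parse an HTML string into a list of section rows."""
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def answer(question, rows, fallback, min_overlap=2):
    """Return the best-matching section text, or call fallback(question).

    Matching is a simple count of shared lowercase words (an assumption
    made for this sketch, not a description of the real project).
    """
    words = set(question.lower().split())
    best, best_score = None, 0
    for row in rows:
        section_words = set((row["heading"] + " " + row["text"]).lower().split())
        score = len(words & section_words)
        if score > best_score:
            best, best_score = row, score
    if best is not None and best_score >= min_overlap:
        return best["text"]
    return fallback(question)  # e.g. a function that queries OpenAI
```

The rows have a uniform shape, so `pandas.DataFrame(rows)` turns them into a DataFrame with `level`, `heading`, and `text` columns. Passing the fallback as a callable keeps the local lookup testable without an API key.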