I have a ChatGPT account, and I created a private custom GPT by uploading a few PDF files. The resulting accuracy is excellent.
However, when I use OpenAI’s API, perform chunking, and build a RAG system on the same PDF files, its accuracy is far lower than that of the custom GPT.
Is there a way to find out how ChatGPT implements RAG for custom GPTs so that I can replicate something similar?
What is the algorithm or process behind ChatGPT’s custom GPTs?
In ChatGPT, the AI has a search tool it can call with a query it writes itself, rather than having embeddings run automatically on the user input or the input context.
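To get a similar pattern through the API, you can expose retrieval as a function tool so the model writes its own search query (and can search again if the first results are poor). This is a minimal sketch that assumes OpenAI's function-calling tool format. The tool name `search_files`, the toy `CHUNKS` corpus, and the keyword-overlap scoring are hypothetical stand-ins; swap in your own embedding or hybrid search. The API round trip is omitted. You would pass `SEARCH_TOOL` in the request's `tools` list, then return `handle_tool_call`'s output to the model as a tool message.

```python
import json
import re

# Tool definition in the JSON-schema "function" format used by OpenAI's
# function calling. The name "search_files" is hypothetical; ChatGPT's
# internal search tool is not public.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_files",
        "description": "Search the uploaded documents. Write a focused query; "
                       "you may call this several times with different queries.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query."},
            },
            "required": ["query"],
        },
    },
}

# Toy in-memory corpus standing in for your chunked PDFs (hypothetical data).
CHUNKS = [
    {"file": "handbook.pdf", "text": "Employees accrue 20 vacation days per year."},
    {"file": "handbook.pdf", "text": "Remote work requires manager approval."},
    {"file": "benefits.pdf", "text": "Dental coverage starts after 90 days."},
]


def tokenize(text: str) -> set:
    """Lowercase word set; a crude stand-in for real text processing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_files(query: str, top_k: int = 2) -> list:
    """Rank chunks by keyword overlap with the model-written query.
    Keyword overlap is a simple substitute for embedding search."""
    q = tokenize(query)
    scored = [(len(q & tokenize(c["text"])), c) for c in CHUNKS]
    scored = [sc for sc in scored if sc[0] > 0]        # drop non-matching chunks
    scored.sort(key=lambda sc: sc[0], reverse=True)    # best match first
    return [c for _, c in scored[:top_k]]


def handle_tool_call(name: str, arguments: str) -> str:
    """Run a tool call emitted by the model; arguments arrive as a JSON string."""
    if name != "search_files":
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments)
    return json.dumps(search_files(args["query"]))
```

For example, `handle_tool_call("search_files", '{"query": "how many vacation days"}')` returns the vacation chunk from `handbook.pdf` first, followed by the weaker `benefits.pdf` match. Letting the model phrase the query means a vague or conversational user message can still produce a precise search.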
The return format and the placement of ranked chunks are not disclosed. One thing ChatGPT does reserve for itself, though, is the use of source file names: the AI is given the names of the files available to search (ChatGPT limits how many there can be), and each returned chunk comes labeled with the name of the file it was taken from.
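You can give your own RAG system the same file-name context. This is a minimal sketch under stated assumptions. The prompt wording, the `[rank] Source:` layout, and the helper names `build_system_prompt` and `format_results` are all hypothetical, since ChatGPT's actual format is not disclosed. The results are assumed to be dicts with `file` and `text` keys, already sorted best-first.

```python
def build_system_prompt(file_names: list) -> str:
    """Tell the model which files exist, so it can aim its searches (hypothetical wording)."""
    listing = "\n".join(f"- {name}" for name in file_names)
    return ("You can search the following uploaded files with the search_files tool:\n"
            f"{listing}\n"
            "Cite the file name when you use information from a search result.")


def format_results(results: list) -> str:
    """Render ranked chunks, best first, each labeled with its source file."""
    blocks = []
    for rank, r in enumerate(results, start=1):
        blocks.append(f"[{rank}] Source: {r['file']}\n{r['text']}")
    return "\n\n".join(blocks)
```

Labeling each chunk with its source lets the model weigh the results, notice when several come from the same document, and cite them, rather than seeing an anonymous list of text fragments.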
The first step is ensuring you have high-quality document extraction. PDFs are a poor format for obtaining text that an AI can comprehend, and entire companies are built on doing this extraction well.
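Whatever extractor you use (a PDF library, OCR, or a commercial parsing service), it usually helps to clean the raw text and chunk on paragraph boundaries rather than fixed character windows. This is a minimal sketch that assumes the text has already been extracted into a string. The regex heuristics and the `max_chars` default are illustrative choices, not what ChatGPT does.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Repair common PDF-extraction artifacts (heuristic, not exhaustive)."""
    text = raw.replace("\u00ad", "")               # drop soft hyphens
    text = re.sub(r"[ \t]+\n", "\n", text)         # trailing spaces on lines
    # Re-join words hyphenated across a line break. Heuristic: this also
    # merges genuine compounds that happen to break at a hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)   # single newlines inside paragraphs
    text = re.sub(r"[ \t]+", " ", text)            # runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)         # excess blank lines
    return text.strip()


def chunk_paragraphs(text: str, max_chars: int = 1200) -> list:
    """Pack whole paragraphs into chunks of at most max_chars.
    A single paragraph longer than max_chars becomes its own chunk."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Keeping paragraphs intact means a retrieved chunk is less likely to cut a definition or table row in half, which is a common cause of poor answers from naive fixed-size chunking.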