I've started experimenting with creating my own GPTs, configuring them entirely by hand.
I uploaded two PDFs totaling 3 MB.
The system prompt instructs the model to always consult its knowledge before answering.
About 70% of the time, after I type my query, the model hangs on “Searching my knowledge”, generates nothing at all, or errors out with “Message not found in conversation”.
Beyond this bug report, I have a few questions:
Why does querying slow down when retrieval is enabled?
How long does it take, in the background, for my documents to be fully indexed, embedded, etc. before I can query them through my custom GPT?
What kind of chunking happens behind the scenes?
Since OAI’s retrieval is a black box to us, what is the best advice on formatting our documents for the retrieval tool?
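As an illustration only (not OAI’s actual implementation, which nobody in this thread can see), a common chunking scheme in RAG pipelines is fixed-size windows with some overlap between neighbours. This minimal sketch assumes character-based windows; `chunk_text`, `chunk_size`, and `overlap` are hypothetical names and values.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows; consecutive windows share `overlap` chars.

    Hypothetical helper for illustration; chunk_size/overlap values are arbitrary.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=2)` yields `["abcd", "cdef", "efgh", "ghij"]`. The overlap means a passage shorter than `overlap` that straddles one boundary still appears intact in a neighbouring chunk.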
My GPT is searching the PDFs I uploaded, and so far it is not crashing, but it doesn’t seem to be very good at actually finding things. In a few cases its responses reflect the content of the knowledge base I uploaded (a manual for artistic research), but more often it says it can’t find sections or topics that are definitely in the PDFs. Could this be related to how these documents are formatted? Are there ways to optimize the PDFs so my GPT can search them better?
Yeah, I have the same issue with PDFs as knowledge. Sometimes the search takes a very long time and returns only parts of the PDF, but most of the time it replies with “message in conversation not found”, or says it can’t search its knowledge base. In the edit section I have it check whether it can access the documents, and it confirms it can. However, if I ask it for the first 5 sentences of each document, it fails with “message in conversation not found”.
OK, I may not be the first to say this, but it seems all our issues stem from one thing: we are making assumptions.
Who told us that OAI computes embeddings at all, uses a vector DB, or even goes to the trouble of implementing RAG?
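For contrast, if a GPT did use embedding-based RAG, retrieval would rank chunks by vector similarity and hand the top few to the model, rather than returning whatever matches a literal pattern. A minimal sketch of that ranking step, assuming word-count vectors as a stand-in for a learned embedding model; `embed`, `cosine`, and `top_k` are hypothetical helpers, not an OAI API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the k best (score, chunk) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Because results are ranked instead of filtered, a partially matching chunk still comes back. Word counts, however, only match shared vocabulary; a learned embedding model is what would let a paraphrased query score well.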
I’m not sure how to reproduce this, but I caught it emitting some code while it was “Searching my knowledge”.
It showed me its search function:
```python
import re  # required by re.search below

import fitz  # PyMuPDF

# Open the documentation
sdk_doc = fitz.open("/mnt/data/sdk.pdf")

# Function to search for information in the document
def search_document(doc, query, max_pages=5):
    results = []
    for page_num in range(min(max_pages, len(doc))):  # Limiting the number of pages to scan
        page = doc[page_num]
        text = page.get_text()
        if re.search(query, text, re.IGNORECASE):
            results.append((page_num, text))  # reconstructed: the emitted snippet was cut off here
    return results

# Search for information about enumerating structures and their names
search_results = search_document(sdk_doc, r"enumerate structures|structure name")
```
The user had asked: “how to enumerate structures and their name”
Note how the user’s question got translated into a regular-expression search for literal terms (the `enumerate structures|structure name` pattern above).
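Two properties of that snippet could explain the “it can’t find sections that are definitely in the PDF” reports: `search_document` only scans the first `max_pages=5` pages, and a regex only matches literal wording, so a paraphrase never hits. This sketch reruns the same loop over plain strings instead of a PyMuPDF document; `search_pages` is a hypothetical stand-in and the page texts are invented for illustration.

```python
import re

# The pattern the GPT generated for "how to enumerate structures and their name"
PATTERN = r"enumerate structures|structure name"


def search_pages(pages, query, max_pages=5):
    """Same loop as search_document(), but over a list of page strings; returns matching page indices."""
    hits = []
    for page_num in range(min(max_pages, len(pages))):  # only the first max_pages pages are scanned
        if re.search(query, pages[page_num], re.IGNORECASE):
            hits.append(page_num)
    return hits


# Invented page texts, for illustration only
pages = [
    "Introduction to the SDK.",
    "To list every struct and its identifier, iterate over the type library.",  # paraphrase
    "Installation.",
    "Licensing.",
    "Changelog.",
    "How to enumerate structures and read each structure name.",  # index 5
]

print(search_pages(pages, PATTERN))                        # [] : paraphrase misses, index 5 never scanned
print(search_pages(pages, PATTERN, max_pages=len(pages)))  # [5]
```

Raising `max_pages` fixes the coverage problem but not the wording problem: page 1 answers the question and still never matches.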