What are your observations about using the "Knowledge" in GPTs?

What is your understanding of how GPTs use the “Knowledge” capability?
How do you think it works?
What are your observations?

Personally, I assumed it was a RAG pipeline behind the scenes. The files we upload get chunked, embedded, etc. shortly after upload.
At least that’s what I assumed! :wink:
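
To make the assumption concrete, here is a toy sketch of what such a RAG pipeline would do at upload time and at query time. Everything here is illustrative: the bag-of-words "embedding" stands in for a real learned embedding model, and the sentence-per-chunk splitting stands in for real token-window chunking.

```python
import math
import re
from collections import Counter

def chunk(text):
    """Naive chunking: one sentence per chunk (real pipelines use token windows with overlap)."""
    return [s.strip() for s in text.split(". ") if s.strip()]

def embed(text):
    """Toy 'embedding': a bag-of-words count vector. A real pipeline uses a learned model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks, query, k=2):
    """Rank stored chunks by similarity to the query and return the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]

doc = "Structures can be enumerated with list_structures(). Licensing terms and legal notices."
chunks = chunk(doc)
top = retrieve(chunks, "how do I enumerate structures", k=1)
print(top[0])  # the sentence about enumerating structures ranks first
```

The point of the shape, not the toy math: chunking and embedding happen once after upload, and each question only pays for a cheap similarity ranking.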

But then, as I experimented more and more with it, I discovered a lot of issues:

  • Sometimes simple, straightforward questions yield zero results
  • The wording of your question (keyword-wise) has to match the contents of the knowledge files almost word for word, which suggests no embeddings are involved
  • It is very slow
  • It hangs
  • It is hard to steer the LLM to consult the knowledge before falling back on its training data; it drifts and starts to forget instructions
  • Your knowledge files are exposed to the world: they sit in /mnt/data, and turning off Code Interpreter is not an option
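
The second bullet is easy to demonstrate: if lookup is a literal regex term match, a paraphrase of the same question finds nothing. The page text and the `sdk_next_struct` name below are made up for illustration.

```python
import re

# A hypothetical page from an uploaded knowledge file:
page_text = "Call sdk_next_struct() to enumerate structures and read each structure name."

# Wording that overlaps the file matches...
print(bool(re.search(r"enumerate structures", page_text, re.IGNORECASE)))  # True

# ...but a paraphrase of the same question finds nothing,
# which is exactly the near word-for-word sensitivity described above.
print(bool(re.search(r"list the structs", page_text, re.IGNORECASE)))  # False
```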

Once you understand how GPTs and Knowledge currently work, you can make better-informed decisions and know what to expect.

One day, while my GPT was in its “Searching my knowledge” phase, it glitched and exposed the search function it was running:

import fitz  # PyMuPDF
import re

# Open the documentation
sdk_doc = fitz.open("/mnt/data/sdk.pdf")

# Function to search for information in the document
def search_document(doc, query, max_pages=5):
    results = []
    for page_num in range(min(max_pages, len(doc))):  # Limiting the number of pages to scan
        page = doc[page_num]
        text = page.get_text()
        if re.search(query, text, re.IGNORECASE):
            results.append((page_num, text))
    return results

# Search for information about enumerating structures and their names
search_results = search_document(sdk_doc, r"enumerate structures|structure name")

search_results

The user’s question was: “how to enumerate structures and their name”

Note how the question got translated into plain regex search term(s):

search_document(sdk_doc, r"enumerate structures|structure name")

And if this document-search function is invoked every time the LLM wants to access the knowledge base, that would explain why it is so slow to respond.
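
The cost difference is easy to sketch. If every question re-scans the pages with a regex (like the revealed `search_document` does), you pay a full linear scan per query; a precomputed index, built once at upload time, makes each lookup a dictionary access. A stdlib-only sketch, using plain strings as stand-in “pages”:

```python
import re
from collections import defaultdict

pages = [
    "Chapter 1: enumerate structures with sdk_next_struct().",
    "Chapter 2: error codes and return values.",
    "Chapter 3: structure name lookup tables.",
]

def scan_every_time(query):
    """Per-query linear scan, like the revealed search_document(): O(pages) per question."""
    return [i for i, text in enumerate(pages) if re.search(query, text, re.IGNORECASE)]

def build_index(pages):
    """One-time inverted index: term -> page numbers. Built once, reused for every question."""
    index = defaultdict(set)
    for i, text in enumerate(pages):
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(i)
    return index

index = build_index(pages)

def indexed_lookup(term):
    """Per-query cost is a dict lookup, independent of document size."""
    return sorted(index.get(term.lower(), set()))

print(scan_every_time(r"structure name"))  # [2]
print(indexed_lookup("structures"))        # [0]
```

This is only an analogy for where the latency could come from, not a claim about what OpenAI actually runs.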

So my feeling is that, for now, OpenAI implemented ‘Knowledge’ quick and dirty. Perhaps they will improve it later.


Very interesting.

One day, I was using my GPT and then it was doing its “Searching my knowledge”, then it exposed the search function:

What do you mean by this? Where did you find the code that uses PyMuPDF?

It glitched and revealed its internals to me. The post is self-explanatory: what do you mean, ‘what do you mean’?

There is no RAG, no chunking, no nothing. Just a plain and simple term search via regex :joy:

What do you mean by “glitched and revealed its internals to me”? Are you saying ChatGPT itself gave you that code? There’s no way source code could be exposed like that.

There’s one way to settle this: for once have OpenAI be helpful on the forums and chime in.