What are your observations about using the "Knowledge" in GPTs?

What is your understanding of how GPTs use the “Knowledge” capability?
How do you think it works?
What are your observations?

Personally, I assumed it was a RAG pipeline behind the scenes. The files we upload get chunked, embedded, etc. shortly after upload.
At least that’s what I assumed! :wink:
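If that assumption held, the retrieval step would look roughly like this toy sketch. Everything here is illustrative, not anything OpenAI has confirmed: `chunk`, `embed`, `cosine`, and `retrieve` are made-up names, and a bag-of-words counter stands in for a real embedding model.

```python
import math
from collections import Counter

def chunk(text, size=200):
    # Split the document into fixed-size word windows.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks, query, k=2):
    # Rank chunks by similarity to the query and return the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]
```

With real embeddings, a query like "why do dogs bark" would still land on a chunk about barking even if the wording differs, which is exactly what keyword matching can't do.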

But then, as I experimented more and more with it, I discovered a lot of issues:

  • Sometimes simple and straightforward questions yield zero results
  • The wording of your question (keyword-wise) seems to need to match the contents of the knowledge files almost word for word, which also suggests no embeddings are involved
  • It is very slow
  • It hangs
  • It is hard to steer the LLM to use the knowledge before falling back on its training data; it drifts and starts to forget instructions
  • Your knowledge files are exposed to the world. They are in /mnt/data. Turning off code interpreter is not an option
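To illustrate the last point: anyone can ask the code interpreter to run something like the hypothetical helper below and see your uploaded files. The `/mnt/data` path is the sandbox mount mentioned above; the function name and the fallback behavior are mine.

```python
from pathlib import Path

def list_knowledge_files(mount="/mnt/data"):
    # Inside the GPT sandbox this directory holds the uploaded Knowledge
    # files; return an empty list if the mount is absent (e.g. run locally).
    root = Path(mount)
    return sorted(p.name for p in root.iterdir()) if root.is_dir() else []
```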

Once you have a better understanding of how GPTs and Knowledge currently work, you can make better-informed decisions and know what to expect.

One day, I was using my GPT and then it was doing its “Searching my knowledge”, then it exposed the search function:

import fitz  # PyMuPDF
import re

# Open the documentation
sdk_doc = fitz.open("/mnt/data/sdk.pdf")

# Function to search for information in the document
def search_document(doc, query, max_pages=5):
    results = []
    for page_num in range(min(max_pages, len(doc))):  # Limiting the number of pages to scan
        page = doc[page_num]
        text = page.get_text()
        if re.search(query, text, re.IGNORECASE):
            results.append((page_num, text))
    return results

# Search for information about enumerating structures and their names
search_results = search_document(sdk_doc, r"enumerate structures|structure name")

search_results

Now the user asked: “how to enumerate structures and their name”

Note how the user’s question got translated into a regular-expression search pattern:

search_document(sdk_doc, r"enumerate structures|structure name")

And if this document search function is invoked each time the LLM wants to access the knowledge base, that explains why it is so slow to respond.
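A plain regex match like that would also explain the keyword sensitivity described earlier: rephrase the query and the same passage is missed entirely. A quick illustration (the sample text is made up):

```python
import re

doc_text = "To enumerate structures, call the SDK's structure iterator."

# The literal phrase from the document matches...
assert re.search(r"enumerate structures", doc_text, re.IGNORECASE)

# ...but a harmless paraphrase of the same question finds nothing,
# which an embedding-based search would normally still catch.
assert re.search(r"list structures", doc_text, re.IGNORECASE) is None
```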

So I feel, for now, OAI implemented ‘Knowledge’ quick and dirty. Later they can perhaps improve it.


Very interesting.

One day, I was using my GPT and then it was doing its “Searching my knowledge”, then it exposed the search function:

What do you mean by this? where did you find the code with PyMuPDF?

It glitched and revealed its internals to me. The post is self-explanatory: what do you mean, “what do you mean”?

There is no RAG, no chunking, no nothing. Plain and simple term search via RegEx :joy:

What do you mean by “glitched and revealed its internals to me”? You mean that ChatGPT itself gave you that code? There’s no way source code could be exposed like that.

There’s one way to settle this: for once have OpenAI be helpful on the forums and chime in.

I have done about everything. I have tried altering the documents, using Q&A, Knowledge Graphs, and Straight Text. Nothing seems to work. The knowledge base retrieval is very hit or miss, and mostly, it’s a miss. I could never use this for my company as it is now. I can get my own tools using LlamaIndex to near perfection, but the GPTs are terrible. I really want to use them. I like the look and all that. I am sure they will get there one day but for now, I am going to use my own.


Just out of curiosity, how did you use knowledge graphs in your example?

Honestly, it was a file that was uploaded and only one test. I have mostly tried using straight text and Q&A. I find GPTs hit or miss at adhering to the instructions as well. It’s not that they never do, but a lot of the time they don’t. They mostly pay no attention, or only limited attention, to the knowledge base.