I am trying to add a lot of documentation for a platform into one GPT to help me work better.
I’ve been gradually adding to a master Markdown text document, and periodically I would test it in the GPT I am building. Recently I have come to a point where the GPT just errors and doesn’t respond at all…
I started to do some troubleshooting, and it’s worth noting that I am under the assumption that GPTs are the same as Assistants in the API, and that the ‘Knowledge’ part of GPTs is the same as the Assistants ‘Knowledge Retrieval’ tool in the API.
Knowledge retrieval (from docs)
Retrieval augments the Assistant with knowledge from outside its model, such as proprietary product information or documents provided by your users. Once a file is uploaded and passed to the Assistant, OpenAI will automatically chunk your documents, index and store the embeddings, and implement vector search to retrieve relevant content to answer user queries.
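To make that chunking step concrete, here’s a minimal sketch of the idea. The chunk size and overlap are made-up values for illustration; OpenAI hasn’t published the actual parameters its pipeline uses:

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping word-based chunks.

    chunk_size/overlap are hypothetical values; the real OpenAI
    chunking parameters are not documented.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = "word " * 2000              # stand-in for a long Markdown document
chunks = chunk_text(doc)
print(len(chunks))                # → 3 overlapping chunks for a 2000-word doc
```

Each chunk then gets embedded and indexed, so a query only ever pulls back a few relevant chunks instead of the whole file.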
For anyone who doesn’t know about chunking/embedding/vector search: this enables a type of search that people have been doing for a while with the OpenAI embedding models, called Semantic Search or Similarity Search.
So my assumption is that Knowledge in GPTs would chunk the documents we upload, embed them, and store those embeddings in a vector DB for the GPT to query in our chats.
The whole point of semantic search is to offload knowledge externally so that it doesn’t overload the token context window of a GPT (now 128,000 tokens). This should let a GPT have a MASSIVE knowledge base that it can search (ideally iteratively), pulling only the specific chunks of info it needs into its actual context to answer our questions.
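The retrieval step above boils down to ranking chunks by embedding similarity. Here’s a toy sketch using hand-written vectors in place of real embeddings (in practice each vector would come from an embeddings model, and the chunk names here are invented):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Standard cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": in a real system these come from an embedding model,
# not hand-written 3-d vectors.
chunk_vectors = {
    "billing docs chunk":  [0.9, 0.1, 0.0],
    "API auth chunk":      [0.1, 0.9, 0.1],
    "webhook setup chunk": [0.0, 0.2, 0.9],
}

query_vector = [0.2, 0.85, 0.15]  # pretend this embeds "how do I authenticate?"

# Vector search = rank all chunks by similarity to the query, keep the top hit
best = max(chunk_vectors, key=lambda k: cosine_similarity(query_vector, chunk_vectors[k]))
print(best)  # → API auth chunk
```

Only the winning chunk(s) get injected into the model’s context, which is why file size shouldn’t matter the way it seems to in my tests.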
This is where I feel my assumption should be right… but it’s either wrong, or I am misunderstanding how OpenAI wants us to interact with Knowledge.
These are the tests I have done so far. (Failed means the same as the image above: no response.)
- 136k-word Markdown: failed
- 122k-word Markdown: failed
- 110k-word Markdown: failed
- 100k-word Markdown: failed
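For scale, here’s a rough back-of-envelope conversion of those word counts into tokens, using the common ~0.75 words-per-token heuristic for English text (this is an approximation, not an exact figure; a real tokenizer would give the true count):

```python
def estimate_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a word count.

    0.75 words/token is a common English-text heuristic, not an
    exact figure; exact counts depend on the tokenizer.
    """
    return round(word_count / words_per_token)

for words in (136_000, 100_000):
    print(f"{words:,} words ≈ {estimate_tokens(words):,} tokens")
# 136,000 words ≈ 181,333 tokens
# 100,000 words ≈ 133,333 tokens
```

Even my smallest failing file estimates out above the 128k context window, which is exactly the situation retrieval is supposed to handle: the model should never need the whole file in context at once.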
Then I tried it in another GPT just for testing, and it actually replied, but it never used knowledge retrieval… it actually couldn’t, which is crazy strange. It would only retrieve when Code Interpreter was turned on, but that was using code, not semantic search, to search the document…
Retrieval was working when I first started with this project… have I been shadow-banned or something through all my troubleshooting?
I am at a stage where I feel like the platform is being rather unreasonable with me, as a lot of this is nonsensical.
So I guess I have many open loops right now, and I don’t expect anyone to answer all my questions. But here are a few:
- Are people using Knowledge and is it working fine for you?
- If yes, how many files, words, etc? File types?
- Has anyone tried to push its limits? What’s the most files/words you have added to a knowledge base?
@openai can someone get rate-limited in the GPT creation environment, or limited in any way?
I’d love to hear from anyone about their GPT knowledge journey.