GPTs knowledge capacity limits

My frustration is that each time it goes back and ‘Searching my knowledge’, and that takes anywhere between 5 and 15 seconds. Painful after a while. I have been playing with instructions to stop it, but they don’t always work.

I have a relatively small text file: 117 KB.

Any tips on how to optimise your source file? I have a 60-page doc in Google Docs that is my ‘Master’, and from that I export a .txt file.

|Metric|Value|
|---|---|
|Pages|58|
|Words|14,887|
|Characters|105,274|
|Characters excluding spaces|91,784|
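For a rough sense of scale, a common rule of thumb (an approximation, not an exact count — roughly 4 characters or 0.75 words per token for English; the real figure depends on the tokenizer) puts a file of this size in the low tens of thousands of tokens:

```python
# Rough token estimate for the knowledge file described above.
# The 4-chars-per-token and 0.75-words-per-token ratios are rules of thumb,
# not exact tokenizer counts.
characters = 105274
words = 14887

approx_tokens_by_chars = characters / 4        # roughly 26k tokens
approx_tokens_by_words = words * 4 / 3         # roughly 20k tokens

print(round(approx_tokens_by_chars), round(approx_tokens_by_words))
```

Either way, the file is far too large to fit in the context window whole, which is presumably why retrieval kicks in at all.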

Played around with ‘regular’ GPT-4 to optimise the file. It can do some clever-looking things with a prompt, but I don’t always get a result, and mostly it fails while analysing:

Please process the attached text file for AI chatbot readability. The text should be cleaned of any headers, footers, and page numbers, and segmented into clear, readable sentences or bullet points. Don't remove any of the content. Pay special attention to keeping any URLs intact, as they are essential for the chatbot's reference. Format the output into a text file optimized for the chatbot's understanding.
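The mechanical part of that cleanup can also be done deterministically outside ChatGPT, which avoids the unreliability. A minimal sketch — the page-number pattern here is an assumption; adjust it to whatever your Google Docs export actually produces:

```python
import re

def clean_for_kb(text: str) -> str:
    """Strip page-number lines and collapse leftover blank runs,
    while leaving URLs and all other content untouched.
    The patterns are assumptions -- tune them for your own export."""
    cleaned_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        # Drop lines that are only a page number, e.g. "12" or "Page 12"
        if re.fullmatch(r"(Page\s+)?\d+", stripped):
            continue
        cleaned_lines.append(line)
    # Collapse runs of blank lines left behind by removed headers/footers
    return re.sub(r"\n{3,}", "\n\n", "\n".join(cleaned_lines)).strip()

sample = "Intro text\nPage 3\nSee https://example.com/docs\n\n\n\nMore text"
print(clean_for_kb(sample))
```

Running this on the exported .txt before upload means the GPT only ever sees the cleaned version.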

It would be good to understand what others are thinking when it comes to KB optimisations to avoid the lengthy ‘Searching my knowledge’!

Best practice on formatting your source doc would be good, as I’m sure a good structure will deliver better prompt-to-output speed.

Thanks


Thanks for this share. Nice idea :ok_hand:

Am I wrong in assuming that it shouldn’t matter how large a file is, since it chunks the data and embeds it for vector search (knowledge retrieval)? It has to be the same functionality as the Knowledge Retrieval tool in the Assistants API.

The whole point of this is that it doesn’t overload the context and can semantically search the uploaded documents with ease. I created my own before Assistants came out, and it’s super fast to search.

I’d assume that OpenAI would have optimized chunking and retrieval, and maybe even made a better embedding model (high hopes) for their own knowledge retrieval product… I feel that it might be broken right now, to be honest; I just posted about this and it’s acting super weird.
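For anyone unfamiliar with the chunking step being described: before embedding, a retrieval pipeline typically splits the document into overlapping windows so each piece fits an embedding model. The sizes and overlap below are illustrative assumptions, not OpenAI’s actual parameters:

```python
# Sketch of the chunking step a retrieval tool presumably performs before
# embedding. chunk_size and overlap are illustrative assumptions only.
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size windows that overlap, so a fact straddling
    a chunk boundary still appears whole in at least one chunk."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

doc = "x" * 2000
pieces = chunk_text(doc)
print(len(pieces), [len(p) for p in pieces])
```

Each chunk would then be embedded once at upload time, which is why file size alone shouldn’t cause slow searches — only the query embedding and a similarity lookup happen per request.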


I agree! The human experience is not replaceable at all. It seems to me our virtual assistants are getting better!

For me it seems that, with GPT Builder + Knowledge, it loads the file on the fly and only reads parts of it (I’m using a markdown file with 13k lines). In my tests, it will sometimes read only the first 500 lines and then say that it couldn’t find the information I was looking for. In other scenarios, it will read the full file but try to find the keywords the user was looking for:

Prompt: Please show me the code for classic perlin noise from lygia

# Let's read the content of the uploaded file to find information about Classic Perlin Noise.
with open('/mnt/data/lygia.md', 'r') as file:
    lygia_content = file.read()

# Search for the section about Classic Perlin Noise in the content
perlin_noise_section = None

if 'Classic Perlin Noise' in lygia_content:
    start_index = lygia_content.find('Classic Perlin Noise')
    end_index = lygia_content.find('\n\n', start_index)
    # find() returns -1 when no blank line follows; fall back to end of file
    if end_index == -1:
        end_index = len(lygia_content)
    perlin_noise_section = lygia_content[start_index:end_index]

perlin_noise_section

This approach doesn’t make a lot of sense, as I don’t want it to do an exact text search; instead it should convert the user’s query into a vector and search a vector store for the content the user is looking for. I was expecting it to work like this.
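To make the contrast concrete, here is a toy version of the behaviour being asked for: embed the query and each chunk as vectors and rank by cosine similarity, so a paraphrased question still finds the right chunk even when no substring matches. A bag-of-words counter stands in for a real embedding model; everything here is illustrative:

```python
import math
from collections import Counter

# Toy semantic search: rank chunks by cosine similarity to the query vector,
# instead of an exact substring match. Bag-of-words is a stand-in for a
# real embedding model.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

chunks = [
    "classic perlin noise implementation in glsl",
    "simplex noise is a faster alternative",
    "voronoi patterns and cellular noise",
]
query = "show me classic perlin noise"
best = max(chunks, key=lambda c: cosine(embed(query), embed(c)))
print(best)
```

An exact `find('Classic Perlin Noise')` would have returned nothing for a query like “gradient noise from Ken Perlin”, while a vector ranking degrades gracefully.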

I tried to upload just 17 smallish PDF files and got “unable to save” errors until I deleted files down to 10 or fewer. Is this normal? What are the limits for a custom GPT?

Hi and welcome to the Developer Forum!

Yes, 10 does seem to be the file-count limit.

Thanks! And what is the upper limit on the file size? And is there a limit to the combined size of the 10 files?

I have another related issue: when folks talk to the GPT and it tries to give citations from my PDFs, it shows an error “Malformed citation 【Circular Economy.pdf†source]” (the PDF file was Circular Economy.pdf). Is there a way to provide the URL from which I generated the PDF? When I tried to feed it the URLs of my articles, it said it was unable to do so.

Try uploading a ZIP file with all your files; it may work. You can do a quick test by zipping a few files and asking GPT-4 to tell you what is in the ZIP file.

Not sure on the size. I can make a guess of 256 MB, from the fact that the Assistants API allows 20 files of 512 MB each and GPTs allow 10 files… at (maybe?) 256 MB, so basically half… total guess though.


You say the same thing over and over; we heard the first time 😭

I solved it by sending a query from the GPT through the API to my server.

The server searches the SQL database and returns data with context, and the GPT creates a response from it.
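A minimal sketch of that server-side lookup, using SQLite’s `LIKE` for the search. The table, column names, and rows here are made-up examples — the poster’s actual schema and search logic aren’t shown:

```python
import sqlite3

# Sketch of the server-side piece: the GPT calls an endpoint, the server
# runs a SQL search, and matching rows come back as context for the answer.
# Schema and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kb (title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO kb VALUES (?, ?)",
    [("Perlin noise", "Classic Perlin noise is a gradient noise."),
     ("Voronoi", "Voronoi diagrams partition the plane.")],
)

def search_kb(query: str, limit: int = 3) -> list[tuple[str, str]]:
    """Return (title, body) rows whose title or body matches the query."""
    like = f"%{query}%"
    return conn.execute(
        "SELECT title, body FROM kb WHERE title LIKE ? OR body LIKE ? LIMIT ?",
        (like, like, limit),
    ).fetchall()

print(search_kb("Perlin"))
```

Plain SQL keyword search like this works well when users query with predictable terms; a vector store earns its keep when queries are paraphrased rather than literal.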


Yes, I think they might expand the 10-file cap. By the way, if you try to upload small .txt files after hitting the 10-file cap, it seems you can fit a couple more files in. Also, compressing the PDFs helps a ton, reducing their size by about 50%.


Can you tell us more about using an SQL database for this type of application? It seems most are using a vector database for this type of operation, but perhaps that isn’t really needed in some cases?

What are the token/character limits?

I did upload a single PDF file of around 200k characters and 28k words; it is the Gradio documentation.

I agree on the 10-file limit; I cannot get beyond it. Since Wikidata is not yet a GPT, I looked at the latest Wikidata dumps. There is a gzipped 12 GB file, but I imagine that is too large, and ChatGPT states that it is not trained on Wikidata.

Could you upload your files to a personal website and point your GPT to that website to search for the information?


There is a workaround! You can archive your files and upload them as a ZIP. Enable Code Interpreter, then give it a prompt such as: “I have enabled Code Interpreter; unzip filename, deeply analyze all the data you find, store it as knowledge, and update the GPT.”
There is one problem: it will summarize the data from your files rather than storing them in full, probably because of a memory limit. Enjoy.
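Building the archive itself is the easy half of this workaround. A small sketch with the standard-library `zipfile` module — file names and contents here are examples:

```python
import io
import zipfile

# Bundle several knowledge files into one archive so only a single upload
# slot is used. Names and contents are placeholder examples.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("notes1.txt", "first knowledge file\n")
    zf.writestr("notes2.txt", "second knowledge file\n")

# Verify the archive lists the files we packed
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
print(names)
```

`ZIP_DEFLATED` also compresses the contents, which lines up with the earlier tip that shrinking files helps with upload errors.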


Yes, that’s my experience: 10. I’ve tried all PDFs and all .txts, and had generation errors with all PDFs.

But if you mix and match .txt with .pdf, the only balance that worked for me was 8 .txt and 2 .pdf. Any increase in PDFs and it crapped out over and over.

Hey @brisklad, is there any size limit for the ZIP file?