I have not found any information on how much data we can upload as GPT knowledge. Anyone know?
For me it is not working.
There seems to be a hard-coded limit of 10 files.
But I have 250,000 dedicated science / history articles, and it doesn't accept that.
I can roughly upload 1,000 articles, that's it.
After 10 files the system stops saving, and if you merge more data into one file it also stops ("too much content" errors).
I tried JSON, SQL, CSV, XLS, XLSX, TXT, PDF, HTM, HTML.
With a range from one file (500 MB) to one hundred files.
It's okay when you upload "just" 10 files and about 1,000 articles, but it can't handle really large amounts of data.
So much for the LLM principle.
I can upload an XLSX file (500 MB) with my 250,000 records, but the results are so extremely slow (one minute of waiting per turn) and so bad (it only finds a record when you point it to the right DB #ID) that it is not usable.
Also, I can't take "entry 48239" and "entry 47" and have it reason over the combined entries.
I came across a post on Reddit claiming there is a 20-file / 10 GB limit. Waiting to hear where they got that from.
In the documentation for Assistants:
You can attach a maximum of 20 files per Assistant, and they can be at most 512 MB each. In addition, the size of all the files uploaded by your organization should not exceed 100GB. You can request an increase in this storage limit using our help center.
Not sure if this also applies to GPTs
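For anyone wanting to sanity-check a batch of files against those documented Assistants limits before uploading, here's a minimal local sketch (the constants come from the documentation quoted above; the function and its name are just an illustration):

```python
import os

# Limits quoted from the Assistants documentation above
MAX_FILES = 20
MAX_FILE_BYTES = 512 * 1024 * 1024   # 512 MB per file
MAX_TOTAL_BYTES = 100 * 1024 ** 3    # 100 GB per organization

def check_upload(paths):
    """Return a list of problems; an empty list means the batch fits the limits."""
    problems = []
    if len(paths) > MAX_FILES:
        problems.append(f"{len(paths)} files exceeds the {MAX_FILES}-file cap")
    total = 0
    for p in paths:
        size = os.path.getsize(p)
        total += size
        if size > MAX_FILE_BYTES:
            problems.append(f"{p}: {size} bytes is over the 512 MB per-file cap")
    if total > MAX_TOTAL_BYTES:
        problems.append("combined size exceeds the 100 GB organization cap")
    return problems
```

Again, this only checks the documented Assistants numbers; whether GPTs share them is exactly the open question.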
I had 100 files (plain UTF text), 450 MB altogether - it refused to upload more than 10.
Then I tried 10 files (plain UTF text), 45 MB per file - it refused because of "too much content".
Now I have 1 file active (XLSX, 150 MB), but that doesn't work either.
It can’t even search by keyword inside the file and it takes minutes before it crashes.
EDIT more info @crosslink
The 20-file / 10 GB limit is for Assistants, not GPTs - I'm not sure what the knowledge limit is for GPTs.
And not too much content either; the LLM can't handle that.
I have 250,000 high-quality articles, all written over 10 years by several dedicated, trustworthy sources.
But the LLM can’t handle it; too much text.
It’s like DALL-E3… it looks nice at the start, but for serious work it’s just a toy, not a tool.
You're going to have better luck achieving what you're trying to accomplish with Assistants, not GPTs - you can find them here: OpenAI Platform. You can upload more files and more data. If you are still hitting your upload limit, you can create multiple Assistants and have them communicate with each other using a tool like AutoGen.
Maybe have an Assistant that “specializes” in a certain era of history communicate with one that specializes in another.
Perhaps another user can offer a different solution to your problem. Please pay attention to price when running inquiries with a large knowledge dataset.
I did, but the purpose is different from what I want.
I want to chat with a bot about a specific topic, powered by the 250,000 articles I have fed it.
A GPT can do that (well, not with that amount of data), but the Assistant can't.
Also, branding and publicity are important for my purpose, and the Assistant has a high barrier to entry for my target audience.
I did some tests with cURL and Postman, but it's not even close to what I want to achieve.
On paper the GPT seems the best, but its inability to handle large datasets is the culprit.
I created my own front-end and a search routine with Elasticsearch, but I wanted to open it up for free as a GPT (for a bigger audience).
I mean, what's the purpose of an LLM when it can't even query large-scale content?
And 250,000 articles isn't that much, given the subject.
Why not just give the GPT access to the internet to find its own articles? I assume the collection of historic and scientific information you have saved came from internet sources, no? You can still restrain the context and get accurate data - maybe condense and upload the most important pieces of information you have so you're able to meet the constraint requirements.
250,000 articles may not be that much information regarding a certain subject, but it is certainly a lot to upload into a custom Assistant - and I’m willing to bet there is a massive amount of redundant information in those articles. These are simply the constraints we all face in the LLM’s current stage of development.
Those are my articles.
I wanted to hand them over to the community.
You can try with Breebs, available since today as Breebs GPT.
Limits for a Breeb are 50 files, 500 MB, and 8M characters. Not enough for your 250,000 articles, but better than what ChatGPT currently allows.
All Breebs are public.
Personally, I can accept AI being limited at the moment, since it's the first release of GPTs, but it would be nice to know the exact limitations.
We know that we can upload 10, maybe 20, separate documents. We know they have to be less than 500 MB. But we don't know the token limit before quality begins to decline.
In my prior attempts to make bots with databases, putting transcripts into them was no problem, but 60,000 lines of text in a document was too much for it to be of any use in a conversation. I would say there is a point where each document caps out and becomes inefficient, and I'd really like to know those numbers.
My guess with GPTs is it's going to be around 25 to 30k tokens, because the instructions, the prompt, the system prompt, and reading the dataset all take up the context window, so more than 30k per document sounds risky to me until we test it and find an exact cap. Any thoughts?
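That guess is easy to ballpark locally with the common ~4-characters-per-token rule of thumb (a rough heuristic for English text, not a real tokenizer; the 30k budget is just the guess above):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count using the ~4-chars-per-token heuristic for English."""
    return int(len(text) / chars_per_token)

def fits_budget(text: str, budget_tokens: int = 30_000) -> bool:
    """Check a document against the guessed ~30k-token-per-file budget."""
    return estimate_tokens(text) <= budget_tokens
```

Under that guess, a plain-text document would cap out around 120k characters.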
10 .txt files. What I found is that if I did 8 .txt files and 2 .pdf files, it would still work. When I moved to 7 .txt files and 3 .pdf files, the system generated errors. Could be due to the content in the PDF files OR the size.
It cannot read compressed or zip files. It cannot successfully read a .txt if character spaces are removed (that's a kind of "duh", but I tried).
Size limitations: in Create mode it will tell you that 25 MB is the ideal cap to avoid errors, but you can upload up to 50 MB.
Character limitations: based on my experimentation, you can upload a single doc (.txt) that is less than 1.5 million characters.
So far, working within these parameters, I was met with success.
I’ll update as I continue working with GPTs, but definitely interested in others’ answers.
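If it helps anyone working within those caps, a large corpus can be packed greedily into files that stay under the ~1.5M-character observation above; a rough sketch, assuming the articles are already plain-text strings (the function name and packing strategy are just for illustration):

```python
def split_corpus(articles, max_chars=1_500_000):
    """Greedily pack article strings into chunks, each kept under max_chars,
    matching the ~1.5M-character single-file observation above."""
    chunks, current, size = [], [], 0
    for art in articles:
        if current and size + len(art) > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(art)
        size += len(art) + 2  # +2 for the "\n\n" separator
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk can then be written out as its own .txt file; a single article longer than the cap would still need splitting by hand.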
The Assistants documentation on knowledge cap/docs doesn’t apply to GPTs, but the only reason I know even a tiny bit about this is via experimentation and asking the Create bot specific questions. See my findings in my other response. Of course, always evolving!
I just added a 317-page PDF to one of mine and broke it. There are a total of 4 PDFs, and the biggest is 1.1 MB.
Same here. I uploaded a PDF of a book and it fails to generate any responses, even ones unrelated to the PDF.
Hi, try PDFs, as well as Wikipedia articles in downloadable PDF form; if anything, try converting them to plain text or DOCX. BTW, it seems you can upload up to 8 files.
Probably around 2k characters, and you can upload up to 8 files.
PDFs are fine, but those are also restricted in size.
You simply can't "feed" the knowledge with lots of text.
I had 250,000 high-end articles, but was only able to upload about 15,000 of them (which is a lot, but not enough for my purpose).
So I killed the GPT: it is a fun tool, not meant for serious research at a scientific / historical / journalistic level.