I have not found any information on how much data we can upload for a GPT's knowledge. Anyone know?
For me it is not working.
There seems to be a hard-coded limit of 10 files.
But I have 250,000 dedicated science/history articles, and it doesn't accept that.
I can upload roughly 1,000 articles; that's it.
After 10 files the system stops saving, and if you merge more data into one file it also stops ('too much content' errors).
I tried JSON, SQL, CSV, XLS, XLSX, TXT, PDF, HTM, HTML.
With anything from one file (500 MB) to one hundred files.
Nothing works.
It's okay when you upload 'just' 10 files and 'about' 1,000 articles, but it cannot handle really large amounts of data.
So much for the LLM principle.
I can upload an XLSX file (500 MB) with my 250,000 records, but the results are so extremely slow (a one-minute wait per turn) and so poor (it only finds a record when you point it to the right DB #ID) that it is not usable.
Also, I can't take 'entry 48239' and 'entry 47' and have it work out things from the combined entries.
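In case it helps anyone hitting the same wall: one workaround is to split the spreadsheet into a handful of plain-text files so you stay under the 10-file cap. A rough sketch, assuming pandas and openpyxl are installed; the file name and the "title"/"body" column names are placeholders, not from the actual dataset:

```python
import pandas as pd

df = pd.read_excel("articles.xlsx")      # one sheet with all records
chunks = 10                              # GPT knowledge currently seems to cap at 10 files
rows_per_chunk = -(-len(df) // chunks)   # ceiling division

for i in range(chunks):
    part = df.iloc[i * rows_per_chunk:(i + 1) * rows_per_chunk]
    with open(f"articles_{i:02d}.txt", "w", encoding="utf-8") as f:
        for _, row in part.iterrows():
            f.write(f"{row['title']}\n{row['body']}\n\n")
```

Whether the per-file content limit still trips at that size is another question, but it at least gets everything into the allowed number of files.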
I came across a post on Reddit claiming there is a 20-file, 10 GB limit. Waiting to hear where they got that from.
In the documentation for Assistants:
You can attach a maximum of 20 files per Assistant, and they can be at most 512 MB each. In addition, the size of all the files uploaded by your organization should not exceed 100GB. You can request an increase in this storage limit using our help center.
Not sure if this also applies to GPTs.
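For reference, those Assistants limits apply to files attached through the API. A minimal sketch of how that looked with the openai Python SDK v1 and the retrieval tool as documented around this time; the file name and instructions are illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload one knowledge file (per the docs above: max 512 MB each, 20 per assistant)
knowledge = client.files.create(
    file=open("articles_part_01.txt", "rb"),
    purpose="assistants",
)

# Create an assistant with retrieval enabled and the file attached
assistant = client.beta.assistants.create(
    name="History researcher",
    instructions="Answer questions using only the attached articles.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}],
    file_ids=[knowledge.id],
)
print(assistant.id)
```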
I had 100 files (plain UTF text), 450 MB altogether - it refused to upload more than 10.
Then I tried 10 files (plain UTF text), 45 MB per file - it refused because of 'too much content'.
Now I have 1 file active (XLSX, 150 MB), but that doesn't work.
It can't even search by keyword inside the file, and it takes minutes before it crashes.
EDIT: more info @crosslink
The 20-file, 10 GB limit is for Assistants, not GPTs - I'm not sure what the knowledge limit is for GPTs.
10 files.
And not too much content; the LLM can't handle that.
I have 250,000 high-quality articles, all written over 10 years by several dedicated and trustworthy sources.
But the LLM can't handle it; too much text.
It's like DALL-E 3... it looks nice at first, but for serious work it's just a toy, not a tool.
You're going to have better luck achieving what you're trying to accomplish with Assistants, not GPTs - you can find them here: OpenAI Platform. You can upload more files and more data. If you are still hitting your upload limit, you can create multiple Assistants and have them communicate with each other using a tool like AutoGen.
Maybe have an Assistant that 'specializes' in a certain era of history communicate with one that specializes in another.
Perhaps another user can offer a different solution to your problem. Please pay attention to price when running inquiries with a large knowledge dataset.
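For the multi-agent idea, here is a minimal AutoGen sketch, assuming the pyautogen package; the agent names, system messages, and example question are purely illustrative, and each specialist would still need its own knowledge wired in:

```python
import autogen

llm_config = {"config_list": [{"model": "gpt-4-1106-preview", "api_key": "YOUR_API_KEY"}]}

ancient = autogen.AssistantAgent(
    name="ancient_history",
    system_message="You specialize in ancient history. Answer from your own knowledge.",
    llm_config=llm_config,
)
modern = autogen.AssistantAgent(
    name="modern_history",
    system_message="You specialize in modern history. Answer from your own knowledge.",
    llm_config=llm_config,
)
user = autogen.UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    code_execution_config=False,
)

# Let the two specialists discuss a question that spans both eras
groupchat = autogen.GroupChat(agents=[user, ancient, modern], messages=[], max_round=6)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)
user.initiate_chat(manager, message="Compare grain logistics in the Roman Empire with 19th-century rail freight.")
```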
I did, but the purpose is different from what I want.
I want to chat with a bot about a specific topic, powered by the 250,000 articles I have fed it.
A GPT can do that (well, not with that amount of data), but the Assistant can't.
Also, branding and publicity are important for my purpose, and the Assistant has too high a barrier to entry for my target audience.
I did some tests with cURL and Postman, but it's not even close to what I want to achieve.
On paper, the GPT seems the best fit, but its inability to handle large amounts of data is the culprit.
I created my own front-end and a search routine with Elasticsearch, but I wanted to open it up for free via a GPT (for a bigger audience).
I mean, what's the purpose of an LLM if it can't even query large-scale content?
And 250,000 articles isn't that much, given the subject.
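For anyone curious what such a search routine looks like, a minimal sketch of a keyword query against Elasticsearch, assuming the Python client v8, a local node, and an index named "articles" with "title" and "body" fields (all illustrative):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def search_articles(query: str, size: int = 10):
    # Full-text match over title and body, returning the stored documents
    resp = es.search(
        index="articles",
        query={"multi_match": {"query": query, "fields": ["title", "body"]}},
        size=size,
    )
    return [hit["_source"] for hit in resp["hits"]["hits"]]

for article in search_articles("siege of Vienna"):
    print(article["title"])
```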
Why not just give the GPT access to the internet to find its own articles? I assume the collection of historical and scientific information you have saved came from internet sources, no? You can still constrain the context and get accurate data - maybe condense and upload the most important pieces of information you have so you're able to meet the constraint requirements.
250,000 articles may not be that much information regarding a certain subject, but it is certainly a lot to upload into a custom Assistant - and I'm willing to bet there is a massive amount of redundant information in those articles. These are simply the constraints we all face at the current stage of LLM development.
Those are my articles.
I wanted to hand them over to the community.
You can try Breebs, available as of today as a Breebs GPT.
Limits for a Breeb are 50 files, 500 MB, and 8M characters. Not enough for your 250,000 articles, but better than what ChatGPT currently allows.
All Breebs are public.
Personally, I can accept AI being limited at the moment, since this is the first release of GPTs, but it would be nice to know the exact limitations.
We know that we can upload 10, maybe 20, separate documents. We know they have to be less than 500 MB. But we don't know the token limit before quality begins to decline.
In my prior attempts to make bots with databases, putting transcripts into them was no problem, but 60,000 lines of text in a document was too much for it to be of any use in a conversation. I would say there is a point where each document caps out and becomes inefficient, and I'd really like to know those numbers.
My guess with GPTs is that it's going to be around 25 to 30k tokens, because the instructions, the prompt, the system prompt, and the retrieved dataset all take up the context window, so more than 30k per document sounds risky to me until we test it and find an exact cap. Any thoughts?
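One way to test that guess is to measure each knowledge file before uploading it. A small sketch, assuming the tiktoken package; cl100k_base is the encoding used by the GPT-4-era models, and the ~30k figure is the guess above, not a documented cap:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

with open("transcripts.txt", encoding="utf-8") as f:
    tokens = enc.encode(f.read())

print(f"{len(tokens):,} tokens")  # compare against the ~25-30k guess above
```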
10 .txt files. What I found is that 8 .txt files plus 2 .pdf files would still work. When I moved to 7 .txt files and 3 .pdf files, the system generated errors. It could be due to the content in the PDF files OR the size.
It cannot read compressed or zip files. It cannot successfully read a .txt if character spaces are removed (that's kind of a 'duh', but I tried).
Size limitations: in Create mode it will tell you that 25 MB is the ideal cap to avoid errors, but you can upload up to 50 MB.
Character limitations: based on my experimentation, you can upload a single doc (.txt) that is less than 1.5 million characters.
So far, working within these parameters, I have been met with success.
I'll update as I continue working with GPTs, but I'm definitely interested in others' answers.
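Those numbers are easy to turn into a pre-flight check before uploading. A quick sketch, assuming the knowledge files sit in a local "knowledge" folder; the 10-file / 25 MB / 1.5M-character thresholds are the empirical findings above, not official limits:

```python
from pathlib import Path

MAX_FILES = 10
MAX_BYTES = 25 * 1024 * 1024   # "ideal" cap reported by Create mode
MAX_CHARS = 1_500_000          # observed per-.txt character ceiling

files = sorted(Path("knowledge").glob("*.txt"))
if len(files) > MAX_FILES:
    print(f"Too many files: {len(files)} > {MAX_FILES}")

for path in files:
    size = path.stat().st_size
    chars = len(path.read_text(encoding="utf-8"))
    status = "OK" if size <= MAX_BYTES and chars <= MAX_CHARS else "TOO BIG"
    print(f"{path.name}: {size / 1e6:.1f} MB, {chars:,} chars  [{status}]")
```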
The Assistants documentation on knowledge caps/docs doesn't apply to GPTs, but the only reason I know even a tiny bit about this is via experimentation and asking the Create bot specific questions. See my findings in my other response. Of course, it's always evolving!
I just added a 317-page PDF to one of mine and broke it. There are a total of 4 PDFs, and the biggest is 1.1 MB.
Same here. I uploaded a PDF of a book, and it fails to generate any responses, even ones unrelated to the PDF.
Hi, try PDFs, as well as Wikipedia articles as downloadable PDFs; if anything, try converting them to plaintext or DOCX. By the way, it seems you can upload up to 8 files.
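If you go the conversion route, a minimal sketch of PDF-to-plaintext with the pypdf package; the input/output folder names are illustrative, and extraction quality will vary by PDF:

```python
from pathlib import Path
from pypdf import PdfReader

out_dir = Path("plaintext")
out_dir.mkdir(exist_ok=True)

for pdf_path in Path("pdfs").glob("*.pdf"):
    reader = PdfReader(pdf_path)
    # Some pages return None if no text layer is present
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    (out_dir / (pdf_path.stem + ".txt")).write_text(text, encoding="utf-8")
```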
Probably around 2k characters, and you can upload up to 8 files.
PDFs are fine, but those are also restricted in size.
You simply cannot 'feed' the knowledge with lots of text.
I had 250,000 high-end articles, but I was only able to upload about 15,000 of them (which is a lot, but not enough for my purpose).
So I killed the GPT: it is a fun tool, not meant for serious research at a scientific/historical/journalistic level.