I want to create my own custom GPT that I can share with others via the GPT Store. I don't have a ChatGPT Plus subscription yet, and before subscribing I want to know one thing about custom GPT knowledge.
I want to train my custom GPT with thousands of articles, but what I read from online sources is that you can't upload more than 10 documents to one custom GPT. I want to know how to overcome this barrier and train my custom GPT on those thousands of articles.
I'm a beginner and would appreciate it if your answers were geared toward one.
That is a good question, one that isn't explicitly answered for ChatGPT, but we can guess that it follows the same pattern as its API cousin: 2 million extracted tokens per document (about 1 million words).
The highest quality comes when you are in control of the text: not sending PDF files or word-processor documents that must be (unreliably) processed to extract their text, but submitting curated plain text with sections and headings that the AI and its search can read directly.
I would watch the progress of other forum topics, where knowledge retrieval from GPT files, especially those proprietary formats, isn't currently performing to expectations.
Hopefully that simple tip gives you a better-performing GPT that can succeed when shared.
Hey, thanks for replying. I understand the importance of plain text from what you said, but how many plain-text files can I upload to train a custom GPT? Is it 10? That would be too few for what I was planning; I wanted to upload thousands of text files with articles in them. I'm a beginner and would appreciate it if you could keep it in layman's terms. Thank you for your time.
To go beyond the built-in document search abilities and limitations, you would really need to develop your own API on a web server that the AI can call upon via an action. But then, if your search is built on AI-embeddings semantic search using paid API services, you are paying the additional AI costs for everyone who uses your GPT, as a gift.
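To make that concrete, here is a minimal sketch of the kind of search backend a GPT action could call. This is not official OpenAI code; it assumes you have already embedded your article chunks into a local `articles.json` with the same embedding model, and the endpoint and file names are made up for illustration:

```python
# Minimal retrieval endpoint a GPT action could call (illustrative sketch only).
# Assumes articles.json holds [{"text": "...", "embedding": [...]}, ...] prepared
# earlier with the same embedding model.
import json

import numpy as np
from fastapi import FastAPI
from openai import OpenAI

app = FastAPI()
client = OpenAI()  # reads OPENAI_API_KEY; you pay for every query your GPT's users make

with open("articles.json") as f:
    articles = json.load(f)
matrix = np.array([a["embedding"] for a in articles])  # one row per article chunk


@app.get("/search")
def search(q: str, k: int = 3) -> dict:
    """Embed the query, rank stored chunks by cosine similarity, return the top k."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=q)
    query = np.array(resp.data[0].embedding)
    scores = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    top = np.argsort(scores)[::-1][:k]
    return {"results": [articles[int(i)]["text"] for i in top]}
```

You would host something like that publicly, describe the `/search` action in your GPT's configuration, and the embeddings calls plus hosting are the ongoing cost you would be gifting to your users.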
Also, one has to consider whether the exact snippet containing an answer can even be found among 10 million words. Let's ask the AI to put that figure in perspective and see whether a GPT already meets your expectations:
To provide a tangible sense of how long 10 million words might be, let’s break it down using familiar references such as books and encyclopedias.
Average Book Length: The average length of a novel is roughly 80,000 to 100,000 words. Using 90,000 words as an average, 10 million words would be equivalent to about 111 novels of average length.
Encyclopedia Sets: The Encyclopædia Britannica, one of the most comprehensive encyclopedias, has about 40 million words across all its volumes. Therefore, 10 million words would be about 25% of the entire Encyclopædia Britannica set, or roughly equivalent to 8-10 volumes of a standard 32-volume set.
Famous Long Books: To give an example of particularly long books, “War and Peace” by Leo Tolstoy is about 587,287 words long, and “In Search of Lost Time” by Marcel Proust is around 1.2 million words in total. Therefore, 10 million words would be equivalent to approximately 17 copies of “War and Peace” or about 8 copies of “In Search of Lost Time.”
These examples should give you a sense of the sheer volume of content that 10 million words represent, equating to a substantial library of books or a significant portion of a comprehensive encyclopedia set.
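If you want to check those comparisons yourself, the arithmetic is simple (word counts are approximate):

```python
total_words = 10_000_000
print(total_words / 90_000)      # ~111 average-length novels
print(total_words / 40_000_000)  # ~25% of the Encyclopaedia Britannica's ~40M words
print(total_words / 587_287)     # ~17 copies of "War and Peace"
print(total_words / 1_200_000)   # ~8 copies of "In Search of Lost Time"
```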
Hey, I'm not a dev and definitely won't be creating APIs. You said it could be 1 million words, but I'm asking how many files (the number) one can upload to a custom GPT knowledge base. I read in these threads that it is 10-20, which is why I was wondering if I can do thousands of articles.
Here is one of the threads that triggered my question: https://community.openai.com/t/gpts-knowledge-capacity-limits/492955
help.openai.com is where you’ll find ChatGPT answers (this is primarily a developer forum). For example:
How many files can I upload at once per GPT?
Up to 20 files per GPT for the lifetime of that GPT.
What are the file upload size restrictions?
All files uploaded to a GPT or a ChatGPT conversation have a hard limit of 512MB per file.
All text and document files uploaded to a GPT or to a ChatGPT conversation are capped at 2M tokens per file. This limitation does not apply to spreadsheets.
For images, there’s a limit of 20MB per image.
Additionally, there are usage caps:
– Each end-user is capped at 10GB.
– Each organization is capped at 100GB.
– Note: An error will be displayed if a user/org cap has been hit.
Hey, thanks! I read this 20-file point earlier, but I was wondering if I could somehow upload thousands of text files about a specific subject to train a custom GPT on them. Is there some beginner-friendly method that would indirectly let me upload or train my custom GPT on this huge corpus of text?
I intend to make it public and shareable through the GPT Store.
The AI model cannot be “trained” in the sense of traditional AI fine-tuning.
All ChatGPT Plus users get the same normal GPT-4 AI model, upon which the instructions of a GPT are placed.
Retrieval of document knowledge (anything larger than the thousand words or so that is injected automatically) is done by a search function. The AI basically writes a query, much as it would to a web search engine, and gets the top results back. If a citation looks like more needs to be read for the AI to understand fully, ChatGPT can "click" to scroll through more of the document at that point behind the scenes.
So documents don't train an AI; they provide on-demand augmentation: search queries to find whether there is more knowledge available to fulfill a user's input.
Hey, thank you for continuing to reply. I was not using the word "trained" in that sense. I just want ChatGPT to use the text files that I provide and act as an expert in that domain. For example, after feeding it a lot of political science research papers, I want to make a custom GPT that is a political-scientist GPT. So I just want to know how many files I can upload with the goal of creating such a custom GPT. Of course, this can apply to any field: math, physics, literature, etc.
Edit: Another example. I saw Khan Academy's custom GPT in the store; they must have used their tutorials to create it. How were they able to upload that amount of data to create their custom GPT?
The file count is 20.
The token count per file is 2M.
The ideal strategy is to combine like data into a continuous text file up to the maximum.
AI "tokens" can be counted on a site like the OpenAI tokenizer or the Vercel tokenizer, so you can get an idea of the number of characters per token (about 4.2 in English) and whether a file is long enough to face rejection upon upload.
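If you'd rather script that than paste into a web tokenizer, here is a rough sketch using the tiktoken library. The cl100k_base encoding, folder name, and output file names are my assumptions, not anything OpenAI documents for GPT uploads; the point is just to pack many small articles into a few large plain-text files under the 2M-token cap:

```python
# Merge many small article files into a few large plain-text knowledge files,
# each kept under the reported 2M-token-per-file cap.
from pathlib import Path

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumed; the exact ChatGPT tokenizer isn't published
MAX_TOKENS = 2_000_000

batch, batch_tokens, part = [], 0, 1
for article in sorted(Path("articles").glob("*.txt")):
    # A clear heading per article helps the GPT's search land on the right passage.
    chunk = f"## {article.stem}\n\n{article.read_text(encoding='utf-8')}"
    tokens = len(enc.encode(chunk))
    if batch and batch_tokens + tokens > MAX_TOKENS:
        Path(f"knowledge_{part:02d}.txt").write_text("\n\n".join(batch), encoding="utf-8")
        batch, batch_tokens, part = [], 0, part + 1
    batch.append(chunk)
    batch_tokens += tokens

if batch:
    Path(f"knowledge_{part:02d}.txt").write_text("\n\n".join(batch), encoding="utf-8")
```

Joining articles with blank lines adds a few extra tokens, so leave a little headroom rather than running right up to the limit.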
Sophisticated solutions simply do not use the consumer-oriented, all-purpose ChatGPT methods alone. Developer products meant for wider audiences are not GPTs, which are available only to Plus subscribers (to OpenAI's benefit).
Hi Freddy, glad to meet you here. I have been looking for the same answer for my use case. May I know if you were able to find a solution? Good point on Khan Academy; I would also say the research-based GPTs must have been built on some superior logic or capability. Happy to connect if you would like to. My email is rajesh.lakka@gmail.com.
Or could we use a smart retrieval function from a database in the backend?
Train the GPT to get the gist of what the user is requesting, send those variables to the database in the backend to retrieve content, and then have your custom GPT analyze it. Of course, that creates another challenge: the input token count is 128k, so if I am retrieving multiple files I am just defeating the purpose of having a large knowledge base. Is there a way for files to be temporarily stored in the GPT during retrieval, so that whatever data is fetched from the backend is first uploaded into the files?
Is there any way to upload more than 20 documents into my personal GPT created with ChatGPT-4? Sorry, but I'm not technical enough to know how to create an API to the files on my laptop or to blogs I've written on a website. Thanks.