What is the limit on knowledge files that can be uploaded to train custom GPTs?

I want to create my own custom GPT that I can share with others via the GPT Store. I don’t have a paid ChatGPT subscription yet. Before subscribing, I want to know one thing about custom GPT knowledge.

I want to train my custom GPT with thousands of articles, but from what I have read online, you can’t upload more than 10 documents to one custom GPT. I want to know how to overcome this barrier and train my custom GPT on these thousands of articles.

I’m a beginner and would appreciate answers geared towards one.

Your help is greatly appreciated.


That is a good question, one that isn’t explicitly answered for ChatGPT, but we can guess that it follows the same pattern as the API cousin: 2 million extracted tokens per document (about 1 million words).

The highest quality comes when you are in control of the language: not by sending PDF files or word processor documents that need to be (unreliably) processed to have text extracted, but by submitting curated plain text with sections and headings that the AI and the search can read directly.

I would watch the progress of other forum topics, where the knowledge retrieval for GPT files, especially those proprietary formats, isn’t currently performing to expectations.

Hopefully that simple tip gives you a more performant GPT that can succeed when shared.


Hey, thanks for replying. I understand the importance of plain text from what you said, but how many plain text files can I upload to train a custom GPT? Is it 10? That would be too few for what I was planning; I wanted to upload thousands of text files that have articles in them. I’m a beginner and would appreciate it if you could keep it in layman’s terms. Thank you for your time.

To go beyond the document search abilities and limitations, you would really need to develop your own API on a web server that the AI can call upon via an action. However, then, if your search is based on AI embeddings semantic search with features of API services, you are paying for the additional AI costs of those that use your GPT - as a gift.

Also, then, one considers if the exact snippet containing an answer can even be found in 10 million words. Let’s ask the AI if the GPT already meets your expectations:

To provide a tangible sense of how long 10 million words might be, let’s break it down using familiar references such as books and encyclopedias.

  1. Average Book Length: The average length of a novel is roughly 80,000 to 100,000 words. Using 90,000 words as an average, 10 million words would be equivalent to about 111 novels of average length.

  2. Encyclopedia Sets: The Encyclopædia Britannica, one of the most comprehensive encyclopedias, has about 40 million words across all its volumes. Therefore, 10 million words would be about 25% of the entire Encyclopædia Britannica set, or roughly equivalent to 8-10 volumes of a standard 32-volume set.

  3. Famous Long Books: To give an example of particularly long books, “War and Peace” by Leo Tolstoy is about 587,287 words long, and “In Search of Lost Time” by Marcel Proust is around 1.2 million words in total. Therefore, 10 million words would be equivalent to approximately 17 copies of “War and Peace” or about 8 copies of “In Search of Lost Time.”

These examples should give you a sense of the sheer volume of content that 10 million words represent, equating to a substantial library of books or a significant portion of a comprehensive encyclopedia set.

Hey, I’m not a dev and definitely won’t be creating APIs. You said it could be 1 million words, but I’m asking how many files (the number) one can upload to a custom GPT knowledge base. I read in these threads that it is 10-20, which is why I was wondering if I can do thousands of articles.

Here is one of the threads that triggered my question - https://community.openai.com/t/gpts-knowledge-capacity-limits/492955

help.openai.com is where you’ll find ChatGPT answers (this is primarily a developer forum). For example:


How many files can I upload at once per GPT?

Up to 20 files per GPT for the lifetime of that GPT.


What are those file upload size restrictions?

  • All files uploaded to a GPT or a ChatGPT conversation have a hard limit of 512MB per file.
  • All text and document files uploaded to a GPT or to a ChatGPT conversation are capped at 2M tokens per file. This limitation does not apply to spreadsheets.
  • For images, there’s a limit of 20MB per image.
  • Additionally, there are usage caps:
    – Each end-user is capped at 10GB.
    – Each organization is capped at 100GB.
    – Note: An error will be displayed if a user/org cap has been hit.
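
As a rough back-of-the-envelope check, the limits quoted above can be turned into a total capacity. This is only a sketch: the words-per-token ratio is an assumed rule of thumb for English text, not a figure from the help article, and the function names are made up for illustration.

```python
# Limits quoted from the help article above.
MAX_FILES_PER_GPT = 20
MAX_TOKENS_PER_FILE = 2_000_000

# Assumption: rough rule of thumb for English text, not an official figure.
WORDS_PER_TOKEN = 0.75


def max_total_tokens(files: int = MAX_FILES_PER_GPT,
                     tokens_per_file: int = MAX_TOKENS_PER_FILE) -> int:
    """Upper bound on knowledge tokens if every file is filled to its cap."""
    return files * tokens_per_file


def articles_that_fit(avg_words_per_article: int) -> int:
    """Estimate how many articles of a given average length fit in one GPT."""
    total_words = max_total_tokens() * WORDS_PER_TOKEN
    return int(total_words // avg_words_per_article)


print(max_total_tokens())       # 40000000
print(articles_that_fit(1500))  # 20000
```

Under these assumptions, the file count matters far less than the token cap: thousands of articles can fit if they are merged into a handful of large text files.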


Hey thanks! I read this 20-file point earlier, but I was wondering if I could somehow upload thousands of text files about a specific subject to train custom GPT on it. Like is there some beginner-friendly method that can indirectly allow me to upload or train my custom gpt with this huge corpus of text?

I intend to make it public and shareable through the GPT Store.

The AI model cannot be “trained” in the sense of traditional AI fine-tuning.

All the users of ChatGPT Plus get their normal GPT-4 AI model version, upon which the instructions of a GPT are placed.

The retrieval of document knowledge (for anything larger than the thousand words or so that are automatically injected) is by a search function. The AI basically writes a query like it would to a web search engine and gets the top results back. If a result looks like more needs to be read for the AI to understand fully, the AI of ChatGPT can “click” to scroll through more of the document at that point behind the scenes.

So, documents don’t train an AI, they provide on-demand augmentation – search queries to find if there is more knowledge to fulfill a user’s input.
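
To make the "search, not training" idea concrete, here is a toy sketch of query-based retrieval. It is not how ChatGPT actually searches files (that is not documented in this thread); it swaps in simple keyword overlap as a stand-in, and every name in it is hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_into_chunks(text: str, words_per_chunk: int = 50) -> list[str]:
    """Cut a long document into fixed-size word chunks that can be searched."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]


def search(chunks: list[str], query: str, top_k: int = 2) -> list[str]:
    """Return the top_k chunks sharing the most words with the query.

    Toy stand-in for the real search: plain keyword overlap, no embeddings.
    """
    query_terms = set(tokenize(query))
    scored = []
    for chunk in chunks:
        counts = Counter(tokenize(chunk))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


knowledge = [
    "Proportional representation allocates seats by vote share.",
    "First-past-the-post elects the candidate with the most votes.",
    "Photosynthesis converts light into chemical energy.",
]
results = search(knowledge, "How are seats allocated under proportional representation?")
print(results)
```

Only the matching chunk is handed to the model alongside the user's question; everything else in the files stays unread for that turn.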

Hey, thank you for continuing to reply. I was not using the word “trained” in that sense. I just want ChatGPT to use the text files that I provide and act as an expert in that domain. For example, after feeding it a lot of political science research papers, I want to make a custom GPT that is a political scientist GPT. So I just want to know how many files I can upload with the goal of creating such a custom GPT. Of course, this can apply to any field: math, physics, literature, etc.

Edit: Another example: I saw Khan Academy’s custom GPT in the store. They must’ve used their tutorials to create that custom GPT. How were they able to upload such an amount of data to create it?

The file count is 20.
The token count per file is 2M.
The ideal strategy is to combine like data into a continuous text file up to the maximum.

AI “tokens” can be counted on a site like the “OpenAI tokenizer” or “vercel tokenizer”, so you can get an idea of the number of characters per token (about 4.2 in English) and of how long a file can be before it is rejected on upload.
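
For anyone who wants to automate the "combine like data" step, here is a minimal sketch that merges many small articles into a few large plain-text files. It estimates tokens from character counts using the 4.2 figure above, so it is only approximate; the safety margin, the function names, and the output file names are all assumptions for illustration, and a real tokenizer would give exact counts.

```python
from pathlib import Path

MAX_FILES = 20                   # per-GPT file limit quoted in this thread
MAX_TOKENS_PER_FILE = 2_000_000  # per-file token cap quoted in this thread
CHARS_PER_TOKEN = 4.2            # rough English estimate from the post above
SAFETY_MARGIN = 0.9              # assumption: leave headroom for estimate error


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count."""
    return int(len(text) / CHARS_PER_TOKEN) + 1


def pack_articles(articles: list[tuple[str, str]],
                  token_budget: int = int(MAX_TOKENS_PER_FILE * SAFETY_MARGIN),
                  max_files: int = MAX_FILES) -> list[str]:
    """Greedily pack (title, body) pairs, in order, into combined texts."""
    bundles: list[str] = []
    current_parts: list[str] = []
    current_tokens = 0
    for title, body in articles:
        # A heading per article keeps sections findable by the search.
        section = f"# {title}\n\n{body.strip()}\n\n"
        cost = estimate_tokens(section)
        if cost > token_budget:
            raise ValueError(f"Article {title!r} alone exceeds the per-file budget")
        if current_parts and current_tokens + cost > token_budget:
            bundles.append("".join(current_parts))
            current_parts, current_tokens = [], 0
        current_parts.append(section)
        current_tokens += cost
    if current_parts:
        bundles.append("".join(current_parts))
    if len(bundles) > max_files:
        raise ValueError(f"Need {len(bundles)} files, but the limit is {max_files}")
    return bundles


def write_bundles(bundles: list[str], out_dir: str) -> list[Path]:
    """Write each combined text to knowledge_01.txt, knowledge_02.txt, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, text in enumerate(bundles, start=1):
        path = out / f"knowledge_{index:02d}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```

Usage would be something like `write_bundles(pack_articles(my_articles), "gpt_knowledge")`, where `my_articles` is your own list of (title, body) pairs, grouped by topic before packing so that related material ends up in the same file.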

Sophisticated solutions simply do not use consumer-oriented all-purpose ChatGPT methods alone. Developer products meant for wider audiences are not GPTs, which are only available to Plus subscribers (to the benefit of OpenAI).

It’s $20 to see what it can do, have at it.


Hi Freddy. Glad to meet you here. I have been looking for a similar answer for my use case. May I know if you were able to find a solution? Good point on Khan Academy. I would also say the research-based GPTs must have been built on some superior logic or capability. Happy to connect if you would like to. My email is rajesh.lakka@gmail.com.