GPTs knowledge capacity limits

I have two GPTs. One I can update with no problems; the other I can’t. Both have fewer than 10 files in their knowledge base.
I am still waiting for a solution from OpenAI.

Is ‘memory’ as good as the knowledge base of the GPT?

Thx for your feedback @matt0sai @Foxalabs. Indeed, I can retrieve specific data via a custom API that I’ve created. Yet there are other elements like installation instructions for custom software, documentation, training course materials, and notes. It would be more convenient to upload these directly, rather than having to alter them for API use.

I’ve tried a .xlsx (7 MB) containing 250k rows, with code interpreter enabled. It was able to read rows at the end of the file and give answers.
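
For what it’s worth, the code interpreter route basically comes down to running pandas against the uploaded file, roughly like this (a minimal sketch; the file name and columns are assumptions):

```python
import pandas as pd

# Load the uploaded spreadsheet (file name is hypothetical).
df = pd.read_excel("catalog.xlsx")

print(df.shape)    # e.g. (250000, number_of_columns)
print(df.tail(5))  # rows at the end of the file
```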

Very cool.

I’ve had the same experience with very large Excel files. According to this OpenAI article, there is no limit to the size of Excel files you can include as Knowledge in a GPT.

I get that you need to code a proper solution for a large dataset, but how have these guys built a regular custom GPT interface to access it?
" ResearchGPT, AI Research Assistant. Search 200M academic papers from Consensus, get science-based answers, and draft content with accurate citations. By consensus.app"

This one claims to have made a research GPT that is 1.5x larger than Consensus. I talked with the maker and it’s simply done via Actions. You can write connections to various servers, so it’s not actually loaded up with the info, but rather connects to it via API calls.
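
For anyone wondering what that looks like on the server side, the Action just points at an HTTP endpoint you host yourself. A minimal sketch (FastAPI, with a stand-in list instead of a real paper index; the route and field names are assumptions):

```python
from fastapi import FastAPI, Query

app = FastAPI()

# Stand-in for a real paper database or search index (hypothetical data).
PAPERS = [
    {"title": "Example paper A", "url": "https://example.org/a"},
    {"title": "Example paper B", "url": "https://example.org/b"},
]

@app.get("/search")
def search_papers(q: str = Query(..., description="Search query"), limit: int = 10):
    # Naive keyword match; a real service would query a proper search index.
    hits = [p for p in PAPERS if q.lower() in p["title"].lower()]
    return {"results": hits[:limit]}
```

FastAPI serves its OpenAPI schema at /openapi.json, which is essentially what you paste into the GPT’s Action configuration.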

I have read that it accepts a maximum of either 10 or 20 files (I’ve seen both, guessing it’s likely 10), with a total maximum upload size limit of 500 MB per GPT, and 10 GB total per user across all GPTs.

I’ve also read that images are not yet supported, and that each file is limited to ~2 million tokens. Presuming a token is ~4 bytes, that is ~8 MB/file max, or ~80 MB of total data. I’d personally shave 10-20% off of that for initial testing…
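
The back-of-the-envelope math, assuming ~4 bytes (roughly four characters of plain English) per token and the 10-file limit:

```python
tokens_per_file = 2_000_000   # reported per-file token limit
bytes_per_token = 4           # rough assumption: ~4 characters of English per token
files_per_gpt = 10

max_mb_per_file = tokens_per_file * bytes_per_token / 1e6   # ~8 MB per file
max_mb_per_gpt = max_mb_per_file * files_per_gpt            # ~80 MB per GPT

print(max_mb_per_file, max_mb_per_gpt)   # 8.0 80.0
```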

I suspect this would be pushing the actual limit per GPT and that some of the published numbers are a mix of per user, per GPT, and per organization.

I hope this helps…

The option to create assistants is interesting. Have you found out how to train assistants? The notes on the OpenAI Platform are cursory.

There are plenty of YouTube videos that do a pretty good job of teaching you how to create assistants. Browse around and find someone whose style works for you! It’s a lot of trial and error; if you have any specific questions after playing around with them for a while, please feel free to ask. Good luck.
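
To add to that: creating a basic assistant programmatically is only a few lines with the OpenAI Python SDK, and “training” here mostly means iterating on the instructions and attached knowledge files rather than any fine-tuning. A minimal sketch (the name, instructions, and model are just examples, and the Assistants API is in beta, so details may shift):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create an assistant with file search enabled so it can use uploaded knowledge files.
assistant = client.beta.assistants.create(
    name="Docs helper",                # example name
    instructions="Answer using the attached documentation where possible.",
    model="gpt-4o",                    # example model
    tools=[{"type": "file_search"}],
)
print(assistant.id)
```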

This method seems to be working best so far. I am currently using it on my site. I will attempt to embed around 100k products into it and observe how it performs. I’ll assess whether it is cost-efficient and worth considering for long-term use.
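
For scale, embedding a catalog like that is usually done in batches against the embeddings endpoint, with the vectors stored in whatever vector database you use. A rough sketch with the OpenAI Python SDK (the model choice, batch size, and placeholder product texts are assumptions):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder product descriptions; in practice these come from your catalog.
products = [f"Product {i}: example description" for i in range(1_000)]

BATCH = 100  # the embeddings endpoint accepts batched inputs
vectors = []
for start in range(0, len(products), BATCH):
    batch = products[start:start + BATCH]
    resp = client.embeddings.create(model="text-embedding-3-small", input=batch)
    vectors.extend(item.embedding for item in resp.data)

print(len(vectors), "embeddings of", len(vectors[0]), "dimensions")
```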

It seems like an auto-generation mechanism, perhaps searching a database and then retrieving answers by assigning scores to posts. It might not be just one GPT but rather a swarm of them. :exploding_head:

So I have been playing around with Custom GPTs for the past hour. I am blown away by what it can do when I upload a sample .pdf and ask a few questions. Then I exported an old Slack workspace, organized the data into one text file, and tested it on the GPT.

I have a 3MB file, but it doesn’t seem to be able to process it all. It stops right at the end and generates an error.

I see that many people here are working with 200M academic papers. Any idea how GPT likes its information? At the moment, my text file is JSON formatted. Perhaps that is the reason why?

I think the ones with 200M papers are connected via “Actions”, aka APIs. As for formatting instructions and knowledge docs, I’ve seen a few approaches and theories floating around. Here is an interesting Twitter post that touches on a few topics… one being how these LLMs like info. https://x.com/kenshin9000_/status/1734238211088506967?s=20
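
Since you mentioned the Slack export: an export is a folder of per-channel, per-day JSON files, and flattening it into plain text (or Markdown) tends to work better as Knowledge than leaving it as raw JSON. A rough sketch (the paths and field handling assume the standard export layout):

```python
import json
from pathlib import Path

export_dir = Path("slack_export")   # unzipped Slack export (hypothetical path)
lines = []

# Each channel is a directory of YYYY-MM-DD.json files holding lists of message objects.
for day_file in sorted(export_dir.glob("*/*.json")):
    channel = day_file.parent.name
    for msg in json.loads(day_file.read_text(encoding="utf-8")):
        text = msg.get("text", "").strip()
        if text:
            lines.append(f"[{channel}] {msg.get('user', 'unknown')}: {text}")

Path("slack_messages.txt").write_text("\n".join(lines), encoding="utf-8")
print(len(lines), "messages written")
```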

Hi @frankie.sc.law, welcome to the forum.

The TL;DR for knowledge files is:

### Knowledge Files

        1. Convert all document files (.pdf, .docx, etc.) to plain text. Ideally format as Markdown (.md); .txt is fine too.
        2. Convert all table files (.csv, Google Sheet, etc.) to .xlsx as ChatGPT works especially well with Excel files.
        3. Per GPT Limits: 10 files
        4. Per File Limits: 512MB (20MB for image files, no limit for .xlsx), 2M tokens
        5. Per User Limits: 10 GB. Per Organization Limits: 100 GB.
        6. Direct uploads to Knowledge are recommended for performance.
        7. Separate content into smaller files for better search efficiency (see the sketch after this list).
        8. If knowledge is frequently updated, do not upload the file. Instead use a system to store the file or url and create an OpenAPI endpoint to fetch the content via an Action.
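
On point 7, splitting one big document into smaller topical files can be as simple as cutting on top-level headings. A minimal sketch, assuming a single Markdown source file (file names are illustrative):

```python
import re
from pathlib import Path

source = Path("handbook.md").read_text(encoding="utf-8")   # hypothetical source file

# Split on top-level headings so each section becomes its own knowledge file.
sections = [s for s in re.split(r"(?m)^# ", source) if s.strip()]
for i, body in enumerate(sections):
    title = body.splitlines()[0].strip() or f"section-{i}"
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    Path(f"knowledge_{i:02d}_{slug}.md").write_text("# " + body, encoding="utf-8")
```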

I am keeping track of these limits here as a quick reference for anyone, since they change over time.

FYI, GPTs support up to 20 files now. This thread is a bit out of date, but mentioning it in case folks come across it.
