I have a custom GPT that uses PDF books as its source in the knowledge base. The GPT limit I’ve seen so far is 10 total uploads and 10 files per upload (usually zipped). A few questions:
To ensure the GPT gets all the information it needs from the source, what format is best? PDF, JSON, etc.?
What is the best way to get maximum performance? Uploading to the knowledge base, or hosting on a web server and using an API?
For an optimal, thorough search, is it best to separate everything into files (e.g. by chapter), or can all the PDFs be grouped into one large file without losing the GPT’s ability to gather the information it needs?
What are the content-management best-practice recommendations for getting the most out of the source files?
Well, there is a 2-million-token limit per file and 10 files in total, so you can upload 20M tokens’ worth of data to base your GPT on. If you wish to go for a commercial-level system, you need to switch to Assistants on the API side of things, or make use of vector database storage and retrieval to build a similar solution of enterprise grade.
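If you do go the Assistants route, a rough sketch with the openai Python SDK might look like the following. This uses the v1-era "retrieval" tool and file_ids parameter; later API versions renamed this to file_search with vector stores, so treat it as an outline and check the current docs. The file name and model string are just examples.

```python
# Hedged sketch of the Assistants-API route (openai Python SDK 1.x, Assistants v1 era).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a source document so the assistant can retrieve from it.
book = client.files.create(file=open("book_part_01.pdf", "rb"), purpose="assistants")

# Create an assistant that answers from the uploaded file.
assistant = client.beta.assistants.create(
    name="Book expert",
    instructions="Answer questions using the uploaded book excerpts.",
    model="gpt-4-turbo",
    tools=[{"type": "retrieval"}],
    file_ids=[book.id],
)
```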
In vector DB terms… I like the people over at ChromaDB, but if you are after a ready-to-roll commercial solution you have to take a look at Azure Retrievals and Pinecone.
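And for the self-hosted vector DB route, a minimal ChromaDB sketch, assuming its default embedding function; the collection name and chunk strings are placeholders for text you would extract from your own PDFs.

```python
# Minimal local vector storage and retrieval with ChromaDB.
import chromadb

client = chromadb.PersistentClient(path="./book_index")  # on-disk store
collection = client.get_or_create_collection("book_chunks")

# Pretend these chunks came out of your PDF-to-text step.
chunks = [
    "Chapter 1: The knowledge base accepts up to 2M tokens per file.",
    "Chapter 2: Smaller, topic-focused files tend to retrieve better.",
]
collection.add(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])

# Pull back the chunks most relevant to a question.
results = collection.query(query_texts=["How large can one file be?"], n_results=2)
print(results["documents"])
```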
Did you know that ChatGPT doesn’t save any files or information once you close the current conversation? Not even one!
I was so happy to hear about the custom bots you can now create, and that you can upload some books to them. When I used that custom bot later, I couldn’t understand why the answers were so bad. I asked the bot if it remembered the books I had uploaded a while ago, and it didn’t!
Thanks to @Foxalabs and @robbar2015 for this conversation. It helped me understand best practice here.
So is this a good summary?
Convert all files to text files.
Per GPT Limits: 10 files
Per File Limits: 512MB (20MB for image files), 2M tokens
Per User Limits: 10 GB. Per Organization Limits: 100 GB.
Direct uploads to Knowledge are recommended for performance.
Separate content into smaller files for better search efficiency (see the splitting sketch after this list).
If the knowledge is frequently updated, do not upload the file. Instead, use a system to store the file or URL and create an OpenAPI endpoint to fetch the content via an Action (see the endpoint sketch after this list).
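For the convert-to-text and split-into-smaller-files points, here is a rough pypdf sketch; "book.pdf" and the 25-page chunk size are arbitrary examples, and chapter-aware splitting would need your own boundary logic.

```python
# Rough sketch: convert one large PDF into smaller plain-text files before uploading.
from pypdf import PdfReader

reader = PdfReader("book.pdf")
pages_per_file = 25
total = len(reader.pages)

for start in range(0, total, pages_per_file):
    end = min(start + pages_per_file, total)
    # Extract and join the text from this page range.
    text = "\n".join(reader.pages[i].extract_text() or "" for i in range(start, end))
    part = start // pages_per_file + 1
    with open(f"book_part_{part:02d}.txt", "w", encoding="utf-8") as f:
        f.write(text)
```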
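And for the frequently-updated-knowledge point, a bare-bones FastAPI sketch of the fetch endpoint; the Action would consume the server’s auto-generated OpenAPI schema at /openapi.json, and the in-memory DOCS dict is a hypothetical stand-in for a real database or file store that you keep up to date.

```python
# Bare-bones "fetch content via an Action" endpoint.
from fastapi import FastAPI, HTTPException

app = FastAPI(title="Knowledge fetcher")

# Hypothetical in-memory store; in practice this would be a database or file storage.
DOCS = {"chapter-1": "Full text of chapter 1 goes here..."}

@app.get("/documents/{doc_id}")
def get_document(doc_id: str) -> dict:
    """Return the current text for one document, so the GPT always sees fresh content."""
    if doc_id not in DOCS:
        raise HTTPException(status_code=404, detail="Unknown document id")
    return {"id": doc_id, "content": DOCS[doc_id]}
```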
I’ll be reviewing vector DB options in the coming week, as I am attempting to mass-index tens of thousands of URLs and PDFs for clients, so this could be a great path for sending knowledge efficiently to a GPT.
Following up here as well: the limit is 20 files per GPT. I am trying to get clarity on what data format would be most optimal; it is not clear to me right now that there is a difference between text and anything else. Behind the scenes, we have parsing libraries for different data formats, so assuming those libraries work well there should really be no difference, but I am confirming whether that is true.
Months later, I still don’t understand how this knowledge feature works… Today, after a month without testing, with the same instructions, my GPT no longer uses its knowledge.
I’m giving up on it.
It seems that the knowledge base files are only consulted occasionally, or on direct instruction to do so. The model isn’t fine-tuned on this additional data. So yeah, GPTs are fun toys to play with, but for a proper custom GPT this approach is useless.