This file contains too much text content. Please try again with a smaller file

While building a new custom GPT, I keep encountering the following error when I try to upload a PDF file to the GPT’s knowledge:

“This file contains too much text content. Please try again with a smaller file.”

I keep getting this error even though my PDF files are smaller than 9 MB.
Each of my PDF files has fewer than 1000 pages.

I kept shrinking my file, splitting it into smaller and shorter parts, but the error persists.

I can’t find any information on how to avoid this error.

Would appreciate any help.

EDIT:
Out of curiosity, I deleted a file (a 1400-page PDF) that I had previously uploaded to the GPT’s knowledge and then re-uploaded it. TO MY SURPRISE, I AM NOW GETTING THE SAME ERROR MESSAGE FOR THIS FILE!
How come I never received any notification of such a change from the OpenAI team?

1 Like

The amount of data that can be attached to a custom GPT may have changed. The following limits currently apply.

You can use the GPT editor to attach up to 20 files to a GPT. Each file can be up to 512 MB in size and can contain up to 2,000,000 tokens.

https://help.openai.com/en/articles/8843948-knowledge-in-gpts

I think the attached PDF probably exceeds the 2,000,000-token limit.
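If you want a rough sanity check before uploading, here is a minimal sketch that compares a file against the two per-file limits quoted above. The function names are made up for illustration, the 4-characters-per-token ratio is only a common rule of thumb for English text (not the count the uploader actually computes), and treating 512 MB as 512 × 1024² bytes is an assumption. You would need to extract the PDF text yourself first.

```python
# Rough pre-upload check against the per-file limits quoted above.
MAX_TOKENS_PER_FILE = 2_000_000        # limit from the help article
MAX_BYTES_PER_FILE = 512 * 1024 ** 2   # ASSUMPTION: 512 MB read as binary megabytes
CHARS_PER_TOKEN = 4                    # ASSUMPTION: rule of thumb for English text


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; the real tokenizer may count differently."""
    return len(text) // CHARS_PER_TOKEN


def check_knowledge_file(size_bytes: int, text: str) -> list[str]:
    """Return a list of likely problems (empty if none were detected)."""
    problems = []
    if size_bytes > MAX_BYTES_PER_FILE:
        problems.append("file larger than 512 MB")
    if estimate_tokens(text) > MAX_TOKENS_PER_FILE:
        problems.append("estimated tokens exceed 2,000,000")
    return problems
```

By this estimate, about 9 million characters of extracted text would already be over the token limit, so a PDF well under 512 MB can still be rejected if it is mostly text.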

3 Likes

Thank you for your reply! It is helpful!
However, it does not answer the question of why a previously uploaded file (one that was accepted into the knowledge) is now considered too large.

1 Like

No one can answer exactly ‘why.’

We can only speculate that the criteria have changed between when you uploaded it before and now.

When the model used in custom GPTs was changed from GPT-4 to GPT-4o, the byte-pair encoding was updated from cl100k_base to the corresponding o200k_base, so the same data can now be counted as a different number of tokens.

In general, the new byte-pair encoding is expected to result in fewer tokens, but there may be cases where the opposite is true. 🙂
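To see how the two encodings differ on your own text, something like the sketch below may help. It assumes the third-party `tiktoken` package is installed (and can fetch its encoding files on first use); `compare_token_counts` and `load_tiktoken_encoders` are hypothetical helper names, and the result only tells you what these tokenizers count, not necessarily what the upload check counts.

```python
from functools import partial
from typing import Callable, Dict, Sequence


def compare_token_counts(
    text: str, encoders: Dict[str, Callable[[str], Sequence]]
) -> Dict[str, int]:
    """Count tokens for the same text under each named encoder."""
    return {name: len(encode(text)) for name, encode in encoders.items()}


def load_tiktoken_encoders() -> Dict[str, Callable[[str], Sequence]]:
    """Build encoders for the two encodings mentioned above.

    ASSUMPTION: tiktoken is installed (pip install tiktoken).
    """
    import tiktoken  # imported lazily so the helper above works without it

    encoders = {}
    for name in ("cl100k_base", "o200k_base"):
        enc = tiktoken.get_encoding(name)
        # disallowed_special=() treats special-token text as ordinary text
        # instead of raising an error.
        encoders[name] = partial(enc.encode, disallowed_special=())
    return encoders
```

Usage would be along the lines of `compare_token_counts(extracted_text, load_tiktoken_encoders())`, where `extracted_text` is the text you pulled out of the PDF.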

1 Like