GPT knowledge base doesn't read all kinds of PDFs

I have tried to upload many PDFs and files in other formats, and the GPT knowledge base can’t read them. I checked the size, token count, etc., and it kept saying everything is OK.
Any ideas?

Are the PDFs it can’t read images and not text?

Could it be because they are encrypted PDFs?
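
One rough way to check both of the questions above before uploading is to look at the raw bytes of the file. This is only a heuristic sketch using the Python standard library, not a real PDF parser: the function name `inspect_pdf_bytes` is made up for illustration, and newer PDFs can keep font and image entries inside compressed object streams, where this byte search will not see them. A dedicated PDF library would give a more reliable answer.

```python
def inspect_pdf_bytes(data: bytes) -> dict:
    """Heuristic look at raw PDF bytes. No real parsing is done.

    Assumption: the markers below appear uncompressed in the file,
    which is common but not guaranteed.
    """
    return {
        # Every PDF starts with a "%PDF-" header.
        "looks_like_pdf": data.lstrip().startswith(b"%PDF-"),
        # Encrypted PDFs reference an /Encrypt dictionary in the trailer.
        "maybe_encrypted": b"/Encrypt" in data,
        # Pages with real text normally reference at least one font.
        "has_font_marker": b"/Font" in data,
        # Embedded images are declared as image XObjects.
        "has_image_marker": b"/Subtype /Image" in data
        or b"/Subtype/Image" in data,
    }


def inspect_pdf_file(path: str) -> dict:
    """Read a file from disk and run the same heuristic on it."""
    with open(path, "rb") as handle:
        return inspect_pdf_bytes(handle.read())
```

If `has_image_marker` is true and `has_font_marker` is false, the file is probably a scan with no text layer. If `maybe_encrypted` is true, try saving an unprotected copy before uploading.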

It is probably some error on the back end, and it can show up in different ways. But the whole point is to have the files in knowledge. It can use "unreal text" from knowledge and work as if it were really using the knowledge.

The PDFs contain both text and images. Thank you for asking!

No, they are plain PDFs. Thank you for asking!

Hi chieffy99. I didn’t quite understand what you meant when you said it can use unreal text in knowledge. Do you mean that it can hallucinate and pretend it is using a text when it is not using anything?
By the way, based on what I have read on Reddit and here, it seems OpenAI does not have a finished product to offer (lots of instability, lack of GPT learning capability, etc.), but is charging anyway, taking advantage of curious people who are keen to use the technology.

Sorry for the late reply. I can’t find the chat screenshot that I saved. The term "unreal text" may not be accurate in every situation. It just describes how GPT behaves when using knowledge. For example:

  1. There are two files with different contents in knowledge, A and B. I send a file in chat to compare with file A, but MyGPT loads file B and explains that the contents of file B are related to the file I sent. To do that, it has to alter some of the content, and that altered content is what I call unreal text.
  2. The GPT has some files built in. I discuss a topic not related to those files, and a need for information comes up. MyGPT loads a file and uses content related to the chat (likely something available in the tutorial) together with the file content.
  3. It often tries to use files inappropriately, even though the file clearly states how it should be used.

I don’t believe it is hallucination. I think it comes from an instruction that was adjusted incorrectly, because the pattern in which it shows up has changed. It also happens so frequently that I think something needs to be fixed on the back end.

As for Reddit or various columns, I don’t care about nonsense news that is published without anyone even testing whether it is true. I have come across various articles publishing attacks: making headlines to get attention, distorting things, cutting certain things out. Even world news agencies have written about using GPT to break the law, publishing the instruction prompts and explaining to the public how to use them.

I’m not knowledgeable in ML myself, but as I understand it, a widely deployed LLM like ChatGPT doesn’t just learn from data. It also learns from users’ interactions with the system: reward modeling uses user reactions to weight its behavior, so it learns what should or should not be done. But if something goes wrong, it is impossible to delete or overwrite what GPT has learned. All you can do is have it learn new things so that the new learning outweighs the old knowledge. That is difficult when the behavior comes from learning from society. You have to write or set up the message so that it carries more weight than before. The disadvantage is that this is not natural knowledge. (I don’t know everything and don’t know how to explain it well.)

But I can tell you that protecting a custom GPT against hacking includes solving knowledge problems that OpenAI doesn’t announce or tell us how to handle, because it doesn’t need to. Many of the people selling GPTs in the store are not able to manage the use of knowledge in a natural way or to prevent data pilferage, so the store is full of low-quality GPTs. I have a method for the knowledge problems I ran into, but I’m not going to sell it in the store. I just do what suits my use case: take the content out of knowledge and give it to the chat in batches, or create a GPT that fits the use case.
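
For anyone who wants to try the "give it to the chat in batches" idea, here is a minimal sketch of one way to split extracted text, assuming you already have the document as plain text. The function name `split_into_batches` and the 4000-character default are my own choices for illustration, not anything from OpenAI; the limit counts characters, not tokens.

```python
def split_into_batches(text: str, max_chars: int = 4000) -> list[str]:
    """Group paragraphs into batches of at most max_chars characters."""
    batches: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue  # skip empty paragraphs
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in this batch
            continue
        if current:
            batches.append(current)  # close the batch that is full
        # A single paragraph longer than the limit is cut into pieces.
        while len(para) > max_chars:
            batches.append(para[:max_chars])
            para = para[max_chars:]
        current = para
    if current:
        batches.append(current)
    return batches
```

Each batch can then be pasted into the chat one at a time. Splitting on blank lines keeps paragraphs together where possible, so the model sees complete passages instead of text cut mid-sentence.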

GPT can use both images inside a PDF and image-only PDFs, but it takes more time to OCR images than to read text, so less content gets loaded each time.

Thank you, chieffy99, for your answers. I will try uploading files with fewer images, or even with the images separated from the files, just in case. And thank you for your explanation of unreal text.