I’ve been working on my GPTs since last month, and I’m experimenting with the Knowledge section in the “Configure” tab.
I have about 30 to 40 PDF files of my business knowledge (nothing from the internet or public sources). I’m aware there is a 10 GB per-user cap and a maximum of 10 files, but I’m seeing strange behavior.
10 files were not enough for me, so I merged some PDFs, and now I have the best knowledge set for my purpose given the limits.
However, asking questions about something inside a merged PDF now doesn’t work. General questions give no result, but if I get specific and ask, for example, “search for this in my file name”, it works.
What do you think? Is merging files good practice or not? How can I improve my knowledge files?
I am somewhat of an API addict, so take this as the biased perspective it is, but I’d take your business knowledge, put it in a database, and make endpoints for the GPT to call to get the relevant information for the query. That seems like the cleanest way, though I guess it depends on what your business knowledge is.
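To make the database idea a bit more concrete, here is a minimal sketch of the lookup an endpoint could wrap. Everything in it is an assumption for illustration: the `knowledge` table, the `build_db` and `lookup` helpers, and the sample rows are hypothetical, and the HTTP layer the GPT would actually call is left out. It uses a plain substring match, not semantic search.

```python
import sqlite3

def build_db(entries):
    """Create an in-memory SQLite database from (topic, body) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE knowledge (topic TEXT, body TEXT)")
    conn.executemany("INSERT INTO knowledge VALUES (?, ?)", entries)
    conn.commit()
    return conn

def lookup(conn, query, limit=3):
    """Return up to `limit` entries whose topic or body contains `query`.

    SQLite's LIKE is case-insensitive for ASCII text by default.
    Note: '%' and '_' inside `query` act as wildcards in this sketch.
    """
    pattern = f"%{query}%"
    rows = conn.execute(
        "SELECT topic, body FROM knowledge "
        "WHERE topic LIKE ? OR body LIKE ? ORDER BY rowid LIMIT ?",
        (pattern, pattern, limit),
    ).fetchall()
    return [{"topic": topic, "body": body} for topic, body in rows]

# Hypothetical sample data standing in for real business knowledge.
conn = build_db([
    ("Refund policy", "Refunds are issued within 14 days."),
    ("Shipping", "Orders ship within 2 business days."),
])
print(lookup(conn, "refund"))
```

An endpoint built on something like this would return only the matching entries, so the GPT gets a small, relevant answer instead of searching through whole files.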
What do you want from the Knowledge file? I think Knowledge is best used to refer to information that is hard to find on the internet, or that is very specific to the context, like textbooks in my language. Being able to add knowledge in Thai helps a lot.
Do you want to use a knowledge file because the information you provide is not available on the Internet?
Then maybe this will help you:
Add headers, or better yet, convert from PDF to MD or TXT. With headers, the GPT will be able to navigate the file, and you will be able to write more accurate queries.
Text formatting in MD is very important. The model can distinguish code within text, but it is easier for it when the code is marked with special characters, just like headings.
Remember that this is a program, not a human, so the special symbols matter a lot.
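As a rough illustration of why headings help, the sketch below splits a Markdown-formatted text into (heading, body) sections. This is not how the GPT retrieval works internally (that is not documented in this thread); it only shows that explicit `#` markers give a program something unambiguous to navigate by. The function name `split_by_headings` and the sample text are made up for the example.

```python
import re

# Matches Markdown ATX headings such as "# Title" or "## Subtitle".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def split_by_headings(text):
    """Return a list of (heading, body) pairs.

    Text before the first heading is stored under the heading ''.
    Caveat: a line starting with '# ' inside a code block would also
    be treated as a heading by this simple sketch.
    """
    sections = []
    title, lines = "", []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous section if it has a title or any content.
            if title or any(l.strip() for l in lines):
                sections.append((title, "\n".join(lines).strip()))
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    if title or any(l.strip() for l in lines):
        sections.append((title, "\n".join(lines).strip()))
    return sections

sample = "# Pricing\nBasic plan is 10 EUR.\n\n## Discounts\nStudents get 20% off.\n"
for heading, body in split_by_headings(sample):
    print(heading, "->", body)
```

Running a check like this on a converted file is also a quick way to spot sections that lost their heading during the PDF conversion.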
Exactly, my knowledge is personal and not on the internet. For my use case, I want the GPT to extend its base knowledge with content from the files, so that its answers mix public and specific knowledge.
I’m not so happy about the lack of clear documentation and the unclear behavior of this feature.
Based on my experience and research, I would recommend the following:
Switch to Markdown formatting. (Note: when I last tried, .md files were not supported by the automatic retrieval, but .txt files with Markdown formatting work fine.)
Test your formatting. Sometimes headings are cut off, other times lists won’t work. I recommend creating an assistant in the playground; there you can check the logs and see whether the expected parts are returned.
“Annotate” your information. While some abstraction is possible (e.g. the retrieval might return “something red” when you search for “something colored”), it has its limits. Think about what users might ask and how they might ask it, and consider whether vector search can accomplish that level of abstraction. If not, maybe add specific tags to the relevant parts of your file.
If you are really serious about this, think about sorting your knowledge files and putting frequently needed info at the beginning. This impacts retrieval time for large files.
I call this process Knowledge File Optimisation (KFO). A detailed write-up of my research on this can be found here
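The tagging and sorting suggestions could be sketched roughly as follows. This is only an illustration under my own assumptions: the `Tags:` line format, the `priority` field, and the `build_knowledge_file` helper are hypothetical conventions, not something the retrieval is known to treat specially. The idea is simply to put the words users are likely to search for next to the relevant section, and to write frequently needed sections first.

```python
def build_knowledge_file(sections):
    """Render sections as Markdown-formatted text.

    Each section is a dict with 'title', 'body', 'tags' (list of str)
    and 'priority' (lower number = written earlier in the file).
    """
    ordered = sorted(sections, key=lambda s: s["priority"])
    parts = []
    for s in ordered:
        tag_line = "Tags: " + ", ".join(s["tags"])
        parts.append(f"## {s['title']}\n{tag_line}\n\n{s['body'].strip()}")
    return "\n\n".join(parts) + "\n"

# Hypothetical sample sections.
sections = [
    {"title": "Company history", "body": "Founded as a family shop.",
     "tags": ["history", "background"], "priority": 2},
    {"title": "Opening hours", "body": "Open Monday to Friday.",
     "tags": ["hours", "schedule", "when open"], "priority": 1},
]
text = build_knowledge_file(sections)
print(text)
```

The output is plain text with Markdown headings, so it can be saved as a .txt file and then checked in the playground as described above.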
Thank you! Do you think merging different converted files into one .md file could be a good solution? Or should I keep smaller files for better performance?
If so, it’s still important. There are many methods people use to get a GPT to use their Knowledge files the way they want, ranging from highly technical to simple. I don’t know the best way myself, but I think 3 factors are important:
What file?
It depends on the file type and the data inside. If we accept that text files are a good and very efficient option, then text files with correct content that is completely and properly arranged will make it easy for the GPT to find the information it needs. But keep in mind that the GPT does not perceive a document’s visual layout the way a person does, so the file should not be organized only from a human perspective. You can also combine related content into a single file to make things easier for the GPT. Think about the task as if you were the machine: it thinks like a human and works for humans, but it does not sense things like a human.
How to use the file?
You may specify the sequence of operations in the instructions, such as the situations in which the file should be used. This varies from file to file. You may also write instructions on how to use the contents of the file, such as how to search for information within it. (I don’t know if this method works, because I’m a general user without the time or knowledge, and I haven’t tested it properly.)
Who uses the file?
Knowledge and understanding are different things, and this is the most important point. Even creators like OpenAI don’t have a better understanding of ChatGPT than anyone else, so how well you can use it is the most important factor. Regardless of the methods used in the previous 2 steps, you may understand those methods, but how well do you understand the limitations of using a GPT? I’m sure no one knows everything, such as how many types of GPT sessions there are, what their conditions are, and what triggers their start and end. OpenAI probably knows best, but even they don’t know how to make a GPT respectable. The methods in this list can be used as a guide, but there is no fixed method. One day the system may change, and what you specified may no longer be available. Quickly understanding the limitations and changing the way you use it will be more helpful. Also, if you are a developer creating something that will become public, you have to think in a public way. If you don’t know what is inside your GPT, will you be able to use it to its full potential?
There is one example that I brought in to study GPTs. A friend of mine gave it to me for testing. Its instructions are approximately 28,000 characters long, and it is unknown what files it has. Its job is to hold a conversation, give advice, and support learning. Most of the prompt covers the steps and details the GPT needs to be aware of. It works well and makes very smart decisions. Even though I know the content of the instructions, it still doesn’t work efficiently for me (it is suited to teachers developing teaching systems), and it’s only 1 of 6 GPTs that work together. As you can see, I know how to test a GPT through instruction prompts, but I’m not the right professional to bring out what it can do.
I want to ask your opinion. What kind of custom GPT needs instructions at the maximum length, and is that suitable? For a GPT that focuses on abstract work, which is more appropriate: plain text that explains its duties with detailed step-by-step descriptions, or a brief explanation that uses simple vocabulary instead of long sentences?
I saw you mentioned abstraction and its use with GPT, so I’m sorry to ask here.
That’s how nature should be. And I may have used the phrase “step by step” incorrectly, because I typed it through the translator again. I meant that each task has an appropriate way of explaining what makes an answer correct. It doesn’t require following a 1-2-3-4 procedure like that.
I selected my resources and converted them to .txt. Obviously the PDF-to-TXT conversion broke the structure a bit, so I fixed the files by hand and then used GPT-4, asking it to remove extra spaces and broken characters and, in general, to format the files better for use as GPT Knowledge.
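For anyone who wants to do part of this cleanup with a script instead of GPT-4, here is a minimal sketch of the kind of fixes involved. It is not what I ran; the function name `clean_extracted_text` and the specific rules are assumptions about typical PDF extraction damage (ligatures, words hyphenated across line breaks, runs of spaces, stacked blank lines).

```python
import re
import unicodedata

def clean_extracted_text(text):
    """Tidy common damage in text extracted from a PDF."""
    # Fold compatibility characters, e.g. the ligature "ﬁ" becomes "fi"
    # and non-breaking spaces become normal spaces.
    text = unicodedata.normalize("NFKC", text)
    # Drop soft hyphens left over from typesetting.
    text = text.replace("\u00ad", "")
    # Rejoin words split across a line break ("para-\ngraph").
    # Caveat: this also joins real compounds such as "well-\nknown".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Remove spaces directly before or after a line break.
    text = re.sub(r" ?\n ?", "\n", text)
    # Keep at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

raw = "The  ﬁrst   para-\ngraph.\n\n\n\nNext  line. "
print(clean_extracted_text(raw))
```

Structural problems such as lost headings or scrambled tables still need a manual pass or a model, so this only covers the mechanical part.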
The result is a set of 10 .txt files that should be, for my use case, the minimum knowledge given the limit. I didn’t merge many files because I’d like to test them side by side.
I also changed my GPT instructions, and I’m now starting the testing phase.