GPTs - best file format for Knowledge to feed GPTs?

Don’t forget to check the “Code Interpreter” checkbox.
If you don’t, your GPT will probably work, but I’m not sure if it’s really checking the knowledge files.

I solved most of my issues (I’m using CSV file format)

GPT can see the knowledge with code interpreter disabled.

I checked and you’re right.

My problem was that I was using CSV files, which seem to require Code Interpreter to be used.

Foo-bar, davidthomasheider, Openai, et al.,

I am working with regulatory codes, trying to find the best file format, preparation, and cleaning approach to improve the quality and speed of knowledge files.

The various regulatory codes are usually difficult for humans to understand, and it's time-consuming to find related codes. Regulatory codes are text-based sentence structures organized by chapter, section, and subsection, with cross-references embedded in the sentences throughout the code pointing to other specific provisions.

As you can see, these intricacies make reading the code a complex web that's difficult for the GPT or API to understand.

I'm thinking maybe the code needs to be formatted into an .xlsx file with the section references in one column and the text/sentences in the next column… it just seems like too much prep, and OpenAI might just have to increase capabilities on their end??
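For what it's worth, that two-column prep doesn't have to be fully manual. Here's a minimal sketch of splitting a regulatory text into (section reference, text) rows and writing them as CSV, which spreadsheet tools can then save as .xlsx. The section-numbering pattern and the sample lines are assumptions for illustration, not taken from any real code:

```python
import csv
import io
import re

# Assumed numbering style like "1.1" or "1.1.1" at the start of a line;
# adjust the pattern to match your actual code's reference format.
SECTION_RE = re.compile(r"^(?P<ref>\d+(?:\.\d+)*)\s+(?P<text>.+)$")

def code_to_rows(lines):
    """Split lines like '1.1 Some provision...' into [reference, text] pairs.

    Continuation lines that don't start with a section number are
    appended to the previous section's text.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        m = SECTION_RE.match(line)
        if m:
            rows.append([m.group("ref"), m.group("text")])
        elif rows:
            rows[-1][1] += " " + line
    return rows

def rows_to_csv(rows):
    """Write [section, text] rows to a CSV string with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["section", "text"])
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical sample input, including a cross-reference and a wrapped line.
sample = [
    "1.1 General requirements apply to all installations.",
    "1.1.1 Installations shall comply with Chapter 3,",
    "Section 3.2, unless exempted under 1.4.",
]
print(rows_to_csv(sample))
```

The nice side effect of this layout is that the embedded cross-references ("Chapter 3, Section 3.2") stay in the text column, while the row's own reference sits cleanly in the first column, so the model can match them up.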

Any input on this matter would be greatly appreciated!!

Zakaria, my regulatory code is similar to academic papers, but maybe harder to follow without the code references… or do the code references make it harder?? I don't know how the back end works, but I would like to, so we can get this working better.


Indeed, gaining more insight into the backend is essential. When I asked ChatGPT about its training and which formats are most effective, I learned that .xlsx files are generally preferred due to their structured nature. In contrast, PDF files can be cumbersome, especially when extracting information from tables, images, and graphs. You could try the .xlsx route, but like you said, it requires a tremendous amount of work, so it's counterproductive, imo.

I modified some PDFs to isolate only the essential information (but also too much prep :rofl:), and this approach seemed to enhance the performance of custom GPT models. Also, the way prompts are configured plays a significant role. It’s often beneficial to set them up so that the model first searches through its knowledge database, sometimes even referencing the title of a specific PDF.
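To illustrate the prompt-configuration point, a GPT's Instructions field might include something along these lines. The wording and the file name are my own assumptions, just an example of steering the model toward the knowledge files first:

```
Before answering, always search the attached knowledge files.
For table lookups, consult "cleaned-regulations.pdf" (hypothetical file name) specifically.
When you cite a provision, quote its chapter/section/subsection reference.
If the knowledge files do not cover the question, say so instead of guessing.
```

In my experience, explicitly naming the file to search, as mentioned above, is what made the biggest difference.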

Could you elaborate on this, please? Are the articles hard-coded into the spreadsheet or linked? If hard-coded, what kind of file size are we looking at?