How do I force the assistant to read all the content in the file being used for retrieval?

I’ve been doing several experiments with the Assistants tool having the “Retrieval” option enabled. I’m struggling with problem #2 (see below).

Some problems (and some solutions) I found:

1. Message saying “cannot read/access the file” (which is not true)
Problem: Many times I got answers such as “I’m not able to process Python files directly.”, “File is not accessible”, or “The file you provided isn’t accessible with the myfiles_browser tool”.
Solution: This seems to be just a bug, since if I ask about specific content inside the file the assistant is able to answer most of the time. So, just ignore the message. It might help if you name the ID of the file, but I’m not sure the results are consistent. See more here: Assistant api, retrieval file api is not working - #20 by

2. Questions about structured data
Problem: Let’s say I upload a JSON file representing 20 books, including name, ISBN, category, etc. And then I describe a type of “Reader”. For example, “Peter likes comic books that talk about super heroes”. Then I ask the assistant “Which book is more appropriate for Peter to read?”. The answer here is very inconsistent. Sometimes it replies with the first result, and says it cannot read the rest of the file. If I ask to list all the books the assistant knows, it doesn’t work either.

Solution: I don’t know! :frowning:

How can I force the retrieval tool to go through the entire file? I’ve tried this with many different formats including JSON, TXT, MD, PDF, CSV, and Python. It seems that having a file per book might help but this is far from ideal…what is the most recommended format to upload structured data?
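For anyone who wants to try the file-per-book workaround, here is a minimal sketch. It assumes the source file is a JSON array with one object per book; the function name `split_books` and the `book_000.json` naming scheme are made up for illustration, and I can't promise this makes retrieval any more reliable.

```python
import json
from pathlib import Path


def split_books(source: Path, out_dir: Path) -> list[Path]:
    """Write each book from a JSON array into its own JSON file.

    Assumption: `source` contains a top-level JSON array of book objects.
    """
    books = json.loads(source.read_text(encoding="utf-8"))
    if not isinstance(books, list):
        raise ValueError("expected a top-level JSON array of books")

    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, book in enumerate(books):
        # Hypothetical naming scheme: book_000.json, book_001.json, ...
        target = out_dir / f"book_{index:03d}.json"
        target.write_text(
            json.dumps(book, indent=2, ensure_ascii=False), encoding="utf-8"
        )
        written.append(target)
    return written
```

Each resulting file can then be uploaded separately, so no single file depends on the tool reading past its first record.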

Thank you!!

Btw, for JSON, it only works if the file is not too small (I didn’t have time to investigate what the minimum size is, though).


Having a similar issue with reading a file, “It looks like I can’t directly execute code or access the file content here”. Trying the solution of simply reminding it that it can read files, haha :roll_eyes:

On #2, tell it that you’ve uploaded a JSON with content it should reference. What model are you using? GPT-4 does much better on processing provided context and finding facts within it. I switched from 3.5 to 4-preview and it really improved the information retrieved.
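A rough sketch of what "telling it" could look like in practice, combined with the earlier suggestion of naming the file ID. This only builds the message text; `build_user_message`, the wording, and the `file-abc123` ID are placeholders I made up, and whether the assistant actually honors the request is not guaranteed.

```python
def build_user_message(question: str, file_id: str, record_count: int) -> str:
    """Compose a user message that names the uploaded file explicitly.

    Assumptions: `file_id` is the ID returned when the file was uploaded,
    and `record_count` is how many records the JSON file contains.
    """
    return (
        f"I have uploaded a JSON file (ID: {file_id}) containing "
        f"{record_count} records. You can read this file. "
        "Please consider every record in it before answering.\n\n"
        f"Question: {question}"
    )


# Example with a placeholder file ID
message = build_user_message(
    "Which book is most appropriate for Peter to read?",
    file_id="file-abc123",
    record_count=20,
)
```

Stating the record count gives you a quick way to check the answer: if the assistant lists fewer than 20 books, it probably did not go through the whole file.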

I think JSON is going to provide you a lot of flexibility. One benefit is that it’s a very consistent format which has built-in relationships (key:value) which, I assume, GPT can “intuitively” understand.