Tips on getting 4o to answer questions with a given JSON file?

So I have a JSON file… with about 3,000 objects in there. User objects specifically. I am just using the playground for now to ask questions about the data in the file and it keeps getting the wrong answers.

For instance… I ask it when the newest user was added based on the createdate property in the object. It gives me back a different user each time.

Also… in my prompt I tell it not to bring back any users who are not active. Then, as a test, I specifically ask about a user that is inactive… it happily gives back that user.

When I ask it why it is giving me bad answers it tells me it is having trouble processing the document. I know the document is a valid JSON file… so I am wondering if I am just missing something?


This is no different from using a big spreadsheet or CSV to get those types of very specific answers. Currently, none of the models is good at doing what would be a standard query on a ‘big’ document. Anything over 30k characters or so (in my experience) and you’re out of luck.
So consider different strategies, like using a function call to get a specific record from that JSON instead of passing one big JSON with thousands of records.
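
A minimal sketch of the function-call idea, with the lookup done in ordinary code and only one record handed back to the model. The sample data, the `id` and `active` field names, and the `get_user` / `handle_tool_call` names are assumptions for illustration (only `createdate` comes from the question); the tool description follows the JSON-Schema style used for function calling, and no API call is made here.

```python
import json

# Hypothetical sample data; "id", "name" and "active" are assumed field names.
USERS_JSON = """
[
  {"id": 1, "name": "Ana", "active": true,  "createdate": "2024-01-15"},
  {"id": 2, "name": "Ben", "active": false, "createdate": "2024-03-02"},
  {"id": 3, "name": "Cy",  "active": true,  "createdate": "2024-02-20"}
]
"""

# Tool description you would pass to the model alongside the prompt.
GET_USER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_user",
        "description": "Return one active user record by id, or null.",
        "parameters": {
            "type": "object",
            "properties": {"user_id": {"type": "integer"}},
            "required": ["user_id"],
        },
    },
}


def get_user(users, user_id):
    """Return the active user with this id, or None if missing or inactive."""
    for user in users:
        if user["id"] == user_id:
            return user if user["active"] else None
    return None


def handle_tool_call(users, name, arguments_json):
    """Dispatch a tool call; the arguments arrive as a JSON string."""
    args = json.loads(arguments_json)
    if name == "get_user":
        return json.dumps(get_user(users, args["user_id"]))
    raise ValueError(f"unknown tool: {name}")


users = json.loads(USERS_JSON)
print(handle_tool_call(users, "get_user", '{"user_id": 1}'))
```

The "do not return inactive users" rule lives in `get_user`, so it is enforced by code rather than by the prompt.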

See this discussion for a few other strategies, including using code completion (again, IMO it is better to use a function call): The Fortune 500 list challenge!

The Fortune 500 list is only 500 rows and 10 or so columns.

  1. Convert the JSON file to a structured table.
  2. Use a prompt to convert your query into SQL or some other query language relevant for your data structure.
  3. Use a prompt to convert the query result into natural language for the response.
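
The three steps above could be sketched roughly like this, using an in-memory SQLite table. The sample data, the `id` / `name` / `active` columns, and the helper names are assumptions; `generated_sql` stands in for whatever the model returns in step 2, and the `SELECT`-only check is a simple precaution, not a complete safeguard.

```python
import json
import sqlite3

# Hypothetical sample data; only "createdate" is named in the question.
USERS_JSON = """
[
  {"id": 1, "name": "Ana", "active": true,  "createdate": "2024-01-15"},
  {"id": 2, "name": "Ben", "active": false, "createdate": "2024-03-02"},
  {"id": 3, "name": "Cy",  "active": true,  "createdate": "2024-02-20"}
]
"""


def load_users(conn, users):
    """Step 1: flatten the JSON array into a table."""
    conn.execute(
        "CREATE TABLE users "
        "(id INTEGER PRIMARY KEY, name TEXT, active INTEGER, createdate TEXT)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?, ?)",
        [(u["id"], u["name"], int(u["active"]), u["createdate"]) for u in users],
    )


def run_readonly(conn, sql):
    """Run model-generated SQL only if it is a single SELECT statement."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select") or ";" in stripped:
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(stripped).fetchall()


def result_prompt(question, rows):
    """Step 3: prompt asking the model to phrase the rows as an answer."""
    return f"Question: {question}\nQuery result rows: {rows}\nAnswer in one sentence."


conn = sqlite3.connect(":memory:")
load_users(conn, json.loads(USERS_JSON))

# Step 2: placeholder for the SQL the model would produce from the question.
generated_sql = (
    "SELECT name, createdate FROM users "
    "WHERE active = 1 ORDER BY createdate DESC LIMIT 1;"
)
rows = run_readonly(conn, generated_sql)
print(result_prompt("When was the newest active user added?", rows))
```

Sorting on `createdate` as text only works here because the sample dates are ISO formatted; other date formats would need converting first.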


Try this:

  1. Create an indexed JSON file: make the first column an index column.
  2. Use this prompt: I provided a JSON file with [NUMBER HERE, FOR EXAMPLE 3,000] indexed entries; each entry relates to [PROVIDE CONTEXT HERE, FOR EXAMPLE IF YOU HAVE 3,000 ORGANIZATIONS, USE THE WORD "ORGANIZATION " HERE]. We have the following elements for each entry in the JSON file array: LABEL1, LABEL2, LABELN \n\n You must look at every single indexed element before answering my question\n\n ASK YOUR QUESTION HERE\n\nYou must structure your response as: INSTRUCTIONS HERE.
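
As a rough sketch, the indexing and the prompt template could be generated like this. The sample records and the `add_index` / `build_prompt` names are assumptions, and putting a 1-based `index` key first in each object is just one reading of "index column".

```python
import json

# Hypothetical sample records standing in for the real file.
records = [
    {"name": "Ana", "active": True, "createdate": "2024-01-15"},
    {"name": "Ben", "active": False, "createdate": "2024-03-02"},
    {"name": "Cy", "active": True, "createdate": "2024-02-20"},
]


def add_index(records):
    """Return copies of the records with a 1-based "index" key placed first."""
    return [{"index": i, **rec} for i, rec in enumerate(records, start=1)]


def build_prompt(indexed, entity, question, response_format):
    """Fill in the prompt template for an already indexed list of records."""
    labels = ", ".join(key for key in indexed[0] if key != "index")
    return (
        f"I provided a JSON file with {len(indexed):,} indexed entries; "
        f"each entry relates to {entity}. "
        "We have the following elements for each entry in the JSON file array: "
        f"{labels}\n\n"
        "You must look at every single indexed element before answering my question\n\n"
        f"{question}\n\n"
        f"You must structure your response as: {response_format}"
    )


indexed = add_index(records)
indexed_json = json.dumps(indexed, indent=2)  # contents of the file to upload
prompt = build_prompt(
    indexed,
    entity="a user",
    question="Which active user has the most recent createdate?",
    response_format="the index, then the name, then the createdate.",
)
print(prompt)
```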

We have used this approach with up to 10,000 rows and 10 columns. That does not mean it cannot work with larger datasets; it is just what we have tested.

We have tested it with GPT 4 128K and Claude Opus 128K. Claude Opus 128K is consistently better for this particular case.

What a great approach, I am using a similar approach for lengthy datasets and achieving great results too. Will try yours with extra detail defining the arrays. Thank you!

I’m not sure that LLMs are good enough yet to handle this type of record processing.

So for that work, why not stick to “standard/traditional” data-handling tech and convert the JSON to a SQLite table or pandas DataFrame, which can live in memory, so you can run whatever query you want over it and always get the right answer?

Maybe you can even ask an assistant that uses the code interpreter tool to do it for you (take the JSON, put it into SQLite or pandas, and then answer your queries). That way the LLM does a better job, since it only has to generate the code that extracts the data from the data structure.
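
For illustration, this is roughly the kind of deterministic code such a tool might generate for the two questions in the original post. It uses plain Python lists rather than SQLite or pandas to stay dependency-free; the sample data and the `name` / `active` field names are assumptions, and the dates are assumed to be ISO formatted.

```python
import json
from datetime import datetime

# Hypothetical sample data; only "createdate" is named in the question.
USERS_JSON = """
[
  {"name": "Ana", "active": true,  "createdate": "2024-01-15"},
  {"name": "Ben", "active": false, "createdate": "2024-03-02"},
  {"name": "Cy",  "active": true,  "createdate": "2024-02-20"}
]
"""


def active_users(users):
    """Keep only users flagged as active."""
    return [u for u in users if u["active"]]


def newest_active_user(users):
    """Return the active user with the latest createdate, or None."""
    return max(
        active_users(users),
        key=lambda u: datetime.fromisoformat(u["createdate"]),
        default=None,
    )


def find_active_user(users, name):
    """Look up a user by name, never returning inactive users."""
    for user in active_users(users):
        if user["name"] == name:
            return user
    return None


users = json.loads(USERS_JSON)
print(newest_active_user(users))
print(find_active_user(users, "Ben"))
```

The same input always gives the same answer, which is the property the model alone was not providing.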