Best practices to help GPT understand and analyse heavily nested JSON data

This sounds like a difficult use case. RAG is already relatively finicky: LLMs tend to make up answers when the retrieved results don't contain the answer. And since you'll need to break the input up somehow (as @David_Blair said, 50 MB is far too long for a single message), most of your RAG calls will come back empty: "nothing in JSON Wireshark chunk 1," "nothing in JSON Wireshark chunk 2," and so on. Depending on how the JSON is structured, you will probably also need to preserve the JSON hierarchy across chunks, so that a chunk taken from the middle of the file (say, chunk 14) still records where in the hierarchy its contents sit. Sounds tough; I'd love to hear if you get the project off the ground!
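One way to tackle the hierarchy problem is to flatten the JSON into "path = value" lines before chunking, so every chunk carries its own location. Below is a minimal sketch, assuming the file fits in memory (for a 50 MB capture a streaming JSON parser would be kinder). The function names `iter_leaves` and `chunk_with_paths` are hypothetical, the `$.key[0]` path notation is only loosely borrowed from JSONPath, and keys that themselves contain dots or brackets are not escaped.

```python
import json
from typing import Any, Iterator


def iter_leaves(node: Any, path: str = "$") -> Iterator[tuple[str, Any]]:
    """Yield (path, value) for every leaf in a nested JSON value.

    Empty objects/arrays are treated as leaves so they are not silently lost.
    """
    if isinstance(node, dict) and node:
        for key, value in node.items():
            yield from iter_leaves(value, f"{path}.{key}")
    elif isinstance(node, list) and node:
        for index, value in enumerate(node):
            yield from iter_leaves(value, f"{path}[{index}]")
    else:
        yield path, node


def chunk_with_paths(data: Any, max_chars: int = 2000) -> list[str]:
    """Group path-annotated leaves into text chunks of roughly max_chars.

    Each line is 'path = value', so a chunk stays interpretable on its own
    even when it was cut from the middle of a deeply nested object.
    A single line longer than max_chars still becomes its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for path, value in iter_leaves(data):
        line = f"{path} = {json.dumps(value)}"
        # Close the current chunk if this line would push it past the budget.
        if current and size + len(line) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line) + 1  # +1 for the newline separator
    if current:
        chunks.append("\n".join(current))
    return chunks


if __name__ == "__main__":
    # Hypothetical, heavily simplified packet data (not real tshark output).
    sample = {"packets": [{"ip": {"src": "10.0.0.1", "dst": "10.0.0.2"},
                           "tcp": {"ports": [443, 51234]}}]}
    for chunk in chunk_with_paths(sample, max_chars=70):
        print(chunk)
        print("---")
```

With a small budget like this, the TCP ports land in a second chunk, but each line still reads `$.packets[0].tcp.ports[0] = 443`, so the model (and your retriever) can tell which packet and layer it came from without seeing chunk 1.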

On a different note, GPT-3.5 is very unreliable at outputting JSON, though I haven't used it for analyzing JSON input. You might try GPT-4o mini, which is quite cheap and is better suited to working with JSON.
