I am building a search application using RAG and need to process deeply nested JSON data. Could anyone guide me on whether I can chunk the JSON to find similar matches, or whether there is a better approach? I would also love to know whether OpenAI models can understand deeply nested JSON structures and give accurate responses when queried. Any insights would be greatly appreciated!
Hey,
this does not make sense at all.
If you want to search structured data, you can use normal programming logic. Why waste time, money, and energy on an LLM when you already have structured data?
@123s - As @jochenschultz mentioned, it is not required to use AI. If you wish to do so anyway: isolate the logic to classify the top-level JSON using AI, extract it using programming logic, perform RAG, and repeat. Alternatively, if RAG is the sole purpose and context is the primary need:
a. Chunk the JSON data into smaller, manageable pieces.
b. Convert these chunks into string format.
c. Perform a simple RAG operation using these converted strings. Note: you may use programming logic to highlight JSON levels and key-value pairs and see how it performs; a sketch of these steps follows below.
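For example, here is a minimal sketch of steps (a)-(c), assuming the OpenAI Python SDK (v1) and the `text-embedding-3-small` model; helper names like `chunk_json` and `search` are made up for illustration, not a library API:

```python
import json
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chunk_json(obj, path="$"):
    """Recursively flatten nested JSON into 'path: value' strings,
    so each chunk carries its position in the hierarchy as context."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from chunk_json(value, f"{path}.{key}")
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            yield from chunk_json(value, f"{path}[{i}]")
    else:
        yield f"{path}: {obj}"

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def search(query, data, top_k=3):
    chunks = list(chunk_json(data))
    doc_emb = embed(chunks)
    q_emb = embed([query])[0]
    # cosine similarity between the query and every chunk
    scores = doc_emb @ q_emb / (np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(q_emb))
    return [chunks[i] for i in np.argsort(scores)[::-1][:top_k]]

data = json.loads('{"users": [{"name": "Ada", "comment": "Great API docs"}]}')
print(search("what did users say about the docs?", data))
```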
Hope this helps! Cheers!
Beyond stupidity? Say I have a key for user comments and I want to query across the comments of all users. Would DB querying still work for parsing those comments? On what fields would you query when intent plays a role?
That would be a proper use case for an LLM.
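For example (a sketch only, reusing the hypothetical `embed` helper from the sketch above): plain code extracts the structured part, and embeddings handle the fuzzy intent part.

```python
import numpy as np

def comments_matching_intent(users, query, top_k=5):
    # deterministic extraction: no LLM needed to walk the JSON
    comments = [u["comment"] for u in users if "comment" in u]
    # semantic ranking: this is the part where intent actually matters
    doc_emb, q_emb = embed(comments), embed([query])[0]
    scores = doc_emb @ q_emb / (np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(q_emb))
    return [comments[i] for i in np.argsort(scores)[::-1][:top_k]]
```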
Funny, I assumed SDEs were supposed to think through all cases.
Who told you that?
Are you programming a “bull collision avoidance system” into a moon lander because there might be a herd of flying cows in space?
Ah wait, I forgot the emoji.
Repeat after me: for similarity search you don’t use a GPT!
There are sentence transformers if you insist on using AI stuff.
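For instance, here is a minimal sketch using the `sentence-transformers` library (`pip install sentence-transformers`; the model name below is just a common small default, not a recommendation from this thread):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "GET /users - list all users",
    "POST /orders - create a new order",
    "DELETE /sessions/{id} - log a user out",
]
query_emb = model.encode("how do I sign someone out?", convert_to_tensor=True)
corpus_emb = model.encode(corpus, convert_to_tensor=True)

# returns the top matches with cosine scores - no generative model involved
for hit in util.semantic_search(query_emb, corpus_emb, top_k=2)[0]:
    print(corpus[hit["corpus_id"]], hit["score"])
```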
Thank you for your responses. I completely understand that LLMs are preferred for unstructured data. I may not have conveyed my context clearly…
However, in my case, the backend data is stored in JSON format; assume I need to implement a search function on top of an API documentation platform. Since the backend data is structured as JSON rather than plain text, I need to chunk the JSON data effectively to enable efficient searching within the documentation. My query will be a natural-language query based on intent (e.g., endpoints for a specific scenario).
I was already trying the logic you mentioned here; I am just checking whether there is any alternative approach…
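One idea I am trying for the chunking: one chunk per endpoint, serialized as readable text rather than raw JSON, so the embedding sees what the endpoint does instead of braces and quotes. The field names below are hypothetical, based on a typical docs schema:

```python
def endpoint_to_chunk(ep):
    """Serialize one endpoint record into a readable text chunk."""
    params = ", ".join(p["name"] for p in ep.get("parameters", []))
    return (f"{ep['method']} {ep['path']}: {ep['description']}"
            + (f" (parameters: {params})" if params else ""))

doc = {
    "method": "GET",
    "path": "/orders/{id}",
    "description": "Fetch a single order by its identifier",
    "parameters": [{"name": "id"}],
}
print(endpoint_to_chunk(doc))
# -> GET /orders/{id}: Fetch a single order by its identifier (parameters: id)
```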
A great example is Home Assistant, which includes its entire YAML data structure in the prompt. However, the results are somewhat mixed.
You can use an LLM to create an RDBMS structure, put the structured data into it, combine entries with a graph with multiple subgraphs, and add embeddings.
Data is context - providing context to the LLM is key to getting good results. So you need to build software that prompts the LLM correctly by selecting which data it needs, which it does not, and which might confuse it (you can see that in the graph, and you can tell it not to go that way again if it fails).
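A minimal sketch of that selection step, assuming a Postgres table with a pgvector column (the table and column names here are hypothetical; `<=>` is pgvector's cosine-distance operator):

```python
import psycopg2

def relevant_context(query_embedding, k=5):
    """Pull only the k chunks most similar to the query embedding,
    instead of stuffing the whole data structure into the prompt."""
    conn = psycopg2.connect("dbname=app")
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT content
            FROM doc_chunks
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (str(query_embedding), k),
        )
        return [row[0] for row in cur.fetchall()]
```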
Here is something I made to demonstrate it on a smaller scale…
The “system prompt” is generated per chat message - which drastically reduces the cost for the LLM as well…
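The per-message assembly might look like this (a sketch, not how the linked demo actually does it) - only the handful of chunks retrieved for that message go into the prompt, which is what keeps the token count, and therefore the cost, down:

```python
def build_system_prompt(chunks):
    """Assemble a fresh system prompt for each chat message
    from the few chunks retrieved for that message."""
    context = "\n".join(f"- {c}" for c in chunks)
    return ("Answer using only the context below. "
            "If the answer is not in the context, say so.\n\n"
            f"Context:\n{context}")
```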
Would be a very good base for Home Assistant as well, btw…
The only problem I see is that it has extreme complexity: PostgreSQL (so you need SQL knowledge), PostGIS (so you need to know how geoinformatics works), pgvector, Neo4j, MinIO, RabbitMQ, Python, PHP (Symfony, API Platform), TypeScript (Vue 3 with Pinia - woah, I love that pineapple state), Go, and 50% of it consists of shell scripts to automate the infrastructure. I guess once the system itself is analyzed, that will get easier… Then the devs can just get an issue, the relevant code parts are highlighted, and a course is generated that onboards them to the task.