ChatGPT data flow architecture

Hi Team,
When an LLM is trained on a local corpus, an index.json file gets created and saved locally. Now, when I write a prompt to get a query answered, I see two different results combined: one from the LLM's training and the other from the local corpus. Could you help me understand the data flow architecture here? How does data get pulled from the local system, merged with the LLM result, and shown to the user? I am trying to create an architecture diagram that shows the data flow into and out of the user's system.

Welcome to the OpenAI developer forum.

Can you give some more details, please? For example, what code are you running?

Perhaps I have read this wrong or there has been a slight miscommunication, but I am not aware of any of OpenAI's products creating JSON files locally.

Thanks for reaching out. I referred to this code: How to Train an AI Chatbot With Custom Knowledge Base Using ChatGPT API | Beebom

But my question is more about how the client-server interaction takes place. I haven't been able to find an architecture that explains the data flow. It would be great if you could share some knowledge.

It looks like that tutorial is using LangChain. Note, though, that you aren't “training” the LLM. Try running LangChain in verbose mode: you can see the prompts it uses and how it stuffs your local data into the prompt. Under the hood it uses embeddings and vector storage. Here's an explainer from another community member:
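To make the data flow concrete, here is a minimal Python sketch of the embeddings/vector-storage pattern, under stated assumptions. It is not the tutorial's or LangChain's actual code: `embed` is a hypothetical bag-of-words stand-in for a real embeddings API call, and the function names, the index.json layout and the prompt wording are all illustrative. Indexing, retrieval and prompt building all run on the user's machine; only the finished prompt string would be sent to the model server, and the model's weights never change.

```python
import json
import math
import re
from collections import Counter


# HYPOTHETICAL stand-in for an embeddings API call. A real pipeline would send
# each chunk to an embeddings model and get back a dense vector; a word-count
# vector is used here so the sketch runs offline.
def embed(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Step 1 (local): embed each chunk of the corpus and persist text + vectors.
# This is roughly the kind of thing a local index file holds; the real format
# depends on the library.
def build_index(chunks, path="index.json"):
    index = [{"text": c, "embedding": dict(embed(c))} for c in chunks]
    with open(path, "w") as f:
        json.dump(index, f)
    return index


def load_index(path="index.json"):
    with open(path) as f:
        return json.load(f)


# Step 2 (local): embed the question and keep the k most similar chunks.
def retrieve(index, question, k=2):
    q = embed(question)
    ranked = sorted(index, key=lambda e: cosine(q, e["embedding"]), reverse=True)
    return [e["text"] for e in ranked[:k]]


# Step 3 (local): "stuff" the retrieved chunks into the prompt. This string is
# the only thing that would be sent to the LLM in step 4 (API call not shown).
def build_prompt(context_chunks, question):
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer the question using the context below. If the context does not "
        "help, answer from your general knowledge.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = [
        "Our refund policy allows returns within 30 days of purchase.",
        "The office is closed on public holidays.",
        "Support tickets are answered within two business days.",
    ]
    build_index(chunks, "index.json")
    index = load_index("index.json")
    question = "How many days do I have to return a purchase?"
    print(build_prompt(retrieve(index, question, k=1), question))
```

Because the model only sees what is in the prompt, the part of an answer that looks like it came from the local corpus comes from the retrieved chunks, while the rest comes from what the model already knows. An instruction like the illustrative one in `build_prompt`, which allows falling back to general knowledge, is one way the two kinds of content can end up combined in a single answer.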
