Input exceeds the context window of a model

Hi,

I want to integrate with the Responses API and give my users an LLM interface for asking questions about their data.

The data is stored in a database and is quite large, so providing it all as context exceeds the model’s context window.

I’m wondering what the best way to handle this is. Ultimately I just want the LLM to have access to the data so it can help answer any questions users may have.

Hi,

Welcome to the forum.

You can chunk the data (i.e., send it in smaller, more manageable pieces over multiple requests).
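For example, a minimal chunking sketch with the Python SDK and the Responses API (the model name, batch size, and prompt wording are placeholders to tune for your data):

```python
# Split the rows into batches that fit the context window, answer the
# question per batch, then combine the partial answers in a final pass.
from openai import OpenAI

client = OpenAI()

def chunked(rows, size=200):
    """Yield successive fixed-size batches of rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def answer_in_chunks(rows, question):
    partial_answers = []
    for batch in chunked(rows):
        response = client.responses.create(
            model="gpt-4.1-mini",  # placeholder model
            input=f"Data chunk:\n{batch}\n\nQuestion: {question}\n"
                  "Answer using only this chunk.",
        )
        partial_answers.append(response.output_text)
    # Final pass: merge the per-chunk answers into one reply.
    final = client.responses.create(
        model="gpt-4.1-mini",
        input="Combine these partial answers into one concise answer:\n"
              + "\n".join(partial_answers),
    )
    return final.output_text
```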

Retrieval-Augmented Generation (RAG): Utilize RAG to dynamically retrieve relevant data chunks in response to user queries, reducing the need to process the entire dataset at once.
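A minimal RAG sketch, assuming you embed the chunks with the embeddings endpoint and keep the vectors in memory (a production system would precompute these and use a vector database; model names and top_k are placeholders):

```python
# Embed the data chunks, retrieve only the chunks most similar to the
# question, and send just those as context.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    result = client.embeddings.create(
        model="text-embedding-3-small",  # placeholder embedding model
        input=texts,
    )
    return np.array([item.embedding for item in result.data])

def answer_with_rag(chunks, question, top_k=5):
    chunk_vectors = embed(chunks)  # in practice, precompute and store these
    query_vector = embed([question])[0]
    # Cosine similarity between the question and every chunk.
    scores = chunk_vectors @ query_vector / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(query_vector)
    )
    context = "\n".join(chunks[i] for i in np.argsort(scores)[-top_k:])
    response = client.responses.create(
        model="gpt-4.1-mini",  # placeholder model
        input=f"Context:\n{context}\n\nQuestion: {question}",
    )
    return response.output_text
```

Since your data lives in a database, another option is to let the model write SQL itself and only pull the rows it needs, as in the flow below (the full loop is sketched in code after the steps):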

  • User asks question in natural language

“How many users signed up in April?”

  • LLM interprets schema & query goal

SELECT COUNT(*) FROM users WHERE signup_date BETWEEN '2025-04-01' AND '2025-04-30'

  • Your system runs the query, gets result

e.g., 437

  • LLM generates a secondary query for details (chunking)

SELECT id, name, email, signup_date FROM users
WHERE signup_date BETWEEN '2025-04-01' AND '2025-04-30'
LIMIT 100 OFFSET 0;

  • LLM uses the result to create a natural reply plus the chunked data

“437 users signed up in April.” plus page 1 of the data:
“Alice (alice@email.com), signed up on 2025-04-02
Bob (bob@email.com), signed up on 2025-04-03”
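Here is what that loop might look like with Responses API function calling; the run_sql tool, the SQLite connection, and the model name are placeholders, and in production you’d validate or whitelist the generated SQL before executing it:

```python
# The model requests SQL via a tool call, our system executes it, and
# the rows are fed back so the model can write the final reply.
import json
import sqlite3

from openai import OpenAI

client = OpenAI()
db = sqlite3.connect("app.db")  # placeholder database

tools = [{
    "type": "function",
    "name": "run_sql",
    "description": "Run a read-only SQL query and return the rows.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

def run_sql(query):
    # In production, validate the SQL (read-only, no DDL) before running it.
    return db.execute(query).fetchall()

def ask(question):
    response = client.responses.create(
        model="gpt-4.1",  # placeholder model
        input=[{"role": "user", "content": question}],
        tools=tools,
    )
    # Keep resolving tool calls until the model produces a final answer.
    while any(item.type == "function_call" for item in response.output):
        outputs = []
        for item in response.output:
            if item.type == "function_call":
                args = json.loads(item.arguments)
                outputs.append({
                    "type": "function_call_output",
                    "call_id": item.call_id,
                    "output": json.dumps(run_sql(args["query"]), default=str),
                })
        response = client.responses.create(
            model="gpt-4.1",
            previous_response_id=response.id,
            input=outputs,
            tools=tools,
        )
    return response.output_text

print(ask("How many users signed up in April?"))
```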

GPT-4.1 offers a very large context window (~1 million tokens), making it ideal for reading or analyzing large documents efficiently.

Other users will be able to enhance this but this should get you started.

I’m curious to know if there is a way to implement something similar to ChatGPT’s “upload a JSON file” feature, which gives the LLM access to the file’s content for the chat session without pasting the content into the input text, so I can avoid exceeding input token limits.

My ideal user experience when a chat is initialized (roughly sketched in code after the list below):

  • My system sends the LLM a JSON file (consisting of the metadata) with all the context
  • The LLM analyses the file and is ready for questions related to the metadata
  • The user asks a question in natural language
  • The LLM responds with an answer, or with a query if additional data is needed (the data is not included in the initial metadata because there may be a lot of it, so I only want to provide the relevant parts when needed and have the LLM tell me what’s relevant)
  • My system queries for the data and feeds it back to the LLM (ideally also as an uploaded CSV file)
  • The LLM uses the result to create a natural reply
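For reference, here’s roughly what I picture for the initialization step, assuming vector stores and the built-in file_search tool can index my metadata file (I may need to convert the JSON to a supported text format first, and all names here are placeholders):

```python
# Rough sketch of my intended flow: upload the metadata once, then let
# the model search it via file_search instead of pasting it into the
# prompt. File name, store name, and model are placeholders.
from openai import OpenAI

client = OpenAI()

# 1. At chat initialization, upload the metadata file once.
metadata_file = client.files.create(
    file=open("metadata.json", "rb"),
    purpose="assistants",
)
store = client.vector_stores.create(name="chat-metadata")
client.vector_stores.files.create(
    vector_store_id=store.id,
    file_id=metadata_file.id,
)

# 2. Later questions reference the stored file instead of inlining it,
#    which keeps input tokens small.
response = client.responses.create(
    model="gpt-4.1",  # placeholder model
    tools=[{"type": "file_search", "vector_store_ids": [store.id]}],
    input="Which tables hold user signup data?",
)
print(response.output_text)
```

Steps 4–6 would then be a function-calling loop like the one sketched earlier in this thread.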

Is this possible?