Context:
We are using OpenAI’s CSV agent with GPT-4o-mini to answer questions based on a CSV file that contains data up to 2024. The agent performs well for queries related to 2023 and earlier, but when asked about 2024 data, it often responds:
“I’m trained up to October 2023.”
What We Tried:
- Explicitly Mentioning the Data Coverage
  - We added instructions like:
    “You have been provided with data up to 2024. Use only the CSV file to answer questions.”
  - Result: The model keeps “thinking” indefinitely and does not respond.
- Providing 2024 Data Separately
  - We structured the CSV so that 2024 data was separated and added it explicitly in the prompt.
  - Result: The model does answer questions, but only about 2024 and only if the query explicitly references it. It does not reason across years or behave like an agent.
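For reference, the coverage instruction from the first attempt can be derived from the file itself rather than hard-coded. This is only a minimal sketch: the `year` column, the sample rows, and the helper `build_coverage_note` are hypothetical and not part of any agent library.

```python
import csv
import io

# Hypothetical sample data; the real file and its column names may differ.
SAMPLE_CSV = """year,revenue
2022,100
2023,120
2024,150
"""


def build_coverage_note(csv_text, year_column="year"):
    """Return a prompt sentence stating the year range actually present in the CSV."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Collect the distinct years found in the data, in ascending order.
    years = sorted({int(row[year_column]) for row in rows})
    return (
        f"The attached CSV covers {years[0]} through {years[-1]}. "
        "Answer only from the CSV rows, not from prior knowledge."
    )


note = build_coverage_note(SAMPLE_CSV)
print(note)
```

The resulting sentence would be prepended to the agent's instructions; whether that alone changes the model's behavior is exactly what we are unsure about.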
Expected Behavior:
- The model should integrate the provided CSV data (including 2024) and use it for reasoning.
- It should not default to its pretraining cutoff if the relevant information exists in the CSV.
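To make “reasoning across years” concrete, this is the kind of result we expect for a question such as “How did 2024 compare with 2023?”. It is a minimal sketch computed directly from the data, assuming hypothetical `year` and `revenue` columns and made-up rows, not our actual file:

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the real file and its column names may differ.
SAMPLE_CSV = """year,revenue
2023,100
2023,20
2024,150
"""


def totals_by_year(csv_text):
    """Sum the revenue column per year."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[int(row["year"])] += float(row["revenue"])
    return dict(totals)


def year_over_year_change(totals, year):
    """Percent change of `year` relative to the previous year."""
    previous = totals[year - 1]
    return (totals[year] - previous) / previous * 100


totals = totals_by_year(SAMPLE_CSV)
change_2024 = year_over_year_change(totals, 2024)
print(totals, change_2024)
```

An agent that treats the CSV as its source of truth should be able to produce this comparison without mentioning its training cutoff.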
Questions:
- Has anyone else experienced this issue with GPT-4o-mini in the CSV agent?
- Are there any workarounds to ensure the model properly reasons over CSV data, including newer years? (We also tried a RAG approach, but it failed because we need the model to reason over the data, not just retrieve rows.)