A model with a better understanding of the grid structure

Hello

I’m building an Excel add-in that works like ChatGPT inside Excel. A user can ask the tool, “Which cells contain the value 300 in A1:D100?” The tool then sends the values of A1:D100 (e.g., “The values of A1:D100 are [[100, 200, 300, 400], [200, 300, 400, 500], …, [1, 2, 3, 4]]”) to the OpenAI API along with the question and expects an answer back.
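
To make the prompt format concrete, here is a minimal Python sketch of how such a message could be assembled. It is for illustration only: `build_prompt` is a hypothetical helper, and using `json.dumps` to serialize the values is an assumption, not the add-in’s actual code.

```python
import json


def build_prompt(range_address: str, values: list, question: str) -> str:
    """Hypothetical helper: put the range values and the user's question in one string."""
    # json.dumps renders the nested list as [[100, 200, ...], [...]], row by row.
    return f"The values of {range_address} are {json.dumps(values)}.\n{question}"


# A two-row stand-in for the real A1:D100 data.
values = [[100, 200, 300, 400], [200, 300, 400, 500]]
question = "Which cells contain the value 300 in A1:D2?"
prompt = build_prompt("A1:D2", values, question)
print(prompt)
```

Note that in this format the model only sees a nested list and has to infer every cell address by counting positions.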

I’ve noticed that even for such a simple question, the model can return incorrect answers, especially when dealing with large datasets. For instance, it may return a cell whose value is not 300 or fail to return all the cells that contain 300.

Users can ask a wide variety of questions (finding a cell by its value is just one use case among many), so improving accuracy for that single case is far from sufficient. To answer users’ queries accurately, the model needs a strict understanding of the worksheet grid structure and some spatial awareness, and I suspect it lacks both. This understanding includes, for example:

  • A worksheet’s columns are labeled with letters starting at A, and its rows are numbered starting at 1.
  • There is one column between Column A and Column C.
  • A worksheet is a two-dimensional array of cells.
  • A workbook consists of multiple worksheets.
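
The conventions above can be written down in a few lines of Python. This is only a sketch of the grid rules themselves, not a proposed fix; all function names (`column_index`, `column_label`, `cell_address`, `columns_between`) and the sample `workbook` are hypothetical.

```python
def column_index(label: str) -> int:
    """Convert a column label ('A', 'Z', 'AA', ...) to a 1-based index."""
    index = 0
    for ch in label.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def column_label(index: int) -> str:
    """Convert a 1-based column index back to its label."""
    label = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


def cell_address(row: int, col: int) -> str:
    """Build an A1-style address from 1-based row and column numbers."""
    return f"{column_label(col)}{row}"


def columns_between(a: str, b: str) -> int:
    """Count the columns strictly between two column labels."""
    return max(abs(column_index(a) - column_index(b)) - 1, 0)


# A workbook as a mapping of sheet names to two-dimensional arrays of cells.
workbook = {"Sheet1": [[100, 200, 300, 400], [200, 300, 400, 500]]}
sheet = workbook["Sheet1"]

print(columns_between("A", "C"))       # 1
print(cell_address(1, 3), sheet[0][2])  # C1 300
```

The last line shows the mapping the model has to get right: the value at zero-based position `[0][2]` of the array lives in cell C1.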

Does anyone know which techniques I could use to improve the model’s accuracy in this context?

Thank you