I have a tabular dataset [date, username, category, numericalValue], and I’m trying to perform accurate math on the numericalValue by date, username, category.
I can get the answer with the Assistants API, but I’d like to do this with just the API due to cost and flexibility.
I haven’t been able to get accurate enough results using embeddings, since the data is numeric, and the LangChain agents aren’t accurate enough either.
Any advice on performing RAG on my numerical data accurately enough to do math?
You could consider a few functions and use tool calling, or a text2sql approach.
For the former, since you already know the slices of data you want, you can define the functions up front. For example, you can write a function that groups by a variable x and then sums the numeric column. This function can be written with a dataframe manipulation library like Pandas.
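A minimal sketch of such a function, assuming the data is loaded into a Pandas DataFrame with the four columns from the question. The name sum_by and the sample rows are made up for illustration:

```python
import pandas as pd

def sum_by(df, group_cols):
    """Group by the given columns and sum the numericalValue column."""
    return df.groupby(group_cols, as_index=False)["numericalValue"].sum()

# Made-up sample rows with the columns described in the question.
df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "username": ["alice", "bob", "alice"],
    "category": ["a", "a", "b"],
    "numericalValue": [1.5, 2.0, 3.0],
})

# One row per username with the summed value.
totals = sum_by(df, ["username"])
print(totals)
```

The model only has to pick the grouping columns; the arithmetic itself is done by Pandas, so the result does not depend on the model's ability to add numbers.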
For the latter, you could describe the dataset and your query to the GPT model and ask it to write a SQL query. You can then execute this SQL query yourself.
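Here is a rough sketch of the text2sql flow using Python's built-in sqlite3 module. The table name records, the prompt wording, and the sample rows are assumptions, and generated_sql is a hard-coded stand-in for whatever the model actually returns:

```python
import sqlite3

# Made-up sample rows with the columns described in the question.
rows = [
    ("2024-01-01", "alice", "a", 1.5),
    ("2024-01-01", "bob", "a", 2.0),
    ("2024-01-02", "alice", "b", 3.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records "
    "(date TEXT, username TEXT, category TEXT, numericalValue REAL)"
)
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", rows)

# Hypothetical prompt: describe the schema and pass the user's question.
SCHEMA_PROMPT = (
    "Table records(date TEXT, username TEXT, category TEXT, "
    "numericalValue REAL). Write one SQLite SELECT statement that "
    "answers: {question}"
)

# Stand-in for the model's reply to the prompt above.
generated_sql = (
    "SELECT username, SUM(numericalValue) AS total "
    "FROM records GROUP BY username ORDER BY username"
)

def run_generated_sql(conn, sql):
    """Execute model-written SQL, refusing anything that is not a SELECT."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are executed")
    return conn.execute(sql).fetchall()

result = run_generated_sql(conn, generated_sql)
print(result)
```

The SELECT check is only a basic guard. For anything beyond a prototype you would probably also want to run the query on a read-only connection.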
Although the data and purpose are not made clear, if the assistant can do it successfully, likely with code interpreter enabled for Python computation, then an AI can also write code you can execute yourself.
Here’s a conversation I had with AI to make simulated data like you describe, do some math, and return results based on that, results which can be further employed or saved by code modification.
The alternative is providing a calculation function to the AI, or even your own Python execution environment, where such calculations can be made accurately.
The AI is a language predictor; it doesn’t do math well on its own.
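As a sketch of the calculation-function idea: you describe one function to the model as a tool, then run it yourself whenever the model asks for it. The tool definition below follows the JSON-schema layout used for function tools in chat-style APIs. The names sum_values and handle_tool_call and the sample rows are made up, and the actual API request is left out:

```python
import json
from collections import defaultdict

# Made-up sample rows with the columns described in the question.
ROWS = [
    {"date": "2024-01-01", "username": "alice", "category": "a", "numericalValue": 1.5},
    {"date": "2024-01-01", "username": "bob", "category": "a", "numericalValue": 2.0},
    {"date": "2024-01-02", "username": "alice", "category": "b", "numericalValue": 3.0},
]

# Tool description you would send along with the chat request.
SUM_TOOL = {
    "type": "function",
    "function": {
        "name": "sum_values",
        "description": "Sum numericalValue grouped by the given columns.",
        "parameters": {
            "type": "object",
            "properties": {
                "group_by": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "enum": ["date", "username", "category"],
                    },
                }
            },
            "required": ["group_by"],
        },
    },
}

def sum_values(group_by):
    """Do the arithmetic in plain Python rather than in the model."""
    totals = defaultdict(float)
    for row in ROWS:
        key = tuple(row[col] for col in group_by)
        totals[key] += row["numericalValue"]
    return {" | ".join(key): total for key, total in totals.items()}

def handle_tool_call(name, arguments_json):
    """Run the function the model asked for; return JSON to send back."""
    if name != "sum_values":
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments_json)
    return json.dumps(sum_values(args["group_by"]))

# Simulated tool call, as if the model had requested it.
reply = handle_tool_call("sum_values", '{"group_by": ["username"]}')
print(reply)
```

The model only chooses the function and its arguments. The numbers in the reply come from your own code, so the model can quote them rather than compute them.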