Generating summary insights over large quantitative data

I have been exploring how to use GPT-4 to summarize insights from large data sets. One area I am looking at is generating insights for salespeople from sales data. A typical data set would have thousands of rows, because it would span hundreds of customers and hundreds of product SKUs. It does not need to be real time, where a salesperson asks a question and the insight is generated on the spot. Rather, it would be about creating summary insights that humans can look over and refine further. Typical insights would cover which customers are growing, how customers are moving across products, and interesting changes in growth rates across different cuts of the data. It could also cover error detection and oddities in the data. I understand that one could use traditional machine learning techniques and analytical AI to generate such insights, but I’m looking at this as a stepping stone to providing some interactivity in the future, and to reducing the workload of writing reports that summarize key trends.

From what I have read so far, it seems that one could leverage embeddings to do this, but that is still quite restrictive on the size of the data. I could summarize the data into useful cuts and feed those in instead, but that would still be expensive to run.
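To make the "useful cuts" idea concrete, here is a minimal sketch in Python (standard library only) that aggregates raw rows into a per-customer growth table and renders it as compact text for a prompt. The row layout, the sample values, and the function names `revenue_by`, `growth_table`, and `to_prompt` are all hypothetical, chosen for illustration. Sending the resulting text to GPT-4 is left out on purpose.

```python
from collections import defaultdict

# Hypothetical rows: (customer, sku, period, revenue).
# Real data would come from a file or a database.
rows = [
    ("Acme", "SKU-1", "2023-Q1", 100.0),
    ("Acme", "SKU-1", "2023-Q2", 150.0),
    ("Acme", "SKU-2", "2023-Q1", 50.0),
    ("Acme", "SKU-2", "2023-Q2", 50.0),
    ("Bolt", "SKU-1", "2023-Q1", 200.0),
    ("Bolt", "SKU-1", "2023-Q2", 100.0),
]


def revenue_by(rows, key_index):
    """Sum revenue per (key, period); key_index 0 = customer, 1 = SKU."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row[key_index], row[2])] += row[3]
    return totals


def growth_table(rows, key_index, prev, curr):
    """Return (key, prev_total, curr_total, pct_change), fastest growers first."""
    totals = revenue_by(rows, key_index)
    keys = sorted({row[key_index] for row in rows})
    table = []
    for key in keys:
        before = totals.get((key, prev), 0.0)
        after = totals.get((key, curr), 0.0)
        # Growth is undefined when the base period has no revenue.
        pct = None if before == 0 else (after - before) / before * 100
        table.append((key, before, after, pct))
    # Sort by growth descending; rows with undefined growth go last.
    table.sort(key=lambda r: (r[3] is None, -(r[3] or 0.0)))
    return table


def to_prompt(table, label, max_rows=20):
    """Render the first max_rows rows as compact text to paste into a prompt."""
    lines = [f"Revenue by {label}, previous vs current period:"]
    for key, before, after, pct in table[:max_rows]:
        change = "n/a" if pct is None else f"{pct:+.1f}%"
        lines.append(f"{key}: {before:.0f} -> {after:.0f} ({change})")
    return "\n".join(lines)


customer_table = growth_table(rows, 0, "2023-Q1", "2023-Q2")
print(to_prompt(customer_table, "customer"))
```

The same functions with `key_index=1` give the per-SKU cut. Since each cut is only a few lines of text, the model sees aggregates rather than thousands of raw rows, which is what keeps the prompt size and cost down. Capping the table with `max_rows` keeps only the fastest growers, so the biggest decliners would need a separate cut.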

I am looking for ideas or threads to explore, not necessarily a fully baked solution that I can use directly. I appreciate any help!

What are the main issues you’re experiencing when trying to use GPT-4? I’m trying to better characterize your problem so I can think of the best solution.

Did you end up with a solution to your problem?