Analysing Big Data (CSV) via OpenAI API

So I know you can have ChatGPT (GPT-4) load CSV files and data files stored online. However, that is done via the ChatGPT interface and plugins.

Does anyone know of a way, inside a Google Sheet, to have a script send data from the sheet to OpenAI for analysis?

For example, say you're working in a Google Sheet with 1,000 rows of data. You'd have a script run within the sheet which sends the data to OpenAI along with specific questions.

Anyone?


Maybe something like this is what you’re looking for? https://workspace.google.com/marketplace/app/gpt_for_sheets_and_docs/677318054654

Thank you, yes. But rather than using a plug-in for Chrome, I'm wanting to script it in a Google Sheets script, so that it can be scheduled and integrated with other APIs.

There’s a ton of articles and YouTube videos available if you Google for “google sheets OpenAI api”

I have one on my GitHub that makes spreadsheets and flowcharts in the terminal… it might be possible to mod it for Google Sheets. I think this is it… (I'm at work, so if it isn't and I remember, I'll post a different one.) GitHub - 0m364/Flow: This is intended to retrieve values and variables and create a flowchart

That really doesn't make any sense as long as the CSV is well formatted.
Import it into a database and use SQL.

A CSV is a structured format.

Try this:
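Something along these lines; a minimal sketch in Python, assuming the openai package (v1+) with OPENAI_API_KEY set, pandas, and an in-memory SQLite import. The file name, column names, and the example question are placeholders, not from your sheet:

```python
# Minimal sketch: import a CSV into SQLite, let the model translate a
# question into SQL, then run it. File name and question are placeholders.
import sqlite3

import pandas as pd
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set

df = pd.read_csv("data.csv")          # hypothetical file
conn = sqlite3.connect(":memory:")
df.to_sql("data", conn, index=False)  # one table named "data"

schema = ", ".join(f"{c} ({t})" for c, t in zip(df.columns, df.dtypes.astype(str)))

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": f"Write a single SQLite SELECT statement for the table "
                    f"'data' with columns: {schema}. Reply with SQL only."},
        {"role": "user", "content": "What is the average price per region?"},
    ],
)
sql = resp.choices[0].message.content.strip()

# Executing model-generated SQL is exactly why this must not run
# unvalidated in production (see the disclaimer below).
print(sql)
print(conn.execute(sql).fetchall())
```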

I mean, just as a disclaimer: do not run that in production. The input data must be validated and you need to add some kind of authorisation. But that’s - on an abstract level - how you could utilise the model.

I may have missed the point here, but, turning your request on its head, have you thought of bringing GPT's analytical power into the sheet rather than sending large volumes of data to GPT? After all, wouldn't sending so much data quickly eat up the token limits?
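To put rough numbers on it, you can count the tokens your sheet would consume before sending anything. A quick sketch, assuming the tiktoken package; the file name is a placeholder:

```python
# Sketch: estimate how many tokens a CSV would consume in a prompt.
# "data.csv" is a placeholder; cl100k_base is the encoding used by
# gpt-3.5-turbo and gpt-4.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
with open("data.csv", encoding="utf-8") as f:
    n_tokens = len(enc.encode(f.read()))

print(f"{n_tokens} tokens")  # compare against the model's context window,
                             # e.g. 4,096 for the original gpt-3.5-turbo
```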

Has anyone cracked this? How can we use a large CSV dataset and then get an OpenAI model to refer to its data?

Hi and welcome to the developer forum!

I think there are two fundamental issues here. The first is the nature of the tool: a Large Language Model is a token-based processor, and number processing is several layers removed from the basic comprehension of the model. LLMs can do basic numeracy and even demonstrate impressive mathematical prowess, but they are still statistical in nature, and, as with humans, they can make mistakes that traditional deterministic software does not.

The second issue is that a spreadsheet, by its nature, is a large collection of often disparate values that interact with each other across large spaces. LLMs need to have points of attention, and any scheme short of full contextual awareness does not seem likely to work given the current technology. I'm willing to be proven wrong, but I don't see multi-megabyte spreadsheets being a useful application for LLMs in the next 6 months at least.

(additional)

Having thought about it, this is actually in the realm of possibility for Code Interpreter: asking the model to use tools to process a spreadsheet could work. Worth investigating; I'd use CSV-format files to give the model the best chance.

I have built a course creator based on GPT-3.5 over the last two days.

It can create courses where you explain the task and it will give you stuff like this (it actually starts from your evaluated skill matrix, so it could even begin by giving you a course in basic variable types, or on how to learn Python up front, until you reach the goal you want to achieve):

Here is the task: predict house prices from a CSV (but with a model of your own).

Gonna check whether I can get a good course for this CSV data over the next few days, and then put it online.

Hint: this course was created from this skill matrix (a rough sketch of the prompting follows below):

Python: 4
Flask: 4
NumPy: 2
Pandas: 2
SQL: 4
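The prompt side of it is roughly this shape; a heavily simplified sketch, where the wording, model choice, and task string are illustrative rather than the production version, and the openai package (v1+) is assumed:

```python
# Simplified sketch of a skill-matrix-conditioned course prompt.
# Prompt wording and model are illustrative, not the production app.
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set

skills = {"Python": 4, "Flask": 4, "NumPy": 2, "Pandas": 2, "SQL": 4}
task = "Predict house prices from a CSV, with a model you build yourself."

matrix = "\n".join(f"- {name}: {level}" for name, level in skills.items())

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "You are a course creator. Given a learner's skill "
                    "matrix, design a step-by-step course that starts at "
                    "their current level and ends with the task completed:\n"
                    + matrix},
        {"role": "user", "content": task},
    ],
)
print(resp.choices[0].message.content)
```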


Just wanted to bump this to say: I would think the best way to do this is through Code Interpreter. The sheet in question is well within the file upload limit. I've used it for exactly this sort of thing (e.g., "GPT-4, tell me x, y, z about this file"). If you've not tried it before, I would recommend it; honestly, it is way more powerful than whatever you are probably envisioning.

There is the limitation of data-privacy concerns, but I guess you aren't worried about that if you were intending to send the data directly to GPT-4 anyway.

I think for those concerned about data privacy, the new ChatGPT Enterprise offering will be a great addition to the lineup. I can see that being quite a revolution in terms of the basic-to-mid-level data visualisation and processing done in lots of companies.


Absolutely. I think one barrier, though, is that rudimentary Python skills have become kind of the equivalent of Excel chops ten years ago. Code Interpreter is really incredible, but I don't see it as truly a no-code tool; or at least, my co-workers who don't know how to code seem to really struggle to use it well.

I’m not certain about the specific type of data in your CSV file. However, for those considering using ChatGPT for analyzing and categorizing extensive CSVs filled with unstructured text, such as comments and feedback, you should reconsider relying on the Code Interpreter (Advanced Analysis). Despite its excellence in data summarization and general analytics, it falls short in accurately discerning sentiment and the more nuanced facets of language.

For those facing the need to dependably categorize and scrutinize individual pieces of text, I recommend employing a Python script. In this script, you would tokenize your data (using tiktoken, for example) and execute the appropriate API requests via chat completions.
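To give an idea of the core loop, here is a stripped-down sketch (nothing like the full script): tiktoken to batch rows under a token budget, then one chat-completions call per batch. The model, prompt wording, labels, and example rows are illustrative, and the openai package (v1+) is assumed:

```python
# Stripped-down core loop: batch rows of free text under a token budget,
# then classify each batch with a chat-completions call. Labels, model,
# and budget are illustrative.
import tiktoken
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set

enc = tiktoken.get_encoding("cl100k_base")
client = OpenAI()

def batches(rows, budget=2000):
    """Yield lists of rows whose combined token count stays under budget."""
    batch, used = [], 0
    for row in rows:
        n = len(enc.encode(row))
        if batch and used + n > budget:
            yield batch
            batch, used = [], 0
        batch.append(row)
        used += n
    if batch:
        yield batch

comments = ["Love the new UI!", "Checkout keeps crashing.", "Meh."]  # placeholder rows

for batch in batches(comments):
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(batch))
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Label each numbered comment as positive, negative, "
                        "or neutral. Reply as 'number: label', one per line."},
            {"role": "user", "content": numbered},
        ],
    )
    print(resp.choices[0].message.content)
```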

Let’s be clear: this is far from a simple task. My own script has expanded to a hefty 2000 lines, yet the outcomes justify the effort. There are no quick-fix solutions here. Additionally, ChatGPT can be a valuable resource in devising the structure of your application.