Extracting key insights from a dataset

Hey everyone! I was curious if anyone has tried to use GPT-3 to extract key insights from a dataset?

For instance, if I wanted to extract the high-level feedback from a CSV file of customer reviews, how would I use the fine-tuning feature to extract insights from this file? How do I set up the training file?


You wouldn’t necessarily want to use finetuning. If you post a few examples I’ll show you how I would proceed.

Here is a screenshot of a sample of the data. Suppose that, for this specific dress, I wanted to extract what customers think of it: overall sentiment, the pros and cons of the dress, etc.

Hope that helps!

model = text-davinci-002, temperature = 0.7, maximum length = 100

Example Data set:
{"prompt": "<user review of chilis>", "completion": "<average>"}
{"prompt": "<user review of jacks>", "completion": "<sucks>"}
{"prompt": "<user review of arbys>", "completion": "<average>"}
{"prompt": "<user review of jacks>", "completion": "<average>"}
{"prompt": "<user review of jacks>", "completion": "<excellent>"}

from the Example Data set above, write how many reviews there are of jacks that are average or below:

example output

There are 2 reviews of jacks that are average or below.
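
For comparison, a count like this can also be computed directly in code, which gives a way to check the model's answer. This is only a minimal sketch: the ordering of the rating labels (sucks, then average, then excellent) and the idea of matching the restaurant name as a substring of the prompt are assumptions, and count_at_or_below is a hypothetical helper name.

```python
import json

# The five example records from the post, one JSON object per line (JSONL).
EXAMPLE_JSONL = """\
{"prompt": "<user review of chilis>", "completion": "<average>"}
{"prompt": "<user review of jacks>", "completion": "<sucks>"}
{"prompt": "<user review of arbys>", "completion": "<average>"}
{"prompt": "<user review of jacks>", "completion": "<average>"}
{"prompt": "<user review of jacks>", "completion": "<excellent>"}
"""

# Assumption: rating labels ordered from worst to best.
RATING_ORDER = ["sucks", "average", "excellent"]


def count_at_or_below(jsonl_text, restaurant, max_rating):
    """Count reviews of `restaurant` rated `max_rating` or worse."""
    limit = RATING_ORDER.index(max_rating)
    count = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        rating = record["completion"].strip("<>")
        # Assumption: the restaurant name appears verbatim in the prompt text.
        if restaurant in record["prompt"] and RATING_ORDER.index(rating) <= limit:
            count += 1
    return count


print(count_at_or_below(EXAMPLE_JSONL, "jacks", "average"))  # prints 2
```

The result agrees with the example output above: of the three jacks reviews, one is "sucks" and one is "average".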

Okay, these are relatively short so what you can do is aggregate them in chunks. Since I can’t copy/paste your data I had GPT-3 synthesize some for me:

You can see that this chunk of 20 reviews is only 176 tokens, so we have a lot of room to work with. Next I will ask GPT-3 to summarize these reviews into key points. I clipped the chunk of reviews after generating the output just so it would all fit in a screenshot.

You could then recursively summarize multiple chunks. Say you have 400 reviews. What I would do is split them into 20 chunks of 20 reviews each, summarize each chunk, and then do the same thing as here: merge the 20 summaries together and summarize that. Make sure the prompt asks for something “very detailed”, for instance, and says something like “don’t sacrifice data”. You might even switch from “summarize” to “combine and rewrite these reviews” or “consolidate”, although GPT-3 often doesn’t understand the imperative “consolidate” that well.

Maybe I’ll make a video about this because it’s such a cool use case…
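
The chunk-and-merge idea can be sketched roughly as below. This is not a definitive implementation: complete is a hypothetical callable that takes a prompt string and returns the model's text (you would wrap whatever completion API you use), and build_summary_prompt and its exact wording are assumptions based on the phrasing suggested above.

```python
from typing import Callable, List, Sequence


def chunked(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def build_summary_prompt(texts: Sequence[str]) -> str:
    """Hypothetical prompt wording; adjust to taste."""
    joined = "\n".join(f"- {t}" for t in texts)
    return (
        "Write a very detailed summary of the key points in the customer "
        "reviews below. Do not sacrifice data.\n\n"
        f"{joined}\n\nKey points:"
    )


def recursive_summarize(
    reviews: Sequence[str],
    complete: Callable[[str], str],  # assumption: prompt in, completion text out
    chunk_size: int = 20,
) -> str:
    """Summarize chunks, then summarize the summaries, until one text remains."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    texts = list(reviews)
    if not texts:
        return ""
    # Each pass replaces every chunk with a single summary.
    while len(texts) > chunk_size:
        texts = [complete(build_summary_prompt(c)) for c in chunked(texts, chunk_size)]
    # Final pass: merge whatever is left into one summary.
    return complete(build_summary_prompt(texts))
```

With 400 reviews and chunk_size=20 this makes 20 calls for the chunks plus 1 call to merge the summaries, so 21 calls in total.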

Hmm, so I was thinking about splitting it into chunks, but that would not be very cost-effective. The number of customer reviews for a product could go up to 10,000. I don’t know how well GPT-3 will synthesize summaries of summaries of summaries, etc.

Is this the right way to set it up? What if I asked it something like “what are people saying about jacks restaurant?” Will it know to filter for only jacks restaurant?

Cost optimization comes after you get the desired result. You can refactor any number of ways. For instance, you could use clustering to find general types of reviews and then perform inference on just a few from each cluster.
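
As a rough illustration of the clustering idea, here is a minimal sketch that groups similar reviews and keeps only a few from each group. To stay dependency-free it swaps in a simple greedy grouping by word overlap (Jaccard similarity) rather than a proper clustering of embeddings; the threshold of 0.3 and the helper names are assumptions, not anything prescribed in this thread.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased set of words in a review."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of words two reviews have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_clusters(reviews: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Put each review in the first cluster whose first member is similar enough."""
    clusters = []  # each entry: {"rep": token set of first member, "members": [...]}
    for review in reviews:
        toks = tokens(review)
        for cluster in clusters:
            if jaccard(toks, cluster["rep"]) >= threshold:
                cluster["members"].append(review)
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append({"rep": toks, "members": [review]})
    return [c["members"] for c in clusters]


def sample_per_cluster(clusters: List[List[str]], per_cluster: int = 2) -> List[str]:
    """Keep only the first few reviews of each cluster for inference."""
    return [review for members in clusters for review in members[:per_cluster]]
```

Only the sampled reviews would then be sent to the model, which cuts the number of tokens you pay for at the price of possibly missing rare complaints.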

This thread inspired me to make a video. I could follow up on this automatic extraction of insights to generate RFEs, feature requests, or user stories.