Extracting key insights from a dataset

alexrazen · July 11, 2022, 5:14pm

Hey everyone! Has anyone tried using GPT-3 to extract key insights from a dataset?

For instance, if I wanted to extract the high-level feedback from a CSV file of customer reviews, how would I use the fine-tuning feature to extract insights from this file? How do I set up the training file?

Thanks!

daveshapautomator · July 11, 2022, 6:15pm

You wouldn’t necessarily want to use fine-tuning. If you post a few examples, I’ll show you how I would proceed.

alexrazen · July 11, 2022, 6:43pm

Here is a screenshot of a sample of the data. Suppose that, for this specific dress, I wanted to extract customers’ thoughts on it: overall sentiment, the pros and cons of the dress, etc.

Hope that helps!

SaturnProductions · July 11, 2022, 6:56pm

model = text-davinci-002, temperature = 0.7, maximum length = 100
input

Example Data set:
{"prompt": "<user review of chilis>", "completion": "<average>"}
{"prompt": "<user review of jacks>", "completion": "<sucks>"}
{"prompt": "<user review of arbys>", "completion": "<average>"}
{"prompt": "<user review of jacks>", "completion": "<average>"}
{"prompt": "<user review of jacks>", "completion": "<excellent>"}

from the Example Data set above, write how many reviews there are of jacks that are average or below:

example output

There are 2 reviews of jacks that are average or below.
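As a minimal Python sketch, the prompt above can be assembled from the JSONL records, and the same count can be computed directly in code so there is something to check the model’s answer against (counting is a task GPT-3 can get wrong). The helper names and the worst-to-best rating order are illustrative assumptions, not something from the post:

```python
import json

# The five example records from the post, one JSON object per line (JSONL).
EXAMPLE_JSONL = """\
{"prompt": "<user review of chilis>", "completion": "<average>"}
{"prompt": "<user review of jacks>", "completion": "<sucks>"}
{"prompt": "<user review of arbys>", "completion": "<average>"}
{"prompt": "<user review of jacks>", "completion": "<average>"}
{"prompt": "<user review of jacks>", "completion": "<excellent>"}
"""

# Assumed rating scale, ordered from worst to best (hypothetical).
RATING_ORDER = ["<sucks>", "<average>", "<excellent>"]


def load_records(jsonl_text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def build_prompt(jsonl_text, question):
    """Assemble the prompt as in the post: the data set, then the question."""
    return f"Example Data set:\n{jsonl_text.strip()}\n\n{question}\n"


def count_at_or_below(records, place, threshold):
    """Deterministically count reviews of `place` rated at or below `threshold`."""
    limit = RATING_ORDER.index(threshold)
    return sum(
        1
        for r in records
        if place in r["prompt"] and RATING_ORDER.index(r["completion"]) <= limit
    )


records = load_records(EXAMPLE_JSONL)
prompt = build_prompt(
    EXAMPLE_JSONL,
    "from the Example Data set above, write how many reviews there are of jacks "
    "that are average or below:",
)
print(count_at_or_below(records, "jacks", "<average>"))  # 2
```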

daveshapautomator · July 11, 2022, 7:07pm

Okay, these are relatively short, so you can aggregate them into chunks. Since I can’t copy/paste your data, I had GPT-3 synthesize some for me:

You can see that this chunk of 20 reviews is only 176 tokens, so we have a lot of room to work with. Next, I’ll ask GPT-3 to summarize these reviews into key points. I clipped the chunk of reviews after generating the output so it would all fit in one screenshot.
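A minimal Python sketch of the chunking step, assuming the reviews are already loaded as a list of strings. The chunk size of 20 comes from the post; the function names, the placeholder reviews, and the exact prompt wording are hypothetical. The token estimate uses only the rough rule of thumb of about 4 characters per English token, so a real tokenizer is needed for exact counts:

```python
def chunk(items, size=20):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_summary_prompt(reviews):
    """Hypothetical prompt: one review per line, followed by the instruction."""
    review_lines = "\n".join(f"- {review}" for review in reviews)
    return (
        "Customer reviews:\n"
        f"{review_lines}\n\n"
        "Summarize these reviews into a very detailed list of key points. "
        "Don't sacrifice data.\n\n"
        "Key points:\n"
    )


def rough_token_estimate(text):
    """Very rough estimate (~4 characters per token); not an exact tokenizer count."""
    return len(text) // 4


# Placeholder reviews standing in for real data.
reviews = [f"Review {i}: the fabric is nice but it runs small." for i in range(45)]
chunks = chunk(reviews)
print([len(c) for c in chunks])  # [20, 20, 5]
prompt = build_summary_prompt(chunks[0])
print(rough_token_estimate(prompt))
```

Each prompt would then be sent to GPT-3 as a separate completion request; that call is left out here.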

You could then recursively summarize multiple chunks. Say you have 400 reviews: I would split them into 20 chunks of 20 reviews each, summarize each chunk, and then merge the summaries together the same way as here. Make sure the prompt says “very detailed”, for instance, and something like “don’t sacrifice data”. You might even switch from “summarize” to “combine and rewrite these reviews” or “consolidate”, although GPT-3 often doesn’t understand the imperative “consolidate” that well.
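The recursive scheme described above can be sketched in Python as follows. This is a sketch under stated assumptions: the `summarize` function is supplied by the caller (for example, a hypothetical wrapper around a GPT-3 completion request that uses a “very detailed” prompt), and the usage below substitutes a stand-in that only records chunk sizes, so no API is called:

```python
from typing import Callable, List


def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def recursive_summarize(
    texts: List[str],
    summarize: Callable[[List[str]], str],
    chunk_size: int = 20,
) -> str:
    """Summarize chunks, then summaries of summaries, until one summary remains."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2 so each level shrinks")
    if not texts:
        return ""
    level = list(texts)
    while True:
        # One summarize call per chunk at this level.
        level = [summarize(group) for group in chunk(level, chunk_size)]
        if len(level) == 1:
            return level[0]


# Usage with a stand-in summarizer that just records chunk sizes.
calls = []


def fake_summarize(group):
    calls.append(len(group))
    return f"summary of {len(group)} texts"


result = recursive_summarize([f"review {i}" for i in range(400)], fake_summarize)
print(len(calls))  # 21: 20 first-level chunks, then 1 merge of the 20 summaries
print(result)      # summary of 20 texts
```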

Maybe I’ll make a video about this because it’s such a cool use case…

alexrazen · July 11, 2022, 7:41pm

Hmm, I was thinking about splitting it into chunks, but that wouldn’t be very cost-effective. The number of customer reviews for a product could go up to 10,000. I don’t know how well GPT-3 will synthesize summaries of summaries of summaries, etc.
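To put the cost concern in numbers, here is a back-of-the-envelope count of completion calls, assuming the same 20-per-chunk scheme at every level. The function is hypothetical and ignores token counts, which also drive price. For 10,000 reviews the first pass needs 500 calls and all higher levels together need only 28 more, so most of the cost lies in the first pass over the raw reviews rather than in the summaries of summaries:

```python
import math


def calls_per_level(n_reviews, chunk_size=20):
    """Summarize calls needed at each level when every level groups `chunk_size` texts."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2 so each level shrinks")
    if n_reviews < 1:
        return []
    counts = []
    n = n_reviews
    while True:
        n = math.ceil(n / chunk_size)  # one call per chunk at this level
        counts.append(n)
        if n == 1:
            return counts


for n in (400, 10_000):
    levels = calls_per_level(n)
    print(n, levels, sum(levels))
# 400 [20, 1] 21
# 10000 [500, 25, 2, 1] 528
```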

alexrazen · July 11, 2022, 7:42pm

Is this the right way to set it up? What if I asked it something like “What are people saying about jacks restaurant?” Will it know to filter for only jacks restaurant?
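One way to avoid depending on the model to do the filtering (a suggestion not made in the thread) is to filter the records in code first and send only the matching reviews along with the question. A minimal Python sketch reusing the example data set; the function names and the case-insensitive substring match are assumptions:

```python
import json

EXAMPLE_JSONL = """\
{"prompt": "<user review of chilis>", "completion": "<average>"}
{"prompt": "<user review of jacks>", "completion": "<sucks>"}
{"prompt": "<user review of arbys>", "completion": "<average>"}
{"prompt": "<user review of jacks>", "completion": "<average>"}
{"prompt": "<user review of jacks>", "completion": "<excellent>"}
"""


def reviews_for(jsonl_text, name):
    """Keep records whose prompt mentions `name` (case-insensitive substring match)."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return [r for r in records if name.lower() in r["prompt"].lower()]


def build_question_prompt(records, question):
    """Put only the pre-filtered records in front of the question."""
    data = "\n".join(json.dumps(r) for r in records)
    return f"Reviews:\n{data}\n\n{question}\n"


jacks = reviews_for(EXAMPLE_JSONL, "jacks")
print(len(jacks))  # 3
print(build_question_prompt(jacks, "What are people saying about jacks restaurant?"))
```

Substring matching is crude (“jacks” would also match a name like “jacksons”); if the real CSV has a separate product or restaurant column, filtering on that column is more reliable.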

daveshapautomator · July 11, 2022, 9:39pm

Cost optimization comes after you get the desired result. You can refactor in any number of ways. For instance, you could use clustering to find general types of reviews and then perform inference on just a few from each cluster.
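A minimal sketch of the clustering idea in plain Python, assuming each review has already been turned into an embedding vector (for example with an embeddings API; that step is not shown). The tiny 2-D vectors below are placeholders, and this k-means uses naive first-k initialization for brevity; in practice a library implementation such as scikit-learn’s KMeans (with k-means++ initialization) would be the more robust choice:

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means; initial centroids are the first k points (simple, deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


def representatives(points, labels, centroids, per_cluster=2):
    """For each cluster, return indices of the points closest to its centroid."""
    picks = {}
    for c, centroid in enumerate(centroids):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        idx.sort(key=lambda i: math.dist(points[i], centroid))
        picks[c] = idx[:per_cluster]
    return picks


# Placeholder "embeddings" for six reviews forming two obvious groups.
points = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.2]]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(representatives(points, labels, centroids, per_cluster=1))  # {0: [0], 1: [3]}
```

Only the reviews at the returned indices would then go into the summarization prompt, which cuts the number of completion calls roughly in proportion to how many reviews each cluster holds.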

daveshapautomator · July 13, 2022, 12:57pm

This thread inspired me to make a video. As a follow-up, I could use this automatic insight extraction to generate RFEs, feature requests, or user stories.
