Can GPT3 do some classification based on the content of the sentences?

penggong6688 · January 22, 2022, 6:41am

Hi, I am pretty new to GPT-3. I am working on a project to classify thousands of scraped reviews into about 150 categories based on their meaning. Can GPT-3 help me achieve that?

sps · January 22, 2022, 6:54am

Hi @penggong6688,

In short, yes. GPT-3 can do that. Here is the classification guide: https://beta.openai.com/docs/guides/classifications

The Classifications endpoint (/classifications) provides the ability to leverage a labeled set of examples without fine-tuning and can be used for any text-to-label task. By avoiding fine-tuning, it eliminates the need for hyper-parameter tuning. The endpoint serves as an “autoML” solution that is easy to configure, and adapt to changing label schema. Up to 200 labeled examples or a pre-uploaded file can be provided at query time.

sps · January 22, 2022, 7:40am

Agreed. I don’t know much about embeddings, since I haven’t used them. Going to try them myself.

penggong6688 · January 23, 2022, 5:56am

Thank you so much for your reply, sps!!
So with the endpoint's AutoML solution, I can upload 200 labeled examples to train the model, and then use the model on my own dataset? I have several million records that I would like to classify into roughly 150 classes. Another question: those 150 classes are the ones we think the records belong to. Can OpenAI GPT-3 do some unsupervised machine learning to tell us whether, besides those 150 classes, there are other topics we can cluster? I tried using embeddings and K-means myself, but the results were not that meaningful.

penggong6688 · January 23, 2022, 6:15am

Thank you so much, m-a.schenk!

What I am trying to do is classify my multi-million records and find new classes for the records that do not fit into those 150 classes. Do you mean the new embeddings feature in OpenAI will be available in a few days?

sps · January 23, 2022, 4:58pm

That’s an interesting question: whether we can do some unsupervised learning on our data using the classification endpoint. If my understanding is correct, it’s not possible at the moment because the endpoint needs labeled examples. Maybe a label ‘other’ could be created to group all the data that doesn’t belong to any predefined label, but how that would be achieved, and with what examples, is another interesting question.

lmccallum · January 23, 2022, 7:24pm

I would use embeddings. Get the embedding for each of your n million records, and do the same for each of your 150 classes, and then get the similarity scores to measure how close each record is to each class, semantically speaking. You’ll have n million times 150 scores. (That’s a lot - I am not sure of the cost or speed implications.) Then you can assign each record to a class based on the highest similarity score. If the highest and second highest similarity scores are quite close, manually review those ones or assign them to more than one class, if that works for your use case. If the highest similarity score is notably lower than the average highest similarity score, that’s a red flag that a new class may need to be created. If there are lots of those, you can probably cluster them into groups based on semantic similarity, then extract some keywords for each cluster to help develop the new classes.
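A minimal sketch of the steps above, in case it helps. It assumes the embeddings have already been computed and are held as NumPy arrays; the toy 3-dimensional vectors, the function name assign_classes, and the margin and low_gap thresholds are all hypothetical placeholders rather than recommended values. Cosine similarity is used as the similarity score, which is an assumption since the post does not name a specific measure.

```python
import numpy as np


def assign_classes(record_emb, class_emb, margin=0.05, low_gap=0.2):
    """Assign each record to its most similar class and flag doubtful cases.

    margin and low_gap are illustrative thresholds (assumptions), not tuned values.
    """
    # Normalise rows so that a dot product equals cosine similarity.
    records = record_emb / np.linalg.norm(record_emb, axis=1, keepdims=True)
    classes = class_emb / np.linalg.norm(class_emb, axis=1, keepdims=True)

    # One score per (record, class) pair: shape (n_records, n_classes).
    sims = records @ classes.T

    labels = sims.argmax(axis=1)
    sorted_sims = np.sort(sims, axis=1)
    best = sorted_sims[:, -1]
    second = sorted_sims[:, -2]

    # Highest and second-highest scores are close: review manually
    # or allow more than one class.
    ambiguous = (best - second) < margin

    # Best score is well below the average best score: the record may
    # belong to a class that does not exist yet.
    possible_new_class = best < (best.mean() - low_gap)

    return labels, ambiguous, possible_new_class


# Toy stand-ins for real embeddings (2 classes, 4 records).
class_emb = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
record_emb = np.array([[1.0, 0.1, 0.0],   # clearly class 0
                       [0.0, 1.0, 0.1],   # clearly class 1
                       [1.0, 1.0, 0.0],   # between the two classes
                       [0.1, 0.1, 1.0]])  # far from both classes

labels, ambiguous, possible_new_class = assign_classes(record_emb, class_emb)
```

With millions of records you would probably run this in batches rather than building the full score matrix at once. The records flagged in possible_new_class are the ones you could then cluster separately to look for new classes.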

sps · January 24, 2022, 5:14am

Thanks @lmccallum,

Agreed. This is a great explanation of how best to use embeddings in this scenario, both cost-wise and given the amount of data that needs to be processed.

penggong6688 · January 24, 2022, 11:00pm

@lmccallum Thank you so much! I tried k-means on embeddings before, but that was not that helpful. Your solution sounds great. Do you think it is possible to use GPT-3 or OpenAI to solve the problem? Again, thank you so much!

ted-at-openai · January 25, 2022, 11:55pm

If you know Python:

Here’s an example of classification using embeddings: openai-python/Classification.ipynb at main · openai/openai-python · GitHub

And here’s an example of classification using fine-tuned completions: openai-python/finetuning-classification.ipynb at main · openai/openai-python · GitHub
