Classification question


Totally new here. I am working on a project to classify text statements as a question or not a question.
Here is what I am doing now.

  # Each entry is [text, label]; the name matches the one passed to the API call.
  examples_question = [
                            ['How are you?','question'],
                            ['What is your name?','question'],
                            [' My name is Bob', 'not a question'],
                            [' My cat is running.','not a question'],
                            ['What time is it','question'],
                            ['Can you fix this sentence for me','question'],
                            [' This is my house', 'not a question'],
                            ['Is this your house', 'question']
                      ]

Then I am using something like this to call the OpenAI API.

 import openai

 # `sentence` holds the text of the tweet being classified.
 response = openai.Classification.create(
                examples = examples_question,
                labels = ["question", "not a question"],
                query = sentence,
                search_model = "davinci",
                model = "davinci"
 )

When I run this on hundreds of tweets, the classification does not perform well.
A statement like this, “Too bad that he was injured, they would have won if he was able to play! Glad the surgery went well!”, is getting classified as a question.

I am pretty sure I am making a mistake here. Should I be giving more examples to train on?
I am pretty new to ML in general, so if someone could help me out, that would be great.


Are you using only the provided questions for training, or do you have a larger training set? As @m-a.schenk says, you would have to look at the overall score, not at specific classification errors, even though they might seem trivial. What accuracy do you obtain?
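
In case it helps, here is a minimal sketch of how the overall score could be measured: label a sample of tweets by hand, run them through the classifier, and count the matches. The `accuracy` helper and the two label lists below are made up for illustration; they are not results from the actual run.

```python
def accuracy(predicted, expected):
    """Return the fraction of predictions that match the hand-assigned labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Hypothetical hand-assigned labels and classifier outputs, for illustration only.
expected = ["question", "not a question", "not a question", "question"]
predicted = ["question", "question", "not a question", "question"]

score = accuracy(predicted, expected)
print(f"accuracy: {score:.2f}")
```

A single misclassified tweet matters much less than how this number moves when the examples are changed.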

You’ve run into the same problem that countless researchers have while studying dialog acts! The problem with identifying interrogatives is that tone of voice matters.

This is a house.
This is a house?
This is a house!

All three are correct and common. The short answer is that you need either punctuation or context to determine the dialog act with absolute certainty. Accordingly, dialog act classification works better if you include multiple sentences (preferably before and after).


Add punctuation to the following chat logs:

Sally: This is my house
Johnny: Are you sure this is your house
Sally: Yep, sure is

Chat logs with punctuation added:

Sally: This is my house.
Johnny: Are you sure this is your house?
Sally: Yep, sure is.
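
A rough sketch of how a prompt like the one above could be assembled from raw chat turns, assuming the punctuated output is then checked for a trailing question mark. Both `build_punctuation_prompt` and `label_from_punctuation` are hypothetical helper names, and the call that sends the prompt to the model is left out on purpose.

```python
def build_punctuation_prompt(turns):
    """Build the punctuation prompt from (speaker, text) pairs."""
    lines = ["Add punctuation to the following chat logs:", ""]
    lines += [f"{speaker}: {text}" for speaker, text in turns]
    lines += ["", "Chat logs with punctuation added:", ""]
    return "\n".join(lines)


def label_from_punctuation(sentence):
    """Label a punctuated sentence by checking whether it ends with '?'."""
    if sentence.rstrip().endswith("?"):
        return "question"
    return "not a question"


turns = [
    ("Sally", "This is my house"),
    ("Johnny", "Are you sure this is your house"),
    ("Sally", "Yep, sure is"),
]
prompt = build_punctuation_prompt(turns)
print(prompt)
```

Keeping the neighboring turns in the prompt gives the model the context it needs to decide between a period and a question mark.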

Also, in your examples, every “not a question” entry starts with a space, so as far as the classifier can tell, something can only be “not a question” if it starts with a space.
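
One possible fix, sketched on a shortened copy of the examples: strip the surrounding whitespace from every text before passing the list to the API. The `examples` and `cleaned` names are only for illustration.

```python
# Shortened copy of the example list, keeping the stray leading spaces.
examples = [
    ['How are you?', 'question'],
    [' My name is Bob', 'not a question'],
    [' My cat is running.', 'not a question'],
    ['What time is it', 'question'],
]

# Remove leading and trailing whitespace so it cannot act as a hidden signal.
cleaned = [[text.strip(), label] for text, label in examples]

for text, label in cleaned:
    print(repr(text), label)
```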

Thanks a lot. I didn't know about the overall score. I will check that.

@molleresa I am only using the provided questions for training. I will get back to you on the accuracy.

@daveshapautomator This is really good to know for a newbie like me. Thanks a lot for your thoughtful reply.

@NeoGenAI Ah, I'll be damned if the accuracy increases because of that. Let me fix that and run it again. Thanks. Good eye.

