Text classification with OpenAI

We’ve tried classifying short text inputs, usually 1 to 3 words such as “Email and Chat Support”, into one of the Google Material Icons (such as “trending_up”, “thumb_up”, “email”, etc.).

We tried two approaches:

  • The classification API with 200 labels and around 20 examples. About 40% of the cases come back as “Unknown”. We’ve tried all engines and search models.

  • A prompt that instructs the model to use one of the 200 labels:
    “Read these icons then classify the Text into one of the icons: search, home, pets, settings,…”
    About 30% of the time, the AI invents an icon that doesn’t exist. We tried curie and davinci, including the beta versions.

Has anyone found a way to do this reliably?

I was thinking about using broader classes for labels, such as “House”, “Persons”, “Like”, “Office”, etc.

Can you provide some more detailed examples? It’s not clear what your data or labels are.

  • As for the classification, we used the following parameters:

Temperature: from 0 to 1
Examples:

"examples": [
        ["Premium Support", "support_agent"],
        ["Powerful API", "computer"],
        ["Easy to use", "emoji_emotions"],
        ["More Affordable", "attach_money"],
        ["Modern Kitchen", "kitchen"],
        ["Business", "storefront"],
        ["Fast", "bolt"],
        ["Passionate team", "person_outline"],
        ["Free Parking", "local_parking"],
        ["Trial", "verified"],
        ["Get Paid", "paid"],
        ["Increase", "trending_up"],
        ["Get more", "trending_up"],
        ["Shop", "shopping_cart"],
        ["Money back Guarantee", "verified"],
        ["Satisfied or refunded", "verified"],
        ["Beach and Sun", "beach_access"],
        ["Settings and parameters", "settings"],
        ["Profile", "account_circle"],
        ["Help", "help_outline"],
        ["Fingerprint", "fingerprint"],
        ["Description", "description"],
        ["School", "school"],
        ["Social networks", "facebook"],
        ["Phone call", "phone"],
        ["Email", "email"],
        ["Swimming pool", "pool"],
        ["Best Hotels", "pool"],
        ["Premium Italian Food", "restaurant"],
        ["Best Restaurant", "restaurant"],
        ["We love dogs", "pets"],
        ["Roadside Assistance", "help"]
]

For the labels:

"labels": ["pets", "search", "home", "settings", "done", "account_circle", "info", "check_circle", "delete", "shopping_cart", "favorite", "face", "visibility", "logout", "fingerprint", "description", "favorite_border", "lock", "help_outline", "language", "schedule", "manage_accounts", "thumb_up", "close", "menu", "expand_more", "arrow_back", "chevron_right", "arrow_drop_down", "cancel", "more_vert", "arrow_forward", "add", "add_circle_outline", "add_circle", "send", "content_copy", "clear", "save", "mail", "link", "filter_list", "remove", "insights", "create", "remove_circle_outline", "bolt", "sort", "inventory", "flag", "remove_circle", "reply", "add_box", "person", "facebook", "notifications", "groups", "people", "share", "person_outline", "school", "person_add", "public", "emoji_events", "group", "notifications_active", "engineering", "construction", "people", "psychology", "health_and_safety", "travel_explore", "emoji_emotions", "group_add", "edit", "navigate_next", "photo_camera", "image", "picture_as_pdf", "tune", "circle", "receipt_long", "timer", "auto_stories", "navigate_before", "add_a_photo", "auto_awesome", "collections", "remove_red_eye", "palette", "music_note", "wb_sunny", "brush", "flash_on", "euro", "email", "location_on", "call", "phone", "chat", "business", "mail_outline", "vpn_key", "list", "qr_code_scanner", "chat_bubble_outline", "alternate_email", "chat_bubble", "forum", "textsms", "contact_mail", "sentiment_satisfied", "person_search", "message", "qr_code", "comment", "file_download", "file_upload", "download", "folder", "local_shipping", "place", "menu_book", "map", "local_offer", "category", "badge", "restaurant", "directions_car", "volunteer_activism", "local_fire_department", "storefront", "apartment", "fitness_center", "spa", "business_center", "house", "meeting_room", "cottage", "corporate_fare", "ac_unit", "family_restroom", "checkroom", "other_houses", "grass", "all_inclusive", "airport_shuttle", "child_care", "pool", "beach_access", "kitchen", "roofing", "holiday_village", "computer", "payment", "attach_money", "local_parking", "support_agent", "thumb_up", "event", "filter_alt", "verified", "dashboard", "list", "calendar_today", "login", "lightbulb", "question_answer", "date_range", "help", "paid", "trending_up", "article", "account_balance"]
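
With a label set this large, it may be worth checking it for repeats and for example labels that are missing from the set. As posted, the list above contains "people", "thumb_up" and "list" twice each. The sketch below runs that check on a short excerpt of the data; `check_label_set` is a made-up helper name, and "bolt" is left out of the excerpt on purpose to show the second check (it is present in the full list).

```python
from collections import Counter


def check_label_set(labels, examples):
    """Return (duplicated labels, example labels missing from the label set)."""
    counts = Counter(labels)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    known = set(labels)
    missing = sorted({label for _, label in examples if label not in known})
    return duplicates, missing


# Short excerpt of the labels and examples from this thread.
# "bolt" is deliberately omitted here to demonstrate the missing-label check.
labels = ["pets", "people", "thumb_up", "list", "people", "thumb_up", "list", "email"]
examples = [["We love dogs", "pets"], ["Email", "email"], ["Fast", "bolt"]]

duplicates, missing = check_label_set(labels, examples)
print("duplicated labels:", duplicates)
print("example labels not in the label set:", missing)
```

Whether repeated labels affect the endpoint's results is not established here; the check only makes the label set easier to reason about.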

We tried every model and every search_model value, but too many results still come back as “Unknown”.

We also tried with a prompt:

data = {
    "prompt": "Read these icons then classify the Text into one of the icons: \n##\nsearch, home, pets, settings, done, account_circle, info, check_circle, delete, shopping_cart, favorite, face, visibility, logout, fingerprint, description, favorite_border, lock, help_outline, language, schedule, manage_accounts, thumb_up, close, menu, expand_more, arrow_back, chevron_right, arrow_forward_ios, arrow_back_ios, arrow_drop_down, cancel, more_vert, arrow_forwar, add, add_circle_outline, add_circle, send, content_copy, clear, save, mail, link, filter_list, remove, inventory_2, insights, create, remove_circle_outline, bolt, sort, inventory, flag, remove_circle, reply, add_box, person, facebook, notifications, groups, people, share, person_outline, school, person_add, public, emoji_events, group, notifications_active, engineering, construction, people, psychology, health_and_safety, travel_explore, emoji_emotions, group_add, notifications_none, edit, navigate_next, photo_camera, image, picture_as_pdf, tune, circle, receipt_long, timer, auto_stories, navigate_before, add_a_photo, auto_awesome, collections, remove_red_eye, palette, music_note, wb_sunny, add_photo_alternate, brush, flash_on, euro, email, location_on, call, phone, chat, business, mail_outline, vpn_key, list, qr_code_scanner, chat_bubble_outline, alternate_email, chat_bubble, forum, textsms, contact_mail, sentiment_satisfied, person_search, message, qr_code, comment, file_download, file_upload, download, folder, local_shipping, place, menu_book, map, local_offer, category, badge, restaurant, directions_car, volunteer_activism, local_fire_department, storefront, apartment, fitness_center, spa, business_center, house, meeting_room, cottage, corporate_fare, ac_unit, family_restroom, checkroom, other_houses, grass, all_inclusive, airport_shuttle, child_care, pool, beach_access, kitchen, roofing, holiday_village, computer, payment, attach_money, local_parking, support_agent, thumb_up, event, filter_alt, verified, 
dashboard, list, calendar_today, login, lightbulb, visibility_off, question_answer, check_circle_outline, highlight_off, date_range, help, paid, trending_up, article, account_balance, shopping_bag, open_in_new, task_alt, perm_identity, credit_card, arrow_right_alt, history, star_rate, fact_check, build, assignment, verified_user, delete_outline, report_problem, work, autorenew, savings, print, account_balance_wallet, code, view_list, store, today, android, grade, room, power_settings_new, update, receipt, contact_support, explore, accessibility, done_all, account_box, bookmark, shopping_basket, note_add, reorder, thumb_up_off_alt, launch, bookmark_border, payment, done_outline, supervisor_account, touch_app, drag_indicator, assessment, pending_actions, view_in_ar, exit_to_app, zoom_in, leaderboard, pan_tool, timeline, feedback, bug_report, open_in_full, pending, preview, accessibility_new, assignment_ind, stars, flight_takeoff, work_outline, add_task, alarm, dns, book, published_with_changes, card_giftcard, supervised_user_circle, assignment_turned_in, 3d_rotation, gavel, cached, swap_horiz, get_app, record_voice_over, extension, contact_page, label, translate, sync_alt, minimize, space_dashboard, thumb_down, help_center, nightlight_round, trending_flat, hourglass_empty, loyalty, edit_calendar, dashboard_customize, group_work, support, announcement, privacy_tip, euro_symbol, arrow_circle_up, grading, view_headline, book_online, source, sensors, close_fullscreen, copyright, compare_arrows Arrows, query_builder, api, find_in_page, restore, dangerous, table_view, subject, swap_vert, track_changes, bookmarks, settings_phone, redeem, input, backup, build_circle, disabled_by_default, perm_media, toc, circle_notifications, arrow_circle_down, https, swipe, zoom_out, open_with, g_translate, perm_phone_msg, ads_click, wysiwyg, label_important, pageview, file_present, card_membership, perm_contact_calendar, accessible, trending_down, integration_instructions, view_module, 
model_training, settings_accessibility, production_quantity_limits, upgrade, tips_and_updates, offline_bolt, thumbs_up_down, change_history, calendar_view_month, invert_colors, bookmark_add, class, expand, segment, donut_large, aspect_ratio, settings_backup_restore, important_devices, thumb_down_off_alt, alarm_on, settings_ethernet, opacity, youtube_searched_for, schedule_send, theaters, maximize, commute, addchart, no_accounts, open_in_browser, mark_as_unread, view_column, mediation, view_agenda, unpublished, contactless, not_started, settings_input_antenna, turned_in, anchor, tour, shop, view_week, camera_enhance, history_toggle_off, flight_land, accessible_forward, hide_source, flaky, turned_in_not, settings_voice, settings_input_component, highlight_alt, search_off, settings_remote, fit_screen, view_carousel, assignment_late, pregnant_woman, next_plan, plagiarism, donut_small, hourglass_full, remove_shopping_cart, online_prediction, lock_clock, edit_off, toll, all_inbox, offline_pin, tab, assignment_return, dynamic_form, swap_horizontal_circle, event_seat, view_sidebar, markunread_mailbox, restore_from_trash, outlet, settings_power, request_page, try, rowing, gif, view_quilt, smart_button, vertical_split, hotel_class, play_for_work, alarm_add, card_travel, remove_done, outbox, compress, chrome_reader_modeMode, settings_overscan, wifi_protected_setup, find_replace, settings_brightness, http, comment_bank, polymer, quickreply, spellcheck, settings_bluetooth, backup_table, view_stream, new_label, batch_prediction, data_exploration, view_day, outbound, restore_page, assignment_returned, bookmark_remove, swap_vertical_circle, picture_in_picture, settings_cell, line_weight, send_and_archive, generating_tokens, speaker_notes_off, eject, work_off, free_cancellation\n\nText: \"Powerful API\"\nIcon: computer\n##\nText: \"Premium Support\"\nIcon: sentiment_satisfied\n##\nText: \"Easy to use\"\nIcon: emoji_emotions\n##\nText: \"More Affordable\"\nIcon: 
attach_money\n##\nText: \"Modern Kitchen\"\nIcon: kitchen\n##\nText: \"Business\"\nIcon: storefront\n##\nText: \"Fast\"\nIcon: bolt\n##\nText: \"Passionate team\"\nIcon: person_outline\n##\nText: \"Free Trial\"\nIcon: person_add\n##\nText: \"Free Parking\"\nIcon: local_parking\n##\nText: \"We love dogs\"\nIcon: pets\n##\nText: \"" + query,
    "temperature": 0,
    "max_tokens": 20,
    "top_p": 0,
    "best_of": 2,
    "n": 2,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "stop": ["##"]
};

This regularly returns a new icon that doesn’t exist.
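
One way to guard against invented icons is to check every completion against the allowed list and fall back to the closest valid name, or to a default, when it is not there. A minimal sketch, assuming the raw completion text is already in hand; `snap_to_label`, the 0.6 cutoff and the fallback icon are illustrative choices, not part of the OpenAI API.

```python
import difflib


def snap_to_label(raw_completion, labels, fallback="help_outline", cutoff=0.6):
    """Map a raw completion onto a label from the allowed list."""
    # Normalise: keep the first line, trim spaces and quotes, use snake_case.
    lines = raw_completion.strip().splitlines()
    candidate = lines[0].strip(" \"'").lower().replace(" ", "_") if lines else ""
    if candidate in labels:
        return candidate
    # Otherwise take the most similar allowed label, if any is close enough.
    close = difflib.get_close_matches(candidate, labels, n=1, cutoff=cutoff)
    return close[0] if close else fallback


allowed = ["computer", "trending_up", "help_outline", "email"]
print(snap_to_label(" computer\n", allowed))
print(snap_to_label("trending_ups", allowed))
print(snap_to_label("zzzz", allowed))
```

This does not make the model more accurate; it only guarantees that whatever reaches the UI is a real icon name.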

Okay, that helps. I think you’ll need to go about it a different way, but before I make any suggestions I need to know more about the goal. What is the outcome or purpose of this? Why do you need to classify icons? It seems to me that this particular problem lends itself to far simpler approaches. For instance, you could just use WordNet distances to perform matches. So why do you feel you need GPT-3?
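
As a rough illustration of the simpler-approach idea, the sketch below scores every icon name against the query with a pluggable similarity function and returns the best match. The default similarity is plain word overlap, which is not WordNet; a WordNet-based similarity (for example one built with NLTK) could be passed in its place. The names `token_overlap` and `best_label` are made up for this example.

```python
def tokens(text):
    """Lower-case a phrase or icon name and split it into words."""
    return set(text.lower().replace("_", " ").split())


def token_overlap(query, label):
    """Jaccard overlap between the words of the query and of the icon name."""
    q_words, l_words = tokens(query), tokens(label)
    union = q_words | l_words
    return len(q_words & l_words) / len(union) if union else 0.0


def best_label(query, labels, similarity=token_overlap):
    """Return the highest-scoring label, or None if nothing scores above zero."""
    scored = [(similarity(query, label), label) for label in labels]
    score, label = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return label if score > 0 else None


icons = ["pool", "local_parking", "email", "pets"]
print(best_label("Free Parking", icons))
print(best_label("Swimming pool", icons))
print(best_label("We love dogs", icons))
```

The last call finds nothing, because word overlap cannot relate "dogs" to "pets"; that gap is where a semantic distance such as WordNet's would have to do the work.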

I’m not an expert on machine learning, and my team is busy with other stuff. For this particular project, everything is done with OpenAI, so I was thinking that I could also do classification with it, since I have neither the time nor the resources to train my own AI. I will check WordNet distance, thank you.

Do you think this would work if, instead of using the icon names, I used broad classes like Humans, Pets, Trees (and then mapped these to the related icons with a script)?
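
The broad-class idea could look like the sketch below: the classifier only has to choose among a handful of classes, and a plain lookup then picks the icon. The class names and the class-to-icon table are invented for illustration; only the icon names come from the label list earlier in the thread.

```python
# Hypothetical broad classes, each mapped to icons from the label list above.
CLASS_TO_ICONS = {
    "Humans": ["person", "people", "groups"],
    "Pets": ["pets"],
    "Trees": ["grass"],
    "Money": ["attach_money", "paid", "payment"],
}
DEFAULT_ICON = "help_outline"  # assumed fallback for unknown classes


def icon_for_class(predicted_class):
    """Return the first icon mapped to a broad class, or the default icon."""
    icons = CLASS_TO_ICONS.get(predicted_class.strip().title(), [])
    return icons[0] if icons else DEFAULT_ICON


print(icon_for_class("pets"))
print(icon_for_class("Humans"))
print(icon_for_class("Cars"))
```

With a few dozen classes instead of 200 icon names, an unexpected answer from the model also becomes easier to detect, since anything outside the table falls through to the default.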

Well, I’m still not sure what your use case is, but check out this example from the documentation. GPT-3 understands emojis, so I would try just giving it some examples and letting it fill in the blanks.
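
A fill-in-the-blank prompt of that kind might be assembled as in the sketch below. It reuses a few Text/Icon pairs from this thread and ends with an open "Icon:" line, so the model only has to complete the label. `build_prompt` is a made-up helper, and whether a short prompt like this behaves better than the long one above is untested here.

```python
def build_prompt(examples, query, separator="##"):
    """Build a few-shot prompt that ends with an open 'Icon:' line."""
    blocks = [f'Text: "{text}"\nIcon: {icon}' for text, icon in examples]
    # The final block leaves the icon blank for the model to fill in.
    blocks.append(f'Text: "{query}"\nIcon:')
    return f"\n{separator}\n".join(blocks)


examples = [
    ("Powerful API", "computer"),
    ("Free Parking", "local_parking"),
    ("We love dogs", "pets"),
]
prompt = build_prompt(examples, "Email and Chat Support")
print(prompt)
```

Using the same "##" separator as the stop sequence keeps the completion from running on into a new invented example.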

One other thing you could try is using the return_prompt flag in the classifications endpoint. That should give you the prompt we send to the completions endpoint, and you can rerun that in the playground to see the actual label it generates.
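
A request body with that flag set might be built as below. This is only a sketch: `return_prompt`, `examples`, `labels` and `search_model` are the names used in this thread, while the `model` and `query` fields and the engine names are assumptions to be checked against the endpoint's documentation. Nothing is sent anywhere.

```python
import json


def classification_request(query, examples, labels, return_prompt=True):
    """Build the JSON body for a classification call (not sent anywhere)."""
    return {
        "model": "curie",        # assumed engine name
        "search_model": "ada",   # assumed engine name
        "query": query,          # assumed field name for the input text
        "examples": examples,
        "labels": labels,
        "temperature": 0,
        "return_prompt": return_prompt,
    }


body = classification_request(
    "Email and Chat Support",
    examples=[["Email", "email"], ["Premium Support", "support_agent"]],
    labels=["email", "support_agent"],
)
print(json.dumps(body, indent=2))
```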
