I tried to fine tune gpt-3.5 but I must have done something wrong becuase the results are now terrible.
My goal was to build a link classifier so I “manually” classified a lot of links and then built this training data set. (Adding a sample of one line of my jsonl below - full set is 500 lines similar to below).
{"messages": [{"role": "system", "content": "You are a link classifier: given a list of links and paths from fashion brands or fashion stores, you classify each link based on the most likely content they lead to. You categorize links into one of the following categories:\n\n- Products & Categories: This section contains product names, categories, and URL paths related to various products and their categories. Examples include individual product names like \"bracelets\" and \"jackets\", as well as URLs to specific product pages and categories such as \"/products/best-sellers\", \"/collections/sale-skirts\", and \"/collections/sneakers\".\n- Designers: This category focuses on paths and links that are specifically about the designers behind the products or collections. Examples are URLs like \"/designers/chanel\", \"/collections/designer-tom-ford\", and \"/designer-spotlight\".\n- Policy Pages, FAQs, Shipping, Press & Help: This section lists URLs and strings related to policies, frequently asked questions, shipping, returns, press, and help pages of a website. It includes links to pages like \"/policies/refund-policy\", \"/pages/size-guide\", and \"/pages/faq\".\n- Contact & About: Contains URLs related to contact information and about pages. Examples are \"/contact\", \"/pages/about-us\", and \"/pages/our-story\".\n- Locations: Details URLs and strings about the physical locations or stockists of the business. Entries include \"/pages/stockists\" and \"/pages/our-store\".\n- Others: A more miscellaneous category, it captures a wide range of URLs from the website, ranging from cart and checkout pages like \"/cart\" and \"/checkout\", to other types of pages and actions such as \"/blog/\", \"/events\", and \"/services/store-services\".\n- You return the classified content as a dictionary, where the keys are categories and the values are lists of links classified under them.\n"}, {"role": "user", "content": "Classify these links: ['sneakers', '/collections/ceramics', '/collections/kids-footwear', '/collections/lingerie', 'new-arrivals-woman', '/collections/womens-new-arrivals', '/collections/jewelry-earrings', '/c/designers-alice-olivia-cat40360738', '/collections/pedro-garcia', '/c/designers-david-yurman-cat540734', '/collections/amber-sceats', '/collections/shaina-mote', '/c/designer-gift-shops-gucci-gift-shop-cat1696283890', '/collections/maison-louis-marie', '/shipping-FAQ', '/pages/advertising', 'terms-conditions', '/policies/cancellation-policy', '/policies/exchange-policy', '/pages/legal-notice', '/team', '/about/origin', 'real-estate', '/pages/delivery', '/rafi/list/where-to-shop-for-burning-man-in-nyc', '/selectprints/', '/delivery-choices?pr_id=promo_deliverystrip_108&pr_name=deliverystrip&pr_creative=sameday&pr_position=deliverystrip', '/account', '/services/virtual-services']"}, {"role": "assistant", "content": "{\"products & categories\": [\"sneakers\", \"/collections/ceramics\", \"/collections/kids-footwear\", \"/collections/lingerie\", \"new-arrivals-woman\", \"/collections/womens-new-arrivals\", \"/collections/jewelry-earrings\"], \"designers & collections\": [\"/c/designers-alice-olivia-cat40360738\", \"/collections/pedro-garcia\", \"/c/designers-david-yurman-cat540734\", \"/collections/amber-sceats\", \"/collections/shaina-mote\", \"/c/designer-gift-shops-gucci-gift-shop-cat1696283890\", \"/collections/maison-louis-marie\"], \"policy pages, faqs, shipping, careers, press & help\": [\"/shipping-FAQ\", \"/pages/advertising\", \"terms-conditions\", \"/policies/cancellation-policy\", \"/policies/exchange-policy\", \"/pages/legal-notice\"], \"contact & about\": [\"/team\", \"/about/origin\"], \"locations\": [], \"others\": [\"real-estate\", \"/pages/delivery\", \"/rafi/list/where-to-shop-for-burning-man-in-nyc\", \"/selectprints/\", \"/delivery-choices?pr_id=promo_deliverystrip_108&pr_name=deliverystrip&pr_creative=sameday&pr_position=deliverystrip\", \"/account\", \"/services/virtual-services\"]}"}]}
However, when I tried using it has some isssues that I am not fully understanding. For example, when I pass these links
'/p/Clz-ssiumXM/,/p/ClhindYvxM3/,/p/ClrgwZiSGiR/,/p/Clq6UD1psTE/,/p/ClwOc6-JvvW/,/p/ClpqePWr37J/,/p/Clm1UJ7JFQs/,/billystarzana/,/,/p/Clq7RgqJxV_/,/p/ClmH13yvzRb/,boutique@billystarzana.com,/p/Clgp8xAp4d0/,/10206736089920694,/,/p/ClocGzaJvPn/,/p/Cloc36Rpy6g/,/p/ClgpG33JrpT/,/p/ClmI5b2PFWm/,/,/p/Clu0gkNLQqN/,/'
I get this below (which feels random - given, those are weird links but I was expecting more consistency)
{'products & categories': ['/p/Clz-ssiumXM/',
'/p/ClhindYvxM3/',
'/p/ClrgwZiSGiR/',
'/p/Clq6UD1psTE/',
'/p/ClwOc6-JvvW/',
'/p/ClpqePWr37J/',
'/p/Clm1UJ7JFQs/'],
'designers & collections': ['/billystarzana/',
'/',
'/p/Clq7RgqJxV_',
'/p/ClmH13yvzRb/'],
'policy pages, faqs, shipping, careers, press & help': [],
'contact & about': ['boutique@billystarzana.com'],
'locations': [],
'others': ['/p/Clgp8xAp4d0/',
'/10206736089920694',
'/',
'/p/ClocGzaJvPn/',
'/p/Cloc36Rpy6g/',
'/p/ClgpG33JrpT/',
'/p/ClmI5b2PFWm/',
'/',
'/p/Clu0gkNLQqN/']}
Is it because I overfit? How do I debug something like that?