Fine-Tuning returned terrible results

nicola.macchitella · October 12, 2023, 1:47am

I tried to fine-tune gpt-3.5, but I must have done something wrong because the results are now terrible.

My goal was to build a link classifier, so I “manually” classified a lot of links and then built this training data set. (A sample line from my JSONL is below; the full set is 500 lines similar to it.)

{"messages": [{"role": "system", "content": "You are a link classifier: given a list of links and paths from fashion brands or fashion stores, you classify each link based on the most likely content they lead to. You categorize links into one of the following categories:\n\n- Products & Categories: This section contains product names, categories, and URL paths related to various products and their categories. Examples include individual product names like \"bracelets\" and \"jackets\", as well as URLs to specific product pages and categories such as \"/products/best-sellers\", \"/collections/sale-skirts\", and \"/collections/sneakers\".\n- Designers: This category focuses on paths and links that are specifically about the designers behind the products or collections. Examples are URLs like \"/designers/chanel\", \"/collections/designer-tom-ford\", and \"/designer-spotlight\".\n- Policy Pages, FAQs, Shipping, Press & Help: This section lists URLs and strings related to policies, frequently asked questions, shipping, returns, press, and help pages of a website. It includes links to pages like \"/policies/refund-policy\", \"/pages/size-guide\", and \"/pages/faq\".\n- Contact & About: Contains URLs related to contact information and about pages. Examples are \"/contact\", \"/pages/about-us\", and \"/pages/our-story\".\n- Locations: Details URLs and strings about the physical locations or stockists of the business. 
Entries include \"/pages/stockists\" and \"/pages/our-store\".\n- Others: A more miscellaneous category, it captures a wide range of URLs from the website, ranging from cart and checkout pages like \"/cart\" and \"/checkout\", to other types of pages and actions such as \"/blog/\", \"/events\", and \"/services/store-services\".\n- You return the classified content as a dictionary, where the keys are categories and the values are lists of links classified under them.\n"}, {"role": "user", "content": "Classify these links: ['sneakers', '/collections/ceramics', '/collections/kids-footwear', '/collections/lingerie', 'new-arrivals-woman', '/collections/womens-new-arrivals', '/collections/jewelry-earrings', '/c/designers-alice-olivia-cat40360738', '/collections/pedro-garcia', '/c/designers-david-yurman-cat540734', '/collections/amber-sceats', '/collections/shaina-mote', '/c/designer-gift-shops-gucci-gift-shop-cat1696283890', '/collections/maison-louis-marie', '/shipping-FAQ', '/pages/advertising', 'terms-conditions', '/policies/cancellation-policy', '/policies/exchange-policy', '/pages/legal-notice', '/team', '/about/origin', 'real-estate', '/pages/delivery', '/rafi/list/where-to-shop-for-burning-man-in-nyc', '/selectprints/', '/delivery-choices?pr_id=promo_deliverystrip_108&pr_name=deliverystrip&pr_creative=sameday&pr_position=deliverystrip', '/account', '/services/virtual-services']"}, {"role": "assistant", "content": "{\"products & categories\": [\"sneakers\", \"/collections/ceramics\", \"/collections/kids-footwear\", \"/collections/lingerie\", \"new-arrivals-woman\", \"/collections/womens-new-arrivals\", \"/collections/jewelry-earrings\"], \"designers & collections\": [\"/c/designers-alice-olivia-cat40360738\", \"/collections/pedro-garcia\", \"/c/designers-david-yurman-cat540734\", \"/collections/amber-sceats\", \"/collections/shaina-mote\", \"/c/designer-gift-shops-gucci-gift-shop-cat1696283890\", \"/collections/maison-louis-marie\"], \"policy pages, faqs, shipping, 
careers, press & help\": [\"/shipping-FAQ\", \"/pages/advertising\", \"terms-conditions\", \"/policies/cancellation-policy\", \"/policies/exchange-policy\", \"/pages/legal-notice\"], \"contact & about\": [\"/team\", \"/about/origin\"], \"locations\": [], \"others\": [\"real-estate\", \"/pages/delivery\", \"/rafi/list/where-to-shop-for-burning-man-in-nyc\", \"/selectprints/\", \"/delivery-choices?pr_id=promo_deliverystrip_108&pr_name=deliverystrip&pr_creative=sameday&pr_position=deliverystrip\", \"/account\", \"/services/virtual-services\"]}"}]}

However, when I tried using it, it had some issues that I don’t fully understand. For example, when I pass these links

'/p/Clz-ssiumXM/,/p/ClhindYvxM3/,/p/ClrgwZiSGiR/,/p/Clq6UD1psTE/,/p/ClwOc6-JvvW/,/p/ClpqePWr37J/,/p/Clm1UJ7JFQs/,/billystarzana/,/,/p/Clq7RgqJxV_/,/p/ClmH13yvzRb/,boutique@billystarzana.com,/p/Clgp8xAp4d0/,/10206736089920694,/,/p/ClocGzaJvPn/,/p/Cloc36Rpy6g/,/p/ClgpG33JrpT/,/p/ClmI5b2PFWm/,/,/p/Clu0gkNLQqN/,/'

I get the output below, which feels random (granted, those are weird links, but I was expecting more consistency):

{'products & categories': ['/p/Clz-ssiumXM/',
  '/p/ClhindYvxM3/',
  '/p/ClrgwZiSGiR/',
  '/p/Clq6UD1psTE/',
  '/p/ClwOc6-JvvW/',
  '/p/ClpqePWr37J/',
  '/p/Clm1UJ7JFQs/'],
 'designers & collections': ['/billystarzana/',
  '/',
  '/p/Clq7RgqJxV_',
  '/p/ClmH13yvzRb/'],
 'policy pages, faqs, shipping, careers, press & help': [],
 'contact & about': ['boutique@billystarzana.com'],
 'locations': [],
 'others': ['/p/Clgp8xAp4d0/',
  '/10206736089920694',
  '/',
  '/p/ClocGzaJvPn/',
  '/p/Cloc36Rpy6g/',
  '/p/ClgpG33JrpT/',
  '/p/ClmI5b2PFWm/',
  '/',
  '/p/Clu0gkNLQqN/']}

Is it because I overfit? How do I debug something like that?

N2U · October 12, 2023, 1:53am

Hey mate!

Welcome to the community forum. I have a hard time understanding where these links lead based only on the text. How would you expect GPT to know that?

nicola.macchitella · October 12, 2023, 2:14am

Hi! Thanks for the reply!

Yeah, that’s a fair note. I need to make sure link types like these are present in the training set; otherwise I can see how this is not feasible.

I was curious why it did not classify them ALL as “others” but only some of them, since they didn’t belong to any category but “others” (but maybe “others” also needs training data with links like those already present?).

_j · October 12, 2023, 2:18am

Is this an example of

the type of system prompt that you will use in your application, and
the type of user input that you expect the user to input?

{“role”: “system”, “content”: "You are a link classifier: given a list of links and paths from fashion brands or fashion stores, you classify each link based on the most likely content they lead to. You categorize links into one of the following categories:

Products & Categories: This section contains product names, categories, and URL paths related to various products and their categories. Examples include individual product names like “bracelets” and “jackets”, as well as URLs to specific product pages and categories such as “/products/best-sellers”, “/collections/sale-skirts”, and “/collections/sneakers”.

Designers: This category focuses on paths and links that are specifically about the designers behind the products or collections. Examples are URLs like “/designers/chanel”, “/collections/designer-tom-ford”, and “/designer-spotlight”.

Policy Pages, FAQs, Shipping, Press & Help: This section lists URLs and strings related to policies, frequently asked questions, shipping, returns, press, and help pages of a website. It includes links to pages like “/policies/refund-policy”, “/pages/size-guide”, and “/pages/faq”.

Contact & About: Contains URLs related to contact information and about pages. Examples are “/contact”, “/pages/about-us”, and “/pages/our-story”.

Locations: Details URLs and strings about the physical locations or stockists of the business. Entries include “/pages/stockists” and “/pages/our-store”.

Others: A more miscellaneous category, it captures a wide range of URLs from the website, ranging from cart and checkout pages like “/cart” and “/checkout”, to other types of pages and actions such as “/blog/”, “/events”, and “/services/store-services”.

You return the classified content as a dictionary, where the keys are categories and the values are lists of links classified under them.
"},

{“role”: “user”, “content”: “Classify these links: [‘sneakers’, ‘/collections/ceramics’, ‘/collections/kids-footwear’, ‘/collections/lingerie’, ‘new-arrivals-woman’, ‘/collections/womens-new-arrivals’, ‘/collections/jewelry-earrings’, ‘/c/designers-alice-olivia-cat40360738’, ‘/collections/pedro-garcia’, ‘/c/designers-david-yurman-cat540734’, ‘/collections/amber-sceats’, ‘/collections/shaina-mote’, ‘/c/designer-gift-shops-gucci-gift-shop-cat1696283890’, ‘/collections/maison-louis-marie’, ‘/shipping-FAQ’, ‘/pages/advertising’, ‘terms-conditions’, ‘/policies/cancellation-policy’, ‘/policies/exchange-policy’, ‘/pages/legal-notice’, ‘/team’, ‘/about/origin’, ‘real-estate’, ‘/pages/delivery’, ‘/rafi/list/where-to-shop-for-burning-man-in-nyc’, ‘/selectprints/’, ‘/delivery-choices?pr_id=promo_deliverystrip_108&pr_name=deliverystrip&pr_creative=sameday&pr_position=deliverystrip’, ‘/account’, ‘/services/virtual-services’]”},

{“role”: “assistant”, “content”: “{“products & categories”: [“sneakers”, “/collections/ceramics”, “/collections/kids-footwear”, “/collections/lingerie”, “new-arrivals-woman”, “/collections/womens-new-arrivals”, “/collections/jewelry-earrings”], “designers & collections”: [“/c/designers-alice-olivia-cat40360738”, “/collections/pedro-garcia”, “/c/designers-david-yurman-cat540734”, “/collections/amber-sceats”, “/collections/shaina-mote”, “/c/designer-gift-shops-gucci-gift-shop-cat1696283890”, “/collections/maison-louis-marie”], “policy pages, faqs, shipping, careers, press & help”: [“/shipping-FAQ”, “/pages/advertising”, “terms-conditions”, “/policies/cancellation-policy”, “/policies/exchange-policy”, “/pages/legal-notice”], “contact & about”: [“/team”, “/about/origin”], “locations”: [], “others”: [“real-estate”, “/pages/delivery”, “/rafi/list/where-to-shop-for-burning-man-in-nyc”, “/selectprints/”, “/delivery-choices?pr_id=promo_deliverystrip_108&pr_name=deliverystrip&pr_creative=sameday&pr_position=deliverystrip”, “/account”, “/services/virtual-services”]}”}]}

This just seems to be a massive misunderstanding about the purpose and use of fine-tune.

nicola.macchitella · October 12, 2023, 2:23am

It might be. I am trying to ramp up on the topic, so I might be butchering the tech here. What would you say is wrong?

To answer your questions: conceptually, yes. I have something like 10k links that I am trying to classify into those categories, so I thought that prompt + input was a good combo to get that output (a structured, classified dictionary).

N2U · October 12, 2023, 2:25am

You are aware that the model doesn’t have internet access, right?

_j · October 12, 2023, 2:30am

Fine-tune is for training the AI on what kind of output it should generate given a particular input, by having hundreds or thousands of examples of the kind of output to be generated for a user input.

It can thus have a short system identifier because the type of output has been demonstrated by thousands of examples.

good fine-tune example conversation:

system: Classifo classifies links
user: https://help.openai.com/en/articles/5528730-fine-tuning-a-classifier-to-improve-truthfulness
assistant: documentation

bad fine-tune example conversation (see above)
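
As a rough sketch of how the grouped examples could be reshaped into that one-link, one-label form: the snippet below takes a grouped answer like the one in the original JSONL and emits one chat example per link. The short system prompt, the `split_example` helper, and the choice of writing only the category name as the assistant reply are assumptions for illustration, not a prescribed format.

```python
import json

# Hypothetical short system prompt, in the spirit of "Classifo classifies links".
SHORT_SYSTEM = "Classifo classifies links"


def split_example(grouped: dict) -> list:
    """Turn one grouped answer {category: [links]} into one chat example per link."""
    examples = []
    for category, links in grouped.items():
        for link in links:
            examples.append({
                "messages": [
                    {"role": "system", "content": SHORT_SYSTEM},
                    {"role": "user", "content": link},
                    {"role": "assistant", "content": category},
                ]
            })
    return examples


# A few links taken from the training sample posted above.
grouped = {
    "products & categories": ["sneakers", "/collections/ceramics"],
    "designers & collections": ["/collections/pedro-garcia"],
    "locations": [],  # empty categories simply produce no examples
}

# One JSON object per line, as in the original JSONL file.
jsonl_lines = [json.dumps(example) for example in split_example(grouped)]
for line in jsonl_lines:
    print(line)
```

Each output line then carries a single link and a single label, so the 500 grouped lines would expand into several thousand much shorter examples.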

nicola.macchitella · October 12, 2023, 2:34am

Yeah of course, it should all be based on string recognition. Those link types might not be present in training data and maybe that’s the issue?

nicola.macchitella · October 12, 2023, 2:36am

Can you say a little more about why the above is wrong? I don’t see how my training set/job is fundamentally different from what you are describing.

_j · October 12, 2023, 2:45am

What you show above is a huge system prompt, which is supposed to be the same kind of system prompt that you would use in your application. You’d have to supply the same huge system prompt to use the fine-tune model, completely the opposite of what fine-tune is for. Your fine-tune is only triggered by the same type of system prompt.

What you show above is a huge user input with a whole bunch of links in a very long list. Are you then going to input a huge list of links every time you use the fine-tune AI?

Then for the output, you have a massive re-grouping of all those links that is beyond the AI’s ability to understand or be trained on. Even generating that runs against the way an open-ended transformer works: it would have to produce the text for all of one category until it is done filling that category, and then start producing text for the next category, until it thinks it got them all. It then can’t reflect and go back to ones it didn’t produce earlier. Grouping things by classification is a task that AI is terrible at, and you aren’t teaching it anything except perhaps the names of the categories, which we see you already put into a massive system prompt.
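
One way to check for this kind of slip is to compare the links that went in with the links that came back. The sketch below does that for the input string and the output dictionary from the first post, copied as they were pasted; `coverage_report` is a hypothetical helper name, and the check only counts exact string matches.

```python
from collections import Counter


def coverage_report(input_links, grouped):
    """Compare the links sent in with the links that came back, counting duplicates."""
    expected = Counter(input_links)
    produced = Counter(link for links in grouped.values() for link in links)
    missing = dict(expected - produced)     # sent in, but returned fewer times (or not at all)
    unexpected = dict(produced - expected)  # returned, but never sent in that exact form
    return missing, unexpected


# Input string copied from the first post of this thread.
input_links = (
    "/p/Clz-ssiumXM/,/p/ClhindYvxM3/,/p/ClrgwZiSGiR/,/p/Clq6UD1psTE/,"
    "/p/ClwOc6-JvvW/,/p/ClpqePWr37J/,/p/Clm1UJ7JFQs/,/billystarzana/,/,"
    "/p/Clq7RgqJxV_/,/p/ClmH13yvzRb/,boutique@billystarzana.com,"
    "/p/Clgp8xAp4d0/,/10206736089920694,/,/p/ClocGzaJvPn/,/p/Cloc36Rpy6g/,"
    "/p/ClgpG33JrpT/,/p/ClmI5b2PFWm/,/,/p/Clu0gkNLQqN/,/"
).split(",")

# Model output copied from the first post of this thread.
grouped = {
    "products & categories": [
        "/p/Clz-ssiumXM/", "/p/ClhindYvxM3/", "/p/ClrgwZiSGiR/", "/p/Clq6UD1psTE/",
        "/p/ClwOc6-JvvW/", "/p/ClpqePWr37J/", "/p/Clm1UJ7JFQs/",
    ],
    "designers & collections": [
        "/billystarzana/", "/", "/p/Clq7RgqJxV_", "/p/ClmH13yvzRb/",
    ],
    "policy pages, faqs, shipping, careers, press & help": [],
    "contact & about": ["boutique@billystarzana.com"],
    "locations": [],
    "others": [
        "/p/Clgp8xAp4d0/", "/10206736089920694", "/", "/p/ClocGzaJvPn/",
        "/p/Cloc36Rpy6g/", "/p/ClgpG33JrpT/", "/p/ClmI5b2PFWm/", "/",
        "/p/Clu0gkNLQqN/",
    ],
}

missing, unexpected = coverage_report(input_links, grouped)
print("missing:", missing)
print("unexpected:", unexpected)
```

On the output as posted, this reports that one of the four "/" entries never came back and that "/p/Clq7RgqJxV_/" was returned without its trailing slash, so 22 inputs produced 21 outputs.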

nicola.macchitella · October 12, 2023, 2:51am

Yeah, that was the goal basically

nicola.macchitella · October 12, 2023, 2:52am

Interesting. Are you saying that the transformer doesn’t read all the text first and then output results, but works almost on a first-in, first-out basis? Why would it have issues classifying things?

Noted that those links are just weird, so there is no way to know what they are. I would need to add them to the training data in some way that is standardized enough for it to recognize the pattern.

_j · October 12, 2023, 3:00am

It is an open-ended word-by-word generation, with no going back to revise what it wrote earlier.

A task that will have bad results:

“Group these 100 songs into two outputs, ‘happy songs’ and ‘sad songs’.”

What happens when it gets to the end of producing the sad songs and realizes that some it hasn’t included yet are actually happy?

nicola.macchitella · October 12, 2023, 3:02am

I mean, I am clearly doing something wrong. The output here is definitely not what I was hoping for.

Why would all those collections links be classified as “others”? And why would the privacy link be classified as designer? The non-tuned model actually does better. How is that possible?

{‘products & categories’: [‘/destinations/washington/seattle/shopping/’],
‘designers & collections’: [‘/california-privacy-rights/’,
‘/interests/food-drink/’,
‘/destinations/washington/seattle/neighborhoods/’,
‘/newsletters/’,
‘/8f88e25d-1e4f-43f3-b2ad-74e039da705b’,
‘/destinations/washington/seattle/hotels/’,
‘/destinations/washington/seattle/attractions/’,
‘/interests/trip-planning/’,
‘/local-experts/’],
‘policy pages, faqs, shipping, careers, press & help’: [‘/privacy-statement/’],
‘contact & about’: [],
‘locations’: [],
‘others’: [‘/interests/arts-culture/’,
‘/destinations/washington/seattle/restaurants/’,
‘12066322877’,
‘/destinations/washington/seattle/’,
‘/collections/mens-laced-shoes’,
‘/collections/womens-small-leather-goods’,
‘/collections/mens-flat-sandals’,
‘/collections/10-corso-como’,
‘/collections/mens-handbags’,
‘/collections/mens-wearable-accessories’,
‘/collections/mens-belt-bags’]}

_j · October 12, 2023, 3:13am

Because of what I just said. The AI produces a token at a time. There is no “thinking” going on, there is just a certain point where it becomes more likely to produce the closing brackets of a category than to produce the text that continues in a category.

Multi-headed attention, especially in gpt-3.5 that can be tuned, just doesn’t have enough layers to contemplate the entire input at once. It doesn’t have a score-keeping or inner monologue where it can go down the list and first mark them by what category they should go in.

Just turn this into a single-input, single-output job. It will consume about the same number of tokens once you no longer need the huge system prompt, which was the whole idea. Even then, it will likely not work well on links different from the examples you provided, because there is no making sense of ‘/8f88e25d-1e4f-43f3-b2ad-74e039da705b’.
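
A minimal sketch of what that single input, single output shape could look like: each link is classified on its own, and the grouping into a dictionary is done in ordinary code rather than by the model. Here `classify_links` and `stub_classifier` are hypothetical names, and the stub is only a rule-based placeholder standing in for a call to the fine-tuned model.

```python
from collections import defaultdict
from typing import Callable


def classify_links(links: list, classify_one: Callable[[str], str]) -> dict:
    """Classify each link independently, then group the results in code."""
    grouped = defaultdict(list)
    for link in links:
        grouped[classify_one(link)].append(link)
    return dict(grouped)


def stub_classifier(link: str) -> str:
    # Placeholder only: a real version would send `link` to the fine-tuned
    # model and return the single category name it replies with.
    if "polic" in link or "privacy" in link:
        return "policy pages, faqs, shipping, careers, press & help"
    if link.startswith("/collections/"):
        return "products & categories"
    return "others"


# Links taken from the second output posted above.
links = [
    "/collections/mens-handbags",
    "/privacy-statement/",
    "/newsletters/",
    "/collections/mens-belt-bags",
]

result = classify_links(links, stub_classifier)
for category, members in result.items():
    print(category, members)
```

Because the loop visits every link exactly once, no link can be dropped, duplicated, or rewritten during grouping; the only thing left to the model is the label for one link.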

If you already have the links and they aren’t changing, and you already classified them, this isn’t a job for AI at all.

nicola.macchitella · October 12, 2023, 3:18am

A few questions:

What do you mean by a single input/output job? Classifying one link at a time?
Is the model able to generalize that one /collections/ link (e.g. ‘/collections/bergfabel’) is a designer link while another (e.g. ‘/collections/sneakers’) is a product category page? (Granted, at training time I will provide some examples of /collections/name and /collections/thing, but not every possible example in the population.)
I have a lot of links and have only manually classified some. I was hoping to use GPT to do the rest. Maybe GPT-4 can do it?
