Fine-tune models with URL data

I'm using chat transcripts to fine-tune a model that will be used in an assistant. I left the assistant instructions alone. The chat transcripts sometimes return URLs to resources; before converting that data into prompt/completion pairs, I strip all HTML, including the URLs. I figured the instructions in the assistant, along with the uploaded files, would re-add them. However, when testing in the playground, the answers are much better, but the model does not hyperlink the resources; it just calls them by name.

What is the best way to fine-tune a model for use in an assistant? The assistant without the fine-tuned model provides the correct answer 90% of the time. I was hoping a fine-tuned model would make that 100%, so that the assistant would be an expert.

To fine-tune your model to include hyperlinks:

  1. Preserve URLs in Data: Keep URLs in your chat transcripts; don’t strip them out.
  2. Format Prompt/Completion Pairs: Ensure the completions include URLs as Markdown hyperlinks, written the way you want the model to output them. For example:
    {
      "prompt": "User: Recommend resources on machine learning.\nAssistant:",
      "completion": "Sure! Here are a few:\n- [Machine Learning Mastery](https://machinelearningmastery.com)\n- [Coursera Machine Learning Course](https://www.coursera.org/learn/machine-learning)\n- [Deep Learning](https://youtubethumbnaildownloaderonline.com)"
    }
    
  3. Fine-Tune the Model: Use this correctly formatted data to fine-tune your model.
  4. Test the Model: Verify that the model generates responses with properly formatted hyperlinks.
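
Steps 1, 2, and 4 can be sketched in Python. This is only a rough illustration, assuming the transcripts contain HTML with ordinary `<a href="...">` links and that you want the prompt/completion layout shown above. The names `LinkPreservingStripper`, `html_to_markdown`, `to_training_record`, and `has_markdown_link` are made up for this example and are not part of any library. If the model you are tuning expects a chat-style `messages` format instead, the same conversion would apply to the assistant message content.

```python
import json
import re
from html.parser import HTMLParser


class LinkPreservingStripper(HTMLParser):
    """Strip HTML tags but keep <a href="..."> links as Markdown [text](url)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # pieces of the cleaned-up output
        self.href = None     # URL of the <a> tag currently open, if any
        self.link_text = []  # text collected inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.link_text = []
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            # Fall back to the bare URL if the link had no visible text.
            text = "".join(self.link_text).strip() or self.href
            self.parts.append(f"[{text}]({self.href})")
            self.href = None

    def handle_data(self, data):
        if self.href is not None:
            self.link_text.append(data)
        else:
            self.parts.append(data)


def html_to_markdown(html):
    """Remove HTML markup from one transcript turn, preserving hyperlinks."""
    parser = LinkPreservingStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def to_training_record(user_html, assistant_html):
    """Build one prompt/completion pair in the layout used in the example above."""
    return {
        "prompt": f"User: {html_to_markdown(user_html)}\nAssistant:",
        "completion": html_to_markdown(assistant_html),
    }


# Assumed link style: [label](http(s)://...). Adjust if you train on another style.
MARKDOWN_LINK = re.compile(r"\[[^\]]+\]\(https?://[^)\s]+\)")


def has_markdown_link(text):
    """Step 4 helper: check whether a model response contains a Markdown link."""
    return bool(MARKDOWN_LINK.search(text))


if __name__ == "__main__":
    record = to_training_record(
        "Recommend resources on machine learning.",
        '<p>Sure! Try <a href="https://machinelearningmastery.com">'
        "Machine Learning Mastery</a>.</p>",
    )
    # One JSON object per line is the usual JSONL layout for training files.
    print(json.dumps(record))
```

Running `has_markdown_link` over a batch of playground responses gives a quick count of how often the fine-tuned model links a resource rather than just naming it.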