Request for feature - user labels on fine-tunes

I apologize if this feature already exists and I missed it. I was hoping to add some metadata to my fine-tuned models so that I don't have to keep track of a separate list. My project requires at least two dozen different fine-tuned models, so to keep track of them I would like to be able to attach my own label to each one. Stuff like:

  • QuestionGenerator-v1
  • ContextSummarizer-v1

That sort of thing.


Yes, this is a good feature request.

On another note - did you try combining these models into a single model that can perform all of those tasks? You can conditionally prefix each of the prompts and then combine the examples together. That has a chance of better performance, as long as your tasks are somewhat related.
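The combining step described above can be sketched in a few lines of Python. This is a hypothetical helper, not an official tool: the file names and task labels are assumptions, and the only convention it relies on is the prompt/completion JSONL format shown later in the thread.

```python
import json

def merge_datasets(task_files, out_path="combined.jsonl"):
    """Merge several single-task JSONL datasets into one multi-task
    training file by conditionally prefixing each prompt with its
    task label (the label becomes the first line of the prompt).

    task_files: mapping of task label -> path to a JSONL file whose
    lines look like {"prompt": "...", "completion": " ..."}.
    """
    with open(out_path, "w") as out:
        for label, path in task_files.items():
            with open(path) as f:
                for line in f:
                    example = json.loads(line)
                    # Prefix the prompt so the model can tell tasks apart
                    example["prompt"] = f"{label}\n{example['prompt']}"
                    out.write(json.dumps(example) + "\n")

# Example usage (file names are placeholders):
# merge_datasets({
#     "Question Generation": "question_generator.jsonl",
#     "Context Summary": "context_summarizer.jsonl",
# })
```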


I have not tried that, but that's a cool possibility. It could work well thanks to the larger combined dataset.


Multi task models, combining several of the above

This is one of the more advanced techniques, which we recommend tackling only after you've created single-task models. Models trained to perform a variety of tasks generally perform better on those specific tasks, and have the additional benefit of generalizing to similar tasks.

For example, a chatbot would benefit from knowing about the particular domain your company specializes in and your product catalogue, as well as from a customer email triaging system and a sentiment analysis tool.

To train such fine-tuned models, it's important to clearly separate the different use cases early within the prompt. We recommend using the first few words of the prompt to describe or differentiate the tasks to be performed.

The larger the amount of data, and tasks, the better the performance tends to be. An additional benefit is that you can potentially use one model for all your needs. For example:

{"prompt":"", "completion":" <product catalogue>"}

{"prompt":"", "completion":" <a document with domain specific jargon>"}

{"prompt":"Chatbot\nSummary: <summary of the interaction so far>\n\nSpecific information:<for example order details in natural language>\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent:", "completion":" <response2>"}

{"prompt":"Sentiment\n<Customer review> ->", "completion":" positive"}

{"prompt":"Email Triage\n<customer email>\n\n###\n\nCategory:", "completion":" <category>"}

Make sure to measure performance, and check that a more complicated model is actually improving performance on the downstream tasks. These models might suffer from dataset imbalance: if one of the tasks dominates the examples, then performance on the other tasks might be worse. They may also suffer from context leaking between the different tasks.
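A quick way to spot the dataset imbalance mentioned above is to count examples per task. The sketch below is a hypothetical check, not an official tool; it assumes the convention from the examples above, where the task label is the first line of each prompt.

```python
import json
from collections import Counter

def task_balance(jsonl_path):
    """Count training examples per task in a multi-task JSONL file.

    Assumes each prompt starts with its task label followed by a
    newline (the convention used in the examples above). Prompts with
    no label, like the raw-document examples, are grouped together.
    """
    counts = Counter()
    with open(jsonl_path) as f:
        for line in f:
            example = json.loads(line)
            label = example["prompt"].split("\n", 1)[0] or "<no label>"
            counts[label] += 1
    return counts
```

If one label dominates the counts, you may want to downsample it or collect more data for the other tasks before comparing against the single-task baselines.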

If you're interested in generalizing to different kinds of tasks related to your existing tasks, then you may want to describe each task in natural language (e.g. replace Email Triage with an instruction such as Which of the following categories best fits this customer email (<category1>, <category2>, ... <category13>)?). Then, hopefully, another task described in natural language could perform well with zero-shot prompts.
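Migrating an existing label-prefixed dataset to natural-language instructions can be done mechanically. This is a minimal sketch under the same first-line-label convention as above; the helper name and the instruction wording are illustrative assumptions.

```python
def to_instruction(prompt, label, instruction):
    """Replace a short task-label prefix (e.g. "Email Triage") with a
    natural-language instruction, keeping the rest of the prompt
    intact. Prompts that don't start with the label are returned
    unchanged. (Hypothetical helper illustrating the suggestion above.)
    """
    prefix = label + "\n"
    if prompt.startswith(prefix):
        return instruction + "\n" + prompt[len(prefix):]
    return prompt

# Example usage:
# to_instruction(
#     "Email Triage\n<customer email>\n\n###\n\nCategory:",
#     "Email Triage",
#     "Which of the following categories best fits this customer email?",
# )
```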


That’s awesome and probably exactly how I would approach it. I have found that framing with natural language tends to work best on a variety of models. Plain English instructions up front, etc.

Also, sorry, but I have to…