This was super informative and helpful, thank you!! It does seem like perhaps I do want to go with fine-tuning for classification, then. Would you agree?
If I’m feeding a model like 200 words of instructional content, I would want it to “classify” that content as aligning with one or more educational standards. Does that make sense?
Do you think, then, that my objective should be to fine-tune a model to categorize selections of educational content as aligning with one or more standards?
You know, this question really interested me, so I asked GPT-4 for its opinion. Mainly because I was torn between “well, if the current model only knows CCSS, it may be biased” and “well, embeddings don’t seem like the best option”. I think this response is the most fruitful.
To fine-tune the model for your specific use case, you should consider the following:
Data format: Organize your dataset in a format that clearly presents the educational content and the corresponding non-CCSS standards. This might involve structuring the dataset as pairs of content and associated standards, or using a more elaborate structure if necessary. Ensure that the dataset is clean and well-prepared.
Fine-tuning task: Since your goal is to have the model correlate educational content to alternative standards, you should fine-tune the model on a classification or a ranking task. In a classification task, the model will predict the correct standard for a given content, whereas in a ranking task, the model will rank the most relevant standards for the content. Choose the task that best suits your needs.
Prompt/completion format: When fine-tuning the model, you should provide the input in a consistent format, such as:
"The educational content is: [EDUCATIONAL_CONTENT]. The associated standard is: [ASSOCIATED_STANDARD]."
Use a consistent format for both training and inference. During inference, you can provide the educational content and ask the model to predict the associated standard.
Model selection: Choose an appropriate model to fine-tune, such as a base or a large version of a pre-trained language model like BERT or GPT. Ensure that the model’s architecture and size are suitable for your computational resources and the complexity of your task.
Evaluation: Set aside a portion of your dataset for validation and testing. This will allow you to evaluate the performance of the fine-tuned model and make adjustments as needed.
Fine-tuning a model in this way should help you achieve your goal of correlating educational content with alternative standards. Keep in mind that you might need to experiment with different formats, fine-tuning tasks, and model architectures to achieve the best results.
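As a concrete illustration of the data-format and prompt/completion points above, here is a minimal sketch that writes training pairs to a JSONL file in the legacy OpenAI prompt/completion fine-tuning style. The content snippets and standard codes below are made-up placeholders, not real standards:

```python
import json

# Hypothetical examples: each pairs a snippet of instructional
# content with the standard code it aligns with.
examples = [
    {
        "content": "Students analyze how Earth's position causes seasons.",
        "standard": "NGSS.MS-ESS1-1",
    },
    {
        "content": "Learners practice adding fractions with unlike denominators.",
        "standard": "TEKS.MATH.5.3H",
    },
]

# Write prompt/completion pairs using one consistent template,
# matching the format suggested above.
with open("train.jsonl", "w") as f:
    for ex in examples:
        record = {
            "prompt": f"The educational content is: {ex['content']} The associated standard is:",
            "completion": f" {ex['standard']}",
        }
        f.write(json.dumps(record) + "\n")
```

At inference time you would send the same prompt template with the completion left off, so the model predicts the standard. The exact template wording matters less than using it identically in training and inference.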
Apologies for the late reply, was off the grid for much of the weekend!
This is a really interesting response, appreciate it! It’ll be a lot of work to get the new data for this, but it seems like a really good way to go. Thanks again!!
Great thread to understand usage of embeddings vs fine-tuning (FT) - thank you very much @curt.kennedy , @AgusPG and others.
I want to build a Fine Tuning model that can do the following for unseen Articles:
Classify the category of the article based on a summary of the article. I have 1000s of prompt-completion examples (summary+category) to train an FT model for this.
Extract citations / references from the article. I can create training data pairs by using paragraphs from sample articles that include a citation (prompt) + the citation itself (completion).
Identify keywords - I can create training data for this by giving key parts of the article (prompt) + keywords (completion). Note that the articles themselves are much larger than the FT context limit of 2048 tokens, so they can’t be fed in whole.
**QUESTION:** Can I train a single FT model to do all of the above, or do I need to create 3 separate FT models? Or is there another approach I should consider?
- Would a few hundred training examples be enough for items 2 and 3 above?
Also, any other advice would be gratefully received!
I don’t think a fine-tune will work for this. How will the AI “learn” of unknown or unseen citations? It can’t do this. You are better off with normal code pulling out the citations.
I would avoid a fine-tune here too. Create some sort of “word rarity index” and put all the rare words as keywords.
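One way to sketch such a “word rarity index” is to score each word in the article by its inverse document frequency over a reference corpus and keep the rarest words as keywords. The scoring formula and cutoff here are assumptions for illustration, not a prescribed method:

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def rarity_keywords(article, corpus, top_n=5):
    """Return the top_n words of `article` that appear in the
    fewest documents of `corpus` (i.e. the rarest words)."""
    n_docs = len(corpus)
    # Count how many documents each word appears in.
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    # Smoothed inverse document frequency: rarer word, higher score.
    scores = {
        w: math.log((n_docs + 1) / (doc_freq[w] + 1))
        for w in set(tokenize(article))
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

With a reasonably large corpus of your own articles, common words like “the” score near zero while domain-specific terms float to the top, with no fine-tune involved.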
Re the citations, they have a very specific format such as:
[2023] ABCD 123 Full title of the article. Of course the year varies, the ‘ABCD’ can be one of 6 preset values, the 123 can be any number, and the full title also varies. I was thinking of giving it hundreds of such citations (making sure I include at least 20 examples of each of those 6 preset ‘ABCD’ values) so that it could recognise the pattern. A sample prompt would be something like:
prompt
“The matter under discussion related to a previous project, documented under [2021] IECD 79 Company A - Manual Handling of Loads, that included guidelines for handling goods on pallets.”
completion
[2021] IECD 79 Company A - Manual Handling of Loads
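For comparison, the “normal code” route suggested above can be sketched with a regular expression. The acronym list below is hypothetical beyond the two codes shown in the examples, and the title is assumed to end at a comma, period, or line break, which may need adjusting for real data:

```python
import re

# Hypothetical set of the six preset acronym values; only
# IECD and IDCD appear in the examples above.
ACRONYMS = {"ABCD", "IECD", "IDCD", "EFGH", "IJKL", "MNOP"}

CITATION_RE = re.compile(
    r"\[(?P<year>\d{4})\]\s+"                      # [2021]
    + r"(?:" + "|".join(sorted(ACRONYMS)) + r")\s+"  # one of the preset codes
    + r"(?P<num>\d+)\s+"                           # case number
    + r"(?P<title>[^,.\n]+)"                       # title, assumed to end at , . or newline
)

def extract_citations(text):
    """Return every citation-shaped substring found in `text`."""
    return [m.group(0).strip() for m in CITATION_RE.finditer(text)]
```

Because the year, code, and number parts are rigid, a regex like this never “misses some every time” the way a prompted model can; the only fuzzy part is deciding where the free-text title ends.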
I thought this would work similarly to sentiment classification for Customer reviews, whereby a model trained with a few hundred sample reviews can classify unseen reviews correctly even if they contain phrases that were never seen in training e.g. “Assembling the tent was a complete fiasco” → negative.
@curt.kennedy thanks very much for the screenshot - I’ll try this approach.
I had tried including in the prompt instructions about the format (“the format is the year in square brackets, followed by 1 of these acronyms - ABCD, etc. - followed by a number and a title”) and giving it 2 example citations (“…for example, ‘[2021] IECD 79 Company A - Manual Handling of Loads’ or ‘[2004] IDCD 79 Company B - Stacking Shelves in Retail’”), and it got most of the citations but was still missing some every time. I didn’t put those examples in context as you have, so I will try that now. Thank you!!
For OpenAI, I do not fine-tune; I use prompt engineering and embedding retrieval.
However, for the custom query language I’m working with (which wasn’t on the web in 2021), the models aren’t smart enough to understand all the nuances of the language from prompting alone, so I have to use fine-tuning there. When I tried fine-tuning davinci it didn’t perform very well, though, so I’m now using a fine-tuned MPT-7B model for this special case.