@mcavanaugh
In this context, the categorizer outputs a single token. For example, “CCSS text stuff” → ' 0' and “non-CCSS text stuff” → ' 1'.
Note the space preceding the value of 0 or 1. To run this, set temperature = 0 and max_tokens = 1.
Constraining the output like this avoids ambiguity in the model and makes it more reliable.
You can start with ada or babbage, and work your way up if needed.
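A minimal sketch of that setup, assuming the legacy OpenAI Completions endpoint and a hypothetical fine-tuned model name; the separator string and label mapping are illustrative assumptions, not part of the original advice:

```python
# Sketch of a single-token categorizer call. The model name and the
# "\n\n###\n\n" separator are hypothetical placeholders -- use whatever
# you trained with. The key settings are temperature=0 and max_tokens=1.

def build_request(text: str, model: str) -> dict:
    """Request parameters for a single-token CCSS / non-CCSS categorizer."""
    return {
        "model": model,                  # e.g. a fine-tuned ada or babbage model
        "prompt": text + "\n\n###\n\n",  # same separator used during fine-tuning
        "temperature": 0,                # deterministic output
        "max_tokens": 1,                 # force a single-token answer
    }

def parse_label(completion_text: str) -> str:
    """Map the returned token (' 0' or ' 1') to a human-readable label."""
    return {"0": "CCSS", "1": "non-CCSS"}[completion_text.strip()]

# Usage (the actual API call is sketched, not executed here):
# resp = openai.Completion.create(**build_request(content, "your-fine-tuned-model"))
# label = parse_label(resp["choices"][0]["text"])
```

Note the leading space in the completions (' 0', ' 1'): stripping it before the lookup keeps the parsing robust either way.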
This was super informative and helpful, thank you!! It does seem like perhaps I do want to go with fine-tuning for classification, then. Would you agree?
If I’m feeding a model like 200 words of instructional content, I would want it to “classify” that content as aligning with one or more educational standards. Does that make sense?
I think this makes sense now, thank you!
Do you think, then, that my objective should be to fine-tune a model to categorize selections of educational content as aligning with one or more standards?
Yes, I would try that first.
You know, this question really interested me. So, I asked GPT-4 its opinion on it, mainly because I was torn between the thought of “well, if the current model only knows CCSS, it may be biased” and “well, embeddings don’t seem like the best option”. I think this response is the most fruitful.
To fine-tune the model for your specific use case, you should consider the following:
- Data format: Organize your dataset in a format that clearly presents the educational content and the corresponding non-CCSS standards. This might involve structuring the dataset as pairs of content and associated standards, or using a more elaborate structure if necessary. Ensure that the dataset is clean and well-prepared.
- Fine-tuning task: Since your goal is to have the model correlate educational content to alternative standards, you should fine-tune the model on a classification or a ranking task. In a classification task, the model will predict the correct standard for a given content, whereas in a ranking task, the model will rank the most relevant standards for the content. Choose the task that best suits your needs.
- Prompt/completion format: When fine-tuning the model, you should provide the input in a consistent format, such as:
[EDUCATIONAL_CONTENT] [SEPARATOR] [ASSOCIATED_STANDARD]
or
"The educational content is: [EDUCATIONAL_CONTENT]. The associated standard is: [ASSOCIATED_STANDARD]."
Use a consistent format for both training and inference. During inference, you can provide the educational content and ask the model to predict the associated standard.
- Model selection: Choose an appropriate model to fine-tune, such as a base or a large version of a pre-trained language model like BERT or GPT. Ensure that the model’s architecture and size are suitable for your computational resources and the complexity of your task.
- Evaluation: Set aside a portion of your dataset for validation and testing. This will allow you to evaluate the performance of the fine-tuned model and make adjustments as needed.
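The prompt/completion pairing described above can be sketched as JSONL, the layout the legacy OpenAI fine-tuning endpoint expected. The separator string and the example standard codes below are made-up placeholders, not real CCSS identifiers:

```python
import json

# Build one JSONL training record per (content, standard) pair.
# SEPARATOR and the standard codes are illustrative assumptions.
SEPARATOR = "\n\n###\n\n"

def to_training_record(content: str, standard: str) -> str:
    """One JSONL line pairing educational content with its associated standard."""
    return json.dumps({
        "prompt": content + SEPARATOR,
        "completion": " " + standard,  # leading space, matching the earlier advice
    })

# Hypothetical example pairs -- replace with your real dataset.
pairs = [
    ("Students compare fractions with unlike denominators.", "MATH.4.NF.2"),
    ("Students identify the main idea of a passage.", "ELA.RI.4.2"),
]

jsonl = "\n".join(to_training_record(c, s) for c, s in pairs)
print(jsonl)
```

At inference time you would send only the content plus the separator as the prompt, in the same format used for training.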
Fine-tuning a model in this way should help you achieve your goal of correlating educational content with alternative standards. Keep in mind that you might need to experiment with different formats, fine-tuning tasks, and model architectures to achieve the best results.
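The evaluation step above (setting aside validation and test portions, then scoring the model) can be sketched like this; the split fractions and the `predict` callable are assumptions for illustration:

```python
import random

# Hold-out evaluation sketch: shuffle labeled pairs, carve off validation
# and test sets, and score a classifier's accuracy on held-out data.

def split(pairs, val_frac=0.1, test_frac=0.1, seed=42):
    """Return (train, validation, test) partitions of the labeled pairs."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    return (shuffled[n_test + n_val:],          # train
            shuffled[n_test:n_test + n_val],    # validation
            shuffled[:n_test])                  # test

def accuracy(predict, labeled):
    """Fraction of examples where predict(content) matches the true standard."""
    if not labeled:
        return 0.0
    correct = sum(1 for content, standard in labeled if predict(content) == standard)
    return correct / len(labeled)
```

Here `predict` stands in for whatever calls your fine-tuned model; swapping in a trivial baseline predictor first gives you a floor to compare against.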
Seems like the best use case for fine-tuning so far is classification… Sorry I’ve been a little AWOL from this thread. Been coding.
I don’t do classification; I think there are many uses for fine-tuning besides that.
Apologies for the late reply, was off the grid for much of the weekend!
This is a really interesting response, appreciate it! It’ll be a lot of work to get the new data for this, but it seems like a really good way to go. Thanks again!!