Great thread! I’ve learned a lot here. I do have additional questions, though, on the same topic.
Basically, my use case for a fine-tuned model is this: I work in the education field as a SWE. Every piece of educational content that we release is aligned with educational standards, whether that be the federally backed Common Core State Standards (CCSS), individual states’ customized educational standards, or more a la carte standards from various groups/entities.
All OpenAI models so far (even GPT-4, it seems) only have knowledge of CCSS, which makes sense given how much more CCSS is discussed on the internet than other educational standards sets. So, I’ve compiled a substantial data set of individual non-CCSS educational standards to fine-tune a davinci model with.
Here are my questions:
- Do you all have insight into the prompt/completion format I should use for the training data?
- Part of my use case for this task is getting the model to correlate educational content to standards; that is, having it parse content and decide which educational standards the content aligns with. GPT-3 and GPT-4 currently do this for CCSS, and actually do it very well. But I’d like the model to do the same for the other educational standards sets I’m going to feed it. Will fine-tuning a model help achieve this? If so, is there specific prompt/completion formatting I have to use?
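To make the formatting question concrete, here is a rough sketch of the training file I have in mind. It assumes the legacy prompt/completion JSONL layout (one JSON object per line with `prompt` and `completion` keys). The separator, the stop marker, the helper names, and the standard code are all placeholders I made up for illustration, not anything confirmed in this thread.

```python
import json

# Assumed conventions: a fixed separator ending every prompt and a stop
# marker ending every completion. Both values are placeholders.
PROMPT_SEPARATOR = "\n\n###\n\n"
COMPLETION_STOP = " END"


def make_record(content, standard_codes):
    """Build one prompt/completion pair: content in, aligned standard codes out."""
    return {
        "prompt": content.strip() + PROMPT_SEPARATOR,
        # Completions start with a space, which is my assumption about
        # what tokenizes best for the base models.
        "completion": " " + ", ".join(standard_codes) + COMPLETION_STOP,
    }


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


if __name__ == "__main__":
    # "EXAMPLE.MATH.4.1" is a made-up code, not a real standard.
    records = [
        make_record(
            "Students compare two fractions with different denominators.",
            ["EXAMPLE.MATH.4.1"],
        )
    ]
    print(to_jsonl(records), end="")
```

If this shape is reasonable, my plan would be one record per content/standard pairing, and I’d send the same separator at inference time and use the stop marker as the stop sequence.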
Thank you!