Format of data for finetuning

I’d like to finetune OpenAPI to create exceptional lesson plans. I’m wondering about how to format the lesson plan data. There are lots available on the Internet, but they are whole documents. Teachers generally think of lesson plans as composed of individual components (i.e., learning objective, content, learning activity, relevant standards, assessments, etc.). I was thinking of separating the lesson plans into these components.

Hi @johnfaig, yes, that would be a great start.

Could you explain how you classify the lesson parts, the criteria used, and why?

And what are you trying to teach the fine-tuned model to do? What will be the input, and what will be the output? Why do you think it will work, etc.?

Sorry for asking all those questions, but without them, no chances to get you the exact thing you need (which often is quite different from what people think they want ; ))

Are you planing to let a generative AI model create the plan, or are you planing to have a process which looks up training models and puts together a plan.

The first idea requires fine-tuning; the second idea is more of an embedding search and retrieval task.

Thanks for the clarification questions. I’m a bit of a newbie. I’ve reviewed tutorials and code on embedding and fine-tuning, but haven’t looked much at the underlying data. I think it might be more like the way code search is done where the code and comments are kept together as blocks. Let me explain a little bit.

I’m thinking that lesson plans have about 10 blocks of text and each one is a component (i.e., learning goals, content, learning activities, assessment, etc.). The output from ChatGPT generating complete lesson plans has not been great. So, my thought was to submit an incomplete lesson plan and fill in the missing components. Or, request on of the component be replaced.

Thanks, but something more concrete? Ideally:

Lesson plan is a document describing the various aspects of a lesson to do, contains between 7 and 12 sections, each of those having a title and between 1-4 paragraphs of text, sometimes subtitles (?) And bullet points lists…

Each section describes one aspect of the lesson. They are: here goes a list etc…

The goal of the application is to generate the complete lesson plan when the user inputs just a brief description/lesson subject/choose needed…

Having something like this would basically solve 60% of your problem of not 90%…

Thanks for the response. Actually, I’m thinking that a teacher has an 80% lesson plan and wants the missing blocks. Or, a teacher has a complete lesson plan and wants suggestions for a few new blocks that can be substituted.

I’m also still a little fuzzy on how the data being embedded should be formatted. I look at samples, but I’m not building an understanding.

1 Like