Will we be allowed to fine-tune o1 models in the future?

I just wonder: Will we be allowed to fine-tune o1 models in the future?

I have a specific use case in the education category: an AI assistant for students that solves any question a student couldn’t solve on their own. o1 is pretty good at producing better answers, but I need to (a) align it with the student’s current curriculum, (b) improve accuracy on questions the AI couldn’t solve, and (c) customize the general final output of the AI.

I can partially align it with the curriculum via prompting, and improve the final output by adding another layer with a fine-tuned gpt-4o model, but fine-tuning o1 directly would be better.

I don’t know what to do about accuracy improvement, though. The only thing I can think of is to build my own o1-like model by fine-tuning gpt-4o with a reinforcement-style approach: prepare a dataset by providing the questions and calling the model repeatedly until it finds the correct answer, or by formatting and providing the solution explanations. I suspect this would result in lower accuracy, but I don’t know.
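The "call repeatedly until correct" idea might look roughly like this. This is a minimal offline sketch: `sample_solution` is a hypothetical stand-in for a real model call (e.g. gpt-4o), and the canned answers are made up; the point is the filtering loop that keeps only correct transcripts in the chat format used by OpenAI fine-tuning files.

```python
import itertools
import json

# Stand-in for a real model call; cycles through canned answers so the
# loop can be demonstrated offline without an API key.
_guesses = itertools.cycle(["2", "3", "4"])

def sample_solution(question):
    answer = next(_guesses)
    return {"explanation": f"Worked solution for: {question}", "answer": answer}

def collect_example(question, correct_answer, max_tries=5):
    # Keep only transcripts whose final answer matches the known answer,
    # stored in the chat format expected by fine-tuning files.
    for _ in range(max_tries):
        candidate = sample_solution(question)
        if candidate["answer"] == correct_answer:
            return {
                "messages": [
                    {"role": "user", "content": question},
                    {"role": "assistant",
                     "content": candidate["explanation"] + "\nAnswer: " + candidate["answer"]},
                ]
            }
    return None  # the model never got it right; skip this question

print(json.dumps(collect_example("What is 2 + 2?", "4"), indent=2))
```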


Hi there!

Based on this summary of a recent AMA with OpenAI staff on o1, fine-tuning of o1 models is in the cards, but there is no exact timeline just yet. Personally, I would not expect this to become available until o1 is out of preview.

I am still not 100% sure I fully comprehend everything you are trying to achieve with fine-tuning. Be mindful that fine-tuning is not intended for knowledge injection. You can certainly fine-tune the model to solve questions better by training it on the logical steps it should take. However, the model will not retain the actual questions and answers - at best it will partially pick up a few points here and there. You can read up more on this here.

While you are waiting for o1 fine-tuning to become available, you could try out the new model distillation capability. This would allow you to build a dataset for fine-tuning a gpt-4o model based on o1-preview outputs. If this works for your use case, it would be a much cheaper and readily available option.
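For what it's worth, the distillation route essentially boils down to collecting o1-preview completions and writing them into a fine-tuning file for gpt-4o. A minimal sketch, assuming you have already stored (question, answer) pairs from o1-preview - the sample pairs and the system prompt here are invented placeholders:

```python
import json
from pathlib import Path

# Stand-ins for completions you would collect from o1-preview; only the
# final visible answers are available, since reasoning tokens are not returned.
pairs = [
    ("Solve: 3x + 5 = 11", "3x = 6, so x = 2."),
    ("What is the derivative of x^2?", "d/dx x^2 = 2x."),
]

def to_finetune_jsonl(pairs, path, system_prompt="You are a step-by-step math tutor."):
    # One JSON object per line, in the chat format used by fine-tuning files.
    with open(path, "w", encoding="utf-8") as f:
        for question, answer in pairs:
            record = {"messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]}
            f.write(json.dumps(record) + "\n")

to_finetune_jsonl(pairs, "distill_train.jsonl")
print(Path("distill_train.jsonl").read_text().count("\n"))  # one line per example
```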


Thank you so much for sharing this AMA summary with me!
Yeah, I have actually tried many approaches and even fine-tuned a few models. Chain-of-Thought plus fine-tuning has improved accuracy the most for me. Distillation could be great and I will try it, but reasoning tokens are not shared, so I would get higher accuracy with less effort, though possibly with lower precision.

My plan is to get the actual answer from the user as an answer option or value, try o1-mini without revealing the answer to it, then validate. If the answer is incorrect, try o1-preview. If it’s still incorrect, try the fine-tuned model.
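That escalation loop might look roughly like this - an offline sketch where `ask` and the canned answers are hypothetical stand-ins for real API calls, and `ft:gpt-4o:custom` is a placeholder model name:

```python
# Canned answers simulating each tier; in reality each would be a chat
# completion call against the named model.
CANNED = {
    "o1-mini": "5",
    "o1-preview": "4",
    "ft:gpt-4o:custom": "4",
}

def ask(model, question):
    # Placeholder for a real API call.
    return CANNED[model]

def solve_with_escalation(question, expected_answer,
                          models=("o1-mini", "o1-preview", "ft:gpt-4o:custom")):
    # Try cheaper models first; escalate only when the answer fails
    # validation against the user-supplied answer (which is never shown
    # to the model itself).
    for model in models:
        answer = ask(model, question)
        if answer == expected_answer:
            return model, answer
    return None, None  # every tier failed; fall back to manual review

model, answer = solve_with_escalation("What is 2 + 2?", "4")
print(model, answer)  # o1-mini fails validation, o1-preview succeeds
```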
