Reinforcement fine-tuning (RFT) is now available for o4-mini! You might remember we announced the alpha program for RFT during the 12 Days of OpenAI last December. We’ve been working on it since, and verified organizations can get started with it today.

This marks the first time you can fine-tune an OpenAI reasoning model. RFT is a new technique that uses chain-of-thought reasoning and task-specific grading to improve model performance on your specific domains. One of our alpha program members, Accordance, used RFT and saw a 40% increase in model performance on their tax and accounting tasks.

We’re also offering a 50% discount if you share your datasets with us, which can help improve future OpenAI models.

Get started with our reinforcement fine-tuning guide: https://platform.openai.com/docs/guides/reinforcement-fine-tuning
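For anyone who wants a concrete starting point before reading the full guide, here is a minimal sketch of creating an RFT job through the fine-tuning jobs endpoint. The file ID is a placeholder, and the grader shown is the simple string-check type; check the guide for the exact schema and the other grader types.

```python
from openai import OpenAI

client = OpenAI()

# Minimal sketch of a reinforcement fine-tuning job on o4-mini.
# The grader below is a simple string-check grader; see the RFT guide
# for the other grader types (text similarity, model-based, Python, multi)
# and their exact schemas.
job = client.fine_tuning.jobs.create(
    model="o4-mini-2025-04-16",
    training_file="file-abc123",  # placeholder: your uploaded JSONL dataset
    method={
        "type": "reinforcement",
        "reinforcement": {
            "grader": {
                "type": "string_check",
                "name": "exact_match",
                "input": "{{sample.output_text}}",
                "reference": "{{item.correct_answer}}",
                "operation": "eq",
            },
        },
    },
)
print(job.id, job.status)
```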
And as an update to standard (supervised) fine-tuning, we’ve added the ability to fine-tune our fastest, cheapest model, GPT-4.1 nano.
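Supervised fine-tuning of GPT-4.1 nano uses the same jobs endpoint. A minimal sketch, with a placeholder file ID and assuming the dated snapshot name:

```python
from openai import OpenAI

client = OpenAI()

# Standard supervised fine-tuning job on gpt-4.1-nano.
# training_file must reference an uploaded JSONL file of
# chat-formatted examples ({"messages": [...]}).
job = client.fine_tuning.jobs.create(
    model="gpt-4.1-nano-2025-04-14",  # assumed snapshot name; check the models page
    training_file="file-def456",      # placeholder file ID
)
print(job.id)
```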
OpenAI might not be pregnant, but never fails to deliver.
I hope that someday features like this, and reasoning summaries in the API, can be made available to folks like me who don’t want to provide government ID. This all sounds so cool, but the risk is a smidge too high.
And thanks for allowing fine-tuning on 4.1-nano! It’s a pretty weak model, so this will be a very big help. Many thanks for your hard work.
I opened another thread in the API feedback category, but I just saw this post, so sorry for the double post. First, I'm very excited to explore RL fine-tuning. Great work on that.
I have two questions:
First, can we use SFT fine-tuned models as graders in the RL pipeline? I have some supervised fine-tuned models that perform well in-distribution but poorly out-of-distribution. I assume that if I use one of them as a grader in the RL pipeline, RL might learn from it and generalize out-of-distribution too. Any idea on that? Does it make sense?
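Something like this grader config is what I have in mind (a sketch only; I'm assuming a model-based grader can reference a fine-tuned model ID, and the field names follow the grader schema in the RFT guide):

```python
# Sketch: pointing a score_model grader at one of my SFT fine-tuned models.
# The model ID, range, and input template are illustrative placeholders.
grader = {
    "type": "score_model",
    "name": "sft_judge",
    "model": "ft:gpt-4.1-nano-2025-04-14:my-org::abc123",  # placeholder fine-tuned model ID
    "input": [
        {
            "role": "user",
            "content": (
                "Rate the answer from 0 to 1.\n"
                "Answer: {{sample.output_text}}\n"
                "Reference: {{item.reference}}"
            ),
        }
    ],
    "range": [0, 1],
}
```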
Second, I like that we can run a custom Python grader. However, I have some deep learning models that I'd like to use as graders. I'm running them behind a REST API on Azure, but your RL pipeline doesn't allow network access from graders. Is there any chance you could provide network connectivity so graders can POST to REST APIs? It would also reduce the resource demand on your side. Alternatively, could you add a REST API grader option, so that we can run graders on our own servers and return scores to the RL pipeline? Any thoughts on that?
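For illustration, this is roughly the Python grader I would write if outbound requests were allowed (the endpoint URL and response fields are placeholders for my own service; I'm assuming the documented `grade(sample, item) -> float` signature):

```python
import requests  # not usable today: graders currently run without network access

# Placeholder URL for my deployed Azure scoring service.
AZURE_ENDPOINT = "https://example.azurewebsites.net/score"

def grade(sample: dict, item: dict) -> float:
    """Return a 0-1 score from my remote deep learning grader."""
    resp = requests.post(
        AZURE_ENDPOINT,
        json={
            "output": sample["output_text"],
            "reference": item.get("reference"),
        },
        timeout=10,
    )
    resp.raise_for_status()
    return float(resp.json()["score"])
```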
@Karan_Sharma The post says “FT updates… gpt-4.1-nano now available” to everyone, but I don’t have access to it, and I need to fine-tune this model for my work. How can I get access?