We are actively working on a guide on how to approach reinforcement fine-tuning, and will publish it once it’s ready! There are still some details we want to work out first.
At a high level, though, I would say to keep several things in mind:
- The underlying dataset consists of “tasks” – a set of instructions paired with an output that is the correct result of the task (see the sketch after this list)
- Make sure the task is autogradable via the options we make available in the API, i.e. it should be easy to verify whether the task was done correctly or not. We currently support a set of graders and will expand that set over time. I know this is a bit hard because you can’t see the set of graders available until we launch more broadly – but easily gradable tasks (e.g. string match) are more likely to work out of the box. A rough sketch of such a grader follows below.
- Make sure the task is clear enough that if expert humans do it, they also converge on the same answer.
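To make the first point concrete, here is a minimal sketch of what a task might look like. The field names (`instructions`, `reference_output`) are illustrative assumptions on my part, not the actual API schema:

```python
# Two illustrative "tasks": instructions paired with the output that
# counts as a correct result. Field names are made up for illustration;
# they are not the actual API schema.
tasks = [
    {
        "instructions": "Convert 100 degrees Celsius to Fahrenheit. Answer with the number only.",
        "reference_output": "212",
    },
    {
        "instructions": "What is the chemical symbol for gold? Answer with the symbol only.",
        "reference_output": "Au",
    },
]
```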
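And for the second point, here is roughly what “easily gradable” means in practice – a hand-rolled sketch of a string-match grader, assuming exact-match semantics; the graders we actually expose in the API may look different:

```python
def string_match_grader(model_output: str, reference_output: str) -> float:
    """Return 1.0 if the model's answer matches the reference exactly
    (ignoring case and surrounding whitespace), else 0.0."""
    return float(model_output.strip().lower() == reference_output.strip().lower())

# Scores like this provide the reward signal for reinforcement fine-tuning:
print(string_match_grader(" 212 ", "212"))  # 1.0 -> graded as correct
print(string_match_grader("100", "212"))    # 0.0 -> graded as incorrect
```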
More to come here soon!