Hi, have a bunch of multi-turn agent traces (approx 12-16 steps long) generated from o4-mini, want to try the new RFT fine-tuning API. In the examples posted online, the setup is single-turn convo, but since I am using for an agentic usecase, curious if submitting an entire agent trace of multiple Q/A’s along with a grader for the entire agent sequence would work well? Or am I better off isolating and picking out specific agentic turns?
Related topics
| Topic | Replies | Views | Activity | |
|---|---|---|---|---|
| Autoregressive Fine-Tuning for Chat Models | 0 | 187 | July 10, 2024 | |
| Fine tuning with function calling / tools help! | 2 | 272 | November 27, 2024 | |
| Instrruction tuning for GPT API | 7 | 643 | March 11, 2024 | |
| How does gpt-3.5-turbo fine-tuning work? | 10 | 2019 | September 11, 2023 | |
| Extensive documentation about fine tuning | 3 | 331 | March 27, 2025 |