New fine-tuned GPT-OSS model feedback

Hi everyone

How is everyone doing? I'm excited to introduce two new models:

EpistemeAI/gpt-oss-20b-RL · Hugging Face

  • This model is based on GPT-OSS-20B and has been fine-tuned with the Unsloth RL framework to optimize inference efficiency while mitigating vulnerabilities such as reward hacking during reinforcement learning from human feedback (RLHF)–style training. The fine-tuning process emphasizes alignment robustness and efficiency, so the model preserves its reasoning depth without incurring excessive computational overhead. The model delivers ~3× faster inference for gpt-oss-rl, at ~21 tokens/s; in BF16, it achieves its fastest inference at ~30 tokens/s.
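If you want to sanity-check the tokens-per-second numbers above on your own hardware, here is a minimal sketch. It assumes the standard `transformers` pipeline API; the `tokens_per_second` helper and the benchmark prompt are my own illustration, not part of the model repo, and loading a 20B model requires a GPU with sufficient memory:

```python
import time

def tokens_per_second(n_new_tokens: int, elapsed_s: float) -> float:
    """Throughput in generated tokens per second."""
    return n_new_tokens / elapsed_s

def benchmark(model_id: str = "EpistemeAI/gpt-oss-20b-RL",
              n_new_tokens: int = 256) -> float:
    # Deferred import: loading a 20B model needs a GPU machine.
    from transformers import pipeline
    generator = pipeline("text-generation", model=model_id,
                         torch_dtype="bfloat16", device_map="auto")
    start = time.time()
    generator("Explain RLHF in one paragraph.", max_new_tokens=n_new_tokens)
    return tokens_per_second(n_new_tokens, time.time() - start)
```

For example, generating 256 tokens in about 12.2 seconds corresponds to roughly 21 tokens/s.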

EpistemeAI/VibeCoder-20B-alpha-0.001 · Hugging Face

  • This is a newer version (0.001) of the first-generation vibe-code alpha (preview) LLM. It's optimized to produce both natural-language and code completions directly from loosely structured, “vibe coding” prompts. Compared with earlier-generation LLMs, it has lower prompt-engineering overhead and smoother latent-space interpolation, making it easier to guide toward usable code.
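As a rough illustration of feeding a loosely structured prompt to the model: this is my own sketch, assuming the standard `transformers` pipeline API; the `vibe_prompt` wrapper is hypothetical and not part of the model card.

```python
def vibe_prompt(vibe: str) -> str:
    """Wrap a loose, informal request into a minimal prompt string."""
    return f"Vibe: {vibe}\nWrite the code:\n"

def generate(vibe: str,
             model_id: str = "EpistemeAI/VibeCoder-20B-alpha-0.001") -> str:
    # Deferred import: loading a 20B model needs a GPU machine.
    from transformers import pipeline
    coder = pipeline("text-generation", model=model_id)
    return coder(vibe_prompt(vibe), max_new_tokens=200)[0]["generated_text"]
```

Usage would look like `generate("a function that reverses a string")`, leaving the phrasing as informal as you like.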

Could you please test the fine-tuned models and share any comments or feedback? I really appreciate it.


Any feedback? I'd like to hear everyone's opinion.