New fine-tuned GPT-OSS model feedback

Hi everyone

How is everyone doing? I'm excited to introduce two new models:

EpistemeAI/gpt-oss-20b-RL · Hugging Face

  • This model is based on GPT-OSS-20B and has been fine-tuned with the Unsloth RL framework to optimize inference efficiency while mitigating vulnerabilities such as reward hacking during reinforcement learning from human feedback (RLHF)–style training. The fine-tuning process emphasizes alignment robustness and efficiency, so the model preserves its reasoning depth without incurring excessive computational overhead. The model delivers ~3× faster inference for gpt-oss-rl, at ~21 tokens/s; in BF16, it achieves its fastest inference at ~30 tokens/s.
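If you want to sanity-check the tokens-per-second numbers above on your own hardware, here is a minimal sketch. It assumes the standard `transformers` pipeline API; the `tokens_per_second` helper and the benchmark prompt are my own illustration, not part of the model repo, and loading a 20B model requires a GPU with sufficient memory:

```python
import time

def tokens_per_second(n_new_tokens: int, elapsed_s: float) -> float:
    """Throughput in generated tokens per second."""
    return n_new_tokens / elapsed_s

def benchmark(model_id: str = "EpistemeAI/gpt-oss-20b-RL",
              n_new_tokens: int = 256) -> float:
    # Deferred import: loading a 20B model needs a GPU machine.
    from transformers import pipeline
    generator = pipeline("text-generation", model=model_id,
                         torch_dtype="bfloat16", device_map="auto")
    start = time.time()
    generator("Explain RLHF in one paragraph.", max_new_tokens=n_new_tokens)
    return tokens_per_second(n_new_tokens, time.time() - start)
```

For example, generating 256 tokens in about 12.2 seconds corresponds to roughly 21 tokens/s.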

EpistemeAI/VibeCoder-20B-alpha-0.001 · Hugging Face

  • This is a newer version (0.001) of the first-generation vibe-code alpha (preview) LLM. It's optimized to produce both natural-language and code completions directly from loosely structured, “vibe coding” prompts. Compared with earlier-generation LLMs, it has lower prompt-engineering overhead and smoother latent-space interpolation, making it easier to guide toward usable code.
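As a rough illustration of feeding a loosely structured prompt to the model: this is my own sketch, assuming the standard `transformers` pipeline API; the `vibe_prompt` wrapper is hypothetical and not part of the model card.

```python
def vibe_prompt(vibe: str) -> str:
    """Wrap a loose, informal request into a minimal prompt string."""
    return f"Vibe: {vibe}\nWrite the code:\n"

def generate(vibe: str,
             model_id: str = "EpistemeAI/VibeCoder-20B-alpha-0.001") -> str:
    # Deferred import: loading a 20B model needs a GPU machine.
    from transformers import pipeline
    coder = pipeline("text-generation", model=model_id)
    return coder(vibe_prompt(vibe), max_new_tokens=200)[0]["generated_text"]
```

Usage would look like `generate("a function that reverses a string")`, leaving the phrasing as informal as you like.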

Could you please test the fine-tuned models and share any comments or feedback? I really appreciate it.


Any feedback? I'd like to hear everyone's opinion.