Hi everyone
How is everyone doing? I'm excited to introduce two new models:
EpistemeAI/gpt-oss-20b-RL · Hugging Face
- This model is based on GPT-OSS-20B and has been fine-tuned with the Unsloth RL framework to optimize inference efficiency while mitigating vulnerabilities such as reward hacking during reinforcement learning from human feedback (RLHF)–style training. The fine-tuning process emphasizes alignment robustness and efficiency, so the model preserves its reasoning depth without incurring excessive computational overhead. It delivers roughly 3x faster inference for gpt-oss-rl, at ~21 tokens/s, and in BF16 it also achieves its fastest inference (~30 tokens/s).*
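If you want to try it locally, here is a minimal sketch of how you might load and run it with the Hugging Face transformers library. This assumes the model loads through the standard AutoModelForCausalLM/chat-template interface; your dtype, device, and sampling settings may differ, and the prompt is just an illustration.

```python
# Minimal sketch: load EpistemeAI/gpt-oss-20b-RL with transformers.
# Assumes BF16 weights fit on your hardware; adjust device_map/dtype as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EpistemeAI/gpt-oss-20b-RL"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 path (the ~30 tokens/s figure above)
    device_map="auto",
)

# Illustrative prompt; the chat template is assumed to be bundled with the model.
messages = [{"role": "user", "content": "Explain reward hacking in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```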
EpistemeAI/VibeCoder-20B-alpha-0.001 · Hugging Face
- This is a newer version (0.001) of the first-generation vibe-code alpha (preview) LLM. It's optimized to produce both natural-language and code completions directly from loosely structured "vibe coding" prompts. Compared to earlier-generation LLMs, it requires less prompt-engineering overhead and offers smoother latent-space interpolation, making it easier to guide toward usable code.
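A quick sketch of the vibe-coding workflow, where a loose prompt goes straight to the model. Same transformers assumptions as above; the prompt text and sampling settings are illustrative, not prescribed.

```python
# Minimal sketch: loose "vibe coding" prompt with EpistemeAI/VibeCoder-20B-alpha-0.001.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="EpistemeAI/VibeCoder-20B-alpha-0.001",
    device_map="auto",
)

# Loosely structured prompt, no formal spec.
vibe_prompt = (
    "small python script, reads a csv of sales, groups by month, "
    "prints a quick bar chart, keep it simple"
)

result = generator(vibe_prompt, max_new_tokens=300, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])
```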
Please test the fine-tuned models and share any comments or feedback. I really appreciate it.