Guidance Needed: GPT-OSS 20B Fine-Tuning with Unsloth → GGUF → Ollama → Triton (vLLM / TensorRT-LLM)

I am currently fine-tuning the GPT-OSS 20B model using Unsloth with HuggingFace TRL (SFTTrainer).
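
For context, here is roughly what my training setup looks like. This is a minimal sketch assuming Unsloth's standard LoRA workflow; the model id, LoRA hyperparameters, and schedule are placeholders rather than recommendations.

```python
from datasets import load_dataset
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

# Load the 20B checkpoint in 4-bit via Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and target modules are just my current guess.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# My SFT data: system / user / assistant conversations already rendered into a
# "text" column with the stock chat template (the rendering step is shown further down).
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,  # newer TRL releases call this processing_class
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        max_steps=200,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```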

Deployment goals

  • Long-term: serve the model in production using Triton with either vLLM or TensorRT-LLM as the backend

  • Short-term: initial deployment with Ollama (GGUF)

Current challenge
GPT-OSS uses a Harmony-style chat template, which includes the elements below (a quick way to inspect the rendered output is sketched after the list):

  • developer role

  • Explicit EOS handling

  • thinking / analysis channels

  • Tool / function calling structure
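
To make the template question concrete, this is how I inspect what the stock template actually emits. The model id and messages are just an example; the apply_chat_template call is the standard Transformers API.

```python
from transformers import AutoTokenizer

# The Harmony chat template ships inside the tokenizer config of the checkpoint.
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# Render without tokenizing so the Harmony markup (<|start|>, <|channel|>,
# <|message|>, <|end|>, the developer role, the analysis/final channels) shows
# up as plain text I can compare against whatever Ollama ends up sending.
rendered = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(rendered)

# The EOS side of the question: what the tokenizer itself treats as end-of-sequence.
print(tokenizer.eos_token, tokenizer.eos_token_id)
```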

When converting the fine-tuned model to GGUF and deploying it in Ollama using the default GPT-OSS Modelfile, I am running into ambiguity around the following (my current conversion / Modelfile attempt is sketched after this list):

  1. Whether the default Jinja chat template provided by GPT-OSS should be modified for Ollama compatibility

  2. How to correctly handle:

    • EOS token behavior

    • Internal reasoning / analysis channels

    • Developer role alignment

  3. How to do this without degrading the model’s default performance or alignment
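
For reference, here is roughly what my current conversion and Ollama step look like (continuing from the training snippet above). The save_pretrained_gguf call is Unsloth's GGUF export helper as I understand it, and the stop tokens in the Modelfile are my assumption about Harmony's end-of-turn markers; confirming or correcting that assumption is exactly what I'm after.

```python
import subprocess

# 1) Export the merged fine-tune to GGUF with Unsloth's helper
#    (quantization_method is just what I picked for a first smoke test).
model.save_pretrained_gguf("gguf_out", tokenizer, quantization_method="q8_0")

# 2) Minimal Ollama Modelfile. I deliberately do NOT override TEMPLATE, hoping
#    the chat template embedded in the GGUF metadata is enough -- whether that
#    actually holds is part of my question. The stop tokens are my assumption
#    about Harmony's end-of-turn markers.
modelfile = """\
# Point FROM at whatever .gguf file the export step produced (filename varies).
FROM ./gguf_out/unsloth.Q8_0.gguf
PARAMETER stop "<|return|>"
PARAMETER stop "<|end|>"
"""
with open("Modelfile", "w") as f:
    f.write(modelfile)

# 3) Register the model with Ollama.
subprocess.run(["ollama", "create", "gpt-oss-20b-ft", "-f", "Modelfile"], check=True)
```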

Constraints / Intent

  • I already have training data prepared strictly in system / user / assistant format (how I render it through the stock chat template is sketched after this list)

  • I want to:

    • Preserve GPT-OSS’s native behavior as much as possible

    • Perform accurate, non-destructive fine-tuning

    • Avoid hacks that work short-term but break compatibility with vLLM / TensorRT-LLM later
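
For completeness, this is the rendering step that produces the "text" column used in the training snippet above. I simply trust the stock Harmony template to handle the developer-role mapping, channels, and EOS markup; whether that is enough to keep the fine-tune non-destructive is part of what I'm asking.

```python
# Continuing from the snippets above: each raw record looks like
#   {"messages": [{"role": "system", ...}, {"role": "user", ...}, {"role": "assistant", ...}]}
def format_example(example):
    # Let the stock Harmony template do all of the role / channel / EOS markup;
    # I am not hand-writing any <|start|> / <|end|> markers myself.
    text = tokenizer.apply_chat_template(
        example["messages"],
        tokenize=False,
        add_generation_prompt=False,  # full conversations, assistant reply included
    )
    return {"text": text}

dataset = dataset.map(format_example)  # yields the "text" column used by SFTTrainer
```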

What I’m looking for

  • Has anyone successfully:

    • Fine-tuned GPT-OSS

    • Converted it to GGUF

    • Deployed it with Ollama

    • Preserved the Harmony template behavior throughout?

  • If yes:

    • Did you modify the chat template / Modelfile?

    • How did you handle EOS + reasoning channels?

    • Any pitfalls to avoid to keep it production-ready for Triton later?

Any concrete guidance, references, or proven setups would be extremely helpful.


Bookmarking this one, as I'm working on the same thing.
