Best way to download and run gpt-oss-20b

What’s the best way to download and run gpt-oss-20b locally?

Is there an official Hugging Face checkpoint?

What are the minimum VRAM/CPU specs?

4 Likes

Easiest way? Ollama.
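For reference, a minimal Ollama session looks something like this (the `gpt-oss:20b` tag is what Ollama lists for the small model at time of writing; check the Ollama model library if it has changed, and note the weights are a multi-gigabyte download on first run):

```shell
# Install Ollama (macOS/Linux; see ollama.com for Windows)
curl -fsSL https://ollama.com/install.sh | sh

# Pull the 20B model and drop into an interactive chat
ollama run gpt-oss:20b

# Or keep it running in the background to serve API requests
ollama serve
```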

For more (including specs), try looking into these:

4 Likes

Adding to @aprendendo.next’s great reply: the models need unified memory.
I managed to run the small model with 6GB of VRAM + 20GB of RAM, and the large one with 64GB of RAM.
Latency is fine; tokens stream in at about normal reading speed.
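Those numbers line up with a back-of-the-envelope estimate for the weights alone. A sketch (the 4 bits/weight figure assumes MXFP4-style quantization; KV cache and runtime overhead come on top, which is why real usage is higher):

```python
def estimate_weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough memory needed just for model weights, ignoring KV cache and runtime overhead."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

# gpt-oss-20b at ~4 bits/weight
small = estimate_weight_memory_gb(20, 4)   # ~10 GB
# gpt-oss-120b at the same quantization
large = estimate_weight_memory_gb(120, 4)  # ~60 GB

print(f"20B @ 4-bit: ~{small:.0f} GB, 120B @ 4-bit: ~{large:.0f} GB")
```

So roughly 10 GB for the small model (split across 6 GB VRAM + system RAM) and roughly 60 GB for the large one, consistent with the 64 GB RAM figure above.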

4 Likes

Out of interest, what API model might you compare their overall usefulness with?

Do they exceed GPT-3.5 Turbo “performance”? (as in efficacy, not speed)

3 Likes

For my use cases I could replace o3-mini, and sometimes o4-mini, with the large OSS model.
I didn’t spend much time with the smaller version, since latency matters more to me for smaller models, and in that case using more VRAM is obviously advised.

3 Likes

I’m looking into runpod.io to migrate my analysis app to the 20B OSS model: 10 tasks, all based on the same core model, with 10 fine-tuned adapters preloaded in memory… any experience with doing this on a reasonable budget would be welcome.
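For the adapter-preloading part, one common approach is vLLM’s multi-LoRA serving, which keeps a single copy of the base weights in memory and selects an adapter per request. A sketch, not a tested recipe (the adapter names and paths are hypothetical, and you should verify that your adapters and the gpt-oss architecture are supported by your vLLM version):

```shell
# Serve the base model with several LoRA adapters registered up front.
# Each --lora-modules entry is name=path; a client picks an adapter by
# putting its name in the "model" field of an OpenAI-style request.
vllm serve openai/gpt-oss-20b \
  --enable-lora \
  --max-loras 10 \
  --lora-modules task1=/adapters/task1 task2=/adapters/task2
```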

What is the cheapest GPU VM provider, other than vast.ai?

I’ve been running this on my M2 Pro Mac mini almost non-stop for two weeks now, and even with everything else running on the machine, RAM usage never went above 50% (32 GB model). The GPU utilization never hits the ceiling for me either. I use it regularly in agents that are mostly Q&A, meta-search, document summarization, and web-crawl scenarios, served via `ollama serve` with its OpenAI-compatible API.
Hope that gives you a good idea of what is needed. Any M-series machine with at least 16 GB of RAM should suffice for your own development/personal use.
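For anyone wiring agents up the same way: Ollama exposes an OpenAI-compatible endpoint on its default port, so a minimal request looks something like this (model tag assumed to be `gpt-oss:20b`; requires `ollama serve` to be running):

```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt-oss:20b",
        "messages": [
          {"role": "user", "content": "Summarize the benefits of local inference in two sentences."}
        ]
      }'
```

Because the endpoint speaks the OpenAI wire format, existing OpenAI client libraries work by just overriding the base URL.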