Faster API throughput using Cerebras

Hi there, good day!

I am quite interested in how fast Groq and Cerebras have made their systems, and I want to know whether it would be possible (in theory) for OpenAI to use their compute to speed up GPT models.

From OpenRouter, we can see a very significant increase in throughput compared to other providers:

GPT OSS 120B:

7x more tokens per second (compared to Novita).

With Qwen, the gap is about 16x (compared with Together). (I cannot include more than one image and no links, so please look the numbers up on OpenRouter.)

So, is GPT not able to run on Cerebras hardware? Am I wrong and missing a critical detail?

Could OpenAI allow us to run GPT on Cerebras similar to how we can run GPT on Azure? Or could they offer the models to us running on Cerebras systems? I would even pay more per token to run it there if we have speed improvements.

There is also Groq, but Cerebras tends to be faster in the same scenarios (I ran 30 prompts, 5 times each, with GPT OSS 120B and 20B).
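For reference, this is roughly how I measured it: a minimal sketch (not my exact script) that times end-to-end tokens per second through OpenRouter’s chat completions endpoint. The provider-pinning fields follow my understanding of OpenRouter’s provider routing docs, so double-check them, and the prompt list is a placeholder.

```python
# Rough throughput check; assumes OPENROUTER_API_KEY is set in the environment.
# Measures end-to-end tokens/sec (includes time-to-first-token and network latency).
import os
import time
import requests

API_URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

def tokens_per_second(model: str, provider: str, prompt: str) -> float:
    """Send one prompt pinned to a single provider; return completion tokens / wall-clock seconds."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Pin the request to one provider so the timings are comparable.
        "provider": {"order": [provider], "allow_fallbacks": False},
    }
    start = time.perf_counter()
    resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=120)
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    return resp.json()["usage"]["completion_tokens"] / elapsed

if __name__ == "__main__":
    prompts = ["Explain backpropagation in two paragraphs."]  # my real run used 30 prompts, 5 times each
    for provider in ("Cerebras", "Groq"):
        rates = [tokens_per_second("openai/gpt-oss-120b", provider, p) for p in prompts]
        print(provider, round(sum(rates) / len(rates), 1), "tok/s")
```

It is only a rough end-to-end comparison, not a proper benchmark, but the same wall-clock measurement is applied to every provider.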

I think that, in the long run, OpenAI will have more compute than anyone - just look at the recent deals made with Oracle, Broadcom, and Microsoft.

I agree, but the speed of the models is not just about having more physical hardware than anyone else. Cerebras’s and Groq’s chips are simply faster at the computation itself (at least if I understand it correctly).

If we take this blog post (www.cerebras.ai/blog/cerebras-cs-3-vs-nvidia-b200-2024-ai-accelerators-compared) by Cerebras (I know there is bias in it, but just take a look at the results), we can see that their hardware is simply faster, more memory-dense, and processes more data at once. So I think (PS: not a hardware expert) that, in theory, it should be faster than the current tech we have.

If we take a look at GPT-OSS 120B on OpenRouter (openrouter.ai/openai/gpt-oss-120b/providers?sort=throughput), we can see that the top performers for throughput are Cerebras, SambaNova (another custom chip), and Groq. So they are doing something right, or I am missing some critical information. If so, please show me where I am wrong.

PS: Links are in plain text because I cannot add links to this post, but without them it would be difficult to contextualize the information.

What your test doesn’t show (and can’t show right now) is how Cerebras hardware can scale.

OpenAI has a massive customer base - and it’s constantly growing. Customers may view Cerebras as a risk compared to the current plans OpenAI has for compute - Just sayin…

Please note: As an independent software engineer, this is only my opinion.

What your test doesn’t show (and can’t show right now) is how Cerebras hardware can scale.

I agree with you partially. You can’t test scale if you don’t test scale. I am not saying, “Hey OpenAI, today, change your whole infra to Cerebras and see what happens,” but it would be cool if they allowed us to use them somehow for the (supposed) speed improvements.

Customers may view Cerebras as a risk compared to the current plans OpenAI has for compute

True, but they (Cerebras) offer a cloud platform, so you can literally just host models there, like we can do in Azure. Wouldn’t this solve the “trust” problem?

They could also just release a “new” model, let’s say gpt-5-cerebras, which would use the same underlying gpt-5 model, just on Cerebras infra (could be their cloud platform).

Please note: As an independent software engineer, this is only my opinion.

Good, software engineer here as well. And I am asking for it, so do not worry.

Another note: I am saying Cerebras because I saw Cerebras first (compared to Groq, SambaNova, and DeepInfra), but if there is other, faster infra, just use that. I am in no way connected to Cerebras; I just use GPT OSS 120B and other models on OpenRouter via the Cerebras provider, and I have seen a lot of speed improvements in my products.
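For context, this is roughly how I select the Cerebras provider in my own products: a small sketch using the OpenAI Python SDK pointed at OpenRouter, with a provider routing block passed via extra_body. The routing field names reflect my reading of OpenRouter’s provider routing docs, so verify them before relying on this.

```python
# Sketch: calling gpt-oss-120b through OpenRouter while pinning the Cerebras provider.
# Assumes OPENROUTER_API_KEY is set; verify the provider routing fields against OpenRouter's docs.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Say hello in five words."}],
    # Ask OpenRouter to route to Cerebras only, instead of falling back to other providers.
    extra_body={"provider": {"order": ["Cerebras"], "allow_fallbacks": False}},
)
print(response.choices[0].message.content)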

Well, it’s true that enterprise customers need a protected environment and gpt-oss-120b could be a compelling solution when hosted by Cerebras…

But enterprises can create their own H100 stack and host their own gpt-oss-120b, right?

Enterprises can create their own H100 stack and host gpt-oss-120b on-premise. Hosting the model in-house offers greater control over data, security, and costs, but it requires significant investment in specialized hardware and infrastructure.

Requirements for hosting gpt-oss-120b on-premise

Hardware

  • H100 GPUs: A single NVIDIA H100 80GB GPU is sufficient for running gpt-oss-120b for inference due to its highly optimized Mixture-of-Experts (MoE) architecture (a rough back-of-envelope estimate follows this list). For higher throughput and performance, especially under heavy load, deploying the model across multiple H100 GPUs (e.g., 4x or 8x) is recommended.

  • GPU memory (VRAM): A single H100 provides the 80GB of VRAM required to load the model and its active parameters.

  • System memory (RAM): The server or workstation requires a minimum of 128GB of system RAM for the 120B model, in addition to the VRAM on the GPU.

  • Networking: High-speed networking, such as the 350Gbps offered on cloud services, is essential for minimizing bottlenecks, especially in multi-GPU deployments.

  • Reference designs: Hardware vendors like Dell offer validated solutions, such as the “Dell AI Factory,” which combines PowerEdge servers with NVIDIA GPUs for implementing AI workloads on-premises.
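As a sanity check on the single-H100 claim above, here is a rough back-of-envelope estimate. The parameter count and bits-per-weight are my own assumptions (roughly 117B total parameters and ~4-bit MXFP4 weights for the MoE layers), so treat the numbers as ballpark only.

```python
# Back-of-envelope VRAM estimate for gpt-oss-120b on a single 80 GB H100.
# Assumed figures (mine, not official): ~117B total parameters, ~4.25 bits/weight
# for the MXFP4-quantized MoE layers, plus a rough allowance for KV cache and runtime.
total_params = 117e9
bits_per_weight = 4.25

weights_gb = total_params * bits_per_weight / 8 / 1e9   # ~62 GB of weights
kv_and_overhead_gb = 10                                  # rough KV cache / activations / runtime
total_gb = weights_gb + kv_and_overhead_gb

print(f"weights ~= {weights_gb:.0f} GB, total ~= {total_gb:.0f} GB")
# ~62 GB of weights and ~72 GB total stays under 80 GB, which is consistent
# with the single-H100 sufficiency claim in the hardware list above.
```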

Software and deployment

  • Inference servers: To run the model efficiently, enterprises need to deploy an inference server. Common options include:

    • NVIDIA NIM: Optimized for NVIDIA’s Hopper and Blackwell GPUs.

    • vLLM: A popular and highly performant inference engine (a minimal serving sketch follows this list).

    • Ollama: A simpler tool for local or on-premise model hosting.

  • Model optimization: Frameworks like NVIDIA’s TensorRT-LLM can be used to optimize the model for specific hardware configurations and improve performance.

  • Licensing: Since gpt-oss-120b is released under a permissive Apache 2.0 license, enterprises can use it for experimentation, commercial deployment, and fine-tuning without significant restrictions.
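As a concrete illustration of the vLLM option above, here is a minimal self-hosting sketch. The serve command and flags are what I would expect from recent vLLM versions, so check the vLLM docs for your exact version; the client side simply uses the OpenAI-compatible endpoint that vLLM exposes.

```python
# Sketch: querying a self-hosted gpt-oss-120b served by vLLM's OpenAI-compatible server.
# Start the server first, e.g. (flags may differ by vLLM version; this is an assumption):
#   vllm serve openai/gpt-oss-120b --port 8000
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",   # vLLM's OpenAI-compatible endpoint
    api_key="not-needed-locally",          # vLLM ignores the key unless --api-key is set
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "List three benefits of on-premise hosting."}],
)
print(response.choices[0].message.content)
```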

Business and security considerations

  • Data sovereignty: For enterprises handling sensitive or regulated data, hosting the model on-premise ensures that data processing remains within a controlled and chosen region, meeting data sovereignty requirements.

  • Customization: Self-hosting gives enterprises complete control to fine-tune and adapt the model with their proprietary data, creating a more tailored and powerful solution.

  • Cost avoidance: Enterprises can avoid recurring API fees and manage costs more predictably than with a pay-per-use cloud service.