Codex-Spark marks the first milestone in our partnership with Cerebras, which we announced in January. Codex-Spark is optimized to feel near-instant when served on ultra-low-latency hardware, delivering more than 1,000 tokens per second while remaining highly capable on real-world coding tasks.
At launch, Codex-Spark has a 128k context window and is text-only. During the research preview, Codex-Spark will have its own rate limits and usage will not count towards standard rate limits. However, when demand is high, you may see limited access or temporary queuing as we balance reliability across users.
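If you script against the research preview, that temporary queuing is worth handling explicitly. A minimal sketch, assuming queuing surfaces to clients as HTTP 429 responses; the endpoint, model identifier, and payload shape are illustrative assumptions, not confirmed API details:

```python
# Sketch: retry with exponential backoff when the preview is queuing.
# Endpoint, model name, and payload are assumptions for illustration.
import os
import time

import requests

API_URL = "https://api.openai.com/v1/responses"  # assumed endpoint
MODEL = "codex-spark"  # hypothetical model identifier


def complete_with_backoff(prompt: str, max_retries: int = 5) -> dict:
    """Call the model, backing off on temporary queuing (HTTP 429)."""
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    payload = {"model": MODEL, "input": prompt}
    for attempt in range(max_retries):
        resp = requests.post(API_URL, headers=headers, json=payload, timeout=60)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Queued or rate-limited: wait 1s, 2s, 4s, ... before retrying.
        time.sleep(2 ** attempt)
    raise RuntimeError("still queued after retries")
```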
Yes, Codex-Spark's context window is 128,000 tokens, while 5.2-Codex's is 400,000. Switching to the faster model mid-session therefore requires some additional context management.
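To make that "additional management" concrete, here is a minimal sketch of a pre-switch check: before handing a long session to the 128k model, estimate the token count and trim if it will not fit. The tokenizer choice and the trimming strategy are assumptions for illustration, not how any Codex client actually does it:

```python
# Sketch: check that accumulated context fits the smaller window before
# switching models; trim the oldest turns if it does not.
import tiktoken

SPARK_CONTEXT = 128_000
enc = tiktoken.get_encoding("o200k_base")  # tokenizer choice is an assumption


def fits_spark(messages: list[str], reserve_for_output: int = 8_000) -> bool:
    """True if the conversation plus an output budget fits in 128k tokens."""
    used = sum(len(enc.encode(m)) for m in messages)
    return used + reserve_for_output <= SPARK_CONTEXT


def prepare_for_switch(messages: list[str]) -> list[str]:
    """Return a conversation that fits the smaller context window."""
    trimmed = list(messages)
    # Drop the oldest turns first; a real tool might summarize them
    # instead of discarding them outright.
    while not fits_spark(trimmed) and len(trimmed) > 1:
        trimmed = trimmed[1:]
    return trimmed
```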
I plan to use it for single, clearly defined tasks.
With some adjustments to the plan, full Codex could hand such single, clearly defined tasks off to Spark sub-agents, getting both speed and support for complex implementations; see the sketch below.
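A minimal sketch of that delegation idea, assuming a simple scoped/unscoped split; the model identifiers and the `Task` shape are hypothetical, not a real Codex API:

```python
# Sketch: route narrow, latency-sensitive tasks to the fast model and
# keep multi-step or ambiguous work on the full model.
from dataclasses import dataclass

FAST_MODEL = "codex-spark"    # hypothetical identifier
FULL_MODEL = "gpt-5.2-codex"  # hypothetical identifier


@dataclass
class Task:
    description: str
    well_scoped: bool  # single file, clear spec, no cross-cutting changes


def pick_model(task: Task) -> str:
    """Delegate well-scoped edits to the fast sub-agent, else use full Codex."""
    return FAST_MODEL if task.well_scoped else FULL_MODEL


tasks = [
    Task("rename a function across one file", well_scoped=True),
    Task("design and implement a caching layer", well_scoped=False),
]
for t in tasks:
    print(t.description, "->", pick_model(t))
```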
Curious whether anyone finds the speed benefit worth whatever “quality” drop there may be?
With one project running, I find full Codex limits feature throughput, whereas with two or three concurrent “projects” it is fast enough, since you can switch between projects to review and contribute while the others run.
“Real time” is definitely very attractive if you are just doing “one project” at a time.