Maybe. But if this really is some new approach, just imagine what this model could do at a bigger scale, the way o1 was presumably made.
Training on reasoning first for a reasoning model sounds about right though. Gotta lay a good initial foundation.
It seems to me that anything AGI and beyond will need superb reasoning skills.
In the paper, I think they actually start with DeepSeek-V3-Base as their initial foundation.
In this paper, we take the first step toward improving language model reasoning capabilities
using pure reinforcement learning (RL). Our goal is to explore the potential of LLMs to develop
reasoning capabilities without any supervised data, focusing on their self-evolution through
a pure RL process. Specifically, we use DeepSeek-V3-Base as the base model and employ
GRPO (Shao et al., 2024) as the RL framework to improve model performance in reasoning.
This is a somewhat key detail that Jiayi-Pan also discovered in his Twitter thread: the RL works better with more capable baseline models.
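For anyone wondering what GRPO actually does there: instead of training a separate critic, it scores a group of sampled answers per prompt and normalizes each reward against the group. A minimal sketch, with made-up group size and reward values (purely illustrative, not from the paper):

```python
# Minimal sketch of GRPO's group-relative advantage (Shao et al., 2024).
# The group size and reward values are made up for illustration.
import statistics

def group_relative_advantages(rewards):
    """Normalize each sampled answer's reward against its group:
    advantage_i = (r_i - mean(r)) / std(r). No learned critic needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Example: 4 sampled answers to one math prompt, rule-based reward 1 if correct.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # -> [1.0, -1.0, -1.0, 1.0]
```

If the base model almost never gets an answer right within a group, every advantage is near zero and there is barely any signal to learn from, which is presumably part of why a more capable baseline matters.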
I always thought it was weird that Google didn't do it first, since they invented the technology.
Similar to ChatGPT … why didn't anyone create it with the API? They could have. But I think the answer lies in the fact that OpenAI could clone whoever did create the hypothetical ChatGPT (from the API) and then undercut them by running at cost.
So the stranglehold isn't ideas, it's execution and hardware resources. OpenAI just managed to do both the quickest.
So going back to DeepSeek … if DeepSeek can operate with lower compute resources … that's a BIG DEAL! Also why the market freaked out.
Here is the full story on how they went from Zero to the DeepSeek we see now.
However, DeepSeek-R1-Zero encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates a small amount of cold-start data and a multi-stage training pipeline. Specifically, we begin by collecting thousands of cold-start data to fine-tune the DeepSeek-V3-Base model. Following this, we perform reasoning-oriented RL like DeepSeek-R1-Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains such as writing, factual QA, and self-cognition, and then retrain the DeepSeek-V3-Base model. After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to as DeepSeek-R1, which achieves performance on par with OpenAI-o1-1217.
Reading this, it looks like they flowed some structure through Zero, and this structure came from the V3 base model.
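To make the ordering in that paragraph easier to follow, here is the recipe laid out as a rough sketch. Every function and data name is a placeholder of mine; only the sequence of stages comes from the quoted paper:

```python
# Rough sketch of the R1 recipe as quoted above. All names are placeholders;
# only the ordering of the stages comes from the paper.

def sft(model, data):
    """Stand-in for a supervised fine-tuning stage."""
    return f"SFT({model}, {data})"

def reasoning_rl(model, scope="reasoning"):
    """Stand-in for a GRPO-style reinforcement learning stage."""
    return f"RL({model}, {scope})"

def train_deepseek_r1():
    base = "DeepSeek-V3-Base"
    # 1. Cold start: fine-tune the base model on thousands of long-CoT examples.
    ckpt = sft(base, "cold_start_data")
    # 2. Reasoning-oriented RL, like DeepSeek-R1-Zero.
    ckpt = reasoning_rl(ckpt)
    # 3. Near convergence: rejection-sample the RL checkpoint, mix in DeepSeek-V3
    #    SFT data (writing, factual QA, self-cognition), and retrain the *base* model.
    ckpt = sft(base, "rejection_samples + v3_sft_data")
    # 4. Final RL pass over prompts from all scenarios.
    return reasoning_rl(ckpt, scope="all_scenarios")

print(train_deepseek_r1())
```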
I totally agree with that article. Namely it's a win for everyone, and puts pressure on the real frontier model innovators (like OpenAI) to step up their game.
Hints from Sam Altman suggest a lot of "stepping up" is in the pipeline. That's good for us!
"Special thanks to the DeepSeek and SGLang teams for their close collaboration!"
The leakers claim that businesses further down the (AI and HPC) food chain are having to shell out $15,000 per MI300X unit, but this is a bargain when compared to NVIDIA's closest competing package, the venerable H100 SXM5 80 GB professional card. Team Green, similarly, does not reveal its enterprise pricing to the wider public; Tom's Hardware has kept tabs on H100 insider info and market leaks: "over the recent quarters, we have seen NVIDIA's H100 80 GB HBM2E add-in-card available for $30,000, $40,000, and even much more at eBay."
Volatility in the stock market is mean-reverting and exhibits the wisdom of the crowd. In practice, this means that big events will introduce large swings in price for a brief period while the market figures out what the new price should be.
Deepseek is just the newest company around the block to release an open-source model that is comparable to OpenAI's offerings. This has happened before with companies like Hugging Face, Mistral, X, Google, and Microsoft. None of these cases triggered a market reaction this big.
It's widely known that you can run and train AI models on lesser hardware; it just takes longer and is less efficient. Deepseek performing a $6 million fine-tuning job on one of their existing models to add reasoning capabilities seems very plausible to me, but that's just not the entire cost of it.
Deepseek being a Chinese company is likely the unfortunate trigger of the current volatility in the market, as it opens up the possibility of further export restrictions on American chips manufactured in Taiwan.
TL;DR: Deepseek is just another competitor in the AI world. The market reaction is likely due to the geopolitical tensions it may create between the US and China.
Unless you mean to say that implied volatility is mean-reverting, in that volatility reverts to its average, which is true. NVDA should calm down at some point.
Yep, realized volatility also tends to be smaller than implied volatility, but the release of DeepSeek seems to have impacted the average volatility and forced the market to price in a larger amount of uncertainty on chip sector stocks.
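If it helps to picture what "mean-reverting" means here, a toy simulation: volatility jumps on a shock and then gets pulled back toward its long-run level. Every number below is made up purely for illustration:

```python
# Toy mean-reversion simulation: volatility spikes on a shock, then decays back
# toward its long-run average. All parameters are made up for illustration.
import random

def simulate_vol(days=60, long_run=0.25, speed=0.15, shock_day=10, shock=0.45):
    vol, path = long_run, []
    for day in range(days):
        if day == shock_day:
            vol += shock                                          # the surprise event
        vol += speed * (long_run - vol) + random.gauss(0, 0.01)   # pull toward the mean
        vol = max(vol, 0.0)
        path.append(vol)
    return path

p = simulate_vol()
print(f"pre-shock {p[5]:.2f}, at shock {p[10]:.2f}, a month later {p[40]:.2f}")
```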
Remember when Microsoft talked about taking snapshots on your computer?
Well, it's already implemented (every screenshot you take in Windows 11 now shows your location, the date, the name of the app, etc.), and in combination with Operator from OpenAI it will make TikTok look nice and decent in terms of spying on your computer and collecting private data.
It's sort of amazing that the market volatility isn't taking into account that perhaps DeepSeek can't do what it says it can do at scale. Its status page has been pretty red for the past few days. https://status.deepseek.com/
I keep hearing this is just a temporary glitch, but suppose this "glitch" is part of how all this is delivered at such an amazingly low cost? What good is this to any competent business user if it only works sometimes?
In the short term, markets are inherently irrational. Too much bias and subjectivity, generally ill-informed. Very little to conclude right now, but I understand people are always looking for conclusions.
But at this point I think you're not taking into account that there are already multiple options for using the full 671B-parameter R1 model and the other variants outside of the DeepSeek app/web or API, and it seems that more alternatives are emerging.
For example, from fireworks.ai, Hyperbolic, DeepInfra, and others.
Some of them even offer a larger context window than the DeepSeek API, and a much higher tokens-per-second rate than the official DeepSeek API itself.
The price is higher than the direct API, but it allows companies and users with concerns about privacy and the use of their data to work with this model. In fact, I think it's the most-used option in development tools such as Cline or Cursor AI at the moment.
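For what it's worth, most of these providers expose an OpenAI-compatible endpoint, so switching away from the official API is only a few lines of code. A minimal sketch; the base_url and model id below are my assumptions for fireworks.ai, so check the provider's docs before relying on them:

```python
# Minimal sketch: calling R1 through a third-party, OpenAI-compatible provider
# instead of the official DeepSeek API. The base_url and model id are my
# assumptions for fireworks.ai; check the provider's docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",   # assumed endpoint
    api_key="YOUR_PROVIDER_KEY",
)

resp = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-r1",      # assumed model id
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(resp.choices[0].message.content)
```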
My analysis is limited to the programming/coding and technical side of this model and what I see happening; at least in this field it is genuinely very functional.
I think the most interesting thing about all this is not Deepseek itself or its own infrastructure, but the fact that it is open source, how quickly these models are being integrated into companies, even in the USA, and how massively the community (at least the programmer community) is using them.
I'm building a tool right now that will get around the DeepSeek congestion and will be available as a Windows standalone executable and an Android APK version. Once tested and working I will be cancelling my OpenAI account.