What is the impact of DeepSeek on the AI sector?

Maybe. But if this really is some new approach, just imagine what this model would do at a bigger scale, which is presumably how o1 was made.

Training on reasoning first for a reasoning model sounds about right though. Gotta lay a good initial foundation.

It seems to me that anything AGI and beyond will need superb reasoning skills.

1 Like

In the paper, I think they actually start with DeepSeek-V3-Base as their initial foundation:

In this paper, we take the first step toward improving language model reasoning capabilities
using pure reinforcement learning (RL). Our goal is to explore the potential of LLMs to develop
reasoning capabilities without any supervised data, focusing on their self-evolution through
a pure RL process. Specifically, we use DeepSeek-V3-Base as the base model and employ
GRPO (Shao et al., 2024) as the RL framework to improve model performance in reasoning.

This is a key detail that Jiayi-Pan also discovered in his Twitter thread: the RL works better with more capable baseline models.
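For the curious, the core GRPO trick referenced in the quote is simple to sketch: there is no learned value critic; instead, a group of answers is sampled per prompt and each answer’s advantage is its reward normalized against the group. A toy illustration (my own naming and numbers, not DeepSeek’s code):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """For one prompt, sample a group of G answers and score each one;
    an answer's advantage is its reward standardized against the group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# e.g. 4 sampled answers to one math prompt, scored 1/0 by a rule-based checker
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # correct answers get positive advantage
```

Those advantages then weight a clipped policy-gradient update, PPO-style, which is part of why a more capable base model (one that actually samples some correct answers to reward) matters so much.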

1 Like

I always thought it was weird that Google didn’t do it first, since they invented the technology.

Similar to ChatGPT … why didn’t anyone build it with the API? They could have. But I think the answer is that OpenAI could clone whoever built that hypothetical ChatGPT (from the API) and then undercut them by running at cost.

So the stranglehold isnā€™t ideas, itā€™s execution and hardware resources. OpenAI just managed to do both the quickest.

So going back to DeepSeek … if DeepSeek can operate with lower compute resources … that’s a BIG DEAL! Also why the market freaked out.

2 Likes

Here is the full story on how they went from Zero to the DeepSeek we see now.

However, DeepSeek-R1-Zero encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates a small amount of cold-start data and a multi-stage training pipeline. Specifically, we begin by collecting thousands of cold-start data to fine-tune the DeepSeek-V3-Base model. Following this, we perform reasoning-oriented RL like DeepSeek-R1-Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains such as writing, factual QA, and self-cognition, and then retrain the DeepSeek-V3-Base model. After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to as DeepSeek-R1, which achieves performance on par with OpenAI-o1-1217.

Reading this, it looks like they flowed some structure through Zero, and this structure came from the V3 base model.
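As a compressed outline (purely my paraphrase of the quoted paragraph; every name below is a hypothetical placeholder, not DeepSeek’s actual code), the pipeline reads roughly as:

```python
# Paraphrase of the quoted R1 recipe; each stub stands in for a whole training stage.
def sft(base, data):         return f"sft({base})"
def reasoning_rl(model):     return f"rl({model})"
def rejection_sample(model): return [f"samples from {model}"]

v3_base, cold_start, v3_supervised = "DeepSeek-V3-Base", ["long-CoT examples"], ["writing, factual QA, self-cognition"]

m  = sft(v3_base, cold_start)             # 1. fine-tune on thousands of cold-start examples
m  = reasoning_rl(m)                      # 2. reasoning-oriented RL, as in R1-Zero
d  = rejection_sample(m) + v3_supervised  # 3. new SFT data from the near-converged RL checkpoint
m  = sft(v3_base, d)                      # 4. note: retrain from the *base* model, not the checkpoint
r1 = reasoning_rl(m)                      # 5. final RL pass over prompts from all scenarios
```

The detail worth underlining is step 4: the rejection-sampled data goes back into a fresh fine-tune of V3-Base rather than continuing from the RL checkpoint.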

2 Likes

Clueful; couldn’t find anything I disagree with. Still a bit gobsmacked that $600B can be shed on a misunderstanding.

I’ll be very shocked if NVDA fully recovers over the next few days. I mean, no way could the stock market be that stupid. Right? Right? :smiley:

2 Likes

I totally agree with that article. Namely, it’s a win for everyone, and it puts pressure on the real frontier-model innovators (like OpenAI) to step up their game.

Hints from Sam Altman suggest a lot of “stepping up” is in the pipeline. That’s good for us!

3 Likes

Ah, I get the Groq reference now. Yeah, this makes more sense tbh.

As DeepSeek is open source, it allows competitors like AMD and Groq to build solutions that compete against OpenAI+NVDA, which is what this link is about.

No doubt NVDA has all sorts of special deals with OpenAI and Anthropic to keep them in the lead.

Nasty business, but maybe this will bust it open.

2 Likes

Yeah, people tend to underestimate Mr. Market.

“Special thanks to the DeepSeek and SGLang teams for their close collaboration!”

The leakers claim that businesses further down the (AI and HPC) food chain are having to shell out $15,000 per MI300X unit, but this is a bargain when compared to NVIDIA’s closest competing package—the venerable H100 SXM5 80 GB professional card. Team Green, similarly, does not reveal its enterprise pricing to the wider public—Tom’s Hardware has kept tabs on H100 insider info and market leaks: “over the recent quarters, we have seen NVIDIA’s H100 80 GB HBM2E add-in-card available for $30,000, $40,000, and even much more at eBay.”

So roughly a 2x to 4x premium for NVDA ($30k–$40k+ versus $15k per unit).

1 Like

Volatility in the stock market is mean-reverting and exhibits the wisdom of the crowd. In practice, this means that big events will introduce large swings in price for a brief period while the market figures out what the new price should be.

DeepSeek is just the newest company on the block to release an open-source model that is comparable to OpenAI’s offerings. This has happened before with companies like Hugging Face, Mistral, X, Google, and Microsoft; none of those cases triggered a market reaction this big.

It’s widely known that you can run and train AI models on lesser hardware; it just takes longer and is less efficient. DeepSeek performing a $6 million fine-tuning job on one of their existing models to add reasoning capabilities seems very plausible to me, but that’s just not the entire cost of it.

DeepSeek being a Chinese company is likely the unfortunate trigger of the current volatility in the market, as it opens up the possibility of further export restrictions on American chips manufactured in Taiwan.

TL;DR: DeepSeek is just another competitor in the AI world. The market reaction is likely due to the geopolitical tensions it may create between the US and China.

5 Likes

Oooh, that’s good too, for real.

It would also explain the recovery, given Trump speaking out, congratulating DeepSeek on their achievement, and calming things.

Nothing worse than the stock market going down! heh

2 Likes

Eeeeh, no. Lol. That would be free money.

Unless you mean to say that implied volatility is mean-reverting, i.e., that volatility reverts to its long-run average. Which is true. NVDA should calm down at some point.

2 Likes

Yep, realized volatility also tends to be smaller than implied volatility, but the release of DeepSeek seems to have impacted the average volatility and forced the market to price in a larger amount of uncertainty on chip sector stocks.
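To make that concrete, here is a toy sketch (my own illustration under simplified assumptions, not a pricing model): volatility modeled as a mean-reverting random walk, where a shock like DeepSeek both spikes the level and lifts the long-run mean it reverts to.

```python
import random

def simulate_vol(days: int, long_run_mean: float, start: float,
                 kappa: float = 0.3, noise: float = 0.01) -> list[float]:
    """Toy mean-reverting volatility path: each day, the level is pulled
    back toward long_run_mean at speed kappa, plus Gaussian noise."""
    vol, path = start, []
    for _ in range(days):
        vol += kappa * (long_run_mean - vol) + random.gauss(0.0, noise)
        path.append(max(vol, 0.0))
    return path

# Pre-shock: vol hovers near 30%. Post-shock: it spikes to 60% AND the
# long-run mean shifts to 40%, so it settles higher than it was before.
before = simulate_vol(20, long_run_mean=0.30, start=0.30)
after  = simulate_vol(20, long_run_mean=0.40, start=0.60)
```

The second path decays back toward a mean, but a higher one: that is what “pricing in a larger amount of uncertainty” looks like.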

5 Likes

Remember when Microsoft talked about taking snapshots of your computer?

Well, it’s already implemented (every print screen you take in Windows 11 now records your location, the date, the name of the app, etc.), and in combination with Operator from OpenAI it will make TikTok look nice and decent in terms of spying on your computer and collecting private data.


1 Like

It’s sort of amazing that the market volatility isn’t taking into account that perhaps DeepSeek can’t do what it says it can do at scale. Its status page has been pretty red for the past few days. https://status.deepseek.com/

I keep hearing this is just a temporary glitch, but suppose this “glitch” is part of how all this is delivered at such an amazingly low cost? What good is this to any competent business user if it only works sometimes?

1 Like

In the short term, markets are inherently irrational: too much bias and subjectivity, generally ill-informed. There is very little to conclude right now, but I understand people are always looking for conclusions.

4 Likes

It’s all fun until:

Better make sure your infrastructure is up to scratch before marketing your product :slight_smile:

I guess their engineering team is scrambling to support 100m sign-ups a day right now. :sweat_smile:

And then there’s the question of whether they can sustain acceptable service levels …

2 Likes

But at this point I think you’re not taking into account that there are already multiple options for using the full 671B-parameter R1 model and the other variants outside of the DeepSeek app/web or API, and it seems that more alternatives are emerging.

For example, from Fireworks.ai, Hyperbolic, DeepInfra, and others:

Some even offer a larger context window than the DeepSeek API, and a much higher tokens-per-second rate than the official DeepSeek API itself.

The price is higher than the direct API, but this lets companies and users with concerns about privacy and the use of their data still use this model. In fact, I think that in development tools such as Cline or Cursor AI it is the most used model at the moment.
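Most of these hosts expose OpenAI-compatible endpoints, so switching is usually just a base URL and model id away. A sketch (the URL and model id below are illustrative examples patterned on Fireworks; check each provider’s docs for the real values):

```python
from openai import OpenAI

# Illustrative only: base_url and model id differ per provider.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # example host
    api_key="YOUR_PROVIDER_KEY",
)

resp = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-r1",  # provider-specific id
    messages=[{"role": "user", "content": "Why is the sky blue? Think step by step."}],
)
print(resp.choices[0].message.content)
```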

And really interesting things have quickly emerged, such as Aider’s architect mode, which allows combining R1 with Sonnet 3.5 to achieve an efficiency never seen before: R1+Sonnet set SOTA on aider’s polyglot benchmark | aider

My analysis is limited to the programming/code and technical side of this model and what I see happening; at least in this field, it is genuinely very functional.

5 Likes

It also went a little unnoticed that a few days ago they released an open model for image generation:

I think the most interesting thing about all this is not DeepSeek itself or its own infrastructure, but the fact that it is open source, how quickly these models are being integrated into companies (even in the USA), and how massively the community (at least of programmers) is using it.

3 Likes

No, I did not realize that. Thank you for the info!

1 Like

I’m building a tool right now that will get around the DeepSeek congestion; it will be available as a standalone Windows executable and an Android APK. Once it’s tested and working, I will be cancelling my OpenAI account.

1 Like