Maybe. But if this really is some new approach, just imagine what this model could do at a bigger scale, the way o1 was presumably made.
Training on reasoning first for a reasoning model sounds about right though. Gotta lay a good initial foundation.
It seems to me that anything AGI and beyond will need superb reasoning skills.
In the paper, I think they actually start with DeepSeek-V3-Base as their initial foundation.
In this paper, we take the first step toward improving language model reasoning capabilities
using pure reinforcement learning (RL). Our goal is to explore the potential of LLMs to develop
reasoning capabilities without any supervised data, focusing on their self-evolution through
a pure RL process. Specifically, we use DeepSeek-V3-Base as the base model and employ
GRPO (Shao et al., 2024) as the RL framework to improve model performance in reasoning.
This is a somewhat key detail that Jiayi-Pan also discovered in his Twitter thread: the RL works better with more capable baseline models.
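For anyone wondering what GRPO actually does there: instead of training a separate critic, it scores a group of sampled answers per prompt and normalizes each reward against the group. A minimal sketch, with made-up group size and reward values (purely illustrative, not from the paper):

```python
# Minimal sketch of GRPO's group-relative advantage (Shao et al., 2024).
# The group size and reward values are made up for illustration.
import statistics

def group_relative_advantages(rewards):
    """Normalize each sampled answer's reward against its group:
    advantage_i = (r_i - mean(r)) / std(r). No learned critic needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Example: 4 sampled answers to one math prompt, rule-based reward 1 if correct.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # -> [1.0, -1.0, -1.0, 1.0]
```

If the base model almost never gets an answer right within a group, every advantage is near zero and there is barely any signal to learn from, which is presumably part of why a more capable baseline matters.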
I always thought it was weird that Google didn't do it first, since they invented the technology.
Similar to ChatGPT … why didn't anyone create it with the API? They could have. But I think the answer lies in the fact that OpenAI could clone whoever did create the hypothetical ChatGPT (from the API) and then undercut them by running at cost.
So the stranglehold isn't ideas, it's execution and hardware resources. OpenAI just managed to do both the quickest.
So going back to DeepSeek … if DeepSeek can operate with lower compute resources … that's a BIG DEAL! Also why the market freaked out.
Here is the full story on how they went from Zero to the DeepSeek we see now.
However, DeepSeek-R1-Zero encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates a small amount of cold-start data and a multi-stage training pipeline. Specifically, we begin by collecting thousands of cold-start data to fine-tune the DeepSeek-V3-Base model. Following this, we perform reasoning-oriented RL like DeepSeek-R1-Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains such as writing, factual QA, and self-cognition, and then retrain the DeepSeek-V3-Base model. After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to as DeepSeek-R1, which achieves performance on par with OpenAI-o1-1217.
Reading this, it looks like they flowed some structure through Zero, and this structure came from the V3 base model.
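To make the ordering in that paragraph easier to follow, here is the recipe laid out as a rough sketch. Every function and data name is a placeholder of mine; only the sequence of stages comes from the quoted paper:

```python
# Rough sketch of the R1 recipe as quoted above. All names are placeholders;
# only the ordering of the stages comes from the paper.

def sft(model, data):
    """Stand-in for a supervised fine-tuning stage."""
    return f"SFT({model}, {data})"

def reasoning_rl(model, scope="reasoning"):
    """Stand-in for a GRPO-style reinforcement learning stage."""
    return f"RL({model}, {scope})"

def train_deepseek_r1():
    base = "DeepSeek-V3-Base"
    # 1. Cold start: fine-tune the base model on thousands of long-CoT examples.
    ckpt = sft(base, "cold_start_data")
    # 2. Reasoning-oriented RL, like DeepSeek-R1-Zero.
    ckpt = reasoning_rl(ckpt)
    # 3. Near convergence: rejection-sample the RL checkpoint, mix in DeepSeek-V3
    #    SFT data (writing, factual QA, self-cognition), and retrain the *base* model.
    ckpt = sft(base, "rejection_samples + v3_sft_data")
    # 4. Final RL pass over prompts from all scenarios.
    return reasoning_rl(ckpt, scope="all_scenarios")

print(train_deepseek_r1())
```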
I totally agree with that article. Namely it's a win for everyone, and puts pressure on the real frontier model innovators (like OpenAI) to step up their game.
Hints from Sam Altman suggest a lot of "stepping up" is in the pipeline. That's good for us!
"Special thanks to the DeepSeek and SGLang teams for their close collaboration!"
The leakers claim that businesses further down the (AI and HPC) food chain are having to shell out $15,000 per MI300X unit, but this is a bargain when compared to NVIDIA's closest competing package, the venerable H100 SXM5 80 GB professional card. Team Green, similarly, does not reveal its enterprise pricing to the wider public; Tom's Hardware has kept tabs on H100 insider info and market leaks: "over the recent quarters, we have seen NVIDIA's H100 80 GB HBM2E add-in-card available for $30,000, $40,000, and even much more at eBay."
Volatility in the stock market is mean-reverting and exhibits the wisdom of the crowd. In practice, this means that big events will introduce large swings in price for a brief period while the market figures out what the new price should be.
Deepseek is just the newest company around the block to release an open-source model that is comparable to OpenAI's offerings. This has happened before with companies like Hugging Face, Mistral, X, Google, and Microsoft. None of these cases triggered a market reaction this big.
It's widely known that you can run and train AI models on lesser hardware; it just takes longer and is less efficient. Deepseek performing a $6 million fine-tuning job on one of their existing models to add reasoning capabilities seems very plausible to me, but that's just not the entire cost of it.
Deepseek being a Chinese company is likely the unfortunate trigger of the current volatility in the market, as it opens up the possibility of further export restrictions on American chips manufactured in Taiwan.
TL;DR: Deepseek is just another competitor in the AI world. The market reaction is likely due to the geopolitical tensions it may create between the US and China.
Unless you mean to say that implied volatility is mean-reverting, in that volatility reverts to its average, which is true. NVDA should calm down at some point.
Yep, realized volatility also tends to be smaller than implied volatility, but the release of DeepSeek seems to have impacted the average volatility and forced the market to price in a larger amount of uncertainty on chip sector stocks.
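If it helps to picture what "mean-reverting" means here, a toy simulation: volatility jumps on a shock and then gets pulled back toward its long-run level. Every number below is made up purely for illustration:

```python
# Toy mean-reversion simulation: volatility spikes on a shock, then decays back
# toward its long-run average. All parameters are made up for illustration.
import random

def simulate_vol(days=60, long_run=0.25, speed=0.15, shock_day=10, shock=0.45):
    vol, path = long_run, []
    for day in range(days):
        if day == shock_day:
            vol += shock                                          # the surprise event
        vol += speed * (long_run - vol) + random.gauss(0, 0.01)   # pull toward the mean
        vol = max(vol, 0.0)
        path.append(vol)
    return path

p = simulate_vol()
print(f"pre-shock {p[5]:.2f}, at shock {p[10]:.2f}, a month later {p[40]:.2f}")
```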
Remember when Microsoft talked about taking snapshots on your computer?
Well, it's already implemented (every screenshot you take in Windows 11 now shows your location, the date, the name of the app, etc.), and in combination with Operator from OpenAI it will make TikTok look nice and decent in terms of spying on your computer and collecting private data.
It's sort of amazing that the market volatility isn't taking into account that perhaps DeepSeek can't do what it says it can do at scale. Its status page has been pretty red for the past few days. https://status.deepseek.com/
I keep hearing this is just a temporary glitch, but suppose this "glitch" is part of how all this is delivered at such an amazingly low cost? What good is this to any competent business user if it only works sometimes?
In the short term, markets are inherently irrational. Too much bias and subjectivity, generally ill-informed. Very little to conclude right now, but I understand people are always looking for conclusions.
But at this point I think you're not taking into account that there are already multiple options for using the full 671B-parameter R1 model and the other variants outside of the DeepSeek app/web or API, and it seems that more alternatives are emerging.
For example, from fireworks.ai, Hyperbolic, DeepInfra, and others.
Some of them even offer a larger context window than the DeepSeek API, and a much higher tokens-per-second rate than the official DeepSeek API itself.
The price is higher than the direct API, but it allows companies and users with concerns about privacy and the use of their data to work with this model. In fact, I think it's the most-used option in development tools such as Cline or Cursor AI at the moment.
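For what it's worth, most of these providers expose an OpenAI-compatible endpoint, so switching away from the official API is only a few lines of code. A minimal sketch; the base_url and model id below are my assumptions for fireworks.ai, so check the provider's docs before relying on them:

```python
# Minimal sketch: calling R1 through a third-party, OpenAI-compatible provider
# instead of the official DeepSeek API. The base_url and model id are my
# assumptions for fireworks.ai; check the provider's docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",   # assumed endpoint
    api_key="YOUR_PROVIDER_KEY",
)

resp = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-r1",      # assumed model id
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(resp.choices[0].message.content)
```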
My analysis is limited to the programming/coding and technical side of this model and what I see happening; at least in this field it is genuinely very functional.
I think the most interesting thing about all this is not Deepseek itself or its own infrastructure, but the fact that it is open source, how quickly these models are being integrated into companies, even in the USA, and how massively the community (at least the programmer community) is using them.
I'm building a tool right now that will get around the DeepSeek congestion and will be available as a Windows standalone executable and an Android APK version. Once tested and working I will be cancelling my OpenAI account.