What is the impact of DeepSeek on the AI sector? đŸ”„

Don’t have a subscription, but I found this non-paywalled article claiming Meta is scrambling a task force to figure out how DeepSeek achieves such cost efficiency. They are concerned that DeepSeek R1 will outperform even the next Llama release. More info: Meta is reportedly scrambling multiple ‘war rooms’ of engineers to figure out how DeepSeek’s AI is beating everyone else at a fraction of the price | Fortune

3 Likes

I don’t remember where I saw this, but I clearly remember cybersecurity news talking about how GPT-4 was stolen.

Now, with all the news around R1, I believe they won’t be able to catch up with the newer models coming out this year, as long as security around the models has improved.

o3-pro and GPT-5 will be miles ahead of the competition. It very much reminds me of when Anthropic’s Claude was catching up last year. Meta is still going to release Llama 4. I haven’t seen any real security analysis of the risks posed by local models, like those distributed through Ollama.

I believe the markets are overselling. There are concerns regarding tariffs on Feb 1st, but a currency union in North America, similar to Europe’s, should increase the USD by at least 20%.

Competition is great, but I don’t think Sam or anyone at OpenAI (ok, maybe the security folks) are losing sleep over this.

Keep calm and watch most of what I said come true.

4 Likes

Thanks for sharing!
I found some other commentary suggesting the dispute is really about the compute needed to train the models, since it’s not clearly explained in the R1 paper and can easily be misinterpreted.

Either way, they didn’t open-source the training data, which is understandable, I suppose. Now that the model has been released, it’s all about performance, even if it’s not possible to recreate the model from scratch.

For the interested reader, this paper about DeepSeekMath may hold additional information.

4 Likes

Thanks for the link! There is a great post from Philipp Schmid of Hugging Face: Philipp Schmid on LinkedIn: Don’t fall for false DeepSeek R1 news! Deepseek R1 made it to mainstream

And there were some additional analyses in the FT and other sources this morning. Essentially: if we include all the training and experimentation costs (not just the final checkpoint training at the alleged $6M), the spend was probably closer to $1B; their team relied on some heavy hacking of the GPU ISA to overcome memory bandwidth limitations, which suggests the team is composed of some absolute star engineers.

My takeaway, contrary to the social and news media buzz, is that you can’t just deploy a small team of people with a few million in funding and reproduce this. You still require a massive upfront investment, both in money and in talent.

4 Likes

Unfortunate but true.
This is somewhat typical for RL and ML. Looking at older papers, one will find references to presentations, for example, but no source is available for what actually happened at those conferences.
The secret sauce recipe is supposed to be well kept.

Either way, I read the first comments in the LinkedIn ‘discussion’ and decided that this is not a good way to spend my time :slight_smile: Thanks for sharing anyway.

6 Likes

It’s very obvious, at least to me, that DeepSeek trained on outputs from all the major models. But this is how it all goes forward; everyone is going to be doing this.
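For readers unfamiliar with the mechanics, “training on outputs” means distillation: sampling answers from a stronger teacher model and using them as supervised data for a student. A minimal sketch, assuming an OpenAI-compatible API; the teacher model name and prompts here are placeholders, not DeepSeek’s actual setup:

```python
# Minimal sketch of distillation-style data generation: sample completions
# from a stronger "teacher" model and save them as supervised training pairs.
# The model name and prompts are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompts = [
    "Prove that the sum of two even integers is even.",
    "Write a Python function that reverses a linked list.",
]

with open("distill_data.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="gpt-4o",  # hypothetical teacher model
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,  # some diversity helps the student generalize
        )
        answer = resp.choices[0].message.content
        # Each line becomes one supervised example for fine-tuning the student.
        f.write(json.dumps({"prompt": prompt, "completion": answer}) + "\n")
```

For what it’s worth, the R1 paper itself describes the same trick pointed inward: distilling R1’s reasoning traces into smaller Qwen and Llama students.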

Look at all the evals being contributed to arXiv, evals being perhaps the most important type of content to contribute. Do you think OpenAI is coming up with the frontier benchmarks? Obviously not. The greatest mathematicians in the world are, many of whom are Chinese.

It is these benchmarks that are responsible for advances in AI. Validation drives model selection and training; this is the real work. What OpenAI is doing is mostly just smashing NNs and GPUs together.

Everyone needs to get over themselves. AGI is a global and collective effort. Either we do this together and survive, or it’s end times.

3 Likes

I mean, get real. We all know that OpenAI completely ‘extricated’ (Ilya) like 90% of its IP from Google. But reasonably so, Google was using (abusing) its search engine revenue to hoard AI experts in a manner that was pretty despicable.

Deepseek (and others, soon) is just returning the favor.

This is so absurdly incorrect. o1 and R1, or whatever the various CoT models are, aren’t even SOTA compared to the boutique CoT agentic solutions out in the wild. They are just what is in front of you and hitting the arena.

There are so many superior 3rd-party CoT solutions, especially when you optimize for a domain, which of course you’d want to do.

What matters is the base model that DeepSeek released over Christmas. That was the watershed moment. What we’re seeing right now is just a late reaction.

I mean, for real. People plugged DeepSeek into their CoT pipelines a month ago. They’ve not just reproduced the results but achieved superior ones.
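Part of why that happened so fast: DeepSeek exposes an OpenAI-compatible API, so swapping it into an existing pipeline is roughly a two-line change. A minimal sketch (the prompt is arbitrary; check DeepSeek’s docs for current model names):

```python
# Sketch of why "plugging DeepSeek in" was trivial: the API is
# OpenAI-compatible, so an existing pipeline needs a two-line change.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # point the same client at DeepSeek
    api_key="sk-...",                     # a DeepSeek key, not an OpenAI one
)

resp = client.chat.completions.create(
    model="deepseek-reasoner",  # R1; "deepseek-chat" targets the V3 base model
    messages=[{"role": "user", "content": "Factor x^2 - 5x + 6."}],
)
print(resp.choices[0].message.content)
```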

tbf, some of the multimodal computer-control stuff is pretty advanced, and it remains to be seen how quickly models will catch up.

But really, what we fundamentally care about is math and code, because that’s where all the AI algorithmic improvement will come from, which will feed takeoff.

There is this effort, btw: GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1. I’m skeptical it will be able to beat cathedral-style efforts, but it may come close.

btw, Qwen 2.5 just came out (again, base models are where it’s at; ignore the CoT stuff, it’s just API spend and not as much secret sauce as they say it is).

2 Likes

The story behind OpenAI and DeepSeek: :grin:

6 Likes

I think this applies to AI perfectly:

1 Like

@sashirestela check this:

2 Likes

Lol “real data” https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html

1 Like

Yep, absolutely. However, new players could disrupt NVIDIA’s stranglehold on the industry, perhaps opening some daylight for AMD and others.

1 Like

Yes, forgot the quotes, “Real Data” :joy:

1 Like

We’re all standing on shoulders here, no less so in LLM dev

3 Likes

I have my doubts about this.

There were rumors of a “wall” being hit. I don’t think it had to do with benchmarking ability, but with being able to augment non-reasoning models with the output of reasoning ones.

The reality is the data already exists. In the latter case, it’s cleverly bouncing the model around in its own headspace until it has exhausted all possibilities, and then finalizing with a solution. We had all done CoT/reasoning before it became baked in, and we are aware of its power.
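For anyone who didn’t live through that era, the hand-rolled version looked something like this: ask for a step-by-step draft, feed the draft back for self-critique, and repeat until the model commits to an answer. A rough sketch, with ad hoc prompts and stopping rule (nobody’s production pipeline):

```python
# Rough sketch of a hand-rolled reasoning loop: draft, self-critique, repeat.
# Model name, prompts, and the FINAL: stopping convention are all ad hoc.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def ask(messages):
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content

def reason(question, max_rounds=3):
    messages = [{"role": "user",
                 "content": f"Think step by step and solve:\n{question}"}]
    answer = ask(messages)
    for _ in range(max_rounds):
        # Bounce the model off its own draft until it finds no more flaws.
        messages += [
            {"role": "assistant", "content": answer},
            {"role": "user", "content":
                "Check the reasoning above for mistakes. If you find any, "
                "redo the solution. If it is correct, reply exactly: "
                "FINAL: <answer>"},
        ]
        answer = ask(messages)
        if answer.startswith("FINAL:"):
            return answer.removeprefix("FINAL:").strip()
    return answer  # give up after max_rounds and return the last attempt

print(reason("A bat and a ball cost $1.10 total; the bat costs $1.00 more "
             "than the ball. How much is the ball?"))
```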

These models, especially in commercial applications, fail not because of their intelligence but because of their attention.

On a technological level, I would agree. However, as someone who has to report to clients who have no understanding of this stuff and live off metrics and news reports, I’d imagine it’s been a very hard time explaining to investors why they need billions and billions of dollars when someone else has come up with something comparable for a fraction of the cost, and that can be run locally.

So, while I wouldn’t be surprised if OpenAI still has a massive lead (it’s well known that everyone just uses their output and therefore will always be playing catch-up), I’m just considering the practical applications of these models, which is what investors are interested in.

“Solving world hunger, advancing medicine.” As if the majority of the world’s problems aren’t already solvable and just stuck in political & corporate limbo.

In my experience, people just want these models to handle small tasks and save them time so that they can enjoy life a little more.

We barely have a Honda Civic, and yet it’s like we’re being sold an 18-wheeler.

We’re slipping into delusions of grandeur when even the small tasks have inconsistencies and failures.

4 Likes

I like this response from OpenAI to the “DeepSeek moment”:

2 Likes

https://arxiv.org/pdf/2501.12948

Pretty freaking wild, for sure. The speed at which this is coming, I can only believe, is due to the accelerant effect of the models themselves. People above are expressing doubt and disbelief because they are tied to preconceptions about how the world they’re familiar with used to work.

It’s all changing, and in particular it will change in this domain first and fastest, as it’s the obvious canary in the coal mine.

In general, everyone needs to start shifting their world model in a way that is properly leveraging these models.

1 Like

I personally feel like DeepSeek is giving us a glimpse that the future for everyday (non-enterprise/commercial) use is going to look a lot like the desktop and home computer boom.

Frontier providers like OpenAI may end up being like “cloud” computing (they kind of already are, tbh), while other developers increase the efficiency and compactness of models so that people can run their own local models.

We’re also reaching the point where models don’t have to be “the best” to be “good enough for most use cases”. I think DeepSeek demonstrates this quite well.
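Concretely, “good enough, locally” is already a pip install away. A minimal sketch using the Hugging Face transformers pipeline with one of the small distilled checkpoints published alongside R1 (the model choice and generation settings are just an example; the 1.5B distill runs on a consumer GPU or, slowly, on CPU):

```python
# Minimal sketch of running a small distilled reasoning model locally.
# DeepSeek-R1-Distill-Qwen-1.5B is one of the distills published with R1.
# Requires: pip install transformers torch accelerate
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    device_map="auto",   # picks the GPU if one is available
    torch_dtype="auto",
)

messages = [{"role": "user", "content": "Why is the sky blue? Be brief."}]
out = generate(messages, max_new_tokens=512)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```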

People also like cheap. OAI seems to have forgotten that.

3 Likes

Except in many cases, DeepSeek is the best :slight_smile:

1 Like

Probably one immediate impact coming to a ChatGPT near you very soon.

2 Likes