DeepSeek vs. NVIDIA: The Future of AI Chip Economics
DeepSeek's new AI model shocked the industry with its efficiency, challenging NVIDIA's dominance. Is this the future of AI training? Listen to the full episode to learn more.
TL;DR
DeepSeek's hyper-efficient AI model, allegedly built for just $6M, questions the necessity of NVIDIA's expensive GPUs, hinting at a major shift in AI economics. #VentureStep #AI #DeepSeek
INTRODUCTION
What if a small team, working on a "side project" with older hardware and a tiny budget, could create an AI model that competes directly with a juggernaut built on billions of dollars in funding and computational power? That was the story that sent a wave of panic through the AI community a few weeks ago when DeepSeek released its R1 model. The model went toe-to-toe with giants like OpenAI's GPT-4, Google's Gemini, and Meta's Llama, all while claiming it was developed for a fraction of the cost, using limited, older NVIDIA GPUs.
In this episode of VentureStep, host Dalton Anderson waits for the dust to settle before diving into the extraordinary claims surrounding DeepSeek. He questions the narrative, exploring the possibility that the news was strategically released by its backer, a quantitative venture capital firm, to manipulate the market for financial gain. The story seemed too good to be true: an unknown company emerges from the shadows with a revolutionary model, shaking the foundations of the industry with claims of unprecedented efficiency.
Dalton breaks down the technology behind DeepSeek's success, particularly its use of a "Mixture of Experts" (MOE) architecture, and discusses the company's unconventional long-term hiring philosophy. More importantly, he analyzes the massive implications for the AI hardware market, specifically NVIDIA. As titans like Google, Meta, and Amazon begin developing their own in-house AI chips, is NVIDIA's reign as the indispensable "shovel seller" of the AI gold rush coming to an end?
KEY TAKEAWAYS
- DeepSeek's "Mixture of Experts" (MOE) architecture enables incredible computational efficiency by activating only a fraction of its total parameters for any given task, challenging the idea that AI progress requires brute-force computational power. 101010
- The narrative of a cheap "side project" is highly suspect, as one research firm estimates DeepSeek's actual server expenditures could be as high as $1.5 billion, suggesting the story may have been a strategic play to create market volatility. 11111111
- Major tech companies like Google, Meta, and Amazon are developing their own AI chips to reduce their dependency on NVIDIA, signaling a significant long-term threat to NVIDIA's market dominance. 121212
- NVIDIA's true long-term value isn't just selling chips, but its vertically integrated ecosystem that includes robotics, simulation platforms like Omniverse, and comprehensive development kits. 13131313
- When compared directly, DeepSeek's model output is praised for its clean, well-structured, and easy-to-read format, while Google's Gemini provides a more detailed breakdown of its thought process. 1414141414
FULL CONVERSATION
Dalton Anderson: Welcome to VentureStep podcast, where we discuss entrepreneurship, industry trends, and the occasional book review. What if I told you that you could get access to cutting-edge AI models for a fraction of the cost? Well, that was a story that broke a couple of weeks ago with DeepSeek's R1 model.
The Shockwave: DeepSeek's Unbelievable Claims
Dalton Anderson: DeepSeek recently released a computational model, an LLM, that was going toe-to-toe with the juggernauts of OpenAI, Google's Gemini, and Meta's open-source model. And the weirdest thing was the claims they were making. They claimed that it was a side project. It had limited GPU usage, and they were using older NVIDIA GPUs that aren't as good, because of export restrictions on the types of GPUs NVIDIA can sell for training AI. And then on top of that, it was free and came out of nowhere.
It was just this whole wave of, my gosh, like panic. The AI community was in panic. They didn't know what hit them.
Dalton Anderson: It was a complete disaster. Not only is this really good, but who are these people? And then they're like, well, we only spent like $6 million on this. And also, it was a side project that we worked on in our free time. And everyone else is like, wait, we're spending billions of dollars here. What's going on? And so the whole reason why I wanted to wait a couple of weeks before I talked about it is because it seemed too good to be true. Someone coming out of the shadows with this amazing model, no one's heard of the shop before, and they're saying it's a side project and they used a limited number of GPUs.
A Theory of Market Manipulation?
Dalton Anderson: So I wanted to wait before I talked about DeepSeek simply because I don't know if this stuff is true. Some of it may be true, some might not be, but what I do know is that they did have backing from a quantitative venture capital firm that uses deep learning to trade the markets. And I was like, hmm. It seems to be a good opportunity to claim all these things. It's all claims, rumors, or hearsay; all of it might be true, but it freaked the market out enough that NVIDIA dropped 20% or something, and other companies had similar reactions.
And so it's a perfect opportunity to just get in if you're waiting to get in. But it was also a perfect opportunity to short those stocks or those financial securities and make an insane amount of money. An insane amount of money. And you could do those things, no problem, if you're in China, because it's really hard to enforce those kinds of transactions or investigate those things when you're not on the soil where the market is. But that's just a theory. I don't want to go too far and put the tinfoil hat on, but it's some food for thought. I'm not sure how much stuff related to DeepSeek is true.
DeepSeek's Origins and Open Source Philosophy
Dalton Anderson: So that being said, let's talk about DeepSeek's origins. So DeepSeek was created by, let me just make sure, Liang Wenfeng. And please let me know if I mispronounced that. But Liang created High Flyer, and it is said that Liang is someone who embraced AI and wants to create AGI. And the first step to that was using deep learning and machine learning applications within algorithmic trading. They started doing that in 2016. Then recently, in July 2023, he founded DeepSeek. And DeepSeek went from all the research that's required to make these models to a product on the market remarkably quickly.
And I want to make sure that I emphasize the point of open source.
Open source is the way to innovation. If you have an idea and you tell 10 different people about it, write out your thesis, send it to them, and ask for feedback, they can build their own ideas on top of yours.
Dalton Anderson: And it's the same thing as crowdsourcing innovation versus holding innovation within one entity. You're allowing innovation to spread between entities instead of staying inside a single one, so your development is much faster. DeepSeek was able to go from its creation in July 2023 to releasing this model in late January, early February of 2025. With a $6 million budget; I'm not sure that's true, but that's what they're saying. That wouldn't be possible without open source. So I want to make sure that's very clear in this episode: these open source models are critical to the development of new innovation and new applications of AI.
A Different Approach To Hiring Talent
Dalton Anderson: So within DeepSeek, they have a different approach to recruitment. When he was asked about recruiting from OpenAI or Google DeepMind, his reply was, well, if I have short-term goals, that's what I'll do.
And I'll get people with the right experience right away, but I'm looking long-term, and in the long-term approach, basic skills, passion, readiness to learn, and general interest and attitude are more important than ready-made individuals.
Dalton Anderson: And he also emphasized creativity as a skill that he looks for. And he said those things will build the vision that I have. Having somebody ready-made isn't my concern. It's the people that have those skills. Those are the people I want. Let everyone else fight for the other individuals. And that might be a good approach, and the reasoning might be twofold. One, DeepSeek isn't a world-renowned AI company like OpenAI, Gemini, or Meta, so you can develop the talent that you want and, from there, create the things you want with the mission you need to foster. And it gives you a different opportunity: if somebody comes to you with all the skills ready-made, they're not necessarily closed-minded, but they're not as open-minded as someone who is ready for the opportunity and just wants to make something they're passionate about.
How "Mixture of Experts" (MOE) Creates Efficiency
Dalton Anderson: So, computational resources and training time. This is something that is pretty important. Since they didn't have the best high-end GPUs on the market that Meta or Google or OpenAI are using, they had to take a different approach: they had to be very careful about their usage of parameters and about how things were being executed on the GPUs. And one of the things that they did deploy was a Mixture of Experts. Mixture of Experts is an approach that I would explain as working like parallelism on a complex problem. Parallelism is something that is used in databases and processing; basically, it takes a large, complex problem and breaks it up into different parts. From there, it can reduce the resource load and allocate resources.
MOE is a mixture of experts, so each expert works on its subject matter, whatever it's good at. You might have a PDF expert that reads all PDFs. And then you might have audio parsers. And then you might have a summarizer. For example, an audio file is transcribed. The transcription is in a PDF. The PDF expert processes that PDF, then it's pushed over to the summarizer, and the summarizer summarizes the transcript. Versus if you didn't have that applied, you're running the whole model, which is billions of parameters, all at once to get the same result. This is running certain things at certain times versus running everything all at once. So that saves you resources.
And they're saying in the paper that their architecture only activates 37 billion parameters out of 671 billion for each token. This reduces the computational resources required without sacrificing performance, which is good. And MOE isn't a unique technique; it's applied in other machine learning projects, and there are references to MOE in Meta's paper and in Gemini's paper. So it's not something out of the ordinary, but the level of performance gain that they got from MOE is the wow factor.
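To make the routing idea concrete, here's a minimal sketch of a mixture-of-experts layer in PyTorch. It is illustrative only, not DeepSeek's actual architecture: a small learned router scores the experts for each token, and only the top-k experts actually run. Every name and size in it is invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer: route each token to its top-k experts.

    Real MoE models (DeepSeek's included) add load balancing, shared
    experts, and heavy GPU kernel work on top of this basic idea.
    """

    def __init__(self, d_model=64, d_hidden=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # learned gate
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                            # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)         # mixing weights per token
        out = torch.zeros_like(x)
        # Only the chosen experts run for each token; the rest stay idle,
        # which is where the compute savings come from.
        for e, expert in enumerate(self.experts):
            rows, slots = (idx == e).nonzero(as_tuple=True)
            if rows.numel():
                out[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
        return out

moe = TinyMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64]); each token used 2 of 8 experts
```

With 8 experts and k=2, each token touches only about a quarter of the expert parameters; scale the same principle up and you get ratios like DeepSeek's reported 37 billion active out of 671 billion total.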
The Billion-Dollar Question: What Did It Really Cost?
Dalton Anderson: When they trained R1, they're saying that it took them two months. The paper says it took them 2,788,000 GPU hours, and their data set was 14.8 trillion tokens, which took them less than two months. And on their $5.6 million budget... all of that combined, it's a compelling story. But as far as the budget goes, I think it's inaccurate. I don't believe $6 million for that. How are you getting that many GPUs?
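For what it's worth, the claimed budget does line up with a simple GPU-hours-times-rental-rate calculation; the roughly $2 per GPU hour used here is the rate DeepSeek's paper reportedly assumes, not a verified market price:

```python
# Back-of-the-envelope check on the claimed training budget.
gpu_hours = 2_788_000                 # figure quoted above
rate_per_gpu_hour = 2.00              # assumed rental rate in dollars
print(f"${gpu_hours * rate_per_gpu_hour:,.0f}")  # -> $5,576,000, i.e. ~$5.6M
```

A figure built this way would only cover the final training run priced at rental rates; it says nothing about owning the cluster, salaries, or failed experiments, which is exactly where the much larger estimates below come from.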
And so there was a separate research firm that provided a research report. SemiAnalysis said that DeepSeek's total server expenditures could be significantly higher than what they're stating, potentially $1.3 billion or even exceeding $1.5 billion. So it wasn't that they had this $6 million and some person after hours running their spare 15 GPUs that made DeepSeek. I think it's much more than that. But with that opportunity, you could reap financial benefit by sending the market plummeting with all this news and having the market freak out.
Is NVIDIA Just a Shovel Seller in a Gold Rush?
Dalton Anderson: So that being said, people freaked out about what's the future of NVIDIA and what's the future of AI in general. If you don't need that many GPUs, what happens? I don't think that usage of GPUs decreases if things become more efficient. It just means that you can do more with the resources you have, which increases the utility of those resources and, in turn, increases consumption.
So it's not all bad for NVIDIA. What I do find bad, and I am an investor in NVIDIA, I have NVIDIA stock, is that NVIDIA is sold to the market as a shovel seller. Like, this is the gold rush, we're selling shovels, buy some shovels, get your gold. I don't think the level of expenditure required to train AI models is equivalent to buying a shovel.
There is no other option. It's either AI or die.
Dalton Anderson: And so these companies are spending billions of dollars on AI. But that's so much money that you've got to bring something like that in-house. If you're spending eight billion a year on something, then you've got to bring it in-house.
The Rise of In-House AI Chips at Google, Meta, and Amazon
Dalton Anderson: So that's what companies are doing. Google has its own AI chip that it's had for a while, and it's utilized by Claude AI. And on what it costs Google to manufacture these TPUs, what I was able to find from an article in The New York Times, which I'll link in my show notes, is that NVIDIA was charging $15,000 for one chip, while it was costing Google $2,500 to $3,000. So you're saving about 80%. If I went to my manager and said, hold on, I've got an idea here. I can save us 80%. What do you think? Sure, why not?
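As a quick sanity check, the per-chip figures quoted above do imply savings in that range (the arithmetic is mine; the prices are the article's):

```python
# Savings implied by the per-chip figures quoted above.
nvidia_price = 15_000
for tpu_cost in (2_500, 3_000):
    saving = (nvidia_price - tpu_cost) / nvidia_price
    print(f"TPU at ${tpu_cost:,}: {saving:.0%} cheaper")
# TPU at $2,500: 83% cheaper
# TPU at $3,000: 80% cheaper
```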
So Meta's working on their own chip. Amazon's working on their own chip. Meta, Google, Amazon, and Microsoft are all working on their own chips. So where does that leave NVIDIA?
NVIDIA's True Value: A Vertically Integrated Ecosystem
Dalton Anderson: Well, this is what I think NVIDIA's value proposition is, and that is being vertically integrated. They have this whole ecosystem of robotics. They have NVIDIA Omniverse, which is like the metaverse, but for robotics. They have a physics model built on top of Omniverse that can simulate, in a virtual environment, an interaction or workspace with a robot. You can run simulations within Omniverse and these other programs and train your robots virtually. Then, when you take them out of the virtual world and bring them into reality, they are informed and know how to deal with different scenarios.
NVIDIA, long story short, NVIDIA's value proposition is that they're vertically integrated, not that they sell shovels.
Dalton Anderson: And the chips thing is defensible for some time, but eventually people aren't going to pay billions of dollars for their shovels. They're going to build their own. And that's what you're seeing at the moment. Microsoft and Meta made up a quarter of NVIDIA's sales in the past two quarters. So 25% of their revenue over the last two quarters came from two companies, and both of them are building their own chips.
AI Model Showdown: DeepSeek vs. Gemini Output
Dalton Anderson: The last thing that I wanted to show you is the output of DeepSeek. I asked it a prompt: "How do you use and deploy crypto and a stablecoin mechanism for transactions to reduce your foreign currency exposure and risk? Can you please walk me through it? What are some great platforms to use? Something like Avalanche, right? And is it easier to use someone else's stablecoin, or should I make my own?" And, "Could you do that, and is that easy? If it isn't easy, can you make it easy?" So that was my ramble to AI.
So DeepSeek's process: it goes a little bit over how it's going to think about the problem. It has a small snippet, a couple of sentences, and then it maps out the problem. And it looks like it uses Markdown as its preferred output, which I think is very nice.
The output of DeepSeek, I think, is one of the better outputs among all these models.
Dalton Anderson: It has a very nice format that I appreciate because it's easy to read. And you'll see when I share my screen on Gemini's output: Gemini provides more information, but the output is not as clean and feels cluttered when you compare it to DeepSeek's. With Gemini's output, it shows its thinking. It does a really good breakdown of the thinking, but I don't necessarily know if the thinking is really thinking or just prompt engineering. I think this thinking feature has now evolved into prompt engineering.
So if you write something to the AI, it will break it down and try to understand what you're saying, and then it'll go solve it. The actual thought process from Gemini is very detailed. It might be more information than most people want, but I appreciate it. The only issue I have is that it doesn't feel as clean as DeepSeek's. If I look at DeepSeek's, there's a lot of information, but it doesn't feel cluttered at any moment. It just flows. Whereas Gemini's feels like it's everywhere. It feels cluttered compared to DeepSeek's.
RESOURCES MENTIONED
- DeepSeek
- NVIDIA
- OpenAI
- Google Gemini
- Meta
- High Flyer
- Claude AI
- Amazon
- Microsoft
- xAI
- Avalanche
- SemiAnalysis
- The New York Times
- NVIDIA Omniverse
INDEX OF CONCEPTS
Dalton Anderson, DeepSeek, DeepSeek R1, NVIDIA, OpenAI, Google, Gemini, Meta, High Flyer, Liang Wenfeng, AGI, open source, Mixture of Experts (MOE), parallelism, PTX programming, SemiAnalysis, insider trading, AI chips, GPUs, TPUs, Claude AI, Amazon, Microsoft, xAI, NVIDIA Omniverse, Jensen models, crypto, stablecoin, Avalanche, foreign currency exposure