Apple's AI Strategy: The Flawed 'Illusion of Thinking'?
Apple's latest AI paper suggests fundamental limits in AI, but is it just a cop-out for their own lack of innovation? We break down the argument. Listen to the full episode to learn more.

TL;DR
Apple's recent AI paper critiques large reasoning models, but it feels like a justification for their own slow progress, ignoring the industry's shift to modular, cost-effective AI toolchains. #VentureStep #Apple #AI
INTRODUCTION
For decades, Apple has been at the forefront of innovation, but as the AI race accelerates, their roadmap has been suspiciously quiet. A recently released AI paper from the tech giant, titled "The Illusion of Thinking," attempts to shed light on the fundamental limitations of today's Large Reasoning Models (LRMs), pointing out their struggles with complex problems. The paper suggests these models have inherent scaling limits and fail to develop truly generalizable reasoning.
On this episode of Venture Step, host Dalton Anderson dives deep into this paper, offering a critical analysis of its findings. With a background in data science, Dalton challenges the paper's core premise, arguing it's not a groundbreaking discovery but rather a defensive "cop-out" for Apple's own lack of progress in the AI space. He posits that the paper focuses on a flawed, monolithic approach to AI that the rest of the industry has already moved beyond.
Dalton breaks down why Apple's perspective seems disconnected from reality, highlighting the industry's rapid advancements in efficiency, the dramatic drop in token costs, and the strategic shift toward specialized, multimodal AI "toolchains." The conversation explores whether Apple's focus on things like a widely criticized UI revamp instead of a bold AI vision is a sign of a deeper, more fundamental misunderstanding of what the market and its users truly want.
KEY TAKEAWAYS
- Apple's paper critiques AI's reasoning limits but uses a flawed "monolithic" framework, an approach the rest of the industry has already abandoned for more effective solutions.
- The argument completely ignores crucial industry trends like plummeting token costs and soaring inference speeds, which make AI more powerful and scalable than ever.
- True AI innovation lies in building specialized, multimodal "toolchains" and leveraging a "mixture of experts," not in creating a single, all-powerful "Swiss Army knife" model.
- Apple's recent focus on a poorly received UI revamp over a clear AI vision suggests a fundamental disconnect with what customers and the market are demanding right now.
FULL CONVERSATION
Dalton: Welcome to the VentureStep podcast, where we discuss entrepreneurship, trends, and the occasional book review. Everyone's talking about Apple's shiny new UI, but their AI roadmap is suspiciously quiet. Not funny ha-ha; funny weird. Today we're discussing the AI paper that Apple put out in June, titled "The Illusion of Thinking." It might be a little more smoke than fire. It's an odd paper overall, so I wanted to review it and then talk about Apple's planned release of their new Liquid Glass UI, which has been widely criticized. Overall I love Apple, but I've been disappointed with them as of late, unfortunately.
Reading the Conclusion of Apple's AI Paper
Dalton: So here it is; we're going to dive right into it. My name is Dalton Anderson, and I'm your host. I've got a bit of a mixed background: I work in insurance, I've done data science, and in my free time I like to run, build my side business, or read a good book. Okay, so let's transition over to the conclusion of the paper, which I think is the most meaningful part.
Dalton: It says, "In this paper, we systematically examine frontier large reasoning models, LRMs, through the lens of problem complexity using controllable puzzle environments. Our findings reveal fundamental limitations in current models. Despite sophisticated self-reflection mechanisms, these models fail to develop generalizable reasoning capabilities beyond certain complexity thresholds."
Dalton: "We identified three distinct reasoning regimes. Standard LLMs outperform LRMs at low complexity. 19 LRMs excel at moderate complexity and both collapse at high complexity, particularly concerning as the counterintuitive reduction in reasoning effort as models approach critical complexity, suggesting an inherent compute scaling limit in LRMs. Our detailed analysis of reasoning traces further expose complexity dependent reasoning patterns from inefficient overthinking on simpler problems to complete failure on complex ones. 20These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning." 21
Dalton: "Finally, we have presented some surprising results in LRMs that have led to several open questions for future work. Most notably, we have observed their limitations in performing exact computation. 22For example, when we provided the solution algorithm for the Tower of Hanoi to the models, their performance on the puzzle did not improve. 23 Moreover, investigating the first failure move of the models revealed surprising behaviors. They could form up to a hundred correct moves in the Tower of Hanoi, but failed to provide more than five correct moves in the river crossing puzzle. We believe our results can pave the way for future investigations into reasoning capabilities of these systems." 24
Dalton: So that was their conclusion. It was a 30-page paper, but of those 30 pages, only about 11 are substance; the other 19 or so are references and acknowledgments.
The Flaw in Monolithic Thinking: Toolchains vs. Monoliths
Dalton: In the paper, they get at what the title says, "The Illusion of Thinking," and they're talking about the generalization of reasoning. I think they're fundamentally flawed in their thought process. And these people are way smarter than I am, but hear me out here.
Dalton: Think about the analogy of someone trying to garden with a bulldozer, and then they're like, bulldozers aren't that good at gardening; I really need a shovel or a rake. It's kind of the same gist here: Apple is alluding to, hey, LRMs, large reasoning models, aren't very good at super complex problems. Okay, I see that thought. And on top of that, they're harping on LRMs not being able to solve simple problems as fast as the simple models can.
I would think about it as toolchains versus monoliths. A monolith would be one massive model that's good at everything. But you've seen across architectures in other technologies that that approach doesn't work.
Dalton: When have you seen one person be good at all sports? You don't see those people very often. When have you seen somebody at work who's good at sales and good at product and tech? Those people are founders, those people are CEOs. They don't come around very often, because those are difficult skills to combine, they're rare, and it's normally inefficient to house all that knowledge in one person.
Dalton: The same thing is true here, where you're trying to generalize a model that's supposed to be good at simple-, medium-, and high-complexity problems. And they're like, well, it's not better than the simple model at simple things, it exceeds expectations at medium complexity, but at high complexity it seems like it just doesn't know what it's doing. I would say, at a high level, we just don't have a model ready yet for very complex problems. And I think people would agree with that.
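To make that toolchain-versus-monolith framing concrete, here is a minimal sketch of a complexity router, assuming hypothetical model names, prices, and a keyword-based heuristic (none of this comes from Apple or the paper). The point is only that easy prompts can go to a cheap standard model and harder ones to a reasoning model, rather than expecting one monolith to handle everything.

```python
# Minimal toolchain sketch: route prompts by estimated complexity.
# All model names, prices, and the heuristic below are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelSpec:
    name: str
    cost_per_million_tokens: float  # illustrative pricing only
    handler: Callable[[str], str]

def small_model(prompt: str) -> str:
    # Stand-in for a cheap, fast standard LLM
    return f"[small-model answer to: {prompt}]"

def reasoning_model(prompt: str) -> str:
    # Stand-in for a slower, pricier large reasoning model (LRM)
    return f"[reasoning-model answer to: {prompt}]"

ROUTES = {
    "low": ModelSpec("small-llm", 0.50, small_model),
    "moderate": ModelSpec("reasoning-llm", 4.00, reasoning_model),
}

def estimate_complexity(prompt: str) -> str:
    # Toy heuristic: multi-step keywords or very long prompts count as "moderate"
    multi_step = any(word in prompt.lower() for word in ("prove", "plan", "puzzle"))
    return "moderate" if multi_step or len(prompt) > 200 else "low"

def answer(prompt: str) -> str:
    spec = ROUTES[estimate_complexity(prompt)]
    return spec.handler(prompt)

print(answer("What is the capital of France?"))       # routed to the small model
print(answer("Plan the moves for a 6-disk puzzle."))  # routed to the reasoning model
```

In a real toolchain the heuristic would itself be a small classifier or router model, but the shape of the pipeline is the same.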
Is This Paper Just a "Cop-Out" for Apple's Lack of Innovation?
Dalton: I just don't think what they're alluding to makes much sense. And I think it's counterproductive.
In my opinion, it's a cop-out for their lack of innovation in this space: saying there are fundamental issues with LRMs and LLMs, we see this, we saw this coming, and that's why we're not too deep into AI right now.
Dalton: Okay, sure. That's how it seems to me. I see this company that's been at the forefront of innovation for years, my whole life. Apple has been that guy. And when you're outperforming everybody and then somebody new comes around and starts doing good things, you point the finger and say, "Oh, well, you know that's not scalable," or "that technology, the foundations aren't right." It just seems like a cop-out. It doesn't seem legitimate.
Dalton: I would feel better about it if Google or Nvidia or OpenAI came out with this paper and said, hey, we did additional research on this problem that we already know is a problem.
What Apple is talking about and discussing here isn't a groundbreaking theory.
Dalton: People know. I'm not an AI researcher, I'm just a podcaster; these people are way smarter than I am, and even I know it. Many people on the internet know that LLMs and LRMs have fundamental issues. But each year the technology is getting less expensive and more efficient. The cost per token is decreasing year in, year out.
Ignoring the Reality: Plummeting Costs and Rising AI Efficiency
Dalton: They're talking about how there's a lack of generalized reasoning. And then they fail to bring up the decline in token costs, the rising inference speed, and the growing developer access to these AI tools. It's still a groundbreaking technology, and it hasn't been around that long.
Dalton: This is no longer an innovators-only technology. It's slowly transitioning from the innovators to the general public, and people are using it all the time. For that to happen, you've got to greatly reduce the costs. And I think that's true: the cost per token has substantially decreased, and inference speed has sped up by 280%.
Dalton: The cost per token at OpenAI decreased from $36 per million to $4 per million, roughly an 89% decrease in one year. Stanford's AI Index reports a 280x drop in ChatGPT-level costs since 2022, and A16Z cites a roughly 10x drop year over year.
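As a quick back-of-the-envelope check on those figures (illustrative arithmetic only; the dollar amounts and the 10x-per-year rate are the claims quoted above, not independently verified here):

```python
# Back-of-the-envelope check on the quoted numbers (illustrative only)
old_price = 36.0   # dollars per million tokens (earlier price, as quoted)
new_price = 4.0    # dollars per million tokens (later price, as quoted)

decrease = (old_price - new_price) / old_price
print(f"Price decrease: {decrease:.1%}")   # ~88.9%, i.e. roughly a 9x drop

# If costs really fell ~10x per year, they would compound quickly:
for years in (1, 2, 3):
    print(f"After {years} year(s) at 10x/yr: ${old_price / 10**years:.3f} per million tokens")
```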
Dalton: You see these things and you're like, wow, there are all these incredible gains in efficiency. A lot of what people were saying a couple of years ago was, "well, it's too expensive and it's not scalable." Life is a lot better when you're optimistic; the optimists build stuff, and those are the people who make the money.
Dalton: Everything dropped. The cost per token dropped, the inference speed increased. The time it takes to run inference and return output to the user has substantially dropped. None of that squares with what Apple is saying. They're thinking monolithic, while everyone else is thinking purpose-built models, models that are not only purpose-built but also multimodal, where one model doesn't serve all tasks.
Dalton: They use a mixture of experts, as I've talked about in previous research-paper episodes. Meta and OpenAI specifically emphasize the efficiency gains of mixture of experts.
The people at the forefront of innovation aren't even going the monolith route, because they already tried it and it doesn't work.
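For anyone who hasn't run into mixture of experts before, here is a minimal numerical sketch of the gating idea, with random weights standing in for trained ones. Treat it as an assumption-laden toy, not any company's actual architecture; real MoE layers sit inside transformer blocks and learn their routers during training.

```python
# Toy mixture-of-experts gating step for a single token (illustrative only)
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 4, 2
x = rng.normal(size=d_model)                       # one token's hidden state
router_w = rng.normal(size=(n_experts, d_model))   # router weights (would be learned)
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

# The router scores each expert; softmax turns scores into probabilities
logits = router_w @ x
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Keep only the top-k experts; the rest never run, which is the efficiency win
chosen = np.argsort(probs)[-top_k:]
weights = probs[chosen] / probs[chosen].sum()

output = sum(w * (experts[i] @ x) for w, i in zip(weights, chosen))
print("experts used:", chosen, "output shape:", output.shape)
```

From the outside the model still looks like one system, but per token only a fraction of the parameters do any work, which is where the efficiency gains mentioned above come from.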
From AI Pioneer to Follower: What Happened to Siri?
Dalton: The whole thing is quite confusing, and I just don't understand Apple. I love Apple. They're awesome. They make great products, and they've consistently innovated for such a long time. But lately, their iPhone releases, their lack of integration of these AI functions... and they were the first people in the AI space.
Dalton: Steve Jobs was amazed by Siri back in the day, when Siri could set up appointments and do stuff for you. Steve Jobs thought it was groundbreaking. He was like, "This is insane. We need this. This is going to be a differentiator for us." So Siri was the first mainstream AI assistant. It was advanced; others existed, but nothing was anywhere near as good as Siri.
Dalton: Then Alexa came around, and then Google. But Siri was at the forefront of all of this, and Apple has the data, the users, all the inferences. They have everything, right? They design their own chips in-house. They have a ton of cash. So I'm just scratching my head here. What are you doing?
Why Healthy Competition Is Crucial for Tech Advancement
Dalton: I want all the companies to push each other. If no one had pushed Google to start releasing AI stuff, I bet they wouldn't have done it yet. Before OpenAI started pushing all this, stockholders were getting concerned about Google's positioning, because for the longest time they were considered the AI company. They were the experts. And then OpenAI comes along and starts releasing all these models, and Google had no counter.
Dalton: Bard was horrific. Their live demo cost billions in shareholder value from the stock price drop, simply because Google got caught on its back foot, and that woke Google up. Google has shipped an amazing amount of stuff in the last 24 months. DeepMind, Google's AI research team, housed like 85-plus percent of the AI researcher workforce. They had absolute dominance in AI talent.
Google would have never done that if it weren't for OpenAI. And that's why I'm emphasizing this about Apple. You've got to push. You've got to push. All your friends are out there doing great things, and you need to join them. You're part of that group, and you've got to push them as well.
Dalton: All I want is to have the best product ever. And to get that, these companies need to push each other to the limit, for years and years, and never stop. That's capitalism, baby.
A Disconnected Vision: The Problem with the New 'Liquid Glass' UI
Dalton: One recent thing they've been focused on shipping is their Liquid Glass UI, which I don't even know how to describe. It didn't look good. It didn't get very much praise. And that was the big hype of their WWDC. They had no major AI vision. The stock dipped.
Dalton: The biggest emphasis was a UI revamp, which looked like Windows Vista back in the day. And you can't even see the UI: on a light background, all your apps and stuff are translucent. So if you pull down your notification center, your apps still bleed through and you can't even see what's going on. People were sending clips of the beta, and hopefully they fix it before release, because people are not going to like that at all.
...that just shows the state of the disconnect between what they're deeming important and what is actually important. And that's a fundamental issue, where leadership misunderstands what's critical at the time and then ships something completely irrelevant, like a UI revamp.
Dalton: No one was anticipating a UI revamp. No one's really talking about it. People care about AI right now. The market cares about AI. Users care about it. Some of my friends love simple features on other phones, like call blocking for scammers or automatic call transcription. Apple doesn't have much of any of that stuff.
What's Next for Apple in the AI Race?
Dalton: It's not that I don't like Apple. I'm just disappointed. I know Apple is meant for so much more than what they're doing. This is an opportunity for others, right? While they stall, others can sprint. So the longer Apple stalls... maybe they do an acquisition, I have no idea. But that's going to be really expensive. There are only a couple of players, and of the ones that could realistically be acquired, it's really Anthropic, the French company (Mistral), or DeepSeek.
Dalton: In a general sense, the takeaway is that modular AI wins.
Toolchains versus monoliths; building systems, not Swiss Army knives, right?
Dalton: Cost curves are dropping substantially, and that allows for experimentation. And Apple is showing a disconnected vision: a gap between what people want from them and what they're willing to commit to or have a vision for.
Dalton: But hey, we all learn faster or slower than others, so give them a chance, right? Of course, I appreciate everybody listening to this episode. If you found it enjoyable, please like and subscribe. Thank you for listening, and I hope you tune in next week.
RESOURCES MENTIONED
- Apple's AI Paper: "The Illusion of Thinking"
- Tower of Hanoi (puzzle)
- River crossing puzzle
- Stanford's AI Index
- A16Z (Andreessen Horowitz)
- Gemma (AI Model)
- Meta
- Windows Vista
INDEX OF CONCEPTS
Dalton Anderson, Apple, The Illusion of Thinking, Large Reasoning Models (LRMs), Large Language Models (LLMs), Tower of Hanoi, river crossing puzzle, toolchains, monoliths, Barry Bonds, Google, Nvidia, OpenAI, Amazon, Microsoft, ChatGPT 3.5, Stanford's AI Index, A16Z, Gemma, Mini, mixture of experts, Meta, M1 Mac, M2 Mac, MacBook Pro, M4, Steve Jobs, Siri, Alexa, Gemini, Bard, XAI, DeepMind, WWDC, Windows Vista, Pixel, Alexa Plus, Anthropic, DeepSeek