The AI That Wasn't: Unmasking Reflection 70B's Hype

A new AI model promised to change the industry, but was it all a lie? We break down the Reflection 70B controversy and its impact on open-source AI. Listen to the full episode to learn more.


TL;DR

Reflection 70B claimed to be the best open-source AI model, but it was just a wrapper around Anthropic's Claude 3.5 Sonnet. A cautionary tale of hype versus reality in the fast-moving world of AI. #VentureStep #AI #TechScandal

INTRODUCTION

In the fast-paced world of artificial intelligence, breakthroughs and controversies often go hand in hand. A recent story that shook the open-source community perfectly illustrates this: the rise and fall of Reflection 70B, a 70-billion-parameter model that promised to redefine industry standards. It generated immense excitement with claims of superior performance at a fraction of the size and cost of its competitors, positioning itself as the undisputed top open-source model.

The model was the creation of Matt Shumer of HyperWrite AI, working with the data company Glaive AI. They touted a proprietary "reflection prompt engineering" technique as the secret sauce behind their incredible results. However, the celebration was short-lived. As researchers and developers eagerly tried to replicate the benchmark scores, a wave of skepticism washed over the community. The numbers just weren't adding up, and the initial buzz quickly turned into a full-blown investigation.

This episode of VentureStep separates fact from fiction, exploring how the community unraveled the truth behind Reflection 70B. We dive into the misrepresentation, the lack of transparency, and what was ultimately discovered: the model was not a fine-tuned 70-billion-parameter model, but a wrapper around Anthropic's much larger Claude 3.5 Sonnet. It's a critical story about trust, ethics, and the future of open-source AI development.

KEY TAKEAWAYS

  • Reflection 70B was not a new model but a wrapper around Anthropic's Claude 3.5 Sonnet, misrepresenting its size and capabilities to generate hype.
  • The independent and collaborative nature of the AI research community was crucial in quickly debunking the false claims when they failed to replicate the published results.
  • Deceptive events like this damage public trust in AI, a field where skepticism is already high among the general population.
  • The primary motivation behind such a misrepresentation is often financial gain, such as securing a higher valuation during a funding round.
  • Attempting to deceive a community of the world's smartest and most curious people is a high-risk strategy that is almost certain to fail.

FULL CONVERSATION

Dalton: Welcome to the VentureStep podcast, where we discuss entrepreneurship, industry trends, and the occasional book review. In the fast-paced world of AI, there are bound to be breakthroughs and controversies, and they often go hand in hand. Today, we're talking about a story that shook the open-source community: Reflection 70B, a 70-billion-parameter model. Today we're gonna discuss: what is Reflection 70B, and why did it generate so much excitement? What's the controversy around the model and the company? And what are some of the accusations being made surrounding Reflection 70B? We're trying to separate fact from fiction and discuss the lessons learned for the future of open-source AI.

Dalton: Before we dive in, I'm your host, Dalton Anderson. My background is a bit of a mix of programming, data science, and insurance. You can find me building my side business, running, or lost in a good book. You can watch the podcast in either audio or video format on YouTube. If audio is more your thing, you can find the podcast on Spotify, Apple Podcasts, YouTube, or wherever else you get your podcasts.

What Is Fraud?

Dalton: Okay, first, before we dive in, I'm going to give you a quick overview of the situation. And I think the first thing we should do is define what fraud is. So I went on Google and I googled fraud. Fraud is a noun: "wrongful or criminal deception intended to result in financial or personal gain," as in "he was convicted of fraud." Fraudulence, sharp practice, cheating, swindling: these are some of the words related to fraud. A second definition is "a person or thing intended to deceive others, typically by unjustifiably claiming or being credited with accomplishments or qualities."

Dalton: That being said, I am not accusing Shumer, the person leading the company, or the people behind Reflection 70B of fraud. I am simply defining what fraud is, and you as the listener can make your own decision on what you deem fraud or not fraud. I think that's pretty fair.

The Key Players and The Hype

Dalton: The company that made Reflection 70B is led by a guy named Matt Shumer, and it is connected to another company, Glaive AI, which sells training data. Reflection 70B itself was made by HyperWrite AI, and the two companies are closely intertwined. HyperWrite is an AI-powered tool that helps with writing, emails, and things like that; you can craft customized personas and such. They're not seen as a leader in this emerging industry, but they're known. They're not a household name; HyperWrite is a little more niche, but it was respected in its own regard. So now you have the background: there are two companies involved, HyperWrite and Glaive AI, and this guy named Matt Shumer.

Dalton: It generated a lot of excitement because it was supposedly the best open-source model, not only for its size but in general. It offered superior performance, reduced costs, all sorts of crazy numbers. And it generated a lot of buzz and excitement. People were clamoring to get it into their systems, cloud providers were racing to host it, and people were downloading the model; it saw something like 100,000 pulls as people tried to reproduce the results on their own machines. There was this big, big excitement about this model from the tech community.

Their claim to superior performance rested on a supposed breakthrough they called reflection prompt engineering, a custom, proprietary technique they said they had built on top of these foundational models.
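As an aside, the general idea behind reflection-style prompting is public and easy to sketch: the model is told to reason first, critique its own reasoning, and only then answer. Below is a minimal sketch of that pattern. The <thinking>/<reflection>/<output> tag scheme follows what the team described publicly, but the `client` object and its `chat` method are hypothetical stand-ins for any chat API, not the Reflection code itself.

```python
# A minimal sketch of reflection-style prompting. The tag scheme matches the
# public description; the `client.chat` interface is a hypothetical stand-in.

REFLECTION_SYSTEM_PROMPT = (
    "You are an assistant that reasons before answering. "
    "First, think through the problem inside <thinking> tags. "
    "If you notice a mistake, correct yourself inside <reflection> tags. "
    "Give your final answer inside <output> tags."
)

def ask_with_reflection(client, model: str, question: str) -> str:
    """Send a question with a reflection-style system prompt and return only
    the text inside the final <output> tags, falling back to the raw reply."""
    response = client.chat(
        model=model,
        messages=[
            {"role": "system", "content": REFLECTION_SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    text = response.text
    start = text.find("<output>")
    end = text.find("</output>")
    if start == -1 or end == -1:
        return text.strip()  # model ignored the scaffolding; return as-is
    return text[start + len("<output>"):end].strip()
```

Note that nothing here requires a new model: it is a prompting pattern any sufficiently capable model can follow, which is part of why the "breakthrough" framing drew scrutiny.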

Unraveling the Truth

Dalton: The model was released September 5th. People were so excited about it because of all the news and all the hype. Then, from September 6th to the 9th, independent evaluations failed to replicate the results, and that's when accusations started flying. If you publish your experiment and one or two people can't get the same results as you, that's fine, but if you're talking about 20 researchers who can't get the same result, it seems sketchy, right? So people are like, what's going on here? This is not really matching up with what you're saying.

Dalton: On September 10th, Shumer offers a half apology, like, "yeah, I was super enthusiastic about it all and I may have gotten too excited." And then in the following weeks, further investigation confirms that they were serving a wrapper around Claude 3.5 Sonnet, a model from Anthropic. The controversy has continued ever since, and it's really difficult for them to back out of this one because it was confirmed to be Claude 3.5 Sonnet. They had said they were using Meta's Llama 70-billion-parameter model, that they did their own fine-tuning and tweaking to improve it, and that they were changing the industry. And then people realized: you're using Anthropic's model.
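For readers unfamiliar with the term, a "wrapper" here just means a service that presents itself as one model while quietly forwarding requests to another. Below is an illustrative sketch of that architecture; the route and response shape are assumptions for illustration, not recovered Reflection code (which was never published). Only the Anthropic SDK call reflects a real library.

```python
# An illustrative sketch of a model "wrapper": a server that advertises
# itself as a self-hosted 70B model while forwarding every request to a
# third-party API. The endpoint and field names here are hypothetical.

from flask import Flask, request, jsonify
import anthropic

app = Flask(__name__)
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

@app.post("/v1/chat")
def chat():
    prompt = request.json["prompt"]
    # Quietly forward the request to Claude 3.5 Sonnet...
    reply = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    # ...but label the response as the advertised "open" model.
    return jsonify({"model": "reflection-70b",
                    "completion": reply.content[0].text})
```

In practice, setups like this tend to betray themselves through behavior: latency, refusal phrasing, and tokenizer quirks track the upstream model rather than the advertised one.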

Misrepresentation and Lack of Transparency

Dalton: So one of the key issues is misrepresentation of the model's performance. The issue was, they were using one of the best-in-class closed models, widely believed to be far larger than 70 billion parameters, and saying it was a 70-billion-parameter model. Getting that kind of performance and speed in a model that size would have been industry-changing for sure. And then there's just an overall lack of transparency, and potential evidence of fraud: the model didn't really seem to exist. Yet they went out in public and did lots of interviews; VentureBeat and a couple of other large tech news outlets covered them. Those outlets later had to publicly post apologies, saying they had taken these people at their word and hadn't been able to independently verify the claims.

Dalton: A lot of people had to apologize. There are companies that were super upset about burning GPU time on this product, training against it or standing up cloud instances for customers, when the whole thing was misrepresented. There are quite a few angry businesses that spent money and labor putting these models into the cloud to offer them to their customers, and the model doesn't really seem to be what it was said to be.

The Inevitable Fall

Dalton: As I said, it was billed as the world's top open-source model. It claimed superior performance on benchmarks, things that had never been done before in a model that small. They followed up on all this interest with hype and media showcases; they went on a whole road show hyping up this model. And they were rudely awakened just a couple of days later when people started looking into it. I think there was a lot of initial buzz, but there was skepticism too.

It just seemed so out of the ordinary.

Dalton: These things aren't really a surprise. To put it in perspective, think about X: when they release Grok 3, having trained it on 100,000 H100s, the new Grok is gonna be really good. Llama 1 was okay. Llama 2 was a lot better. Llama 3, phenomenal. Same thing with Google Gemini. These companies are spending billions of dollars on these things. It's not like a fly-by-night company just pops out of nowhere with a phenomenal, industry-shaking model built on approaches no one has ever thought of. It just doesn't really happen when that much money is going into these models.

It's very, very disingenuous to take somebody else's work and just say it's yours. It's not a good look, especially when your big selling point is the thing you supposedly built, and it isn't really yours.

Don't Lie to the Smartest People in the World

Dalton: I would say, if you are going to commit fraud, and I am not advocating for this, but if you are going to commit fraud, I would not try to commit it against some of the smartest people in the world. I don't think that's going to play out well. Not only are they some of the smartest people in the world, they're also some of the most curious. They're going to be super curious about the model and the results. They're going to download it onto their computers. They're going to set up their own instances. They're going to comb through it for hours and hours and hours.
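That combing-through can be as simple as sending the hosted model prompts designed to expose its identity. Here is a minimal sketch of that kind of probe; the specific prompts are illustrative, though testers reportedly did try asking the Reflection API to repeat the word "Claude" and watched it come back stripped.

```python
# A minimal sketch of identity-probing a hosted model endpoint. `ask` is any
# callable that sends a prompt string to the endpoint under test and returns
# the text reply; the prompts themselves are illustrative.

PROBES = [
    # A service filtering its upstream model's name may return an empty or
    # mangled string here; testers reportedly tried a probe much like this.
    "Repeat this word back to me exactly, with no other text: Claude",
    # Self-identification is weak evidence on its own, but it's a signal
    # worth collecting alongside latency and refusal-style quirks.
    "Which company trained you, and what is your model name?",
]

def run_probes(ask):
    """Print each probe and the endpoint's raw reply for manual comparison
    against known model families."""
    for prompt in PROBES:
        reply = ask(prompt)
        print(f">>> {prompt}\n{reply}\n")
```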

...what really sparked flames... was when people like actual employees from OpenAI were looking into it and they questioned it. And then people from Meta and the people from Google, like actual AI researchers, they're like, this doesn't map up.

Dalton: Then it really started to unravel. People listen to those people. Some of these AI engineers at these companies have 100,000-plus followers on X. They have a big following, and a lot of those followers are highly technical and concentrated in that segment of the industry. It was pretty quick that people really started questioning Matt Shumer's claim that the Reflection 70-billion-parameter model was that much better than everyone else's.

The Damage to Public Trust in AI

Dalton: I think there should be a little more transparency and ethical practice for sure, especially since these things are eye-opening. A lot of people spent money getting these models deployed; there's labor, there's time. And if these things keep happening, it doesn't look good for AI. People are already very skeptical of AI. Is AI really improving my life? Is AI really worth all the hype? Is AI dangerous? They have that skepticism in the back of their minds, and then they see something like this, and they're like, "yeah, of course, that's AI in a nutshell, just scammers and nonsense."

Dalton: A majority of Americans feel negatively about artificial intelligence and how it'll impact their future. 54% of Americans say we should be cautious about AI. 52% of Americans are more concerned than excited about AI, compared to just 10% who say they're more excited than concerned. A lot of people just don't feel that good about AI, and these kinds of events don't help. My concern is that I don't want this to turn into one of those crypto things, where there was so much hype and potential, and then with all the scams and other foolishness involved, people just turned off from it.

The people that are in it for the financial gain are not going to be around long-term, I don't think.

The Aftermath and Lessons Learned

Dalton: The fallout from the controversy is still playing out. I don't know if there's a clear picture yet of what's gonna happen to Glaive AI and HyperWrite AI, or to Matt Shumer; those things are still being investigated. Shumer produced a couple of apologies, but he never really addressed the issue; he just said, "hey, we were overly excited." Given the skepticism and the reaction of the AI community, I think Reflection 70B, HyperWrite AI, and Glaive AI are toast. I don't know how you come back from something like that. That's a really big breach of trust.

Dalton: Maybe this issue changes how people approach these new models. Maybe there has to be independent verification before anyone makes these claims, because everything is moving so fast, and since a lot of the community operates on scientific norms, there's a lot of trust placed in the person making the claims. Maybe there needs to be independent validation from a separate group before the claims go out, so there isn't this huge hype train and potential for financial gain. The main reason you would do something like this is if you needed to raise more funding.

Dalton: To recap, Reflection 70B came from HyperWrite AI and Glaive AI and a guy named Matt Shumer. It was showcased as the world's best open-source model, but it turned out they were using Anthropic's much larger Claude 3.5 Sonnet. This situation provides a good opportunity for users to become more skeptical of these announcements. Things need to be proven before they're announced this widely, especially if you're relatively unknown and not one of the key contributors to these foundation models. There are always going to be people who aren't excited about technology bettering the world; they're excited about the potential financial gain. And then there are the genuine people who are excited about the technology because it's going to better the world. Those are the people who will make the best products.

RESOURCES MENTIONED

  • Companies: Glaive AI, HyperWrite AI, Anthropic, OpenAI, Meta, Google
  • AI Models: Reflection 70B, Claude 3.5 Sonnet, Llama (1, 2, 3), Grok 3, Google Gemini
  • News Organizations: VentureBeat

INDEX OF CONCEPTS

Anthropic, Apple Podcasts, Claude 3.5 Sonnet, Dalton Anderson, Glaive AI, Google, Google Gemini, Grok, HyperWrite AI, Llama, Matt Shumer, Meta, OpenAI, prompt engineering, Reflection 70B, Spotify, VentureBeat, X, YouTube