AlphaFold 3 & GPT-4o: AI's Leap in Science & Usability
Explore how AlphaFold 3 will revolutionize drug discovery and how GPT-4o's multimodal features will change how we interact with AI. Listen to the full episode to learn more.
TL;DR
AlphaFold 3 surpasses physics-based models, promising to slash drug discovery timelines, while GPT-4o's real-time voice & vision integration makes AI a seamless daily assistant. #VentureStep #AI #Tech
INTRODUCTION
The relentless pace of AI innovation continues to reshape entire industries, and two recent announcements stand as monumental pillars of this transformation. In this episode, we explore groundbreaking developments from the two titans of the AI world: Google's DeepMind and OpenAI. These aren't just incremental updates; they represent fundamental shifts in scientific research and human-computer interaction.
First, we dive into AlphaFold 3, the latest iteration of DeepMind's protein structure prediction model. Host Dalton breaks down how this new version is not just an improvement but a paradigm shift, becoming the first AI model to outperform traditional, physics-based methods in predicting molecular interactions. This breakthrough has the potential to accelerate drug discovery, reduce research costs by billions, and unlock new frontiers in biotechnology.
Then, the conversation pivots to OpenAI’s stunning announcement of GPT-4o. This new flagship model moves beyond text, embracing a truly multimodal experience with real-time voice, vision, and screen-sharing capabilities. We explore how this transforms ChatGPT from a tool into a dynamic, conversational partner, capable of everything from live translation to collaboratively solving math problems, heralding a future where AI is seamlessly integrated into our daily lives.
KEY TAKEAWAYS
- AlphaFold 3 is the first AI model to outperform traditional physics-based models in predicting molecular interactions, marking a significant milestone for AI in scientific research.
- Google is commercializing AlphaFold 3 through its subsidiary, Isomorphic Labs, signaling the model's immense value and potential to generate billions by transforming the pharmaceutical industry.
- AlphaFold 3 uses a new diffusion-based architecture, which is faster and more accurate than the previous transformer-based approach, allowing it to handle more complex biological systems.
- GPT-4o introduces real-time, human-like voice and vision capabilities, allowing users to have fluid conversations and get assistance with visual tasks through their device's camera.
- OpenAI's strategy with GPT-4o points to a future of AI as a deeply integrated assistant on personal devices, designed to navigate the human world rather than operating as a separate, autonomous agent.
FULL CONVERSATION
Dalton: Welcome to the Venture Step Podcast, where we discuss entrepreneurship, industry trends, and the occasional book review. Today we'll be discussing AlphaFold 3, a protein structure prediction model that will help accelerate drug discovery and assist in scientific breakthroughs. Then we'll be discussing OpenAI's GPT-4o, announced just today, a multimodal AI model that lets you interact with ChatGPT through vision, voice, screen sharing (like a Teams meeting), and the usual text capabilities.
Dalton: Before we dive in, I'm Dalton. I've got a bit of a background in programming, data science, and insurance. Offline, you can find me running, building my side business, or lost in a good book. You can watch the podcast in video format on YouTube, and if audio is more your thing, you can find it on Apple Podcasts, Spotify, or wherever else you get your podcasts.
The Breakthrough of AlphaFold 3
Dalton: Today we'll be discussing AlphaFold 3 and GPT-4o. AlphaFold 3 is an iteration on AlphaFold 2, which was published as a research paper in 2021.
It was a notable paper, to say the least, with over 15,000 citations; if you're not familiar with research papers, even a couple hundred citations is a lot.
Dalton: So 15,000 is huge. It was a very popular paper, and it represented a breakthrough in the approach to drug and DNA research: predicting drug interactions with machine learning instead of a physics-based model. At the time, physics-based models were the standard, and AlphaFold 2 challenged that standard with its results. Its results were good, but they hadn't outperformed a physics-based model. AlphaFold 2 was good across many different areas of this research, whereas a physics-based model was good at one thing: okay, this lipid interacts with these things, and this is how it happens. AlphaFold 2 could do lipids, proteins, DNA structures. It could do many things, just not as well as a physics-based model.
With AlphaFold 3, that has changed. AlphaFold 3 becomes the first AI model to beat a physics-based model, which is huge.
Dalton: With this whole AI thing going on, this is something else. This is cool. I think it's pretty cool because the information is available for public use, though not necessarily for commercial use. You can check it out, and if you're curious and want to do your own studies, you can. Where you get that knowledge, I don't know, but it's there for you. AlphaFold 3 has significant improvements over AlphaFold 2; many of the improvements were 50 percent or more. And now it can predict interactions with lipids.
From Transformers to Diffusion Architecture
Dalton: It has a new architecture, which I thought was interesting. They switched to a diffusion-based architecture, which is different from their previous transformer-based approach. The way the previous model was explained to me was: think about a social network where you map out everyone who's friends with everyone, and you end up with this huge web diagram. From that huge web of all your friends and friends of friends linked together, you would then create the 3D structure.
Dalton: Now it uses diffusion. It slowly iterates on itself, like passing through the layers of a neural network, progressively getting better with each round, and it generates the 3D structure using pairwise representations instead of mapping out every interaction individually. This approach is supposed to be, one, more accurate, and two, faster. Being more accurate and faster means it can handle more complex interactions, which brings increased scalability and faster processing of structures. Apparently it can predict, in a couple of hours, dynamics that would normally take weeks or even months of testing.
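For the technically curious, here is a minimal, purely illustrative sketch of the diffusion idea: start from random 3D coordinates and repeatedly refine them with a denoiser. This is not AlphaFold 3's actual code or architecture, and `denoise_step` is a hypothetical stand-in for the trained network.

```python
import numpy as np

def denoise_step(coords: np.ndarray, noise_level: float) -> np.ndarray:
    """Stand-in for a trained denoising network.

    In a real diffusion model this would be a learned function that, given
    noisy 3D coordinates (plus token/pair representations), predicts a
    less-noisy set of coordinates. Here we just shrink the noise so the
    sketch stays runnable.
    """
    return coords - noise_level * coords  # hypothetical placeholder update

def generate_structure(num_atoms: int, steps: int = 10, seed: int = 0) -> np.ndarray:
    """Toy diffusion loop: begin with pure noise, iteratively denoise."""
    rng = np.random.default_rng(seed)
    coords = rng.normal(size=(num_atoms, 3))   # start from random 3D positions
    for step in range(steps):
        noise_level = 1.0 - step / steps       # schedule: strong -> weak denoising
        coords = denoise_step(coords, noise_level * 0.5)
    return coords                              # final predicted 3D coordinates

print(generate_structure(num_atoms=5).round(3))
```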
The Commercial Strategy Behind Isomorphic Labs
Dalton: The only concern we have is that it's owned solely by Isomorphic Labs. Google is putting a lot of value on this product, and so they spun up a company called Isomorphic Labs, which has sole ownership of AlphaFold 3. Google, like Meta, is known as one of the key advocates for open-sourcing its algorithms and products; a lot of their stuff is open source. So when they don't open source something, it's because they see high value in it, and they're going to go commercial with it. And if it's that valuable, then I think it's worth upwards of billions of dollars.
I understand they have an internal bar at the company, which came up because a lot of people were getting upset with Google over the products and product teams they kill off. The thinking is essentially: if we can't predict a product will be worth over 10 billion, we don't necessarily have interest.
Dalton: It might be a great product for a startup or some other company, but it's not a product for Google. So Google works on these big problems, spreads out its resources, and places these bets. For every bet that hits like AlphaFold, there are hundreds that don't hit at all. But with AlphaFold they're swinging for the fences and hitting it big time, especially with AlphaFold 2 and now AlphaFold 3. These things will help accelerate new drug discoveries.
Dalton: They'll give drug companies accelerated timelines. I don't know whether Google starts its own drug company, licenses AlphaFold 3, or forms partnerships with leading drug companies; they haven't announced any plans yet. This product will improve drug safety, enable new drug discovery, and give us an enhanced understanding of biological systems. It shifts the industry standard for drug discovery and testing from physics-based models to machine learning models.
Dalton: It positions Google and DeepMind very favorably as the leaders in AI for drug discovery. They have very good commercialization opportunities with Isomorphic Labs: they can license it to pharmaceutical companies, research institutions, and biotechs, or form strategic partnerships with key pharmaceutical companies to further drug discovery and development. There are a lot of different routes they could take.
But based on the paper and what other people are saying, the predictions (and these are just predictions, not solid information) are that it could reduce drug discovery costs by 50-plus percent and accelerate drug development timelines by two to five years.
Dalton: And so I think that that would translate into saving billions of dollars consistently for these large companies.
A Live Demo of the AlphaFold Server
Dalton: Let's get into it. I'm going to share my screen. I'm all caught up in this AlphaFold. AlphaFold this, AlphaFold that. AlphaFold Server. Let's go over here. Sharing my screen now. Okay, I already have a DNA sequence entered. We're going to preview our job and submit it here, and it'll run in the background. The site is in beta right now. I just ran a job for everyone, so I've got nine jobs remaining, and while that job's running... I think it already ran. Open results. Wow, that was really fast. This one's less complex than the other one; that's why.
Dalton: Okay, so whatever this is, I asked the internet for a DNA example, and this is what I have. If you're listening to the podcast, we're in the live demo section. I went on AlphaFold Server, which lets you input these sequences. I put in a DNA sequence, and it generates this 3D model that you can rotate, zoom in, and zoom out on, and it tells you the confidence of the prediction. Right now I've got high confidence on two pieces and low confidence on the majority of it, but this one was really small, which is why it was quick. This other job I have, the protein-RNA-ion one, is really cool. This thing is crazy. It's something else. There's a lot going on in it, and apparently you're supposed to be able to tell what is what. I have no idea. I just think it looks cool.
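If you download a prediction from AlphaFold Server and want to poke at the confidence numbers yourself, a rough sketch like the one below works under the assumption that the result includes an mmCIF structure file with per-atom confidence (pLDDT) stored in the B-factor field, as is the convention for AlphaFold-family models; the file name is hypothetical.

```python
# Rough sketch of inspecting a downloaded AlphaFold Server prediction.
# Assumption: per-atom pLDDT lives in the B-factor column of the mmCIF file.
# Requires: pip install biopython
from Bio.PDB.MMCIFParser import MMCIFParser

def summarize_confidence(cif_path: str) -> None:
    parser = MMCIFParser(QUIET=True)
    structure = parser.get_structure("prediction", cif_path)
    plddts = [atom.get_bfactor() for atom in structure.get_atoms()]
    high = sum(1 for p in plddts if p >= 90)   # very high confidence
    low = sum(1 for p in plddts if p < 50)     # low confidence
    print(f"{len(plddts)} atoms, mean pLDDT {sum(plddts) / len(plddts):.1f}")
    print(f"{high} atoms at very high confidence, {low} atoms at low confidence")

# summarize_confidence("fold_job_model_0.cif")  # hypothetical file name
```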
Introducing GPT-4o: A New Era of Interaction
Dalton: Okay, so now we're going to transition over to ChatGPT. GPT-4o is the new model that was announced today in a live demo. I think it was foreshadowed in an interview Sam Altman did with the All-In Podcast crew, which I think was on Friday or Saturday of last week; it was recent. They asked him, okay, when is GPT-5 coming out? And he was like, well, honestly, I think we need to reevaluate how we're doing these models. Instead of having these large releases, we should take an iterative approach. So it's not GPT-4, GPT-5, GPT-6. It is ChatGPT, and that's our model, and we slowly upgrade it.
The Vision for an Integrated AI
Dalton: Then he was asked about AI interaction: when AI gets super advanced, how would you want it to be? There's this independent AI agent approach, where the agent just does whatever you want, goes off on its own, and gives you no visibility into what it's doing. Maybe it interacts with companies through APIs or orders things on its own. And then there's the other approach Sam was talking about, where...
the AI is integrated with you on your phone or your computer and can see what you see and can navigate the human world because the world is built for humans and it makes more sense for you to be able to see what the AI is doing.
Dalton: It makes the user more comfortable and allows for more interactivity, and there's a lot of information that's lost if you can't visualize it. The example Sam gave was: what if you asked ChatGPT to order you an Uber and it does it via API? How do you know how far away the driver is? What are the pricing options? Which car should you get? Where is the driver on the map? There's all this information that would be a pain to send back and forth when the user could just look at their phone and see everything. So instead of building an additional world for the AI to interact within, Sam was suggesting we just make the AI able to live and interact in the human world, rather than building a separate structure for it.
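To make that concrete, here is a toy sketch of the kind of state an API-only assistant would have to keep narrating back to you, turn after turn; the `RideStatus` fields are entirely made up and don't correspond to any real ride-hailing API.

```python
# Hypothetical sketch: none of these fields map to a real API.
# The point is how much state the assistant must relay in words
# versus the user simply glancing at the app's map.
from dataclasses import dataclass

@dataclass
class RideStatus:
    driver_eta_minutes: int
    driver_location: tuple[float, float]   # lat, lon: trivial on a map, clumsy in prose
    price_options: dict[str, float]        # e.g. {"economy": 12.50, "xl": 19.75}
    vehicle: str

def narrate(status: RideStatus) -> str:
    """Everything the assistant would have to push back to the user in text."""
    prices = ", ".join(f"{name} ${price:.2f}" for name, price in status.price_options.items())
    return (f"Your driver is {status.driver_eta_minutes} min away in a "
            f"{status.vehicle}, currently at {status.driver_location}. "
            f"Options: {prices}.")

print(narrate(RideStatus(7, (33.75, -84.39), {"economy": 12.50, "xl": 19.75}, "gray sedan")))
```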
Real-World Applications and Demos
Dalton: So I think that makes sense, and it follows what was demoed today with GPT-4o. It's an app enhancement where you can talk to it. You can do things like live translation, or talk about your feelings, or, as in the demo, flip your camera and show it what you're doing in real time. In the demo they solved a linear equation together, with ChatGPT helping the user through it step by step. I think it was something like 3x plus 1 equals 4. The user showed the AI the equation and said, "Hey, I need help solving this equation. How can I go about doing that?" And it would say, "Have you thought about removing the constants and putting them all on one side?"
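For reference, the worked steps of that example equation, as recalled on the show (the exact numbers in the demo may have differed), are just:

```latex
\begin{aligned}
3x + 1 &= 4 \\
3x &= 4 - 1 = 3 && \text{(move the constant to the other side)} \\
x &= 3 / 3 = 1
\end{aligned}
```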
Being able to do that on the fly is huge, huge, it's crazy.
Dalton: Then they did another bit where they flipped the camera around to his face and asked, "Hey, can you tell me my emotions today?" The guy's smiling, so it's pretty easy, but it has to know it's looking at a human face, know that when your eyes are kind of closed and you've got a big smile on your face you're really happy, and do all that in real time. It did make some mistakes. At one point the camera caught a glimpse of a piece of wood he had shown it earlier, and it started saying, "Wow, you look great, Charles." And he's like, "No, that's not me. That's just a piece of wood." And it was like, "I got too excited there. Now I see you." Stuff like that, which is really cool.
A Fully Integrated Desktop Experience
Dalton: Then they demoed a desktop app where, in real time with your voice, you can interact with the AI while sharing your screen. He was asking it programming questions: troubleshooting what to do with a script, how to change the shape of a graph. He would ask the AI about the script, it would answer, and when he scrolled to a new part of the script it would explain that section in detail, really good stuff on the fly.
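The actual script from the demo wasn't published, but for flavor, the questions were of the sort you might ask about a hypothetical plotting script like this one, where changing a single parameter (the smoothing window below is my own stand-in) reshapes the graph:

```python
# Hypothetical example of a script you might screen-share and ask GPT-4o about.
# Changing `window` is the kind of edit that changes the shape of the curve.
import numpy as np
import matplotlib.pyplot as plt

def rolling_mean(values: np.ndarray, window: int) -> np.ndarray:
    """Smooth a noisy series with a simple moving average."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

rng = np.random.default_rng(42)
raw = np.cumsum(rng.normal(size=200))          # noisy series standing in for demo data
window = 15
smooth = rolling_mean(raw, window)             # larger window -> flatter, smoother shape

plt.plot(raw, label="raw", alpha=0.5)
plt.plot(np.arange(len(smooth)) + window - 1, smooth, label=f"rolling mean (window={window})")
plt.legend()
plt.title("Hypothetical script: the smoothing window changes the graph's shape")
plt.show()
```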
Dalton: It has a human-like voice and the ability to interact through vision as well as voice. I think voice capability for these AI agents is huge. Being able to do that on your phone or your computer, and then also being able to say, "Hey, what is this?" and have it tell you, to turn on your camera and have it tell you what's going on or how to solve something, is huge. It reveals where they want to go. They want a piece of your phone, they want a piece of your computer. They want to be everywhere and accessible wherever you are. I don't necessarily think they want to make their own device, but they can be this extension of the device you're already paying for.
The Future of OpenAI's Models
Dalton: They will have to figure out model sizes, because GPT-4 is huge, and I don't know how efficient it is for them to run it constantly on their hardware; it gets really expensive. That's the main reason there isn't a free GPT-4: the expense of constantly serving user requests on a much more complex model, reportedly over a trillion parameters.
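As a rough, back-of-envelope illustration of why serving a model that large is expensive: the parameter count is a rumor rather than an OpenAI figure, and real deployments use optimizations this sketch ignores, but the raw memory math alone is telling.

```python
# Back-of-envelope memory math for a hypothetical 1-trillion-parameter model.
# Actual GPT-4 sizes and serving details are not public; this only shows scale.
params = 1_000_000_000_000        # rumored order of magnitude
bytes_per_param_fp16 = 2          # 16-bit weights

weights_gb = params * bytes_per_param_fp16 / 1024**3
gpu_memory_gb = 80                # e.g. one 80 GB accelerator
gpus_for_weights = weights_gb / gpu_memory_gb

print(f"~{weights_gb:,.0f} GB just for the weights")               # ~1,863 GB
print(f"~{gpus_for_weights:.0f} x 80 GB GPUs to hold them")        # ~23 GPUs, before activations or KV cache
```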
Dalton: They also talked about giving free users access to custom chatbots for the first time, something that was previously only available to paid users. Paid users pay 20 bucks a month. If you're questioning whether you should do it or not, consider how much you're paying for things that don't save you any time.
Dalton: In conclusion, we discussed AlphaFold 3 and its potential to change and expedite drug discovery and testing, and then we talked about GPT-4o, what it will do for users, and what's coming in the next couple of weeks. There isn't much information beyond what they demoed, and I haven't been able to try it out myself yet, but we will be doing that on the podcast. A hundred percent, no question. I encourage everyone to go try the AlphaFold Server; I'll put the link in the show notes.
RESOURCES MENTIONED
- AlphaFold 3
- ChatGPT-4o (GPT-4o)
- Google DeepMind
- Isomorphic Labs
- OpenAI
- AlphaFold Server
- All-In Podcast
- Meta
INDEX OF CONCEPTS
AlphaFold 3, AlphaFold 2, ChatGPT-4o, OpenAI, protein structure prediction, drug discovery, multimodal AI, Isomorphic Labs, Google, DeepMind, machine learning, physics-based models, diffusion-based architecture, transformer-based architecture, AlphaFold Server, Sam Altman, All-In Podcast, AI agent, live translation, Meta