Google's AI Future: A Deep Dive Into Gemini & Notebook LM

Get a beta tester's inside look at Google's latest AI tools, from the game-changing Notebook LM to Gemini 1.5 Pro. Listen to the full episode to learn more.

TL;DR

Google's Notebook LM is a game-changer, letting you create a personal AI expert from your own documents. It's free, private, and powered by the massive Gemini 1.5 Pro model. #VentureStep #GoogleIO #AI

INTRODUCTION

The relentless pace of AI development can feel overwhelming, with major announcements dropping almost weekly. At its recent I/O conference, Google unveiled a barrage of groundbreaking tools and model updates set to redefine how we interact with technology. From creative suites that generate video and music to powerful new learning models, Google is pushing the boundaries of what's possible, and importantly, deploying it at a massive scale.

In this episode, host Dalton provides a unique, hands-on perspective on these new technologies. With months of experience as a beta tester for many of the tools discussed, he cuts through the marketing hype to reveal what actually works, what still needs improvement, and what entrepreneurs and professionals should be paying close attention to. This isn't just a recap; it's an insider's analysis from someone who has been putting these tools to the test long before their public release.

Dalton explores the entire stack of Google's AI ecosystem, from creative tools like ImageFX and MusicFX to the revolutionary Notebook LM, a personal AI that can become an expert on any topic you provide. He also breaks down the significant updates to the Gemini family of models and details how AI is being integrated directly into the products billions of people use every day, including Google Search, Workspace, and Android.

KEY TAKEAWAYS

  • Notebook LM is a powerful, free tool to create a personal AI expert from your documents, perfect for studying or managing company knowledge.
  • Gemini 1.5 Pro's context window has doubled to two million tokens, enabling deeper analysis of massive amounts of information directly in the web UI.
  • Google is embedding AI responsibly with tools like SynthID (digital watermarking) and making advanced models accessible to billions via Search, Android, and Workspace.
  • New creative tools like ImageFX now excel at rendering text in images, solving a major frustration with previous AI image generators.
  • LearnLM aims to make learning more interactive by integrating AI tutors directly into educational content on platforms like YouTube.

FULL CONVERSATION

Dalton: Welcome to VentureStep podcast where we discuss entrepreneurship, industry trends, and the occasional book review.

Dalton: Today, you're going to have a front row seat to Google's latest announcements from their most recent IO. We're going to explore the groundbreaking tools and models that you might find useful. We'll cover everything from text to video, magic, enhanced image generation to the revolutionary learning models and responsible AI development. Join us as we uncover the exciting possibilities. Of course, before we dive in, I'm your host Dalton. I've got a bit of a mix of background in programming, data science and insurance. Offline you can find me running, building my side business or lost in a good book.

A Beta Tester's Perspective on Google's AI Announcements

Dalton: Today, we're going to be discussing all of the important items that were demoed and unveiled at Google IO. I think I'm well positioned to speak about these topics because I've been a beta tester for some of these tools for the last five or so months, so I've had access long before the general public. That being said, for the things I haven't been able to try myself, I'll be a little less excited.

The Problem with Real-Time AI Demos

Dalton: About five months ago, Google unveiled their Gemini Ultra model. It's supposed to be their most advanced, most expensive model, capable of doing insane things. They demoed a video where one half was a human drawing things on a piece of paper with messy handwriting, and the other half was Gemini Ultra answering questions about it. They made it seem like a real-time interaction. But really, they had recorded a video and then uploaded it to Gemini Ultra. In the video they posted to YouTube, they didn't specify that it wasn't real time; they alluded to it being live. I felt bamboozled when it came out a couple of months later that it actually wasn't real time. I think that video was misleading.

Exploring Google's New Creative AI Suite: VideoFX, ImageFX, and MusicFX

Dalton: So Google introduced these new AI tools to the general public. We got VideoFX, which is text to video. I haven't tried this one, as I think they just released it, so I don't have any experience with VideoFX. For ImageFX, I applied to the Imagen waitlist. I'm also pretty sure that if you have Gemini Pro, Imagen is what's used for Gemini Pro's image generation.

Imagen 3: Finally, AI That Can Spell

Dalton: The new update with Imagen 3 allows for higher quality image generation, and it includes editing tools, which are nice. Most importantly, I think, it can now generate images with text with much better accuracy than before. My podcast cover art was generated by Gemini, probably with Imagen 1. Imagen 1 and OpenAI's image generation model both had issues generating images with text: if you asked for text on the image, it would come out misspelled. I was asking for this cool sci-fi explorer image with "Venture Step" at the top, and it would say something like "step vintage," or it would be all misspelled. After some testing, that doesn't seem to be much of an issue anymore.

MusicFX in Action: How Loop Daddy Created a Song Live

Dalton: There's also MusicFX, which is something I tried a couple of months ago. It generates these beats for you and you can make a soundtrack, and from that soundtrack, it helps you make a song kind of on the fly. If you don't have bongos at the studio, you could just go to MusicFX, type in "bongo sounds," and then it would generate a beat for you. You could download that beat and add it into your song.

Dalton: Google did a pre-show concert with an artist who goes by Loop Daddy (Marc Rebillet), and he performed with MusicFX. He asked the audience which beats they wanted, gave them some choices, and pretty much everything was generated on the fly. They voted, and then he made a song from it live. It's not something you generate lyrics with as well; it's just beats. I tried it, and my beats were garbage. It was horrible. I was like, this tool is so hard to make anything good with, but Loop Daddy did it right, I guess.

What is SynthID? Google's Answer to AI-Generated Content

Dalton: And then they also introduced SynthID. SynthID embeds a digital watermark in AI-generated content. Meta has a similar approach, but they put a visible watermark on their images, which I don't think is as good, because somebody could just crop the watermark out. With Google's approach, anything generated by these Google products will have an ID embedded in it, linking the content back to the tool that made it and helping prevent people from doing bad things with the images.

Notebook LM: The Game-Changer for Personal Knowledge Management

Dalton: The next thing, which is really cool and something I think everyone should be using at work, is Notebook LM. It was originally called Project Tailwind, and they renamed it to Notebook LM.

Notebook LM is something that was launched, I don't know, I think like six months ago or something like that. I used it to help me study for some of my insurance exams.

Dalton: So basically, I'm able to put in these books that I have. It reads them, and then I can ask it questions like, "Okay, I'm struggling with this chapter. Can you make me a study guide?" You can drop ten files into the product, and Notebook LM will read them and become an expert in what you give it. So you can give it material that's difficult to look up on the internet: compliance documents, or things that only exist at your company or within your group.

Notebook LM knows where the answer lives: it tells you where it is in the file, which file it's from, and the exact section, and then gives you the answer, or what it thinks is the answer.

Dalton: And then you can verify it yourself, because it tells you the exact location it's getting the information from. It shows its sources, and it's pretty straightforward. I think it's a game changer for studying, or for managing unstructured knowledge bases like document collections.
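
The grounding behavior Dalton describes, answering only from the uploaded files and citing the exact file and passage, can be sketched as a simple retrieval pattern. To be clear, this is a hypothetical illustration of the pattern, not Notebook LM's actual implementation; the corpus, the word-overlap scoring, and the citation format are all assumptions made for the example.

```python
# A toy sketch of "grounded" question answering: the system may only
# answer from the documents you supply, and every answer carries a
# citation back to the file and passage it came from. This is NOT how
# Notebook LM is implemented -- just an illustration of the pattern.

def build_index(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Split each file into passages, keeping (filename, passage_no, text)."""
    index = []
    for name, text in files.items():
        for i, passage in enumerate(p.strip() for p in text.split("\n\n")):
            if passage:
                index.append((name, i, passage))
    return index

def answer(question: str, index: list[tuple[str, int, str]]) -> str:
    """Return the best-matching passage with a citation, or refuse."""
    words = set(question.lower().split())
    best, best_score = None, 0
    for name, i, passage in index:
        score = len(words & set(passage.lower().split()))
        if score > best_score:
            best, best_score = (name, i, passage), score
    if best is None or best_score < 2:  # nothing relevant was uploaded
        return "I don't have any information on that."
    name, i, passage = best
    return f"{passage} [source: {name}, passage {i}]"

files = {
    "io_blog.txt": "Notebook LM uses Gemini 1.5 Pro.\n\nVeo generates high-definition video.",
}
index = build_index(files)
print(answer("Which model does Notebook LM use?", index))  # answer + citation
print(answer("What is the tallest animal?", index))        # refusal: not in sources
```

The refusal branch is the important part: because the only knowledge source is the uploaded corpus, an off-topic question gets "I don't have any information" instead of a hallucinated answer, which matches the behavior Dalton demonstrates later.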

How Notebook LM Works: A Live Demonstration

Dalton: I set up this notebook for us, Google IOTest. I took a corpus of all the information that was uploaded from Google IO. Google typically posts a blog per item they discuss. So I'll ask it, "What is the learning notebook that was demoed?" It said, "I don't know what you're talking about," because I said the wrong name. And I said, "Sorry, I meant Notebook LM." It says, "Notebook LM is an experimental tool that uses Gemini 1.5 Pro to create audio overviews. It can take source material and generate a personalized and interactive audio conversation."

Those two problems, hallucination and rejection, aren't an issue when you create your own LM.

Dalton: This is an easy way to create your own LM where the LM will only answer things that are within the files you upload. It becomes an expert in what you give it. And if you ask it something that it doesn't have any information on, like I asked earlier, it says, "I don't have any information." It prevents you from getting bad information.

What you put in is what you get out. And there is no extra that's added in there, no hallucinations.

Dalton: I think it's super cool and very useful for my job. It would be really great if we put in our underwriting guidelines, our manual and some of the compliance related requirements from state to state into an LM and have that LM be an expert in those things for us.

The Future of Learning: Interactive Audio and LearnLM on YouTube

Dalton: The next feature update they're doing: you set up your Notebook LM, you put in your information, and it will generate an FAQ and a study guide. And then you're also going to have audio. It turns your notebook into an audiobook, but not just an audiobook; it's interactive, so you can ask the notebook questions. In the demo, they said, "My son is studying for this class and he really likes basketball. Can you make analogies with math using basketball?" And then there were two AI voices talking through how they would solve the problem. The voices are very human-like, not robotic whatsoever. And it is super duper cool.

Google's Approach to Responsible AI Development

Dalton: So they're taking the responsibility piece seriously. They did SynthID. They're launching LearnLM to make AI a more accessible and impactful enhancement to people's lives and help everyone learn. And a lot of these tools are free, which is great. Notebook LM, which I was talking about earlier, is completely free, and it's using the second most advanced model Google offers. So it's very generous of Google to let you use those tools for free. Their motto is kind of "AI for all."

They are launching industry best models... They are launching everything that they're doing. They're doing at scale.

Dalton: Google AI for the size of their company and the velocity that they're moving at is incredible. They are launching industry-best models. Not only are they doing that, they're also launching innovative products like AlphaFold that I talked about last week.

Key Model Updates: Gemini 1.5 Pro, Flash, and Nano

Dalton: Gemini 1.5 Pro, which is the model you'd use if you went to gemini.google.com with Gemini Advanced enabled, got an upgraded context window. It used to be one million tokens; now it's two million. Earlier, the blog post said that Notebook LM is powered by Gemini 1.5 Pro, so I would assume Notebook LM can handle a two-million-token context window as well.

Dalton: They also discussed Gemini Flash. Gemini 1.5 Flash is going to be used for real-time response. So if you need something very complex, you can use Gemini 1.5 Pro or Ultra. If you need something more interactive and conversational and you don't want to wait long, you use Flash. Flash is also cheaper than Pro.
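
The Flash-versus-Pro trade-off Dalton describes, a cheap low-latency model for interactive chat and a bigger model for complex long-context work, is essentially a routing decision. A minimal sketch of such a router, assuming illustrative thresholds and using the model names from the episode; this is not Google's actual routing logic:

```python
# Hypothetical model router illustrating the Flash-vs-Pro trade-off:
# fast, cheap model for simple interactive queries; bigger model for
# complex or long-context work. Thresholds here are illustrative
# assumptions, not anything Google publishes.

def pick_model(prompt: str, needs_fast_reply: bool) -> str:
    # Treat very long prompts, or explicit analysis requests, as "complex".
    long_or_complex = len(prompt) > 2_000 or "analyze" in prompt.lower()
    if needs_fast_reply and not long_or_complex:
        return "gemini-1.5-flash"   # lower latency, lower cost
    return "gemini-1.5-pro"         # deeper reasoning, bigger context

print(pick_model("What's the capital of France?", needs_fast_reply=True))
# gemini-1.5-flash
print(pick_model("Analyze this 300-page contract...", needs_fast_reply=True))
# gemini-1.5-pro
```

The design point is simply that latency, cost, and capability trade off against each other, so applications often pick a model per request rather than using the flagship for everything.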

Dalton: And then they talked about Gemini Nano. Gemini Nano is what's used on the Pixel 8 Pro with Google's Tensor chip. It's a model that runs on your phone. It doesn't need much power or access to the internet, so everything you look up with Gemini Nano stays private and secure on the device.

Gemini's Integration into Google Workspace and Gmail

Dalton: They have integrated Gemini into your workspace. I know they recently rolled it out to the general public. It'll summarize your emails and write emails for you. It'll help you make spreadsheets or format your notes, all within Google Workspace, in Gmail and Google Docs. For example, I was working on this light thing from Nanoleaf and was having trouble finding the email about it. So I asked Gemini: can you help me find the email that talks about how to fix my lights that aren't working? It cites its sources, just like Notebook LM, and it found the ticket. Then I could ask, what did Michael tell me to do in the email? What Gemini gave me was actually better than what I got via email.

Dalton: AI was also recently implemented in Search. It's pretty useful if you're asking questions. You'll have the option to use an AI summary, what they call AI Overview, of what you ask; it gives you answers and cites its sources, just like Notebook LM. I asked, "What is the tallest animal in the world?" and it said giraffes are the tallest land animals, citing four sources.

The users don't click on the link anymore. They're just going to read the answer right here.

Dalton: One of the issues with this and why people are really upset about it, especially developers, is it basically scrapes the data off of your website. This is going to decrease traffic because a lot of people are just asking questions. You got to adapt with the times. Times are changing pretty quick nowadays.

Gemini on Android: The New Assistant and Scam Detection

Dalton: On Android, Google Assistant has been switched over to Gemini; Gemini is now the new assistant. The only issue is that Google Assistant was good at doing tasks, like setting a timer or making a shopping list. Gemini is good at answering questions, but as far as doing general assistant things, it's not that good yet.

Dalton: They also are launching scam detection alerts. If you're in a phone call and they start bringing up scammy stuff like, "Can you send me your social?" it will give you an alert on your screen like, "high likelihood this is a scam." It helps people out because a lot of people get taken advantage of.

A Look at Google's Advanced AI Infrastructure: Veo and Trillium

Dalton: Google launched Veo, their high-definition text-to-video model. It's available today on Vertex AI, Google Cloud's platform for AI tools. They also introduced Trillium, the sixth generation of Google Cloud TPUs, which are used for training. It's much faster and more energy efficient, which is probably the biggest aspect, because we're soon going to hit a point where electricity limits how much compute you can run. It's easier to innovate on your hardware than on the energy grid.

Gemma 2: The Small Model Punching Above Its Weight

Dalton: Gemma 2 was also released. The Gemma models are the smaller ones that aren't necessarily flagship but are still used and still important. Gemma 2 had an architectural breakthrough, so it has improved performance and efficiency; Gemma 2 models are outperforming models more than twice their size.

IDX: Google's AI-Powered Code Editor

Dalton: And the last thing I want to talk about is Google's code editor, Project IDX. It supports Flutter, is integrated with Google Cloud and Firebase, and has Gemini APIs built in. Flutter builds a web app, an Android app, and an iOS app from one codebase, so it's supposed to be an easy way to maintain your application and website with everything integrated in one place. You can highlight a piece of code and ask Gemini to explain what's going on, and it will break the code down pretty much function by function.

Final Thoughts: Making AI Accessible for Billions

Dalton: As I was saying, they're launching these features to billions of users and letting all these people interact with AI and get more comfortable with it. Same thing with Meta launching their AI agents to billions of users. Everyone's going to get use out of AI Overviews in Search, and every Google user could use the "summarize my email" feature. Those things, I feel, improve people's lives. It's a nice nudge in the right direction: AI is helpful, it's something good, and you don't need to be scared of this change.

If you're here, you're learning. If I'm here, I'm learning. We're all learning together.

Dalton: Next week, I will hopefully be discussing AlphaFold in a little bit more detail. Thanks for tuning in and I hope you tune in again. Bye. See you.

RESOURCES MENTIONED

  • Google I/O
  • VideoFX
  • ImageFX (Imagen 3)
  • MusicFX
  • Loop Daddy (Marc Rebillet)
  • SynthID
  • Notebook LM (Project Tailwind)
  • LearnLM
  • Gemini (1.5 Pro, 1.5 Flash, Nano, Ultra)
  • AlphaFold 3
  • DeepMind
  • SIMA
  • IDX
  • Trillium (TPUs)
  • Gemma 2
  • Flutter
  • Nanoleaf
  • The Growth Handbook

INDEX OF CONCEPTS

Google I/O, VideoFX, ImageFX, Imagen 3, MusicFX, Loop Daddy, Marc Rebillet, SynthID, Notebook LM, Project Tailwind, LearnLM, Gemini, Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini Nano, Gemini Ultra, AlphaFold 3, DeepMind, SIMA, IDX, Trillium, TPUs, Gemma 2, Flutter, Nanoleaf, The Growth Handbook, AI Overviews, digital watermark, context window, multimodality, large language models, LLM, Google Workspace, Android, AI Test Kitchen, Vertex AI, Google Cloud, Firebase, Hugging Face, OpenAI