Meta's Tech Spree: Robotics, Video, and AI Releases
Meta just dropped major updates in robotics, video generation, and AI tracking. We break down CoTracker 3, MovieGen, and Sparsh. Listen to the full episode to learn more.
TL;DR
Meta is on an absolute tear, shipping game-changing open-source tech in robotics, computer vision, and video generation at a breakneck pace. From digitizing touch to creating videos from text, the future is arriving faster than you think. #VentureStep #MetaAI #Robotics
INTRODUCTION
In the fast-paced world of technology, it's rare for a single company to release a series of groundbreaking innovations in just a matter of weeks, but Meta has done just that. From advanced computer vision that can track objects with uncanny precision to video generation tools that border on magic, the company is demonstrating a relentless drive to push the boundaries of what's possible.
In this episode of Venture Step, host Dalton Anderson dives into Meta's recent shipping spree, even while battling a serious case of COVID. Despite feeling like he's in a "state of delirium," Dalton walks us through the impressive capabilities of these new technologies, offering live demos and expert analysis on their potential impact.
We explore CoTracker 3, a general-purpose computer vision model; MovieGen, a text-to-video tool that can edit and personalize content with simple prompts; and Sparsh, a revolutionary robotics project that digitizes the sense of touch. This episode unpacks not just the "what," but the "why" behind Meta's aggressive open-source strategy and what it means for the future of AI, robotics, and virtual reality.
KEY TAKEAWAYS
- Meta's CoTracker 3 offers superior object tracking that can predict movement and re-identify objects even after their view is temporarily obscured.
- MovieGen is a powerful text-to-video model that allows for extensive editing and personalization, including adding your own face to generated scenes.
- Meta is open-sourcing advanced robotics technology like Sparsh, which digitizes the sense of touch and outperforms existing task-specific models by an average of over 95%.
- Mark Zuckerberg's core strategy is to dominate the next wave of technology by open-sourcing superior products, making it incredibly difficult for closed-source competitors to keep up.
FULL CONVERSATION
Dalton: Welcome to Venture Step Podcasts where we discuss entrepreneurship, industry trends, and the occasional book review. Fair play or playing to win. Meta has been on fire lately, but not in a bad way. In this episode, we're going to discuss some of the things that they have shipped in the last five or so weeks. This is going to include some of their robotics releases, some of their tracking releases, and we'll have an actual demo of a video that I took in Japan of a random man walking down a sidewalk and we're going to be tracking him, which is pretty cool. And then there's also some Hollywood-ish, not publicly released yet, but Meta's MovieGen.
Dalton: But of course, before we dive in, I'm your host, Dalton Anderson. My background is a bit of a mix of programming, data science and insurance. Offline, you can find me building my side business or lost in a good book. You can listen to this podcast in both video and audio format on YouTube. And if audio is your thing, you can find the podcast on Apple Podcasts, Spotify, or wherever else you get your podcasts.
A Quick Disclaimer: Recording While Sick
Dalton: And another thing I should state before we start this episode is I am in a state of delirium. I have COVID and this is not the COVID that I want to meet again. This COVID is not so friendly. So I am throwing that out there that I may be mentally jumping about in the episode, with trains of thought not being complete. I would describe myself as feeling weightless, but cloudy. So I'm just throwing that out there so when you are listening to this episode, you are not like, what the heck? It may or may not make sense, but this is the issue I find myself in because I have fallen victim to not having episodes in the bank.
Dalton: I've never had episodes in the bank, so I've been recording every episode every week instead of having many episodes banked. And yeah, I couldn't have that many episodes in the bank because of the nature of the content I create, but two episodes would be grand. So I just wanted to let you know if you're a little confused about the stuff I'm talking about and the way that I'm explaining it, that's probably why.
Introducing CoTracker 3: Advanced Object Tracking
Dalton: Okay, so now we've got that straight. Now we are going to talk about CoTracker 3. So CoTracker 3 is an initiative that was released as a general-purpose computer vision model. It's got many applications, which are quite interesting, but I'm going to talk about that a little bit later. So CoTracker 3 is this way of tracking and it basically pixelizes or puts a point on each pixel and then identifies the object. And then it tracks and predicts where the object is or will go, which I think is pretty cool.
So you will track, but if its vision gets obscured, it will still be able to pick up on it because it knows or it would already have identified and matched the pixels of the object.
Dalton: So when the object comes back into frame, the object is instantly picked back up, which isn't something that other tracking methodologies can do. So it's better. And here are some examples. This biker goes up the hill and the vision of the object, which is a kid biking, it gets obscured a little bit and it messes up the other trackers and they get all out of whack. Whereas CoTracker, it stays right on target the whole time because it predicts where the next movement of the object will go depending on the flight path of the pixels. And so it picks right back up where it needs to go, which I think is pretty cool.
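Editor's sketch: the idea of predicting where a point goes while it is hidden can be illustrated with a very small example. This is not how CoTracker 3 works internally (the episode does not cover the paper's method); it is a much simpler stand-in that assumes constant velocity, and the function name `fill_occluded` and the sample track are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def fill_occluded(track: List[Optional[Point]]) -> List[Point]:
    """Fill frames where the point is hidden (None) by extrapolating
    from the last two known positions at constant velocity.

    Illustrative assumption only; not CoTracker 3's actual method.
    """
    filled: List[Point] = []
    for obs in track:
        if obs is not None:
            filled.append(obs)  # visible: trust the observation
        elif len(filled) >= 2:
            (x1, y1), (x2, y2) = filled[-2], filled[-1]
            # last position plus last per-frame displacement
            filled.append((2 * x2 - x1, 2 * y2 - y1))
        elif len(filled) == 1:
            filled.append(filled[-1])  # no velocity estimate yet: hold position
        else:
            raise ValueError("track must start with a visible point")
    return filled


# Hypothetical track: a point moving right and up, hidden for two frames.
observed = [(0.0, 0.0), (1.0, 0.5), None, None, (4.0, 2.0)]
print(fill_occluded(observed))
```

In this toy track the extrapolated positions line up with where the point reappears, which is the behavior Dalton describes: the tracker is already "right there" when the object comes back into view.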
A Live Demo of CoTracker 3 in Action
Dalton: Okay, so there's a paper on this. I'm going to review the paper, but we also have an actual demo of a video. So I am going to share this instead. This is a random man that I took a video of. It's pretty short, but it's basically walking in a city in Japan. There's a man in front of me and the vision gets obscured a couple of times.
Dalton: So I've already loaded it up and I'm submitting it. Okay, now it's done. So then I'm just going to hit track. You could identify which object you want to track by each frame, but I'm not doing that because there's only one object that it'll pick up. It did a pretty good job in my opinion. Okay, here it is. So when the video originally starts, see how it has all the little dots on the map? And then it finally picks up the guy and then it moves the dots off, and now it starts tracking him. See how there aren't dots everywhere and the dots are being picked up by the object that it's tracking?
Dalton: The man... he's carrying three bags in his left arm and then no bags in his right arm. Life's all about balance, you know? But see how it's tracking again? The object left the focal point of the video twice, and then right when it flipped back, it was right there. It was right there every time. It's pretty cool. It flips back, boom, it's already there. It's crazy. I think it's pretty sick.
Practical Applications of CoTracker 3
Dalton: So, it does tracking. Why is that useful? Well, there are a couple of things it could be doing. We could start with sports. You could use it for sports tracking for players or tracking on who is doing what and identifying them on the map. Which would be pretty cool for sports analysis. Something that people may not like is surveillance and security. So government surveillance really comes to mind where you're tracking everybody and identifying them and then tracking where they go.
Dalton: Autonomous vehicles is a good shout. With autonomous vehicles, you could use something like this to improve their tracking capabilities of objects that go out of frame and then come in frame later. And then there's some for manufacturing. This improved tracking mechanism would be able to pick up these items and know where they're at or identify items that are potentially defective and track where they're going. Then there's healthcare. I would think like on the lines of rehabilitation, like tracking the movement of a body. If you had a tracker and it was tracking your arm and where it's supposed to go and then what you're doing, it could help correct your movement.
Exploring MovieGen: The Future of Video Creation
Dalton: But next thing that we're going to talk about here is this MovieGen. This is just going to be a list of things that Meta shipped. They've shipped so much stuff. So we're transitioning over to MovieGen. MovieGen is pretty cool. Meta MovieGen is what they're saying is one of the most advanced foundational media models in existence. They haven't launched it yet; they're still in what they call creator partnerships. But basically MovieGen allows you to not only generate movies or videos via text, it also allows you to... this is an example. This one is a text input: a girl running across the beach and holding a kite. She's wearing jean shorts, a yellow shirt and the sun is shining down. I mean, it's a pretty good video.
Dalton: The interesting thing here is that you could not only personalize videos of yourself or things, you can provide a face shot of your friend or of yourself and you can make a video from it, which I think is pretty neat.
The Power of Text-Based Video Editing
Dalton: This is pretty cool. You could have an original video that you have. So maybe you're at a house party and you want to change it to a nightclub or some kind of sky rooftop bar. You can do that. So in this original video, this person is running and then it says "add blue pom poms to his hands." And now he's running with pom poms. And then the other one says "turn it into a cactus desert," and it does. And then the next one says, "replace the running clothes with an inflatable dinosaur costume." And it looks pretty real. It's crazy.
I don't know how they're doing it. It looks so real. Like I don't know what's going on here. There's some black magic stuff. Like I've seen some cool stuff before, but when I saw this, I was really wowed.
Dalton: What is the practical application here? Well, maybe you don't like your shirt that you're wearing in that video. Change the shirt. Or change the background. This original video is a man warming up for a run. It changed his shirt and the background, changed the background to an outdoor stadium and he's warming up with a blue shirt. I guess they didn't tell it to change the shirt; it changed the shirt on its own. Weird.
Personalizing Videos with Your Own Face
Dalton: So then the next thing that's pretty cool is you can make personalized videos. You can take a selfie of yourself and then it'll make a video of you doing something. Like this guy took a video of himself and then he became a scientist. This woman, who just had a selfie with bad lighting, has herself in a painting video. She's painting and the hands look great. Lots of dexterity there.
Dalton: And then you can add a sound, like create effects and soundtracks for your videos. So not only can you create a video from text, you can also personalize the video and put objects in it. You can edit the video with text, completely change the background or objects in the video. You can personalize it by putting yourself in, and then you can add sound. Crazy. This isn't released yet. They are only allowing creator partners to use this MovieGen thing, which is a bummer because I really wanted to use it.
Introducing Meta Spirit LM: An Open-Source Assistant
Dalton: There's this other object that they released called the Meta Spirit LM, which I originally thought was a Google Notebook LM competitor, but open source. But really it's a different approach where it's less about standardizing information and more about just having an assistant nearby that you could process. It's multimodal. It can process text, audio, and just be an assistant nearby, on your computer locally. It's a different approach than Notebook LM by Google, because that's clearly a summarization and educational tool, whereas Meta Spirit LM is just overall a different approach.
Meta's Breakthroughs in Robotics with Sparsh
Dalton: Okay, so Meta's robotics advancements. I think the best thing to do is to share my screen and watch a two-minute video.
[video plays] Yeah. Insane, right? I thought the same thing. Crazy. It's hard to understand that these independent teams launched all these things in the last five weeks or so. Crazy that they've launched all this stuff so soon within each other and they're all incredible projects. And they're all coming with a 90-page research paper, which I'll need to read.
Dalton: I'll probably start with the robotics one because digitizing touch seems very interesting and overall, super cool. How are they shipping this much stuff is what I'm having a hard time understanding. This whole Meta robotics advancement and the integrations with digitizing the touch with the sensors and having the 460,000 images and then having a 95% improvement... yes, 95%.
...we found that Sparsh outperforms task and sensor specific models by an average of over 95% by this benchmark.
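Editor's sketch: one common way to read a figure like "an average of over 95%" is as a mean relative improvement across several benchmark tasks. The episode does not say how Meta computed its number, so this is only an assumption about the arithmetic. The task names and scores below are made up for illustration and are not taken from Meta's paper.

```python
from statistics import mean


def relative_improvement(baseline: float, new: float) -> float:
    """Percent improvement of `new` over `baseline` (higher score = better)."""
    return (new - baseline) / baseline * 100.0


# Hypothetical (baseline, new) scores per task; NOT real benchmark results.
scores = {
    "task_a": (0.40, 0.80),
    "task_b": (0.50, 0.95),
    "task_c": (0.25, 0.50),
}

per_task = {name: relative_improvement(b, n) for name, (b, n) in scores.items()}
average = mean(per_task.values())
print(per_task)
print(round(average, 1))
```

Under this reading, an average above 95% means the new model roughly doubles the baseline scores on average, even if individual tasks improve by more or less than that.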
Dalton: So not only are they shipping all this stuff, they are absolutely just dominating in the things that they're releasing. They're dominating with CoTracker 3, they're dominating with Sparsh and these other things. It's just not even close. And the MovieGen thing looks just... I'm flabbergasted with all the stuff that they're launching. Sparsh means "touch" in a language called Sanskrit, which is a language back in the Bronze Age. It sounds pretty cool, but really it literally means touch.
The Strategy Behind Meta's Open-Source Push
Dalton: This open source hot dog is so much better and you can make it yourself at home and you have all the knowledge of what goes in it. Why would you use the other one? So it really burns the alternative. The alternative is just done. And that's really the goal of Mark Zuckerberg and Meta.
He is really pushing for these generational technologies that are gonna transform not only the industry, but society in general.
Dalton: The man wants it to be open source. He doesn't want to have a closed system model like phones or computers. He wants it to be open source and he wants open source to win. So he's like, yeah, I'm fine. We're just open sourcing everything. We don't need the money. We need to invest in these things to become advanced and to protect ourselves against sophisticated nations attacking our systems.
How New Tech Ties Back to VR and the Metaverse
Dalton: It was something that Mark talked about before when he was getting a lot of crap for pushing into VR and really leaning into the metaverse. The technologies that we will create from investing and optimizing this technology are going to be incredible. And you can just see some of these things that a lot of these research projects fold over to VR or fold over to the other things that they're doing. Like CoTracker 3, I forgot to talk about it, but for VR gaming, tracking your hands, maybe other players in the room, huge. And for this thing right here, the Sparsh and the Digiplex and the MetaDigit 360, those items would help in VR. You could scan in textures and then simulate these textures back to the human with sensors. I mean, you get pretty wild pretty quick with all that stuff.
Dalton: I really like Meta's approach. I think it's very admirable. And overall their stuff is great. And they ship it with a paper, like everything they're shipping comes with a paper. And I think that it's such a different approach than Google or OpenAI. The only two big open source folks are X with XAI... and then there is Meta. OpenAI doesn't open source. Anthropic isn't open source. Microsoft's AI isn't open source. If they keep shipping at this rate, I'm not sure that there is a closed source available. I think everything just eventually becomes open source, which would be great for everyone.
Final Thoughts and What's Next for Venture Step
Dalton: So let me know what you thought about this episode. Maybe you felt my state of delirium, maybe not so much. I thought the CoTracker 3 demo was pretty cool. Meta's MovieGen was something else entirely. This Sparsh robotics and all of those things that come with it are incredible and would be really cool for robotic surgeries or manufacturing or prosthetics.
Everything's going to be really cool in like 10, five years. Like the future is bright. It's brighter than people believe.
Dalton: In the coming weeks, I've been trying to gather some guests on the show. I also have a special insurance series. I want to have some insurance vendors on the show that I met at ITC. With that comes the responsibility of explaining what really is going on in insurance, what's happening in these catastrophic areas and why is it so difficult to get insurance? And then in general explaining what insurance is, how it's created, what is reinsurance, how do you price insurance, all those things and more. So bear with me. I'm trying my best. COVID doesn't help. But as always, I appreciate you and I hope that you tune in next week. Wherever you are in this world, have a good afternoon, good evening, good morning and overall have a great day. Goodbye.
RESOURCES MENTIONED
- Meta
- CoTracker 3
- MovieGen
- Meta Spirit LM
- Google Notebook LM
- Sparsh
- Digi360
- DigiPlex
- MetaDigit 360
- OpenAI
- Anthropic
- Microsoft
- X (company)
- XAI
- Grok
- YouTube
- Apple Podcasts
- Spotify
INDEX OF CONCEPTS
Dalton Anderson, Meta, CoTracker 3, MovieGen, Meta Spirit LM, Google Notebook LM, Sparsh, Sanskrit, Digi360, DigiPlex, MetaDigit 360, Mark Zuckerberg, VR (Virtual Reality), Metaverse, Skynet, Google, OpenAI, X, XAI, Grok, Anthropic, Microsoft, YouTube, Apple Podcasts, Spotify, Japan, ITC