The Imperfect Echo: AI Voice Cloning & Digital Trust
AI can now convincingly clone voices and generate video, creating an authenticity crisis. How do we verify what's real? Listen to the full episode to learn more.

TL;DR
AI voice cloning is dangerously effective, creating an authenticity crisis where you can't trust what you hear or see online. Verification and personal security are now more critical than ever. #VentureStep #AI #DigitalTrust
INTRODUCTION
We are entering an era where the lines between reality and digital fabrication are irrevocably blurred. What if you could no longer trust a recording of a CEO's speech, a message from a loved one, or even the host of your favorite podcast? This isn't a distant sci-fi concept; it's a present-day reality fueled by rapid advancements in artificial intelligence that allow for the seamless cloning of voices and the generation of hyper-realistic video. This technological leap forward brings with it a host of ethical questions and an urgent societal problem: the erosion of digital trust.
In this episode of Venture Step, host Dalton Anderson confronts this issue head-on with a thought-provoking experiment. The first few minutes of the podcast are not voiced by him, but by an AI model trained on his own voice. The result is an "imperfect echo"—a clone that is convincing enough to deceive the casual listener, complete with the natural-sounding stumbles and monotone delivery that even a human host exhibits. Dalton uses this experiment as a launchpad to explore the profound implications of this technology.
From OpenAI's native image generation tools that can replicate the iconic style of Studio Ghibli to the dark-side applications of creating fake social media personas for profit, this conversation unpacks the current state of AI. Dalton discusses the societal pendulum swinging from online anonymity back to a need for identity verification and offers a surprisingly simple yet effective strategy for personal security in a world where you can't believe your ears: the secret password.
KEY TAKEAWAYS
- AI voice cloning technology has advanced to the point where it can create convincing, albeit imperfect, audio personas, challenging our ability to trust digital content.
- Native AI image and video generation tools, like those from OpenAI, make it increasingly easy to create stylized and realistic content, further blurring the lines of authenticity.
- The rise of anonymous, AI-generated "slop" content may force a societal shift back towards verified digital identities to maintain trust on social platforms.
- Unethical applications, such as creating deepfake personalities for financial scams, are becoming easier to execute, posing significant risks.
- A practical defense against voice cloning scams is to establish a non-digital secret password with family for verifying identity during emergencies.
FULL CONVERSATION
Dalton: Hey everybody, welcome to the episode. This is me, Dalton. I'm your host and it's actually me this time. It's not some AI generated voice or video. It's me, the actual host of this podcast episode.
The AI Deception: An Experiment in Voice Cloning
Dalton: I hope that you thought the AI generated me was thought provoking. And I know that there'd be some instances where the audio would mess up, but you have to think about it. In my recordings of my episodes, sometimes the audio messes up or things happen or I mispronounce words or I scramble a little bit and you can call those out and be like, "Oh, well that's when I noticed it was AI." The audio got jumbled a couple of times. Some of my tone was a little monotone. Think about it, when I'm talking in the podcast episode, my tone is monotone. And sometimes I like to change my frequency, try to vary my tone and my pace of my voice differently throughout the episode, just so it adds a little bit of spontaneous-ness to, see, I just did it, I just messed up a word. Mid episode. AI does it all the time.
Dalton: I want to emphasize that this is an episode that I have been pondering and thinking about for a while, and I just never had the time to sit down, create these AI prompts, then run them through an AI voice generator to match my voice, and then from there, add it into my episode and edit my audio and put that stuff in front. And I always questioned what I was going to do with the video.
Why Video Podcasting Matters in an Age of AI
Dalton: The video piece is what I deem important. There are two reasons why I do video podcasts. One, they're more interactive. Two, it gives people the ability to verify that I am me and that I've been doing this for a long time. I'm getting close to 70 episodes. The second reason is the most important part: I felt that, the way the technology was going, you would easily be able to create an AI podcast with an AI online persona. You can pick personality traits, hairstyle, and skin tone, and create a nice AI voice that does well with other podcast listeners.
Dalton: You could ask what podcasts perform well with certain demographics and interests, and then create and curate an AI persona, with a tone and visuals that would resonate with that audience. That would be wrong. And if that does happen, then how do you know it's not just people telling you what you want to know? Or how do you know that what they're talking about has any real relevance? All those questions are going to become increasingly prominent as we go on in this world with this technology.
The Rapid, Unsettling Pace of AI Advancement
Dalton: I'm not some kind of doomer, by the way. I'm not like, "Wow, AI is going to take over the world and we're going to be their slaves." This AI piece, we're in a very early stage of the development of this technology and the technology has been developing at a very rapid rate. Sooner or later we'll get to a recursion where the AI is improving itself and the AI is optimizing itself. And once we get there, then AI should improve a lot faster. Then we've got to build infrastructure for AI to communicate with other AIs, and there's got to be some kind of currency for them to trade jobs or hire humans or vice versa. Then you would need some kind of decentralized electronic currency that would support these transactions.
I don't think I've ever seen a technology change this fast. Like from one year ago to two years ago to three years ago to four, the difference is night and day.
Dalton: We always had trouble getting consistent character creation in your images. You always had trouble printing out any text. You can now stylize your text. You can create consistent characters. You can do all sorts of things without any effort. I mean, there's still some effort. For example, it took me 42 minutes to generate some Ghibli-style images, because it kept messing up. It was one, because I was being lazy with my prompts, and two, it just takes forever to generate. It might take you like eight minutes to generate an image. So I generated three images and that's 32 minutes right there. So keep that in mind that if you're going to do these things or generate these images, it does take time.
From Still Image to Animated Video: The Process Behind the AI Episode
Dalton: The core premise of the episode is to discuss OpenAI's native image generation upgrade. I was going to make an anime, Ghibli-style podcast studio setup from a photo of my room taken with my webcam, turn that image into a video, lip sync it to my audio, and create an animated Ghibli-style podcast episode. The first 10 or so minutes are AI generated. The video is AI generated. Everything is AI generated except this segment of the episode.
Dalton: I am having AI generate some of the parts of the audio and the scripts. AI generated all that script and the audio, and then I'm coming over with some human interaction at the end, but the video overlay is all going to be AI. And that's the plan at least. Why am I doing that? Well, I want to emphasize the point. Two, I think it's interesting. Three, why not? I saw somebody else do it a couple of weeks ago and it looked pretty cool. This is a perfect example of be careful of what you watch and listen to on the internet.
It's not only what you read anymore. You have to verify these people are legit.
Dalton: There are quite a few AI-generated videos on TikTok with AI-generated voices, and they just say whatever. There are AI models that are created and trained to make AI social media videos: they steal people's videos and then overlay text and some kind of AI story on top to get views and engagement and generate revenue. And there are whole shops where all they do is make these AI slop videos that people watch.
Recreating Memes and Art With AI: The Studio Ghibli Experiment
Dalton: Let's talk about OpenAI's recent release of their GPT-4o native image generation update. I want to start out with some examples. If you don't know what Studio Ghibli is, that's fine. Not everybody does, but Studio Ghibli is a very famous studio, and they're known for making nothing but works of art. They only work on building the best animated movies ever. And I think probably one of the most notable ones is Howl's Moving Castle. They have a very unique art style. It's dreamlike, it's whimsical, it's light, it's fun.
Dalton: One thing that was pretty big on X a couple weeks ago was making traditional memes into Ghibli-like memes or Ghibli-styled memes. And so this meme is the child outside of her house that's burning down. The styling, it's very Ghibli. Then this person made a pig temple where there's all these pigs in a pool, giant pigs, maybe like pig gods. I don't know what's going on. And then here's this cat shrine, big cat in a massive tub. Very Ghibli-like art direction. The last one turned Resident Evil characters into Ghibli-like characters, which I think was interesting.
Dalton: Here are my two images. I originally asked for a room to feature high quality headphones, turn it into this cozy atmosphere and give it Ghibli-like vibes. And it gave me this, which I didn't like because it's not like my room. So I was like, okay, I need to be more direct. And then it gave me this, which is fine, but it lacked the character of my room. It didn't feel like my room. So I got this one. This one's actually decent. The only thing is the mic isn't what I want the mic to be. I want the mic to be like my mic. But this might work. I think the room and everything going on is perfect.
The Authenticity Crisis: Can We Trust What We See and Hear?
Dalton: Since now you can natively generate images, you'll easily be able to create videos using consistent characters, because you can create a couple images and then say, "turn this into a video" in the future. As I'd mentioned before, it calls into question the authenticity of the story of what you're trying to listen to. Having a system in place, either your own system or a system for these social media apps to verify that you're a real person will be increasingly more important.
And if you can't trust anything that you see or hear, then how do you know whether or not what you're listening to or looking at is real?
Anonymity vs. Verification: The Future of Social Media
Dalton: The AI-generated audio at the beginning of the episode talked about how the pendulum might have to swing away from what is now anonymous social media. When I was a kid, everyone wanted their social media to be under their first and last name. After that point, there was a shift where people didn't want to be monitored. And so they started making Finstas, which were where you'd post your real stuff. After that point, there was a shift with the next generation where they're Finsta only, where their Instagram is anonymous by default.
Dalton: But now we're in the realm of AI, and AI is becoming more and more advanced. And that means that if everyone's anonymous, then you can't tell who's wearing the mask of the human and who is the human. And so I think the pendulum swings backwards to verification of your identity. X has done that in some fashion. If you wanted to do premium and be able to run ads and have increased visibility, you need to send in your ID and they'll verify it. Facebook has something like that as a beta for their premium program. I don't know how it would work. Maybe you have two feeds, like one is your verified feed with only verified people.
Dalton: You'd always get the question, okay, did you verify that with other sources? How do you know what you're saying is true? Increasingly in the world, people have just been blatantly lying. They'll go on video and they'll just lie or they'll publish something on social media that is a straight up lie and false. And so I think that there has been a weird acceptance of people who just have been lying. But what if that transfers to video and audio? Then you're questioning everything that you watch, everything you hear. How would you then know what is real and what is not?
Watermarking and Synthetic Keys: A Technical Solution?
Dalton: The promising piece is that videos, audio, and images are all represented as numbers, so you could encode a synthetic key, an AI synthetic ID, into the audio. Then you could easily verify whether or not that key was altered, or whether the audio has a key at all. With images, you can do the same thing. A pixel is just numbers, so you can encode your own little synthetic key, and Google's been doing this for a while for the AI images that they create. They have a digital synthetic key embedded within the photo. Where it gets a little wonky is text. I talked to a friend named Fabriz and he had completed some research on this, but overall it's a very complicated problem because text is not an image. If you make any edits to the text, then the key doesn't work. The text piece, I don't know if there's an easy solution for that. But people are already careful about what they read online.
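To make the "a pixel is just numbers" idea concrete, here is a minimal toy sketch in Python. It hides a short key in the lowest bit of each pixel value, then checks whether the key is still there. This is an illustration under stated assumptions, not the method Google or any other company actually uses: the function names, the 8-bit key, and the pixel values are all made up for the example, and real watermarks are designed to survive edits that this toy version would not.

```python
# Toy least-significant-bit (LSB) watermark. Illustrative only:
# the names, the key, and the pixel values below are hypothetical,
# and this is NOT how any production AI watermark works.

def embed_key(pixels, key_bits):
    """Return a copy of pixels with key_bits stored in the lowest bit
    of the first len(key_bits) values."""
    if len(key_bits) > len(pixels):
        raise ValueError("key is longer than the pixel data")
    marked = list(pixels)
    for i, bit in enumerate(key_bits):
        # Clear the lowest bit, then set it to the key bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_key(pixels, key_length):
    """Read back the lowest bit of the first key_length values."""
    return [p & 1 for p in pixels[:key_length]]


def has_key(pixels, key_bits):
    """True if the pixel data still carries exactly this key."""
    return extract_key(pixels, len(key_bits)) == list(key_bits)


KEY = [1, 0, 1, 1, 0, 0, 1, 0]                       # hypothetical key
image = [200, 13, 77, 54, 255, 0, 128, 91, 33, 60]   # made-up grayscale values

marked = embed_key(image, KEY)

# Simulate someone altering the first pixel after the key was embedded.
tampered = list(marked)
tampered[0] ^= 1

print(has_key(marked, KEY))    # key is present
print(has_key(image, KEY))     # original image never had the key
print(has_key(tampered, KEY))  # the edit broke the key
```

Each marked pixel differs from the original by at most 1, so the change is invisible to the eye, but a single flipped bit is enough to break the check. That fragility is the same reason the text case is hard: edit the content and a naive key no longer matches.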
The Dark Side: Unethical Uses of AI Content Generation
Dalton: Be cautious about what you're viewing online. If you're picking up a new podcast, make sure that they have a video or make sure that they've been doing it a while. There are a lot of AI videos on YouTube now. When I go on Facebook, there's so much interaction with AI videos and AI images that people presumably have no idea are AI.
It's not the things that are blatantly AI. It's the things that you can't tell are AI that are the worst.
Dalton: Opportunities to do unethical things with these types of technologies are increasingly being discussed. For example, someone sent me a video about taking the face of a woman on social media, like on TikTok, finding another woman who has a good physique, and combining the two into an AI model that you would create videos with. Then you use social media to build an OnlyFans sales funnel around this AI model that's using a real person's face, and the funnel gets people to subscribe to an OnlyFans that is all AI generated. That stuff is horrible. They use somebody's real face with a complete disregard for other people. It's horrific.
A Practical Defense: The Power of a Secret Password
Dalton: We're going to be in this whack-a-mole situation, as I mentioned, for a while. You're always going to have the technology outpacing the detection. I haven't seen anything from TikTok or from Meta or even YouTube about trying to create a system to make sure that people's personas or their identity isn't being used in nefarious ways. My mom and I have a secret password and my Nana has a secret password that we never sent via text. We didn't discuss it in a room with electronics. And so this password will be used to verify whether or not you're really in trouble if someone calls or if something happened while I was traveling and I was like, "Hey, I really need help, send me $5,000 right now." You'd be asked to verify.
That's probably my takeaway for you. Be cautious about what you're listening to and watching on the internet and then create a secret password with your family and don't send it via text.
Dalton: If they don't know the password, then they will have to figure it out on their own or they'll have to meet you in person. Once again, I really appreciate everything that you guys are doing and supporting me and watching these episodes or listening to them. I really do appreciate it. I'm always surprised that people listen to these episodes and I'm just thankful that you guys are still around. Wherever you are in this world, have a great day, a good afternoon, a good evening. Thanks for listening and I can't wait for you to listen in next week. Goodbye.
RESOURCES MENTIONED
- OpenAI
- GPT-4o (ChatGPT)
- Google Gemini
- Studio Ghibli
- Howl's Moving Castle
- Resident Evil
- TikTok
- X (formerly Twitter)
- Meta
- YouTube
- OnlyFans
INDEX OF CONCEPTS
Dalton Anderson, Venture Step, OpenAI, Gemini, GPUs, AI personas, Studio Ghibli, Howl's Moving Castle, GPT-4o, X, TikTok, Finstas, Facebook, Meta, YouTube, Resident Evil, OnlyFans, AI voice cloning, digital trust, authenticity crisis, synthetic key, image generation, deepfake, AI slop