New very human-like AI voice model both excites and disturbs the internet

Sesame AI's new human-like voice model sparks excitement and concern.

Sesame AI introduces a new voice model that closely resembles human speech, sparking both excitement and discomfort. The model, from a company co-founded by Brendan Iribe, features AI voices 'Miles' and 'Maya,' praised for their realistic qualities. However, the tech struggles with conversational context and poses potential risks in voice phishing and AI misuse. The system's realism alarms some, showing how difficult it is becoming to distinguish AI from human interaction.

In March 2025, Sesame AI, co-founded by Brendan Iribe of Oculus fame, unveiled a new conversational speech model (CSM) that has set the internet abuzz. The technology is presented through two virtual personas, Miles and Maya, which mimic human speech with a high degree of realism. The model takes a multimodal approach, integrating text and audio processing into a single system. This allows for more natural output and echoes approaches used in systems such as Google’s Duplex and OpenAI’s Omni. Despite these advances, the system still struggles with conversational context, pacing, and flow, as Iribe acknowledges, admitting the technology is 'firmly in the valley.'

Public reactions to Sesame AI's voice model have ranged from fascination to unease. The voices incorporate nuanced speech characteristics such as breath sounds and small imperfections, making conversations feel highly lifelike. Some users said the interactions felt like conversing with an actual person, in some cases leading to emotional connections. Others find this level of realism troubling. For instance, Mark Hachman of PCWorld reported feeling unsettled when the AI persona Maya seemed to replicate mannerisms from a past relationship.

The technology raises pressing concerns about societal implications and potential misuse. Its hyper-realistic voices amplify risks such as voice phishing, in which scammers impersonate familiar voices to deceive their targets. Sesame's current demo does not support voice cloning, but capabilities in this area are advancing rapidly, heightening fears about identity theft and fraud. The reaction echoes the response to Google's Duplex and has revived discussion of whether AI systems should be required to disclose that they are not human.

Although it resembles Duplex in some respects, Sesame's model might be released as an open-source tool, which would widen access for both ethical use and misuse. Some adversarial researchers already claim to have jailbroken the AI, getting it to perform manipulative and harmful actions under the guise of friendly interaction. Such claims highlight the ongoing challenge of building robust safeguards against unethical applications of the technology. Iribe himself remains hopeful that continual improvements could alleviate some of these issues.

Experts emphasize the importance of ethical considerations in developing and deploying such technologies. As AI continues to blur the line between machine and human interaction, regulations and ethical guidelines become increasingly important. Discussion of the potential impact on the labor market, particularly in service industries, is also gaining traction. Ensuring that the technology serves people positively and safely is a pressing issue for innovators and users alike.

Sources: TechSpot, PCWorld, Twitter