A year later, OpenAI still hasn’t released its voice cloning tool

OpenAI's Voice Engine has remained in limited preview for a year.

OpenAI's Voice Engine, a tool the company says can clone a voice from 15 seconds of speech, is still in limited testing with trusted partners a year after its small-scale preview was announced. The delay may reflect concerns about misuse and a desire to avoid regulatory scrutiny. OpenAI says it is learning from applications such as speech therapy and AI avatars. Developer access remains limited, and there is no clear launch date or final pricing.

In late March 2024, OpenAI revealed an AI service called Voice Engine, which reportedly could mimic a person’s voice from just 15 seconds of audio. Despite the promising announcement, the tool remains in limited preview almost a year later, and OpenAI has not given a timeline for a broader release. The cautious approach hints at concerns over potential misuse and regulatory scrutiny. It also contrasts with past criticism that OpenAI favored rapid product releases over safety in a bid to outpace competitors.

An OpenAI representative told TechCrunch that the company continues to test Voice Engine with a select group of "trusted partners." These partners share insights on how the tool is used in domains such as speech therapy, language learning, customer support, gaming, and AI avatars. OpenAI says it uses this feedback to make the tool more useful and safer. Nevertheless, the rollout has been marked by postponed launches, even though the tool is functional and used internally.

The technology underlying Voice Engine predicts sound patterns and generates speech that mirrors different voices, accents, and styles. OpenAI initially targeted March 7, 2024, for the API release, but the launch was delayed. The company had planned to give 100 developers early access, with priority for projects demonstrating social benefit or sustainable and responsible uses of the technology. It had also set pricing at $15 and $30 per million characters, depending on voice quality, before the debut was postponed.

By mid-2024, OpenAI had published blog posts suggesting that the delay stemmed partly from the risk of voice cloning being misused during the U.S. election. To mitigate risks, the company introduced safeguards such as watermarking generated audio and requiring developers to obtain explicit consent from the original speakers. How these policies are enforced remains unclear, and enforcing them at scale could be challenging even for OpenAI.

Carlos Pereira, CEO of Livox, a partner company, praised the tool's potential for assisting people with disabilities. However, Pereira noted that the tool currently works only online, which limits its usefulness for the many Livox users who need offline access. Striking a balance between technological innovation and ethical considerations remains important, especially given the rise in AI voice cloning scams in 2024.

Sources: TechCrunch, OpenAI blog