Nari Labs Unveils Dia, a New AI Speech Model Designed for Customization

By Lisa Wong

Nari Labs, a South Korean startup, has developed Dia, an AI speech model that generates realistic dialogue directly from a script. The synthetic voice model has 1.6 billion parameters, and the speech it produces is nearly indistinguishable from a human voice.

Dia lets users customize the tone of each speaker to their preferences. This flexibility makes the model useful across a wider range of applications, from learning aids to art and entertainment. Dia can also work disfluencies, coughs, laughs, and other nonverbal cues into the generated speech, which makes conversations sound more fluid and lively. That is a notable step forward for AI-generated conversational content.

Dia’s development relied heavily on Google’s TPU Research Cloud program. It drew on the speed and processing power of Google’s TPU AI chips, which are designed with massive datasets in mind, and training the model consumed substantial computational resources. Concerns have also been raised that copyrighted content may have been used in its development. Those concerns underscore the need for stronger intellectual property protections in the fast-moving field of artificial intelligence.

Perhaps the most striking feature of Dia is its ability to imitate particular voices. One of the samples Dia produced sounds very similar to the voices of the hosts of NPR’s “Planet Money” podcast, a resemblance that has intrigued many listeners. The capability demonstrates how precisely the model can replicate vocal attributes, and it has prompted further discussion of the ethics of voice replication technology.

Nari Labs has said it intends to publish a technical report outlining Dia’s architecture and capabilities. Such a report would provide needed transparency and insight into the technology that powers the model. Nari Labs also plans to extend Dia’s language support beyond English, which would make it accessible and usable for a wider range of linguistic communities.

Looking ahead, Nari Labs has homed in on the “social aspect” of Dia’s design. This feature is intended to increase user interaction and engagement. By giving people a way to communicate and connect, it positions Dia as more than a simple voice generation tool.