Amazon launches Nova Sonic AI model for voice generation

Summary: Amazon is taking on the likes of Google’s Gemini and OpenAI’s GPT-4o with its brand new Nova Sonic voice generation model. The model handles real-time speech processing and AI voice generation for conversational applications. The company has also launched Nova Reel 1.1, an AI model that can now generate longer videos from text inputs.

Amazon announced a new artificial intelligence (AI) model in its Nova family on Tuesday. Called Amazon Nova Sonic, the e-commerce giant’s new model is capable of generating human-like speech. The company said that developers can use it to build conversational AI chatbots and similar voice features. Amazon has also launched the Nova Reel 1.1 AI model, which can generate videos up to two minutes long.

You can use Amazon’s Nova Sonic AI model to build AI agents for various sectors

Amazon says that its Nova Sonic AI model can simplify the development of voice applications. These include customer service call automation and AI agents for industries such as travel, education, healthcare, and entertainment. Developers can use the model to create voice-powered applications that complete tasks for consumers with “higher accuracy, while being more natural, and engaging.”

The Nova Sonic AI model isn’t simply a text-to-speech tool; instead, it can process voice inputs in real time and respond to them. Amazon said that traditional approaches to voice-enabled tools chain several models together for speech recognition (speech-to-text conversion), data processing, and text-to-speech (TTS) output. Chaining models this way can increase latency and lose linguistic context between steps. Nova Sonic, by contrast, unifies speech understanding and speech generation in a single model.

Amazon’s Nova Sonic AI model is available from the Bedrock developer platform

Amazon’s Nova Sonic is also capable of recognizing different speaking styles. The company says that the model can understand when a user misspeaks, pauses while speaking, or mumbles. For now, it supports only English, though the company says it will add support for more languages soon. The model has a context window of 32,000 tokens for audio, with an additional window to handle longer conversations.

The Nova Sonic AI model is available through Bedrock, the e-commerce giant’s developer platform for building enterprise AI apps, via a new bi-directional streaming API. In a press release, the company called Nova Sonic “the most cost-efficient” AI voice model on the market. Amazon claims that it is approximately 80 percent less expensive than OpenAI’s GPT-4o.

Also, meet the new Nova Reel 1.1 video generation model

Amazon has also launched the new Nova Reel 1.1 AI model, which can now generate longer videos from text inputs. The successor to last year’s Nova Reel model, it generates six-second shots, and up to 20 such shots can be stitched together into a single video of up to 120 seconds. It is also available to developers and general users via the Amazon Bedrock platform.
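
For readers who want to check the arithmetic, here is a minimal sketch of how many six-second shots a video of a given length would need under the limits reported above. The function name and constants are hypothetical and only restate the figures in this article; this is not Amazon's API.

```python
import math

SHOT_SECONDS = 6   # length of one generated shot, as reported in the article
MAX_SHOTS = 20     # maximum shots per video, as reported in the article


def shots_needed(target_seconds: int) -> int:
    """Return how many six-second shots are needed to cover target_seconds.

    Hypothetical helper for illustration; not part of any Amazon SDK.
    """
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    # Round up: a partial shot still requires generating a full one.
    shots = math.ceil(target_seconds / SHOT_SECONDS)
    if shots > MAX_SHOTS:
        raise ValueError(
            f"{target_seconds}s exceeds the {MAX_SHOTS * SHOT_SECONDS}s maximum"
        )
    return shots


print(shots_needed(120))  # 20
print(shots_needed(45))   # 8
```

A full-length two-minute video uses all 20 shots, while a 45-second video rounds up to eight shots because the last shot is only partly needed.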