AWS reInvent

Amazon unveiled the next step in its “AI journey” on December 3, 2024, as part of the company’s AWS re:Invent conference: Amazon Nova, a suite of foundation models underlying applications for multimedia content.

Users can access Amazon Nova models via a single API provided by Amazon Bedrock, a managed service for “high-performing foundation models from leading AI companies and Amazon.” The newest series of models are designed to be “easy to use with a customer’s systems and data.”

The suite includes Amazon Nova Micro (text only); Amazon Nova Life (a very low-cost multimodal model for processing images, video, and text); and Amazon Nova Pro (a multimodal model offering combined improvements in accuracy, speed, and cost). 

According to the press release, the models support “a wide range of tasks across 200 languages and multiple modalities.”

“The models are especially proficient in English, German, Spanish, French, Italian, Japanese, Korean, Arabic, Simplified Chinese, Russian, Hindi, Portuguese, Dutch, Turkish, and Hebrew,” an AWS resources page explains. 

“For understanding models, which accept text, image, or video inputs to generate text output, the full range of over 200 languages is supported. This means these models can process and understand content in all these languages for tasks like summarization, translation, content classification, and visual question-answering.”

Models for “creative content generation,” however, are limited to supporting prompts in English. For example, users can generate images or videos using Amazon Nova Canvas and Amazon Nova Reel, but only through English-language text prompts.

Like many other large language models (LLMs), Amazon Nova does not seem to have been designed specifically for the task of translation; rather, translation is yet another skill the LLM has “developed.” 

Nonetheless, the team at Amazon Artificial General Intelligence, authors of the technical report on Amazon Nova’s capabilities, made sure to test the models on translation.

Using Flores200, a “machine translation benchmark of translations from 842 distinct web articles,” they tested Nova models on translation from and into English for Arabic, German, Spanish, French, Hindi, Italian, Japanese, Korean, Portuguese, Hebrew, Turkish, Simplified Chinese, Russian, and Dutch. 

According to the researchers, the results demonstrated “strong multilingual performance on translation for Amazon Nova Pro, Lite, and Micro,” which they compared to several versions each of Claude, Gemini, GPT-4o, and Llama 3. 

Tell Us How You Really Feel

Reactions to Amazon Nova have been somewhat mixed; contributors, understandably, have been excited to finally share their participation in the project. One observer, tempering others’ enthusiasm, wrote on X, “Nah bro they lost on most of the benchmarks” — without specifying which benchmarks.

British programmer Simon Willison, creator of the open-source web framework Django, was much more positive. 

“I spent some time yesterday exploring the new Amazon Nova LLM family, and I’m really impressed,” he posted on X, adding that “[w]ith this release I think Amazon may have earned a spot among the top tier of model providers.”

In particular, Willison noted that the Nova models are “price and quality competitive with Google Gemini – and Nova Micro is now the cheapest model from any of the major vendors (cheaper even than Gemini 1.5 Flash-8B).”

While Amazon Nova does seem to mark Amazon’s foray into GenAI, dominated up until now by its competitors, Willison said he does not believe Nova will compete with Anthropic’s Claude: “Nova models aren’t in the same quality class as Claude and Anthropic don’t currently seem to want to compete at the low end of the pricing scale.”

Much of the negativity Amazon Nova has attracted just days after its appearance is related to its accessibility, rather than its quality.

“And I thought [Google’s] APIs are difficult to access. Amazon is another level,” one would-be user wrote on X.

“It reeks of trying to have a competitive model as quickly as possible to compete. No regard for whether anyone can use it,” another declared.

“Honestly, when has Amazon not been challenging? I think you burn at least 1,000 calories just getting through AWS documentation!” another commenter chimed in. “One thing that makes the $20 I pay for Claude or GPT totally worth it is that I don’t have to read their docs anymore”

Amazon plans to introduce two additional Amazon Nova models in 2025: a speech-to-speech model, which will understand streaming speech input and interpret both verbal and non-verbal cues; and a native multimodal-to-multimodal (or any-to-any modality) model. The latter will process text, images, audio, and video as both input and output.