Most reports on AWS’ re:Invent conference earlier this month, which brought us new chips and new data centers, overlooked the cloud giant’s unveiling of its first “frontier” models in generative artificial intelligence, models that Amazon says can compete with the best from OpenAI and Google.
Amazon debuted Nova, a “new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance.”
Having sat out the battle of frontier performance while Google’s Gemini and OpenAI’s GPT-4 got all the attention, Amazon is making haste to catch up. Nova’s models, which handle multiple modalities including text and images, come in flavors suited to video generation (akin to OpenAI’s Sora) and image generation, which has become standard fare for large language models that integrate text and images.
The models come with snappy names, too: “Reel” is the name of the video-generation model, and “Canvas” is the name of the image-generation flavor. There are nice-looking demonstrations of the capabilities akin to what we’ve seen from OpenAI and Google: There’s a video generated by Reel from the prompt “A snowman in a Venetian gondola ride, 4k, high resolution” and a slick photo of an interior made using Canvas with the prompt, “A very fancy French restaurant.”
Nova makes extensive use, in Amazon’s own testing, of the retrieval-augmented-generation (RAG) approach to tap into databases, as well as “chain of thought,” a prompting technique in which the model produces intermediate reasoning steps before giving its final answer.
All that is by now industry-standard in Gen AI.
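For readers curious what RAG actually does under the hood, here is a minimal sketch. It uses a toy keyword-overlap retriever and illustrative sample documents of my own invention; production systems use vector embeddings and pass the assembled prompt to a large language model, neither of which is shown here.

```python
# Toy sketch of retrieval-augmented generation (RAG).
# The retriever and documents are illustrative stand-ins, not Amazon's implementation.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query; return the top k."""
    q_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the augmented prompt that a model would actually receive."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Use the context below to answer.\nContext:\n{joined}\nQuestion: {query}"

# Hypothetical document store for illustration.
docs = [
    "Nova models are offered through Amazon Bedrock.",
    "Venice is famous for its gondola rides.",
    "Chain of thought asks a model to reason step by step.",
]
context = retrieve("How are Nova models offered?", docs)
prompt = build_prompt("How are Nova models offered?", context)
```

The point is simply that the model never searches the database itself: retrieved text is stitched into the prompt before generation begins.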
So, what exactly is new in Amazon’s Nova?
It’s hard to say because, as is increasingly the case with commercial AI software, Amazon’s technical report discloses precious little about how the Nova models are built. (Even the names of the report’s authors are not disclosed!)
The company states that the Nova models are “based on the Transformer architecture,” referring to the breakthrough AI architecture Google introduced in 2017. There is also a “fine-tuning” stage, in which successive rounds of training refine the models’ handling of different domains of data.
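The idea of successive fine-tuning rounds can be sketched in miniature. The toy below trains a single-parameter linear model with gradient descent in two rounds on different data mixes; this is my own simplification for illustration, and real fine-tuning updates billions of transformer weights rather than one number.

```python
# Toy sketch of successive fine-tuning rounds on a one-parameter model y = w * x.
# Illustrative only; not Amazon's training recipe.

def fine_tune(w: float, data: list[tuple[float, float]],
              lr: float = 0.1, epochs: int = 50) -> float:
    """One round of gradient descent on (x, y) pairs, minimizing squared error."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)^2 w.r.t. w
            w -= lr * grad
    return w

w = 0.0                                       # stand-in for a "pretrained" weight
w = fine_tune(w, [(1.0, 2.0)])                # round 1: one narrow domain (y = 2x)
w = fine_tune(w, [(1.0, 2.0), (2.0, 4.0)])    # round 2: a broader mix, same target
```

Each round starts from the weights the previous round produced, which is the sense in which fine-tuning is “successive”: later rounds refine, rather than restart, the model.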