At any given moment, many thousands of new videos are being posted to sites like YouTube, TikTok, and Instagram. An increasing number of those videos are being recorded and streamed live. But tech and media companies still struggle to understand what’s going on in all that content.
Now MIT alumnus-founded Netra is using artificial intelligence to improve video analysis at scale. The company’s system can identify activities, objects, emotions, locations, and more to organize and provide context to videos in new ways.
Companies are using Netra’s solution to group similar content into highlight reels or news segments, flag nudity and violence, and improve ad placement. In advertising, Netra is helping ensure videos are paired with relevant ads so brands can move away from tracking individual people, which has led to privacy concerns.
“The industry as a whole is pivoting toward content-based advertising, or what they call affinity advertising, and away from cookie-based, pixel-based tracking, which was always sort of creepy,” Netra co-founder and CTO Shashi Kant SM ’06 says.
Netra also believes it is improving the searchability of video content. Once videos are processed by Netra’s system, users can start a search with a keyword. From there, they can click on results to see similar content and find increasingly specific events.
For instance, Netra’s system can process a baseball season’s worth of video and help users find all the singles. By clicking on certain plays to see more like them, users can also drill down to the singles that were almost outs and had fans booing angrily.
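Netra has not published its retrieval internals, but the "click to see more like it" workflow can be pictured as a nearest-neighbor lookup over per-clip feature vectors. Everything in the sketch below, including the clip names, vectors, and scoring, is a hypothetical illustration rather than Netra’s code.

```python
import numpy as np

# Hypothetical feature vectors for a handful of baseball clips.
# In practice these would come from a video model; the values here are made up.
clips = {
    "single_1": np.array([0.9, 0.1, 0.0]),
    "single_close_call": np.array([0.8, 0.3, 0.1]),
    "strikeout": np.array([0.1, 0.9, 0.2]),
    "home_run": np.array([0.2, 0.1, 0.9]),
}

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def more_like_this(query_name, top_k=2):
    """Rank all other clips by similarity to the clicked clip."""
    query = clips[query_name]
    scores = {
        name: cosine_similarity(query, vec)
        for name, vec in clips.items()
        if name != query_name
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Clicking on one single surfaces the other, similar plays first.
print(more_like_this("single_1"))
```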
“Video is by far the biggest information resource today,” Kant says. “It dwarfs text by orders of magnitude in terms of information richness and size, yet no one’s even touched it with search. It’s the whitest of white space.”
Pursuing a vision
Internet pioneer and MIT professor Sir Tim Berners-Lee has long worked to improve machines’ ability to make sense of data on the internet. Kant researched under Berners-Lee as a graduate student and was inspired by his vision for improving the way information is stored and used by machines.
“The holy grail to me is a new paradigm in information retrieval,” Kant says. “I feel web search is still 1.0. Even Google is 1.0. That’s been the vision of Sir Tim Berners-Lee’s semantic web initiative and that’s what I took from that experience.”
Kant was also a member of the winning team in the MIT $100K Entrepreneurship Competition (the MIT $50K back then). He helped write the computer code for a solution called the Active Joint Brace, which was an electromechanical orthotic device for people with disabilities.
After graduating in 2006, Kant started Cognika, a company that used AI in its solution. At the time, AI still had a bad reputation from being overhyped, so Kant would use terms like "cognitive computing" when pitching the company to investors and customers.
Kant started Netra in 2013 to use AI for video analysis. These days he has to deal with the opposite end of the hype spectrum, with so many startups claiming to use AI in their solutions.
Netra tries to cut through the hype with demonstrations of its system, which can quickly analyze videos and organize content by what’s going on in different clips: scenes where people are doing similar things, expressing similar emotions, using similar products, and more. The analysis generates metadata for each scene, but Kant says Netra’s system provides much more than keyword tagging.
“What we work with are embeddings,” Kant explains, referring to how his system classifies content. “If there’s a scene of someone hitting a home run, there’s a certain signature to that, and we generate an embedding for that. An embedding is a sequence of numbers, or a ‘vector,’ that captures the essence of a piece of content. Tags are just human readable representations of that. So, we’ll train a model that detects all the home runs, but underneath the cover there’s a neural network, and it’s creating an embedding of that video, and that differentiates the scene in other ways from an out or a walk.”
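As a rough sketch of the relationship Kant describes between embeddings and tags, the toy pipeline below uses a stand-in neural network to produce a fixed-length embedding for a clip and a small classifier head to read a human-readable tag off that embedding. The architecture, dimensions, and label set are assumptions made for illustration, not Netra’s model.

```python
import torch
import torch.nn as nn

# Stand-in for a video encoder: maps clip features to a 128-dim embedding.
# A real system would use a trained video model; this random-weight network
# only shows the shape of the pipeline.
encoder = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, 128))

# Classifier head that turns the embedding into a human-readable tag.
tag_head = nn.Linear(128, 3)
tags = ["home_run", "walk", "out"]  # hypothetical label set

clip_features = torch.randn(1, 2048)   # placeholder features for one clip
embedding = encoder(clip_features)     # the "signature" of the scene
tag = tags[tag_head(embedding).argmax(dim=1).item()]  # readable representation

print(embedding.shape, tag)  # torch.Size([1, 128]) plus one of the tags
```

The point of the sketch is the division of labor: the embedding is what the system actually compares and searches over, while the tag is just a label projected out of it for people to read.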
By defining the relationships between different clips, Netra’s system allows customers to organize and search their content in new ways. Media companies can determine the most exciting moments of sporting events based on fans’ emotions. They can also group content by subject, by location, or by whether clips include sensitive or disturbing content.
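One plausible way to group clips by what is happening in them is to cluster their embeddings. The sketch below does this with scikit-learn’s KMeans on made-up vectors; the embeddings, group count, and scene descriptions are assumptions, not Netra’s pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical embeddings for six clips (one row per clip).
embeddings = np.array([
    [0.9, 0.1, 0.0, 0.1],   # cheering crowd
    [0.8, 0.2, 0.1, 0.0],   # cheering crowd
    [0.1, 0.9, 0.1, 0.2],   # quiet interview
    [0.0, 0.8, 0.2, 0.1],   # quiet interview
    [0.1, 0.1, 0.9, 0.8],   # scene flagged as sensitive
    [0.2, 0.0, 0.8, 0.9],   # scene flagged as sensitive
])

# Group similar scenes together; the number of groups is a modeling choice.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embeddings)
print(labels)  # clips with the same label land in the same group
```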
Those abilities have major implications for online advertising. An advertising company representing a brand like the outdoor apparel company Patagonia could use Netra’s system to place Patagonia’s ads next to hiking content. Media companies could offer brands like Nike advertising space around clips of sponsored athletes.
Those capabilities are helping advertisers adhere to new privacy regulations around the world that put restrictions on gathering data on individual people, especially children. Targeting certain groups of people with ads and tracking them across the web has also become controversial.
Kant believes Netra’s AI engine is a step toward giving consumers more control over their data, an idea long championed by Berners-Lee.
“It’s not the implementation of my CSAIL work, but I’d say the conceptual ideas I was pursuing at CSAIL come through in Netra’s solution,” Kant says.
Transforming the way information is stored
Netra currently counts some of the country’s largest media and advertising companies as customers. Kant believes Netra’s system could one day help anyone search through and organize the growing ocean of video content on the internet. To that end, he sees Netra’s solution continuing to evolve.
“Search hasn’t changed much since it was invented for web 1.0,” Kant says. “Right now there’s lots of link-based search. Links are obsolete in my view. You don’t want to visit different documents. You want information from those documents aggregated into something contextual and customizable, including just the information you need.”
Kant believes such contextualization would greatly improve the way information is organized and shared on the internet.
“It’s about relying less and less on keywords and more and more on examples,” Kant explains. “For instance, in this video, if Shashi makes a statement, is that because he’s a crackpot or is there more to it? Imagine a system that could say, ‘This other scientist said something similar to validate that statement and this scientist responded similarly to that question.’ To me, those types of things are the future of information retrieval, and that’s my life’s passion. That’s why I came to MIT. That’s why I’ve spent one and a half decades of my life fighting this battle of AI, and that’s what I’ll continue to do.”