More stories

  • in

    A new state of the art for unsupervised vision

    Labeling data can be a chore. It’s the main source of sustenance for computer-vision models; without it, they’d have a lot of difficulty identifying objects, people, and other important image characteristics. Yet producing just an hour of tagged and labeled data can take a whopping 800 hours of human time. Our high-fidelity understanding of the world develops as machines can better perceive and interact with our surroundings. But they need more help.

    Scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), Microsoft, and Cornell University have attempted to solve this problem plaguing vision models by creating “STEGO,” an algorithm that can jointly discover and segment objects without any human labels at all, down to the pixel.

    STEGO learns something called “semantic segmentation” — fancy speak for the process of assigning a label to every pixel in an image. Semantic segmentation is an important skill for today’s computer-vision systems because images can be cluttered with objects. Even more challenging is that these objects don’t always fit into literal boxes; algorithms tend to work better for discrete “things” like people and cars as opposed to “stuff” like vegetation, sky, and mashed potatoes. A previous system might simply perceive a nuanced scene of a dog playing in the park as just a dog, but by assigning every pixel of the image a label, STEGO can break the image into its main ingredients: a dog, sky, grass, and its owner.


    Assigning every single pixel of the world a label is ambitious — especially without any kind of feedback from humans. The majority of algorithms today get their knowledge from mounds of labeled data, which can take painstaking human-hours to source. Just imagine the excitement of labeling every pixel of 100,000 images! To discover these objects without a human’s helpful guidance, STEGO looks for similar objects that appear throughout a dataset. It then associates these similar objects together to construct a consistent view of the world across all of the images it learns from.

    Seeing the world

    Machines that can “see” are crucial for a wide array of new and emerging technologies like self-driving cars and predictive modeling for medical diagnostics. Since STEGO can learn without labels, it can detect objects in many different domains, even those that humans don’t yet understand fully. 

    “If you’re looking at oncological scans, the surface of planets, or high-resolution biological images, it’s hard to know what objects to look for without expert knowledge. In emerging domains, sometimes even human experts don’t know what the right objects should be,” says Mark Hamilton, a PhD student in electrical engineering and computer science at MIT, research affiliate of MIT CSAIL, software engineer at Microsoft, and lead author on a new paper about STEGO. “In these types of situations where you want to design a method to operate at the boundaries of science, you can’t rely on humans to figure it out before machines do.”

    STEGO was tested on a slew of visual domains spanning general images, driving images, and high-altitude aerial photographs. In each domain, STEGO was able to identify and segment relevant objects that were closely aligned with human judgments. STEGO’s most diverse benchmark was the COCO-Stuff dataset, which is made up of diverse images from all over the world, from indoor scenes to people playing sports to trees and cows. In most cases, the previous state-of-the-art system could capture a low-resolution gist of a scene, but struggled on fine-grained details: A human was a blob, a motorcycle was captured as a person, and it couldn’t recognize any geese. On the same scenes, STEGO doubled the performance of previous systems and discovered concepts like animals, buildings, people, furniture, and many others.

    STEGO not only doubled the performance of prior systems on the COCO-Stuff benchmark, but made similar leaps forward in other visual domains. When applied to driverless car datasets, STEGO successfully segmented out roads, people, and street signs with much higher resolution and granularity than previous systems. On images from space, the system broke down every single square foot of the surface of the Earth into roads, vegetation, and buildings. 

    Connecting the pixels

    STEGO — which stands for “Self-supervised Transformer with Energy-based Graph Optimization” — builds on top of the DINO algorithm, which learned about the world through 14 million images from the ImageNet database. STEGO refines the DINO backbone through a learning process that mimics our own way of stitching together pieces of the world to make meaning. 

    For example, you might consider two images of dogs walking in the park. Even though they’re different dogs, with different owners, in different parks, STEGO can tell (without humans) how each scene’s objects relate to each other. The authors even probe STEGO’s mind to see how each little, brown, furry thing in the images is similar, and likewise with other shared objects like grass and people. By connecting objects across images, STEGO builds a consistent view of the world.
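
    The idea of connecting objects across images can be illustrated with a small sketch. It assumes that each image region is described by a feature vector and that similarity is measured with cosine similarity; the article does not spell out either detail, and the region names and vectors below are made-up toy values, not output from STEGO or DINO.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def best_matches(regions_a, regions_b):
    """For each region in image A, find the most similar region in image B."""
    return {
        name_a: max(
            regions_b,
            key=lambda name_b: cosine_similarity(vec_a, regions_b[name_b]),
        )
        for name_a, vec_a in regions_a.items()
    }

# Hypothetical per-region features for two different park photos.
park_photo_1 = {"dog": [0.9, 0.1, 0.0], "grass": [0.1, 0.9, 0.1]}
park_photo_2 = {
    "dog": [0.8, 0.2, 0.1],
    "grass": [0.0, 1.0, 0.2],
    "sky": [0.0, 0.1, 0.9],
}

print(best_matches(park_photo_1, park_photo_2))
# prints {'dog': 'dog', 'grass': 'grass'}
```

    In this toy setup, the two different dogs end up closer to each other in feature space than to anything else in the other photo, which is the kind of cross-image consistency the paragraph above describes.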

    “The idea is that these types of algorithms can find consistent groupings in a largely automated fashion so we don’t have to do that ourselves,” says Hamilton. “It might have taken years to understand complex visual datasets like biological imagery, but if we can avoid spending 1,000 hours combing through data and labeling it, we can find and discover new information that we might have missed. We hope this will help us understand the visual world in a more empirically grounded way.”

    Looking ahead

    Despite its improvements, STEGO still faces certain challenges. One is that labels can be arbitrary. For example, the labels of the COCO-Stuff dataset distinguish between “food-things” like bananas and chicken wings, and “food-stuff” like grits and pasta. STEGO doesn’t see much of a distinction there. In other cases, STEGO was confused by odd images — like one of a banana sitting on a phone receiver — where the receiver was labeled “foodstuff,” instead of “raw material.” 

    For upcoming work, they’re planning to explore giving STEGO a bit more flexibility than just labeling pixels into a fixed number of classes as things in the real world can sometimes be multiple things at the same time (like “food”, “plant” and “fruit”). The authors hope this will give the algorithm room for uncertainty, trade-offs, and more abstract thinking.

    “In making a general tool for understanding potentially complicated datasets, we hope that this type of an algorithm can automate the scientific process of object discovery from images. There’s a lot of different domains where human labeling would be prohibitively expensive, or humans simply don’t even know the specific structure, like in certain biological and astrophysical domains. We hope that future work enables application to a very broad scope of datasets. Since you don’t need any human labels, we can now start to apply ML tools more broadly,” says Hamilton.

    “STEGO is simple, elegant, and very effective. I consider unsupervised segmentation to be a benchmark for progress in image understanding, and a very difficult problem. The research community has made terrific progress in unsupervised image understanding with the adoption of transformer architectures,” says Andrea Vedaldi, professor of computer vision and machine learning and a co-lead of the Visual Geometry Group at the engineering science department of the University of Oxford. “This research provides perhaps the most direct and effective demonstration of this progress on unsupervised segmentation.” 

    Hamilton wrote the paper alongside MIT CSAIL PhD student Zhoutong Zhang, Assistant Professor Bharath Hariharan of Cornell University, Associate Professor Noah Snavely of Cornell Tech, and MIT professor William T. Freeman. They will present the paper at the 2022 International Conference on Learning Representations (ICLR).

  • in

    Security tool guarantees privacy in surveillance footage

    Surveillance cameras have an identity problem, fueled by an inherent tension between utility and privacy. As these powerful little devices have cropped up seemingly everywhere, the use of machine learning tools has automated video content analysis at a massive scale — but with increasing mass surveillance, there are currently no legally enforceable rules to limit privacy invasions. 

    Security cameras can do a lot — they’ve become smarter and supremely more competent than their ghosts of grainy pictures past, the ofttimes “hero tool” in crime media. (“See that little blurry blue blob in the right hand corner of that densely populated corner — we got him!”) Now, video surveillance can help health officials measure the fraction of people wearing masks, enable transportation departments to monitor the density and flow of vehicles, bikes, and pedestrians, and provide businesses with a better understanding of shopping behaviors. But why has privacy remained a weak afterthought? 

    The status quo is to retrofit video with blurred faces or black boxes. Not only does this prevent analysts from asking some genuine queries (e.g., Are people wearing masks?), it also doesn’t always work; the system may miss some faces and leave them unblurred for the world to see. Dissatisfied with this status quo, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), in collaboration with other institutions, came up with a system to better guarantee privacy in video footage from surveillance cameras. Called “Privid,” the system lets analysts submit video data queries, and adds a little bit of noise (extra data) to the end result to ensure that an individual can’t be identified. The system builds on a formal definition of privacy — “differential privacy” — which allows access to aggregate statistics about private data without revealing personally identifiable information.

    Typically, analysts would just have access to the entire video to do whatever they want with it, but Privid makes sure the video isn’t a free buffet. Honest analysts can get access to the information they need, but that access is restrictive enough that malicious analysts can’t do too much with it. To enable this, rather than running the code over the entire video in one shot, Privid breaks the video into small pieces and runs processing code over each chunk. Instead of getting results back from each piece, the segments are aggregated, and that additional noise is added. (There’s also information on the error bound you’re going to get on your result — maybe a 2 percent error margin, given the extra noisy data added). 

    For example, the code might output the number of people observed in each video chunk, and the aggregation might be the “sum,” to count the total number of people wearing face coverings, or the “average” to estimate the density of crowds. 
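
    The chunk-then-aggregate-then-add-noise flow described above can be sketched as follows. This is a minimal illustration, not Privid’s implementation: it assumes the standard Laplace mechanism from differential privacy, and the function names, the clamping step, and parameters such as `max_output`, `max_chunks_per_person`, and `epsilon` are hypothetical choices made for the example.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_sum(chunks, per_chunk_fn, max_output, max_chunks_per_person,
              epsilon, rng):
    """Run the analyst's code on each chunk, sum the results, add noise.

    Assumption: each chunk's output is clamped to [0, max_output], and any
    one person is visible in at most max_chunks_per_person chunks, so one
    person can change the sum by at most max_output * max_chunks_per_person.
    """
    total = 0.0
    for chunk in chunks:
        value = per_chunk_fn(chunk)
        total += min(max(value, 0.0), max_output)  # clamp per-chunk output
    sensitivity = max_output * max_chunks_per_person
    scale = sensitivity / epsilon
    return total + laplace_noise(scale, rng), scale

# Toy "video": each chunk is just the list of people visible in it.
video_chunks = [["p1", "p2"], ["p2"], ["p1", "p3", "p4"], []]
result, scale = noisy_sum(
    video_chunks,
    per_chunk_fn=len,            # the analyst's model: count people
    max_output=5,
    max_chunks_per_person=2,
    epsilon=1.0,
    rng=random.Random(0),
)
print(f"noisy total: {result:.1f} (noise scale {scale})")
```

    The analyst only ever sees the noisy total and the noise scale, never the per-chunk outputs, which mirrors the restriction the article describes.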

    Privid allows analysts to use their own deep neural networks that are commonplace for video analytics today. This gives analysts the flexibility to ask questions that the designers of Privid did not anticipate. Across a variety of videos and queries, Privid was accurate within 79 to 99 percent of a non-private system.

    “We’re at a stage right now where cameras are practically ubiquitous. If there’s a camera on every street corner, every place you go, and if someone could actually process all of those videos in aggregate, you can imagine that entity building a very precise timeline of when and where a person has gone,” says MIT CSAIL PhD student Frank Cangialosi, the lead author on a paper about Privid. “People are already worried about location privacy with GPS; video data in aggregate could capture not only your location history, but also moods, behaviors, and more at each location.” 

    Privid introduces a new notion of “duration-based privacy,” which decouples the definition of privacy from its enforcement — with obfuscation, if your privacy goal is to protect all people, the enforcement mechanism needs to do some work to find the people to protect, which it may or may not do perfectly. With this mechanism, you don’t need to fully specify everything, and you’re not hiding more information than you need to. 

    Let’s say we have a video overlooking a street. Two analysts, Alice and Bob, both claim they want to count the number of people that pass by each hour, so they submit a video processing module and ask for a sum aggregation.

    The first analyst is the city planning department, which hopes to use this information to understand footfall patterns and plan sidewalks for the city. Their model counts people and outputs this count for each video chunk.

    The other analyst is malicious. They hope to identify every time “Charlie” passes by the camera. Their model only looks for Charlie’s face, and outputs a large number if Charlie is present (i.e., the “signal” they’re trying to extract), or zero otherwise. Their hope is that the sum will be non-zero if Charlie was present. 

    From Privid’s perspective, these two queries look identical. It’s hard to reliably determine what their models might be doing internally, or what the analyst hopes to use the data for. This is where the noise comes in. Privid executes both of the queries, and adds the same amount of noise for each. In the first case, because Alice was counting all people, this noise will have only a small impact on the result and likely won’t affect its usefulness. 

    In the second case, since Bob was looking for a specific signal (Charlie was only visible for a few chunks), the noise is enough to prevent them from knowing if Charlie was there or not. If they see a non-zero result, it might be because Charlie was actually there, or because the model output “zero” but the noise made it non-zero. Privid didn’t need to know anything about when or where Charlie appeared; the system just needed to know a rough upper bound on how long Charlie might appear for, which is easier to specify than the exact locations that prior methods rely on. 
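
    A back-of-the-envelope calculation shows why the same noise barely affects Alice but drowns out Bob. All numbers here are invented for illustration, and the noise scale formula assumes a Laplace-style mechanism calibrated to the largest change one person could cause; the article does not give Privid’s actual parameters.

```python
def noise_scale(max_output, max_chunks_per_person, epsilon):
    """Noise scale needed to hide any one person's total contribution."""
    return max_output * max_chunks_per_person / epsilon

# Both analysts get the same bounds, because Privid cannot tell them apart.
MAX_OUTPUT = 20              # largest value a chunk may report (assumed)
MAX_CHUNKS_PER_PERSON = 3    # upper bound on how long one person is visible
EPSILON = 1.0                # privacy parameter (assumed)

scale = noise_scale(MAX_OUTPUT, MAX_CHUNKS_PER_PERSON, EPSILON)

# Alice counts everyone: say 1,000 chunks averaging 10 people each.
alice_true_sum = 1000 * 10
# Bob reports the maximum only when Charlie appears, say in 2 chunks.
bob_true_sum = MAX_OUTPUT * 2

alice_relative_noise = scale / alice_true_sum
bob_relative_noise = scale / bob_true_sum

print(f"noise scale: {scale}")
print(f"Alice: noise is {alice_relative_noise:.1%} of the true sum")
print(f"Bob:   noise is {bob_relative_noise:.0%} of the true sum")
```

    With these made-up figures the typical noise is well under 1 percent of Alice’s total but larger than Bob’s entire signal, so a non-zero result tells Bob very little.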

    The challenge is determining how much noise to add — Privid wants to add just enough to hide everyone, but not so much that it would be useless for analysts. Adding noise to the data and insisting on queries over time windows means that your result isn’t going to be as accurate as it could be, but the results are still useful while providing better privacy. 

    Cangialosi wrote the paper with Princeton PhD student Neil Agarwal, MIT CSAIL PhD student Venkat Arun, assistant professor at the University of Chicago Junchen Jiang, assistant professor at Rutgers University and former MIT CSAIL postdoc Srinivas Narayana, associate professor at Rutgers University Anand Sarwate, and assistant professor at Princeton University Ravi Netravali SM ’15, PhD ’18. Cangialosi will present the paper at the USENIX Symposium on Networked Systems Design and Implementation in April in Renton, Washington. 

    This work was partially supported by a Sloan Research Fellowship and National Science Foundation grants.

  • in

    Unlocking new doors to artificial intelligence

    Artificial intelligence research is constantly developing new hypotheses that have the potential to benefit society and industry; however, sometimes these benefits are not fully realized due to a lack of engineering tools. To help bridge this gap, graduate students in the MIT Department of Electrical Engineering and Computer Science’s 6-A Master of Engineering (MEng) Thesis Program work with some of the most innovative companies in the world and collaborate on cutting-edge projects, while contributing to and completing their MEng thesis.

    During a portion of the last year, four 6-A MEng students teamed up and completed an internship with IBM Research’s advanced prototyping team through the MIT-IBM Watson AI Lab on AI projects, often developing web applications to solve a real-world issue or business use cases. Here, the students worked alongside AI engineers, user experience engineers, full-stack researchers, and generalists to accommodate project requests and receive thesis advice, says Lee Martie, IBM research staff member and 6-A manager. The students’ projects ranged from generating synthetic data to allow for privacy-sensitive data analysis to using computer vision to identify actions in video that allows for monitoring human safety and tracking build progress on a construction site.

    “I appreciated all of the expertise from the team and the feedback,” says 6-A graduate Violetta Jusiega ’21, who participated in the program. “I think that working in industry gives the lens of making sure that the project’s needs are satisfied and [provides the opportunity] to ground research and make sure that it is helpful for some use case in the future.”

    Jusiega’s research intersected the fields of computer vision and design to focus on data visualization and user interfaces for the medical field. Working with IBM, she built an application programming interface (API) that let clinicians interact with a medical treatment strategy AI model, which was deployed in the cloud. Her interface provided a medical decision tree, as well as some prescribed treatment plans. After receiving feedback on her design from physicians at a local hospital, Jusiega developed iterations of the API and how the results were displayed, visually, so that it would be user-friendly and understandable for clinicians, who don’t usually code. She says that, “these tools are often not acquired into the field because they lack some of these API principles which become more important in an industry where everything is already very fast paced, so there’s little time to incorporate a new technology.” But this project might eventually allow for industry deployment. “I think this application has a bunch of potential, whether it does get picked up by clinicians or whether it’s simply used in research. It’s very promising and very exciting to see how technology can help us modify, or I can improve, the health-care field to be even more custom-tailored towards patients and giving them the best care possible,” she says.

    Another 6-A graduate student, Spencer Compton, was also considering aiding professionals to make more informed decisions, for use in settings including health care, but he was tackling it from a causal perspective. When given a set of related variables, Compton was investigating if there was a way to determine not just correlation, but the cause-and-effect relationship between them (the direction of the interaction) from the data alone. For this, he and his collaborators from IBM Research and Purdue University turned to a field of math called information theory. With the goal of designing an algorithm to learn complex networks of causal relationships, Compton used ideas relating to entropy, the randomness in a system, to help determine if a causal relationship is present and how variables might be interacting. “When judging an explanation, people often default to Occam’s razor,” says Compton. “We’re more inclined to believe a simpler explanation than a more complex one.” In many cases, he says, the approach seemed to perform well. For instance, they were able to consider variables such as lung cancer, pollution, and X-ray findings. He was pleased that his research allowed him to help create a framework of “entropic causal inference” that could aid in safe and smart decisions in the future, in a satisfying way. “The math is really surprisingly deep, interesting, and complex,” says Compton. “We’re basically asking, ‘when is the simplest explanation correct?’ but as a math question.”
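
    To make “entropy, the randomness in a system” concrete, here is a small sketch that computes Shannon entropy in bits from a list of observations. It assumes Shannon’s definition, which the article does not name explicitly, and it only illustrates the quantity itself; it is not the entropic causal inference algorithm, and the sample data are invented.

```python
import math
from collections import Counter

def shannon_entropy(samples):
    """Shannon entropy, in bits, of the empirical distribution of samples."""
    counts = Counter(samples)
    n = len(samples)
    # Each outcome with probability p contributes p * log2(1 / p).
    return sum((c / n) * math.log2(n / c) for c in counts.values())

always_same = ["clear", "clear", "clear", "clear"]
fair_coin = ["heads", "tails", "heads", "tails"]

print(shannon_entropy(always_same))  # prints 0.0
print(shannon_entropy(fair_coin))    # prints 1.0
```

    A variable that never changes carries no randomness, while a fair coin carries one bit. Preferring explanations that need less added randomness is one way to phrase Occam’s razor as a math question.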

    Determining relationships within data can sometimes require large volumes of it to suss out patterns, but for data that may contain sensitive information, this may not be available. For her master’s work, Ivy Huang worked with IBM Research to generate synthetic tabular data using a natural language processing tool called a transformer model, which can learn and predict future values from past values. Trained on real data, the model can produce new data with similar patterns, properties, and relationships without restrictions like privacy, availability, and access that might come with real data in financial transactions and electronic medical records. Further, she created an API and deployed the model in an IBM cluster, which allowed users increased access to the model and abilities to query it without compromising the original data.

    Working with the advanced prototyping team, MEng candidate Brandon Perez also considered how to gather and investigate data with restrictions, but in his case it was to use computer vision frameworks, centered on an action recognition model, to identify construction site happenings. The team based their work on the Moments in Time dataset, which contains over a million three-second video clips with about 300 attached classification labels, and has performed well during AI training. However, the group needed more construction-based video data. For this, they used YouTube-8M. Perez built a framework for testing and fine-tuning existing object detection models and action recognition models that could plug into an automatic spatial and temporal localization tool — how they would identify and label particular actions in a video timeline. “I was satisfied that I was able to explore what made me curious, and I was grateful for the autonomy that I was given with this project,” says Perez. “I felt like I was always supported, and my mentor was a great support to the project.”

    “The kind of collaborations that we have seen between our MEng students and IBM researchers are exactly what the 6-A MEng Thesis program at MIT is all about,” says Tomas Palacios, professor of electrical engineering and faculty director of the MIT 6-A MEng Thesis program. “For more than 100 years, 6-A has been connecting MIT students with industry to solve together some of the most important problems in the world.”

  • in

    Q&A: Cathy Wu on developing algorithms to safely integrate robots into our world

    Cathy Wu is the Gilbert W. Winslow Assistant Professor of Civil and Environmental Engineering and a member of the MIT Institute for Data, Systems, and Society. As an undergraduate, Wu won MIT’s toughest robotics competition, and as a graduate student took the University of California at Berkeley’s first-ever course on deep reinforcement learning. Now back at MIT, she’s working to improve the flow of robots in Amazon warehouses under the Science Hub, a new collaboration between the tech giant and the MIT Schwarzman College of Computing. Outside of the lab and classroom, Wu can be found running, drawing, pouring lattes at home, and watching YouTube videos on math and infrastructure via 3Blue1Brown and Practical Engineering. She recently took a break from all of that to talk about her work.

    Q: What put you on the path to robotics and self-driving cars?

    A: My parents always wanted a doctor in the family. However, I’m bad at following instructions and became the wrong kind of doctor! Inspired by my physics and computer science classes in high school, I decided to study engineering. I wanted to help as many people as a medical doctor could.

    At MIT, I looked for applications in energy, education, and agriculture, but the self-driving car was the first to grab me. It has yet to let go! Ninety-four percent of serious car crashes are caused by human error and could potentially be prevented by self-driving cars. Autonomous vehicles could also ease traffic congestion, save energy, and improve mobility.

    I first learned about self-driving cars from Seth Teller during his guest lecture for the course Mobile Autonomous Systems Lab (MASLAB), in which MIT undergraduates compete to build the best full-functioning robot from scratch. Our ball-fetching bot, Putzputz, won first place. From there, I took more classes in machine learning, computer vision, and transportation, and joined Teller’s lab. I also competed in several mobility-related hackathons, including one sponsored by Hubway, now known as Bluebikes.

    Q: You’ve explored ways to help humans and autonomous vehicles interact more smoothly. What makes this problem so hard?

    A: Both systems are highly complex, and our classical modeling tools are woefully insufficient. Integrating autonomous vehicles into our existing mobility systems is a huge undertaking. For example, we don’t know whether autonomous vehicles will cut energy use by 40 percent, or double it. We need more powerful tools to cut through the uncertainty. My PhD thesis at Berkeley tried to do this. I developed scalable optimization methods in the areas of robot control, state estimation, and system design. These methods could help decision-makers anticipate future scenarios and design better systems to accommodate both humans and robots.

    Q: How is deep reinforcement learning, combining deep and reinforcement learning algorithms, changing robotics?

    A: I took John Schulman and Pieter Abbeel’s reinforcement learning class at Berkeley in 2015 shortly after DeepMind published their breakthrough paper in Nature. They had trained an agent via deep learning and reinforcement learning to play “Space Invaders” and a suite of Atari games at superhuman levels. That created quite some buzz. A year later, I started to incorporate reinforcement learning into problems involving mixed traffic systems, in which only some cars are automated. I realized that classical control techniques couldn’t handle the complex nonlinear control problems I was formulating.

    Deep RL is now mainstream but it’s by no means pervasive in robotics, which still relies heavily on classical model-based control and planning methods. Deep learning continues to be important for processing raw sensor data like camera images and radio waves, and reinforcement learning is gradually being incorporated. I see traffic systems as gigantic multi-robot systems. I’m excited for an upcoming collaboration with Utah’s Department of Transportation to apply reinforcement learning to coordinate cars with traffic signals, reducing congestion and thus carbon emissions.

    Q: You’ve talked about the MIT course, 6.007 (Signals and Systems), and its impact on you. What about it spoke to you?

    A: The mindset. That problems that look messy can be analyzed with common, and sometimes simple, tools. Signals are transformed by systems in various ways, but what do these abstract terms mean, anyway? A mechanical system can take a signal like gears turning at some speed and transform it into a lever turning at another speed. A digital system can take binary digits and turn them into other binary digits or a string of letters or an image. Financial systems can take news and transform it via millions of trading decisions into stock prices. People take in signals every day through advertisements, job offers, gossip, and so on, and translate them into actions that in turn influence society and other people. This humble class on signals and systems linked mechanical, digital, and societal systems and showed me how foundational tools can cut through the noise.

    Q: In your project with Amazon you’re training warehouse robots to pick up, sort, and deliver goods. What are the technical challenges?

    A: This project involves assigning robots to a given task and routing them there. [Professor] Cynthia Barnhart’s team is focused on task assignment, and mine, on path planning. Both problems are considered combinatorial optimization problems because the solution involves a combination of choices. As the number of tasks and robots increases, the number of possible solutions grows exponentially. It’s called the curse of dimensionality. Both problems are what we call NP-hard; there may not be an efficient algorithm to solve them. Our goal is to devise a shortcut.

    Routing a single robot for a single task isn’t difficult. It’s like using Google Maps to find the shortest path home. It can be solved efficiently with several algorithms, including Dijkstra’s. But warehouses resemble small cities with hundreds of robots. When traffic jams occur, customers can’t get their packages as quickly. Our goal is to develop algorithms that find the most efficient paths for all of the robots.
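
    The single-robot case mentioned above can be sketched with a textbook version of Dijkstra’s algorithm. The warehouse graph, its location names, and the travel costs are invented for illustration; this shows only the easy one-robot problem, not the multi-robot coordination the project targets.

```python
import heapq

def dijkstra(graph, source, target):
    """Return (path, cost) for the cheapest route, or (None, inf) if none.

    graph maps each node to a list of (neighbor, non-negative cost) pairs.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a cheaper route was already found
        for neighbor, cost in graph.get(node, []):
            new_d = d + cost
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                prev[neighbor] = node
                heapq.heappush(heap, (new_d, neighbor))
    if target not in dist:
        return None, float("inf")
    # Walk the predecessor links backward to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]

# Hypothetical warehouse layout with travel times in seconds.
warehouse = {
    "dock": [("aisle_a", 4), ("aisle_b", 1)],
    "aisle_b": [("aisle_a", 2), ("shelf", 7)],
    "aisle_a": [("shelf", 3)],
    "shelf": [],
}

print(dijkstra(warehouse, "dock", "shelf"))
# prints (['dock', 'aisle_b', 'aisle_a', 'shelf'], 6)
```

    Running this once per robot ignores the fact that robots block one another, which is where the traffic jams and the combinatorial difficulty come from.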

    Q: Are there other applications?

    A: Yes. The algorithms we test in Amazon warehouses might one day help to ease congestion in real cities. Other potential applications include controlling planes on runways, swarms of drones in the air, and even characters in video games. These algorithms could also be used for other robotic planning tasks like scheduling and routing.

    Q: AI is evolving rapidly. Where do you hope to see the big breakthroughs coming?

    A: I’d like to see deep learning and deep RL used to solve societal problems involving mobility, infrastructure, social media, health care, and education. Deep RL now has a toehold in robotics and industrial applications like chip design, but we still need to be careful in applying it to systems with humans in the loop. Ultimately, we want to design systems for people. Currently, we simply don’t have the right tools.

    Q: What worries you most about AI taking on more and more specialized tasks?

    A: AI has the potential for tremendous good, but it could also help to accelerate the widening gap between the haves and the have-nots. Our political and regulatory systems could help to integrate AI into society and minimize job losses and income inequality, but I worry that they’re not equipped yet to handle the firehose of AI.

    Q: What’s the last great book you read?

    A: “How to Avoid a Climate Disaster,” by Bill Gates. I absolutely loved the way that Gates was able to take an overwhelmingly complex topic and distill it down into words that everyone can understand. His optimism inspires me to keep pushing on applications of AI and robotics to help avoid a climate disaster.

  • in

    Nonsense can make sense to machine-learning models

    For all that neural networks can accomplish, we still don’t really understand how they operate. Sure, we can program them to learn, but making sense of a machine’s decision-making process remains much like a fancy puzzle with a dizzying, complex pattern where plenty of integral pieces have yet to be fitted. 

    If a model was trying to classify an image of said puzzle, for example, it could encounter well-known, but annoying adversarial attacks, or even more run-of-the-mill data or processing issues. But a new, more subtle type of failure recently identified by MIT scientists is another cause for concern: “overinterpretation,” where algorithms make confident predictions based on details that don’t make sense to humans, like random patterns or image borders. 

    This could be particularly worrisome for high-stakes environments, like split-second decisions for self-driving cars, and medical diagnostics for diseases that need more immediate attention. Autonomous vehicles in particular rely heavily on systems that can accurately understand surroundings and then make quick, safe decisions. Yet the networks the team examined used specific backgrounds, edges, or particular patterns of the sky to classify traffic lights and street signs, irrespective of what else was in the image. 

    The team found that neural networks trained on popular datasets like CIFAR-10 and ImageNet suffered from overinterpretation. Models trained on CIFAR-10, for example, made confident predictions even when 95 percent of an input image was missing and the remainder was senseless to humans. 

    “Overinterpretation is a dataset problem that’s caused by these nonsensical signals in datasets. Not only are these high-confidence images unrecognizable, but they contain less than 10 percent of the original image in unimportant areas, such as borders. We found that these images were meaningless to humans, yet models can still classify them with high confidence,” says Brandon Carter, MIT Computer Science and Artificial Intelligence Laboratory PhD student and lead author on a paper about the research. 

    Deep-image classifiers are widely used. In addition to medical diagnosis and boosting autonomous vehicle technology, there are use cases in security, gaming, and even an app that tells you if something is or isn’t a hot dog, because sometimes we need reassurance. The tech in discussion works by processing individual pixels from tons of pre-labeled images for the network to “learn.” 

    Image classification is hard, because machine-learning models have the ability to latch onto these nonsensical subtle signals. Then, when image classifiers are trained on datasets such as ImageNet, they can make seemingly reliable predictions based on those signals. 

    Although these nonsensical signals can lead to model fragility in the real world, the signals are actually valid in the datasets, meaning overinterpretation can’t be diagnosed using typical evaluation methods that rely on accuracy measured on those same datasets. 

    To find the rationale for a model’s prediction on a particular input, the methods in the present study start with the full image and repeatedly ask: What can I remove from this image? Essentially, they keep covering up parts of the image until what remains is the smallest piece on which the model still makes a confident decision. 
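
    The covering-up procedure can be sketched in a few lines of Python. This is a minimal, greedy illustration under stated assumptions, not the procedure from the paper: the "image" is a flat list of pixel values, the classifier is a toy stand-in, and the names smallest_confident_subset, toy_confidence, threshold, and mask_value are all hypothetical.

```python
from typing import Callable, List, Sequence


def smallest_confident_subset(
    pixels: Sequence[float],
    confidence: Callable[[Sequence[float]], float],
    threshold: float = 0.9,
    mask_value: float = 0.0,
) -> List[int]:
    """Greedily mask pixels while the model stays confident.

    Returns the indices of the pixels that were never masked.
    """
    current = list(pixels)
    kept = set(range(len(current)))
    while True:
        # Find the pixel whose removal hurts confidence the least.
        best_idx, best_conf = None, -1.0
        for i in sorted(kept):
            trial = current.copy()
            trial[i] = mask_value
            c = confidence(trial)
            if c > best_conf:
                best_idx, best_conf = i, c
        # Stop when nothing is left, or when any further removal
        # would drop the model below the confidence threshold.
        if best_idx is None or best_conf < threshold:
            break
        current[best_idx] = mask_value
        kept.remove(best_idx)
    return sorted(kept)


# Hypothetical toy classifier: it is confident whenever pixel 0
# (think of it as a border pixel) is bright, whatever else is shown.
def toy_confidence(x: Sequence[float]) -> float:
    return 0.99 if x[0] > 0.5 else 0.10


image = [1.0, 0.3, 0.8, 0.6]
subset = smallest_confident_subset(image, toy_confidence)
print(subset)  # → [0]
```

    In this toy case the surviving subset is a single border pixel, which is the kind of result that would flag overinterpretation: the model’s confidence rests on something a person would not consider meaningful.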

    To that end, it could also be possible to use these methods as a type of validation criterion. For example, if you have an autonomously driving car that uses a trained machine-learning method for recognizing stop signs, you could test that method by identifying the smallest input subset that it still recognizes as a stop sign. If that subset consists of a tree branch, a particular time of day, or something else that’s not a stop sign, you could be concerned that the car might come to a stop at a place it’s not supposed to.

    While it may seem that the model is the likely culprit here, the datasets are more likely to blame. “There’s the question of how we can modify the datasets in a way that would enable models to be trained to more closely mimic how a human would think about classifying images and therefore, hopefully, generalize better in these real-world scenarios, like autonomous driving and medical diagnosis, so that the models don’t have this nonsensical behavior,” says Carter. 

    This may mean creating datasets in more controlled environments. Currently, it’s just pictures that are extracted from public domains that are then classified. But if you want to do object identification, for example, it might be necessary to train models with objects with an uninformative background. 

    This work was supported by Schmidt Futures and the National Institutes of Health. Carter wrote the paper alongside Siddhartha Jain and Jonas Mueller, scientists at Amazon, and MIT Professor David Gifford. They are presenting the work at the 2021 Conference on Neural Information Processing Systems.

  • in

    Design’s new frontier

    In the 1960s, the advent of computer-aided design (CAD) sparked a revolution in design. For his PhD thesis in 1963, MIT Professor Ivan Sutherland developed Sketchpad, a game-changing software program that enabled users to draw, move, and resize shapes on a computer. Over the course of the next few decades, CAD software reshaped how everything from consumer products to buildings and airplanes was designed.

    “CAD was part of the first wave in computing in design. The ability of researchers and practitioners to represent and model designs using computers was a major breakthrough and still is one of the biggest outcomes of design research, in my opinion,” says Maria Yang, Gail E. Kendall Professor and director of MIT’s Ideation Lab.

    Innovations in 3D printing during the 1980s and 1990s expanded CAD’s capabilities beyond traditional injection molding and casting methods, providing designers even more flexibility. Designers could sketch, ideate, and develop prototypes or models faster and more efficiently. Meanwhile, with the push of a button, software like that developed by Professor Emeritus David Gossard of MIT’s CAD Lab could solve equations simultaneously to produce a new geometry on the fly.

    In recent years, mechanical engineers have expanded the computing tools they use to ideate, design, and prototype. More sophisticated algorithms and the explosion of machine learning and artificial intelligence technologies have sparked a second revolution in design engineering.

    Researchers and faculty at MIT’s Department of Mechanical Engineering are utilizing these technologies to re-imagine how the products, systems, and infrastructures we use are designed. These researchers are at the forefront of the new frontier in design.

    Computational design

    Faez Ahmed wants to reinvent the wheel, or at least the bicycle wheel. He and his team at MIT’s Design Computation & Digital Engineering Lab (DeCoDE) use an artificial intelligence-driven design method that can generate entirely novel and improved designs for a range of products — including the traditional bicycle. They create advanced computational methods to blend human-driven design with simulation-based design.

    “The focus of our DeCoDE lab is computational design. We are looking at how we can create machine learning and AI algorithms to help us discover new designs that are optimized based on specific performance parameters,” says Ahmed, an assistant professor of mechanical engineering at MIT.

    For their work using AI-driven design for bicycles, Ahmed and his collaborator Professor Daniel Frey wanted to make it easier to design customizable bicycles, and by extension, encourage more people to use bicycles over transportation methods that emit greenhouse gases.

    To start, the group gathered a dataset of 4,500 bicycle designs. Using this massive dataset, they tested the limits of what machine learning could do. First, they developed algorithms to group bicycles that looked similar together and explore the design space. They then created machine learning models that could successfully predict what components are key in identifying a bicycle style, such as a road bike versus a mountain bike.

    Once the algorithms were good enough at identifying bicycle designs and parts, the team proposed novel machine learning tools that could use this data to create a unique and creative design for a bicycle based on certain performance parameters and rider dimensions.

    Ahmed used a generative adversarial network — or GAN — as the basis of this model. GAN models utilize neural networks that can create new designs based on vast amounts of data. However, using GAN models alone would result in homogeneous designs that lack novelty and can’t be assessed in terms of performance. To address these issues in design problems, Ahmed has developed a new method which he calls “PaDGAN,” performance augmented diverse GAN.

    “When we apply this type of model, what we see is that we can get large improvements in the diversity, quality, as well as novelty of the designs,” Ahmed explains.

    Using this approach, Ahmed’s team developed an open-source computational design tool for bicycles freely available on their lab website. They hope to further develop a set of generalizable tools that can be used across industries and products.

    Longer term, Ahmed has his sights set on loftier goals. He hopes the computational design tools he develops could lead to “design democratization,” putting more power in the hands of the end user.

    “With these algorithms, you can have more individualization where the algorithm assists a customer in understanding their needs and helps them create a product that satisfies their exact requirements,” he adds.

    Using algorithms to democratize the design process is a goal shared by Stefanie Mueller, an associate professor in electrical engineering and computer science and mechanical engineering.

    Personal fabrication

    Platforms like Instagram give users the freedom to instantly edit their photographs or videos using filters. In one click, users can alter the palette, tone, and brightness of their content by applying filters that range from bold colors to sepia-toned or black-and-white. Mueller, X-Window Consortium Career Development Professor, wants to bring this concept of the Instagram filter to the physical world.

    “We want to explore how digital capabilities can be applied to tangible objects. Our goal is to bring reprogrammable appearance to the physical world,” explains Mueller, director of the HCI Engineering Group based out of MIT’s Computer Science and Artificial Intelligence Laboratory.

    Mueller’s team utilizes a combination of smart materials, optics, and computation to advance personal fabrication technologies that would allow end users to alter the design and appearance of the products they own. They tested this concept in a project they dubbed “PhotoChromeleon.”

    First, a mix of photochromic cyan, magenta, and yellow dyes is airbrushed onto an object (in this instance, a 3D sculpture of a chameleon). Using software they developed, the team sketches the exact color pattern they want to achieve on the object itself. An ultraviolet light shines on the object to activate the dyes.

    To actually create the physical pattern on the object, Mueller has developed an optimization algorithm to use alongside a normal office projector outfitted with red, green, and blue LED lights. These lights shine on specific pixels on the object for a given period of time to physically change the makeup of the photochromic pigments.

    “This fancy algorithm tells us exactly how long we have to shine the red, green, and blue light on every single pixel of an object to get the exact pattern we’ve programmed in our software,” says Mueller.

    Giving this freedom to the end user enables limitless possibilities. Mueller’s team has applied this technology to iPhone cases, shoes, and even cars. In the case of shoes, Mueller envisions a shoebox embedded with UV and LED light projectors. Users could put their shoes in the box overnight and the next day have a pair of shoes in a completely new pattern.

    Mueller wants to expand her personal fabrication methods to the clothes we wear. Rather than utilize the light projection technique developed in the PhotoChromeleon project, her team is exploring the possibility of weaving LEDs directly into clothing fibers, allowing people to change their shirt’s appearance as they wear it. These personal fabrication technologies could completely alter consumer habits.

    “It’s very interesting for me to think about how these computational techniques will change product design on a high level,” adds Mueller. “In the future, a consumer could buy a blank iPhone case and update the design on a weekly or daily basis.”

    Computational fluid dynamics and participatory design

    Another team of mechanical engineers, including Sili Deng, the Brit (1961) & Alex (1949) d’Arbeloff Career Development Professor, is developing a different kind of design tool that could have a large impact on individuals in low- and middle-income countries across the world.

    As Deng walked down the hallway of Building 1 on MIT’s campus, a monitor playing a video caught her eye. The video featured work done by mechanical engineers and MIT D-Lab on developing cleaner burning briquettes for cookstoves in Uganda. Deng immediately knew she wanted to get involved.

    “As a combustion scientist, I’ve always wanted to work on such a tangible real-world problem, but the field of combustion tends to focus more heavily on the academic side of things,” explains Deng.

    After reaching out to colleagues in MIT D-Lab, Deng joined a collaborative effort to develop a new cookstove design tool for the 3 billion people across the world who burn solid fuels to cook and heat their homes. These stoves often emit soot and carbon monoxide, not only leading to millions of deaths each year but also worsening the world’s greenhouse gas emission problem.

    The team is taking a three-pronged approach to developing this solution, using a combination of participatory design, physical modeling, and experimental validation to create a tool that will lead to the production of high-performing, low-cost energy products.

    Deng and her team in the Deng Energy and Nanotechnology Group use physics-based modeling for the combustion and emission process in cookstoves.

    “My team is focused on computational fluid dynamics. We use computational and numerical studies to understand the flow field where the fuel is burned and releases heat,” says Deng.

    These flow mechanics are crucial to understanding how to minimize heat loss and make cookstoves more efficient, as well as learning how dangerous pollutants are formed and released in the process.

    Using computational methods, Deng’s team performs three-dimensional simulations of the complex chemistry and transport coupling at play in the combustion and emission processes. They then use these simulations to build a combustion model for how fuel is burned and a pollution model that predicts carbon monoxide emissions.

    A group led by Daniel Sweeney in MIT D-Lab then uses Deng’s models for experimental validation with stove prototypes. Finally, Professor Maria Yang uses participatory design methods to integrate user feedback, ensuring the design tool can actually be used by people across the world.

    The end goal for this collaborative team is to not only provide local manufacturers with a prototype they could produce themselves, but to also provide them with a tool that can tweak the design based on local needs and available materials.

    Deng sees wide-ranging applications for the computational fluid dynamics her team is developing.

    “We see an opportunity to use physics-based modeling, augmented with a machine learning approach, to come up with chemical models for practical fuels that help us better understand combustion. Therefore, we can design new methods to minimize carbon emissions,” she adds.

    While Deng is utilizing simulations and machine learning at the molecular level to improve designs, others are taking a more macro approach.

    Designing intelligent systems

    When it comes to intelligent design, Navid Azizan thinks big. He hopes to help create future intelligent systems that are capable of making decisions autonomously by using the enormous amounts of data emerging from the physical world. From smart robots and autonomous vehicles to smart power grids and smart cities, Azizan focuses on the analysis, design, and control of intelligent systems.

    Achieving such massive feats takes a truly interdisciplinary approach that draws upon various fields such as machine learning, dynamical systems, control, optimization, statistics, and network science, among others.

    “Developing intelligent systems is a multifaceted problem, and it really requires a confluence of disciplines,” says Azizan, assistant professor of mechanical engineering with a dual appointment in MIT’s Institute for Data, Systems, and Society (IDSS). “To create such systems, we need to go beyond standard approaches to machine learning, such as those commonly used in computer vision, and devise algorithms that can enable safe, efficient, real-time decision-making for physical systems.”

    For robot control to work in the complex dynamic environments that arise in the real world, real-time adaptation is key. If, for example, an autonomous vehicle is going to drive in icy conditions or a drone is operating in windy conditions, they need to be able to adapt to their new environment quickly.

    To address this challenge, Azizan and his collaborators at MIT and Stanford University have developed a new algorithm that combines adaptive control, a powerful methodology from control theory, with meta learning, a new machine learning paradigm.

    “This ‘control-oriented’ learning approach outperforms the existing ‘regression-oriented’ methods, which are mostly focused on just fitting the data, by a wide margin,” says Azizan.

    Another critical aspect of deploying machine learning algorithms in physical systems that Azizan and his team hope to address is safety. Deep neural networks are a crucial part of autonomous systems. They are used for interpreting complex visual inputs and making data-driven predictions of future behavior in real time. However, Azizan urges caution.

    “These deep neural networks are only as good as their training data, and their predictions can often be untrustworthy in scenarios not covered by their training data,” he says. Making decisions based on such untrustworthy predictions could lead to fatal accidents in autonomous vehicles or other safety-critical systems.

    To avoid these potentially catastrophic events, Azizan proposes that it is imperative to equip neural networks with a measure of their uncertainty. When the uncertainty is high, they can then be switched to a “safe policy.”
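
    The switching idea can be sketched in a few lines of Python. This is a minimal illustration, not the researchers’ method: it uses disagreement among a small ensemble of toy models as a stand-in uncertainty measure, and the names act_with_fallback, max_uncertainty, SAFE_ACTION, and ensemble are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence, Tuple


def act_with_fallback(
    observation: float,
    ensemble: Sequence[Callable[[float], float]],
    safe_action: float,
    max_uncertainty: float = 0.1,
) -> Tuple[float, float]:
    """Return (action, uncertainty), falling back to a safe action.

    Uncertainty here is the spread of the ensemble's predictions,
    an assumed proxy chosen only for illustration.
    """
    predictions = [model(observation) for model in ensemble]
    uncertainty = pstdev(predictions)
    if uncertainty > max_uncertainty:
        # The models disagree too much: do not trust the prediction.
        return safe_action, uncertainty
    return mean(predictions), uncertainty


# Toy models that agree near x = 0 (the "training" range)
# and diverge as the input moves away from it.
ensemble = [
    lambda x: x,
    lambda x: x + 0.02 * x * x,
    lambda x: x - 0.02 * x * x,
]
SAFE_ACTION = 0.0  # hypothetical "do nothing risky" command

familiar_action, familiar_unc = act_with_fallback(1.0, ensemble, SAFE_ACTION)
unfamiliar_action, unfamiliar_unc = act_with_fallback(10.0, ensemble, SAFE_ACTION)
print(familiar_action, unfamiliar_action)
```

    For the familiar input the ensemble agrees and its averaged prediction is used; for the unfamiliar input the spread exceeds the limit and the safe action is returned instead.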

    In pursuit of this goal, Azizan and his collaborators have developed a new algorithm known as SCOD (Sketching Curvature for Out-of-Distribution Detection). This framework could be embedded within any deep neural network to equip it with a measure of its uncertainty.

    “This algorithm is model-agnostic and can be applied to neural networks used in various kinds of autonomous systems, whether it’s drones, vehicles, or robots,” says Azizan.

    Azizan hopes to continue working on algorithms for even larger-scale systems. He and his team are designing efficient algorithms to better control supply and demand in smart energy grids. According to Azizan, even if we create the most efficient solar panels and batteries, we can never achieve a sustainable grid powered by renewable resources without the right control mechanisms.

    Mechanical engineers like Ahmed, Mueller, Deng, and Azizan serve as the key to realizing the next revolution of computing in design.

    “MechE is in a unique position at the intersection of the computational and physical worlds,” Azizan says. “Mechanical engineers build a bridge between theoretical, algorithmic tools and real, physical world applications.”

    Sophisticated computational tools, coupled with the ground truth mechanical engineers have in the physical world, could unlock limitless possibilities for design engineering, well beyond what could have been imagined in those early days of CAD.