More stories

  • in

    Researchers release open-source photorealistic simulator for autonomous driving

    Hyper-realistic virtual worlds have been heralded as the best driving schools for autonomous vehicles (AVs), since they’ve proven fruitful test beds for safely trying out dangerous driving scenarios. Tesla, Waymo, and other self-driving companies all rely heavily on data to enable expensive and proprietary photorealistic simulators, since testing and gathering nuanced I-almost-crashed data usually isn’t the most easy or desirable to recreate. 

    To that end, scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) created “VISTA 2.0,” a data-driven simulation engine where vehicles can learn to drive in the real world and recover from near-crash scenarios. What’s more, all of the code is being open-sourced to the public. 

    “Today, only companies have software like the type of simulation environments and capabilities of VISTA 2.0, and this software is proprietary. With this release, the research community will have access to a powerful new tool for accelerating the research and development of adaptive robust control for autonomous driving,” says MIT Professor and CSAIL Director Daniela Rus, senior author on a paper about the research. 

    Play video

    VISTA is a data-driven, photorealistic simulator for autonomous driving. It can simulate not just live video but LiDAR data and event cameras, and also incorporate other simulated vehicles to model complex driving situations. VISTA is open source and the code can be found below.

    VISTA 2.0 builds off of the team’s previous model, VISTA, and it’s fundamentally different from existing AV simulators since it’s data-driven — meaning it was built and photorealistically rendered from real-world data — thereby enabling direct transfer to reality. While the initial iteration supported only single car lane-following with one camera sensor, achieving high-fidelity data-driven simulation required rethinking the foundations of how different sensors and behavioral interactions can be synthesized. 

    Enter VISTA 2.0: a data-driven system that can simulate complex sensor types and massively interactive scenarios and intersections at scale. With much less data than previous models, the team was able to train autonomous vehicles that could be substantially more robust than those trained on large amounts of real-world data. 

    “This is a massive jump in capabilities of data-driven simulation for autonomous vehicles, as well as the increase of scale and ability to handle greater driving complexity,” says Alexander Amini, CSAIL PhD student and co-lead author on two new papers, together with fellow PhD student Tsun-Hsuan Wang. “VISTA 2.0 demonstrates the ability to simulate sensor data far beyond 2D RGB cameras, but also extremely high dimensional 3D lidars with millions of points, irregularly timed event-based cameras, and even interactive and dynamic scenarios with other vehicles as well.” 

    The team was able to scale the complexity of the interactive driving tasks for things like overtaking, following, and negotiating, including multiagent scenarios in highly photorealistic environments. 

    Training AI models for autonomous vehicles involves hard-to-secure fodder of different varieties of edge cases and strange, dangerous scenarios, because most of our data (thankfully) is just run-of-the-mill, day-to-day driving. Logically, we can’t just crash into other cars just to teach a neural network how to not crash into other cars.

    Recently, there’s been a shift away from more classic, human-designed simulation environments to those built up from real-world data. The latter have immense photorealism, but the former can easily model virtual cameras and lidars. With this paradigm shift, a key question has emerged: Can the richness and complexity of all of the sensors that autonomous vehicles need, such as lidar and event-based cameras that are more sparse, accurately be synthesized? 

    Lidar sensor data is much harder to interpret in a data-driven world — you’re effectively trying to generate brand-new 3D point clouds with millions of points, only from sparse views of the world. To synthesize 3D lidar point clouds, the team used the data that the car collected, projected it into a 3D space coming from the lidar data, and then let a new virtual vehicle drive around locally from where that original vehicle was. Finally, they projected all of that sensory information back into the frame of view of this new virtual vehicle, with the help of neural networks. 

    Together with the simulation of event-based cameras, which operate at speeds greater than thousands of events per second, the simulator was capable of not only simulating this multimodal information, but also doing so all in real time — making it possible to train neural nets offline, but also test online on the car in augmented reality setups for safe evaluations. “The question of if multisensor simulation at this scale of complexity and photorealism was possible in the realm of data-driven simulation was very much an open question,” says Amini. 

    With that, the driving school becomes a party. In the simulation, you can move around, have different types of controllers, simulate different types of events, create interactive scenarios, and just drop in brand new vehicles that weren’t even in the original data. They tested for lane following, lane turning, car following, and more dicey scenarios like static and dynamic overtaking (seeing obstacles and moving around so you don’t collide). With the multi-agency, both real and simulated agents interact, and new agents can be dropped into the scene and controlled any which way. 

    Taking their full-scale car out into the “wild” — a.k.a. Devens, Massachusetts — the team saw  immediate transferability of results, with both failures and successes. They were also able to demonstrate the bodacious, magic word of self-driving car models: “robust.” They showed that AVs, trained entirely in VISTA 2.0, were so robust in the real world that they could handle that elusive tail of challenging failures. 

    Now, one guardrail humans rely on that can’t yet be simulated is human emotion. It’s the friendly wave, nod, or blinker switch of acknowledgement, which are the type of nuances the team wants to implement in future work. 

    “The central algorithm of this research is how we can take a dataset and build a completely synthetic world for learning and autonomy,” says Amini. “It’s a platform that I believe one day could extend in many different axes across robotics. Not just autonomous driving, but many areas that rely on vision and complex behaviors. We’re excited to release VISTA 2.0 to help enable the community to collect their own datasets and convert them into virtual worlds where they can directly simulate their own virtual autonomous vehicles, drive around these virtual terrains, train autonomous vehicles in these worlds, and then can directly transfer them to full-sized, real self-driving cars.” 

    Amini and Wang wrote the paper alongside Zhijian Liu, MIT CSAIL PhD student; Igor Gilitschenski, assistant professor in computer science at the University of Toronto; Wilko Schwarting, AI research scientist and MIT CSAIL PhD ’20; Song Han, associate professor at MIT’s Department of Electrical Engineering and Computer Science; Sertac Karaman, associate professor of aeronautics and astronautics at MIT; and Daniela Rus, MIT professor and CSAIL director. The researchers presented the work at the IEEE International Conference on Robotics and Automation (ICRA) in Philadelphia. 

    This work was supported by the National Science Foundation and Toyota Research Institute. The team acknowledges the support of NVIDIA with the donation of the Drive AGX Pegasus. More

  • in

    Living better with algorithms

    Laboratory for Information and Decision Systems (LIDS) student Sarah Cen remembers the lecture that sent her down the track to an upstream question.

    At a talk on ethical artificial intelligence, the speaker brought up a variation on the famous trolley problem, which outlines a philosophical choice between two undesirable outcomes.

    The speaker’s scenario: Say a self-driving car is traveling down a narrow alley with an elderly woman walking on one side and a small child on the other, and no way to thread between both without a fatality. Who should the car hit?

    Then the speaker said: Let’s take a step back. Is this the question we should even be asking?

    That’s when things clicked for Cen. Instead of considering the point of impact, a self-driving car could have avoided choosing between two bad outcomes by making a decision earlier on — the speaker pointed out that, when entering the alley, the car could have determined that the space was narrow and slowed to a speed that would keep everyone safe.

    Recognizing that today’s AI safety approaches often resemble the trolley problem, focusing on downstream regulation such as liability after someone is left with no good choices, Cen wondered: What if we could design better upstream and downstream safeguards to such problems? This question has informed much of Cen’s work.

    “Engineering systems are not divorced from the social systems on which they intervene,” Cen says. Ignoring this fact risks creating tools that fail to be useful when deployed or, more worryingly, that are harmful.

    Cen arrived at LIDS in 2018 via a slightly roundabout route. She first got a taste for research during her undergraduate degree at Princeton University, where she majored in mechanical engineering. For her master’s degree, she changed course, working on radar solutions in mobile robotics (primarily for self-driving cars) at Oxford University. There, she developed an interest in AI algorithms, curious about when and why they misbehave. So, she came to MIT and LIDS for her doctoral research, working with Professor Devavrat Shah in the Department of Electrical Engineering and Computer Science, for a stronger theoretical grounding in information systems.

    Auditing social media algorithms

    Together with Shah and other collaborators, Cen has worked on a wide range of projects during her time at LIDS, many of which tie directly to her interest in the interactions between humans and computational systems. In one such project, Cen studies options for regulating social media. Her recent work provides a method for translating human-readable regulations into implementable audits.

    To get a sense of what this means, suppose that regulators require that any public health content — for example, on vaccines — not be vastly different for politically left- and right-leaning users. How should auditors check that a social media platform complies with this regulation? Can a platform be made to comply with the regulation without damaging its bottom line? And how does compliance affect the actual content that users do see?

    Designing an auditing procedure is difficult in large part because there are so many stakeholders when it comes to social media. Auditors have to inspect the algorithm without accessing sensitive user data. They also have to work around tricky trade secrets, which can prevent them from getting a close look at the very algorithm that they are auditing because these algorithms are legally protected. Other considerations come into play as well, such as balancing the removal of misinformation with the protection of free speech.

    To meet these challenges, Cen and Shah developed an auditing procedure that does not need more than black-box access to the social media algorithm (which respects trade secrets), does not remove content (which avoids issues of censorship), and does not require access to users (which preserves users’ privacy).

    In their design process, the team also analyzed the properties of their auditing procedure, finding that it ensures a desirable property they call decision robustness. As good news for the platform, they show that a platform can pass the audit without sacrificing profits. Interestingly, they also found the audit naturally incentivizes the platform to show users diverse content, which is known to help reduce the spread of misinformation, counteract echo chambers, and more.

    Who gets good outcomes and who gets bad ones?

    In another line of research, Cen looks at whether people can receive good long-term outcomes when they not only compete for resources, but also don’t know upfront what resources are best for them.

    Some platforms, such as job-search platforms or ride-sharing apps, are part of what is called a matching market, which uses an algorithm to match one set of individuals (such as workers or riders) with another (such as employers or drivers). In many cases, individuals have matching preferences that they learn through trial and error. In labor markets, for example, workers learn their preferences about what kinds of jobs they want, and employers learn their preferences about the qualifications they seek from workers.

    But learning can be disrupted by competition. If workers with a particular background are repeatedly denied jobs in tech because of high competition for tech jobs, for instance, they may never get the knowledge they need to make an informed decision about whether they want to work in tech. Similarly, tech employers may never see and learn what these workers could do if they were hired.

    Cen’s work examines this interaction between learning and competition, studying whether it is possible for individuals on both sides of the matching market to walk away happy.

    Modeling such matching markets, Cen and Shah found that it is indeed possible to get to a stable outcome (workers aren’t incentivized to leave the matching market), with low regret (workers are happy with their long-term outcomes), fairness (happiness is evenly distributed), and high social welfare.

    Interestingly, it’s not obvious that it’s possible to get stability, low regret, fairness, and high social welfare simultaneously.  So another important aspect of the research was uncovering when it is possible to achieve all four criteria at once and exploring the implications of those conditions.

    What is the effect of X on Y?

    For the next few years, though, Cen plans to work on a new project, studying how to quantify the effect of an action X on an outcome Y when it’s expensive — or impossible — to measure this effect, focusing in particular on systems that have complex social behaviors.

    For instance, when Covid-19 cases surged in the pandemic, many cities had to decide what restrictions to adopt, such as mask mandates, business closures, or stay-home orders. They had to act fast and balance public health with community and business needs, public spending, and a host of other considerations.

    Typically, in order to estimate the effect of restrictions on the rate of infection, one might compare the rates of infection in areas that underwent different interventions. If one county has a mask mandate while its neighboring county does not, one might think comparing the counties’ infection rates would reveal the effectiveness of mask mandates. 

    But of course, no county exists in a vacuum. If, for instance, people from both counties gather to watch a football game in the maskless county every week, people from both counties mix. These complex interactions matter, and Sarah plans to study questions of cause and effect in such settings.

    “We’re interested in how decisions or interventions affect an outcome of interest, such as how criminal justice reform affects incarceration rates or how an ad campaign might change the public’s behaviors,” Cen says.

    Cen has also applied the principles of promoting inclusivity to her work in the MIT community.

    As one of three co-presidents of the Graduate Women in MIT EECS student group, she helped organize the inaugural GW6 research summit featuring the research of women graduate students — not only to showcase positive role models to students, but also to highlight the many successful graduate women at MIT who are not to be underestimated.

    Whether in computing or in the community, a system taking steps to address bias is one that enjoys legitimacy and trust, Cen says. “Accountability, legitimacy, trust — these principles play crucial roles in society and, ultimately, will determine which systems endure with time.”  More

  • in

    On the road to cleaner, greener, and faster driving

    No one likes sitting at a red light. But signalized intersections aren’t just a minor nuisance for drivers; vehicles consume fuel and emit greenhouse gases while waiting for the light to change.

    What if motorists could time their trips so they arrive at the intersection when the light is green? While that might be just a lucky break for a human driver, it could be achieved more consistently by an autonomous vehicle that uses artificial intelligence to control its speed.

    In a new study, MIT researchers demonstrate a machine-learning approach that can learn to control a fleet of autonomous vehicles as they approach and travel through a signalized intersection in a way that keeps traffic flowing smoothly.

    Using simulations, they found that their approach reduces fuel consumption and emissions while improving average vehicle speed. The technique gets the best results if all cars on the road are autonomous, but even if only 25 percent use their control algorithm, it still leads to substantial fuel and emissions benefits.

    “This is a really interesting place to intervene. No one’s life is better because they were stuck at an intersection. With a lot of other climate change interventions, there is a quality-of-life difference that is expected, so there is a barrier to entry there. Here, the barrier is much lower,” says senior author Cathy Wu, the Gilbert W. Winslow Career Development Assistant Professor in the Department of Civil and Environmental Engineering and a member of the Institute for Data, Systems, and Society (IDSS) and the Laboratory for Information and Decision Systems (LIDS).

    The lead author of the study is Vindula Jayawardana, a graduate student in LIDS and the Department of Electrical Engineering and Computer Science. The research will be presented at the European Control Conference.

    Intersection intricacies

    While humans may drive past a green light without giving it much thought, intersections can present billions of different scenarios depending on the number of lanes, how the signals operate, the number of vehicles and their speeds, the presence of pedestrians and cyclists, etc.

    Typical approaches for tackling intersection control problems use mathematical models to solve one simple, ideal intersection. That looks good on paper, but likely won’t hold up in the real world, where traffic patterns are often about as messy as they come.

    Wu and Jayawardana shifted gears and approached the problem using a model-free technique known as deep reinforcement learning. Reinforcement learning is a trial-and-error method where the control algorithm learns to make a sequence of decisions. It is rewarded when it finds a good sequence. With deep reinforcement learning, the algorithm leverages assumptions learned by a neural network to find shortcuts to good sequences, even if there are billions of possibilities.

    This is useful for solving a long-horizon problem like this; the control algorithm must issue upwards of 500 acceleration instructions to a vehicle over an extended time period, Wu explains.

    “And we have to get the sequence right before we know that we have done a good job of mitigating emissions and getting to the intersection at a good speed,” she adds.

    But there’s an additional wrinkle. The researchers want the system to learn a strategy that reduces fuel consumption and limits the impact on travel time. These goals can be conflicting.

    “To reduce travel time, we want the car to go fast, but to reduce emissions, we want the car to slow down or not move at all. Those competing rewards can be very confusing to the learning agent,” Wu says.

    While it is challenging to solve this problem in its full generality, the researchers employed a workaround using a technique known as reward shaping. With reward shaping, they give the system some domain knowledge it is unable to learn on its own. In this case, they penalized the system whenever the vehicle came to a complete stop, so it would learn to avoid that action.

    Traffic tests

    Once they developed an effective control algorithm, they evaluated it using a traffic simulation platform with a single intersection. The control algorithm is applied to a fleet of connected autonomous vehicles, which can communicate with upcoming traffic lights to receive signal phase and timing information and observe their immediate surroundings. The control algorithm tells each vehicle how to accelerate and decelerate.

    Their system didn’t create any stop-and-go traffic as vehicles approached the intersection. (Stop-and-go traffic occurs when cars are forced to come to a complete stop due to stopped traffic ahead). In simulations, more cars made it through in a single green phase, which outperformed a model that simulates human drivers. When compared to other optimization methods also designed to avoid stop-and-go traffic, their technique resulted in larger fuel consumption and emissions reductions. If every vehicle on the road is autonomous, their control system can reduce fuel consumption by 18 percent and carbon dioxide emissions by 25 percent, while boosting travel speeds by 20 percent.

    “A single intervention having 20 to 25 percent reduction in fuel or emissions is really incredible. But what I find interesting, and was really hoping to see, is this non-linear scaling. If we only control 25 percent of vehicles, that gives us 50 percent of the benefits in terms of fuel and emissions reduction. That means we don’t have to wait until we get to 100 percent autonomous vehicles to get benefits from this approach,” she says.

    Down the road, the researchers want to study interaction effects between multiple intersections. They also plan to explore how different intersection set-ups (number of lanes, signals, timings, etc.) can influence travel time, emissions, and fuel consumption. In addition, they intend to study how their control system could impact safety when autonomous vehicles and human drivers share the road. For instance, even though autonomous vehicles may drive differently than human drivers, slower roadways and roadways with more consistent speeds could improve safety, Wu says.

    While this work is still in its early stages, Wu sees this approach as one that could be more feasibly implemented in the near-term.

    “The aim in this work is to move the needle in sustainable mobility. We want to dream, as well, but these systems are big monsters of inertia. Identifying points of intervention that are small changes to the system but have significant impact is something that gets me up in the morning,” she says.  

    This work was supported, in part, by the MIT-IBM Watson AI Lab. More

  • in

    Q&A: Cathy Wu on developing algorithms to safely integrate robots into our world

    Cathy Wu is the Gilbert W. Winslow Assistant Professor of Civil and Environmental Engineering and a member of the MIT Institute for Data, Systems, and Society. As an undergraduate, Wu won MIT’s toughest robotics competition, and as a graduate student took the University of California at Berkeley’s first-ever course on deep reinforcement learning. Now back at MIT, she’s working to improve the flow of robots in Amazon warehouses under the Science Hub, a new collaboration between the tech giant and the MIT Schwarzman College of Computing. Outside of the lab and classroom, Wu can be found running, drawing, pouring lattes at home, and watching YouTube videos on math and infrastructure via 3Blue1Brown and Practical Engineering. She recently took a break from all of that to talk about her work.

    Q: What put you on the path to robotics and self-driving cars?

    A: My parents always wanted a doctor in the family. However, I’m bad at following instructions and became the wrong kind of doctor! Inspired by my physics and computer science classes in high school, I decided to study engineering. I wanted to help as many people as a medical doctor could.

    At MIT, I looked for applications in energy, education, and agriculture, but the self-driving car was the first to grab me. It has yet to let go! Ninety-four percent of serious car crashes are caused by human error and could potentially be prevented by self-driving cars. Autonomous vehicles could also ease traffic congestion, save energy, and improve mobility.

    I first learned about self-driving cars from Seth Teller during his guest lecture for the course Mobile Autonomous Systems Lab (MASLAB), in which MIT undergraduates compete to build the best full-functioning robot from scratch. Our ball-fetching bot, Putzputz, won first place. From there, I took more classes in machine learning, computer vision, and transportation, and joined Teller’s lab. I also competed in several mobility-related hackathons, including one sponsored by Hubway, now known as Blue Bike.

    Q: You’ve explored ways to help humans and autonomous vehicles interact more smoothly. What makes this problem so hard?

    A: Both systems are highly complex, and our classical modeling tools are woefully insufficient. Integrating autonomous vehicles into our existing mobility systems is a huge undertaking. For example, we don’t know whether autonomous vehicles will cut energy use by 40 percent, or double it. We need more powerful tools to cut through the uncertainty. My PhD thesis at Berkeley tried to do this. I developed scalable optimization methods in the areas of robot control, state estimation, and system design. These methods could help decision-makers anticipate future scenarios and design better systems to accommodate both humans and robots.

    Q: How is deep reinforcement learning, combining deep and reinforcement learning algorithms, changing robotics?

    A: I took John Schulman and Pieter Abbeel’s reinforcement learning class at Berkeley in 2015 shortly after Deepmind published their breakthrough paper in Nature. They had trained an agent via deep learning and reinforcement learning to play “Space Invaders” and a suite of Atari games at superhuman levels. That created quite some buzz. A year later, I started to incorporate reinforcement learning into problems involving mixed traffic systems, in which only some cars are automated. I realized that classical control techniques couldn’t handle the complex nonlinear control problems I was formulating.

    Deep RL is now mainstream but it’s by no means pervasive in robotics, which still relies heavily on classical model-based control and planning methods. Deep learning continues to be important for processing raw sensor data like camera images and radio waves, and reinforcement learning is gradually being incorporated. I see traffic systems as gigantic multi-robot systems. I’m excited for an upcoming collaboration with Utah’s Department of Transportation to apply reinforcement learning to coordinate cars with traffic signals, reducing congestion and thus carbon emissions.

    Q: You’ve talked about the MIT course, 6.007 (Signals and Systems), and its impact on you. What about it spoke to you?

    A: The mindset. That problems that look messy can be analyzed with common, and sometimes simple, tools. Signals are transformed by systems in various ways, but what do these abstract terms mean, anyway? A mechanical system can take a signal like gears turning at some speed and transform it into a lever turning at another speed. A digital system can take binary digits and turn them into other binary digits or a string of letters or an image. Financial systems can take news and transform it via millions of trading decisions into stock prices. People take in signals every day through advertisements, job offers, gossip, and so on, and translate them into actions that in turn influence society and other people. This humble class on signals and systems linked mechanical, digital, and societal systems and showed me how foundational tools can cut through the noise.

    Q: In your project with Amazon you’re training warehouse robots to pick up, sort, and deliver goods. What are the technical challenges?

    A: This project involves assigning robots to a given task and routing them there. [Professor] Cynthia Barnhart’s team is focused on task assignment, and mine, on path planning. Both problems are considered combinatorial optimization problems because the solution involves a combination of choices. As the number of tasks and robots increases, the number of possible solutions grows exponentially. It’s called the curse of dimensionality. Both problems are what we call NP Hard; there may not be an efficient algorithm to solve them. Our goal is to devise a shortcut.

    Routing a single robot for a single task isn’t difficult. It’s like using Google Maps to find the shortest path home. It can be solved efficiently with several algorithms, including Dijkstra’s. But warehouses resemble small cities with hundreds of robots. When traffic jams occur, customers can’t get their packages as quickly. Our goal is to develop algorithms that find the most efficient paths for all of the robots.

    Q: Are there other applications?

    A: Yes. The algorithms we test in Amazon warehouses might one day help to ease congestion in real cities. Other potential applications include controlling planes on runways, swarms of drones in the air, and even characters in video games. These algorithms could also be used for other robotic planning tasks like scheduling and routing.

    Q: AI is evolving rapidly. Where do you hope to see the big breakthroughs coming?

    A: I’d like to see deep learning and deep RL used to solve societal problems involving mobility, infrastructure, social media, health care, and education. Deep RL now has a toehold in robotics and industrial applications like chip design, but we still need to be careful in applying it to systems with humans in the loop. Ultimately, we want to design systems for people. Currently, we simply don’t have the right tools.

    Q: What worries you most about AI taking on more and more specialized tasks?

    A: AI has the potential for tremendous good, but it could also help to accelerate the widening gap between the haves and the have-nots. Our political and regulatory systems could help to integrate AI into society and minimize job losses and income inequality, but I worry that they’re not equipped yet to handle the firehose of AI.

    Q: What’s the last great book you read?

    A: “How to Avoid a Climate Disaster,” by Bill Gates. I absolutely loved the way that Gates was able to take an overwhelmingly complex topic and distill it down into words that everyone can understand. His optimism inspires me to keep pushing on applications of AI and robotics to help avoid a climate disaster. More

  • in

    Nonsense can make sense to machine-learning models

    For all that neural networks can accomplish, we still don’t really understand how they operate. Sure, we can program them to learn, but making sense of a machine’s decision-making process remains much like a fancy puzzle with a dizzying, complex pattern where plenty of integral pieces have yet to be fitted. 

    If a model was trying to classify an image of said puzzle, for example, it could encounter well-known, but annoying adversarial attacks, or even more run-of-the-mill data or processing issues. But a new, more subtle type of failure recently identified by MIT scientists is another cause for concern: “overinterpretation,” where algorithms make confident predictions based on details that don’t make sense to humans, like random patterns or image borders. 

    This could be particularly worrisome for high-stakes environments, like split-second decisions for self-driving cars, and medical diagnostics for diseases that need more immediate attention. Autonomous vehicles in particular rely heavily on systems that can accurately understand surroundings and then make quick, safe decisions. The network used specific backgrounds, edges, or particular patterns of the sky to classify traffic lights and street signs — irrespective of what else was in the image. 

    The team found that neural networks trained on popular datasets like CIFAR-10 and ImageNet suffered from overinterpretation. Models trained on CIFAR-10, for example, made confident predictions even when 95 percent of input images were missing, and the remainder is senseless to humans. 

    “Overinterpretation is a dataset problem that’s caused by these nonsensical signals in datasets. Not only are these high-confidence images unrecognizable, but they contain less than 10 percent of the original image in unimportant areas, such as borders. We found that these images were meaningless to humans, yet models can still classify them with high confidence,” says Brandon Carter, MIT Computer Science and Artificial Intelligence Laboratory PhD student and lead author on a paper about the research. 

    Deep-image classifiers are widely used. In addition to medical diagnosis and boosting autonomous vehicle technology, there are use cases in security, gaming, and even an app that tells you if something is or isn’t a hot dog, because sometimes we need reassurance. The tech in discussion works by processing individual pixels from tons of pre-labeled images for the network to “learn.” 

    Image classification is hard, because machine-learning models have the ability to latch onto these nonsensical subtle signals. Then, when image classifiers are trained on datasets such as ImageNet, they can make seemingly reliable predictions based on those signals. 

    Although these nonsensical signals can lead to model fragility in the real world, the signals are actually valid in the datasets, meaning overinterpretation can’t be diagnosed using typical evaluation methods based on that accuracy. 

    To find the rationale for the model’s prediction on a particular input, the methods in the present study start with the full image and repeatedly ask, what can I remove from this image? Essentially, it keeps covering up the image, until you’re left with the smallest piece that still makes a confident decision. 

    To that end, it could also be possible to use these methods as a type of validation criteria. For example, if you have an autonomously driving car that uses a trained machine-learning method for recognizing stop signs, you could test that method by identifying the smallest input subset that constitutes a stop sign. If that consists of a tree branch, a particular time of day, or something that’s not a stop sign, you could be concerned that the car might come to a stop at a place it’s not supposed to.

    While it may seem that the model is the likely culprit here, the datasets are more likely to blame. “There’s the question of how we can modify the datasets in a way that would enable models to be trained to more closely mimic how a human would think about classifying images and therefore, hopefully, generalize better in these real-world scenarios, like autonomous driving and medical diagnosis, so that the models don’t have this nonsensical behavior,” says Carter. 

    This may mean creating datasets in more controlled environments. Currently, it’s just pictures that are extracted from public domains that are then classified. But if you want to do object identification, for example, it might be necessary to train models with objects with an uninformative background. 

    This work was supported by Schmidt Futures and the National Institutes of Health. Carter wrote the paper alongside Siddhartha Jain and Jonas Mueller, scientists at Amazon, and MIT Professor David Gifford. They are presenting the work at the 2021 Conference on Neural Information Processing Systems. More

  • in

    Design’s new frontier

    In the 1960s, the advent of computer-aided design (CAD) sparked a revolution in design. For his PhD thesis in 1963, MIT Professor Ivan Sutherland developed Sketchpad, a game-changing software program that enabled users to draw, move, and resize shapes on a computer. Over the course of the next few decades, CAD software reshaped how everything from consumer products to buildings and airplanes were designed.

    “CAD was part of the first wave in computing in design. The ability of researchers and practitioners to represent and model designs using computers was a major breakthrough and still is one of the biggest outcomes of design research, in my opinion,” says Maria Yang, Gail E. Kendall Professor and director of MIT’s Ideation Lab.

    Innovations in 3D printing during the 1980s and 1990s expanded CAD’s capabilities beyond traditional injection molding and casting methods, providing designers even more flexibility. Designers could sketch, ideate, and develop prototypes or models faster and more efficiently. Meanwhile, with the push of a button, software like that developed by Professor Emeritus David Gossard of MIT’s CAD Lab could solve equations simultaneously to produce a new geometry on the fly.

    In recent years, mechanical engineers have expanded the computing tools they use to ideate, design, and prototype. More sophisticated algorithms and the explosion of machine learning and artificial intelligence technologies have sparked a second revolution in design engineering.

    Researchers and faculty at MIT’s Department of Mechanical Engineering are utilizing these technologies to re-imagine how the products, systems, and infrastructures we use are designed. These researchers are at the forefront of the new frontier in design.

    Computational design

    Faez Ahmed wants to reinvent the wheel, or at least the bicycle wheel. He and his team at MIT’s Design Computation & Digital Engineering Lab (DeCoDE) use an artificial intelligence-driven design method that can generate entirely novel and improved designs for a range of products — including the traditional bicycle. They create advanced computational methods to blend human-driven design with simulation-based design.

    “The focus of our DeCoDE lab is computational design. We are looking at how we can create machine learning and AI algorithms to help us discover new designs that are optimized based on specific performance parameters,” says Ahmed, an assistant professor of mechanical engineering at MIT.

    For their work using AI-driven design for bicycles, Ahmed and his collaborator Professor Daniel Frey wanted to make it easier to design customizable bicycles, and by extension, encourage more people to use bicycles over transportation methods that emit greenhouse gases.

    To start, the group gathered a dataset of 4,500 bicycle designs. Using this massive dataset, they tested the limits of what machine learning could do. First, they developed algorithms to group bicycles that looked similar together and explore the design space. They then created machine learning models that could successfully predict what components are key in identifying a bicycle style, such as a road bike versus a mountain bike.

    Once the algorithms were good enough at identifying bicycle designs and parts, the team proposed novel machine learning tools that could use this data to create a unique and creative design for a bicycle based on certain performance parameters and rider dimensions.

    Ahmed used a generative adversarial network — or GAN — as the basis of this model. GAN models utilize neural networks that can create new designs based on vast amounts of data. However, using GAN models alone would result in homogeneous designs that lack novelty and can’t be assessed in terms of performance. To address these issues in design problems, Ahmed has developed a new method which he calls “PaDGAN,” performance augmented diverse GAN.

    “When we apply this type of model, what we see is that we can get large improvements in the diversity, quality, as well as novelty of the designs,” Ahmed explains.

    Using this approach, Ahmed’s team developed an open-source computational design tool for bicycles freely available on their lab website. They hope to further develop a set of generalizable tools that can be used across industries and products.

    Longer term, Ahmed has his sights set on loftier goals. He hopes the computational design tools he develops could lead to “design democratization,” putting more power in the hands of the end user.

    “With these algorithms, you can have more individualization where the algorithm assists a customer in understanding their needs and helps them create a product that satisfies their exact requirements,” he adds.

    Using algorithms to democratize the design process is a goal shared by Stefanie Mueller, an associate professor in electrical engineering and computer science and mechanical engineering.

    Personal fabrication

    Platforms like Instagram give users the freedom to instantly edit their photographs or videos using filters. In one click, users can alter the palette, tone, and brightness of their content by applying filters that range from bold colors to sepia-toned or black-and-white. Mueller, X-Window Consortium Career Development Professor, wants to bring this concept of the Instagram filter to the physical world.

    “We want to explore how digital capabilities can be applied to tangible objects. Our goal is to bring reprogrammable appearance to the physical world,” explains Mueller, director of the HCI Engineering Group based out of MIT’s Computer Science and Artificial Intelligence Laboratory.

    Mueller’s team utilizes a combination of smart materials, optics, and computation to advance personal fabrication technologies that would allow end users to alter the design and appearance of the products they own. They tested this concept in a project they dubbed “Photo-Chromeleon.”

    First, a mix of photochromic cyan, magenta, and yellow dies are airbrushed onto an object — in this instance, a 3D sculpture of a chameleon. Using software they developed, the team sketches the exact color pattern they want to achieve on the object itself. An ultraviolet light shines on the object to activate the dyes.

    To actually create the physical pattern on the object, Mueller has developed an optimization algorithm to use alongside a normal office projector outfitted with red, green, and blue LED lights. These lights shine on specific pixels on the object for a given period of time to physically change the makeup of the photochromic pigments.

    “This fancy algorithm tells us exactly how long we have to shine the red, green, and blue light on every single pixel of an object to get the exact pattern we’ve programmed in our software,” says Mueller.

    Giving this freedom to the end user enables limitless possibilities. Mueller’s team has applied this technology to iPhone cases, shoes, and even cars. In the case of shoes, Mueller envisions a shoebox embedded with UV and LED light projectors. Users could put their shoes in the box overnight and the next day have a pair of shoes in a completely new pattern.

    Mueller wants to expand her personal fabrication methods to the clothes we wear. Rather than utilize the light projection technique developed in the PhotoChromeleon project, her team is exploring the possibility of weaving LEDs directly into clothing fibers, allowing people to change their shirt’s appearance as they wear it. These personal fabrication technologies could completely alter consumer habits.

    “It’s very interesting for me to think about how these computational techniques will change product design on a high level,” adds Mueller. “In the future, a consumer could buy a blank iPhone case and update the design on a weekly or daily basis.”

    Computational fluid dynamics and participatory design

    Another team of mechanical engineers, including Sili Deng, the Brit (1961) & Alex (1949) d’Arbeloff Career Development Professor, are developing a different kind of design tool that could have a large impact on individuals in low- and middle-income countries across the world.

    As Deng walked down the hallway of Building 1 on MIT’s campus, a monitor playing a video caught her eye. The video featured work done by mechanical engineers and MIT D-Lab on developing cleaner burning briquettes for cookstoves in Uganda. Deng immediately knew she wanted to get involved.

    “As a combustion scientist, I’ve always wanted to work on such a tangible real-world problem, but the field of combustion tends to focus more heavily on the academic side of things,” explains Deng.

    After reaching out to colleagues in MIT D-Lab, Deng joined a collaborative effort to develop a new cookstove design tool for the 3 billion people across the world who burn solid fuels to cook and heat their homes. These stoves often emit soot and carbon monoxide, leading not only to millions of deaths each year, but also worsening the world’s greenhouse gas emission problem.

    The team is taking a three-pronged approach to developing this solution, using a combination of participatory design, physical modeling, and experimental validation to create a tool that will lead to the production of high-performing, low-cost energy products.

    Deng and her team in the Deng Energy and Nanotechnology Group use physics-based modeling for the combustion and emission process in cookstoves.

    “My team is focused on computational fluid dynamics. We use computational and numerical studies to understand the flow field where the fuel is burned and releases heat,” says Deng.

    These flow mechanics are crucial to understanding how to minimize heat loss and make cookstoves more efficient, as well as learning how dangerous pollutants are formed and released in the process.

    Using computational methods, Deng’s team performs three-dimensional simulations of the complex chemistry and transport coupling at play in the combustion and emission processes. They then use these simulations to build a combustion model for how fuel is burned and a pollution model that predicts carbon monoxide emissions.

    Deng’s models are used by a group led by Daniel Sweeney in MIT D-Lab to test the experimental validation in prototypes of stoves. Finally, Professor Maria Yang uses participatory design methods to integrate user feedback, ensuring the design tool can actually be used by people across the world.

    The end goal for this collaborative team is to not only provide local manufacturers with a prototype they could produce themselves, but to also provide them with a tool that can tweak the design based on local needs and available materials.

    Deng sees wide-ranging applications for the computational fluid dynamics her team is developing.

    “We see an opportunity to use physics-based modeling, augmented with a machine learning approach, to come up with chemical models for practical fuels that help us better understand combustion. Therefore, we can design new methods to minimize carbon emissions,” she adds.

    While Deng is utilizing simulations and machine learning at the molecular level to improve designs, others are taking a more macro approach.

    Designing intelligent systems

    When it comes to intelligent design, Navid Azizan thinks big. He hopes to help create future intelligent systems that are capable of making decisions autonomously by using the enormous amounts of data emerging from the physical world. From smart robots and autonomous vehicles to smart power grids and smart cities, Azizan focuses on the analysis, design, and control of intelligent systems.

    Achieving such massive feats takes a truly interdisciplinary approach that draws upon various fields such as machine learning, dynamical systems, control, optimization, statistics, and network science, among others.

    “Developing intelligent systems is a multifaceted problem, and it really requires a confluence of disciplines,” says Azizan, assistant professor of mechanical engineering with a dual appointment in MIT’s Institute for Data, Systems, and Society (IDSS). “To create such systems, we need to go beyond standard approaches to machine learning, such as those commonly used in computer vision, and devise algorithms that can enable safe, efficient, real-time decision-making for physical systems.”

    For robot control to work in the complex dynamic environments that arise in the real world, real-time adaptation is key. If, for example, an autonomous vehicle is going to drive in icy conditions or a drone is operating in windy conditions, they need to be able to adapt to their new environment quickly.

    To address this challenge, Azizan and his collaborators at MIT and Stanford University have developed a new algorithm that combines adaptive control, a powerful methodology from control theory, with meta learning, a new machine learning paradigm.

    “This ‘control-oriented’ learning approach outperforms the existing ‘regression-oriented’ methods, which are mostly focused on just fitting the data, by a wide margin,” says Azizan.

    Another critical aspect of deploying machine learning algorithms in physical systems that Azizan and his team hope to address is safety. Deep neural networks are a crucial part of autonomous systems. They are used for interpreting complex visual inputs and making data-driven predictions of future behavior in real time. However, Azizan urges caution.

    “These deep neural networks are only as good as their training data, and their predictions can often be untrustworthy in scenarios not covered by their training data,” he says. Making decisions based on such untrustworthy predictions could lead to fatal accidents in autonomous vehicles or other safety-critical systems.

    To avoid these potentially catastrophic events, Azizan proposes that it is imperative to equip neural networks with a measure of their uncertainty. When the uncertainty is high, they can then be switched to a “safe policy.”

    In pursuit of this goal, Azizan and his collaborators have developed a new algorithm known as SCOD — Sketching Curvature of Out-of-Distribution Detection. This framework could be embedded within any deep neural network to equip them with a measure of their uncertainty.

    “This algorithm is model-agnostic and can be applied to neural networks used in various kinds of autonomous systems, whether it’s drones, vehicles, or robots,” says Azizan.

    Azizan hopes to continue working on algorithms for even larger-scale systems. He and his team are designing efficient algorithms to better control supply and demand in smart energy grids. According to Azizan, even if we create the most efficient solar panels and batteries, we can never achieve a sustainable grid powered by renewable resources without the right control mechanisms.

    Mechanical engineers like Ahmed, Mueller, Deng, and Azizan serve as the key to realizing the next revolution of computing in design.

    “MechE is in a unique position at the intersection of the computational and physical worlds,” Azizan says. “Mechanical engineers build a bridge between theoretical, algorithmic tools and real, physical world applications.”

    Sophisticated computational tools, coupled with the ground truth mechanical engineers have in the physical world, could unlock limitless possibilities for design engineering, well beyond what could have been imagined in those early days of CAD. More

  • in

    One autonomous taxi, please

    If you don’t get seasick, an autonomous boat might be the right mode of transportation for you. 

    Scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Senseable City Laboratory, together with Amsterdam Institute for Advanced Metropolitan Solutions (AMS Institute) in the Netherlands, have now created the final project in their self-navigating trilogy: a full-scale, fully autonomous robotic boat that’s ready to be deployed along the canals of Amsterdam. 

    “Roboat” has come a long way since the team first started prototyping small vessels in the MIT pool in late 2015. Last year, the team released their half-scale, medium model that was 2 meters long and demonstrated promising navigational prowess. 

    This year, two full-scale Roboats were launched, proving more than just proof-of-concept: these craft can comfortably carry up to five people, collect waste, deliver goods, and provide on-demand infrastructure. 

    The boat looks futuristic — it’s a sleek combination of black and gray with two seats that face each other, with orange block letters on the sides that illustrate the makers’ namesakes. It’s a fully electrical boat with a battery that’s the size of a small chest, enabling up to 10 hours of operation and wireless charging capabilities. 

    Play video

    Autonomous Roboats set sea in the Amsterdam canals and can comfortably carry up to five people, collect waste, deliver goods, and provide on-demand infrastructure.

    “We now have higher precision and robustness in the perception, navigation, and control systems, including new functions, such as close-proximity approach mode for latching capabilities, and improved dynamic positioning, so the boat can navigate real-world waters,” says Daniela Rus, MIT professor of electrical engineering and computer science and director of CSAIL. “Roboat’s control system is adaptive to the number of people in the boat.” 

    To swiftly navigate the bustling waters of Amsterdam, Roboat needs a meticulous fusion of proper navigation, perception, and control software. 

    Using GPS, the boat autonomously decides on a safe route from A to B, while continuously scanning the environment to  avoid collisions with objects, such as bridges, pillars, and other boats.

    To autonomously determine a free path and avoid crashing into objects, Roboat uses lidar and a number of cameras to enable a 360-degree view. This bundle of sensors is referred to as the “perception kit” and lets Roboat understand its surroundings. When the perception picks up an unseen object, like a canoe, for example, the algorithm flags the item as “unknown.” When the team later looks at the collected data from the day, the object is manually selected and can be tagged as “canoe.” 

    The control algorithms — similar to ones used for self-driving cars — function a little like a coxswain giving orders to rowers, by translating a given path into instructions toward the “thrusters,” which are the propellers that help the boat move.  

    If you think the boat feels slightly futuristic, its latching mechanism is one of its most impressive feats: small cameras on the boat guide it to the docking station, or other boats, when they detect specific QR codes. “The system allows Roboat to connect to other boats, and to the docking station, to form temporary bridges to alleviate traffic, as well as floating stages and squares, which wasn’t possible with the last iteration,” says Carlo Ratti, professor of the practice in the MIT Department of Urban Studies and Planning (DUSP) and director of the Senseable City Lab. 

    Roboat, by design, is also versatile. The team created a universal “hull” design — that’s the part of the boat that rides both in and on top of the water. While regular boats have unique hulls, designed for specific purposes, Roboat has a universal hull design where the base is the same, but the top decks can be switched out depending on the use case.

    “As Roboat can perform its tasks 24/7, and without a skipper on board, it adds great value for a city. However, for safety reasons it is questionable if reaching level A autonomy is desirable,” says Fabio Duarte, a principal research scientist in DUSP and lead scientist on the project. “Just like a bridge keeper, an onshore operator will monitor Roboat remotely from a control center. One operator can monitor over 50 Roboat units, ensuring smooth operations.”

    The next step for Roboat is to pilot the technology in the public domain. “The historic center of Amsterdam is the perfect place to start, with its capillary network of canals suffering from contemporary challenges, such as mobility and logistics,” says Stephan van Dijk, director of innovation at AMS Institute. 

    Previous iterations of Roboat have been presented at the IEEE International Conference on Robotics and Automation. The boats will be unveiled on Oct. 28 in the waters of Amsterdam. 

    Ratti, Rus, Duarte, and Dijk worked on the project alongside Andrew Whittle, MIT’s Edmund K Turner Professor in civil and environmental engineering; Dennis Frenchman, professor at MIT’s Department of Urban Studies and Planning; and Ynse Deinema of AMS Institute. The full team can be found at Roboat’s website. The project is a joint collaboration with AMS Institute. The City of Amsterdam is a project partner. More

  • in

    Deep learning helps predict traffic crashes before they happen

    Today’s world is one big maze, connected by layers of concrete and asphalt that afford us the luxury of navigation by vehicle. For many of our road-related advancements — GPS lets us fire fewer neurons thanks to map apps, cameras alert us to potentially costly scrapes and scratches, and electric autonomous cars have lower fuel costs — our safety measures haven’t quite caught up. We still rely on a steady diet of traffic signals, trust, and the steel surrounding us to safely get from point A to point B. 

    To get ahead of the uncertainty inherent to crashes, scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Qatar Center for Artificial Intelligence developed a deep learning model that predicts very high-resolution crash risk maps. Fed on a combination of historical crash data, road maps, satellite imagery, and GPS traces, the risk maps describe the expected number of crashes over a period of time in the future, to identify high-risk areas and predict future crashes. 

    Typically, these types of risk maps are captured at much lower resolutions that hover around hundreds of meters, which means glossing over crucial details since the roads become blurred together. These maps, though, are 5×5 meter grid cells, and the higher resolution brings newfound clarity: The scientists found that a highway road, for example, has a higher risk than nearby residential roads, and ramps merging and exiting the highway have an even higher risk than other roads. 

    “By capturing the underlying risk distribution that determines the probability of future crashes at all places, and without any historical data, we can find safer routes, enable auto insurance companies to provide customized insurance plans based on driving trajectories of customers, help city planners design safer roads, and even predict future crashes,” says MIT CSAIL PhD student Songtao He, a lead author on a new paper about the research. 

    Even though car crashes are sparse, they cost about 3 percent of the world’s GDP and are the leading cause of death in children and young adults. This sparsity makes inferring maps at such a high resolution a tricky task. Crashes at this level are thinly scattered — the average annual odds of a crash in a 5×5 grid cell is about one-in-1,000 — and they rarely happen at the same location twice. Previous attempts to predict crash risk have been largely “historical,” as an area would only be considered high-risk if there was a previous nearby crash. 

    The team’s approach casts a wider net to capture critical data. It identifies high-risk locations using GPS trajectory patterns, which give information about density, speed, and direction of traffic, and satellite imagery that describes road structures, such as the number of lanes, whether there’s a shoulder, or if there’s a large number of pedestrians. Then, even if a high-risk area has no recorded crashes, it can still be identified as high-risk, based on its traffic patterns and topology alone. 

    To evaluate the model, the scientists used crashes and data from 2017 and 2018, and tested its performance at predicting crashes in 2019 and 2020. Many locations were identified as high-risk, even though they had no recorded crashes, and also experienced crashes during the follow-up years.

    “Our model can generalize from one city to another by combining multiple clues from seemingly unrelated data sources. This is a step toward general AI, because our model can predict crash maps in uncharted territories,” says Amin Sadeghi, a lead scientist at Qatar Computing Research Institute (QCRI) and an author on the paper. “The model can be used to infer a useful crash map even in the absence of historical crash data, which could translate to positive use for city planning and policymaking by comparing imaginary scenarios.” 

    The dataset covered 7,500 square kilometers from Los Angeles, New York City, Chicago and Boston. Among the four cities, L.A. was the most unsafe, since it had the highest crash density, followed by New York City, Chicago, and Boston. 

    “If people can use the risk map to identify potentially high-risk road segments, they can take action in advance to reduce the risk of trips they take. Apps like Waze and Apple Maps have incident feature tools, but we’re trying to get ahead of the crashes — before they happen,” says He. 

    He and Sadeghi wrote the paper alongside Sanjay Chawla, research director at QCRI, and MIT professors of electrical engineering and computer science Mohammad Alizadeh, ​​Hari Balakrishnan, and Sam Madden. They will present the paper at the 2021 International Conference on Computer Vision. More