More stories

  • in

    Deep-learning technique predicts clinical treatment outcomes

    When it comes to treatment strategies for critically ill patients, clinicians want to be able to consider all their options and timing of administration, and make the optimal decision for their patients. While clinician experience and study has helped them to be successful in this effort, not all patients are the same, and treatment decisions at this crucial time could mean the difference between patient improvement and quick deterioration. Therefore, it would be helpful for doctors to be able to take a patient’s previous known health status and received treatments and use that to predict that patient’s health outcome under different treatment scenarios, in order to pick the best path.

    Now, a deep-learning technique, called G-Net, from researchers at MIT and IBM provides a window into causal counterfactual prediction, affording physicians the opportunity to explore how a patient might fare under different treatment plans. The foundation of G-Net is the g-computation algorithm, a causal inference method that estimates the effect of dynamic exposures in the presence of measured confounding variables — ones that may influence both treatments and outcomes. Unlike previous implementations of the g-computation framework, which have used linear modeling approaches, G-Net uses recurrent neural networks (RNN), which have node connections that allow them to better model temporal sequences with complex and nonlinear dynamics, like those found in the physiological and clinical time series data. In this way, physicians can develop alternative plans based on patient history and test them before making a decision.

    “Our ultimate goal is to develop a machine learning technique that would allow doctors to explore various ‘What if’ scenarios and treatment options,” says Li-wei Lehman, MIT research scientist in the MIT Institute for Medical Engineering and Science and an MIT-IBM Watson AI Lab project lead. “A lot of work has been done in terms of deep learning for counterfactual prediction but [it’s] been focusing on a point exposure setting,” or a static, time-varying treatment strategy, which doesn’t allow for adjustment of treatments as patient history changes. However, her team’s new prediction approach provides for treatment plan flexibility and chances for treatment alteration over time as patient covariate history and past treatments change. “G-Net is the first deep-learning approach based on g-computation that can predict both the population-level and individual-level treatment effects under dynamic and time varying treatment strategies.”

    The research, which was recently published in the Proceedings of Machine Learning Research, was co-authored by Rui Li MEng ’20, Stephanie Hu MEng ’21, former MIT postdoc Mingyu Lu MD, graduate student Yuria Utsumi, IBM research staff member Prithwish Chakraborty, IBM Research director of Hybrid Cloud Services Daby Sow, IBM data scientist Piyush Madan, IBM research scientist Mohamed Ghalwash, and IBM research scientist Zach Shahn.

    Tracking disease progression

    To build, validate, and test G-Net’s predictive abilities, the researchers considered the circulatory system in septic patients in the ICU. During critical care, doctors need to make trade-offs and judgement calls, such as ensuring the organs are receiving adequate blood supply without overworking the heart. For this, they could give intravenous fluids to patients to increase blood pressure; however, too much can cause edema. Alternatively, physicians can administer vasopressors, which act to contract blood vessels and raise blood pressure.

    In order to mimic this and demonstrate G-Net’s proof-of-concept, the team used CVSim, a mechanistic model of a human cardiovascular system that’s governed by 28 input variables characterizing the system’s current state, such as arterial pressure, central venous pressure, total blood volume, and total peripheral resistance, and modified it to simulate various disease processes (e.g., sepsis or blood loss) and effects of interventions (e.g., fluids and vasopressors). The researchers used CVSim to generate observational patient data for training and for “ground truth” comparison against counterfactual prediction. In their G-Net architecture, the researchers ran two RNNs to handle and predict variables that are continuous, meaning they can take on a range of values, like blood pressure, and categorical variables, which have discrete values, like the presence or absence of pulmonary edema. The researchers simulated the health trajectories of thousands of “patients” exhibiting symptoms under one treatment regime, let’s say A, for 66 timesteps, and used them to train and validate their model.

    Testing G-Net’s prediction capability, the team generated two counterfactual datasets. Each contained roughly 1,000 known patient health trajectories, which were created from CVSim using the same “patient” condition as the starting point under treatment A. Then at timestep 33, treatment changed to plan B or C, depending on the dataset. The team then performed 100 prediction trajectories for each of these 1,000 patients, whose treatment and medical history was known up until timestep 33 when a new treatment was administered. In these cases, the prediction agreed well with the “ground-truth” observations for individual patients and averaged population-level trajectories.

    A cut above the rest

    Since the g-computation framework is flexible, the researchers wanted to examine G-Net’s prediction using different nonlinear models — in this case, long short-term memory (LSTM) models, which are a type of RNN that can learn from previous data patterns or sequences — against the more classical linear models and a multilayer perception model (MLP), a type of neural network that can make predictions using a nonlinear approach. Following a similar setup as before, the team found that the error between the known and predicted cases was smallest in the LSTM models compared to the others. Since G-Net is able to model the temporal patterns of the patient’s ICU history and past treatment, whereas a linear model and MLP cannot, it was better able to predict the patient’s outcome.

    The team also compared G-Net’s prediction in a static, time-varying treatment setting against two state-of-the-art deep-learning based counterfactual prediction approaches, a recurrent marginal structural network (rMSN) and a counterfactual recurrent neural network (CRN), as well as a linear model and an MLP. For this, they investigated a model for tumor growth under no treatment, radiation, chemotherapy, and both radiation and chemotherapy scenarios. “Imagine a scenario where there’s a patient with cancer, and an example of a static regime would be if you only give a fixed dosage of chemotherapy, radiation, or any kind of drug, and wait until the end of your trajectory,” comments Lu. For these investigations, the researchers generated simulated observational data using tumor volume as the primary influence dictating treatment plans and demonstrated that G-Net outperformed the other models. One potential reason could be because g-computation is known to be more statistically efficient than rMSN and CRN, when models are correctly specified.

    While G-Net has done well with simulated data, more needs to be done before it can be applied to real patients. Since neural networks can be thought of as “black boxes” for prediction results, the researchers are beginning to investigate the uncertainty in the model to help ensure safety. In contrast to these approaches that recommend an “optimal” treatment plan without any clinician involvement, “as a decision support tool, I believe that G-Net would be more interpretable, since the clinicians would input treatment strategies themselves,” says Lehman, and “G-Net will allow them to be able to explore different hypotheses.” Further, the team has moved on to using real data from ICU patients with sepsis, bringing it one step closer to implementation in hospitals.

    “I think it is pretty important and exciting for real-world applications,” says Hu. “It’d be helpful to have some way to predict whether or not a treatment might work or what the effects might be — a quicker iteration process for developing these hypotheses for what to try, before actually trying to implement them in in a years-long, potentially very involved and very invasive type of clinical trial.”

    This research was funded by the MIT-IBM Watson AI Lab. More

  • in

    Can machine-learning models overcome biased datasets?

    Artificial intelligence systems may be able to complete tasks quickly, but that doesn’t mean they always do so fairly. If the datasets used to train machine-learning models contain biased data, it is likely the system could exhibit that same bias when it makes decisions in practice.

    For instance, if a dataset contains mostly images of white men, then a facial-recognition model trained with these data may be less accurate for women or people with different skin tones.

    A group of researchers at MIT, in collaboration with researchers at Harvard University and Fujitsu Ltd., sought to understand when and how a machine-learning model is capable of overcoming this kind of dataset bias. They used an approach from neuroscience to study how training data affects whether an artificial neural network can learn to recognize objects it has not seen before. A neural network is a machine-learning model that mimics the human brain in the way it contains layers of interconnected nodes, or “neurons,” that process data.

    The new results show that diversity in training data has a major influence on whether a neural network is able to overcome bias, but at the same time dataset diversity can degrade the network’s performance. They also show that how a neural network is trained, and the specific types of neurons that emerge during the training process, can play a major role in whether it is able to overcome a biased dataset.

    “A neural network can overcome dataset bias, which is encouraging. But the main takeaway here is that we need to take into account data diversity. We need to stop thinking that if you just collect a ton of raw data, that is going to get you somewhere. We need to be very careful about how we design datasets in the first place,” says Xavier Boix, a research scientist in the Department of Brain and Cognitive Sciences (BCS) and the Center for Brains, Minds, and Machines (CBMM), and senior author of the paper.  

    Co-authors include former MIT graduate students Timothy Henry, Jamell Dozier, Helen Ho, Nishchal Bhandari, and Spandan Madan, a corresponding author who is currently pursuing a PhD at Harvard; Tomotake Sasaki, a former visiting scientist now a senior researcher at Fujitsu Research; Frédo Durand, a professor of electrical engineering and computer science at MIT and a member of the Computer Science and Artificial Intelligence Laboratory; and Hanspeter Pfister, the An Wang Professor of Computer Science at the Harvard School of Enginering and Applied Sciences. The research appears today in Nature Machine Intelligence.

    Thinking like a neuroscientist

    Boix and his colleagues approached the problem of dataset bias by thinking like neuroscientists. In neuroscience, Boix explains, it is common to use controlled datasets in experiments, meaning a dataset in which the researchers know as much as possible about the information it contains.

    The team built datasets that contained images of different objects in varied poses, and carefully controlled the combinations so some datasets had more diversity than others. In this case, a dataset had less diversity if it contains more images that show objects from only one viewpoint. A more diverse dataset had more images showing objects from multiple viewpoints. Each dataset contained the same number of images.

    The researchers used these carefully constructed datasets to train a neural network for image classification, and then studied how well it was able to identify objects from viewpoints the network did not see during training (known as an out-of-distribution combination). 

    For example, if researchers are training a model to classify cars in images, they want the model to learn what different cars look like. But if every Ford Thunderbird in the training dataset is shown from the front, when the trained model is given an image of a Ford Thunderbird shot from the side, it may misclassify it, even if it was trained on millions of car photos.

    The researchers found that if the dataset is more diverse — if more images show objects from different viewpoints — the network is better able to generalize to new images or viewpoints. Data diversity is key to overcoming bias, Boix says.

    “But it is not like more data diversity is always better; there is a tension here. When the neural network gets better at recognizing new things it hasn’t seen, then it will become harder for it to recognize things it has already seen,” he says.

    Testing training methods

    The researchers also studied methods for training the neural network.

    In machine learning, it is common to train a network to perform multiple tasks at the same time. The idea is that if a relationship exists between the tasks, the network will learn to perform each one better if it learns them together.

    But the researchers found the opposite to be true — a model trained separately for each task was able to overcome bias far better than a model trained for both tasks together.

    “The results were really striking. In fact, the first time we did this experiment, we thought it was a bug. It took us several weeks to realize it was a real result because it was so unexpected,” he says.

    They dove deeper inside the neural networks to understand why this occurs.

    They found that neuron specialization seems to play a major role. When the neural network is trained to recognize objects in images, it appears that two types of neurons emerge — one that specializes in recognizing the object category and another that specializes in recognizing the viewpoint.

    When the network is trained to perform tasks separately, those specialized neurons are more prominent, Boix explains. But if a network is trained to do both tasks simultaneously, some neurons become diluted and don’t specialize for one task. These unspecialized neurons are more likely to get confused, he says.

    “But the next question now is, how did these neurons get there? You train the neural network and they emerge from the learning process. No one told the network to include these types of neurons in its architecture. That is the fascinating thing,” he says.

    That is one area the researchers hope to explore with future work. They want to see if they can force a neural network to develop neurons with this specialization. They also want to apply their approach to more complex tasks, such as objects with complicated textures or varied illuminations.

    Boix is encouraged that a neural network can learn to overcome bias, and he is hopeful their work can inspire others to be more thoughtful about the datasets they are using in AI applications.

    This work was supported, in part, by the National Science Foundation, a Google Faculty Research Award, the Toyota Research Institute, the Center for Brains, Minds, and Machines, Fujitsu Research, and the MIT-Sensetime Alliance on Artificial Intelligence. More

  • in

    The downside of machine learning in health care

    While working toward her dissertation in computer science at MIT, Marzyeh Ghassemi wrote several papers on how machine-learning techniques from artificial intelligence could be applied to clinical data in order to predict patient outcomes. “It wasn’t until the end of my PhD work that one of my committee members asked: ‘Did you ever check to see how well your model worked across different groups of people?’”

    That question was eye-opening for Ghassemi, who had previously assessed the performance of models in aggregate, across all patients. Upon a closer look, she saw that models often worked differently — specifically worse — for populations including Black women, a revelation that took her by surprise. “I hadn’t made the connection beforehand that health disparities would translate directly to model disparities,” she says. “And given that I am a visible minority woman-identifying computer scientist at MIT, I am reasonably certain that many others weren’t aware of this either.”

    In a paper published Jan. 14 in the journal Patterns, Ghassemi — who earned her doctorate in 2017 and is now an assistant professor in the Department of Electrical Engineering and Computer Science and the MIT Institute for Medical Engineering and Science (IMES) — and her coauthor, Elaine Okanyene Nsoesie of Boston University, offer a cautionary note about the prospects for AI in medicine. “If used carefully, this technology could improve performance in health care and potentially reduce inequities,” Ghassemi says. “But if we’re not actually careful, technology could worsen care.”

    It all comes down to data, given that the AI tools in question train themselves by processing and analyzing vast quantities of data. But the data they are given are produced by humans, who are fallible and whose judgments may be clouded by the fact that they interact differently with patients depending on their age, gender, and race, without even knowing it.

    Furthermore, there is still great uncertainty about medical conditions themselves. “Doctors trained at the same medical school for 10 years can, and often do, disagree about a patient’s diagnosis,” Ghassemi says. That’s different from the applications where existing machine-learning algorithms excel — like object-recognition tasks — because practically everyone in the world will agree that a dog is, in fact, a dog.

    Machine-learning algorithms have also fared well in mastering games like chess and Go, where both the rules and the “win conditions” are clearly defined. Physicians, however, don’t always concur on the rules for treating patients, and even the win condition of being “healthy” is not widely agreed upon. “Doctors know what it means to be sick,” Ghassemi explains, “and we have the most data for people when they are sickest. But we don’t get much data from people when they are healthy because they’re less likely to see doctors then.”

    Even mechanical devices can contribute to flawed data and disparities in treatment. Pulse oximeters, for example, which have been calibrated predominately on light-skinned individuals, do not accurately measure blood oxygen levels for people with darker skin. And these deficiencies are most acute when oxygen levels are low — precisely when accurate readings are most urgent. Similarly, women face increased risks during “metal-on-metal” hip replacements, Ghassemi and Nsoesie write, “due in part to anatomic differences that aren’t taken into account in implant design.” Facts like these could be buried within the data fed to computer models whose output will be undermined as a result.

    Coming from computers, the product of machine-learning algorithms offers “the sheen of objectivity,” according to Ghassemi. But that can be deceptive and dangerous, because it’s harder to ferret out the faulty data supplied en masse to a computer than it is to discount the recommendations of a single possibly inept (and maybe even racist) doctor. “The problem is not machine learning itself,” she insists. “It’s people. Human caregivers generate bad data sometimes because they are not perfect.”

    Nevertheless, she still believes that machine learning can offer benefits in health care in terms of more efficient and fairer recommendations and practices. One key to realizing the promise of machine learning in health care is to improve the quality of data, which is no easy task. “Imagine if we could take data from doctors that have the best performance and share that with other doctors that have less training and experience,” Ghassemi says. “We really need to collect this data and audit it.”

    The challenge here is that the collection of data is not incentivized or rewarded, she notes. “It’s not easy to get a grant for that, or ask students to spend time on it. And data providers might say, ‘Why should I give my data out for free when I can sell it to a company for millions?’ But researchers should be able to access data without having to deal with questions like: ‘What paper will I get my name on in exchange for giving you access to data that sits at my institution?’

    “The only way to get better health care is to get better data,” Ghassemi says, “and the only way to get better data is to incentivize its release.”

    It’s not only a question of collecting data. There’s also the matter of who will collect it and vet it. Ghassemi recommends assembling diverse groups of researchers — clinicians, statisticians, medical ethicists, and computer scientists — to first gather diverse patient data and then “focus on developing fair and equitable improvements in health care that can be deployed in not just one advanced medical setting, but in a wide range of medical settings.”

    The objective of the Patterns paper is not to discourage technologists from bringing their expertise in machine learning to the medical world, she says. “They just need to be cognizant of the gaps that appear in treatment and other complexities that ought to be considered before giving their stamp of approval to a particular computer model.” More

  • in

    When should someone trust an AI assistant’s predictions?

    In a busy hospital, a radiologist is using an artificial intelligence system to help her diagnose medical conditions based on patients’ X-ray images. Using the AI system can help her make faster diagnoses, but how does she know when to trust the AI’s predictions?

    She doesn’t. Instead, she may rely on her expertise, a confidence level provided by the system itself, or an explanation of how the algorithm made its prediction — which may look convincing but still be wrong — to make an estimation.

    To help people better understand when to trust an AI “teammate,” MIT researchers created an onboarding technique that guides humans to develop a more accurate understanding of those situations in which a machine makes correct predictions and those in which it makes incorrect predictions.

    By showing people how the AI complements their abilities, the training technique could help humans make better decisions or come to conclusions faster when working with AI agents.

    “We propose a teaching phase where we gradually introduce the human to this AI model so they can, for themselves, see its weaknesses and strengths,” says Hussein Mozannar, a graduate student in the Social and Engineering Systems doctoral program within the Institute for Data, Systems, and Society (IDSS) who is also a researcher with the Clinical Machine Learning Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Institute for Medical Engineering and Science. “We do this by mimicking the way the human will interact with the AI in practice, but we intervene to give them feedback to help them understand each interaction they are making with the AI.”

    Mozannar wrote the paper with Arvind Satyanarayan, an assistant professor of computer science who leads the Visualization Group in CSAIL; and senior author David Sontag, an associate professor of electrical engineering and computer science at MIT and leader of the Clinical Machine Learning Group. The research will be presented at the Association for the Advancement of Artificial Intelligence in February.

    Mental models

    This work focuses on the mental models humans build about others. If the radiologist is not sure about a case, she may ask a colleague who is an expert in a certain area. From past experience and her knowledge of this colleague, she has a mental model of his strengths and weaknesses that she uses to assess his advice.

    Humans build the same kinds of mental models when they interact with AI agents, so it is important those models are accurate, Mozannar says. Cognitive science suggests that humans make decisions for complex tasks by remembering past interactions and experiences. So, the researchers designed an onboarding process that provides representative examples of the human and AI working together, which serve as reference points the human can draw on in the future. They began by creating an algorithm that can identify examples that will best teach the human about the AI.

    “We first learn a human expert’s biases and strengths, using observations of their past decisions unguided by AI,” Mozannar says. “We combine our knowledge about the human with what we know about the AI to see where it will be helpful for the human to rely on the AI. Then we obtain cases where we know the human should rely on the AI and similar cases where the human should not rely on the AI.”

    The researchers tested their onboarding technique on a passage-based question answering task: The user receives a written passage and a question whose answer is contained in the passage. The user then has to answer the question and can click a button to “let the AI answer.” The user can’t see the AI answer in advance, however, requiring them to rely on their mental model of the AI. The onboarding process they developed begins by showing these examples to the user, who tries to make a prediction with the help of the AI system. The human may be right or wrong, and the AI may be right or wrong, but in either case, after solving the example, the user sees the correct answer and an explanation for why the AI chose its prediction. To help the user generalize from the example, two contrasting examples are shown that explain why the AI got it right or wrong.

    For instance, perhaps the training question asks which of two plants is native to more continents, based on a convoluted paragraph from a botany textbook. The human can answer on her own or let the AI system answer. Then, she sees two follow-up examples that help her get a better sense of the AI’s abilities. Perhaps the AI is wrong on a follow-up question about fruits but right on a question about geology. In each example, the words the system used to make its prediction are highlighted. Seeing the highlighted words helps the human understand the limits of the AI agent, explains Mozannar.

    To help the user retain what they have learned, the user then writes down the rule she infers from this teaching example, such as “This AI is not good at predicting flowers.” She can then refer to these rules later when working with the agent in practice. These rules also constitute a formalization of the user’s mental model of the AI.

    The impact of teaching

    The researchers tested this teaching technique with three groups of participants. One group went through the entire onboarding technique, another group did not receive the follow-up comparison examples, and the baseline group didn’t receive any teaching but could see the AI’s answer in advance.

    “The participants who received teaching did just as well as the participants who didn’t receive teaching but could see the AI’s answer. So, the conclusion there is they are able to simulate the AI’s answer as well as if they had seen it,” Mozannar says.

    The researchers dug deeper into the data to see the rules individual participants wrote. They found that almost 50 percent of the people who received training wrote accurate lessons of the AI’s abilities. Those who had accurate lessons were right on 63 percent of the examples, whereas those who didn’t have accurate lessons were right on 54 percent. And those who didn’t receive teaching but could see the AI answers were right on 57 percent of the questions.

    “When teaching is successful, it has a significant impact. That is the takeaway here. When we are able to teach participants effectively, they are able to do better than if you actually gave them the answer,” he says.

    But the results also show there is still a gap. Only 50 percent of those who were trained built accurate mental models of the AI, and even those who did were only right 63 percent of the time. Even though they learned accurate lessons, they didn’t always follow their own rules, Mozannar says.

    That is one question that leaves the researchers scratching their heads — even if people know the AI should be right, why won’t they listen to their own mental model? They want to explore this question in the future, as well as refine the onboarding process to reduce the amount of time it takes. They are also interested in running user studies with more complex AI models, particularly in health care settings.

    “When humans collaborate with other humans, we rely heavily on knowing what our collaborators’ strengths and weaknesses are — it helps us know when (and when not) to lean on the other person for assistance. I’m glad to see this research applying that principle to humans and AI,” says Carrie Cai, a staff research scientist in the People + AI Research and Responsible AI groups at Google, who was not involved with this research. “Teaching users about an AI’s strengths and weaknesses is essential to producing positive human-AI joint outcomes.” 

    This research was supported, in part, by the National Science Foundation. More

  • in

    Physics and the machine-learning “black box”

    Machine-learning algorithms are often referred to as a “black box.” Once data are put into an algorithm, it’s not always known exactly how the algorithm arrives at its prediction. This can be particularly frustrating when things go wrong. A new mechanical engineering (MechE) course at MIT teaches students how to tackle the “black box” problem, through a combination of data science and physics-based engineering.

    In class 2.C161 (Physical Systems Modeling and Design Using Machine Learning), Professor George Barbastathis demonstrates how mechanical engineers can use their unique knowledge of physical systems to keep algorithms in check and develop more accurate predictions.

    “I wanted to take 2.C161 because machine-learning models are usually a “black box,” but this class taught us how to construct a system model that is informed by physics so we can peek inside,” explains Crystal Owens, a mechanical engineering graduate student who took the course in spring 2021.

    As chair of the Committee on the Strategic Integration of Data Science into Mechanical Engineering, Barbastathis has had many conversations with mechanical engineering students, researchers, and faculty to better understand the challenges and successes they’ve had using machine learning in their work.

    “One comment we heard frequently was that these colleagues can see the value of data science methods for problems they are facing in their mechanical engineering-centric research; yet they are lacking the tools to make the most out of it,” says Barbastathis. “Mechanical, civil, electrical, and other types of engineers want a fundamental understanding of data principles without having to convert themselves to being full-time data scientists or AI researchers.”

    Additionally, as mechanical engineering students move on from MIT to their careers, many will need to manage data scientists on their teams someday. Barbastathis hopes to set these students up for success with class 2.C161.

    Bridging MechE and the MIT Schwartzman College of Computing

    Class 2.C161 is part of the MIT Schwartzman College of Computing “Computing Core.” The goal of these classes is to connect data science and physics-based engineering disciplines, like mechanical engineering. Students take the course alongside 6.C402 (Modeling with Machine Learning: from Algorithms to Applications), taught by professors of electrical engineering and computer science Regina Barzilay and Tommi Jaakkola.

    The two classes are taught concurrently during the semester, exposing students to both fundamentals in machine learning and domain-specific applications in mechanical engineering.

    In 2.C161, Barbastathis highlights how complementary physics-based engineering and data science are. Physical laws present a number of ambiguities and unknowns, ranging from temperature and humidity to electromagnetic forces. Data science can be used to predict these physical phenomena. Meanwhile, having an understanding of physical systems helps ensure the resulting output of an algorithm is accurate and explainable.

    “What’s needed is a deeper combined understanding of the associated physical phenomena and the principles of data science, machine learning in particular, to close the gap,” adds Barbastathis. “By combining data with physical principles, the new revolution in physics-based engineering is relatively immune to the “black box” problem facing other types of machine learning.”

    Equipped with a working knowledge of machine-learning topics covered in class 6.C402 and a deeper understanding of how to pair data science with physics, students are charged with developing a final project that solves for an actual physical system.

    Developing solutions for real-world physical systems

    For their final project, students in 2.C161 are asked to identify a real-world problem that requires data science to address the ambiguity inherent in physical systems. After obtaining all relevant data, students are asked to select a machine-learning method, implement their chosen solution, and present and critique the results.

    Topics this past semester ranged from weather forecasting to the flow of gas in combustion engines, with two student teams drawing inspiration from the ongoing Covid-19 pandemic.

    Owens and her teammates, fellow graduate students Arun Krishnadas and Joshua David John Rathinaraj, set out to develop a model for the Covid-19 vaccine rollout.

    “We developed a method of combining a neural network with a susceptible-infected-recovered (SIR) epidemiological model to create a physics-informed prediction system for the spread of Covid-19 after vaccinations started,” explains Owens.

    The team accounted for various unknowns including population mobility, weather, and political climate. This combined approach resulted in a prediction of Covid-19’s spread during the vaccine rollout that was more reliable than using either the SIR model or a neural network alone.

    Another team, including graduate student Yiwen Hu, developed a model to predict mutation rates in Covid-19, a topic that became all too pertinent as the delta variant began its global spread.

    “We used machine learning to predict the time-series-based mutation rate of Covid-19, and then incorporated that as an independent parameter into the prediction of pandemic dynamics to see if it could help us better predict the trend of the Covid-19 pandemic,” says Hu.

    Hu, who had previously conducted research into how vibrations on coronavirus protein spikes affect infection rates, hopes to apply the physics-based machine-learning approaches he learned in 2.C161 to his research on de novo protein design.

    Whatever the physical system students addressed in their final projects, Barbastathis was careful to stress one unifying goal: the need to assess ethical implications in data science. While more traditional computing methods like face or voice recognition have proven to be rife with ethical issues, there is an opportunity to combine physical systems with machine learning in a fair, ethical way.

    “We must ensure that collection and use of data are carried out equitably and inclusively, respecting the diversity in our society and avoiding well-known problems that computer scientists in the past have run into,” says Barbastathis.

    Barbastathis hopes that by encouraging mechanical engineering students to be both ethics-literate and well-versed in data science, they can move on to develop reliable, ethically sound solutions and predictions for physical-based engineering challenges. More

  • in

    Tackling hard computational problems

    The notion that some computational problems in math and computer science can be hard should come as no surprise. There is, in fact, an entire class of problems deemed impossible to solve algorithmically. Just below this class lie slightly “easier” problems that are less well-understood — and may be impossible, too.

    David Gamarnik, professor of operations research at the MIT Sloan School of Management and the Institute for Data, Systems, and Society, is focusing his attention on the latter, less-studied category of problems, which are more relevant to the everyday world because they involve randomness — an integral feature of natural systems. He and his colleagues have developed a potent tool for analyzing these problems called the overlap gap property (or OGP). Gamarnik described the new methodology in a recent paper in the Proceedings of the National Academy of Sciences.

    P ≠ NP

    Fifty years ago, the most famous problem in theoretical computer science was formulated. Labeled “P ≠ NP,” it asks if problems involving vast datasets exist for which an answer can be verified relatively quickly, but whose solution — even if worked out on the fastest available computers — would take an absurdly long time.

    The P ≠ NP conjecture is still unproven, yet most computer scientists believe that many familiar problems — including, for instance, the traveling salesman problem — fall into this impossibly hard category. The challenge in the salesman example is to find the shortest route, in terms of distance or time, through N different cities. The task is easily managed when N=4, because there are only six possible routes to consider. But for 30 cities, there are more than 1030 possible routes, and the numbers rise dramatically from there. The biggest difficulty comes in designing an algorithm that quickly solves the problem in all cases, for all integer values of N. Computer scientists are confident, based on algorithmic complexity theory, that no such algorithm exists, thus affirming that P ≠ NP.

    There are many other examples of intractable problems like this. Suppose, for instance, you have a giant table of numbers with thousands of rows and thousands of columns. Can you find, among all possible combinations, the precise arrangement of 10 rows and 10 columns such that its 100 entries will have the highest sum attainable? “We call them optimization tasks,” Gamarnik says, “because you’re always trying to find the biggest or best — the biggest sum of numbers, the best route through cities, and so forth.”

    Computer scientists have long recognized that you can’t create a fast algorithm that can, in all cases, efficiently solve problems like the saga of the traveling salesman. “Such a thing is likely impossible for reasons that are well-understood,” Gamarnik notes. “But in real life, nature doesn’t generate problems from an adversarial perspective. It’s not trying to thwart you with the most challenging, hand-picked problem conceivable.” In fact, people normally encounter problems under more random, less contrived circumstances, and those are the problems the OGP is intended to address.

    Peaks and valleys

    To understand what the OGP is all about, it might first be instructive to see how the idea arose. Since the 1970s, physicists have been studying spin glasses — materials with properties of both liquids and solids that have unusual magnetic behaviors. Research into spin glasses has given rise to a general theory of complex systems that’s relevant to problems in physics, math, computer science, materials science, and other fields. (This work earned Giorgio Parisi a 2021 Nobel Prize in Physics.)

    One vexing issue physicists have wrestled with is trying to predict the energy states, and particularly the lowest energy configurations, of different spin glass structures. The situation is sometimes depicted by a “landscape” of countless mountain peaks separated by valleys, where the goal is to identify the highest peak. In this case, the highest peak actually represents the lowest energy state (though one could flip the picture around and instead look for the deepest hole). This turns out to be an optimization problem similar in form to the traveling salesman’s dilemma, Gamarnik explains: “You’ve got this huge collection of mountains, and the only way to find the highest appears to be by climbing up each one” — a Sisyphean chore comparable to finding a needle in a haystack.

    Physicists have shown that you can simplify this picture, and take a step toward a solution, by slicing the mountains at a certain, predetermined elevation and ignoring everything below that cutoff level. You’d then be left with a collection of peaks protruding above a uniform layer of clouds, with each point on those peaks representing a potential solution to the original problem.

    In a 2014 paper, Gamarnik and his coauthors noticed something that had previously been overlooked. In some cases, they realized, the diameter of each peak will be much smaller than the distances between different peaks. Consequently, if one were to pick any two points on this sprawling landscape — any two possible “solutions” — they would either be very close (if they came from the same peak) or very far apart (if drawn from different peaks). In other words, there would be a telltale “gap” in these distances — either small or large, but nothing in-between. A system in this state, Gamarnik and colleagues proposed, is characterized by the OGP.

    “We discovered that all known problems of a random nature that are algorithmically hard have a version of this property” — namely, that the mountain diameter in the schematic model is much smaller than the space between mountains, Gamarnik asserts. “This provides a more precise measure of algorithmic hardness.”

    Unlocking the secrets of algorithmic complexity

    The emergence of the OGP can help researchers assess the difficulty of creating fast algorithms to tackle particular problems. And it has already enabled them “to mathematically [and] rigorously rule out a large class of algorithms as potential contenders,” Gamarnik says. “We’ve learned, specifically, that stable algorithms — those whose output won’t change much if the input only changes a little — will fail at solving this type of optimization problem.” This negative result applies not only to conventional computers but also to quantum computers and, specifically, to so-called “quantum approximation optimization algorithms” (QAOAs), which some investigators had hoped could solve these same optimization problems. Now, owing to Gamarnik and his co-authors’ findings, those hopes have been moderated by the recognition that many layers of operations would be required for QAOA-type algorithms to succeed, which could be technically challenging.

    “Whether that’s good news or bad news depends on your perspective,” he says. “I think it’s good news in the sense that it helps us unlock the secrets of algorithmic complexity and enhances our knowledge as to what is in the realm of possibility and what is not. It’s bad news in the sense that it tells us that these problems are hard, even if nature produces them, and even if they’re generated in a random way.” The news is not really surprising, he adds. “Many of us expected it all along, but we now we have a more solid basis upon which to make this claim.”

    That still leaves researchers light-years away from being able to prove the nonexistence of fast algorithms that could solve these optimization problems in random settings. Having such a proof would provide a definitive answer to the P ≠ NP problem. “If we could show that we can’t have an algorithm that works most of the time,” he says, “that would tell us we certainly can’t have an algorithm that works all the time.”

    Predicting how long it will take before the P ≠ NP problem is resolved appears to be an intractable problem in itself. It’s likely there will be many more peaks to climb, and valleys to traverse, before researchers gain a clearer perspective on the situation. More

  • in

    Q&A: Cathy Wu on developing algorithms to safely integrate robots into our world

    Cathy Wu is the Gilbert W. Winslow Assistant Professor of Civil and Environmental Engineering and a member of the MIT Institute for Data, Systems, and Society. As an undergraduate, Wu won MIT’s toughest robotics competition, and as a graduate student took the University of California at Berkeley’s first-ever course on deep reinforcement learning. Now back at MIT, she’s working to improve the flow of robots in Amazon warehouses under the Science Hub, a new collaboration between the tech giant and the MIT Schwarzman College of Computing. Outside of the lab and classroom, Wu can be found running, drawing, pouring lattes at home, and watching YouTube videos on math and infrastructure via 3Blue1Brown and Practical Engineering. She recently took a break from all of that to talk about her work.

    Q: What put you on the path to robotics and self-driving cars?

    A: My parents always wanted a doctor in the family. However, I’m bad at following instructions and became the wrong kind of doctor! Inspired by my physics and computer science classes in high school, I decided to study engineering. I wanted to help as many people as a medical doctor could.

    At MIT, I looked for applications in energy, education, and agriculture, but the self-driving car was the first to grab me. It has yet to let go! Ninety-four percent of serious car crashes are caused by human error and could potentially be prevented by self-driving cars. Autonomous vehicles could also ease traffic congestion, save energy, and improve mobility.

    I first learned about self-driving cars from Seth Teller during his guest lecture for the course Mobile Autonomous Systems Lab (MASLAB), in which MIT undergraduates compete to build the best full-functioning robot from scratch. Our ball-fetching bot, Putzputz, won first place. From there, I took more classes in machine learning, computer vision, and transportation, and joined Teller’s lab. I also competed in several mobility-related hackathons, including one sponsored by Hubway, now known as Blue Bike.

    Q: You’ve explored ways to help humans and autonomous vehicles interact more smoothly. What makes this problem so hard?

    A: Both systems are highly complex, and our classical modeling tools are woefully insufficient. Integrating autonomous vehicles into our existing mobility systems is a huge undertaking. For example, we don’t know whether autonomous vehicles will cut energy use by 40 percent, or double it. We need more powerful tools to cut through the uncertainty. My PhD thesis at Berkeley tried to do this. I developed scalable optimization methods in the areas of robot control, state estimation, and system design. These methods could help decision-makers anticipate future scenarios and design better systems to accommodate both humans and robots.

    Q: How is deep reinforcement learning, combining deep and reinforcement learning algorithms, changing robotics?

    A: I took John Schulman and Pieter Abbeel’s reinforcement learning class at Berkeley in 2015 shortly after Deepmind published their breakthrough paper in Nature. They had trained an agent via deep learning and reinforcement learning to play “Space Invaders” and a suite of Atari games at superhuman levels. That created quite some buzz. A year later, I started to incorporate reinforcement learning into problems involving mixed traffic systems, in which only some cars are automated. I realized that classical control techniques couldn’t handle the complex nonlinear control problems I was formulating.

    Deep RL is now mainstream but it’s by no means pervasive in robotics, which still relies heavily on classical model-based control and planning methods. Deep learning continues to be important for processing raw sensor data like camera images and radio waves, and reinforcement learning is gradually being incorporated. I see traffic systems as gigantic multi-robot systems. I’m excited for an upcoming collaboration with Utah’s Department of Transportation to apply reinforcement learning to coordinate cars with traffic signals, reducing congestion and thus carbon emissions.

    Q: You’ve talked about the MIT course, 6.007 (Signals and Systems), and its impact on you. What about it spoke to you?

    A: The mindset. That problems that look messy can be analyzed with common, and sometimes simple, tools. Signals are transformed by systems in various ways, but what do these abstract terms mean, anyway? A mechanical system can take a signal like gears turning at some speed and transform it into a lever turning at another speed. A digital system can take binary digits and turn them into other binary digits or a string of letters or an image. Financial systems can take news and transform it via millions of trading decisions into stock prices. People take in signals every day through advertisements, job offers, gossip, and so on, and translate them into actions that in turn influence society and other people. This humble class on signals and systems linked mechanical, digital, and societal systems and showed me how foundational tools can cut through the noise.

    Q: In your project with Amazon you’re training warehouse robots to pick up, sort, and deliver goods. What are the technical challenges?

    A: This project involves assigning robots to a given task and routing them there. [Professor] Cynthia Barnhart’s team is focused on task assignment, and mine, on path planning. Both problems are considered combinatorial optimization problems because the solution involves a combination of choices. As the number of tasks and robots increases, the number of possible solutions grows exponentially. It’s called the curse of dimensionality. Both problems are what we call NP Hard; there may not be an efficient algorithm to solve them. Our goal is to devise a shortcut.

    Routing a single robot for a single task isn’t difficult. It’s like using Google Maps to find the shortest path home. It can be solved efficiently with several algorithms, including Dijkstra’s. But warehouses resemble small cities with hundreds of robots. When traffic jams occur, customers can’t get their packages as quickly. Our goal is to develop algorithms that find the most efficient paths for all of the robots.

    Q: Are there other applications?

    A: Yes. The algorithms we test in Amazon warehouses might one day help to ease congestion in real cities. Other potential applications include controlling planes on runways, swarms of drones in the air, and even characters in video games. These algorithms could also be used for other robotic planning tasks like scheduling and routing.

    Q: AI is evolving rapidly. Where do you hope to see the big breakthroughs coming?

    A: I’d like to see deep learning and deep RL used to solve societal problems involving mobility, infrastructure, social media, health care, and education. Deep RL now has a toehold in robotics and industrial applications like chip design, but we still need to be careful in applying it to systems with humans in the loop. Ultimately, we want to design systems for people. Currently, we simply don’t have the right tools.

    Q: What worries you most about AI taking on more and more specialized tasks?

    A: AI has the potential for tremendous good, but it could also help to accelerate the widening gap between the haves and the have-nots. Our political and regulatory systems could help to integrate AI into society and minimize job losses and income inequality, but I worry that they’re not equipped yet to handle the firehose of AI.

    Q: What’s the last great book you read?

    A: “How to Avoid a Climate Disaster,” by Bill Gates. I absolutely loved the way that Gates was able to take an overwhelmingly complex topic and distill it down into words that everyone can understand. His optimism inspires me to keep pushing on applications of AI and robotics to help avoid a climate disaster. More

  • in

    Nonsense can make sense to machine-learning models

    For all that neural networks can accomplish, we still don’t really understand how they operate. Sure, we can program them to learn, but making sense of a machine’s decision-making process remains much like a fancy puzzle with a dizzying, complex pattern where plenty of integral pieces have yet to be fitted. 

    If a model was trying to classify an image of said puzzle, for example, it could encounter well-known, but annoying adversarial attacks, or even more run-of-the-mill data or processing issues. But a new, more subtle type of failure recently identified by MIT scientists is another cause for concern: “overinterpretation,” where algorithms make confident predictions based on details that don’t make sense to humans, like random patterns or image borders. 

    This could be particularly worrisome for high-stakes environments, like split-second decisions for self-driving cars, and medical diagnostics for diseases that need more immediate attention. Autonomous vehicles in particular rely heavily on systems that can accurately understand surroundings and then make quick, safe decisions. The network used specific backgrounds, edges, or particular patterns of the sky to classify traffic lights and street signs — irrespective of what else was in the image. 

    The team found that neural networks trained on popular datasets like CIFAR-10 and ImageNet suffered from overinterpretation. Models trained on CIFAR-10, for example, made confident predictions even when 95 percent of input images were missing, and the remainder is senseless to humans. 

    “Overinterpretation is a dataset problem that’s caused by these nonsensical signals in datasets. Not only are these high-confidence images unrecognizable, but they contain less than 10 percent of the original image in unimportant areas, such as borders. We found that these images were meaningless to humans, yet models can still classify them with high confidence,” says Brandon Carter, MIT Computer Science and Artificial Intelligence Laboratory PhD student and lead author on a paper about the research. 

    Deep-image classifiers are widely used. In addition to medical diagnosis and boosting autonomous vehicle technology, there are use cases in security, gaming, and even an app that tells you if something is or isn’t a hot dog, because sometimes we need reassurance. The tech in discussion works by processing individual pixels from tons of pre-labeled images for the network to “learn.” 

    Image classification is hard, because machine-learning models have the ability to latch onto these nonsensical subtle signals. Then, when image classifiers are trained on datasets such as ImageNet, they can make seemingly reliable predictions based on those signals. 

    Although these nonsensical signals can lead to model fragility in the real world, the signals are actually valid in the datasets, meaning overinterpretation can’t be diagnosed using typical evaluation methods based on that accuracy. 

    To find the rationale for the model’s prediction on a particular input, the methods in the present study start with the full image and repeatedly ask, what can I remove from this image? Essentially, it keeps covering up the image, until you’re left with the smallest piece that still makes a confident decision. 

    To that end, it could also be possible to use these methods as a type of validation criteria. For example, if you have an autonomously driving car that uses a trained machine-learning method for recognizing stop signs, you could test that method by identifying the smallest input subset that constitutes a stop sign. If that consists of a tree branch, a particular time of day, or something that’s not a stop sign, you could be concerned that the car might come to a stop at a place it’s not supposed to.

    While it may seem that the model is the likely culprit here, the datasets are more likely to blame. “There’s the question of how we can modify the datasets in a way that would enable models to be trained to more closely mimic how a human would think about classifying images and therefore, hopefully, generalize better in these real-world scenarios, like autonomous driving and medical diagnosis, so that the models don’t have this nonsensical behavior,” says Carter. 

    This may mean creating datasets in more controlled environments. Currently, it’s just pictures that are extracted from public domains that are then classified. But if you want to do object identification, for example, it might be necessary to train models with objects with an uninformative background. 

    This work was supported by Schmidt Futures and the National Institutes of Health. Carter wrote the paper alongside Siddhartha Jain and Jonas Mueller, scientists at Amazon, and MIT Professor David Gifford. They are presenting the work at the 2021 Conference on Neural Information Processing Systems. More