More stories

  • in

    How machine learning models can amplify inequities in medical diagnosis and treatment

    Prior to receiving a PhD in computer science from MIT in 2017, Marzyeh Ghassemi had already begun to wonder whether the use of AI techniques might enhance the biases that already existed in health care. She was one of the early researchers to take up this issue, and she’s been exploring it ever since. In a new paper, Ghassemi, now an assistant professor in MIT’s Department of Electrical Science and Engineering (EECS), and three collaborators based at the Computer Science and Artificial Intelligence Laboratory, have probed the roots of the disparities that can arise in machine learning, often causing models that perform well overall to falter when it comes to subgroups for which relatively few data have been collected and utilized in the training process. The paper — written by two MIT PhD students, Yuzhe Yang and Haoran Zhang, EECS computer scientist Dina Katabi (the Thuan and Nicole Pham Professor), and Ghassemi — was presented last month at the 40th International Conference on Machine Learning in Honolulu, Hawaii.

    In their analysis, the researchers focused on “subpopulation shifts” — differences in the way machine learning models perform for one subgroup as compared to another. “We want the models to be fair and work equally well for all groups, but instead we consistently observe the presence of shifts among different groups that can lead to inferior medical diagnosis and treatment,” says Yang, who along with Zhang are the two lead authors on the paper. The main point of their inquiry is to determine the kinds of subpopulation shifts that can occur and to uncover the mechanisms behind them so that, ultimately, more equitable models can be developed.

    The new paper “significantly advances our understanding” of the subpopulation shift phenomenon, claims Stanford University computer scientist Sanmi Koyejo. “This research contributes valuable insights for future advancements in machine learning models’ performance on underrepresented subgroups.”

    Camels and cattle

    The MIT group has identified four principal types of shifts — spurious correlations, attribute imbalance, class imbalance, and attribute generalization — which, according to Yang, “have never been put together into a coherent and unified framework. We’ve come up with a single equation that shows you where biases can come from.”

    Biases can, in fact, stem from what the researchers call the class, or from the attribute, or both. To pick a simple example, suppose the task assigned to the machine learning model is to sort images of objects — animals in this case — into two classes: cows and camels. Attributes are descriptors that don’t specifically relate to the class itself. It might turn out, for instance, that all the images used in the analysis show cows standing on grass and camels on sand — grass and sand serving as the attributes here. Given the data available to it, the machine could reach an erroneous conclusion — namely that cows can only be found on grass, not on sand, with the opposite being true for camels. Such a finding would be incorrect, however, giving rise to a spurious correlation, which, Yang explains, is a “special case” among subpopulation shifts — “one in which you have a bias in both the class and the attribute.”

    In a medical setting, one could rely on machine learning models to determine whether a person has pneumonia or not based on an examination of X-ray images. There would be two classes in this situation, one consisting of people who have the lung ailment, another for those who are infection-free. A relatively straightforward case would involve just two attributes: the people getting X-rayed are either female or male. If, in this particular dataset, there were 100 males diagnosed with pneumonia for every one female diagnosed with pneumonia, that could lead to an attribute imbalance, and the model would likely do a better job of correctly detecting pneumonia for a man than for a woman. Similarly, having 1,000 times more healthy (pneumonia-free) subjects than sick ones would lead to a class imbalance, with the model biased toward healthy cases. Attribute generalization is the last shift highlighted in the new study. If your sample contained 100 male patients with pneumonia and zero female subjects with the same illness, you still would like the model to be able to generalize and make predictions about female subjects even though there are no samples in the training data for females with pneumonia.

    The team then took 20 advanced algorithms, designed to carry out classification tasks, and tested them on a dozen datasets to see how they performed across different population groups. They reached some unexpected conclusions: By improving the “classifier,” which is the last layer of the neural network, they were able to reduce the occurrence of spurious correlations and class imbalance, but the other shifts were unaffected. Improvements to the “encoder,” one of the uppermost layers in the neural network, could reduce the problem of attribute imbalance. “However, no matter what we did to the encoder or classifier, we did not see any improvements in terms of attribute generalization,” Yang says, “and we don’t yet know how to address that.”

    Precisely accurate

    There is also the question of assessing how well your model actually works in terms of evenhandedness among different population groups. The metric normally used, called worst-group accuracy or WGA, is based on the assumption that if you can improve the accuracy — of, say, medical diagnosis — for the group that has the worst model performance, you would have improved the model as a whole. “The WGA is considered the gold standard in subpopulation evaluation,” the authors contend, but they made a surprising discovery: boosting worst-group accuracy results in a decrease in what they call “worst-case precision.” In medical decision-making of all sorts, one needs both accuracy — which speaks to the validity of the findings — and precision, which relates to the reliability of the methodology. “Precision and accuracy are both very important metrics in classification tasks, and that is especially true in medical diagnostics,” Yang explains. “You should never trade precision for accuracy. You always need to balance the two.”

    The MIT scientists are putting their theories into practice. In a study they’re conducting with a medical center, they’re looking at public datasets for tens of thousands of patients and hundreds of thousands of chest X-rays, trying to see whether it’s possible for machine learning models to work in an unbiased manner for all populations. That’s still far from the case, even though more awareness has been drawn to this problem, Yang says. “We are finding many disparities across different ages, gender, ethnicity, and intersectional groups.”

    He and his colleagues agree on the eventual goal, which is to achieve fairness in health care among all populations. But before we can reach that point, they maintain, we still need a better understanding of the sources of unfairness and how they permeate our current system. Reforming the system as a whole will not be easy, they acknowledge. In fact, the title of the paper they introduced at the Honolulu conference, “Change is Hard,” gives some indications as to the challenges that they and like-minded researchers face. More

  • in

    The curse of variety in transportation systems

    Cathy Wu has always delighted in systems that run smoothly. In high school, she designed a project to optimize the best route for getting to class on time. Her research interests and career track are evidence of a propensity for organizing and optimizing, coupled with a strong sense of responsibility to contribute to society instilled by her parents at a young age.

    As an undergraduate at MIT, Wu explored domains like agriculture, energy, and education, eventually homing in on transportation. “Transportation touches each of our lives,” she says. “Every day, we experience the inefficiencies and safety issues as well as the environmental harms associated with our transportation systems. I believe we can and should do better.”

    But doing so is complicated. Consider the long-standing issue of traffic systems control. Wu explains that it is not one problem, but more accurately a family of control problems impacted by variables like time of day, weather, and vehicle type — not to mention the types of sensing and communication technologies used to measure roadway information. Every differentiating factor introduces an exponentially larger set of control problems. There are thousands of control-problem variations and hundreds, if not thousands, of studies and papers dedicated to each problem. Wu refers to the sheer number of variations as the curse of variety — and it is hindering innovation.

    Play video

    “To prove that a new control strategy can be safely deployed on our streets can take years. As time lags, we lose opportunities to improve safety and equity while mitigating environmental impacts. Accelerating this process has huge potential,” says Wu.  

    Which is why she and her group in the MIT Laboratory for Information and Decision Systems are devising machine learning-based methods to solve not just a single control problem or a single optimization problem, but families of control and optimization problems at scale. “In our case, we’re examining emerging transportation problems that people have spent decades trying to solve with classical approaches. It seems to me that we need a different approach.”

    Optimizing intersections

    Currently, Wu’s largest research endeavor is called Project Greenwave. There are many sectors that directly contribute to climate change, but transportation is responsible for the largest share of greenhouse gas emissions — 29 percent, of which 81 percent is due to land transportation. And while much of the conversation around mitigating environmental impacts related to mobility is focused on electric vehicles (EVs), electrification has its drawbacks. EV fleet turnover is time-consuming (“on the order of decades,” says Wu), and limited global access to the technology presents a significant barrier to widespread adoption.

    Wu’s research, on the other hand, addresses traffic control problems by leveraging deep reinforcement learning. Specifically, she is looking at traffic intersections — and for good reason. In the United States alone, there are more than 300,000 signalized intersections where vehicles must stop or slow down before re-accelerating. And every re-acceleration burns fossil fuels and contributes to greenhouse gas emissions.

    Highlighting the magnitude of the issue, Wu says, “We have done preliminary analysis indicating that up to 15 percent of land transportation CO2 is wasted through energy spent idling and re-accelerating at intersections.”

    To date, she and her group have modeled 30,000 different intersections across 10 major metropolitan areas in the United States. That is 30,000 different configurations, roadway topologies (e.g., grade of road or elevation), different weather conditions, and variations in travel demand and fuel mix. Each intersection and its corresponding scenarios represents a unique multi-agent control problem.

    Wu and her team are devising techniques that can solve not just one, but a whole family of problems comprised of tens of thousands of scenarios. Put simply, the idea is to coordinate the timing of vehicles so they arrive at intersections when traffic lights are green, thereby eliminating the start, stop, re-accelerate conundrum. Along the way, they are building an ecosystem of tools, datasets, and methods to enable roadway interventions and impact assessments of strategies to significantly reduce carbon-intense urban driving.

    Play video

    Their collaborator on the project is the Utah Department of Transportation, which Wu says has played an essential role, in part by sharing data and practical knowledge that she and her group otherwise would not have been able to access publicly.

    “I appreciate industry and public sector collaborations,” says Wu. “When it comes to important societal problems, one really needs grounding with practitioners. One needs to be able to hear the perspectives in the field. My interactions with practitioners expand my horizons and help ground my research. You never know when you’ll hear the perspective that is the key to the solution, or perhaps the key to understanding the problem.”

    Finding the best routes

    In a similar vein, she and her research group are tackling large coordination problems. For example, vehicle routing. “Every day, delivery trucks route more than a hundred thousand packages for the city of Boston alone,” says Wu. Accomplishing the task requires, among other things, figuring out which trucks to use, which packages to deliver, and the order in which to deliver them as efficiently as possible. If and when the trucks are electrified, they will need to be charged, adding another wrinkle to the process and further complicating route optimization.

    The vehicle routing problem, and therefore the scope of Wu’s work, extends beyond truck routing for package delivery. Ride-hailing cars may need to pick up objects as well as drop them off; and what if delivery is done by bicycle or drone? In partnership with Amazon, for example, Wu and her team addressed routing and path planning for hundreds of robots (up to 800) in their warehouses.

    Every variation requires custom heuristics that are expensive and time-consuming to develop. Again, this is really a family of problems — each one complicated, time-consuming, and currently unsolved by classical techniques — and they are all variations of a central routing problem. The curse of variety meets operations and logistics.

    By combining classical approaches with modern deep-learning methods, Wu is looking for a way to automatically identify heuristics that can effectively solve all of these vehicle routing problems. So far, her approach has proved successful.

    “We’ve contributed hybrid learning approaches that take existing solution methods for small problems and incorporate them into our learning framework to scale and accelerate that existing solver for large problems. And we’re able to do this in a way that can automatically identify heuristics for specialized variations of the vehicle routing problem.” The next step, says Wu, is applying a similar approach to multi-agent robotics problems in automated warehouses.

    Wu and her group are making big strides, in part due to their dedication to use-inspired basic research. Rather than applying known methods or science to a problem, they develop new methods, new science, to address problems. The methods she and her team employ are necessitated by societal problems with practical implications. The inspiration for the approach? None other than Louis Pasteur, who described his research style in a now-famous article titled “Pasteur’s Quadrant.” Anthrax was decimating the sheep population, and Pasteur wanted to better understand why and what could be done about it. The tools of the time could not solve the problem, so he invented a new field, microbiology, not out of curiosity but out of necessity. More

  • in

    Making sense of cell fate

    Despite the proliferation of novel therapies such as immunotherapy or targeted therapies, radiation and chemotherapy remain the frontline treatment for cancer patients. About half of all patients still receive radiation and 60-80 percent receive chemotherapy.

    Both radiation and chemotherapy work by damaging DNA, taking advantage of a vulnerability specific to cancer cells. Healthy cells are more likely to survive radiation and chemotherapy since their mechanisms for identifying and repairing DNA damage are intact. In cancer cells, these repair mechanisms are compromised by mutations. When cancer cells cannot adequately respond to the DNA damage caused by radiation and chemotherapy, ideally, they undergo apoptosis or die by other means.

    However, there is another fate for cells after DNA damage: senescence — a state where cells survive, but stop dividing. Senescent cells’ DNA has not been damaged enough to induce apoptosis but is too damaged to support cell division. While senescent cancer cells themselves are unable to proliferate and spread, they are bad actors in the fight against cancer because they seem to enable other cancer cells to develop more aggressively. Although a cancer cell’s fate is not apparent until a few days after treatment, the decision to survive, die, or enter senescence is made much earlier. But, precisely when and how that decision is made has not been well understood.

    In an open-access study of ovarian and osteosarcoma cancer cells appearing July 19 in Cell Systems, MIT researchers show that cell signaling proteins commonly associated with cell proliferation and apoptosis instead commit cancer cells to senescence within 12 hours of treatment with low doses of certain kinds of chemotherapy.

    “When it comes to treating cancer, this study underscores that it’s important not to think too linearly about cell signaling,” says Michael Yaffe, who is a David H. Koch Professor of Science at MIT, the director of the MIT Center for Precision Cancer Medicine, a member of MIT’s Koch Institute for Integrative Cancer Research, and the senior author of the study. “If you assume that a particular treatment will always affect cancer cell signaling in the same way — you may be setting yourself up for many surprises, and treating cancers with the wrong combination of drugs.”

    Using a combination of experiments with cancer cells and computational modeling, the team investigated the cell signaling mechanisms that prompt cancer cells to enter senescence after treatment with a commonly used anti-cancer agent. Their efforts singled out two protein kinases and a component of the AP-1 transcription factor complex as highly associated with the induction of senescence after DNA damage, despite the well-established roles for all of these molecules in promoting cell proliferation in cancer.

    The researchers treated cancer cells with low and high doses of doxorubicin, a chemotherapy that interferes with the function with topoisomerase II, an enzyme that breaks and then repairs DNA strands during replication to fix tangles and other topological problems.

    By measuring the effects of DNA damage on single cells at several time points ranging from six hours to four days after the initial exposure, the team created two datasets. In one dataset, the researchers tracked cell fate over time. For the second set, researchers measured relative cell signaling activity levels across a variety of proteins associated with responses to DNA damage or cellular stress, determination of cell fate, and progress through cell growth and division.

    The two datasets were used to build a computational model that identifies correlations between time, dosage, signal, and cell fate. The model identified the activities of the MAP kinases Erk and JNK, and the transcription factor c-Jun as key components of the AP-1 protein likewise understood to involved in the induction of senescence. The researchers then validated these computational findings by showing that inhibition of JNK and Erk after DNA damage successfully prevented cells from entering senescence.

    The researchers leveraged JNK and Erk inhibition to pinpoint exactly when cells made the decision to enter senescence. Surprisingly, they found that the decision to enter senescence was made within 12 hours of DNA damage, even though it took days to actually see the senescent cells accumulate. The team also found that with the passage of more time, these MAP kinases took on a different function: promoting the secretion of proinflammatory proteins called cytokines that are responsible for making other cancer cells proliferate and develop resistance to chemotherapy.

    “Proteins like cytokines encourage ‘bad behavior’ in neighboring tumor cells that lead to more aggressive cancer progression,” says Tatiana Netterfield, a graduate student in the Yaffe lab and the lead author of the study. “Because of this, it is thought that senescent cells that stay near the tumor for long periods of time are detrimental to treating cancer.”

    This study’s findings apply to cancer cells treated with a commonly used type of chemotherapy that stalls DNA replication after repair. But more broadly, the study emphasizes that “when treating cancer, it’s extremely important to understand the molecular characteristics of cancer cells and the contextual factors such as time and dosing that determine cell fate,” explains Netterfield.

    The study, however, has more immediate implications for treatments that are already in use. One class of Erk inhibitors, MEK inhibitors, are used in the clinic with the expectation that they will curb cancer growth.

    “We must be cautious about administering MEK inhibitors together with chemotherapies,” says Yaffe. “The combination may have the unintended effect of driving cells into proliferation, rather than senescence.”

    In future work, the team will perform studies to understand how and why individual cells choose to proliferate instead of enter senescence. Additionally, the team is employing next-generation sequencing to understand which genes c-Jun is regulating in order to push cells toward senescence.

    This study was funded, in part, by the Charles and Marjorie Holloway Foundation and the MIT Center for Precision Cancer Medicine. More

  • in

    A simpler method for learning to control a robot

    Researchers from MIT and Stanford University have devised a new machine-learning approach that could be used to control a robot, such as a drone or autonomous vehicle, more effectively and efficiently in dynamic environments where conditions can change rapidly.

    This technique could help an autonomous vehicle learn to compensate for slippery road conditions to avoid going into a skid, allow a robotic free-flyer to tow different objects in space, or enable a drone to closely follow a downhill skier despite being buffeted by strong winds.

    The researchers’ approach incorporates certain structure from control theory into the process for learning a model in such a way that leads to an effective method of controlling complex dynamics, such as those caused by impacts of wind on the trajectory of a flying vehicle. One way to think about this structure is as a hint that can help guide how to control a system.

    “The focus of our work is to learn intrinsic structure in the dynamics of the system that can be leveraged to design more effective, stabilizing controllers,” says Navid Azizan, the Esther and Harold E. Edgerton Assistant Professor in the MIT Department of Mechanical Engineering and the Institute for Data, Systems, and Society (IDSS), and a member of the Laboratory for Information and Decision Systems (LIDS). “By jointly learning the system’s dynamics and these unique control-oriented structures from data, we’re able to naturally create controllers that function much more effectively in the real world.”

    Using this structure in a learned model, the researchers’ technique immediately extracts an effective controller from the model, as opposed to other machine-learning methods that require a controller to be derived or learned separately with additional steps. With this structure, their approach is also able to learn an effective controller using fewer data than other approaches. This could help their learning-based control system achieve better performance faster in rapidly changing environments.

    “This work tries to strike a balance between identifying structure in your system and just learning a model from data,” says lead author Spencer M. Richards, a graduate student at Stanford University. “Our approach is inspired by how roboticists use physics to derive simpler models for robots. Physical analysis of these models often yields a useful structure for the purposes of control — one that you might miss if you just tried to naively fit a model to data. Instead, we try to identify similarly useful structure from data that indicates how to implement your control logic.”

    Additional authors of the paper are Jean-Jacques Slotine, professor of mechanical engineering and of brain and cognitive sciences at MIT, and Marco Pavone, associate professor of aeronautics and astronautics at Stanford. The research will be presented at the International Conference on Machine Learning (ICML).

    Learning a controller

    Determining the best way to control a robot to accomplish a given task can be a difficult problem, even when researchers know how to model everything about the system.

    A controller is the logic that enables a drone to follow a desired trajectory, for example. This controller would tell the drone how to adjust its rotor forces to compensate for the effect of winds that can knock it off a stable path to reach its goal.

    This drone is a dynamical system — a physical system that evolves over time. In this case, its position and velocity change as it flies through the environment. If such a system is simple enough, engineers can derive a controller by hand. 

    Modeling a system by hand intrinsically captures a certain structure based on the physics of the system. For instance, if a robot were modeled manually using differential equations, these would capture the relationship between velocity, acceleration, and force. Acceleration is the rate of change in velocity over time, which is determined by the mass of and forces applied to the robot.

    But often the system is too complex to be exactly modeled by hand. Aerodynamic effects, like the way swirling wind pushes a flying vehicle, are notoriously difficult to derive manually, Richards explains. Researchers would instead take measurements of the drone’s position, velocity, and rotor speeds over time, and use machine learning to fit a model of this dynamical system to the data. But these approaches typically don’t learn a control-based structure. This structure is useful in determining how to best set the rotor speeds to direct the motion of the drone over time.

    Once they have modeled the dynamical system, many existing approaches also use data to learn a separate controller for the system.

    “Other approaches that try to learn dynamics and a controller from data as separate entities are a bit detached philosophically from the way we normally do it for simpler systems. Our approach is more reminiscent of deriving models by hand from physics and linking that to control,” Richards says.

    Identifying structure

    The team from MIT and Stanford developed a technique that uses machine learning to learn the dynamics model, but in such a way that the model has some prescribed structure that is useful for controlling the system.

    With this structure, they can extract a controller directly from the dynamics model, rather than using data to learn an entirely separate model for the controller.

    “We found that beyond learning the dynamics, it’s also essential to learn the control-oriented structure that supports effective controller design. Our approach of learning state-dependent coefficient factorizations of the dynamics has outperformed the baselines in terms of data efficiency and tracking capability, proving to be successful in efficiently and effectively controlling the system’s trajectory,” Azizan says. 

    When they tested this approach, their controller closely followed desired trajectories, outpacing all the baseline methods. The controller extracted from their learned model nearly matched the performance of a ground-truth controller, which is built using the exact dynamics of the system.

    “By making simpler assumptions, we got something that actually worked better than other complicated baseline approaches,” Richards adds.

    The researchers also found that their method was data-efficient, which means it achieved high performance even with few data. For instance, it could effectively model a highly dynamic rotor-driven vehicle using only 100 data points. Methods that used multiple learned components saw their performance drop much faster with smaller datasets.

    This efficiency could make their technique especially useful in situations where a drone or robot needs to learn quickly in rapidly changing conditions.

    Plus, their approach is general and could be applied to many types of dynamical systems, from robotic arms to free-flying spacecraft operating in low-gravity environments.

    In the future, the researchers are interested in developing models that are more physically interpretable, and that would be able to identify very specific information about a dynamical system, Richards says. This could lead to better-performing controllers.

    “Despite its ubiquity and importance, nonlinear feedback control remains an art, making it especially suitable for data-driven and learning-based methods. This paper makes a significant contribution to this area by proposing a method that jointly learns system dynamics, a controller, and control-oriented structure,” says Nikolai Matni, an assistant professor in the Department of Electrical and Systems Engineering at the University of Pennsylvania, who was not involved with this work. “What I found particularly exciting and compelling was the integration of these components into a joint learning algorithm, such that control-oriented structure acts as an inductive bias in the learning process. The result is a data-efficient learning process that outputs dynamic models that enjoy intrinsic structure that enables effective, stable, and robust control. While the technical contributions of the paper are excellent themselves, it is this conceptual contribution that I view as most exciting and significant.”

    This research is supported, in part, by the NASA University Leadership Initiative and the Natural Sciences and Engineering Research Council of Canada. More

  • in

    A new dataset of Arctic images will spur artificial intelligence research

    As the U.S. Coast Guard (USCG) icebreaker Healy takes part in a voyage across the North Pole this summer, it is capturing images of the Arctic to further the study of this rapidly changing region. Lincoln Laboratory researchers installed a camera system aboard the Healy while at port in Seattle before it embarked on a three-month science mission on July 11. The resulting dataset, which will be one of the first of its kind, will be used to develop artificial intelligence tools that can analyze Arctic imagery.

    “This dataset not only can help mariners navigate more safely and operate more efficiently, but also help protect our nation by providing critical maritime domain awareness and an improved understanding of how AI analysis can be brought to bear in this challenging and unique environment,” says Jo Kurucar, a researcher in Lincoln Laboratory’s AI Software Architectures and Algorithms Group, which led this project.

    As the planet warms and sea ice melts, Arctic passages are opening up to more traffic, both to military vessels and ships conducting illegal fishing. These movements may pose national security challenges to the United States. The opening Arctic also leaves questions about how its climate, wildlife, and geography are changing.

    Today, very few imagery datasets of the Arctic exist to study these changes. Overhead images from satellites or aircraft can only provide limited information about the environment. An outward-looking camera attached to a ship can capture more details of the setting and different angles of objects, such as other ships, in the scene. These types of images can then be used to train AI computer-vision tools, which can help the USCG plan naval missions and automate analysis. According to Kurucar, USCG assets in the Arctic are spread thin and can benefit greatly from AI tools, which can act as a force multiplier.

    The Healy is the USCG’s largest and most technologically advanced icebreaker. Given its current mission, it was a fitting candidate to be equipped with a new sensor to gather this dataset. The laboratory research team collaborated with the USCG Research and Development Center to determine the sensor requirements. Together, they developed the Cold Region Imaging and Surveillance Platform (CRISP).

    “Lincoln Laboratory has an excellent relationship with the Coast Guard, especially with the Research and Development Center. Over a decade, we’ve established ties that enabled the deployment of the CRISP system,” says Amna Greaves, the CRISP project lead and an assistant leader in the AI Software Architectures and Algorithms Group. “We have strong ties not only because of the USCG veterans working at the laboratory and in our group, but also because our technology missions are complementary. Today it was deploying infrared sensing in the Arctic; tomorrow it could be operating quadruped robot dogs on a fast-response cutter.”

    The CRISP system comprises a long-wave infrared camera, manufactured by Teledyne FLIR (for forward-looking infrared), that is designed for harsh maritime environments. The camera can stabilize itself during rough seas and image in complete darkness, fog, and glare. It is paired with a GPS-enabled time-synchronized clock and a network video recorder to record both video and still imagery along with GPS-positional data.  

    The camera is mounted at the front of the ship’s fly bridge, and the electronics are housed in a ruggedized rack on the bridge. The system can be operated manually from the bridge or be placed into an autonomous surveillance mode, in which it slowly pans back and forth, recording 15 minutes of video every three hours and a still image once every 15 seconds.

    “The installation of the equipment was a unique and fun experience. As with any good project, our expectations going into the install did not meet reality,” says Michael Emily, the project’s IT systems administrator who traveled to Seattle for the install. Working with the ship’s crew, the laboratory team had to quickly adjust their route for running cables from the camera to the observation station after they discovered that the expected access points weren’t in fact accessible. “We had 100-foot cables made for this project just in case of this type of scenario, which was a good thing because we only had a few inches to spare,” Emily says.

    The CRISP project team plans to publicly release the dataset, anticipated to be about 4 terabytes in size, once the USCG science mission concludes in the fall.

    The goal in releasing the dataset is to enable the wider research community to develop better tools for those operating in the Arctic, especially as this region becomes more navigable. “Collecting and publishing the data allows for faster and greater progress than what we could accomplish on our own,” Kurucar adds. “It also enables the laboratory to engage in more advanced AI applications while others make more incremental advances using the dataset.”

    On top of providing the dataset, the laboratory team plans to provide a baseline object-detection model, from which others can make progress on their own models. More advanced AI applications planned for development are classifiers for specific objects in the scene and the ability to identify and track objects across images.

    Beyond assisting with USCG missions, this project could create an influential dataset for researchers looking to apply AI to data from the Arctic to help combat climate change, says Paul Metzger, who leads the AI Software Architectures and Algorithms Group.

    Metzger adds that the group was honored to be a part of this project and is excited to see the advances that come from applying AI to novel challenges facing the United States: “I’m extremely proud of how our group applies AI to the highest-priority challenges in our nation, from predicting outbreaks of Covid-19 and assisting the U.S. European Command in their support of Ukraine to now employing AI in the Arctic for maritime awareness.”

    Once the dataset is available, it will be free to download on the Lincoln Laboratory dataset website. More

  • in

    A faster way to teach a robot

    Imagine purchasing a robot to perform household tasks. This robot was built and trained in a factory on a certain set of tasks and has never seen the items in your home. When you ask it to pick up a mug from your kitchen table, it might not recognize your mug (perhaps because this mug is painted with an unusual image, say, of MIT’s mascot, Tim the Beaver). So, the robot fails.

    “Right now, the way we train these robots, when they fail, we don’t really know why. So you would just throw up your hands and say, ‘OK, I guess we have to start over.’ A critical component that is missing from this system is enabling the robot to demonstrate why it is failing so the user can give it feedback,” says Andi Peng, an electrical engineering and computer science (EECS) graduate student at MIT.

    Peng and her collaborators at MIT, New York University, and the University of California at Berkeley created a framework that enables humans to quickly teach a robot what they want it to do, with a minimal amount of effort.

    When a robot fails, the system uses an algorithm to generate counterfactual explanations that describe what needed to change for the robot to succeed. For instance, maybe the robot would have been able to pick up the mug if the mug were a certain color. It shows these counterfactuals to the human and asks for feedback on why the robot failed. Then the system utilizes this feedback and the counterfactual explanations to generate new data it uses to fine-tune the robot.

    Fine-tuning involves tweaking a machine-learning model that has already been trained to perform one task, so it can perform a second, similar task.

    The researchers tested this technique in simulations and found that it could teach a robot more efficiently than other methods. The robots trained with this framework performed better, while the training process consumed less of a human’s time.

    This framework could help robots learn faster in new environments without requiring a user to have technical knowledge. In the long run, this could be a step toward enabling general-purpose robots to efficiently perform daily tasks for the elderly or individuals with disabilities in a variety of settings.

    Peng, the lead author, is joined by co-authors Aviv Netanyahu, an EECS graduate student; Mark Ho, an assistant professor at the Stevens Institute of Technology; Tianmin Shu, an MIT postdoc; Andreea Bobu, a graduate student at UC Berkeley; and senior authors Julie Shah, an MIT professor of aeronautics and astronautics and the director of the Interactive Robotics Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Pulkit Agrawal, a professor in CSAIL. The research will be presented at the International Conference on Machine Learning.

    On-the-job training

    Robots often fail due to distribution shift — the robot is presented with objects and spaces it did not see during training, and it doesn’t understand what to do in this new environment.

    One way to retrain a robot for a specific task is imitation learning. The user could demonstrate the correct task to teach the robot what to do. If a user tries to teach a robot to pick up a mug, but demonstrates with a white mug, the robot could learn that all mugs are white. It may then fail to pick up a red, blue, or “Tim-the-Beaver-brown” mug.

    Training a robot to recognize that a mug is a mug, regardless of its color, could take thousands of demonstrations.

    “I don’t want to have to demonstrate with 30,000 mugs. I want to demonstrate with just one mug. But then I need to teach the robot so it recognizes that it can pick up a mug of any color,” Peng says.

    To accomplish this, the researchers’ system determines what specific object the user cares about (a mug) and what elements aren’t important for the task (perhaps the color of the mug doesn’t matter). It uses this information to generate new, synthetic data by changing these “unimportant” visual concepts. This process is known as data augmentation.

    The framework has three steps. First, it shows the task that caused the robot to fail. Then it collects a demonstration from the user of the desired actions and generates counterfactuals by searching over all features in the space that show what needed to change for the robot to succeed.

    The system shows these counterfactuals to the user and asks for feedback to determine which visual concepts do not impact the desired action. Then it uses this human feedback to generate many new augmented demonstrations.

    In this way, the user could demonstrate picking up one mug, but the system would produce demonstrations showing the desired action with thousands of different mugs by altering the color. It uses these data to fine-tune the robot.

    Creating counterfactual explanations and soliciting feedback from the user are critical for the technique to succeed, Peng says.

    From human reasoning to robot reasoning

    Because their work seeks to put the human in the training loop, the researchers tested their technique with human users. They first conducted a study in which they asked people if counterfactual explanations helped them identify elements that could be changed without affecting the task.

    “It was so clear right off the bat. Humans are so good at this type of counterfactual reasoning. And this counterfactual step is what allows human reasoning to be translated into robot reasoning in a way that makes sense,” she says.

    Then they applied their framework to three simulations where robots were tasked with: navigating to a goal object, picking up a key and unlocking a door, and picking up a desired object then placing it on a tabletop. In each instance, their method enabled the robot to learn faster than with other techniques, while requiring fewer demonstrations from users.

    Moving forward, the researchers hope to test this framework on real robots. They also want to focus on reducing the time it takes the system to create new data using generative machine-learning models.

    “We want robots to do what humans do, and we want them to do it in a semantically meaningful way. Humans tend to operate in this abstract space, where they don’t think about every single property in an image. At the end of the day, this is really about enabling a robot to learn a good, human-like representation at an abstract level,” Peng says.

    This research is supported, in part, by a National Science Foundation Graduate Research Fellowship, Open Philanthropy, an Apple AI/ML Fellowship, Hyundai Motor Corporation, the MIT-IBM Watson AI Lab, and the National Science Foundation Institute for Artificial Intelligence and Fundamental Interactions. More

  • in

    System tracks movement of food through global humanitarian supply chain

    Although more than enough food is produced to feed everyone in the world, as many as 828 million people face hunger today. Poverty, social inequity, climate change, natural disasters, and political conflicts all contribute to inhibiting access to food. For decades, the U.S. Agency for International Development (USAID) Bureau for Humanitarian Assistance (BHA) has been a leader in global food assistance, supplying millions of metric tons of food to recipients worldwide. Alleviating hunger — and the conflict and instability hunger causes — is critical to U.S. national security.

    But BHA is only one player within a large, complex supply chain in which food gets handed off between more than 100 partner organizations before reaching its final destination. Traditionally, the movement of food through the supply chain has been a black-box operation, with stakeholders largely out of the loop about what happens to the food once it leaves their custody. This lack of direct visibility into operations is due to siloed data repositories, insufficient data sharing among stakeholders, and different data formats that operators must manually sort through and standardize. As a result, accurate, real-time information — such as where food shipments are at any given time, which shipments are affected by delays or food recalls, and when shipments have arrived at their final destination — is lacking. A centralized system capable of tracing food along its entire journey, from manufacture through delivery, would enable a more effective humanitarian response to food-aid needs.

    In 2020, a team from MIT Lincoln Laboratory began engaging with BHA to create an intelligent dashboard for their supply-chain operations. This dashboard brings together the expansive food-aid datasets from BHA’s existing systems into a single platform, with tools for visualizing and analyzing the data. When the team started developing the dashboard, they quickly realized the need for considerably more data than BHA had access to.

    “That’s where traceability comes in, with each handoff partner contributing key pieces of information as food moves through the supply chain,” explains Megan Richardson, a researcher in the laboratory’s Humanitarian Assistance and Disaster Relief Systems Group.

    Richardson and the rest of the team have been working with BHA and their partners to scope, build, and implement such an end-to-end traceability system. This system consists of serialized, unique identifiers (IDs) — akin to fingerprints — that are assigned to individual food items at the time they are produced. These individual IDs remain linked to items as they are aggregated along the supply chain, first domestically and then internationally. For example, individually tagged cans of vegetable oil get packaged into cartons; cartons are placed onto pallets and transported via railway and truck to warehouses; pallets are loaded onto shipping containers at U.S. ports; and pallets are unloaded and cartons are unpackaged overseas.

    With a trace

    Today, visibility at the single-item level doesn’t exist. Most suppliers mark pallets with a lot number (a lot is a batch of items produced in the same run), but this is for internal purposes (i.e., to track issues stemming back to their production supply, like over-enriched ingredients or machinery malfunction), not data sharing. So, organizations know which supplier lot a pallet and carton are associated with, but they can’t track the unique history of an individual carton or item within that pallet. As the lots move further downstream toward their final destination, they are often mixed with lots from other productions, and possibly other commodity types altogether, because of space constraints. On the international side, such mixing and the lack of granularity make it difficult to quickly pull commodities out of the supply chain if food safety concerns arise. Current response times can span several months.

    “Commodities are grouped differently at different stages of the supply chain, so it is logical to track them in those groupings where needed,” Richardson says. “Our item-level granularity serves as a form of Rosetta Stone to enable stakeholders to efficiently communicate throughout these stages. We’re trying to enable a way to track not only the movement of commodities, including through their lot information, but also any problems arising independent of lot, like exposure to high humidity levels in a warehouse. Right now, we have no way to associate commodities with histories that may have resulted in an issue.”

    “You can now track your checked luggage across the world and the fish on your dinner plate,” adds Brice MacLaren, also a researcher in the laboratory’s Humanitarian Assistance and Disaster Relief Systems Group. “So, this technology isn’t new, but it’s new to BHA as they evolve their methodology for commodity tracing. The traceability system needs to be versatile, working across a wide variety of operators who take custody of the commodity along the supply chain and fitting into their existing best practices.”

    As food products make their way through the supply chain, operators at each receiving point would be able to scan these IDs via a Lincoln Laboratory-developed mobile application (app) to indicate a product’s current location and transaction status — for example, that it is en route on a particular shipping container or stored in a certain warehouse. This information would get uploaded to a secure traceability server. By scanning a product, operators would also see its history up until that point.   

    Hitting the mark

    At the laboratory, the team tested the feasibility of their traceability technology, exploring different ways to mark and scan items. In their testing, they considered barcodes and radio-frequency identification (RFID) tags and handheld and fixed scanners. Their analysis revealed 2D barcodes (specifically data matrices) and smartphone-based scanners were the most feasible options in terms of how the technology works and how it fits into existing operations and infrastructure.

    “We needed to come up with a solution that would be practical and sustainable in the field,” MacLaren says. “While scanners can automatically read any RFID tags in close proximity as someone is walking by, they can’t discriminate exactly where the tags are coming from. RFID is expensive, and it’s hard to read commodities in bulk. On the other hand, a phone can scan a barcode on a particular box and tell you that code goes with that box. The challenge then becomes figuring out how to present the codes for people to easily scan without significantly interrupting their usual processes for handling and moving commodities.” 

    As the team learned from partner representatives in Kenya and Djibouti, offloading at the ports is a chaotic, fast operation. At manual warehouses, porters fling bags over their shoulders or stack cartons atop their heads any which way they can and run them to a drop point; at bagging terminals, commodities come down a conveyor belt and land this way or that way. With this variability comes several questions: How many barcodes do you need on an item? Where should they be placed? What size should they be? What will they cost? The laboratory team is considering these questions, keeping in mind that the answers will vary depending on the type of commodity; vegetable oil cartons will have different specifications than, say, 50-kilogram bags of wheat or peas.

    Leaving a mark

    Leveraging results from their testing and insights from international partners, the team has been running a traceability pilot evaluating how their proposed system meshes with real-world domestic and international operations. The current pilot features a domestic component in Houston, Texas, and an international component in Ethiopia, and focuses on tracking individual cartons of vegetable oil and identifying damaged cans. The Ethiopian team with Catholic Relief Services recently received a container filled with pallets of uniquely barcoded cartons of vegetable oil cans (in the next pilot, the cans will be barcoded, too). They are now scanning items and collecting data on product damage by using smartphones with the laboratory-developed mobile traceability app on which they were trained. 

    “The partners in Ethiopia are comparing a couple lid types to determine whether some are more resilient than others,” Richardson says. “With the app — which is designed to scan commodities, collect transaction data, and keep history — the partners can take pictures of damaged cans and see if a trend with the lid type emerges.”

    Next, the team will run a series of pilots with the World Food Program (WFP), the world’s largest humanitarian organization. The first pilot will focus on data connectivity and interoperability, and the team will engage with suppliers to directly print barcodes on individual commodities instead of applying barcode labels to packaging, as they did in the initial feasibility testing. The WFP will provide input on which of their operations are best suited for testing the traceability system, considering factors like the network bandwidth of WFP staff and local partners, the commodity types being distributed, and the country context for scanning. The BHA will likely also prioritize locations for system testing.

    “Our goal is to provide an infrastructure to enable as close to real-time data exchange as possible between all parties, given intermittent power and connectivity in these environments,” MacLaren says.

    In subsequent pilots, the team will try to integrate their approach with existing systems that partners rely on for tracking procurements, inventory, and movement of commodities under their custody so that this information is automatically pushed to the traceability server. The team also hopes to add a capability for real-time alerting of statuses, like the departure and arrival of commodities at a port or the exposure of unclaimed commodities to the elements. Real-time alerts would enable stakeholders to more efficiently respond to food-safety events. Currently, partners are forced to take a conservative approach, pulling out more commodities from the supply chain than are actually suspect, to reduce risk of harm. Both BHA and WHP are interested in testing out a food-safety event during one of the pilots to see how the traceability system works in enabling rapid communication response.

    To implement this technology at scale will require some standardization for marking different commodity types as well as give and take among the partners on best practices for handling commodities. It will also require an understanding of country regulations and partner interactions with subcontractors, government entities, and other stakeholders.

    “Within several years, I think it’s possible for BHA to use our system to mark and trace all their food procured in the United States and sent internationally,” MacLaren says.

    Once collected, the trove of traceability data could be harnessed for other purposes, among them analyzing historical trends, predicting future demand, and assessing the carbon footprint of commodity transport. In the future, a similar traceability system could scale for nonfood items, including medical supplies distributed to disaster victims, resources like generators and water trucks localized in emergency-response scenarios, and vaccines administered during pandemics. Several groups at the laboratory are also interested in such a system to track items such as tools deployed in space or equipment people carry through different operational environments.

    “When we first started this program, colleagues were asking why the laboratory was involved in simple tasks like making a dashboard, marking items with barcodes, and using hand scanners,” MacLaren says. “Our impact here isn’t about the technology; it’s about providing a strategy for coordinated food-aid response and successfully implementing that strategy. Most importantly, it’s about people getting fed.” More

  • in

    A new way to look at data privacy

    Imagine that a team of scientists has developed a machine-learning model that can predict whether a patient has cancer from lung scan images. They want to share this model with hospitals around the world so clinicians can start using it in diagnosis.

    But there’s a problem. To teach their model how to predict cancer, they showed it millions of real lung scan images, a process called training. Those sensitive data, which are now encoded into the inner workings of the model, could potentially be extracted by a malicious agent. The scientists can prevent this by adding noise, or more generic randomness, to the model that makes it harder for an adversary to guess the original data. However, perturbation reduces a model’s accuracy, so the less noise one can add, the better.

    MIT researchers have developed a technique that enables the user to potentially add the smallest amount of noise possible, while still ensuring the sensitive data are protected.

    The researchers created a new privacy metric, which they call Probably Approximately Correct (PAC) Privacy, and built a framework based on this metric that can automatically determine the minimal amount of noise that needs to be added. Moreover, this framework does not need knowledge of the inner workings of a model or its training process, which makes it easier to use for different types of models and applications.

    In several cases, the researchers show that the amount of noise required to protect sensitive data from adversaries is far less with PAC Privacy than with other approaches. This could help engineers create machine-learning models that provably hide training data, while maintaining accuracy in real-world settings.

    “PAC Privacy exploits the uncertainty or entropy of the sensitive data in a meaningful way,  and this allows us to add, in many cases, an order of magnitude less noise. This framework allows us to understand the characteristics of arbitrary data processing and privatize it automatically without artificial modifications. While we are in the early days and we are doing simple examples, we are excited about the promise of this technique,” says Srini Devadas, the Edwin Sibley Webster Professor of Electrical Engineering and co-author of a new paper on PAC Privacy.

    Devadas wrote the paper with lead author Hanshen Xiao, an electrical engineering and computer science graduate student. The research will be presented at the International Cryptography Conference (Crypto 2023).

    Defining privacy

    A fundamental question in data privacy is: How much sensitive data could an adversary recover from a machine-learning model with noise added to it?

    Differential Privacy, one popular privacy definition, says privacy is achieved if an adversary who observes the released model cannot infer whether an arbitrary individual’s data is used for the training processing. But provably preventing an adversary from distinguishing data usage often requires large amounts of noise to obscure it. This noise reduces the model’s accuracy.

    PAC Privacy looks at the problem a bit differently. It characterizes how hard it would be for an adversary to reconstruct any part of randomly sampled or generated sensitive data after noise has been added, rather than only focusing on the distinguishability problem.

    For instance, if the sensitive data are images of human faces, differential privacy would focus on whether the adversary can tell if someone’s face was in the dataset. PAC Privacy, on the other hand, could look at whether an adversary could extract a silhouette — an approximation — that someone could recognize as a particular individual’s face.

    Once they established the definition of PAC Privacy, the researchers created an algorithm that automatically tells the user how much noise to add to a model to prevent an adversary from confidently reconstructing a close approximation of the sensitive data. This algorithm guarantees privacy even if the adversary has infinite computing power, Xiao says.

    To find the optimal amount of noise, the PAC Privacy algorithm relies on the uncertainty, or entropy, in the original data from the viewpoint of the adversary.

    This automatic technique takes samples randomly from a data distribution or a large data pool and runs the user’s machine-learning training algorithm on that subsampled data to produce an output learned model. It does this many times on different subsamplings and compares the variance across all outputs. This variance determines how much noise one must add — a smaller variance means less noise is needed.

    Algorithm advantages

    Different from other privacy approaches, the PAC Privacy algorithm does not need knowledge of the inner workings of a model, or the training process.

    When implementing PAC Privacy, a user can specify their desired level of confidence at the outset. For instance, perhaps the user wants a guarantee that an adversary will not be more than 1 percent confident that they have successfully reconstructed the sensitive data to within 5 percent of its actual value. The PAC Privacy algorithm automatically tells the user the optimal amount of noise that needs to be added to the output model before it is shared publicly, in order to achieve those goals.

    “The noise is optimal, in the sense that if you add less than we tell you, all bets could be off. But the effect of adding noise to neural network parameters is complicated, and we are making no promises on the utility drop the model may experience with the added noise,” Xiao says.

    This points to one limitation of PAC Privacy — the technique does not tell the user how much accuracy the model will lose once the noise is added. PAC Privacy also involves repeatedly training a machine-learning model on many subsamplings of data, so it can be computationally expensive.  

    To improve PAC Privacy, one approach is to modify a user’s machine-learning training process so it is more stable, meaning that the output model it produces does not change very much when the input data is subsampled from a data pool.  This stability would create smaller variances between subsample outputs, so not only would the PAC Privacy algorithm need to be run fewer times to identify the optimal amount of noise, but it would also need to add less noise.

    An added benefit of stabler models is that they often have less generalization error, which means they can make more accurate predictions on previously unseen data, a win-win situation between machine learning and privacy, Devadas adds.

    “In the next few years, we would love to look a little deeper into this relationship between stability and privacy, and the relationship between privacy and generalization error. We are knocking on a door here, but it is not clear yet where the door leads,” he says.

    “Obfuscating the usage of an individual’s data in a model is paramount to protecting their privacy. However, to do so can come at the cost of the datas’ and therefore model’s utility,” says Jeremy Goodsitt, senior machine learning engineer at Capital One, who was not involved with this research. “PAC provides an empirical, black-box solution, which can reduce the added noise compared to current practices while maintaining equivalent privacy guarantees. In addition, its empirical approach broadens its reach to more data consuming applications.”

    This research is funded, in part, by DSTA Singapore, Cisco Systems, Capital One, and a MathWorks Fellowship. More