More stories

  • in

    How machine learning models can amplify inequities in medical diagnosis and treatment

    Prior to receiving a PhD in computer science from MIT in 2017, Marzyeh Ghassemi had already begun to wonder whether the use of AI techniques might enhance the biases that already existed in health care. She was one of the early researchers to take up this issue, and she’s been exploring it ever since. In a new paper, Ghassemi, now an assistant professor in MIT’s Department of Electrical Science and Engineering (EECS), and three collaborators based at the Computer Science and Artificial Intelligence Laboratory, have probed the roots of the disparities that can arise in machine learning, often causing models that perform well overall to falter when it comes to subgroups for which relatively few data have been collected and utilized in the training process. The paper — written by two MIT PhD students, Yuzhe Yang and Haoran Zhang, EECS computer scientist Dina Katabi (the Thuan and Nicole Pham Professor), and Ghassemi — was presented last month at the 40th International Conference on Machine Learning in Honolulu, Hawaii.

    In their analysis, the researchers focused on “subpopulation shifts” — differences in the way machine learning models perform for one subgroup as compared to another. “We want the models to be fair and work equally well for all groups, but instead we consistently observe the presence of shifts among different groups that can lead to inferior medical diagnosis and treatment,” says Yang, who along with Zhang are the two lead authors on the paper. The main point of their inquiry is to determine the kinds of subpopulation shifts that can occur and to uncover the mechanisms behind them so that, ultimately, more equitable models can be developed.

    The new paper “significantly advances our understanding” of the subpopulation shift phenomenon, claims Stanford University computer scientist Sanmi Koyejo. “This research contributes valuable insights for future advancements in machine learning models’ performance on underrepresented subgroups.”

    Camels and cattle

    The MIT group has identified four principal types of shifts — spurious correlations, attribute imbalance, class imbalance, and attribute generalization — which, according to Yang, “have never been put together into a coherent and unified framework. We’ve come up with a single equation that shows you where biases can come from.”

    Biases can, in fact, stem from what the researchers call the class, or from the attribute, or both. To pick a simple example, suppose the task assigned to the machine learning model is to sort images of objects — animals in this case — into two classes: cows and camels. Attributes are descriptors that don’t specifically relate to the class itself. It might turn out, for instance, that all the images used in the analysis show cows standing on grass and camels on sand — grass and sand serving as the attributes here. Given the data available to it, the machine could reach an erroneous conclusion — namely that cows can only be found on grass, not on sand, with the opposite being true for camels. Such a finding would be incorrect, however, giving rise to a spurious correlation, which, Yang explains, is a “special case” among subpopulation shifts — “one in which you have a bias in both the class and the attribute.”

    In a medical setting, one could rely on machine learning models to determine whether a person has pneumonia or not based on an examination of X-ray images. There would be two classes in this situation, one consisting of people who have the lung ailment, another for those who are infection-free. A relatively straightforward case would involve just two attributes: the people getting X-rayed are either female or male. If, in this particular dataset, there were 100 males diagnosed with pneumonia for every one female diagnosed with pneumonia, that could lead to an attribute imbalance, and the model would likely do a better job of correctly detecting pneumonia for a man than for a woman. Similarly, having 1,000 times more healthy (pneumonia-free) subjects than sick ones would lead to a class imbalance, with the model biased toward healthy cases. Attribute generalization is the last shift highlighted in the new study. If your sample contained 100 male patients with pneumonia and zero female subjects with the same illness, you still would like the model to be able to generalize and make predictions about female subjects even though there are no samples in the training data for females with pneumonia.

    The team then took 20 advanced algorithms, designed to carry out classification tasks, and tested them on a dozen datasets to see how they performed across different population groups. They reached some unexpected conclusions: By improving the “classifier,” which is the last layer of the neural network, they were able to reduce the occurrence of spurious correlations and class imbalance, but the other shifts were unaffected. Improvements to the “encoder,” one of the uppermost layers in the neural network, could reduce the problem of attribute imbalance. “However, no matter what we did to the encoder or classifier, we did not see any improvements in terms of attribute generalization,” Yang says, “and we don’t yet know how to address that.”

    Precisely accurate

    There is also the question of assessing how well your model actually works in terms of evenhandedness among different population groups. The metric normally used, called worst-group accuracy or WGA, is based on the assumption that if you can improve the accuracy — of, say, medical diagnosis — for the group that has the worst model performance, you would have improved the model as a whole. “The WGA is considered the gold standard in subpopulation evaluation,” the authors contend, but they made a surprising discovery: boosting worst-group accuracy results in a decrease in what they call “worst-case precision.” In medical decision-making of all sorts, one needs both accuracy — which speaks to the validity of the findings — and precision, which relates to the reliability of the methodology. “Precision and accuracy are both very important metrics in classification tasks, and that is especially true in medical diagnostics,” Yang explains. “You should never trade precision for accuracy. You always need to balance the two.”

    The MIT scientists are putting their theories into practice. In a study they’re conducting with a medical center, they’re looking at public datasets for tens of thousands of patients and hundreds of thousands of chest X-rays, trying to see whether it’s possible for machine learning models to work in an unbiased manner for all populations. That’s still far from the case, even though more awareness has been drawn to this problem, Yang says. “We are finding many disparities across different ages, gender, ethnicity, and intersectional groups.”

    He and his colleagues agree on the eventual goal, which is to achieve fairness in health care among all populations. But before we can reach that point, they maintain, we still need a better understanding of the sources of unfairness and how they permeate our current system. Reforming the system as a whole will not be easy, they acknowledge. In fact, the title of the paper they introduced at the Honolulu conference, “Change is Hard,” gives some indications as to the challenges that they and like-minded researchers face. More

  • in

    A simpler method for learning to control a robot

    Researchers from MIT and Stanford University have devised a new machine-learning approach that could be used to control a robot, such as a drone or autonomous vehicle, more effectively and efficiently in dynamic environments where conditions can change rapidly.

    This technique could help an autonomous vehicle learn to compensate for slippery road conditions to avoid going into a skid, allow a robotic free-flyer to tow different objects in space, or enable a drone to closely follow a downhill skier despite being buffeted by strong winds.

    The researchers’ approach incorporates certain structure from control theory into the process for learning a model in such a way that leads to an effective method of controlling complex dynamics, such as those caused by impacts of wind on the trajectory of a flying vehicle. One way to think about this structure is as a hint that can help guide how to control a system.

    “The focus of our work is to learn intrinsic structure in the dynamics of the system that can be leveraged to design more effective, stabilizing controllers,” says Navid Azizan, the Esther and Harold E. Edgerton Assistant Professor in the MIT Department of Mechanical Engineering and the Institute for Data, Systems, and Society (IDSS), and a member of the Laboratory for Information and Decision Systems (LIDS). “By jointly learning the system’s dynamics and these unique control-oriented structures from data, we’re able to naturally create controllers that function much more effectively in the real world.”

    Using this structure in a learned model, the researchers’ technique immediately extracts an effective controller from the model, as opposed to other machine-learning methods that require a controller to be derived or learned separately with additional steps. With this structure, their approach is also able to learn an effective controller using fewer data than other approaches. This could help their learning-based control system achieve better performance faster in rapidly changing environments.

    “This work tries to strike a balance between identifying structure in your system and just learning a model from data,” says lead author Spencer M. Richards, a graduate student at Stanford University. “Our approach is inspired by how roboticists use physics to derive simpler models for robots. Physical analysis of these models often yields a useful structure for the purposes of control — one that you might miss if you just tried to naively fit a model to data. Instead, we try to identify similarly useful structure from data that indicates how to implement your control logic.”

    Additional authors of the paper are Jean-Jacques Slotine, professor of mechanical engineering and of brain and cognitive sciences at MIT, and Marco Pavone, associate professor of aeronautics and astronautics at Stanford. The research will be presented at the International Conference on Machine Learning (ICML).

    Learning a controller

    Determining the best way to control a robot to accomplish a given task can be a difficult problem, even when researchers know how to model everything about the system.

    A controller is the logic that enables a drone to follow a desired trajectory, for example. This controller would tell the drone how to adjust its rotor forces to compensate for the effect of winds that can knock it off a stable path to reach its goal.

    This drone is a dynamical system — a physical system that evolves over time. In this case, its position and velocity change as it flies through the environment. If such a system is simple enough, engineers can derive a controller by hand. 

    Modeling a system by hand intrinsically captures a certain structure based on the physics of the system. For instance, if a robot were modeled manually using differential equations, these would capture the relationship between velocity, acceleration, and force. Acceleration is the rate of change in velocity over time, which is determined by the mass of and forces applied to the robot.

    But often the system is too complex to be exactly modeled by hand. Aerodynamic effects, like the way swirling wind pushes a flying vehicle, are notoriously difficult to derive manually, Richards explains. Researchers would instead take measurements of the drone’s position, velocity, and rotor speeds over time, and use machine learning to fit a model of this dynamical system to the data. But these approaches typically don’t learn a control-based structure. This structure is useful in determining how to best set the rotor speeds to direct the motion of the drone over time.

    Once they have modeled the dynamical system, many existing approaches also use data to learn a separate controller for the system.

    “Other approaches that try to learn dynamics and a controller from data as separate entities are a bit detached philosophically from the way we normally do it for simpler systems. Our approach is more reminiscent of deriving models by hand from physics and linking that to control,” Richards says.

    Identifying structure

    The team from MIT and Stanford developed a technique that uses machine learning to learn the dynamics model, but in such a way that the model has some prescribed structure that is useful for controlling the system.

    With this structure, they can extract a controller directly from the dynamics model, rather than using data to learn an entirely separate model for the controller.

    “We found that beyond learning the dynamics, it’s also essential to learn the control-oriented structure that supports effective controller design. Our approach of learning state-dependent coefficient factorizations of the dynamics has outperformed the baselines in terms of data efficiency and tracking capability, proving to be successful in efficiently and effectively controlling the system’s trajectory,” Azizan says. 

    When they tested this approach, their controller closely followed desired trajectories, outpacing all the baseline methods. The controller extracted from their learned model nearly matched the performance of a ground-truth controller, which is built using the exact dynamics of the system.

    “By making simpler assumptions, we got something that actually worked better than other complicated baseline approaches,” Richards adds.

    The researchers also found that their method was data-efficient, which means it achieved high performance even with few data. For instance, it could effectively model a highly dynamic rotor-driven vehicle using only 100 data points. Methods that used multiple learned components saw their performance drop much faster with smaller datasets.

    This efficiency could make their technique especially useful in situations where a drone or robot needs to learn quickly in rapidly changing conditions.

    Plus, their approach is general and could be applied to many types of dynamical systems, from robotic arms to free-flying spacecraft operating in low-gravity environments.

    In the future, the researchers are interested in developing models that are more physically interpretable, and that would be able to identify very specific information about a dynamical system, Richards says. This could lead to better-performing controllers.

    “Despite its ubiquity and importance, nonlinear feedback control remains an art, making it especially suitable for data-driven and learning-based methods. This paper makes a significant contribution to this area by proposing a method that jointly learns system dynamics, a controller, and control-oriented structure,” says Nikolai Matni, an assistant professor in the Department of Electrical and Systems Engineering at the University of Pennsylvania, who was not involved with this work. “What I found particularly exciting and compelling was the integration of these components into a joint learning algorithm, such that control-oriented structure acts as an inductive bias in the learning process. The result is a data-efficient learning process that outputs dynamic models that enjoy intrinsic structure that enables effective, stable, and robust control. While the technical contributions of the paper are excellent themselves, it is this conceptual contribution that I view as most exciting and significant.”

    This research is supported, in part, by the NASA University Leadership Initiative and the Natural Sciences and Engineering Research Council of Canada. More

  • in

    A faster way to teach a robot

    Imagine purchasing a robot to perform household tasks. This robot was built and trained in a factory on a certain set of tasks and has never seen the items in your home. When you ask it to pick up a mug from your kitchen table, it might not recognize your mug (perhaps because this mug is painted with an unusual image, say, of MIT’s mascot, Tim the Beaver). So, the robot fails.

    “Right now, the way we train these robots, when they fail, we don’t really know why. So you would just throw up your hands and say, ‘OK, I guess we have to start over.’ A critical component that is missing from this system is enabling the robot to demonstrate why it is failing so the user can give it feedback,” says Andi Peng, an electrical engineering and computer science (EECS) graduate student at MIT.

    Peng and her collaborators at MIT, New York University, and the University of California at Berkeley created a framework that enables humans to quickly teach a robot what they want it to do, with a minimal amount of effort.

    When a robot fails, the system uses an algorithm to generate counterfactual explanations that describe what needed to change for the robot to succeed. For instance, maybe the robot would have been able to pick up the mug if the mug were a certain color. It shows these counterfactuals to the human and asks for feedback on why the robot failed. Then the system utilizes this feedback and the counterfactual explanations to generate new data it uses to fine-tune the robot.

    Fine-tuning involves tweaking a machine-learning model that has already been trained to perform one task, so it can perform a second, similar task.

    The researchers tested this technique in simulations and found that it could teach a robot more efficiently than other methods. The robots trained with this framework performed better, while the training process consumed less of a human’s time.

    This framework could help robots learn faster in new environments without requiring a user to have technical knowledge. In the long run, this could be a step toward enabling general-purpose robots to efficiently perform daily tasks for the elderly or individuals with disabilities in a variety of settings.

    Peng, the lead author, is joined by co-authors Aviv Netanyahu, an EECS graduate student; Mark Ho, an assistant professor at the Stevens Institute of Technology; Tianmin Shu, an MIT postdoc; Andreea Bobu, a graduate student at UC Berkeley; and senior authors Julie Shah, an MIT professor of aeronautics and astronautics and the director of the Interactive Robotics Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Pulkit Agrawal, a professor in CSAIL. The research will be presented at the International Conference on Machine Learning.

    On-the-job training

    Robots often fail due to distribution shift — the robot is presented with objects and spaces it did not see during training, and it doesn’t understand what to do in this new environment.

    One way to retrain a robot for a specific task is imitation learning. The user could demonstrate the correct task to teach the robot what to do. If a user tries to teach a robot to pick up a mug, but demonstrates with a white mug, the robot could learn that all mugs are white. It may then fail to pick up a red, blue, or “Tim-the-Beaver-brown” mug.

    Training a robot to recognize that a mug is a mug, regardless of its color, could take thousands of demonstrations.

    “I don’t want to have to demonstrate with 30,000 mugs. I want to demonstrate with just one mug. But then I need to teach the robot so it recognizes that it can pick up a mug of any color,” Peng says.

    To accomplish this, the researchers’ system determines what specific object the user cares about (a mug) and what elements aren’t important for the task (perhaps the color of the mug doesn’t matter). It uses this information to generate new, synthetic data by changing these “unimportant” visual concepts. This process is known as data augmentation.

    The framework has three steps. First, it shows the task that caused the robot to fail. Then it collects a demonstration from the user of the desired actions and generates counterfactuals by searching over all features in the space that show what needed to change for the robot to succeed.

    The system shows these counterfactuals to the user and asks for feedback to determine which visual concepts do not impact the desired action. Then it uses this human feedback to generate many new augmented demonstrations.

    In this way, the user could demonstrate picking up one mug, but the system would produce demonstrations showing the desired action with thousands of different mugs by altering the color. It uses these data to fine-tune the robot.

    Creating counterfactual explanations and soliciting feedback from the user are critical for the technique to succeed, Peng says.

    From human reasoning to robot reasoning

    Because their work seeks to put the human in the training loop, the researchers tested their technique with human users. They first conducted a study in which they asked people if counterfactual explanations helped them identify elements that could be changed without affecting the task.

    “It was so clear right off the bat. Humans are so good at this type of counterfactual reasoning. And this counterfactual step is what allows human reasoning to be translated into robot reasoning in a way that makes sense,” she says.

    Then they applied their framework to three simulations where robots were tasked with: navigating to a goal object, picking up a key and unlocking a door, and picking up a desired object then placing it on a tabletop. In each instance, their method enabled the robot to learn faster than with other techniques, while requiring fewer demonstrations from users.

    Moving forward, the researchers hope to test this framework on real robots. They also want to focus on reducing the time it takes the system to create new data using generative machine-learning models.

    “We want robots to do what humans do, and we want them to do it in a semantically meaningful way. Humans tend to operate in this abstract space, where they don’t think about every single property in an image. At the end of the day, this is really about enabling a robot to learn a good, human-like representation at an abstract level,” Peng says.

    This research is supported, in part, by a National Science Foundation Graduate Research Fellowship, Open Philanthropy, an Apple AI/ML Fellowship, Hyundai Motor Corporation, the MIT-IBM Watson AI Lab, and the National Science Foundation Institute for Artificial Intelligence and Fundamental Interactions. More

  • in

    A new way to look at data privacy

    Imagine that a team of scientists has developed a machine-learning model that can predict whether a patient has cancer from lung scan images. They want to share this model with hospitals around the world so clinicians can start using it in diagnosis.

    But there’s a problem. To teach their model how to predict cancer, they showed it millions of real lung scan images, a process called training. Those sensitive data, which are now encoded into the inner workings of the model, could potentially be extracted by a malicious agent. The scientists can prevent this by adding noise, or more generic randomness, to the model that makes it harder for an adversary to guess the original data. However, perturbation reduces a model’s accuracy, so the less noise one can add, the better.

    MIT researchers have developed a technique that enables the user to potentially add the smallest amount of noise possible, while still ensuring the sensitive data are protected.

    The researchers created a new privacy metric, which they call Probably Approximately Correct (PAC) Privacy, and built a framework based on this metric that can automatically determine the minimal amount of noise that needs to be added. Moreover, this framework does not need knowledge of the inner workings of a model or its training process, which makes it easier to use for different types of models and applications.

    In several cases, the researchers show that the amount of noise required to protect sensitive data from adversaries is far less with PAC Privacy than with other approaches. This could help engineers create machine-learning models that provably hide training data, while maintaining accuracy in real-world settings.

    “PAC Privacy exploits the uncertainty or entropy of the sensitive data in a meaningful way,  and this allows us to add, in many cases, an order of magnitude less noise. This framework allows us to understand the characteristics of arbitrary data processing and privatize it automatically without artificial modifications. While we are in the early days and we are doing simple examples, we are excited about the promise of this technique,” says Srini Devadas, the Edwin Sibley Webster Professor of Electrical Engineering and co-author of a new paper on PAC Privacy.

    Devadas wrote the paper with lead author Hanshen Xiao, an electrical engineering and computer science graduate student. The research will be presented at the International Cryptography Conference (Crypto 2023).

    Defining privacy

    A fundamental question in data privacy is: How much sensitive data could an adversary recover from a machine-learning model with noise added to it?

    Differential Privacy, one popular privacy definition, says privacy is achieved if an adversary who observes the released model cannot infer whether an arbitrary individual’s data is used for the training processing. But provably preventing an adversary from distinguishing data usage often requires large amounts of noise to obscure it. This noise reduces the model’s accuracy.

    PAC Privacy looks at the problem a bit differently. It characterizes how hard it would be for an adversary to reconstruct any part of randomly sampled or generated sensitive data after noise has been added, rather than only focusing on the distinguishability problem.

    For instance, if the sensitive data are images of human faces, differential privacy would focus on whether the adversary can tell if someone’s face was in the dataset. PAC Privacy, on the other hand, could look at whether an adversary could extract a silhouette — an approximation — that someone could recognize as a particular individual’s face.

    Once they established the definition of PAC Privacy, the researchers created an algorithm that automatically tells the user how much noise to add to a model to prevent an adversary from confidently reconstructing a close approximation of the sensitive data. This algorithm guarantees privacy even if the adversary has infinite computing power, Xiao says.

    To find the optimal amount of noise, the PAC Privacy algorithm relies on the uncertainty, or entropy, in the original data from the viewpoint of the adversary.

    This automatic technique takes samples randomly from a data distribution or a large data pool and runs the user’s machine-learning training algorithm on that subsampled data to produce an output learned model. It does this many times on different subsamplings and compares the variance across all outputs. This variance determines how much noise one must add — a smaller variance means less noise is needed.

    Algorithm advantages

    Different from other privacy approaches, the PAC Privacy algorithm does not need knowledge of the inner workings of a model, or the training process.

    When implementing PAC Privacy, a user can specify their desired level of confidence at the outset. For instance, perhaps the user wants a guarantee that an adversary will not be more than 1 percent confident that they have successfully reconstructed the sensitive data to within 5 percent of its actual value. The PAC Privacy algorithm automatically tells the user the optimal amount of noise that needs to be added to the output model before it is shared publicly, in order to achieve those goals.

    “The noise is optimal, in the sense that if you add less than we tell you, all bets could be off. But the effect of adding noise to neural network parameters is complicated, and we are making no promises on the utility drop the model may experience with the added noise,” Xiao says.

    This points to one limitation of PAC Privacy — the technique does not tell the user how much accuracy the model will lose once the noise is added. PAC Privacy also involves repeatedly training a machine-learning model on many subsamplings of data, so it can be computationally expensive.  

    To improve PAC Privacy, one approach is to modify a user’s machine-learning training process so it is more stable, meaning that the output model it produces does not change very much when the input data is subsampled from a data pool.  This stability would create smaller variances between subsample outputs, so not only would the PAC Privacy algorithm need to be run fewer times to identify the optimal amount of noise, but it would also need to add less noise.

    An added benefit of stabler models is that they often have less generalization error, which means they can make more accurate predictions on previously unseen data, a win-win situation between machine learning and privacy, Devadas adds.

    “In the next few years, we would love to look a little deeper into this relationship between stability and privacy, and the relationship between privacy and generalization error. We are knocking on a door here, but it is not clear yet where the door leads,” he says.

    “Obfuscating the usage of an individual’s data in a model is paramount to protecting their privacy. However, to do so can come at the cost of the datas’ and therefore model’s utility,” says Jeremy Goodsitt, senior machine learning engineer at Capital One, who was not involved with this research. “PAC provides an empirical, black-box solution, which can reduce the added noise compared to current practices while maintaining equivalent privacy guarantees. In addition, its empirical approach broadens its reach to more data consuming applications.”

    This research is funded, in part, by DSTA Singapore, Cisco Systems, Capital One, and a MathWorks Fellowship. More

  • in

    Learning the language of molecules to predict their properties

    Discovering new materials and drugs typically involves a manual, trial-and-error process that can take decades and cost millions of dollars. To streamline this process, scientists often use machine learning to predict molecular properties and narrow down the molecules they need to synthesize and test in the lab.

    Researchers from MIT and the MIT-Watson AI Lab have developed a new, unified framework that can simultaneously predict molecular properties and generate new molecules much more efficiently than these popular deep-learning approaches.

    To teach a machine-learning model to predict a molecule’s biological or mechanical properties, researchers must show it millions of labeled molecular structures — a process known as training. Due to the expense of discovering molecules and the challenges of hand-labeling millions of structures, large training datasets are often hard to come by, which limits the effectiveness of machine-learning approaches.

    By contrast, the system created by the MIT researchers can effectively predict molecular properties using only a small amount of data. Their system has an underlying understanding of the rules that dictate how building blocks combine to produce valid molecules. These rules capture the similarities between molecular structures, which helps the system generate new molecules and predict their properties in a data-efficient manner.

    This method outperformed other machine-learning approaches on both small and large datasets, and was able to accurately predict molecular properties and generate viable molecules when given a dataset with fewer than 100 samples.

    “Our goal with this project is to use some data-driven methods to speed up the discovery of new molecules, so you can train a model to do the prediction without all of these cost-heavy experiments,” says lead author Minghao Guo, a computer science and electrical engineering (EECS) graduate student.

    Guo’s co-authors include MIT-IBM Watson AI Lab research staff members Veronika Thost, Payel Das, and Jie Chen; recent MIT graduates Samuel Song ’23 and Adithya Balachandran ’23; and senior author Wojciech Matusik, a professor of electrical engineering and computer science and a member of the MIT-IBM Watson AI Lab, who leads the Computational Design and Fabrication Group within the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference for Machine Learning.

    Learning the language of molecules

    To achieve the best results with machine-learning models, scientists need training datasets with millions of molecules that have similar properties to those they hope to discover. In reality, these domain-specific datasets are usually very small. So, researchers use models that have been pretrained on large datasets of general molecules, which they apply to a much smaller, targeted dataset. However, because these models haven’t acquired much domain-specific knowledge, they tend to perform poorly.

    The MIT team took a different approach. They created a machine-learning system that automatically learns the “language” of molecules — what is known as a molecular grammar — using only a small, domain-specific dataset. It uses this grammar to construct viable molecules and predict their properties.

    In language theory, one generates words, sentences, or paragraphs based on a set of grammar rules. You can think of a molecular grammar the same way. It is a set of production rules that dictate how to generate molecules or polymers by combining atoms and substructures.

    Just like a language grammar, which can generate a plethora of sentences using the same rules, one molecular grammar can represent a vast number of molecules. Molecules with similar structures use the same grammar production rules, and the system learns to understand these similarities.

    Since structurally similar molecules often have similar properties, the system uses its underlying knowledge of molecular similarity to predict properties of new molecules more efficiently. 

    “Once we have this grammar as a representation for all the different molecules, we can use it to boost the process of property prediction,” Guo says.

    The system learns the production rules for a molecular grammar using reinforcement learning — a trial-and-error process where the model is rewarded for behavior that gets it closer to achieving a goal.

    But because there could be billions of ways to combine atoms and substructures, the process to learn grammar production rules would be too computationally expensive for anything but the tiniest dataset.

    The researchers decoupled the molecular grammar into two parts. The first part, called a metagrammar, is a general, widely applicable grammar they design manually and give the system at the outset. Then it only needs to learn a much smaller, molecule-specific grammar from the domain dataset. This hierarchical approach speeds up the learning process.

    Big results, small datasets

    In experiments, the researchers’ new system simultaneously generated viable molecules and polymers, and predicted their properties more accurately than several popular machine-learning approaches, even when the domain-specific datasets had only a few hundred samples. Some other methods also required a costly pretraining step that the new system avoids.

    The technique was especially effective at predicting physical properties of polymers, such as the glass transition temperature, which is the temperature required for a material to transition from solid to liquid. Obtaining this information manually is often extremely costly because the experiments require extremely high temperatures and pressures.

    To push their approach further, the researchers cut one training set down by more than half — to just 94 samples. Their model still achieved results that were on par with methods trained using the entire dataset.

    “This grammar-based representation is very powerful. And because the grammar itself is a very general representation, it can be deployed to different kinds of graph-form data. We are trying to identify other applications beyond chemistry or material science,” Guo says.

    In the future, they also want to extend their current molecular grammar to include the 3D geometry of molecules and polymers, which is key to understanding the interactions between polymer chains. They are also developing an interface that would show a user the learned grammar production rules and solicit feedback to correct rules that may be wrong, boosting the accuracy of the system.

    This work is funded, in part, by the MIT-IBM Watson AI Lab and its member company, Evonik. More

  • in

    Researchers teach an AI to write better chart captions

    Chart captions that explain complex trends and patterns are important for improving a reader’s ability to comprehend and retain the data being presented. And for people with visual disabilities, the information in a caption often provides their only means of understanding the chart.

    But writing effective, detailed captions is a labor-intensive process. While autocaptioning techniques can alleviate this burden, they often struggle to describe cognitive features that provide additional context.

    To help people author high-quality chart captions, MIT researchers have developed a dataset to improve automatic captioning systems. Using this tool, researchers could teach a machine-learning model to vary the level of complexity and type of content included in a chart caption based on the needs of users.

    The MIT researchers found that machine-learning models trained for autocaptioning with their dataset consistently generated captions that were precise, semantically rich, and described data trends and complex patterns. Quantitative and qualitative analyses revealed that their models captioned charts more effectively than other autocaptioning systems.  

    The team’s goal is to provide the dataset, called VisText, as a tool researchers can use as they work on the thorny problem of chart autocaptioning. These automatic systems could help provide captions for uncaptioned online charts and improve accessibility for people with visual disabilities, says co-lead author Angie Boggust, a graduate student in electrical engineering and computer science at MIT and member of the Visualization Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    “We’ve tried to embed a lot of human values into our dataset so that when we and other researchers are building automatic chart-captioning systems, we don’t end up with models that aren’t what people want or need,” she says.

    Boggust is joined on the paper by co-lead author and fellow graduate student Benny J. Tang and senior author Arvind Satyanarayan, associate professor of computer science at MIT who leads the Visualization Group in CSAIL. The research will be presented at the Annual Meeting of the Association for Computational Linguistics.

    Human-centered analysis

    The researchers were inspired to develop VisText from prior work in the Visualization Group that explored what makes a good chart caption. In that study, researchers found that sighted users and blind or low-vision users had different preferences for the complexity of semantic content in a caption. 

    The group wanted to bring that human-centered analysis into autocaptioning research. To do that, they developed VisText, a dataset of charts and associated captions that could be used to train machine-learning models to generate accurate, semantically rich, customizable captions.

    Developing effective autocaptioning systems is no easy task. Existing machine-learning methods often try to caption charts the way they would an image, but people and models interpret natural images differently from how we read charts. Other techniques skip the visual content entirely and caption a chart using its underlying data table. However, such data tables are often not available after charts are published.

    Given the shortfalls of using images and data tables, VisText also represents charts as scene graphs. Scene graphs, which can be extracted from a chart image, contain all the chart data but also include additional image context.

    “A scene graph is like the best of both worlds — it contains almost all the information present in an image while being easier to extract from images than data tables. As it’s also text, we can leverage advances in modern large language models for captioning,” Tang explains.

    They compiled a dataset that contains more than 12,000 charts — each represented as a data table, image, and scene graph — as well as associated captions. Each chart has two separate captions: a low-level caption that describes the chart’s construction (like its axis ranges) and a higher-level caption that describes statistics, relationships in the data, and complex trends.

    The researchers generated low-level captions using an automated system and crowdsourced higher-level captions from human workers.

    “Our captions were informed by two key pieces of prior research: existing guidelines on accessible descriptions of visual media and a conceptual model from our group for categorizing semantic content. This ensured that our captions featured important low-level chart elements like axes, scales, and units for readers with visual disabilities, while retaining human variability in how captions can be written,” says Tang.

    Translating charts

    Once they had gathered chart images and captions, the researchers used VisText to train five machine-learning models for autocaptioning. They wanted to see how each representation — image, data table, and scene graph — and combinations of the representations affected the quality of the caption.

    “You can think about a chart captioning model like a model for language translation. But instead of saying, translate this German text to English, we are saying translate this ‘chart language’ to English,” Boggust says.

    Their results showed that models trained with scene graphs performed as well or better than those trained using data tables. Since scene graphs are easier to extract from existing charts, the researchers argue that they might be a more useful representation.

    They also trained models with low-level and high-level captions separately. This technique, known as semantic prefix tuning, enabled them to teach the model to vary the complexity of the caption’s content.

    In addition, they conducted a qualitative examination of captions produced by their best-performing method and categorized six types of common errors. For instance, a directional error occurs if a model says a trend is decreasing when it is actually increasing.

    This fine-grained, robust qualitative evaluation was important for understanding how the model was making its errors. For example, using quantitative methods, a directional error might incur the same penalty as a repetition error, where the model repeats the same word or phrase. But a directional error could be more misleading to a user than a repetition error. The qualitative analysis helped them understand these types of subtleties, Boggust says.

    These sorts of errors also expose limitations of current models and raise ethical considerations that researchers must consider as they work to develop autocaptioning systems, she adds.

    Generative machine-learning models, such as those that power ChatGPT, have been shown to hallucinate or give incorrect information that can be misleading. While there is a clear benefit to using these models for autocaptioning existing charts, it could lead to the spread of misinformation if charts are captioned incorrectly.

    “Maybe this means that we don’t just caption everything in sight with AI. Instead, perhaps we provide these autocaptioning systems as authorship tools for people to edit. It is important to think about these ethical implications throughout the research process, not just at the end when we have a model to deploy,” she says.

    Boggust, Tang, and their colleagues want to continue optimizing the models to reduce some common errors. They also want to expand the VisText dataset to include more charts, and more complex charts, such as those with stacked bars or multiple lines. And they would also like to gain insights into what these autocaptioning models are actually learning about chart data.

    This research was supported, in part, by a Google Research Scholar Award, the National Science Foundation, the MLA@CSAIL Initiative, and the United States Air Force Research Laboratory. More

  • in

    Day of AI curriculum meets the moment

    MIT Responsible AI for Social Empowerment and Education (RAISE) recently celebrated the second annual Day of AI with two flagship local events. The Edward M. Kennedy Institute for the U.S. Senate in Boston hosted a human rights and data policy-focused event that was streamed worldwide. Dearborn STEM Academy in Roxbury, Massachusetts, hosted a student workshop in collaboration with Amazon Future Engineer. With over 8,000 registrations across all 50 U.S. states and 108 countries in 2023, participation in Day of AI has more than doubled since its inaugural year.

    Day of AI is a free curriculum of lessons and hands-on activities designed to teach kids of all ages and backgrounds the basics and responsible use of artificial intelligence, designed by researchers at MIT RAISE. This year, resources were available for educators to run at any time and in any increments they chose. The curriculum included five new modules to address timely topics like ChatGPT in School, Teachable Machines, AI and Social Media, Data Science and Me, and more. A collaboration with the International Society for Technology in Education also introduced modules for early elementary students. Educators across the world shared photos, videos, and stories of their students’ engagement, expressing excitement and even relief over the accessible lessons.

    Professor Cynthia Breazeal, director of RAISE, dean for digital learning at MIT, and head of the MIT Media Lab’s Personal Robots research group, said, “It’s been a year of extraordinary advancements in AI, and with that comes necessary conversations and concerns about who and what this technology is for. With our Day of AI events, we want to celebrate the teachers and students who are putting in the work to make sure that AI is for everyone.”

    Reflecting community values and protecting digital citizens

    Play video

    On May 18, 2023, MIT RAISE hosted a global Day of AI celebration featuring a flagship local event focused on human rights and data policy at the Edward M. Kennedy Institute for the U.S. Senate. Students from the Warren Prescott Middle School and New Mission High School heard from speakers the City of Boston, Liberty Mutual, and MIT to discuss the many benefits and challenges of artificial intelligence education. Video: MIT Open Learning

    MIT President Sally Kornbluth welcomed students from Warren Prescott Middle School and New Mission High School to the Day of AI program at the Edward M. Kennedy Institute. Kornbluth reflected on the exciting potential of AI, along with the ethical considerations society needs to be responsible for.

    “AI has the potential to do all kinds of fantastic things, including driving a car, helping us with the climate crisis, improving health care, and designing apps that we can’t even imagine yet. But what we have to make sure it doesn’t do is cause harm to individuals, to communities, to us — society as a whole,” she said.

    This theme resonated with each of the event speakers, whose jobs spanned the sectors of education, government, and business. Yo Deshpande, technologist for the public realm, and Michael Lawrence Evans, program director of new urban mechanics from the Boston Mayor’s Office, shared how Boston thinks about using AI to improve city life in ways that are “equitable, accessible, and delightful.” Deshpande said, “We have the opportunity to explore not only how AI works, but how using AI can line up with our values, the way we want to be in the world, and the way we want to be in our community.”

    Adam L’Italien, chief innovation officer at Liberty Mutual Insurance (one of Day of AI’s founding sponsors), compared our present moment with AI technologies to the early days of personal computers and internet connection. “Exposure to emerging technologies can accelerate progress in the world and in your own lives,” L’Italien said, while recognizing that the AI development process needs to be inclusive and mitigate biases.

    Human policies for artificial intelligence

    So how does society address these human rights concerns about AI? Marc Aidinoff ’21, former White House Office of Science and Technology Policy chief of staff, led a discussion on how government policy can influence the parameters of how technology is developed and used, like the Blueprint for an AI Bill of Rights. Aidinoff said, “The work of building the world you want to see is far harder than building the technical AI system … How do you work with other people and create a collective vision for what we want to do?” Warren Prescott Middle School students described how AI could be used to solve problems that humans couldn’t. But they also shared their concerns that AI could affect data privacy, learning deficits, social media addiction, job displacement, and propaganda.

    In a mock U.S. Senate trial activity designed by Daniella DiPaola, PhD student at the MIT Media Lab, the middle schoolers investigated what rights might be undermined by AI in schools, hospitals, law enforcement, and corporations. Meanwhile, New Mission High School students workshopped the ideas behind bill S.2314, the Social Media Addiction Reduction Technology (SMART) Act, in an activity designed by Raechel Walker, graduate research assistant in the Personal Robots Group, and Matt Taylor, research assistant at the Media Lab. They discussed what level of control could or should be introduced at the parental, educational, and governmental levels to reduce the risks of internet addiction.

    “Alexa, how do I program AI?”

    Play video

    The 2023 Day of AI celebration featured a flagship local event at the Dearborn STEM Academy in Roxbury in collaboration with Amazon Future Engineer. Students participated in a hands-on activity using MIT App Inventor as part of Day of AI’s Alexa lesson. Video: MIT Open Learning

    At Dearborn STEM Academy, Amazon Future Engineer helped students work through the Intro to Voice AI curriculum module in real-time. Students used MIT App Inventor to code basic commands for Alexa. In an interview with WCVB, Principal Darlene Marcano said, “It’s important that we expose our students to as many different experiences as possible. The students that are participating are on track to be future computer scientists and engineers.”

    Breazeal told Dearborn students, “We want you to have an informed voice about how you want AI to be used in society. We want you to feel empowered that you can shape the world. You can make things with AI to help make a better world and a better community.”

    Rohit Prasad ’08, senior vice president and head scientist for Alexa at Amazon, and Victor Reinoso ’97, global director of philanthropic education initiatives at Amazon, also joined the event. “Amazon and MIT share a commitment to helping students discover a world of possibilities through STEM and AI education,” said Reinoso. “There’s a lot of current excitement around the technological revolution with generative AI and large language models, so we’re excited to help students explore careers of the future and navigate the pathways available to them.” To highlight their continued investment in the local community and the school program, Amazon donated a $25,000 Innovation and Early College Pathways Program Grant to the Boston Public School system.

    Day of AI down under

    Not only was the Day of AI program widely adopted across the globe, Australian educators were inspired to adapt their own regionally specific curriculum. An estimated 161,000 AI professionals will be needed in Australia by 2030, according to the National Artificial Intelligence Center in the Commonwealth Scientific and Industrial Research Organization (CSIRO), an Australian government agency and Day of AI Australia project partner. CSIRO worked with the University of New South Wales to develop supplementary educational resources on AI ethics and machine learning. Day of AI Australia reached 85,000 students at 400-plus secondary schools this year, sparking curiosity in the next generation of AI experts.

    The interest in AI is accelerating as fast as the technology is being developed. Day of AI offers a unique opportunity for K-12 students to shape our world’s digital future and their own.

    “I hope that some of you will decide to be part of this bigger effort to help us figure out the best possible answers to questions that are raised by AI,” Kornbluth told students at the Edward M. Kennedy Institute. “We’re counting on you, the next generation, to learn how AI works and help make sure it’s for everyone.” More

  • in

    MIT-Pillar AI Collective announces first seed grant recipients

    The MIT-Pillar AI Collective has announced its first six grant recipients. Students, alumni, and postdocs working on a broad range of topics in artificial intelligence, machine learning, and data science will receive funding and support for research projects that could translate into commercially viable products or companies. These grants are intended to help students explore commercial applications for their research, and eventually drive that commercialization through the creation of a startup.

    “These tremendous students and postdocs are working on projects that have the potential to be truly transformative across a diverse range of industries. It’s thrilling to think that the novel research these teams are conducting could lead to the founding of startups that revolutionize everything from drug delivery to video conferencing,” says Anantha Chandrakasan, dean of the School of Engineering and the Vannevar Bush Professor of Electrical Engineering and Computer Science.

    Launched in September 2022, the MIT-Pillar AI Collective is a pilot program funded by a $1 million gift from Pillar VC that aims to cultivate prospective entrepreneurs and drive innovation in areas related to AI. Administered by the MIT Deshpande Center for Technological Innovation, the AI Collective centers on the market discovery process, advancing projects through market research, customer discovery, and prototyping. Graduate students and postdocs supported by the program work toward the development of minimum viable products.

    “In addition to funding, the MIT-Pillar AI Collective provides grant recipients with mentorship and guidance. With the rapid advancement of AI technologies, this type of support is critical to ensure students and postdocs are able to access the resources required to move quickly in this fast-pace environment,” says Jinane Abounadi, managing director of the MIT-Pillar AI Collective.

    The six inaugural recipients will receive support in identifying key milestones and advice from experienced entrepreneurs. The AI Collective assists seed grant recipients in gathering feedback from potential end-users, as well as getting insights from early-stage investors. The program also organizes community events, including a “Founder Talks” speaker series, and other team-building activities.   

    “Each one of these grant recipients exhibits an entrepreneurial spirit. It is exciting to provide support and guidance as they start a journey that could one day see them as founders and leaders of successful companies,” adds Jamie Goldstein ’89, founder of Pillar VC.

    The first cohort of grant recipients include the following projects:

    Predictive query interface

    Abdullah Alomar SM ’21, a PhD candidate studying electrical engineering and computer science, is building a predictive query interface for time series databases to better forecast demand and financial data. This user-friendly interface can help alleviate some of the bottlenecks and issues related to unwieldy data engineering processes while providing state-of-the-art statistical accuracy. Alomar is advised by Devavrat Shah, the Andrew (1956) and Erna Viterbi Professor at MIT.

    Design of light-activated drugs

    Simon Axelrod, a PhD candidate studying chemical physics at Harvard University, is combining AI with physics simulations to design light-activated drugs that could reduce side effects and improve effectiveness. Patients would receive an inactive form of a drug, which is then activated by light in a specific area of the body containing diseased tissue. This localized use of photoactive drugs would minimize the side effects from drugs targeting healthy cells. Axelrod is developing novel computational models that predict properties of photoactive drugs with high speed and accuracy, allowing researchers to focus on only the highest-quality drug candidates. He is advised by Rafael Gomez-Bombarelli, the Jeffrey Cheah Career Development Chair in Engineering in the MIT Department of Materials Science and Engineering. 

    Low-cost 3D perception

    Arjun Balasingam, a PhD student in electrical engineering and computer science and a member of the Computer Science and Artificial Intelligence Laboratory’s (CSAIL) Networks and Mobile Systems group, is developing a technology, called MobiSee, that enables real-time 3D reconstruction in challenging dynamic environments. MobiSee uses self-supervised AI methods along with video and lidar to provide low-cost, state-of-the-art 3D perception on consumer mobile devices like smartphones. This technology could have far-reaching applications across mixed reality, navigation, safety, and sports streaming, in addition to unlocking opportunities for new real-time and immersive experiences. He is advised by Hari Balakrishnan, the Fujitsu Professor of Computer Science and Artificial Intelligence at MIT and member of CSAIL.

    Sleep therapeutics

    Guillermo Bernal SM ’14, PhD ’23, a recent PhD graduate in media arts and sciences, is developing a sleep therapeutic platform that would enable sleep specialists and researchers to conduct robust sleep studies and develop therapy plans remotely, while the patient is comfortable in their home. Called Fascia, the three-part system consists of a polysomnogram with a sleep mask form factor that collects data, a hub that enables researchers to provide stimulation and feedback via olfactory, auditory, and visual stimuli, and a web portal that enables researchers to read a patient’s signals in real time with machine learning analysis. Bernal was advised by Pattie Maes, professor of media arts and sciences at the MIT Media Lab.

    Autonomous manufacturing assembly with human-like tactile perception

    Michael Foshey, a mechanical engineer and project manager with MIT CSAIL’s Computational Design and Fabrication Group, is developing an AI-enabled tactile perception system that can be used to give robots human-like dexterity. With this new technology platform, Foshey and his team hope to enable industry-changing applications in manufacturing. Currently, assembly tasks in manufacturing are largely done by hand and are typically repetitive and tedious. As a result, these jobs are being largely left unfilled. These labor shortages can cause supply chain shortages and increases in the cost of production. Foshey’s new technology platform aims to address this by automating assembly tasks to reduce reliance on manual labor. Foshey is supervised by Wojciech Matusik, MIT professor of electrical engineering and computer science and member of CSAIL.  

    Generative AI for video conferencing

    Vibhaalakshmi Sivaraman SM ’19, a PhD candidate in electrical engineering and computer science who is a member of CSAIL’s Networking and Mobile Systems Group, is developing a generative technology, Gemino, to facilitate video conferencing in high-latency and low-bandwidth network environments. Gemino is a neural compression system for video conferencing that overcomes the robustness concerns and compute complexity challenges that limit current face-image-synthesis models. This technology could enable sustained video conferencing calls in regions and scenarios that cannot reliably support video calls today. Sivaraman is advised by Mohammad Alizadeh, MIT associate professor of electrical engineering and computer science and member of CSAIL.  More