More stories

  • in

    Computing for the health of the planet

    The health of the planet is one of the most important challenges facing humankind today. From climate change to unsafe levels of air and water pollution to coastal and agricultural land erosion, a number of serious challenges threaten human and ecosystem health.

    Ensuring the health and safety of our planet necessitates approaches that connect scientific, engineering, social, economic, and political aspects. New computational methods can play a critical role by providing data-driven models and solutions for cleaner air, usable water, resilient food, efficient transportation systems, better-preserved biodiversity, and sustainable sources of energy.

    The MIT Schwarzman College of Computing is committed to hiring multiple new faculty in computing for climate and the environment, as part of MIT’s plan to recruit 20 climate-focused faculty under its climate action plan. This year the college undertook searches with several departments in the schools of Engineering and Science for shared faculty in computing for health of the planet, one of the six strategic areas of inquiry identified in an MIT-wide planning process to help focus shared hiring efforts. The college also undertook searches for core computing faculty in the Department of Electrical Engineering and Computer Science (EECS).

    The searches are part of an ongoing effort by the MIT Schwarzman College of Computing to hire 50 new faculty — 25 shared with other academic departments and 25 in computer science and artificial intelligence and decision-making. The goal is to build capacity at MIT to help more deeply infuse computing and other disciplines in departments.

    Four interdisciplinary scholars were hired in these searches. They will join the MIT faculty in the coming year to engage in research and teaching that will advance physical understanding of low-carbon energy solutions, Earth-climate modeling, biodiversity monitoring and conservation, and agricultural management through high-performance computing, transformational numerical methods, and machine-learning techniques.

    “By coordinating hiring efforts with multiple departments and schools, we were able to attract a cohort of exceptional scholars in this area to MIT. Each of them is developing and using advanced computational methods and tools to help find solutions for a range of climate and environmental issues,” says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing and the Henry Warren Ellis Professor of Electrical Engineering and Computer Science. “They will also help strengthen cross-departmental ties in computing across an important, critical area for MIT and the world.”

    “These strategic hires in the area of computing for climate and the environment are an incredible opportunity for the college to deepen its academic offerings and create new opportunity for collaboration across MIT,” says Anantha P. Chandrakasan, dean of the MIT School of Engineering and the Vannevar Bush Professor of Electrical Engineering and Computer Science. “The college plays a pivotal role in MIT’s overarching effort to hire climate-focused faculty — introducing the critical role of computing to address the health of the planet through innovative research and curriculum.”

    The four new faculty members are:

    Sara Beery will join MIT as an assistant professor in the Faculty of Artificial Intelligence and Decision-Making in EECS in September 2023. Beery received her PhD in computing and mathematical sciences at Caltech in 2022, where she was advised by Pietro Perona. Her research focuses on building computer vision methods that enable global-scale environmental and biodiversity monitoring across data modalities, tackling real-world challenges including strong spatiotemporal correlations, imperfect data quality, fine-grained categories, and long-tailed distributions. She partners with nongovernmental organizations and government agencies to deploy her methods in the wild worldwide and works toward increasing the diversity and accessibility of academic research in artificial intelligence through interdisciplinary capacity building and education.

    Priya Donti will join MIT as an assistant professor in the faculties of Electrical Engineering and Artificial Intelligence and Decision-Making in EECS in academic year 2023-24. Donti recently finished her PhD in the Computer Science Department and the Department of Engineering and Public Policy at Carnegie Mellon University, co-advised by Zico Kolter and Inês Azevedo. Her work focuses on machine learning for forecasting, optimization, and control in high-renewables power grids. Specifically, her research explores methods to incorporate the physics and hard constraints associated with electric power systems into deep learning models. Donti is also co-founder and chair of Climate Change AI, a nonprofit initiative to catalyze impactful work at the intersection of climate change and machine learning that is currently running through the Cornell Tech Runway Startup Postdoc Program.

    Ericmoore Jossou will join MIT as an assistant professor in a shared position between the Department of Nuclear Science and Engineering and the faculty of electrical engineering in EECS in July 2023. He is currently an assistant scientist at the Brookhaven National Laboratory, a U.S. Department of Energy-affiliated lab that conducts research in nuclear and high energy physics, energy science and technology, environmental and bioscience, nanoscience, and national security. His research at MIT will focus on understanding the processing-structure-properties correlation of materials for nuclear energy applications through advanced experiments, multiscale simulations, and data science. Jossou obtained his PhD in mechanical engineering in 2019 from the University of Saskatchewan.

    Sherrie Wang will join MIT as an assistant professor in a shared position between the Department of Mechanical Engineering and the Institute for Data, Systems, and Society in academic year 2023-24. Wang is currently a Ciriacy-Wantrup Postdoctoral Fellow at the University of California at Berkeley, hosted by Solomon Hsiang and the Global Policy Lab. She develops machine learning for Earth observation data. Her primary application areas are improving agricultural management and forecasting climate phenomena. She obtained her PhD in computational and mathematical engineering from Stanford University in 2021, where she was advised by David Lobell. More

  • in

    AI that can learn the patterns of human language

    Human languages are notoriously complex, and linguists have long thought it would be impossible to teach a machine how to analyze speech sounds and word structures in the way human investigators do.

    But researchers at MIT, Cornell University, and McGill University have taken a step in this direction. They have demonstrated an artificial intelligence system that can learn the rules and patterns of human languages on its own.

    When given words and examples of how those words change to express different grammatical functions (like tense, case, or gender) in one language, this machine-learning model comes up with rules that explain why the forms of those words change. For instance, it might learn that the letter “a” must be added to end of a word to make the masculine form feminine in Serbo-Croatian.

    This model can also automatically learn higher-level language patterns that can apply to many languages, enabling it to achieve better results.

    The researchers trained and tested the model using problems from linguistics textbooks that featured 58 different languages. Each problem had a set of words and corresponding word-form changes. The model was able to come up with a correct set of rules to describe those word-form changes for 60 percent of the problems.

    This system could be used to study language hypotheses and investigate subtle similarities in the way diverse languages transform words. It is especially unique because the system discovers models that can be readily understood by humans, and it acquires these models from small amounts of data, such as a few dozen words. And instead of using one massive dataset for a single task, the system utilizes many small datasets, which is closer to how scientists propose hypotheses — they look at multiple related datasets and come up with models to explain phenomena across those datasets.

    “One of the motivations of this work was our desire to study systems that learn models of datasets that is represented in a way that humans can understand. Instead of learning weights, can the model learn expressions or rules? And we wanted to see if we could build this system so it would learn on a whole battery of interrelated datasets, to make the system learn a little bit about how to better model each one,” says Kevin Ellis ’14, PhD ’20, an assistant professor of computer science at Cornell University and lead author of the paper.

    Joining Ellis on the paper are MIT faculty members Adam Albright, a professor of linguistics; Armando Solar-Lezama, a professor and associate director of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Joshua B. Tenenbaum, the Paul E. Newton Career Development Professor of Cognitive Science and Computation in the Department of Brain and Cognitive Sciences and a member of CSAIL; as well as senior author

    Timothy J. O’Donnell, assistant professor in the Department of Linguistics at McGill University, and Canada CIFAR AI Chair at the Mila – Quebec Artificial Intelligence Institute.

    The research is published today in Nature Communications.

    Looking at language 

    In their quest to develop an AI system that could automatically learn a model from multiple related datasets, the researchers chose to explore the interaction of phonology (the study of sound patterns) and morphology (the study of word structure).

    Data from linguistics textbooks offered an ideal testbed because many languages share core features, and textbook problems showcase specific linguistic phenomena. Textbook problems can also be solved by college students in a fairly straightforward way, but those students typically have prior knowledge about phonology from past lessons they use to reason about new problems.

    Ellis, who earned his PhD at MIT and was jointly advised by Tenenbaum and Solar-Lezama, first learned about morphology and phonology in an MIT class co-taught by O’Donnell, who was a postdoc at the time, and Albright.

    “Linguists have thought that in order to really understand the rules of a human language, to empathize with what it is that makes the system tick, you have to be human. We wanted to see if we can emulate the kinds of knowledge and reasoning that humans (linguists) bring to the task,” says Albright.

    To build a model that could learn a set of rules for assembling words, which is called a grammar, the researchers used a machine-learning technique known as Bayesian Program Learning. With this technique, the model solves a problem by writing a computer program.

    In this case, the program is the grammar the model thinks is the most likely explanation of the words and meanings in a linguistics problem. They built the model using Sketch, a popular program synthesizer which was developed at MIT by Solar-Lezama.

    But Sketch can take a lot of time to reason about the most likely program. To get around this, the researchers had the model work one piece at a time, writing a small program to explain some data, then writing a larger program that modifies that small program to cover more data, and so on.

    They also designed the model so it learns what “good” programs tend to look like. For instance, it might learn some general rules on simple Russian problems that it would apply to a more complex problem in Polish because the languages are similar. This makes it easier for the model to solve the Polish problem.

    Tackling textbook problems

    When they tested the model using 70 textbook problems, it was able to find a grammar that matched the entire set of words in the problem in 60 percent of cases, and correctly matched most of the word-form changes in 79 percent of problems.

    The researchers also tried pre-programming the model with some knowledge it “should” have learned if it was taking a linguistics course, and showed that it could solve all problems better.

    “One challenge of this work was figuring out whether what the model was doing was reasonable. This isn’t a situation where there is one number that is the single right answer. There is a range of possible solutions which you might accept as right, close to right, etc.,” Albright says.

    The model often came up with unexpected solutions. In one instance, it discovered the expected answer to a Polish language problem, but also another correct answer that exploited a mistake in the textbook. This shows that the model could “debug” linguistics analyses, Ellis says.

    The researchers also conducted tests that showed the model was able to learn some general templates of phonological rules that could be applied across all problems.

    “One of the things that was most surprising is that we could learn across languages, but it didn’t seem to make a huge difference,” says Ellis. “That suggests two things. Maybe we need better methods for learning across problems. And maybe, if we can’t come up with those methods, this work can help us probe different ideas we have about what knowledge to share across problems.”

    In the future, the researchers want to use their model to find unexpected solutions to problems in other domains. They could also apply the technique to more situations where higher-level knowledge can be applied across interrelated datasets. For instance, perhaps they could develop a system to infer differential equations from datasets on the motion of different objects, says Ellis.

    “This work shows that we have some methods which can, to some extent, learn inductive biases. But I don’t think we’ve quite figured out, even for these textbook problems, the inductive bias that lets a linguist accept the plausible grammars and reject the ridiculous ones,” he adds.

    “This work opens up many exciting venues for future research. I am particularly intrigued by the possibility that the approach explored by Ellis and colleagues (Bayesian Program Learning, BPL) might speak to how infants acquire language,” says T. Florian Jaeger, a professor of brain and cognitive sciences and computer science at the University of Rochester, who was not an author of this paper. “Future work might ask, for example, under what additional induction biases (assumptions about universal grammar) the BPL approach can successfully achieve human-like learning behavior on the type of data infants observe during language acquisition. I think it would be fascinating to see whether inductive biases that are even more abstract than those considered by Ellis and his team — such as biases originating in the limits of human information processing (e.g., memory constraints on dependency length or capacity limits in the amount of information that can be processed per time) — would be sufficient to induce some of the patterns observed in human languages.”

    This work was funded, in part, by the Air Force Office of Scientific Research, the Center for Brains, Minds, and Machines, the MIT-IBM Watson AI Lab, the Natural Science and Engineering Research Council of Canada, the Fonds de Recherche du Québec – Société et Culture, the Canada CIFAR AI Chairs Program, the National Science Foundation (NSF), and an NSF graduate fellowship. More

  • in

    Taking a magnifying glass to data center operations

    When the MIT Lincoln Laboratory Supercomputing Center (LLSC) unveiled its TX-GAIA supercomputer in 2019, it provided the MIT community a powerful new resource for applying artificial intelligence to their research. Anyone at MIT can submit a job to the system, which churns through trillions of operations per second to train models for diverse applications, such as spotting tumors in medical images, discovering new drugs, or modeling climate effects. But with this great power comes the great responsibility of managing and operating it in a sustainable manner — and the team is looking for ways to improve.

    “We have these powerful computational tools that let researchers build intricate models to solve problems, but they can essentially be used as black boxes. What gets lost in there is whether we are actually using the hardware as effectively as we can,” says Siddharth Samsi, a research scientist in the LLSC. 

    To gain insight into this challenge, the LLSC has been collecting detailed data on TX-GAIA usage over the past year. More than a million user jobs later, the team has released the dataset open source to the computing community.

    Their goal is to empower computer scientists and data center operators to better understand avenues for data center optimization — an important task as processing needs continue to grow. They also see potential for leveraging AI in the data center itself, by using the data to develop models for predicting failure points, optimizing job scheduling, and improving energy efficiency. While cloud providers are actively working on optimizing their data centers, they do not often make their data or models available for the broader high-performance computing (HPC) community to leverage. The release of this dataset and associated code seeks to fill this space.

    “Data centers are changing. We have an explosion of hardware platforms, the types of workloads are evolving, and the types of people who are using data centers is changing,” says Vijay Gadepally, a senior researcher at the LLSC. “Until now, there hasn’t been a great way to analyze the impact to data centers. We see this research and dataset as a big step toward coming up with a principled approach to understanding how these variables interact with each other and then applying AI for insights and improvements.”

    Papers describing the dataset and potential applications have been accepted to a number of venues, including the IEEE International Symposium on High-Performance Computer Architecture, the IEEE International Parallel and Distributed Processing Symposium, the Annual Conference of the North American Chapter of the Association for Computational Linguistics, the IEEE High-Performance and Embedded Computing Conference, and International Conference for High Performance Computing, Networking, Storage and Analysis. 

    Workload classification

    Among the world’s TOP500 supercomputers, TX-GAIA combines traditional computing hardware (central processing units, or CPUs) with nearly 900 graphics processing unit (GPU) accelerators. These NVIDIA GPUs are specialized for deep learning, the class of AI that has given rise to speech recognition and computer vision.

    The dataset covers CPU, GPU, and memory usage by job; scheduling logs; and physical monitoring data. Compared to similar datasets, such as those from Google and Microsoft, the LLSC dataset offers “labeled data, a variety of known AI workloads, and more detailed time series data compared with prior datasets. To our knowledge, it’s one of the most comprehensive and fine-grained datasets available,” Gadepally says. 

    Notably, the team collected time-series data at an unprecedented level of detail: 100-millisecond intervals on every GPU and 10-second intervals on every CPU, as the machines processed more than 3,000 known deep-learning jobs. One of the first goals is to use this labeled dataset to characterize the workloads that different types of deep-learning jobs place on the system. This process would extract features that reveal differences in how the hardware processes natural language models versus image classification or materials design models, for example.   

    The team has now launched the MIT Datacenter Challenge to mobilize this research. The challenge invites researchers to use AI techniques to identify with 95 percent accuracy the type of job that was run, using their labeled time-series data as ground truth.

    Such insights could enable data centers to better match a user’s job request with the hardware best suited for it, potentially conserving energy and improving system performance. Classifying workloads could also allow operators to quickly notice discrepancies resulting from hardware failures, inefficient data access patterns, or unauthorized usage.

    Too many choices

    Today, the LLSC offers tools that let users submit their job and select the processors they want to use, “but it’s a lot of guesswork on the part of users,” Samsi says. “Somebody might want to use the latest GPU, but maybe their computation doesn’t actually need it and they could get just as impressive results on CPUs, or lower-powered machines.”

    Professor Devesh Tiwari at Northeastern University is working with the LLSC team to develop techniques that can help users match their workloads to appropriate hardware. Tiwari explains that the emergence of different types of AI accelerators, GPUs, and CPUs has left users suffering from too many choices. Without the right tools to take advantage of this heterogeneity, they are missing out on the benefits: better performance, lower costs, and greater productivity.

    “We are fixing this very capability gap — making users more productive and helping users do science better and faster without worrying about managing heterogeneous hardware,” says Tiwari. “My PhD student, Baolin Li, is building new capabilities and tools to help HPC users leverage heterogeneity near-optimally without user intervention, using techniques grounded in Bayesian optimization and other learning-based optimization methods. But, this is just the beginning. We are looking into ways to introduce heterogeneity in our data centers in a principled approach to help our users achieve the maximum advantage of heterogeneity autonomously and cost-effectively.”

    Workload classification is the first of many problems to be posed through the Datacenter Challenge. Others include developing AI techniques to predict job failures, conserve energy, or create job scheduling approaches that improve data center cooling efficiencies.

    Energy conservation 

    To mobilize research into greener computing, the team is also planning to release an environmental dataset of TX-GAIA operations, containing rack temperature, power consumption, and other relevant data.

    According to the researchers, huge opportunities exist to improve the power efficiency of HPC systems being used for AI processing. As one example, recent work in the LLSC determined that simple hardware tuning, such as limiting the amount of power an individual GPU can draw, could reduce the energy cost of training an AI model by 20 percent, with only modest increases in computing time. “This reduction translates to approximately an entire week’s worth of household energy for a mere three-hour time increase,” Gadepally says.

    They have also been developing techniques to predict model accuracy, so that users can quickly terminate experiments that are unlikely to yield meaningful results, saving energy. The Datacenter Challenge will share relevant data to enable researchers to explore other opportunities to conserve energy.

    The team expects that lessons learned from this research can be applied to the thousands of data centers operated by the U.S. Department of Defense. The U.S. Air Force is a sponsor of this work, which is being conducted under the USAF-MIT AI Accelerator.

    Other collaborators include researchers at MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). Professor Charles Leiserson’s Supertech Research Group is investigating performance-enhancing techniques for parallel computing, and research scientist Neil Thompson is designing studies on ways to nudge data center users toward climate-friendly behavior.

    Samsi presented this work at the inaugural AI for Datacenter Optimization (ADOPT’22) workshop last spring as part of the IEEE International Parallel and Distributed Processing Symposium. The workshop officially introduced their Datacenter Challenge to the HPC community.

    “We hope this research will allow us and others who run supercomputing centers to be more responsive to user needs while also reducing the energy consumption at the center level,” Samsi says. More

  • in

    New leadership at MIT’s Center for Biomedical Innovation

    As it continues in its mission to improve global health through the development and implementation of biomedical innovation, the MIT Center for Biomedical Innovation (CBI) today announced changes to its leadership team: Stacy Springs has been named executive director, and Professor Richard Braatz has joined as the center’s new associate faculty director.

    The change in leadership comes at a time of rapid development in new therapeutic modalities, growing concern over global access to biologic medicines and healthy food, and widespread interest in applying computational tools and multi-disciplinary approaches to address long-standing biomedical challenges.

    “This marks an exciting new chapter for the CBI,” says faculty director Anthony J. Sinskey, professor of biology, who cofounded CBI in 2005. “As I look back at almost 20 years of CBI history, I see an exponential growth in our activities, educational offerings, and impact.”

    The center’s collaborative research model accelerates innovation in biotechnology and biomedical research, drawing on the expertise of faculty and researchers in MIT’s schools of Engineering and Science, the MIT Schwarzman College of Computing, and the MIT Sloan School of Management.

    Springs steps into the role of executive director having previously served as senior director of programs for CBI and as executive director of CBI’s Biomanufacturing Program and its Consortium on Adventitious Agent Contamination in Biomanufacturing (CAACB). She succeeds Gigi Hirsch, who founded the NEW Drug Development ParadIGmS (NEWDIGS) Initiative at CBI in 2009. Hirsch and NEWDIGS have now moved to Tufts Medical Center, establishing a headquarters at the new Center for Biomedical System Design within the Institute for Clinical Research and Health Policy Studies there.

    Braatz, a chemical engineer whose work is informed by mathematical modeling and computational techniques, conducts research in process data analytics, design, and control of advanced manufacturing systems.

    “It’s been great to interact with faculty from across the Institute who have complementary expertise,” says Braatz, the Edwin R. Gilliland Professor in the Department of Chemical Engineering. “Participating in CBI’s workshops has led to fruitful partnerships with companies in tackling industry-wide challenges.”

    CBI is housed under the Institute for Data Systems and Society and, specifically, the Sociotechnical Systems Research Center in the MIT Schwarzman College of Computing. CBI is home to two biomanufacturing consortia: the CAACB and the Biomanufacturing Consortium (BioMAN). Through these precompetitive collaborations, CBI researchers work with biomanufacturers and regulators to advance shared interests in biomanufacturing.

    In addition, CBI researchers are engaged in several sponsored research programs focused on integrated continuous biomanufacturing capabilities for monoclonal antibodies and vaccines, analytical technologies to measure quality and safety attributes of a variety of biologics, including gene and cell therapies, and rapid-cycle development of virus-like particle vaccines for SARS-CoV-2.

    In another significant initiative, CBI researchers are applying data analytics strategies to biomanufacturing problems. “In our smart data analytics project, we are creating new decision support tools and algorithms for biomanufacturing process control and plant-level decision-making. Further, we are leveraging machine learning and natural language processing to improve post-market surveillance studies,” says Springs.

    CBI is also working on advanced manufacturing for cell and gene therapies, among other new modalities, and is a part of the Singapore-MIT Alliance for Research and Technology – Critical Analytics for Manufacturing Personalized-Medicine (SMART CAMP). SMART CAMP is an international research effort focused on developing the analytical tools and biological understanding of critical quality attributes that will enable the manufacture and delivery of improved cell therapies to patients.

    “This is a crucial time for biomanufacturing and for innovation across the health-care value chain. The collaborative efforts of MIT researchers and consortia members will drive fundamental discovery and inform much-needed progress in industry,” says MIT Vice President for Research Maria Zuber.

    “CBI has a track record of engaging with health-care ecosystem challenges. I am confident that under the new leadership, it will continue to inspire MIT, the United States, and the entire world to improve the health of all people,” adds Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing. More

  • in

    Caspar Hare, Georgia Perakis named associate deans of Social and Ethical Responsibilities of Computing

    Caspar Hare and Georgia Perakis have been appointed the new associate deans of the Social and Ethical Responsibilities of Computing (SERC), a cross-cutting initiative in the MIT Stephen A. Schwarzman College of Computing. Their new roles will take effect on Sept. 1.

    “Infusing social and ethical aspects of computing in academic research and education is a critical component of the college mission,” says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing and the Henry Ellis Warren Professor of Electrical Engineering and Computer Science. “I look forward to working with Caspar and Georgia on continuing to develop and advance SERC and its reach across MIT. Their complementary backgrounds and their broad connections across MIT will be invaluable to this next chapter of SERC.”

    Caspar Hare

    Hare is a professor of philosophy in the Department of Linguistics and Philosophy. A member of the MIT faculty since 2003, his main interests are in ethics, metaphysics, and epistemology. The general theme of his recent work has been to bring ideas about practical rationality and metaphysics to bear on issues in normative ethics and epistemology. He is the author of two books: “On Myself, and Other, Less Important Subjects” (Princeton University Press 2009), about the metaphysics of perspective, and “The Limits of Kindness” (Oxford University Press 2013), about normative ethics.

    Georgia Perakis

    Perakis is the William F. Pounds Professor of Management and professor of operations research, statistics, and operations management at the MIT Sloan School of Management, where she has been a faculty member since 1998. She investigates the theory and practice of analytics and its role in operations problems and is particularly interested in how to solve complex and practical problems in pricing, revenue management, supply chains, health care, transportation, and energy applications, among other areas. Since 2019, she has been the co-director of the Operations Research Center, an interdepartmental PhD program that jointly reports to MIT Sloan and the MIT Schwarzman College of Computing, a role in which she will remain. Perakis will also assume an associate dean role at MIT Sloan in recognition of her leadership.

    Hare and Perakis succeed David Kaiser, the Germeshausen Professor of the History of Science and professor of physics, and Julie Shah, the H.N. Slater Professor of Aeronautics and Astronautics, who will be stepping down from their roles at the conclusion of their three-year term on Aug. 31.

    “My deepest thanks to Dave and Julie for their tremendous leadership of SERC and contributions to the college as associate deans,” says Huttenlocher.

    SERC impact

    As the inaugural associate deans of SERC, Kaiser and Shah have been responsible for advancing a mission to incorporate humanist, social science, social responsibility, and civic perspectives into MIT’s teaching, research, and implementation of computing. In doing so, they have engaged dozens of faculty members and thousands of students from across MIT during these first three years of the initiative.

    They have brought together people from a broad array of disciplines to collaborate on crafting original materials such as active learning projects, homework assignments, and in-class demonstrations. A collection of these materials was recently published and is now freely available to the world via MIT OpenCourseWare.

    In February 2021, they launched the MIT Case Studies in Social and Ethical Responsibilities of Computing for undergraduate instruction across a range of classes and fields of study. The specially commissioned and peer-reviewed cases are based on original research and are brief by design. Three issues have been published to date and a fourth will be released later this summer. Kaiser will continue to oversee the successful new series as editor.

    Last year, 60 undergraduates, graduate students, and postdocs joined a community of SERC Scholars to help advance SERC efforts in the college. The scholars participate in unique opportunities throughout, such as the summer Experiential Ethics program. A multidisciplinary team of graduate students last winter worked with the instructors and teaching assistants of class 6.036 (Introduction to Machine Learning), MIT’s largest machine learning course, to infuse weekly labs with material covering ethical computing, data and model bias, and fairness in machine learning through SERC.

    Through efforts such as these, SERC has had a substantial impact at MIT and beyond. Over the course of their tenure, Kaiser and Shah have engaged about 80 faculty members, and more than 2,100 students took courses that included new SERC content in the last year alone. SERC’s reach extended well beyond engineering students, with about 500 exposed to SERC content through courses offered in the School of Humanities, Arts, and Social Sciences, the MIT Sloan School of Management, and the School of Architecture and Planning. More

  • in

    A technique to improve both fairness and accuracy in artificial intelligence

    For workers who use machine-learning models to help them make decisions, knowing when to trust a model’s predictions is not always an easy task, especially since these models are often so complex that their inner workings remain a mystery.

    Users sometimes employ a technique, known as selective regression, in which the model estimates its confidence level for each prediction and will reject predictions when its confidence is too low. Then a human can examine those cases, gather additional information, and make a decision about each one manually.

    But while selective regression has been shown to improve the overall performance of a model, researchers at MIT and the MIT-IBM Watson AI Lab have discovered that the technique can have the opposite effect for underrepresented groups of people in a dataset. As the model’s confidence increases with selective regression, its chance of making the right prediction also increases, but this does not always happen for all subgroups.

    For instance, a model suggesting loan approvals might make fewer errors on average, but it may actually make more wrong predictions for Black or female applicants. One reason this can occur is due to the fact that the model’s confidence measure is trained using overrepresented groups and may not be accurate for these underrepresented groups.

    Once they had identified this problem, the MIT researchers developed two algorithms that can remedy the issue. Using real-world datasets, they show that the algorithms reduce performance disparities that had affected marginalized subgroups.

    “Ultimately, this is about being more intelligent about which samples you hand off to a human to deal with. Rather than just minimizing some broad error rate for the model, we want to make sure the error rate across groups is taken into account in a smart way,” says senior MIT author Greg Wornell, the Sumitomo Professor in Engineering in the Department of Electrical Engineering and Computer Science (EECS) who leads the Signals, Information, and Algorithms Laboratory in the Research Laboratory of Electronics (RLE) and is a member of the MIT-IBM Watson AI Lab.

    Joining Wornell on the paper are co-lead authors Abhin Shah, an EECS graduate student, and Yuheng Bu, a postdoc in RLE; as well as Joshua Ka-Wing Lee SM ’17, ScD ’21 and Subhro Das, Rameswar Panda, and Prasanna Sattigeri, research staff members at the MIT-IBM Watson AI Lab. The paper will be presented this month at the International Conference on Machine Learning.

    To predict or not to predict

    Regression is a technique that estimates the relationship between a dependent variable and independent variables. In machine learning, regression analysis is commonly used for prediction tasks, such as predicting the price of a home given its features (number of bedrooms, square footage, etc.) With selective regression, the machine-learning model can make one of two choices for each input — it can make a prediction or abstain from a prediction if it doesn’t have enough confidence in its decision.

    When the model abstains, it reduces the fraction of samples it is making predictions on, which is known as coverage. By only making predictions on inputs that it is highly confident about, the overall performance of the model should improve. But this can also amplify biases that exist in a dataset, which occur when the model does not have sufficient data from certain subgroups. This can lead to errors or bad predictions for underrepresented individuals.

    The MIT researchers aimed to ensure that, as the overall error rate for the model improves with selective regression, the performance for every subgroup also improves. They call this monotonic selective risk.

    “It was challenging to come up with the right notion of fairness for this particular problem. But by enforcing this criteria, monotonic selective risk, we can make sure the model performance is actually getting better across all subgroups when you reduce the coverage,” says Shah.

    Focus on fairness

    The team developed two neural network algorithms that impose this fairness criteria to solve the problem.

    One algorithm guarantees that the features the model uses to make predictions contain all information about the sensitive attributes in the dataset, such as race and sex, that is relevant to the target variable of interest. Sensitive attributes are features that may not be used for decisions, often due to laws or organizational policies. The second algorithm employs a calibration technique to ensure the model makes the same prediction for an input, regardless of whether any sensitive attributes are added to that input.

    The researchers tested these algorithms by applying them to real-world datasets that could be used in high-stakes decision making. One, an insurance dataset, is used to predict total annual medical expenses charged to patients using demographic statistics; another, a crime dataset, is used to predict the number of violent crimes in communities using socioeconomic information. Both datasets contain sensitive attributes for individuals.

    When they implemented their algorithms on top of a standard machine-learning method for selective regression, they were able to reduce disparities by achieving lower error rates for the minority subgroups in each dataset. Moreover, this was accomplished without significantly impacting the overall error rate.

    “We see that if we don’t impose certain constraints, in cases where the model is really confident, it could actually be making more errors, which could be very costly in some applications, like health care. So if we reverse the trend and make it more intuitive, we will catch a lot of these errors. A major goal of this work is to avoid errors going silently undetected,” Sattigeri says.

    The researchers plan to apply their solutions to other applications, such as predicting house prices, student GPA, or loan interest rate, to see if the algorithms need to be calibrated for those tasks, says Shah. They also want to explore techniques that use less sensitive information during the model training process to avoid privacy issues.

    And they hope to improve the confidence estimates in selective regression to prevent situations where the model’s confidence is low, but its prediction is correct. This could reduce the workload on humans and further streamline the decision-making process, Sattigeri says.

    This research was funded, in part, by the MIT-IBM Watson AI Lab and its member companies Boston Scientific, Samsung, and Wells Fargo, and by the National Science Foundation. More

  • in

    Costis Daskalakis appointed inaugural Avanessians Professor in the MIT Schwarzman College of Computing

    The MIT Stephen A. Schwarzman College of Computing has named Costis Daskalakis as the inaugural holder of the Avanessians Professorship. His chair began on July 1.

    Daskalakis is the first person appointed to this position generously endowed by Armen Avanessians ’81. Established in the MIT Schwarzman College of Computing, the new chair provides Daskalakis with additional support to pursue his research and develop his career.

    “I’m delighted to recognize Costis for his scholarship and extraordinary achievements with this distinguished professorship,” says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing and the Henry Ellis Warren Professor of Electrical Engineering and Computer Science.

    A professor in the MIT Department of Electrical Engineering and Computer Science, Daskalakis is a theoretical computer scientist who works at the interface of game theory, economics, probability theory, statistics, and machine learning. He has resolved long-standing open problems about the computational complexity of the Nash equilibrium, the mathematical structure and computational complexity of multi-item auctions, and the behavior of machine-learning methods such as the expectation-maximization algorithm. He has obtained computationally and statistically efficient methods for statistical hypothesis testing and learning in high-dimensional settings, as well as results characterizing the structure and concentration properties of high-dimensional distributions. His current work focuses on multi-agent learning, learning from biased and dependent data, causal inference, and econometrics.

    A native of Greece, Daskalakis joined the MIT faculty in 2009. He is a member of the Computer Science and Artificial Intelligence Laboratory and is affiliated with the Laboratory for Information and Decision Systems and the Operations Research Center. He is also an investigator in the Foundations of Data Science Institute.

    He has previously received such honors as the 2018 Nevanlinna Prize from the International Mathematical Union, the 2018 ACM Grace Murray Hopper Award, the Kalai Game Theory and Computer Science Prize from the Game Theory Society, and the 2008 ACM Doctoral Dissertation Award. More

  • in

    Teaching AI to ask clinical questions

    Physicians often query a patient’s electronic health record for information that helps them make treatment decisions, but the cumbersome nature of these records hampers the process. Research has shown that even when a doctor has been trained to use an electronic health record (EHR), finding an answer to just one question can take, on average, more than eight minutes.

    The more time physicians must spend navigating an oftentimes clunky EHR interface, the less time they have to interact with patients and provide treatment.

    Researchers have begun developing machine-learning models that can streamline the process by automatically finding information physicians need in an EHR. However, training effective models requires huge datasets of relevant medical questions, which are often hard to come by due to privacy restrictions. Existing models struggle to generate authentic questions — those that would be asked by a human doctor — and are often unable to successfully find correct answers.

    To overcome this data shortage, researchers at MIT partnered with medical experts to study the questions physicians ask when reviewing EHRs. Then, they built a publicly available dataset of more than 2,000 clinically relevant questions written by these medical experts.

    When they used their dataset to train a machine-learning model to generate clinical questions, they found that the model asked high-quality and authentic questions, as compared to real questions from medical experts, more than 60 percent of the time.

    With this dataset, they plan to generate vast numbers of authentic medical questions and then use those questions to train a machine-learning model which would help doctors find sought-after information in a patient’s record more efficiently.

    “Two thousand questions may sound like a lot, but when you look at machine-learning models being trained nowadays, they have so much data, maybe billions of data points. When you train machine-learning models to work in health care settings, you have to be really creative because there is such a lack of data,” says lead author Eric Lehman, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    The senior author is Peter Szolovits, a professor in the Department of Electrical Engineering and Computer Science (EECS) who heads the Clinical Decision-Making Group in CSAIL and is also a member of the MIT-IBM Watson AI Lab. The research paper, a collaboration between co-authors at MIT, the MIT-IBM Watson AI Lab, IBM Research, and the doctors and medical experts who helped create questions and participated in the study, will be presented at the annual conference of the North American Chapter of the Association for Computational Linguistics.

    “Realistic data is critical for training models that are relevant to the task yet difficult to find or create,” Szolovits says. “The value of this work is in carefully collecting questions asked by clinicians about patient cases, from which we are able to develop methods that use these data and general language models to ask further plausible questions.”

    Data deficiency

    The few large datasets of clinical questions the researchers were able to find had a host of issues, Lehman explains. Some were composed of medical questions asked by patients on web forums, which are a far cry from physician questions. Other datasets contained questions produced from templates, so they are mostly identical in structure, making many questions unrealistic.

    “Collecting high-quality data is really important for doing machine-learning tasks, especially in a health care context, and we’ve shown that it can be done,” Lehman says.

    To build their dataset, the MIT researchers worked with practicing physicians and medical students in their last year of training. They gave these medical experts more than 100 EHR discharge summaries and told them to read through a summary and ask any questions they might have. The researchers didn’t put any restrictions on question types or structures in an effort to gather natural questions. They also asked the medical experts to identify the “trigger text” in the EHR that led them to ask each question.

    For instance, a medical expert might read a note in the EHR that says a patient’s past medical history is significant for prostate cancer and hypothyroidism. The trigger text “prostate cancer” could lead the expert to ask questions like “date of diagnosis?” or “any interventions done?”

    They found that most questions focused on symptoms, treatments, or the patient’s test results. While these findings weren’t unexpected, quantifying the number of questions about each broad topic will help them build an effective dataset for use in a real, clinical setting, says Lehman.

    Once they had compiled their dataset of questions and accompanying trigger text, they used it to train machine-learning models to ask new questions based on the trigger text.

    Then the medical experts determined whether those questions were “good” using four metrics: understandability (Does the question make sense to a human physician?), triviality (Is the question too easily answerable from the trigger text?), medical relevance (Does it makes sense to ask this question based on the context?), and relevancy to the trigger (Is the trigger related to the question?).

    Cause for concern

    The researchers found that when a model was given trigger text, it was able to generate a good question 63 percent of the time, whereas a human physician would ask a good question 80 percent of the time.

    They also trained models to recover answers to clinical questions using the publicly available datasets they had found at the outset of this project. Then they tested these trained models to see if they could find answers to “good” questions asked by human medical experts.

    The models were only able to recover about 25 percent of answers to physician-generated questions.

    “That result is really concerning. What people thought were good-performing models were, in practice, just awful because the evaluation questions they were testing on were not good to begin with,” Lehman says.

    The team is now applying this work toward their initial goal: building a model that can automatically answer physicians’ questions in an EHR. For the next step, they will use their dataset to train a machine-learning model that can automatically generate thousands or millions of good clinical questions, which can then be used to train a new model for automatic question answering.

    While there is still much work to do before that model could be a reality, Lehman is encouraged by the strong initial results the team demonstrated with this dataset.

    This research was supported, in part, by the MIT-IBM Watson AI Lab. Additional co-authors include Leo Anthony Celi of the MIT Institute for Medical Engineering and Science; Preethi Raghavan and Jennifer J. Liang of the MIT-IBM Watson AI Lab; Dana Moukheiber of the University of Buffalo; Vladislav Lialin and Anna Rumshisky of the University of Massachusetts at Lowell; Katelyn Legaspi, Nicole Rose I. Alberto, Richard Raymund R. Ragasa, Corinna Victoria M. Puyat, Isabelle Rose I. Alberto, and Pia Gabrielle I. Alfonso of the University of the Philippines; Anne Janelle R. Sy and Patricia Therese S. Pile of the University of the East Ramon Magsaysay Memorial Medical Center; Marianne Taliño of the Ateneo de Manila University School of Medicine and Public Health; and Byron C. Wallace of Northeastern University. More