More stories


    New hope for early pancreatic cancer intervention via AI-based risk prediction

    The first documented case of pancreatic cancer dates back to the 18th century. Since then, researchers have undertaken a protracted and challenging odyssey to understand the elusive and deadly disease. To date, there is no better cancer treatment than early intervention. Unfortunately, the pancreas, nestled deep within the abdomen, is particularly hard to reach for early detection.

    MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) scientists, alongside Limor Appelbaum, a staff scientist in the Department of Radiation Oncology at Beth Israel Deaconess Medical Center (BIDMC), were eager to better identify potential high-risk patients. They set out to develop two machine-learning models for early detection of pancreatic ductal adenocarcinoma (PDAC), the most common form of the cancer. To access a broad and diverse database, the team synced up with a federated network company, using electronic health record data from various institutions across the United States. This vast pool of data helped ensure the models’ reliability and generalizability, making them applicable across a wide range of populations, geographical locations, and demographic groups.

    The two models, the “PRISM” neural network and a logistic regression model (a statistical technique for estimating probability), outperformed current methods. The team’s comparison showed that while standard screening criteria identify about 10 percent of PDAC cases using a five-times higher relative risk threshold, PRISM can detect 35 percent of PDAC cases at this same threshold.

    Using AI to detect cancer risk is not a new phenomenon — algorithms analyze mammograms and CT scans for lung cancer, and assist in the analysis of Pap smear tests and HPV testing, to name a few applications. “The PRISM models stand out for their development and validation on an extensive database of over 5 million patients, surpassing the scale of most prior research in the field,” says Kai Jia, an MIT PhD student in electrical engineering and computer science (EECS), MIT CSAIL affiliate, and first author on an open-access paper in eBioMedicine outlining the new work. “The model uses routine clinical and lab data to make its predictions, and the diversity of the U.S. population is a significant advancement over other PDAC models, which are usually confined to specific geographic regions, like a few health-care centers in the U.S. Additionally, using a unique regularization technique in the training process enhanced the models’ generalizability and interpretability.”

    “This report outlines a powerful approach to use big data and artificial intelligence algorithms to refine our approach to identifying risk profiles for cancer,” says David Avigan, a Harvard Medical School professor and the cancer center director and chief of hematology and hematologic malignancies at BIDMC, who was not involved in the study. “This approach may lead to novel strategies to identify patients with high risk for malignancy that may benefit from focused screening with the potential for early intervention.” 

    Prismatic perspectives

    The journey toward the development of PRISM began over six years ago, fueled by firsthand experiences with the limitations of current diagnostic practices. “Approximately 80-85 percent of pancreatic cancer patients are diagnosed at advanced stages, where cure is no longer an option,” says senior author Appelbaum, who is also a Harvard Medical School instructor and radiation oncologist. “This clinical frustration sparked the idea to delve into the wealth of data available in electronic health records (EHRs).”

    The CSAIL group’s close collaboration with Appelbaum made it possible to understand the combined medical and machine learning aspects of the problem better, eventually leading to a much more accurate and transparent model. “The hypothesis was that these records contained hidden clues — subtle signs and symptoms that could act as early warning signals of pancreatic cancer,” she adds. “This guided our use of federated EHR networks in developing these models, for a scalable approach for deploying risk prediction tools in health care.”

    Both PrismNN and PrismLR models analyze EHR data, including patient demographics, diagnoses, medications, and lab results, to assess PDAC risk. PrismNN uses artificial neural networks to detect intricate patterns in data features like age, medical history, and lab results, yielding a risk score for PDAC likelihood. PrismLR uses logistic regression for a simpler analysis, generating a probability score of PDAC based on these features. Together, the models offer a thorough evaluation of different approaches in predicting PDAC risk from the same EHR data.
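
    To make the distinction between the two models concrete, the sketch below shows how a logistic-regression risk model in the spirit of PrismLR might be fit on tabular EHR-style features and used to produce per-patient risk scores. It is a minimal illustration, not the authors' code: the feature names, synthetic data, and hyperparameters are hypothetical placeholders, and the actual PRISM features and training pipeline are described in the eBioMedicine paper.

    ```python
    # Minimal sketch (not the authors' code): fitting a logistic-regression
    # risk model on tabular EHR-style features, in the spirit of PrismLR.
    # Feature names, data, and hyperparameters are hypothetical placeholders.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_patients = 5000

    # Hypothetical EHR-derived features: age, diabetes diagnosis flag,
    # physician visits per year, and a lab value.
    X = np.column_stack([
        rng.normal(60, 12, n_patients),   # age (years)
        rng.integers(0, 2, n_patients),   # diabetes diagnosis (0/1)
        rng.poisson(4, n_patients),       # physician visits in the past year
        rng.normal(100, 15, n_patients),  # fasting glucose (mg/dL)
    ])

    # Synthetic outcome labels, loosely tied to the features for illustration.
    logits = 0.03 * (X[:, 0] - 60) + 0.8 * X[:, 1] + 0.1 * X[:, 2] - 6.0
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=1000, C=0.1)  # L2-regularized
    model.fit(X_train, y_train)

    risk_scores = model.predict_proba(X_test)[:, 1]   # per-patient risk score
    print("AUROC:", roc_auc_score(y_test, risk_scores))
    ```

    PrismNN replaces the linear model with a neural network over the same kinds of features, trading some interpretability for the ability to capture more intricate patterns.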

    One paramount point for gaining the trust of physicians, the team notes, is better understanding how the models work, known in the field as interpretability. The scientists pointed out that while logistic regression models are inherently easier to interpret, recent advancements have made deep neural networks somewhat more transparent. This helped the team to refine the thousands of potentially predictive features derived from the EHR of a single patient to approximately 85 critical indicators. These indicators, which include patient age, diabetes diagnosis, and an increased frequency of visits to physicians, are automatically discovered by the model but match physicians’ understanding of risk factors associated with pancreatic cancer.

    The path forward

    Despite the promise of the PRISM models, as with all research, some parts are still a work in progress. U.S. data alone are the current diet for the models, necessitating testing and adaptation for global use. The path forward, the team notes, includes expanding the model’s applicability to international datasets and integrating additional biomarkers for more refined risk assessment.

    “A subsequent aim for us is to facilitate the models’ implementation in routine health care settings. The vision is to have these models function seamlessly in the background of health care systems, automatically analyzing patient data and alerting physicians to high-risk cases without adding to their workload,” says Jia. “A machine-learning model integrated with the EHR system could empower physicians with early alerts for high-risk patients, potentially enabling interventions well before symptoms manifest. We are eager to deploy our techniques in the real world to help all individuals enjoy longer, healthier lives.” 

    Jia wrote the paper alongside Appelbaum and MIT EECS Professor and CSAIL Principal Investigator Martin Rinard, who are both senior authors of the paper. Researchers on the paper were supported during their time at MIT CSAIL, in part, by the Defense Advanced Research Projects Agency, Boeing, the National Science Foundation, and Aarno Labs. TriNetX provided resources for the project, and the Prevent Cancer Foundation also supported the team.


    2023-24 Takeda Fellows: Advancing research at the intersection of AI and health

    The School of Engineering has selected 13 new Takeda Fellows for the 2023-24 academic year. With support from Takeda, the graduate students will conduct pathbreaking research ranging from remote health monitoring for virtual clinical trials to ingestible devices for at-home, long-term diagnostics.

    Now in its fourth year, the MIT-Takeda Program, a collaboration between MIT’s School of Engineering and Takeda, fuels the development and application of artificial intelligence capabilities to benefit human health and drug development. Part of the Abdul Latif Jameel Clinic for Machine Learning in Health, the program coalesces disparate disciplines, merges theory and practical implementation, combines algorithm and hardware innovations, and creates multidimensional collaborations between academia and industry.

    The 2023-24 Takeda Fellows are:

    Adam Gierlach

    Adam Gierlach is a PhD candidate in the Department of Electrical Engineering and Computer Science. Gierlach’s work combines innovative biotechnology with machine learning to create ingestible devices for advanced diagnostics and delivery of therapeutics. In his previous work, Gierlach developed a non-invasive, ingestible device for long-term gastric recordings in free-moving patients. With the support of a Takeda Fellowship, he will build on this pathbreaking work by developing smart, energy-efficient, ingestible devices powered by application-specific integrated circuits for at-home, long-term diagnostics. These revolutionary devices — capable of identifying, characterizing, and even correcting gastrointestinal diseases — represent the leading edge of biotechnology. Gierlach’s innovative contributions will help to advance fundamental research on the enteric nervous system and help develop a better understanding of gut-brain axis dysfunctions in Parkinson’s disease, autism spectrum disorder, and other prevalent disorders and conditions.

    Vivek Gopalakrishnan

    Vivek Gopalakrishnan is a PhD candidate in the Harvard-MIT Program in Health Sciences and Technology. Gopalakrishnan’s goal is to develop biomedical machine-learning methods to improve the study and treatment of human disease. Specifically, he employs computational modeling to advance new approaches for minimally invasive, image-guided neurosurgery, offering a safe alternative to open brain and spinal procedures. With the support of a Takeda Fellowship, Gopalakrishnan will develop real-time computer vision algorithms that deliver high-quality, 3D intraoperative image guidance by extracting and fusing information from multimodal neuroimaging data. These algorithms could allow surgeons to reconstruct 3D neurovasculature from X-ray angiography, thereby enhancing the precision of device deployment and enabling more accurate localization of healthy versus pathologic anatomy.

    Hao He

    Hao He is a PhD candidate in the Department of Electrical Engineering and Computer Science. His research interests lie at the intersection of generative AI, machine learning, and their applications in medicine and human health, with a particular emphasis on passive, continuous, remote health monitoring to support virtual clinical trials and health-care management. More specifically, He aims to develop trustworthy AI models that promote equitable access and deliver fair performance independent of race, gender, and age. In his past work, He has developed monitoring systems applied in clinical studies of Parkinson’s disease, Alzheimer’s disease, and epilepsy. Supported by a Takeda Fellowship, He will develop a novel technology for the passive monitoring of sleep stages (using radio signaling) that seeks to address existing gaps in performance across different demographic groups. His project will tackle the problem of imbalance in available datasets and account for intrinsic differences across subpopulations, using generative AI and multi-modality/multi-domain learning, with the goal of learning robust features that are invariant to different subpopulations. He’s work holds great promise for delivering advanced, equitable health-care services to all people and could significantly impact health care and AI.

    Chengyi Long

    Chengyi Long is a PhD candidate in the Department of Civil and Environmental Engineering. Long’s interdisciplinary research integrates the methodology of physics, mathematics, and computer science to investigate questions in ecology. Specifically, Long is developing a series of potentially groundbreaking techniques to explain and predict the temporal dynamics of ecological systems, including human microbiota, which are essential subjects in health and medical research. His current work, supported by a Takeda Fellowship, is focused on developing a conceptual, mathematical, and practical framework to understand the interplay between external perturbations and internal community dynamics in microbial systems, which may serve as a key step toward finding bio solutions to health management. A broader perspective of his research is to develop AI-assisted platforms to anticipate the changing behavior of microbial systems, which may help to differentiate between healthy and unhealthy hosts and design probiotics for the prevention and mitigation of pathogen infections. By creating novel methods to address these issues, Long’s research has the potential to offer powerful contributions to medicine and global health.

    Omar Mohd

    Omar Mohd is a PhD candidate in the Department of Electrical Engineering and Computer Science. Mohd’s research is focused on developing new technologies for the spatial profiling of microRNAs, with potentially important applications in cancer research. Through innovative combinations of micro-technologies and AI-enabled image analysis to measure the spatial variations of microRNAs within tissue samples, Mohd hopes to gain new insights into drug resistance in cancer. This work, supported by a Takeda Fellowship, falls within the emerging field of spatial transcriptomics, which seeks to understand cancer and other diseases by examining the relative locations of cells and their contents within tissues. The ultimate goal of Mohd’s current project is to find multidimensional patterns in tissues that may have prognostic value for cancer patients. One valuable component of his work is an open-source AI program developed with collaborators at Beth Israel Deaconess Medical Center and Harvard Medical School to auto-detect cancer epithelial cells from other cell types in a tissue sample and to correlate their abundance with the spatial variations of microRNAs. Through his research, Mohd is making innovative contributions at the interface of microsystem technology, AI-based image analysis, and cancer treatment, which could significantly impact medicine and human health.

    Sanghyun Park

    Sanghyun Park is a PhD candidate in the Department of Mechanical Engineering. Park specializes in the integration of AI and biomedical engineering to address complex challenges in human health. Drawing on his expertise in polymer physics, drug delivery, and rheology, his research focuses on the pioneering field of in-situ forming implants (ISFIs) for drug delivery. Supported by a Takeda Fellowship, Park is currently developing an injectable formulation designed for long-term drug delivery. The primary goal of his research is to unravel the compaction mechanism of drug particles in ISFI formulations through comprehensive modeling and in-vitro characterization studies utilizing advanced AI tools. He aims to gain a thorough understanding of this unique compaction mechanism and apply it to drug microcrystals to achieve properties optimal for long-term drug delivery. Beyond these fundamental studies, Park’s research also focuses on translating this knowledge into practical applications in a clinical setting through animal studies specifically aimed at extending drug release duration and improving mechanical properties. The innovative use of AI in developing advanced drug delivery systems, coupled with Park’s valuable insights into the compaction mechanism, could contribute to improving long-term drug delivery. This work has the potential to pave the way for effective management of chronic diseases, benefiting patients, clinicians, and the pharmaceutical industry.

    Huaiyao Peng

    Huaiyao Peng is a PhD candidate in the Department of Biological Engineering. Peng’s research interests are focused on engineered tissue, microfabrication platforms, cancer metastasis, and the tumor microenvironment. Specifically, she is advancing novel AI techniques for the development of pre-cancer organoid models of high-grade serous ovarian cancer (HGSOC), an especially lethal and difficult-to-treat cancer, with the goal of gaining new insights into progression and effective treatments. Peng’s project, supported by a Takeda Fellowship, will be one of the first to use cells from serous tubal intraepithelial carcinoma lesions found in the fallopian tubes of many HGSOC patients. By examining the cellular and molecular changes that occur in response to treatment with small molecule inhibitors, she hopes to identify potential biomarkers and promising therapeutic targets for HGSOC, including personalized treatment options for HGSOC patients, ultimately improving their clinical outcomes. Peng’s work has the potential to bring about important advances in cancer treatment and spur innovative new applications of AI in health care. 

    Priyanka Raghavan

    Priyanka Raghavan is a PhD candidate in the Department of Chemical Engineering. Raghavan’s research interests lie at the frontier of predictive chemistry, integrating computational and experimental approaches to build powerful new predictive tools for societally important applications, including drug discovery. Specifically, Raghavan is developing novel models to predict small-molecule substrate reactivity and compatibility in regimes where little data is available (the most realistic regimes). A Takeda Fellowship will enable Raghavan to push the boundaries of her research, making innovative use of low-data and multi-task machine learning approaches, synthetic chemistry, and robotic laboratory automation, with the goal of creating an autonomous, closed-loop system for the discovery of high-yielding organic small molecules in the context of underexplored reactions. Raghavan’s work aims to identify new, versatile reactions to broaden a chemist’s synthetic toolbox with novel scaffolds and substrates that could form the basis of essential drugs. Her work has the potential for far-reaching impacts in early-stage, small-molecule discovery and could help make the lengthy drug-discovery process significantly faster and cheaper.

    Zhiye Song

    Zhiye “Zoey” Song is a PhD candidate in the Department of Electrical Engineering and Computer Science. Song’s research integrates cutting-edge approaches in machine learning (ML) and hardware optimization to create next-generation, wearable medical devices. Specifically, Song is developing novel approaches for the energy-efficient implementation of ML computation in low-power medical devices, including a wearable ultrasound “patch” that captures and processes images for real-time decision-making capabilities. Her recent work, conducted in collaboration with clinicians, has centered on bladder volume monitoring; other potential applications include blood pressure monitoring, muscle diagnosis, and neuromodulation. With the support of a Takeda Fellowship, Song will build on that promising work and pursue key improvements to existing wearable device technologies, including developing low-compute and low-memory ML algorithms and low-power chips to enable ML on smart wearable devices. The technologies emerging from Song’s research could offer exciting new capabilities in health care, enabling powerful and cost-effective point-of-care diagnostics and expanding individual access to autonomous and continuous medical monitoring.

    Peiqi Wang

    Peiqi Wang is a PhD candidate in the Department of Electrical Engineering and Computer Science. Wang’s research aims to develop machine learning methods for learning and interpretation from medical images and associated clinical data to support clinical decision-making. He is developing a multimodal representation learning approach that aligns knowledge captured in large amounts of medical image and text data to transfer this knowledge to new tasks and applications. Supported by a Takeda Fellowship, Wang will advance this promising line of work to build robust tools that interpret images, learn from sparse human feedback, and reason like doctors, with potentially major benefits to important stakeholders in health care.

    Oscar Wu

    Haoyang “Oscar” Wu is a PhD candidate in the Department of Chemical Engineering. Wu’s research integrates quantum chemistry and deep learning methods to accelerate the process of small-molecule screening in the development of new drugs. By identifying and automating reliable methods for finding transition state geometries and calculating barrier heights for new reactions, Wu’s work could make it possible to conduct the high-throughput ab initio calculations of reaction rates needed to screen the reactivity of large numbers of active pharmaceutical ingredients (APIs). A Takeda Fellowship will support his current project to: (1) develop open-source software for high-throughput quantum chemistry calculations, focusing on the reactivity of drug-like molecules, and (2) develop deep learning models that can quantitatively predict the oxidative stability of APIs. The tools and insights resulting from Wu’s research could help to transform and accelerate the drug-discovery process, offering significant benefits to the pharmaceutical and medical fields and to patients.

    Soojung Yang

    Soojung Yang is a PhD candidate in the Department of Materials Science and Engineering. Yang’s research applies cutting-edge methods in geometric deep learning and generative modeling, along with atomistic simulations, to better understand and model protein dynamics. Specifically, Yang is developing novel tools in generative AI to explore protein conformational landscapes that offer greater speed and detail than physics-based simulations at a substantially lower cost. With the support of a Takeda Fellowship, she will build upon her successful work on the reverse transformation of coarse-grained proteins to the all-atom resolution, aiming to build machine-learning models that bridge multiple size scales of protein conformation diversity (all-atom, residue-level, and domain-level). Yang’s research holds the potential to provide a powerful and widely applicable new tool for researchers who seek to understand the complex protein functions at work in human diseases and to design drugs to treat and cure those diseases.

    Yuzhe Yang

    Yuzhe Yang is a PhD candidate in the Department of Electrical Engineering and Computer Science. Yang’s research interests lie at the intersection of machine learning and health care. In his past and current work, Yang has developed and applied innovative machine-learning models that address key challenges in disease diagnosis and tracking. His many notable achievements include the creation of one of the first machine learning-based solutions using nocturnal breathing signals to detect Parkinson’s disease (PD), estimate disease severity, and track PD progression. With the support of a Takeda Fellowship, Yang will expand this promising work to develop an AI-based diagnosis model for Alzheimer’s disease (AD) using sleep-breathing data that is significantly more reliable, flexible, and economical than current diagnostic tools. This passive, in-home, contactless monitoring system — resembling a simple home Wi-Fi router — will also enable remote disease assessment and continuous progression tracking. Yang’s groundbreaking work has the potential to advance the diagnosis and treatment of prevalent diseases like PD and AD, and it offers exciting possibilities for addressing many health challenges with reliable, affordable machine-learning tools.


    Making genetic prediction models more inclusive

    While any two human genomes are about 99.9 percent identical, genetic variation in the remaining 0.1 percent plays an important role in shaping human diversity, including a person’s risk for developing certain diseases.

    Measuring the cumulative effect of these small genetic differences can provide an estimate of an individual’s genetic risk for a particular disease or their likelihood of having a particular trait. However, the majority of models used to generate these “polygenic scores” are based on studies done in people of European descent, and do not accurately gauge the risk for people of non-European ancestry or people whose genomes contain a mixture of chromosome regions inherited from previously isolated populations, also known as admixed ancestry.
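
    As a rough illustration of what a polygenic score is computationally, the sketch below (Python, with made-up numbers rather than values from any study) computes it as a weighted sum of an individual's allele counts across many variants, using per-variant effect weights estimated elsewhere.

    ```python
    # Illustrative sketch of a polygenic score: a weighted sum of allele counts
    # (0, 1, or 2 copies of each risk allele) across many variants.
    # Genotypes and effect weights below are random placeholders, not real data.
    import numpy as np

    rng = np.random.default_rng(42)
    n_individuals, n_variants = 3, 1000

    # Genotype matrix: allele dosage for each person at each variant.
    genotypes = rng.integers(0, 3, size=(n_individuals, n_variants))

    # Per-variant effect weights, e.g. estimated from an association study.
    weights = rng.normal(0.0, 0.01, size=n_variants)

    # Polygenic score for each individual: dot product of dosages and weights.
    polygenic_scores = genotypes @ weights
    print(polygenic_scores)
    ```

    The modeling challenge the MIT team tackles is not this summation itself but how the per-variant weights are estimated so that the resulting scores transfer across ancestries.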

    In an effort to make these genetic scores more inclusive, MIT researchers have created a new model that takes into account genetic information from people from a wider diversity of genetic ancestries across the world. Using this model, they showed that they could increase the accuracy of genetics-based predictions for a variety of traits, especially for people from populations that have been traditionally underrepresented in genetic studies.

    “For people of African ancestry, our model proved to be about 60 percent more accurate on average,” says Manolis Kellis, a professor of computer science in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and a member of the Broad Institute of MIT and Harvard. “For people of admixed genetic backgrounds more broadly, who have been excluded from most previous models, the accuracy of our model increased by an average of about 18 percent.”

    The researchers hope their more inclusive modeling approach could help improve health outcomes for a wider range of people and promote health equity by spreading the benefits of genomic sequencing more widely across the globe.

    “What we have done is created a method that allows you to be much more accurate for admixed and ancestry-diverse individuals, and ensure the results and the benefits of human genetics research are equally shared by everyone,” says MIT postdoc Yosuke Tanigawa, the lead and co-corresponding author of the paper, which appears today in open-access form in the American Journal of Human Genetics. The researchers have made all of their data publicly available for the broader scientific community to use.

    More inclusive models

    The work builds on the Human Genome Project, which mapped all of the genes found in the human genome, and on subsequent large-scale, cohort-based studies of how genetic variants in the human genome are linked to disease risk and other differences between individuals.

    These studies showed that the effect of any individual genetic variant on its own is typically very small. Together, these small effects add up and influence the risk of developing heart disease or diabetes, having a stroke, or being diagnosed with psychiatric disorders such as schizophrenia.

    “We have hundreds of thousands of genetic variants that are associated with complex traits, each of which is individually playing a weak effect, but together they are beginning to be predictive for disease predispositions,” Kellis says.

    However, most of these genome-wide association studies included few people of non-European descent, so polygenic risk models based on them translate poorly to non-European populations. People from different geographic areas can have different patterns of genetic variation, shaped by stochastic drift, population history, and environmental factors — for example, in people of African descent, genetic variants that protect against malaria are more common than in other populations. Those variants also affect other traits involving the immune system, such as counts of neutrophils, a type of immune cell. That variation would not be well-captured in a model based on genetic analysis of people of European ancestry alone.

    “If you are an individual of African descent, of Latin American descent, of Asian descent, then you are currently being left out by the system,” Kellis says. “This inequity in the utilization of genetic information for predicting risk of patients can cause unnecessary burden, unnecessary deaths, and unnecessary lack of prevention, and that’s where our work comes in.”

    Some researchers have begun trying to address these disparities by creating distinct models for people of European descent, of African descent, or of Asian descent. These emerging approaches assign individuals to distinct genetic ancestry groups, aggregate the data to create an association summary, and make genetic prediction models. However, these approaches still don’t represent people of admixed genetic backgrounds well.

    “Our approach builds on the previous work without requiring researchers to assign individuals or local genomic segments of individuals to predefined distinct genetic ancestry groups,” Tanigawa says. “Instead, we develop a single model for everybody by directly working on individuals across the continuum of their genetic ancestries.”

    In creating their new model, the MIT team used computational and statistical techniques that enabled them to study each individual’s unique genetic profile instead of grouping individuals by population. This methodological advancement allowed the researchers to include people of admixed ancestry, who made up nearly 10 percent of the UK Biobank dataset used for this study and currently account for about one in seven newborns in the United States.

    “Because we work at the individual level, there is no need for computing summary-level data for different populations,” Kellis says. “Thus, we did not need to exclude individuals of admixed ancestry, increasing our power by including more individuals and representing contributions from all populations in our combined model.”

    Better predictions

    To create their new model, the researchers used genetic data from more than 280,000 people, which was collected by UK Biobank, a large-scale biomedical database and research resource containing de-identified genetic, lifestyle, and health information from half a million U.K. participants. Using another set of about 81,000 held-out individuals from the UK Biobank, the researchers evaluated their model across 60 traits, which included traits related to body size and shape, such as height and body mass index, as well as blood traits such as white blood cell count and red blood cell count, which also have a genetic basis.

    The researchers found that, compared to models trained only on European-ancestry individuals, their model’s predictions are more accurate for all genetic ancestry groups. The most notable gain was for people of African ancestry, who showed 61 percent average improvements, even though they only made up about 1.5 percent of samples in UK Biobank. The researchers also saw improvements of 11 percent for people of South Asian descent and 5 percent for white British people. Predictions for people of admixed ancestry improved by about 18 percent.

    “When you bring all the individuals together in the training set, everybody contributes to the training of the polygenic score modeling on equal footing,” Tanigawa says. “Combined with increasingly more inclusive data collection efforts, our method can help leverage these efforts to improve predictive accuracy for all.”

    The MIT team hopes its approach can eventually be incorporated into tests of an individual’s risk of a variety of diseases. Such tests could be combined with conventional risk factors and used to help doctors diagnose disease or to help people manage their risk for certain diseases before they develop.

    “Our work highlights the power of diversity, equity, and inclusion efforts in the context of genomics research,” Tanigawa says.

    The researchers now hope to add even more data to their model, including data from the United States, and to apply it to additional traits that they didn’t analyze in this study.

    “This is just the start,” Kellis says. “We can’t wait to see more people join our effort to propel inclusive human genetics research.”

    The research was funded by the National Institutes of Health. More


    How an archeological approach can help leverage biased data in AI to improve medicine

    The classic computer science adage “garbage in, garbage out” lacks nuance when it comes to understanding biased medical data, argue computer science and bioethics professors from MIT, Johns Hopkins University, and the Alan Turing Institute in a new opinion piece published in a recent edition of the New England Journal of Medicine (NEJM). The rising popularity of artificial intelligence has brought increased scrutiny to the matter of biased AI models resulting in algorithmic discrimination, which the White House Office of Science and Technology Policy identified as a key issue in its recent Blueprint for an AI Bill of Rights.

    When encountering biased data, particularly for AI models used in medical settings, the typical response is to either collect more data from underrepresented groups or generate synthetic data making up for missing parts to ensure that the model performs equally well across an array of patient populations. But the authors argue that this technical approach should be augmented with a sociotechnical perspective that takes both historical and current social factors into account. By doing so, researchers can be more effective in addressing bias in public health. 

    “The three of us had been discussing the ways in which we often treat issues with data from a machine learning perspective as irritations that need to be managed with a technical solution,” recalls co-author Marzyeh Ghassemi, an assistant professor in electrical engineering and computer science and an affiliate of the Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic), the Computer Science and Artificial Intelligence Laboratory (CSAIL), and the Institute for Medical Engineering and Science (IMES). “We had used analogies of data as an artifact that gives a partial view of past practices, or a cracked mirror holding up a reflection. In both cases the information is perhaps not entirely accurate or favorable: Maybe we think that we behave in certain ways as a society — but when you actually look at the data, it tells a different story. We might not like what that story is, but once you unearth an understanding of the past you can move forward and take steps to address poor practices.”

    Data as artifact 

    In the paper, titled “Considering Biased Data as Informative Artifacts in AI-Assisted Health Care,” Ghassemi, Kadija Ferryman, and Maxine Mackintosh make the case for viewing biased clinical data as “artifacts” in the same way anthropologists or archeologists would view physical objects: pieces of civilization-revealing practices, belief systems, and cultural values — in the case of the paper, specifically those that have led to existing inequities in the health care system. 

    For example, a 2019 study showed that an algorithm widely considered to be an industry standard used health-care expenditures as an indicator of need, leading to the erroneous conclusion that sicker Black patients require the same level of care as healthier white patients. What researchers found was algorithmic discrimination failing to account for unequal access to care.  

    In this instance, rather than viewing biased datasets or lack of data as problems that only require disposal or fixing, Ghassemi and her colleagues recommend the “artifacts” approach as a way to raise awareness around social and historical elements influencing how data are collected and alternative approaches to clinical AI development. 

    “If the goal of your model is deployment in a clinical setting, you should engage a bioethicist or a clinician with appropriate training reasonably early on in problem formulation,” says Ghassemi. “As computer scientists, we often don’t have a complete picture of the different social and historical factors that have gone into creating data that we’ll be using. We need expertise in discerning when models generalized from existing data may not work well for specific subgroups.” 

    When more data can actually harm performance 

    The authors acknowledge that one of the more challenging aspects of implementing an artifact-based approach is being able to assess whether data have been race-corrected: that is, adjusted on the assumption that white, male bodies are the conventional standard against which other bodies are measured. The opinion piece cites an example from the Chronic Kidney Disease Epidemiology Collaboration in 2021, which developed a new equation to measure kidney function because the old equation had previously been “corrected” under the blanket assumption that Black people have higher muscle mass. Ghassemi says that researchers should be prepared to investigate race-based correction as part of the research process.

    In another recent paper accepted to this year’s International Conference on Machine Learning, co-authored by Ghassemi’s PhD student Vinith Suriyakumar and University of California at San Diego Assistant Professor Berk Ustun, the researchers found that assuming the inclusion of personalized attributes like self-reported race improves the performance of ML models can actually lead to worse risk scores, models, and metrics for minority and minoritized populations.

    “There’s no single right solution for whether or not to include self-reported race in a clinical risk score. Self-reported race is a social construct that is both a proxy for other information, and deeply proxied itself in other medical data. The solution needs to fit the evidence,” explains Ghassemi. 

    How to move forward 

    This is not to say that biased datasets should be enshrined, or biased algorithms don’t require fixing — quality training data is still key to developing safe, high-performance clinical AI models, and the NEJM piece highlights the role of the National Institutes of Health (NIH) in driving ethical practices.  

    “Generating high-quality, ethically sourced datasets is crucial for enabling the use of next-generation AI technologies that transform how we do research,” NIH acting director Lawrence Tabak stated in a press release when the NIH announced its $130 million Bridge2AI Program last year. Ghassemi agrees, pointing out that the NIH has “prioritized data collection in ethical ways that cover information we have not previously emphasized the value of in human health — such as environmental factors and social determinants. I’m very excited about their prioritization of, and strong investments towards, achieving meaningful health outcomes.” 

    Elaine Nsoesie, an associate professor at the Boston University School of Public Health, believes there are many potential benefits to treating biased datasets as artifacts rather than garbage, starting with the focus on context. “Biases present in a dataset collected for lung cancer patients in a hospital in Uganda might be different from a dataset collected in the U.S. for the same patient population,” she explains. “In considering local context, we can train algorithms to better serve specific populations.” Nsoesie says that understanding the historical and contemporary factors shaping a dataset can make it easier to identify discriminatory practices that might be coded in algorithms or systems in ways that are not immediately obvious. She also notes that an artifact-based approach could lead to the development of new policies and structures ensuring that the root causes of bias in a particular dataset are eliminated.

    “People often tell me that they are very afraid of AI, especially in health. They’ll say, ‘I’m really scared of an AI misdiagnosing me,’ or ‘I’m concerned it will treat me poorly,’” Ghassemi says. “I tell them, you shouldn’t be scared of some hypothetical AI in health tomorrow, you should be scared of what health is right now. If we take a narrow technical view of the data we extract from systems, we could naively replicate poor practices. That’s not the only option — realizing there is a problem is our first step towards a larger opportunity.”


    How machine learning models can amplify inequities in medical diagnosis and treatment

    Prior to receiving a PhD in computer science from MIT in 2017, Marzyeh Ghassemi had already begun to wonder whether the use of AI techniques might enhance the biases that already existed in health care. She was one of the early researchers to take up this issue, and she’s been exploring it ever since. In a new paper, Ghassemi, now an assistant professor in MIT’s Department of Electrical Engineering and Computer Science (EECS), and three collaborators based at the Computer Science and Artificial Intelligence Laboratory, have probed the roots of the disparities that can arise in machine learning, often causing models that perform well overall to falter when it comes to subgroups for which relatively few data have been collected and utilized in the training process. The paper — written by two MIT PhD students, Yuzhe Yang and Haoran Zhang, EECS computer scientist Dina Katabi (the Thuan and Nicole Pham Professor), and Ghassemi — was presented last month at the 40th International Conference on Machine Learning in Honolulu, Hawaii.

    In their analysis, the researchers focused on “subpopulation shifts” — differences in the way machine learning models perform for one subgroup as compared to another. “We want the models to be fair and work equally well for all groups, but instead we consistently observe the presence of shifts among different groups that can lead to inferior medical diagnosis and treatment,” says Yang, who shares lead authorship of the paper with Zhang. The main point of their inquiry is to determine the kinds of subpopulation shifts that can occur and to uncover the mechanisms behind them so that, ultimately, more equitable models can be developed.

    The new paper “significantly advances our understanding” of the subpopulation shift phenomenon, claims Stanford University computer scientist Sanmi Koyejo. “This research contributes valuable insights for future advancements in machine learning models’ performance on underrepresented subgroups.”

    Camels and cattle

    The MIT group has identified four principal types of shifts — spurious correlations, attribute imbalance, class imbalance, and attribute generalization — which, according to Yang, “have never been put together into a coherent and unified framework. We’ve come up with a single equation that shows you where biases can come from.”

    Biases can, in fact, stem from what the researchers call the class, or from the attribute, or both. To pick a simple example, suppose the task assigned to the machine learning model is to sort images of objects — animals in this case — into two classes: cows and camels. Attributes are descriptors that don’t specifically relate to the class itself. It might turn out, for instance, that all the images used in the analysis show cows standing on grass and camels on sand — grass and sand serving as the attributes here. Given the data available to it, the machine could reach an erroneous conclusion — namely that cows can only be found on grass, not on sand, with the opposite being true for camels. Such a finding would be incorrect, however, giving rise to a spurious correlation, which, Yang explains, is a “special case” among subpopulation shifts — “one in which you have a bias in both the class and the attribute.”

    In a medical setting, one could rely on machine learning models to determine whether a person has pneumonia or not based on an examination of X-ray images. There would be two classes in this situation, one consisting of people who have the lung ailment, another for those who are infection-free. A relatively straightforward case would involve just two attributes: the people getting X-rayed are either female or male. If, in this particular dataset, there were 100 males diagnosed with pneumonia for every one female diagnosed with pneumonia, that could lead to an attribute imbalance, and the model would likely do a better job of correctly detecting pneumonia for a man than for a woman. Similarly, having 1,000 times more healthy (pneumonia-free) subjects than sick ones would lead to a class imbalance, with the model biased toward healthy cases. Attribute generalization is the last shift highlighted in the new study. If your sample contained 100 male patients with pneumonia and zero female subjects with the same illness, you still would like the model to be able to generalize and make predictions about female subjects even though there are no samples in the training data for females with pneumonia.
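
    A toy numerical sketch (not the paper's code) can make the attribute-imbalance case concrete: train a simple classifier on synthetic data in which one group dominates the positive class, then compare accuracy per group. All data below are synthetic, and the group offset is an assumption made purely for illustration.

    ```python
    # Toy sketch of attribute imbalance: one group dominates the training data,
    # and per-group accuracy reveals the resulting performance gap.
    # All data are synthetic placeholders.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)

    def make_group(n, pos_rate, shift):
        """One feature carries the label signal; `shift` makes the groups differ."""
        y = rng.binomial(1, pos_rate, n)
        x = 2.0 * y + rng.normal(shift, 1.0, n)
        return x.reshape(-1, 1), y

    # Group A is large and balanced; group B is small with few positive cases.
    X_a, y_a = make_group(5000, 0.5, shift=0.0)
    X_b, y_b = make_group(200, 0.05, shift=1.5)

    model = LogisticRegression().fit(np.vstack([X_a, X_b]),
                                     np.concatenate([y_a, y_b]))

    for name, X_g, y_g in [("group A", X_a, y_a), ("group B", X_b, y_b)]:
        print(f"{name}: accuracy = {model.score(X_g, y_g):.3f}")
    ```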

    The team then took 20 advanced algorithms, designed to carry out classification tasks, and tested them on a dozen datasets to see how they performed across different population groups. They reached some unexpected conclusions: By improving the “classifier,” which is the last layer of the neural network, they were able to reduce the occurrence of spurious correlations and class imbalance, but the other shifts were unaffected. Improvements to the “encoder,” one of the uppermost layers in the neural network, could reduce the problem of attribute imbalance. “However, no matter what we did to the encoder or classifier, we did not see any improvements in terms of attribute generalization,” Yang says, “and we don’t yet know how to address that.”
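
    One generic way to act on the classifier-level finding, sketched below under stated assumptions (it is not necessarily the procedure used in the paper), is to freeze a trained encoder and refit only the final linear layer on a group-balanced subset of the data.

    ```python
    # Hedged sketch: freeze a pretrained encoder and retrain only the last layer
    # (the "classifier") on a group-balanced subset. Model sizes, data, and the
    # training loop are placeholders, not the paper's actual setup.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
    classifier = nn.Linear(16, 2)

    # Assume the encoder was already trained elsewhere; freeze its weights.
    for p in encoder.parameters():
        p.requires_grad = False

    # Hypothetical group-balanced re-training set (equal samples per class/group).
    X_bal = torch.randn(400, 32)
    y_bal = torch.randint(0, 2, (400,))

    optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(100):
        optimizer.zero_grad()
        loss = loss_fn(classifier(encoder(X_bal)), y_bal)
        loss.backward()
        optimizer.step()
    ```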

    Precisely accurate

    There is also the question of assessing how well your model actually works in terms of evenhandedness among different population groups. The metric normally used, called worst-group accuracy or WGA, is based on the assumption that if you can improve the accuracy — of, say, medical diagnosis — for the group that has the worst model performance, you would have improved the model as a whole. “The WGA is considered the gold standard in subpopulation evaluation,” the authors contend, but they made a surprising discovery: boosting worst-group accuracy results in a decrease in what they call “worst-case precision.” In medical decision-making of all sorts, one needs both accuracy — which speaks to the validity of the findings — and precision, which relates to the reliability of the methodology. “Precision and accuracy are both very important metrics in classification tasks, and that is especially true in medical diagnostics,” Yang explains. “You should never trade precision for accuracy. You always need to balance the two.”
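
    The two quantities at issue here are easy to state in code. The sketch below computes worst-group accuracy (the lowest per-group accuracy) and worst-case precision (the lowest per-group precision) on placeholder predictions; the labels and group assignments are invented for illustration.

    ```python
    # Sketch of worst-group accuracy vs. worst-case precision.
    # Labels, predictions, and group assignments are illustrative placeholders.
    import numpy as np
    from sklearn.metrics import accuracy_score, precision_score

    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
    y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 0, 0])
    groups = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

    acc_by_group, prec_by_group = {}, {}
    for g in np.unique(groups):
        mask = groups == g
        acc_by_group[g] = accuracy_score(y_true[mask], y_pred[mask])
        prec_by_group[g] = precision_score(y_true[mask], y_pred[mask],
                                           zero_division=0)

    print("worst-group accuracy:", min(acc_by_group.values()))
    print("worst-case precision:", min(prec_by_group.values()))
    ```

    Tracking both numbers side by side is what surfaces the trade-off the authors describe: pushing worst-group accuracy up can leave worst-case precision lower.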

    The MIT scientists are putting their theories into practice. In a study they’re conducting with a medical center, they’re looking at public datasets for tens of thousands of patients and hundreds of thousands of chest X-rays, trying to see whether it’s possible for machine learning models to work in an unbiased manner for all populations. That’s still far from the case, even though more awareness has been drawn to this problem, Yang says. “We are finding many disparities across different ages, gender, ethnicity, and intersectional groups.”

    He and his colleagues agree on the eventual goal, which is to achieve fairness in health care among all populations. But before we can reach that point, they maintain, we still need a better understanding of the sources of unfairness and how they permeate our current system. Reforming the system as a whole will not be easy, they acknowledge. In fact, the title of the paper they introduced at the Honolulu conference, “Change is Hard,” gives some indications as to the challenges that they and like-minded researchers face.


    Joining the battle against health care bias

    Medical researchers are awash in a tsunami of clinical data. But we need major changes in how we gather, share, and apply this data to bring its benefits to all, says Leo Anthony Celi, principal research scientist at the MIT Laboratory for Computational Physiology (LCP). 

    One key change is to make clinical data of all kinds openly available, with the proper privacy safeguards, says Celi, a practicing intensive care unit (ICU) physician at the Beth Israel Deaconess Medical Center (BIDMC) in Boston. Another key is to fully exploit these open data with multidisciplinary collaborations among clinicians, academic investigators, and industry. A third key is to focus on the varying needs of populations across every country, and to empower the experts there to drive advances in treatment, says Celi, who is also an associate professor at Harvard Medical School. 

    In all of this work, researchers must actively seek to overcome the perennial problem of bias in understanding and applying medical knowledge. This deeply damaging problem is only heightened with the massive onslaught of machine learning and other artificial intelligence technologies. “Computers will pick up all our unconscious, implicit biases when we make decisions,” Celi warns.


    Sharing medical data 

    Founded by the LCP, the MIT Critical Data consortium builds communities across disciplines to leverage the data that are routinely collected in the process of ICU care to understand health and disease better. “We connect people and align incentives,” Celi says. “In order to advance, hospitals need to work with universities, who need to work with industry partners, who need access to clinicians and data.” 

    The consortium’s flagship project is the MIMIC (Medical Information Mart for Intensive Care) ICU database built at BIDMC. With about 35,000 users around the world, the MIMIC cohort is the most widely analyzed in critical care medicine.

    International collaborations such as MIMIC highlight one of the biggest obstacles in health care: most clinical research is performed in rich countries, typically with most clinical trial participants being white males. “The findings of these trials are translated into treatment recommendations for every patient around the world,” says Celi. “We think that this is a major contributor to the sub-optimal outcomes that we see in the treatment of all sorts of diseases in Africa, in Asia, in Latin America.” 

    To fix this problem, “groups who are disproportionately burdened by disease should be setting the research agenda,” Celi says. 

    That’s the rule in the “datathons” (health hackathons) that MIT Critical Data has organized in more than two dozen countries, which apply the latest data science techniques to real-world health data. At the datathons, MIT students and faculty both learn from local experts and share their own skill sets. Many of these several-day events are sponsored by the MIT Industrial Liaison Program, the MIT International Science and Technology Initiatives program, or the MIT Sloan Latin America Office. 

    Datathons are typically held in that country’s national language or dialect, rather than English, with representation from academia, industry, government, and other stakeholders. Doctors, nurses, pharmacists, and social workers join up with computer science, engineering, and humanities students to brainstorm and analyze potential solutions. “They need each other’s expertise to fully leverage and discover and validate the knowledge that is encrypted in the data, and that will be translated into the way they deliver care,” says Celi. 

    “Everywhere we go, there is incredible talent that is completely capable of designing solutions to their health-care problems,” he emphasizes. The datathons aim to further empower the professionals and students in the host countries to drive medical research, innovation, and entrepreneurship.


    Fighting built-in bias 

    Applying machine learning and other advanced data science techniques to medical data reveals that “bias exists in the data in unimaginable ways” in every type of health product, Celi says. Often this bias is rooted in the clinical trials required to approve medical devices and therapies. 

    One dramatic example comes from pulse oximeters, which provide readouts on oxygen levels in a patient’s blood. It turns out that these devices overestimate oxygen levels for people of color. “We have been under-treating individuals of color because the nurses and the doctors have been falsely assured that their patients have adequate oxygenation,” he says. “We think that we have harmed, if not killed, a lot of individuals in the past, especially during Covid, as a result of a technology that was not designed with inclusive test subjects.” 

    Such dangers only increase as the universe of medical data expands. “The data that we have available now for research is maybe two or three levels of magnitude more than what we had even 10 years ago,” Celi says. MIMIC, for example, now includes terabytes of X-ray, echocardiogram, and electrocardiogram data, all linked with related health records. Such enormous sets of data allow investigators to detect health patterns that were previously invisible. 

    “But there is a caveat,” Celi says. “It is trivial for computers to learn sensitive attributes that are not very obvious to human experts.” In a study released last year, for instance, he and his colleagues showed that algorithms can tell if a chest X-ray image belongs to a white patient or person of color, even without looking at any other clinical data. 

    “More concerningly, groups including ours have demonstrated that computers can learn easily if you’re rich or poor, just from your imaging alone,” Celi says. “We were able to train a computer to predict if you are on Medicaid, or if you have private insurance, if you feed them with chest X-rays without any abnormality. So again, computers are catching features that are not visible to the human eye.” And these features may lead algorithms to advise against therapies for people who are Black or poor, he says. 

    Opening up industry opportunities 

    Every stakeholder stands to benefit when pharmaceutical firms and other health-care corporations better understand societal needs and can target their treatments appropriately, Celi says. 

    “We need to bring to the table the vendors of electronic health records and the medical device manufacturers, as well as the pharmaceutical companies,” he explains. “They need to be more aware of the disparities in the way that they perform their research. They need to have more investigators representing underrepresented groups of people, to provide that lens to come up with better designs of health products.” 

    Corporations could benefit by sharing results from their clinical trials, and could immediately see these potential benefits by participating in datathons, Celi says. “They could really witness the magic that happens when that data is curated and analyzed by students and clinicians with different backgrounds from different countries. So we’re calling out our partners in the pharmaceutical industry to organize these events with us!”


    Meet the 2022-23 Accenture Fellows

    Launched in October 2020, the MIT and Accenture Convergence Initiative for Industry and Technology underscores the ways in which industry and technology can collaborate to spur innovation. The five-year initiative aims to achieve its mission through research, education, and fellowships. To that end, Accenture has once again awarded five annual fellowships to MIT graduate students working on research in industry and technology convergence who are underrepresented, including by race, ethnicity, and gender.

    This year’s Accenture Fellows work across research areas including telemonitoring, human-computer interactions, operations research, AI-mediated socialization, and chemical transformations. Their research covers a wide array of projects, including designing low-power processing hardware for telehealth applications; applying machine learning to streamline and improve business operations; improving mental health care through artificial intelligence; and using machine learning to understand the environmental and health consequences of complex chemical reactions.

    As part of the application process, student nominations were invited from each unit within the School of Engineering, as well as from the Institute’s four other schools and the MIT Schwarzman College of Computing. Five exceptional students were selected as fellows for the initiative’s third year.

    Drew Buzzell is a doctoral candidate in electrical engineering and computer science whose research concerns telemonitoring, a fast-growing sphere of telehealth in which information is collected through internet-of-things (IoT) connected devices and transmitted to the cloud. Currently, the high volume of information involved in telemonitoring — and the time and energy costs of processing it — make data analysis difficult. Buzzell’s work is focused on edge computing, a new computing architecture that seeks to address these challenges by managing data closer to the source, in a distributed network of IoT devices. Buzzell earned his BS in physics and engineering science and his MS in engineering science from the Pennsylvania State University.

    Mengying (Cathy) Fang is a master’s student in the MIT School of Architecture and Planning. Her research focuses on augmented reality and virtual reality platforms. Fang is developing novel sensors and machine components that combine computation, materials science, and engineering. Moving forward, she will explore topics including soft robotics techniques that could be integrated with clothes and wearable devices and haptic feedback in order to develop interactions with digital objects. Fang earned a BS in mechanical engineering and human-computer interaction from Carnegie Mellon University.

    Xiaoyue Gong is a doctoral candidate in operations research at the MIT Sloan School of Management. Her research aims to harness the power of machine learning and data science to reduce inefficiencies in the operation of businesses, organizations, and society. With the support of an Accenture Fellowship, Gong seeks to find solutions to embedded operational problems by designing reinforcement learning methods and other machine learning techniques. Gong earned a BS in honors mathematics and interactive media arts from New York University.

    Ruby Liu is a doctoral candidate in medical engineering and medical physics. Their research addresses the growing epidemic of loneliness among older adults, which leads to poor health outcomes and presents particularly high risks for historically marginalized people, including members of the LGBTQ+ community and people of color. Liu is designing a network of interconnected AI agents that foster connections between user and agent, offering mental health care while strengthening and facilitating human-human connections. Liu received a BS in biomedical engineering from Johns Hopkins University.

    Joules Provenzano is a doctoral candidate in chemical engineering. Their work integrates machine learning and liquid chromatography-high resolution mass spectrometry (LC-HRMS) to improve our understanding of complex chemical reactions in the environment. As an Accenture Fellow, Provenzano will build upon recent advances in machine learning and LC-HRMS, including novel algorithms for processing real, experimental HR-MS data and new approaches in extracting structure-transformation rules and kinetics. Their research could speed the pace of discovery in the chemical sciences and benefit industries including oil and gas, pharmaceuticals, and agriculture. Provenzano earned a BS in chemical engineering and international and global studies from the Rochester Institute of Technology.

  • in

    Large language models help decipher clinical notes

    Electronic health records (EHRs) need a new public relations manager. Ten years ago, the U.S. government passed a law that required hospitals to digitize their health records with the intent of improving and streamlining care. The enormous amount of information in these now-digital records could be used to answer very specific questions beyond the scope of clinical trials: What’s the right dose of this medication for patients with this height and weight? What about patients with a specific genomic profile?

    Unfortunately, most of the data that could answer these questions is trapped in doctors’ notes, full of jargon and abbreviations. These notes are hard for computers to understand using current techniques — extracting information requires training multiple machine learning models. Moreover, models trained for one hospital don’t work well at others, and training each model requires domain experts to label lots of data, a time-consuming and expensive process.

    An ideal system would use a single model that can extract many types of information, work well at multiple hospitals, and learn from a small amount of labeled data. But how? Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) believed that to disentangle the data, they needed to call on something bigger: large language models. To pull out that important medical information, they used a very large, GPT-3-style model to do tasks like expanding overloaded jargon and acronyms and extracting medication regimens.

    For example, the system takes an input, in this case a clinical note, and “prompts” the model with a question about the note, such as “expand this abbreviation, C-T-A.” The system returns an output such as “clear to auscultation,” as opposed to, say, CT angiography. The objective of extracting this clean data, the team says, is to eventually enable more personalized clinical recommendations.
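
    Concretely, such a prompt-based query might look like the minimal sketch below. The complete() function is a placeholder for whatever GPT-3-style completion API the system calls, and the note and abbreviation are illustrative examples rather than data from the study.

    ```python
    # Minimal sketch of zero-shot abbreviation expansion by prompting an LLM.
    # `complete` stands in for a GPT-3-style text-completion call; it is an
    # assumption for illustration, not the authors' actual interface.

    def complete(prompt: str) -> str:
        """Send the prompt to a large language model and return its completion."""
        raise NotImplementedError("plug in an LLM provider here")

    def expand_abbreviation(note: str, abbreviation: str) -> str:
        # Pair the full note (context) with a targeted question so the model can
        # choose the expansion that fits this particular patient.
        prompt = (
            "Clinical note:\n"
            f"{note}\n\n"
            f"Question: In the note above, what does the abbreviation '{abbreviation}' stand for?\n"
            "Answer:"
        )
        return complete(prompt).strip()

    # Illustrative use: in a lung-exam context, 'CTA' should expand to
    # 'clear to auscultation' rather than 'CT angiography'.
    # expand_abbreviation("Lungs: CTA bilaterally, no wheezes.", "CTA")
    ```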

    Medical data is, understandably, a pretty tricky resource to navigate freely. There’s plenty of red tape around using public resources for testing the performance of large models because of data use restrictions, so the team decided to scrape together their own. Using a set of short, publicly available clinical snippets, they cobbled together a small dataset to enable evaluation of the extraction performance of large language models. 

    “It’s challenging to develop a single general-purpose clinical natural language processing system that will solve everyone’s needs and be robust to the huge variation seen across health datasets. As a result, until today, most clinical notes are not used in downstream analyses or for live decision support in electronic health records. These large language model approaches could potentially transform clinical natural language processing,” says David Sontag, MIT professor of electrical engineering and computer science, principal investigator in CSAIL and the Institute for Medical Engineering and Science, and supervising author on a paper about the work, which will be presented at the Conference on Empirical Methods in Natural Language Processing. “The research team’s advances in zero-shot clinical information extraction make scaling possible. Even if you have hundreds of different use cases, no problem — you can build each model with a few minutes of work, versus having to label a ton of data for that particular task.”

    For example, without any labels at all, the researchers found these models could achieve 86 percent accuracy at expanding overloaded acronyms, and the team developed additional methods to boost this further to 90 percent accuracy, with still no labels required.

    Imprisoned in an EHR 

    Experts have been steadily building up large language models (LLMs) for quite some time, but they burst onto the mainstream with GPT-3’s widely covered ability to complete sentences. These LLMs are trained on a huge amount of text from the internet to finish sentences and predict the next most likely word. 

    While previous, smaller models like earlier GPT iterations or BERT have performed well at extracting medical data, they still require substantial manual data-labeling effort.

    For example, the note “pt will dc vanco due to n/v” means that this patient (pt) was taking the antibiotic vancomycin (vanco) but experienced nausea and vomiting (n/v) severe enough for the care team to discontinue (dc) the medication. The team’s research avoids the status quo of training separate machine learning models for each task (extracting medications and side effects from the record, disambiguating common abbreviations, and so on). In addition to expanding abbreviations, they investigated four other tasks, including whether the models could parse clinical trials and extract detail-rich medication regimens.

    “Prior work has shown that these models are sensitive to the prompt’s precise phrasing. Part of our technical contribution is a way to format the prompt so that the model gives you outputs in the correct format,” says Hunter Lang, CSAIL PhD student and author on the paper. “For these extraction problems, there are structured output spaces. The output space is not just a string. It can be a list. It can be a quote from the original input. So there’s more structure than just free text. Part of our research contribution is encouraging the model to give you an output with the correct structure. That significantly cuts down on post-processing time.”
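
    As a rough illustration of that idea (not the paper’s exact prompts), the sketch below asks for a list-shaped answer that copies the note verbatim, so the reply can be parsed mechanically; complete() is again a placeholder for the underlying LLM call.

    ```python
    import re

    def complete(prompt: str) -> str:
        """Placeholder for a GPT-3-style completion call."""
        raise NotImplementedError

    def extract_medications(note: str) -> list[str]:
        # Request a constrained, list-shaped output: one medication per line,
        # copied exactly as written in the note, so parsing needs no free-text cleanup.
        prompt = (
            f"Clinical note: {note}\n"
            "List every medication mentioned in the note, one per line, starting each "
            "line with '- ' and copying the drug name exactly as it appears.\n"
            "Medications:\n"
        )
        reply = complete(prompt)
        # Keep only lines that follow the requested '- <name>' structure.
        return [m.group(1).strip() for m in re.finditer(r"^- (.+)$", reply, re.MULTILINE)]

    # Illustrative call (requires a real LLM backend):
    # extract_medications("pt will dc vanco due to n/v")  # expected: ["vanco"]
    ```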

    The approach can’t be applied out of the box to health data at a hospital: that would require sending private patient information across the open internet to an LLM provider like OpenAI. The authors showed that it’s possible to work around this by distilling the model into a smaller one that can be used on-site.
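
    One way such a distillation step might look is sketched below: the large model pseudo-labels a batch of de-identified notes once, offline, and a lightweight classifier is then trained on those pseudo-labels so that no patient text leaves the hospital at inference time. The scikit-learn pipeline and the llm_expand() helper are illustrative assumptions, not the authors’ actual pipeline.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def llm_expand(note: str, abbreviation: str) -> str:
        """Placeholder: query the large model once, offline, to get a pseudo-label."""
        raise NotImplementedError

    def distill_abbreviation_expander(notes: list[str], abbreviation: str):
        # Step 1: pseudo-label each note with the large model's expansion.
        pseudo_labels = [llm_expand(note, abbreviation) for note in notes]
        # Step 2: train a small local model to predict the expansion from note text.
        student = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)),
            LogisticRegression(max_iter=1000),
        )
        student.fit(notes, pseudo_labels)
        return student  # deployable on-site; no data sent to an external API

    # student = distill_abbreviation_expander(deidentified_notes, "CTA")
    # student.predict(["Lungs CTA bilaterally"])  # e.g., ["clear to auscultation"]
    ```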

    The model — sometimes just like humans — is not always beholden to the truth. Here’s what a potential problem might look like: Let’s say you’re asking the reason why someone took medication. Without proper guardrails and checks, the model might just output the most common reason for that medication, if nothing is explicitly mentioned in the note. This led to the team’s efforts to force the model to extract more quotes from data and less free text.
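
    A simple version of that kind of guardrail is to keep only outputs that appear verbatim in the note, as in the sketch below; the team’s actual safeguards may be more involved.

    ```python
    def keep_grounded(extractions: list[str], note: str) -> list[str]:
        # Discard any output that does not appear word-for-word in the source note,
        # filtering out answers the model may have supplied from prior knowledge
        # rather than from the text itself.
        note_lower = note.lower()
        return [span for span in extractions if span.lower() in note_lower]

    # Example: the model proposes two reasons the drug was stopped,
    # but only one is actually stated in the note.
    note = "pt will dc vanco due to n/v"
    print(keep_grounded(["n/v", "rash"], note))  # -> ['n/v']
    ```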

    Future work for the team includes extending to languages other than English, creating additional methods for quantifying uncertainty in the model, and achieving similar results with open-source models.

    “Clinical information buried in unstructured clinical notes has unique challenges compared to general domain text mostly due to large use of acronyms, and inconsistent textual patterns used across different health care facilities,” says Sadid Hasan, AI lead at Microsoft and former executive director of AI at CVS Health, who was not involved in the research. “To this end, this work sets forth an interesting paradigm of leveraging the power of general domain large language models for several important zero-/few-shot clinical NLP tasks. Specifically, the proposed guided prompt design of LLMs to generate more structured outputs could lead to further developing smaller deployable models by iteratively utilizing the model generated pseudo-labels.”

    “AI has accelerated in the last five years to the point at which these large models can predict contextualized recommendations with benefits rippling out across a variety of domains such as suggesting novel drug formulations, understanding unstructured text, recommending code, or creating works of art inspired by any number of human artists or styles,” says Parminder Bhatia, who was formerly Head of Machine Learning at AWS Health AI and is currently Head of ML for low-code applications leveraging large language models at AWS AI Labs. “One of the applications of these large models [the team has] recently launched is Amazon CodeWhisperer, which is [an] ML-powered coding companion that helps developers in building applications.”

    As part of the MIT Abdul Latif Jameel Clinic for Machine Learning in Health, Agrawal, Sontag, and Lang wrote the paper alongside Yoon Kim, MIT assistant professor and CSAIL principal investigator, and Stefan Hegselmann, a visiting PhD student from the University of Muenster. First-author Agrawal’s research was supported by a Takeda Fellowship, the MIT Deshpande Center for Technological Innovation, and the MLA@CSAIL Initiatives.