More stories

  • in

    New leadership at MIT’s Center for Biomedical Innovation

    As it continues in its mission to improve global health through the development and implementation of biomedical innovation, the MIT Center for Biomedical Innovation (CBI) today announced changes to its leadership team: Stacy Springs has been named executive director, and Professor Richard Braatz has joined as the center’s new associate faculty director.

    The change in leadership comes at a time of rapid development in new therapeutic modalities, growing concern over global access to biologic medicines and healthy food, and widespread interest in applying computational tools and multi-disciplinary approaches to address long-standing biomedical challenges.

    “This marks an exciting new chapter for the CBI,” says faculty director Anthony J. Sinskey, professor of biology, who cofounded CBI in 2005. “As I look back at almost 20 years of CBI history, I see an exponential growth in our activities, educational offerings, and impact.”

    The center’s collaborative research model accelerates innovation in biotechnology and biomedical research, drawing on the expertise of faculty and researchers in MIT’s schools of Engineering and Science, the MIT Schwarzman College of Computing, and the MIT Sloan School of Management.

    Springs steps into the role of executive director having previously served as senior director of programs for CBI and as executive director of CBI’s Biomanufacturing Program and its Consortium on Adventitious Agent Contamination in Biomanufacturing (CAACB). She succeeds Gigi Hirsch, who founded the NEW Drug Development ParadIGmS (NEWDIGS) Initiative at CBI in 2009. Hirsch and NEWDIGS have now moved to Tufts Medical Center, establishing a headquarters at the new Center for Biomedical System Design within the Institute for Clinical Research and Health Policy Studies there.

    Braatz, a chemical engineer whose work is informed by mathematical modeling and computational techniques, conducts research in process data analytics, design, and control of advanced manufacturing systems.

    “It’s been great to interact with faculty from across the Institute who have complementary expertise,” says Braatz, the Edwin R. Gilliland Professor in the Department of Chemical Engineering. “Participating in CBI’s workshops has led to fruitful partnerships with companies in tackling industry-wide challenges.”

    CBI is housed under the Institute for Data Systems and Society and, specifically, the Sociotechnical Systems Research Center in the MIT Schwarzman College of Computing. CBI is home to two biomanufacturing consortia: the CAACB and the Biomanufacturing Consortium (BioMAN). Through these precompetitive collaborations, CBI researchers work with biomanufacturers and regulators to advance shared interests in biomanufacturing.

    In addition, CBI researchers are engaged in several sponsored research programs focused on integrated continuous biomanufacturing capabilities for monoclonal antibodies and vaccines, analytical technologies to measure quality and safety attributes of a variety of biologics, including gene and cell therapies, and rapid-cycle development of virus-like particle vaccines for SARS-CoV-2.

    In another significant initiative, CBI researchers are applying data analytics strategies to biomanufacturing problems. “In our smart data analytics project, we are creating new decision support tools and algorithms for biomanufacturing process control and plant-level decision-making. Further, we are leveraging machine learning and natural language processing to improve post-market surveillance studies,” says Springs.

    CBI is also working on advanced manufacturing for cell and gene therapies, among other new modalities, and is a part of the Singapore-MIT Alliance for Research and Technology – Critical Analytics for Manufacturing Personalized-Medicine (SMART CAMP). SMART CAMP is an international research effort focused on developing the analytical tools and biological understanding of critical quality attributes that will enable the manufacture and delivery of improved cell therapies to patients.

    “This is a crucial time for biomanufacturing and for innovation across the health-care value chain. The collaborative efforts of MIT researchers and consortia members will drive fundamental discovery and inform much-needed progress in industry,” says MIT Vice President for Research Maria Zuber.

    “CBI has a track record of engaging with health-care ecosystem challenges. I am confident that under the new leadership, it will continue to inspire MIT, the United States, and the entire world to improve the health of all people,” adds Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing. More

  • in

    Caspar Hare, Georgia Perakis named associate deans of Social and Ethical Responsibilities of Computing

    Caspar Hare and Georgia Perakis have been appointed the new associate deans of the Social and Ethical Responsibilities of Computing (SERC), a cross-cutting initiative in the MIT Stephen A. Schwarzman College of Computing. Their new roles will take effect on Sept. 1.

    “Infusing social and ethical aspects of computing in academic research and education is a critical component of the college mission,” says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing and the Henry Ellis Warren Professor of Electrical Engineering and Computer Science. “I look forward to working with Caspar and Georgia on continuing to develop and advance SERC and its reach across MIT. Their complementary backgrounds and their broad connections across MIT will be invaluable to this next chapter of SERC.”

    Caspar Hare

    Hare is a professor of philosophy in the Department of Linguistics and Philosophy. A member of the MIT faculty since 2003, his main interests are in ethics, metaphysics, and epistemology. The general theme of his recent work has been to bring ideas about practical rationality and metaphysics to bear on issues in normative ethics and epistemology. He is the author of two books: “On Myself, and Other, Less Important Subjects” (Princeton University Press 2009), about the metaphysics of perspective, and “The Limits of Kindness” (Oxford University Press 2013), about normative ethics.

    Georgia Perakis

    Perakis is the William F. Pounds Professor of Management and professor of operations research, statistics, and operations management at the MIT Sloan School of Management, where she has been a faculty member since 1998. She investigates the theory and practice of analytics and its role in operations problems and is particularly interested in how to solve complex and practical problems in pricing, revenue management, supply chains, health care, transportation, and energy applications, among other areas. Since 2019, she has been the co-director of the Operations Research Center, an interdepartmental PhD program that jointly reports to MIT Sloan and the MIT Schwarzman College of Computing, a role in which she will remain. Perakis will also assume an associate dean role at MIT Sloan in recognition of her leadership.

    Hare and Perakis succeed David Kaiser, the Germeshausen Professor of the History of Science and professor of physics, and Julie Shah, the H.N. Slater Professor of Aeronautics and Astronautics, who will be stepping down from their roles at the conclusion of their three-year term on Aug. 31.

    “My deepest thanks to Dave and Julie for their tremendous leadership of SERC and contributions to the college as associate deans,” says Huttenlocher.

    SERC impact

    As the inaugural associate deans of SERC, Kaiser and Shah have been responsible for advancing a mission to incorporate humanist, social science, social responsibility, and civic perspectives into MIT’s teaching, research, and implementation of computing. In doing so, they have engaged dozens of faculty members and thousands of students from across MIT during these first three years of the initiative.

    They have brought together people from a broad array of disciplines to collaborate on crafting original materials such as active learning projects, homework assignments, and in-class demonstrations. A collection of these materials was recently published and is now freely available to the world via MIT OpenCourseWare.

    In February 2021, they launched the MIT Case Studies in Social and Ethical Responsibilities of Computing for undergraduate instruction across a range of classes and fields of study. The specially commissioned and peer-reviewed cases are based on original research and are brief by design. Three issues have been published to date and a fourth will be released later this summer. Kaiser will continue to oversee the successful new series as editor.

    Last year, 60 undergraduates, graduate students, and postdocs joined a community of SERC Scholars to help advance SERC efforts in the college. The scholars participate in unique opportunities throughout, such as the summer Experiential Ethics program. A multidisciplinary team of graduate students last winter worked with the instructors and teaching assistants of class 6.036 (Introduction to Machine Learning), MIT’s largest machine learning course, to infuse weekly labs with material covering ethical computing, data and model bias, and fairness in machine learning through SERC.

    Through efforts such as these, SERC has had a substantial impact at MIT and beyond. Over the course of their tenure, Kaiser and Shah have engaged about 80 faculty members, and more than 2,100 students took courses that included new SERC content in the last year alone. SERC’s reach extended well beyond engineering students, with about 500 exposed to SERC content through courses offered in the School of Humanities, Arts, and Social Sciences, the MIT Sloan School of Management, and the School of Architecture and Planning. More

  • in

    Researchers discover major roadblock in alleviating network congestion

    When users want to send data over the internet faster than the network can handle, congestion can occur — the same way traffic congestion snarls the morning commute into a big city.

    Computers and devices that transmit data over the internet break the data down into smaller packets and use a special algorithm to decide how fast to send those packets. These congestion control algorithms seek to fully discover and utilize available network capacity while sharing it fairly with other users who may be sharing the same network. These algorithms try to minimize delay caused by data waiting in queues in the network.

    Over the past decade, researchers in industry and academia have developed several algorithms that attempt to achieve high rates while controlling delays. Some of these, such as the BBR algorithm developed by Google, are now widely used by many websites and applications.

    But a team of MIT researchers has discovered that these algorithms can be deeply unfair. In a new study, they show there will always be a network scenario where at least one sender receives almost no bandwidth compared to other senders; that is, a problem known as “starvation” cannot be avoided.

    “What is really surprising about this paper and the results is that when you take into account the real-world complexity of network paths and all the things they can do to data packets, it is basically impossible for delay-controlling congestion control algorithms to avoid starvation using current methods,” says Mohammad Alizadeh, associate professor of electrical engineering and computer science (EECS).

    While Alizadeh and his co-authors weren’t able to find a traditional congestion control algorithm that could avoid starvation, there may be algorithms in a different class that could prevent this problem. Their analysis also suggests that changing how these algorithms work, so that they allow for larger variations in delay, could help prevent starvation in some network situations.

    Alizadeh wrote the paper with first author and EECS graduate student Venkat Arun and senior author Hari Balakrishnan, the Fujitsu Professor of Computer Science and Artificial Intelligence. The research will be presented at the ACM Special Interest Group on Data Communications (SIGCOMM) conference.

    Controlling congestion

    Congestion control is a fundamental problem in networking that researchers have been trying to tackle since the 1980s.

    A user’s computer does not know how fast to send data packets over the network because it lacks information, such as the quality of the network connection or how many other senders are using the network. Sending packets too slowly makes poor use of the available bandwidth. But sending them too quickly can overwhelm the network, and in doing so, packets will start to get dropped. These packets must be resent, which leads to longer delays. Delays can also be caused by packets waiting in queues for a long time.

    Congestion control algorithms use packet losses and delays as signals to infer congestion and decide how fast to send data. But the internet is complicated, and packets can be delayed and lost for reasons unrelated to network congestion. For instance, data could be held up in a queue along the way and then released with a burst of other packets, or the receiver’s acknowledgement might be delayed. The authors call delays that are not caused by congestion “jitter.”

    Even if a congestion control algorithm measures delay perfectly, it can’t tell the difference between delay caused by congestion and delay caused by jitter. Delay caused by jitter is unpredictable and confuses the sender. Because of this ambiguity, users start estimating delay differently, which causes them to send packets at unequal rates. Eventually, this leads to a situation where starvation occurs and someone gets shut out completely, Arun explains.

    “We started the project because we lacked a theoretical understanding of congestion control behavior in the presence of jitter. To place it on a firmer theoretical footing, we built a mathematical model that was simple enough to think about, yet able to capture some of the complexities of the internet. It has been very rewarding to have math tell us things we didn’t know and that have practical relevance,” he says.

    Studying starvation

    The researchers fed their mathematical model to a computer, gave it a series of commonly used congestion control algorithms, and asked the computer to find an algorithm that could avoid starvation, using their model.

    “We couldn’t do it. We tried every algorithm that we are aware of, and some new ones we made up. Nothing worked. The computer always found a situation where some people get all the bandwidth and at least one person gets basically nothing,” Arun says.

    The researchers were surprised by this result, especially since these algorithms are widely believed to be reasonably fair. They started suspecting that it may not be possible to avoid starvation, an extreme form of unfairness. This motivated them to define a class of algorithms they call “delay-convergent algorithms” that they proved will always suffer from starvation under their network model. All existing congestion control algorithms that control delay (that the researchers are aware of) are delay-convergent.

    The fact that such simple failure modes of these widely used algorithms remained unknown for so long illustrates how difficult it is to understand algorithms through empirical testing alone, Arun adds. It underscores the importance of a solid theoretical foundation.

    But all hope is not lost. While all the algorithms they tested failed, there may be other algorithms which are not delay-convergent that might be able to avoid starvation This suggests that one way to fix the problem might be to design congestion control algorithms that vary the delay range more widely, so the range is larger than any delay that might occur due to jitter in the network.

    “To control delays, algorithms have tried to also bound the variations in delay about a desired equilibrium, but there is nothing wrong in potentially creating greater delay variation to get better measurements of congestive delays. It is just a new design philosophy you would have to adopt,” Balakrishnan adds.

    Now, the researchers want to keep pushing to see if they can find or build an algorithm that will eliminate starvation. They also want to apply this approach of mathematical modeling and computational proofs to other thorny, unsolved problems in networked systems.

    “We are increasingly reliant on computer systems for very critical things, and we need to put their reliability on a firmer conceptual footing. We’ve shown the surprising things you can discover when you put in the time to come up with these formal specifications of what the problem actually is,” says Alizadeh.

    The NASA University Leadership Initiative (grant #80NSSC20M0163) provided funds to assist the authors with their research, but the research paper solely reflects the opinions and conclusions of its authors and not any NASA entity. This work was also partially funded by the National Science Foundation, award number 1751009. More

  • in

    Emma Gibson: Optimizing health care logistics in Africa

    Growing up in South Africa at the turn of the century, Emma Gibson saw the rise of the HIV/AIDS epidemic and its devastating impact on her home country, where many people lacked life-saving health care. At the time, Gibson was too young to understand what a sexually transmitted infection was, but she knew that HIV was infecting millions of South Africans and AIDS was taking hundreds of thousands of lives. “As a child, I was terrified by this monster that was HIV and felt so powerless to do anything about it,” she says.

    Now, as an adult, her childhood fear of the HIV epidemic has evolved into a desire to fight it. Gibson seeks to improve health care for HIV and other diseases in regions with limited resources, including South Africa. She wants to help health care facilities in these areas to use their resources more effectively so that patients can more easily obtain care.

    To help reach her goal, Gibson sought mathematics and logistics training through higher education in South Africa. She first earned her bachelor’s degree in mathematical sciences at the University of the Witwatersrand, and then her master’s degree in operations research at Stellenbosch University. There, she learned to tackle complex decision-making problems using math, statistics, and computer simulations.

    During her master’s, Gibson studied the operational challenges faced in rural South African health care facilities by working with staff at Zithulele Hospital in the Eastern Cape, one of the country’s poorest provinces. Her research focused on ways to reduce hours-long wait times for patients seeking same-day care. In the end, she developed a software tool to model patient congestion throughout the day and optimize staff schedules accordingly, enabling the hospital to care for its patients more efficiently.

    After completing her master’s, Gibson wanted to further her education outside of South Africa and left to pursue a PhD in operations research at MIT. Upon arrival, she branched out in her research and worked on a project to improve breast cancer treatment in U.S. health care, a very different environment from what she was used to.

    Two years later, Gibson had the opportunity to return to researching health care in resource-limited settings and began working with Jónas Jónasson, an associate professor at the MIT Sloan School of Management, on a new project to improve diagnostic services in sub-Saharan Africa. For the past four years, she has been working diligently on this project in collaboration with researchers at the Indian School of Business and Northwestern University. “My love language is time,” she says. “If I’m investing a lot of time in something, I really value it.”

    Scheduling sample transport

    Diagnostic testing is an essential tool that allows medical professionals to identify new diagnoses in patients and monitor patients’ conditions as they undergo treatment. For example, people living with HIV require regular blood tests to ensure that their prescribed treatments are working effectively and provide an early warning of potential treatment failures.

    For Gibson’s current project, she’s trying to improve diagnostic services in Malawi, a landlocked country in southeast Africa. “We have the tools” to diagnose and treat diseases like HIV, she says. “But in resource-limited settings, we often lack the money, the staff, and the infrastructure to reach every patient that needs them.”

    When diagnostic testing is needed, clinicians collect samples from patients and send the samples to be tested at a laboratory, which then returns the results to the facility where the patient is treated. To move these items between facilities and laboratories, Malawi has developed a national sample transportation network. The transportation system plays an important role in linking remote, rural facilities to laboratory services and ensuring that patients in these areas can access diagnostic testing through community clinics. Samples collected at these clinics are first transported to nearby district hubs, and then forwarded to laboratories located in urban areas. Since most facilities do not have computers or communications infrastructure, laboratories print copies of test results and send them back to facilities through the same transportation process.

    The sample transportation cycle is onerous, but it’s a practical solution to a difficult problem. “During the Covid pandemic, we saw how hard it was to scale up diagnostic infrastructure,” Gibson says. Diagnostic services in sub-Saharan Africa face “similar challenges, but in a much poorer setting.”

    In Malawi, sample transportation is managed by a  nongovernment organization called Riders 4 Health. The organization has around 80 couriers on motorcycles who transport samples and test results between facilities. “When we started working with [Riders], the couriers operated on fixed weekly schedules, visiting each site once or twice a week,” Gibson says. But that led to “a lot of unnecessary trips and delays.”

    To make sample transportation more efficient, Gibson developed a dynamic scheduling system that adapts to the current demand for diagnostic testing. The system consists of two main parts: an information sharing platform that aggregates sample transportation data, and an algorithm that uses the data to generate optimized routes and schedules for sample transport couriers.

    In 2019, Gibson ran a four-month-long pilot test for this system in three out of the 27 districts in Malawi. During the pilot study, six couriers transported over 20,000 samples and results across 51 health care facilities, and 150 health care workers participated in data sharing.

    The pilot was a success. Gibson’s dynamic scheduling system eliminated about half the unnecessary trips and reduced transportation delays by 25 percent — a delay that used to be four days was reduced to three. Now, Riders 4 Health is developing their own version of Gibson’s system to operate nationally in Malawi. Throughout this project, “we focused on making sure this was something that could grow with the organization,” she says. “It’s gratifying to see that actually happening.”

    Leveraging patient data

    Gibson is completing her MIT degree this September but will continue working to improve health care in Africa. After graduation, she will join the technology and analytics health care practice of an established company in South Africa. Her initial focus will be on public health care institutions, including Chris Hani Baragwanath Academic Hospital in Johannesburg, the third-largest hospital in the world.

    In this role, Gibson will work to fill in gaps in African patient data for medical operational research and develop ways to use this data more effectively to improve health care in resource-limited areas. For example, better data systems can help to monitor the prevalence and impact of different diseases, guiding where health care workers and researchers put their efforts to help the most people. “You can’t make good decisions if you don’t have all the information,” Gibson says.

    To best leverage patient data for improving health care, Gibson plans to reevaluate how data systems are structured and used in the hospital. For ideas on upgrading the current system, she’ll look to existing data systems in other countries to see what works and what doesn’t, while also drawing upon her past research experience in U.S. health care. Ultimately, she’ll tailor the new hospital data system to South African needs to accurately inform future directions in health care.

    Gibson’s new job — her “dream job” — will be based in the United Kingdom, but she anticipates spending a significant amount of time in Johannesburg. “I have so many opportunities in the wider world, but the ones that appeal to me are always back in the place I came from,” she says. More

  • in

    A technique to improve both fairness and accuracy in artificial intelligence

    For workers who use machine-learning models to help them make decisions, knowing when to trust a model’s predictions is not always an easy task, especially since these models are often so complex that their inner workings remain a mystery.

    Users sometimes employ a technique, known as selective regression, in which the model estimates its confidence level for each prediction and will reject predictions when its confidence is too low. Then a human can examine those cases, gather additional information, and make a decision about each one manually.

    But while selective regression has been shown to improve the overall performance of a model, researchers at MIT and the MIT-IBM Watson AI Lab have discovered that the technique can have the opposite effect for underrepresented groups of people in a dataset. As the model’s confidence increases with selective regression, its chance of making the right prediction also increases, but this does not always happen for all subgroups.

    For instance, a model suggesting loan approvals might make fewer errors on average, but it may actually make more wrong predictions for Black or female applicants. One reason this can occur is due to the fact that the model’s confidence measure is trained using overrepresented groups and may not be accurate for these underrepresented groups.

    Once they had identified this problem, the MIT researchers developed two algorithms that can remedy the issue. Using real-world datasets, they show that the algorithms reduce performance disparities that had affected marginalized subgroups.

    “Ultimately, this is about being more intelligent about which samples you hand off to a human to deal with. Rather than just minimizing some broad error rate for the model, we want to make sure the error rate across groups is taken into account in a smart way,” says senior MIT author Greg Wornell, the Sumitomo Professor in Engineering in the Department of Electrical Engineering and Computer Science (EECS) who leads the Signals, Information, and Algorithms Laboratory in the Research Laboratory of Electronics (RLE) and is a member of the MIT-IBM Watson AI Lab.

    Joining Wornell on the paper are co-lead authors Abhin Shah, an EECS graduate student, and Yuheng Bu, a postdoc in RLE; as well as Joshua Ka-Wing Lee SM ’17, ScD ’21 and Subhro Das, Rameswar Panda, and Prasanna Sattigeri, research staff members at the MIT-IBM Watson AI Lab. The paper will be presented this month at the International Conference on Machine Learning.

    To predict or not to predict

    Regression is a technique that estimates the relationship between a dependent variable and independent variables. In machine learning, regression analysis is commonly used for prediction tasks, such as predicting the price of a home given its features (number of bedrooms, square footage, etc.) With selective regression, the machine-learning model can make one of two choices for each input — it can make a prediction or abstain from a prediction if it doesn’t have enough confidence in its decision.

    When the model abstains, it reduces the fraction of samples it is making predictions on, which is known as coverage. By only making predictions on inputs that it is highly confident about, the overall performance of the model should improve. But this can also amplify biases that exist in a dataset, which occur when the model does not have sufficient data from certain subgroups. This can lead to errors or bad predictions for underrepresented individuals.

    The MIT researchers aimed to ensure that, as the overall error rate for the model improves with selective regression, the performance for every subgroup also improves. They call this monotonic selective risk.

    “It was challenging to come up with the right notion of fairness for this particular problem. But by enforcing this criteria, monotonic selective risk, we can make sure the model performance is actually getting better across all subgroups when you reduce the coverage,” says Shah.

    Focus on fairness

    The team developed two neural network algorithms that impose this fairness criteria to solve the problem.

    One algorithm guarantees that the features the model uses to make predictions contain all information about the sensitive attributes in the dataset, such as race and sex, that is relevant to the target variable of interest. Sensitive attributes are features that may not be used for decisions, often due to laws or organizational policies. The second algorithm employs a calibration technique to ensure the model makes the same prediction for an input, regardless of whether any sensitive attributes are added to that input.

    The researchers tested these algorithms by applying them to real-world datasets that could be used in high-stakes decision making. One, an insurance dataset, is used to predict total annual medical expenses charged to patients using demographic statistics; another, a crime dataset, is used to predict the number of violent crimes in communities using socioeconomic information. Both datasets contain sensitive attributes for individuals.

    When they implemented their algorithms on top of a standard machine-learning method for selective regression, they were able to reduce disparities by achieving lower error rates for the minority subgroups in each dataset. Moreover, this was accomplished without significantly impacting the overall error rate.

    “We see that if we don’t impose certain constraints, in cases where the model is really confident, it could actually be making more errors, which could be very costly in some applications, like health care. So if we reverse the trend and make it more intuitive, we will catch a lot of these errors. A major goal of this work is to avoid errors going silently undetected,” Sattigeri says.

    The researchers plan to apply their solutions to other applications, such as predicting house prices, student GPA, or loan interest rate, to see if the algorithms need to be calibrated for those tasks, says Shah. They also want to explore techniques that use less sensitive information during the model training process to avoid privacy issues.

    And they hope to improve the confidence estimates in selective regression to prevent situations where the model’s confidence is low, but its prediction is correct. This could reduce the workload on humans and further streamline the decision-making process, Sattigeri says.

    This research was funded, in part, by the MIT-IBM Watson AI Lab and its member companies Boston Scientific, Samsung, and Wells Fargo, and by the National Science Foundation. More

  • in

    Costis Daskalakis appointed inaugural Avanessians Professor in the MIT Schwarzman College of Computing

    The MIT Stephen A. Schwarzman College of Computing has named Costis Daskalakis as the inaugural holder of the Avanessians Professorship. His chair began on July 1.

    Daskalakis is the first person appointed to this position generously endowed by Armen Avanessians ’81. Established in the MIT Schwarzman College of Computing, the new chair provides Daskalakis with additional support to pursue his research and develop his career.

    “I’m delighted to recognize Costis for his scholarship and extraordinary achievements with this distinguished professorship,” says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing and the Henry Ellis Warren Professor of Electrical Engineering and Computer Science.

    A professor in the MIT Department of Electrical Engineering and Computer Science, Daskalakis is a theoretical computer scientist who works at the interface of game theory, economics, probability theory, statistics, and machine learning. He has resolved long-standing open problems about the computational complexity of the Nash equilibrium, the mathematical structure and computational complexity of multi-item auctions, and the behavior of machine-learning methods such as the expectation-maximization algorithm. He has obtained computationally and statistically efficient methods for statistical hypothesis testing and learning in high-dimensional settings, as well as results characterizing the structure and concentration properties of high-dimensional distributions. His current work focuses on multi-agent learning, learning from biased and dependent data, causal inference, and econometrics.

    A native of Greece, Daskalakis joined the MIT faculty in 2009. He is a member of the Computer Science and Artificial Intelligence Laboratory and is affiliated with the Laboratory for Information and Decision Systems and the Operations Research Center. He is also an investigator in the Foundations of Data Science Institute.

    He has previously received such honors as the 2018 Nevanlinna Prize from the International Mathematical Union, the 2018 ACM Grace Murray Hopper Award, the Kalai Game Theory and Computer Science Prize from the Game Theory Society, and the 2008 ACM Doctoral Dissertation Award. More

  • in

    Teaching AI to ask clinical questions

    Physicians often query a patient’s electronic health record for information that helps them make treatment decisions, but the cumbersome nature of these records hampers the process. Research has shown that even when a doctor has been trained to use an electronic health record (EHR), finding an answer to just one question can take, on average, more than eight minutes.

    The more time physicians must spend navigating an oftentimes clunky EHR interface, the less time they have to interact with patients and provide treatment.

    Researchers have begun developing machine-learning models that can streamline the process by automatically finding information physicians need in an EHR. However, training effective models requires huge datasets of relevant medical questions, which are often hard to come by due to privacy restrictions. Existing models struggle to generate authentic questions — those that would be asked by a human doctor — and are often unable to successfully find correct answers.

    To overcome this data shortage, researchers at MIT partnered with medical experts to study the questions physicians ask when reviewing EHRs. Then, they built a publicly available dataset of more than 2,000 clinically relevant questions written by these medical experts.

    When they used their dataset to train a machine-learning model to generate clinical questions, they found that the model asked high-quality and authentic questions, as compared to real questions from medical experts, more than 60 percent of the time.

    With this dataset, they plan to generate vast numbers of authentic medical questions and then use those questions to train a machine-learning model which would help doctors find sought-after information in a patient’s record more efficiently.

    “Two thousand questions may sound like a lot, but when you look at machine-learning models being trained nowadays, they have so much data, maybe billions of data points. When you train machine-learning models to work in health care settings, you have to be really creative because there is such a lack of data,” says lead author Eric Lehman, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    The senior author is Peter Szolovits, a professor in the Department of Electrical Engineering and Computer Science (EECS) who heads the Clinical Decision-Making Group in CSAIL and is also a member of the MIT-IBM Watson AI Lab. The research paper, a collaboration between co-authors at MIT, the MIT-IBM Watson AI Lab, IBM Research, and the doctors and medical experts who helped create questions and participated in the study, will be presented at the annual conference of the North American Chapter of the Association for Computational Linguistics.

    “Realistic data is critical for training models that are relevant to the task yet difficult to find or create,” Szolovits says. “The value of this work is in carefully collecting questions asked by clinicians about patient cases, from which we are able to develop methods that use these data and general language models to ask further plausible questions.”

    Data deficiency

    The few large datasets of clinical questions the researchers were able to find had a host of issues, Lehman explains. Some were composed of medical questions asked by patients on web forums, which are a far cry from physician questions. Other datasets contained questions produced from templates, so they are mostly identical in structure, making many questions unrealistic.

    “Collecting high-quality data is really important for doing machine-learning tasks, especially in a health care context, and we’ve shown that it can be done,” Lehman says.

    To build their dataset, the MIT researchers worked with practicing physicians and medical students in their last year of training. They gave these medical experts more than 100 EHR discharge summaries and told them to read through a summary and ask any questions they might have. The researchers didn’t put any restrictions on question types or structures in an effort to gather natural questions. They also asked the medical experts to identify the “trigger text” in the EHR that led them to ask each question.

    For instance, a medical expert might read a note in the EHR that says a patient’s past medical history is significant for prostate cancer and hypothyroidism. The trigger text “prostate cancer” could lead the expert to ask questions like “date of diagnosis?” or “any interventions done?”

    They found that most questions focused on symptoms, treatments, or the patient’s test results. While these findings weren’t unexpected, quantifying the number of questions about each broad topic will help them build an effective dataset for use in a real, clinical setting, says Lehman.

    Once they had compiled their dataset of questions and accompanying trigger text, they used it to train machine-learning models to ask new questions based on the trigger text.

    Then the medical experts determined whether those questions were “good” using four metrics: understandability (Does the question make sense to a human physician?), triviality (Is the question too easily answerable from the trigger text?), medical relevance (Does it makes sense to ask this question based on the context?), and relevancy to the trigger (Is the trigger related to the question?).

    Cause for concern

    The researchers found that when a model was given trigger text, it was able to generate a good question 63 percent of the time, whereas a human physician would ask a good question 80 percent of the time.

    They also trained models to recover answers to clinical questions using the publicly available datasets they had found at the outset of this project. Then they tested these trained models to see if they could find answers to “good” questions asked by human medical experts.

    The models were only able to recover about 25 percent of answers to physician-generated questions.

    “That result is really concerning. What people thought were good-performing models were, in practice, just awful because the evaluation questions they were testing on were not good to begin with,” Lehman says.

    The team is now applying this work toward their initial goal: building a model that can automatically answer physicians’ questions in an EHR. For the next step, they will use their dataset to train a machine-learning model that can automatically generate thousands or millions of good clinical questions, which can then be used to train a new model for automatic question answering.

    While there is still much work to do before that model could be a reality, Lehman is encouraged by the strong initial results the team demonstrated with this dataset.

    This research was supported, in part, by the MIT-IBM Watson AI Lab. Additional co-authors include Leo Anthony Celi of the MIT Institute for Medical Engineering and Science; Preethi Raghavan and Jennifer J. Liang of the MIT-IBM Watson AI Lab; Dana Moukheiber of the University of Buffalo; Vladislav Lialin and Anna Rumshisky of the University of Massachusetts at Lowell; Katelyn Legaspi, Nicole Rose I. Alberto, Richard Raymund R. Ragasa, Corinna Victoria M. Puyat, Isabelle Rose I. Alberto, and Pia Gabrielle I. Alfonso of the University of the Philippines; Anne Janelle R. Sy and Patricia Therese S. Pile of the University of the East Ramon Magsaysay Memorial Medical Center; Marianne Taliño of the Ateneo de Manila University School of Medicine and Public Health; and Byron C. Wallace of Northeastern University. More

  • in

    Building explainability into the components of machine-learning models

    Explanation methods that help users understand and trust machine-learning models often describe how much certain features used in the model contribute to its prediction. For example, if a model predicts a patient’s risk of developing cardiac disease, a physician might want to know how strongly the patient’s heart rate data influences that prediction.

    But if those features are so complex or convoluted that the user can’t understand them, does the explanation method do any good?

    MIT researchers are striving to improve the interpretability of features so decision makers will be more comfortable using the outputs of machine-learning models. Drawing on years of field work, they developed a taxonomy to help developers craft features that will be easier for their target audience to understand.

    “We found that out in the real world, even though we were using state-of-the-art ways of explaining machine-learning models, there is still a lot of confusion stemming from the features, not from the model itself,” says Alexandra Zytek, an electrical engineering and computer science PhD student and lead author of a paper introducing the taxonomy.

    To build the taxonomy, the researchers defined properties that make features interpretable for five types of users, from artificial intelligence experts to the people affected by a machine-learning model’s prediction. They also offer instructions for how model creators can transform features into formats that will be easier for a layperson to comprehend.

    They hope their work will inspire model builders to consider using interpretable features from the beginning of the development process, rather than trying to work backward and focus on explainability after the fact.

    MIT co-authors include Dongyu Liu, a postdoc; visiting professor Laure Berti-Équille, research director at IRD; and senior author Kalyan Veeramachaneni, principal research scientist in the Laboratory for Information and Decision Systems (LIDS) and leader of the Data to AI group. They are joined by Ignacio Arnaldo, a principal data scientist at Corelight. The research is published in the June edition of the Association for Computing Machinery Special Interest Group on Knowledge Discovery and Data Mining’s peer-reviewed Explorations Newsletter.

    Real-world lessons

    Features are input variables that are fed to machine-learning models; they are usually drawn from the columns in a dataset. Data scientists typically select and handcraft features for the model, and they mainly focus on ensuring features are developed to improve model accuracy, not on whether a decision-maker can understand them, Veeramachaneni explains.

    For several years, he and his team have worked with decision makers to identify machine-learning usability challenges. These domain experts, most of whom lack machine-learning knowledge, often don’t trust models because they don’t understand the features that influence predictions.

    For one project, they partnered with clinicians in a hospital ICU who used machine learning to predict the risk a patient will face complications after cardiac surgery. Some features were presented as aggregated values, like the trend of a patient’s heart rate over time. While features coded this way were “model ready” (the model could process the data), clinicians didn’t understand how they were computed. They would rather see how these aggregated features relate to original values, so they could identify anomalies in a patient’s heart rate, Liu says.

    By contrast, a group of learning scientists preferred features that were aggregated. Instead of having a feature like “number of posts a student made on discussion forums” they would rather have related features grouped together and labeled with terms they understood, like “participation.”

    “With interpretability, one size doesn’t fit all. When you go from area to area, there are different needs. And interpretability itself has many levels,” Veeramachaneni says.

    The idea that one size doesn’t fit all is key to the researchers’ taxonomy. They define properties that can make features more or less interpretable for different decision makers and outline which properties are likely most important to specific users.

    For instance, machine-learning developers might focus on having features that are compatible with the model and predictive, meaning they are expected to improve the model’s performance.

    On the other hand, decision makers with no machine-learning experience might be better served by features that are human-worded, meaning they are described in a way that is natural for users, and understandable, meaning they refer to real-world metrics users can reason about.

    “The taxonomy says, if you are making interpretable features, to what level are they interpretable? You may not need all levels, depending on the type of domain experts you are working with,” Zytek says.

    Putting interpretability first

    The researchers also outline feature engineering techniques a developer can employ to make features more interpretable for a specific audience.

    Feature engineering is a process in which data scientists transform data into a format machine-learning models can process, using techniques like aggregating data or normalizing values. Most models also can’t process categorical data unless they are converted to a numerical code. These transformations are often nearly impossible for laypeople to unpack.

    Creating interpretable features might involve undoing some of that encoding, Zytek says. For instance, a common feature engineering technique organizes spans of data so they all contain the same number of years. To make these features more interpretable, one could group age ranges using human terms, like infant, toddler, child, and teen. Or rather than using a transformed feature like average pulse rate, an interpretable feature might simply be the actual pulse rate data, Liu adds.

    “In a lot of domains, the tradeoff between interpretable features and model accuracy is actually very small. When we were working with child welfare screeners, for example, we retrained the model using only features that met our definitions for interpretability, and the performance decrease was almost negligible,” Zytek says.

    Building off this work, the researchers are developing a system that enables a model developer to handle complicated feature transformations in a more efficient manner, to create human-centered explanations for machine-learning models. This new system will also convert algorithms designed to explain model-ready datasets into formats that can be understood by decision makers. More