More stories

  • in

    Using seismology for groundwater management

    As climate change increases the number of extreme weather events, such as megadroughts, groundwater management is key for sustaining water supply. But current groundwater monitoring tools are either costly or insufficient for deeper aquifers, limiting our ability to monitor and practice sustainable management in populated areas.

    Now, a new paper published in Nature Communications bridges seismology and hydrology with a pilot application that uses seismometers as a cost-effective way to monitor and map groundwater fluctuations.

    “Our measurements are independent from and complementary to traditional observations,” says Shujuan Mao PhD ’21, lead author on the paper. “It provides a new way to dictate groundwater management and evaluate the impact of human activity on shaping underground hydrologic systems.”

    Mao, currently a Thompson Postdoctoral Fellow in the Geophysics department at Stanford University, conducted most of the research during her PhD in MIT’s Department of Earth, Atmospheric and Planetary Sciences (EAPS). Other contributors to the paper include EAPS department chair and Schlumberger Professor of Earth and Planetary Sciences Robert van der Hilst, as well as Michel Campillo and Albanne Lecointre from the Institut des Sciences de la Terre in France.

    While there are a few different methods currently used for measuring groundwater, they all come with notable drawbacks. Hydraulic heads, which drill through the ground and into the aquifers, are expensive and can only give limited information at the specific location they’re placed. Noninvasive techniques based on satellite- or airborne-sensing lack the sensitivity and resolution needed to observe deeper depths.

    Mao proposes using seismometers, which are instruments used to measure ground vibrations such as the waves produced by earthquakes. They can measure seismic velocity, which is the propagation speed of seismic waves. Seismic velocity measurements are unique to the mechanical state of rocks, or the ways rocks respond to their physical environment, and can tell us a lot about them.

    The idea of using seismic velocity to characterize property changes in rocks has long been used in laboratory-scale analysis, but only recently have scientists been able to measure it continuously in realistic-scale geological settings. For aquifer monitoring, Mao and her team associate the seismic velocity with the hydraulic property, or the water content, in the rocks.

    Seismic velocity measurements make use of ambient seismic fields, or background noise, recorded by seismometers. “The Earth’s surface is always vibrating, whether due to ocean waves, winds, or human activities,” she explains. “Most of the time those vibrations are really small and are considered ‘noise’ by traditional seismologists. But in recent years scientists have shown that the continuous noise records in fact contain a wealth of information about the properties and structures of the Earth’s interior.”

    To extract useful information from the noise records, Mao and her team used a technique called seismic interferometry, which analyzes wave interference to calculate the seismic velocity of the medium the waves pass through. For their pilot application, Mao and her team applied this analysis to basins in the Metropolitan Los Angeles region, an area suffering from worsening drought and a growing population.

    By doing this, Mao and her team were able to see how the aquifers changed physically over time at a high resolution. Their seismic velocity measurements verified measurements taken by hydraulic heads over the last 20 years, and the images matched very well with satellite data. They could also see differences in how the storage areas changed between counties in the area that used different water pumping practices, which is important for developing water management protocol.

    Mao also calls using the seismometers a “buy-one get-one free” deal, since seismometers are already in use for earthquake and tectonic studies not just across California, but worldwide, and could help “avoid the expensive cost of drilling and maintaining dedicated groundwater monitoring wells,” she says.

    Mao emphasizes that this study is just the beginning of exploring possible applications of seismic noise interferometry in this way. It can be used to monitor other near-surface systems, such as geothermal or volcanic systems, and Mao is currently applying it to oil and gas fields. But in places like California currently experiencing megadroughts, and who rely on groundwater for a large portion of their water needs, this kind of information is key for sustainable water management.

    “It’s really important, especially now, to characterize these changes in groundwater storage so that we can promote data-informed policymaking to help them thrive under increasing water stress,” she says.

    This study was funded, in part, by the European Research Council, with additional support from the Thompson Fellowship at Stanford University. More

  • in

    Researchers discover major roadblock in alleviating network congestion

    When users want to send data over the internet faster than the network can handle, congestion can occur — the same way traffic congestion snarls the morning commute into a big city.

    Computers and devices that transmit data over the internet break the data down into smaller packets and use a special algorithm to decide how fast to send those packets. These congestion control algorithms seek to fully discover and utilize available network capacity while sharing it fairly with other users who may be sharing the same network. These algorithms try to minimize delay caused by data waiting in queues in the network.

    Over the past decade, researchers in industry and academia have developed several algorithms that attempt to achieve high rates while controlling delays. Some of these, such as the BBR algorithm developed by Google, are now widely used by many websites and applications.

    But a team of MIT researchers has discovered that these algorithms can be deeply unfair. In a new study, they show there will always be a network scenario where at least one sender receives almost no bandwidth compared to other senders; that is, a problem known as “starvation” cannot be avoided.

    “What is really surprising about this paper and the results is that when you take into account the real-world complexity of network paths and all the things they can do to data packets, it is basically impossible for delay-controlling congestion control algorithms to avoid starvation using current methods,” says Mohammad Alizadeh, associate professor of electrical engineering and computer science (EECS).

    While Alizadeh and his co-authors weren’t able to find a traditional congestion control algorithm that could avoid starvation, there may be algorithms in a different class that could prevent this problem. Their analysis also suggests that changing how these algorithms work, so that they allow for larger variations in delay, could help prevent starvation in some network situations.

    Alizadeh wrote the paper with first author and EECS graduate student Venkat Arun and senior author Hari Balakrishnan, the Fujitsu Professor of Computer Science and Artificial Intelligence. The research will be presented at the ACM Special Interest Group on Data Communications (SIGCOMM) conference.

    Controlling congestion

    Congestion control is a fundamental problem in networking that researchers have been trying to tackle since the 1980s.

    A user’s computer does not know how fast to send data packets over the network because it lacks information, such as the quality of the network connection or how many other senders are using the network. Sending packets too slowly makes poor use of the available bandwidth. But sending them too quickly can overwhelm the network, and in doing so, packets will start to get dropped. These packets must be resent, which leads to longer delays. Delays can also be caused by packets waiting in queues for a long time.

    Congestion control algorithms use packet losses and delays as signals to infer congestion and decide how fast to send data. But the internet is complicated, and packets can be delayed and lost for reasons unrelated to network congestion. For instance, data could be held up in a queue along the way and then released with a burst of other packets, or the receiver’s acknowledgement might be delayed. The authors call delays that are not caused by congestion “jitter.”

    Even if a congestion control algorithm measures delay perfectly, it can’t tell the difference between delay caused by congestion and delay caused by jitter. Delay caused by jitter is unpredictable and confuses the sender. Because of this ambiguity, users start estimating delay differently, which causes them to send packets at unequal rates. Eventually, this leads to a situation where starvation occurs and someone gets shut out completely, Arun explains.

    “We started the project because we lacked a theoretical understanding of congestion control behavior in the presence of jitter. To place it on a firmer theoretical footing, we built a mathematical model that was simple enough to think about, yet able to capture some of the complexities of the internet. It has been very rewarding to have math tell us things we didn’t know and that have practical relevance,” he says.

    Studying starvation

    The researchers fed their mathematical model to a computer, gave it a series of commonly used congestion control algorithms, and asked the computer to find an algorithm that could avoid starvation, using their model.

    “We couldn’t do it. We tried every algorithm that we are aware of, and some new ones we made up. Nothing worked. The computer always found a situation where some people get all the bandwidth and at least one person gets basically nothing,” Arun says.

    The researchers were surprised by this result, especially since these algorithms are widely believed to be reasonably fair. They started suspecting that it may not be possible to avoid starvation, an extreme form of unfairness. This motivated them to define a class of algorithms they call “delay-convergent algorithms” that they proved will always suffer from starvation under their network model. All existing congestion control algorithms that control delay (that the researchers are aware of) are delay-convergent.

    The fact that such simple failure modes of these widely used algorithms remained unknown for so long illustrates how difficult it is to understand algorithms through empirical testing alone, Arun adds. It underscores the importance of a solid theoretical foundation.

    But all hope is not lost. While all the algorithms they tested failed, there may be other algorithms which are not delay-convergent that might be able to avoid starvation This suggests that one way to fix the problem might be to design congestion control algorithms that vary the delay range more widely, so the range is larger than any delay that might occur due to jitter in the network.

    “To control delays, algorithms have tried to also bound the variations in delay about a desired equilibrium, but there is nothing wrong in potentially creating greater delay variation to get better measurements of congestive delays. It is just a new design philosophy you would have to adopt,” Balakrishnan adds.

    Now, the researchers want to keep pushing to see if they can find or build an algorithm that will eliminate starvation. They also want to apply this approach of mathematical modeling and computational proofs to other thorny, unsolved problems in networked systems.

    “We are increasingly reliant on computer systems for very critical things, and we need to put their reliability on a firmer conceptual footing. We’ve shown the surprising things you can discover when you put in the time to come up with these formal specifications of what the problem actually is,” says Alizadeh.

    The NASA University Leadership Initiative (grant #80NSSC20M0163) provided funds to assist the authors with their research, but the research paper solely reflects the opinions and conclusions of its authors and not any NASA entity. This work was also partially funded by the National Science Foundation, award number 1751009. More

  • in

    A technique to improve both fairness and accuracy in artificial intelligence

    For workers who use machine-learning models to help them make decisions, knowing when to trust a model’s predictions is not always an easy task, especially since these models are often so complex that their inner workings remain a mystery.

    Users sometimes employ a technique, known as selective regression, in which the model estimates its confidence level for each prediction and will reject predictions when its confidence is too low. Then a human can examine those cases, gather additional information, and make a decision about each one manually.

    But while selective regression has been shown to improve the overall performance of a model, researchers at MIT and the MIT-IBM Watson AI Lab have discovered that the technique can have the opposite effect for underrepresented groups of people in a dataset. As the model’s confidence increases with selective regression, its chance of making the right prediction also increases, but this does not always happen for all subgroups.

    For instance, a model suggesting loan approvals might make fewer errors on average, but it may actually make more wrong predictions for Black or female applicants. One reason this can occur is due to the fact that the model’s confidence measure is trained using overrepresented groups and may not be accurate for these underrepresented groups.

    Once they had identified this problem, the MIT researchers developed two algorithms that can remedy the issue. Using real-world datasets, they show that the algorithms reduce performance disparities that had affected marginalized subgroups.

    “Ultimately, this is about being more intelligent about which samples you hand off to a human to deal with. Rather than just minimizing some broad error rate for the model, we want to make sure the error rate across groups is taken into account in a smart way,” says senior MIT author Greg Wornell, the Sumitomo Professor in Engineering in the Department of Electrical Engineering and Computer Science (EECS) who leads the Signals, Information, and Algorithms Laboratory in the Research Laboratory of Electronics (RLE) and is a member of the MIT-IBM Watson AI Lab.

    Joining Wornell on the paper are co-lead authors Abhin Shah, an EECS graduate student, and Yuheng Bu, a postdoc in RLE; as well as Joshua Ka-Wing Lee SM ’17, ScD ’21 and Subhro Das, Rameswar Panda, and Prasanna Sattigeri, research staff members at the MIT-IBM Watson AI Lab. The paper will be presented this month at the International Conference on Machine Learning.

    To predict or not to predict

    Regression is a technique that estimates the relationship between a dependent variable and independent variables. In machine learning, regression analysis is commonly used for prediction tasks, such as predicting the price of a home given its features (number of bedrooms, square footage, etc.) With selective regression, the machine-learning model can make one of two choices for each input — it can make a prediction or abstain from a prediction if it doesn’t have enough confidence in its decision.

    When the model abstains, it reduces the fraction of samples it is making predictions on, which is known as coverage. By only making predictions on inputs that it is highly confident about, the overall performance of the model should improve. But this can also amplify biases that exist in a dataset, which occur when the model does not have sufficient data from certain subgroups. This can lead to errors or bad predictions for underrepresented individuals.

    The MIT researchers aimed to ensure that, as the overall error rate for the model improves with selective regression, the performance for every subgroup also improves. They call this monotonic selective risk.

    “It was challenging to come up with the right notion of fairness for this particular problem. But by enforcing this criteria, monotonic selective risk, we can make sure the model performance is actually getting better across all subgroups when you reduce the coverage,” says Shah.

    Focus on fairness

    The team developed two neural network algorithms that impose this fairness criteria to solve the problem.

    One algorithm guarantees that the features the model uses to make predictions contain all information about the sensitive attributes in the dataset, such as race and sex, that is relevant to the target variable of interest. Sensitive attributes are features that may not be used for decisions, often due to laws or organizational policies. The second algorithm employs a calibration technique to ensure the model makes the same prediction for an input, regardless of whether any sensitive attributes are added to that input.

    The researchers tested these algorithms by applying them to real-world datasets that could be used in high-stakes decision making. One, an insurance dataset, is used to predict total annual medical expenses charged to patients using demographic statistics; another, a crime dataset, is used to predict the number of violent crimes in communities using socioeconomic information. Both datasets contain sensitive attributes for individuals.

    When they implemented their algorithms on top of a standard machine-learning method for selective regression, they were able to reduce disparities by achieving lower error rates for the minority subgroups in each dataset. Moreover, this was accomplished without significantly impacting the overall error rate.

    “We see that if we don’t impose certain constraints, in cases where the model is really confident, it could actually be making more errors, which could be very costly in some applications, like health care. So if we reverse the trend and make it more intuitive, we will catch a lot of these errors. A major goal of this work is to avoid errors going silently undetected,” Sattigeri says.

    The researchers plan to apply their solutions to other applications, such as predicting house prices, student GPA, or loan interest rate, to see if the algorithms need to be calibrated for those tasks, says Shah. They also want to explore techniques that use less sensitive information during the model training process to avoid privacy issues.

    And they hope to improve the confidence estimates in selective regression to prevent situations where the model’s confidence is low, but its prediction is correct. This could reduce the workload on humans and further streamline the decision-making process, Sattigeri says.

    This research was funded, in part, by the MIT-IBM Watson AI Lab and its member companies Boston Scientific, Samsung, and Wells Fargo, and by the National Science Foundation. More

  • in

    Teaching AI to ask clinical questions

    Physicians often query a patient’s electronic health record for information that helps them make treatment decisions, but the cumbersome nature of these records hampers the process. Research has shown that even when a doctor has been trained to use an electronic health record (EHR), finding an answer to just one question can take, on average, more than eight minutes.

    The more time physicians must spend navigating an oftentimes clunky EHR interface, the less time they have to interact with patients and provide treatment.

    Researchers have begun developing machine-learning models that can streamline the process by automatically finding information physicians need in an EHR. However, training effective models requires huge datasets of relevant medical questions, which are often hard to come by due to privacy restrictions. Existing models struggle to generate authentic questions — those that would be asked by a human doctor — and are often unable to successfully find correct answers.

    To overcome this data shortage, researchers at MIT partnered with medical experts to study the questions physicians ask when reviewing EHRs. Then, they built a publicly available dataset of more than 2,000 clinically relevant questions written by these medical experts.

    When they used their dataset to train a machine-learning model to generate clinical questions, they found that the model asked high-quality and authentic questions, as compared to real questions from medical experts, more than 60 percent of the time.

    With this dataset, they plan to generate vast numbers of authentic medical questions and then use those questions to train a machine-learning model which would help doctors find sought-after information in a patient’s record more efficiently.

    “Two thousand questions may sound like a lot, but when you look at machine-learning models being trained nowadays, they have so much data, maybe billions of data points. When you train machine-learning models to work in health care settings, you have to be really creative because there is such a lack of data,” says lead author Eric Lehman, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    The senior author is Peter Szolovits, a professor in the Department of Electrical Engineering and Computer Science (EECS) who heads the Clinical Decision-Making Group in CSAIL and is also a member of the MIT-IBM Watson AI Lab. The research paper, a collaboration between co-authors at MIT, the MIT-IBM Watson AI Lab, IBM Research, and the doctors and medical experts who helped create questions and participated in the study, will be presented at the annual conference of the North American Chapter of the Association for Computational Linguistics.

    “Realistic data is critical for training models that are relevant to the task yet difficult to find or create,” Szolovits says. “The value of this work is in carefully collecting questions asked by clinicians about patient cases, from which we are able to develop methods that use these data and general language models to ask further plausible questions.”

    Data deficiency

    The few large datasets of clinical questions the researchers were able to find had a host of issues, Lehman explains. Some were composed of medical questions asked by patients on web forums, which are a far cry from physician questions. Other datasets contained questions produced from templates, so they are mostly identical in structure, making many questions unrealistic.

    “Collecting high-quality data is really important for doing machine-learning tasks, especially in a health care context, and we’ve shown that it can be done,” Lehman says.

    To build their dataset, the MIT researchers worked with practicing physicians and medical students in their last year of training. They gave these medical experts more than 100 EHR discharge summaries and told them to read through a summary and ask any questions they might have. The researchers didn’t put any restrictions on question types or structures in an effort to gather natural questions. They also asked the medical experts to identify the “trigger text” in the EHR that led them to ask each question.

    For instance, a medical expert might read a note in the EHR that says a patient’s past medical history is significant for prostate cancer and hypothyroidism. The trigger text “prostate cancer” could lead the expert to ask questions like “date of diagnosis?” or “any interventions done?”

    They found that most questions focused on symptoms, treatments, or the patient’s test results. While these findings weren’t unexpected, quantifying the number of questions about each broad topic will help them build an effective dataset for use in a real, clinical setting, says Lehman.

    Once they had compiled their dataset of questions and accompanying trigger text, they used it to train machine-learning models to ask new questions based on the trigger text.

    Then the medical experts determined whether those questions were “good” using four metrics: understandability (Does the question make sense to a human physician?), triviality (Is the question too easily answerable from the trigger text?), medical relevance (Does it makes sense to ask this question based on the context?), and relevancy to the trigger (Is the trigger related to the question?).

    Cause for concern

    The researchers found that when a model was given trigger text, it was able to generate a good question 63 percent of the time, whereas a human physician would ask a good question 80 percent of the time.

    They also trained models to recover answers to clinical questions using the publicly available datasets they had found at the outset of this project. Then they tested these trained models to see if they could find answers to “good” questions asked by human medical experts.

    The models were only able to recover about 25 percent of answers to physician-generated questions.

    “That result is really concerning. What people thought were good-performing models were, in practice, just awful because the evaluation questions they were testing on were not good to begin with,” Lehman says.

    The team is now applying this work toward their initial goal: building a model that can automatically answer physicians’ questions in an EHR. For the next step, they will use their dataset to train a machine-learning model that can automatically generate thousands or millions of good clinical questions, which can then be used to train a new model for automatic question answering.

    While there is still much work to do before that model could be a reality, Lehman is encouraged by the strong initial results the team demonstrated with this dataset.

    This research was supported, in part, by the MIT-IBM Watson AI Lab. Additional co-authors include Leo Anthony Celi of the MIT Institute for Medical Engineering and Science; Preethi Raghavan and Jennifer J. Liang of the MIT-IBM Watson AI Lab; Dana Moukheiber of the University of Buffalo; Vladislav Lialin and Anna Rumshisky of the University of Massachusetts at Lowell; Katelyn Legaspi, Nicole Rose I. Alberto, Richard Raymund R. Ragasa, Corinna Victoria M. Puyat, Isabelle Rose I. Alberto, and Pia Gabrielle I. Alfonso of the University of the Philippines; Anne Janelle R. Sy and Patricia Therese S. Pile of the University of the East Ramon Magsaysay Memorial Medical Center; Marianne Taliño of the Ateneo de Manila University School of Medicine and Public Health; and Byron C. Wallace of Northeastern University. More

  • in

    Hurricane-resistant construction may be undervalued by billions of dollars annually

    In Florida, June typically marks the beginning of hurricane season. Preparation for a storm may appear as otherworldly as it is routine: businesses and homes board up windows and doors, bottled water is quick to sell out, and public buildings cease operations to serve as emergency shelters.

    What happens next may be unpredictable. If things take a turn for the worse, myriad homes may be leveled. A 2019 Congressional Budget Office report estimated that hurricane-related wind damage causes $14 billion in losses to the residential sector annually. 

    However, new research led by Ipek Bensu Manav, an MIT graduate student in civil and environmental engineering and research assistant at MIT’s Concrete Sustainability Hub, suggests that the value of mitigating this wind damage through stronger construction methods may be significantly underestimated. 

    In fact, the failure of wind loss models to account for neighborhood texture — the density and configuration of surrounding buildings with respect to a building of interest — may result in an over 80 percent undervaluation of these methods in Florida.

    Methodology

    Hazus, a loss estimation tool developed and currently used by the Federal Emergency Management Agency (FEMA), estimates physical and economic damage to buildings due to wind and windborne debris. However, the tool assumes that all buildings in a neighborhood experience the same wind loading.

    Manav notes that this assumption disregards the complexity of neighborhood texture. Buildings of different shapes and sizes can be arranged in innumerable ways. This arrangement can amplify or reduce the wind load on buildings within the neighborhood. 

    Wind load amplifications and reductions result from effects referred to as tunneling and shielding. Densely built-up areas with grid-like layouts are particularly susceptible to wind tunneling effects. You might have experienced these effects yourself walking down a windy street, such as Main Street in Cambridge, Massachusetts, near the MIT campus, only to turn the corner and feel calmer air.

    To address this, Manav and her team sought to create a hurricane loss model that accounts for neighborhood texture. By combining GIS files, census tract data, and models of wind recurrence and structural performance, the researchers constructed a high-resolution estimate of expected wind-related structural losses, as well as the benefits of mitigation to reduce those losses. 

    The model builds on prior research led by Jacob Roxon, a recent CSHub postdoc and co-author of this paper, who developed an empirical relationship that estimates building-specific wind gusts with information about building layout in a given neighborhood. 

    A challenge the researchers had to overcome was the fact that the building footprints that were available for this estimation have little-to-no information on occupancy and building type.

    Manav addressed this by developing a novel statistical model that assigns occupancy and building types to structures based on characteristics of the census tract in which they are located.

    Analysis and cost perspective

    The researchers then estimated the value of stronger construction in a case study of residential buildings in Florida. This involved modeling the impact of several mitigation measures applied to over 9.3 million housing units spread across 6.9 million buildings.

    A map of effective wind speed ratio in Florida. Orange coloration indicates census tracts where, on average, structures experience amplifications in wind loads beyond what current tools estimate. Blue coloration indicates census tracts where, on average, structures experience reductions in wind loads.

    Image courtesy of the MIT Concrete Sustainability Hub.

    Previous item
    Next item

    Texture-related loss implications were found to be higher in census tracts along the coast. This occurs because these areas tend to be more dense and ordered, leading to higher wind load amplifications. Also, these loss implications are particularly high for single-family homes, which are more susceptible to damage and have a higher replacement cost per housing unit.

    “Our results sound the alarm that wind loads are more severe than we think,” says Manav. “That is not even accounting for climate change, which might make hurricanes more frequent and their wind speeds more intense over time.”

    The researchers computed expected losses and benefits statewide for hurricane wind damage and its mitigation. They found that $8.1 billion could be saved per year in a scenario where all homes were mitigated with simple measures such as stronger connections between roofs and walls or tighter nail spacing.

    Conventional loss estimation models value these same measures as saving only $4.4 billion per year. This means that conventional models are underestimating the value of stronger construction by over 80 percent.

    “It is important that the benefits of resilient design be quantified so that financial incentives — whether lending, insurance, or otherwise — can be brought to bear to increase mitigation. Manav’s research will move the industry forward toward justifying these benefits,” says structural engineer Evan Reis, who is the executive director of the U.S. Resiliency Council.

    Further implications

    The paper recommends that coastal states enhance their building codes, especially in densely built-up areas, to save dollars and save lives. Manav notes that current building codes do not sufficiently account for texture-induced load amplifications. 

    “Even a building built to code may not be able to protect you and your family,” says Manav. “We need to properly quantify the benefits of mitigating in areas that are exposed to high winds so we promote the right standards of construction where losses can be catastrophic.”

    A goal of Manav’s work is to provide citizens with the information they need before disaster strikes. She has created an online dashboard where you can preview the potential benefits of applying mitigation measures in different communities — perhaps even your own.

    “During my research, I kept hitting a wall. I found that it was difficult to use publicly available information to piece together the bigger picture,” she comments. “We started developing the dashboard to equip homeowners and stakeholders with accessible and actionable information.”

    As a next step, Manav is investigating socioeconomic consequences of hurricane wind damage. 

    “High-resolution analysis, like our case study, allows us to simulate individual household impacts within a geographical context,” adds Manav. “With this, we can capture how differing availability of financial resources may influence how communities cope with the aftermath of natural hazards.” More

  • in

    Building explainability into the components of machine-learning models

    Explanation methods that help users understand and trust machine-learning models often describe how much certain features used in the model contribute to its prediction. For example, if a model predicts a patient’s risk of developing cardiac disease, a physician might want to know how strongly the patient’s heart rate data influences that prediction.

    But if those features are so complex or convoluted that the user can’t understand them, does the explanation method do any good?

    MIT researchers are striving to improve the interpretability of features so decision makers will be more comfortable using the outputs of machine-learning models. Drawing on years of field work, they developed a taxonomy to help developers craft features that will be easier for their target audience to understand.

    “We found that out in the real world, even though we were using state-of-the-art ways of explaining machine-learning models, there is still a lot of confusion stemming from the features, not from the model itself,” says Alexandra Zytek, an electrical engineering and computer science PhD student and lead author of a paper introducing the taxonomy.

    To build the taxonomy, the researchers defined properties that make features interpretable for five types of users, from artificial intelligence experts to the people affected by a machine-learning model’s prediction. They also offer instructions for how model creators can transform features into formats that will be easier for a layperson to comprehend.

    They hope their work will inspire model builders to consider using interpretable features from the beginning of the development process, rather than trying to work backward and focus on explainability after the fact.

    MIT co-authors include Dongyu Liu, a postdoc; visiting professor Laure Berti-Équille, research director at IRD; and senior author Kalyan Veeramachaneni, principal research scientist in the Laboratory for Information and Decision Systems (LIDS) and leader of the Data to AI group. They are joined by Ignacio Arnaldo, a principal data scientist at Corelight. The research is published in the June edition of the Association for Computing Machinery Special Interest Group on Knowledge Discovery and Data Mining’s peer-reviewed Explorations Newsletter.

    Real-world lessons

    Features are input variables that are fed to machine-learning models; they are usually drawn from the columns in a dataset. Data scientists typically select and handcraft features for the model, and they mainly focus on ensuring features are developed to improve model accuracy, not on whether a decision-maker can understand them, Veeramachaneni explains.

    For several years, he and his team have worked with decision makers to identify machine-learning usability challenges. These domain experts, most of whom lack machine-learning knowledge, often don’t trust models because they don’t understand the features that influence predictions.

    For one project, they partnered with clinicians in a hospital ICU who used machine learning to predict the risk a patient will face complications after cardiac surgery. Some features were presented as aggregated values, like the trend of a patient’s heart rate over time. While features coded this way were “model ready” (the model could process the data), clinicians didn’t understand how they were computed. They would rather see how these aggregated features relate to original values, so they could identify anomalies in a patient’s heart rate, Liu says.

    By contrast, a group of learning scientists preferred features that were aggregated. Instead of having a feature like “number of posts a student made on discussion forums” they would rather have related features grouped together and labeled with terms they understood, like “participation.”

    “With interpretability, one size doesn’t fit all. When you go from area to area, there are different needs. And interpretability itself has many levels,” Veeramachaneni says.

    The idea that one size doesn’t fit all is key to the researchers’ taxonomy. They define properties that can make features more or less interpretable for different decision makers and outline which properties are likely most important to specific users.

    For instance, machine-learning developers might focus on having features that are compatible with the model and predictive, meaning they are expected to improve the model’s performance.

    On the other hand, decision makers with no machine-learning experience might be better served by features that are human-worded, meaning they are described in a way that is natural for users, and understandable, meaning they refer to real-world metrics users can reason about.

    “The taxonomy says, if you are making interpretable features, to what level are they interpretable? You may not need all levels, depending on the type of domain experts you are working with,” Zytek says.

    Putting interpretability first

    The researchers also outline feature engineering techniques a developer can employ to make features more interpretable for a specific audience.

    Feature engineering is a process in which data scientists transform data into a format machine-learning models can process, using techniques like aggregating data or normalizing values. Most models also can’t process categorical data unless they are converted to a numerical code. These transformations are often nearly impossible for laypeople to unpack.

    Creating interpretable features might involve undoing some of that encoding, Zytek says. For instance, a common feature engineering technique organizes spans of data so they all contain the same number of years. To make these features more interpretable, one could group age ranges using human terms, like infant, toddler, child, and teen. Or rather than using a transformed feature like average pulse rate, an interpretable feature might simply be the actual pulse rate data, Liu adds.

    “In a lot of domains, the tradeoff between interpretable features and model accuracy is actually very small. When we were working with child welfare screeners, for example, we retrained the model using only features that met our definitions for interpretability, and the performance decrease was almost negligible,” Zytek says.

    Building off this work, the researchers are developing a system that enables a model developer to handle complicated feature transformations in a more efficient manner, to create human-centered explanations for machine-learning models. This new system will also convert algorithms designed to explain model-ready datasets into formats that can be understood by decision makers. More

  • in

    3 Questions: Marking the 10th anniversary of the Higgs boson discovery

    This July 4 marks 10 years since the discovery of the Higgs boson, the long-sought particle that imparts mass to all elementary particles. The elusive particle was the last missing piece in the Standard Model of particle physics, which is our most complete model of the universe.

    In early summer of 2012, signs of the Higgs particle were detected in the Large Hadron Collider (LHC), the world’s largest particle accelerator, which is operated by CERN, the European Organization for Nuclear Research. The LHC is engineered to smash together billions upon billions of protons for the chance at producing the Higgs boson and other particles that are predicted to have been created in the early universe.

    In analyzing the products of countless proton-on-proton collisions, scientists registered a Higgs-like signal in the accelerator’s two independent detectors, ATLAS and CMS (the Compact Muon Solenoid). Specifically, the teams observed signs that a new particle had been created and then decayed to two photons, two Z bosons or two W bosons, and that this new particle was likely the Higgs boson.

    The discovery was revealed within the CMS collaboration, including over 3,000 scientists, on June 15, and ATLAS and CMS announced their respective observations to the world on July 4. More than 50 MIT physicists and students contributed to the CMS experiment, including Christoph Paus, professor of physics, who was one of the experiment’s two lead investigators to organize the search for the Higgs boson.

    As the LHC prepares to start back up on July 5 with “Run 3,” MIT News spoke with Paus about what physicists have learned about the Higgs particle in the last 10 years, and what they hope to discover with this next deluge of particle data.

    Q: Looking back, what do you remember as the key moments leading up to the Higgs boson’s discovery?

    A: I remember that by the end of 2011, we had taken a significant amount of data, and there were some first hints that there could be something, but nothing that was conclusive enough. It was clear to everybody that we were entering the critical phase of a potential discovery. We still wanted to improve our searches, and so we decided, which I felt was one of the most important decisions we took, that we had to remove the bias — that is, remove our knowledge about where the signal could appear. Because it’s dangerous as a scientist to say, “I know the solution,” which can influence the result unconsciously. So, we made that decision together in the coordination group and said, we are going to get rid of this bias by doing what people refer to as a “blind” analysis. This allowed the analyzers to focus on the technical aspects, making sure everything was correct without having to worry about being influenced by what they saw.

    Then, of course, there had to be the moment where we unblind the data and really look to see, is the Higgs there or not. And about two weeks before the scheduled presentations on July 4 where we eventually announced the discovery, there was a meeting on June 15 to show the analysis with its results to the collaboration. The most significant analysis turned out to be the two-photon analysis. One of my students, Joshua Bendavid PhD ’13, was leading that analysis, and the night before the meeting, only he and another person on the team were allowed to unblind the data. They were working until 2 in the morning, when they finally pushed a button to see what it looks like. And they were the first in CMS to have that moment of seeing that [the Higgs boson] was there. Another student of mine who was working on this analysis, Mingming Yang PhD ’15, presented the results of that search to the Collaboration at CERN that following afternoon. It was a very exciting moment for all of us. The room was hot and filled with electricity.

    The scientific process of the discovery was very well-designed and executed, and I think it can serve as a blueprint for how people should do such searches.

    Q: What more have scientists learned of the Higgs boson since the particle’s detection?

    A: At the time of the discovery, something interesting happened I did not really expect. While we were always talking about the Higgs boson before, we became very careful once we saw that “narrow peak.” How could we be sure that it was the Higgs boson and not something else? It certainly looked like the Higgs boson, but our vision was quite blurry. It could have turned out in the following years that it was not the Higgs boson. But as we now know, with so much more data, everything is completely consistent with what the Higgs boson is predicted to look like, so we became comfortable with calling the narrow resonance not just a Higgs-like particle but rather simply the Higgs boson. And there were a few milestones that made sure this is really the Higgs as we know it.

    The initial discovery was based on Higgs bosons decaying to two photons, two Z bosons or two W bosons. That was only a small fraction of decays that the Higgs could undergo. There are many more. The amount of decays of the Higgs boson into a particular set of particles depends critically on their masses. This characteristic feature is essential to confirm that we are really dealing with the Higgs boson.

    What we found since then is that the Higgs boson does not only decay to bosons, but also to fermions, which is not obvious because bosons are force carrier particles while fermions are matter particles. The first new decay was the decay to tau leptons, the heavier sibling of the electron. The next step was the observation of the Higgs boson decaying to b quarks, the heaviest quark that the Higgs can decay to. The b quark is the heaviest sibling of the down quark, which is a building block of protons and neutrons and thus all atomic nuclei around us. These two fermions are part of the heaviest generation of fermions in the standard model. Only recently the Higgs boson was observed to decay to muons, the charge lepton of the second and thus lighter generation, at the expected rate. Also, the direct coupling to the heaviest  top quark was established, which spans together with the muons four orders of magnitudes in terms of their masses, and the Higgs coupling behaves as expected over this wide range.

    Q: As the Large Hadron Collider gears up for its new “Run 3,” what do you hope to discover next?

    One very interesting question that Run 3 might give us some first hints on is the self-coupling of the Higgs boson. As the Higgs couples to any massive particle, it can also couple to itself. It is unlikely that there is enough data to make a discovery, but first hints of this coupling would be very exciting to see, and this constitutes a fundamentally different test than what has been done so far.

    Another interesting aspect that more data will help to elucidate is the question of whether the Higgs boson might be a portal and decay to invisible particles that could be candidates for explaining the mystery of dark matter in the universe. This is not predicted in our standard model and thus would unveil the Higgs boson as an imposter.

    Of course, we want to double down on all the measurements we have made so far and see whether they continue to line up with our expectations.

    This is true also for the upcoming major upgrade of the LHC (runs starting in 2029) for what we refer to as the High Luminosity LHC (HL-LHC). Another factor of 10 more events will be accumulated during this program, which for the Higgs boson means we will be able to observe its self-coupling. For the far future, there are plans for a Future Circular Collider, which could ultimately measure the total decay width of the Higgs boson independent of its decay mode, which would be another important and very precise test whether the Higgs boson is an imposter.

    As any other good physicist, I hope though that we can find a crack in the armor of the Standard Model, which is so far holding up all too well. There are a number of very important observations, for example the nature of dark matter, that cannot be explained using the Standard Model. All of our future studies, from Run 3 starting on July 5 to the very in the future FCC, will give us access to entirely uncharted territory. New phenomena can pop up, and I like to be optimistic. More

  • in

    Exploring emerging topics in artificial intelligence policy

    Members of the public sector, private sector, and academia convened for the second AI Policy Forum Symposium last month to explore critical directions and questions posed by artificial intelligence in our economies and societies.

    The virtual event, hosted by the AI Policy Forum (AIPF) — an undertaking by the MIT Schwarzman College of Computing to bridge high-level principles of AI policy with the practices and trade-offs of governing — brought together an array of distinguished panelists to delve into four cross-cutting topics: law, auditing, health care, and mobility.

    In the last year there have been substantial changes in the regulatory and policy landscape around AI in several countries — most notably in Europe with the development of the European Union Artificial Intelligence Act, the first attempt by a major regulator to propose a law on artificial intelligence. In the United States, the National AI Initiative Act of 2020, which became law in January 2021, is providing a coordinated program across federal government to accelerate AI research and application for economic prosperity and security gains. Finally, China recently advanced several new regulations of its own.

    Each of these developments represents a different approach to legislating AI, but what makes a good AI law? And when should AI legislation be based on binding rules with penalties versus establishing voluntary guidelines?

    Jonathan Zittrain, professor of international law at Harvard Law School and director of the Berkman Klein Center for Internet and Society, says the self-regulatory approach taken during the expansion of the internet had its limitations with companies struggling to balance their interests with those of their industry and the public.

    “One lesson might be that actually having representative government take an active role early on is a good idea,” he says. “It’s just that they’re challenged by the fact that there appears to be two phases in this environment of regulation. One, too early to tell, and two, too late to do anything about it. In AI I think a lot of people would say we’re still in the ‘too early to tell’ stage but given that there’s no middle zone before it’s too late, it might still call for some regulation.”

    A theme that came up repeatedly throughout the first panel on AI laws — a conversation moderated by Dan Huttenlocher, dean of the MIT Schwarzman College of Computing and chair of the AI Policy Forum — was the notion of trust. “If you told me the truth consistently, I would say you are an honest person. If AI could provide something similar, something that I can say is consistent and is the same, then I would say it’s trusted AI,” says Bitange Ndemo, professor of entrepreneurship at the University of Nairobi and the former permanent secretary of Kenya’s Ministry of Information and Communication.

    Eva Kaili, vice president of the European Parliament, adds that “In Europe, whenever you use something, like any medication, you know that it has been checked. You know you can trust it. You know the controls are there. We have to achieve the same with AI.” Kalli further stresses that building trust in AI systems will not only lead to people using more applications in a safe manner, but that AI itself will reap benefits as greater amounts of data will be generated as a result.

    The rapidly increasing applicability of AI across fields has prompted the need to address both the opportunities and challenges of emerging technologies and the impact they have on social and ethical issues such as privacy, fairness, bias, transparency, and accountability. In health care, for example, new techniques in machine learning have shown enormous promise for improving quality and efficiency, but questions of equity, data access and privacy, safety and reliability, and immunology and global health surveillance remain at large.

    MIT’s Marzyeh Ghassemi, an assistant professor in the Department of Electrical Engineering and Computer Science and the Institute for Medical Engineering and Science, and David Sontag, an associate professor of electrical engineering and computer science, collaborated with Ziad Obermeyer, an associate professor of health policy and management at the University of California Berkeley School of Public Health, to organize AIPF Health Wide Reach, a series of sessions to discuss issues of data sharing and privacy in clinical AI. The organizers assembled experts devoted to AI, policy, and health from around the world with the goal of understanding what can be done to decrease barriers to access to high-quality health data to advance more innovative, robust, and inclusive research results while being respectful of patient privacy.

    Over the course of the series, members of the group presented on a topic of expertise and were tasked with proposing concrete policy approaches to the challenge discussed. Drawing on these wide-ranging conversations, participants unveiled their findings during the symposium, covering nonprofit and government success stories and limited access models; upside demonstrations; legal frameworks, regulation, and funding; technical approaches to privacy; and infrastructure and data sharing. The group then discussed some of their recommendations that are summarized in a report that will be released soon.

    One of the findings calls for the need to make more data available for research use. Recommendations that stem from this finding include updating regulations to promote data sharing to enable easier access to safe harbors such as the Health Insurance Portability and Accountability Act (HIPAA) has for de-identification, as well as expanding funding for private health institutions to curate datasets, amongst others. Another finding, to remove barriers to data for researchers, supports a recommendation to decrease obstacles to research and development on federally created health data. “If this is data that should be accessible because it’s funded by some federal entity, we should easily establish the steps that are going to be part of gaining access to that so that it’s a more inclusive and equitable set of research opportunities for all,” says Ghassemi. The group also recommends taking a careful look at the ethical principles that govern data sharing. While there are already many principles proposed around this, Ghassemi says that “obviously you can’t satisfy all levers or buttons at once, but we think that this is a trade-off that’s very important to think through intelligently.”

    In addition to law and health care, other facets of AI policy explored during the event included auditing and monitoring AI systems at scale, and the role AI plays in mobility and the range of technical, business, and policy challenges for autonomous vehicles in particular.

    The AI Policy Forum Symposium was an effort to bring together communities of practice with the shared aim of designing the next chapter of AI. In his closing remarks, Aleksander Madry, the Cadence Designs Systems Professor of Computing at MIT and faculty co-lead of the AI Policy Forum, emphasized the importance of collaboration and the need for different communities to communicate with each other in order to truly make an impact in the AI policy space.

    “The dream here is that we all can meet together — researchers, industry, policymakers, and other stakeholders — and really talk to each other, understand each other’s concerns, and think together about solutions,” Madry said. “This is the mission of the AI Policy Forum and this is what we want to enable.” More