More stories

  •

    MIT PhD students honored for their work to solve critical issues in water and food

    In 2017, the Abdul Latif Jameel Water and Food Systems Lab (J-WAFS) initiated the J-WAFS Fellowship Program for outstanding MIT PhD students working to solve humankind’s water-related challenges. Since then, J-WAFS has awarded 18 fellowships to students who have gone on to create innovations like a pump that can maximize energy efficiency even with changing flow rates, and a low-cost water filter made out of sapwood xylem that has seen real-world use in rural India. Last year, J-WAFS expanded eligibility to students with food-related research. The 2022 fellows included students working on micronutrient deficiency and plastic waste from traditional food packaging materials. 

    Today, J-WAFS has announced the award of the 2023-24 fellowships to Gokul Sampath and Jie Yun. A doctoral student in the Department of Urban Studies and Planning, Sampath has been awarded the Rasikbhai L. Meswani Fellowship for Water Solutions, which is supported through a generous gift from Elina and Nikhil Meswani and family. Yun, who is in the Department of Civil and Environmental Engineering, received a J-WAFS Fellowship for Water and Food Solutions, which is funded by the J-WAFS Research Affiliate Program. Currently, Xylem, Inc. and GoAigua are J-WAFS’ Research Affiliate companies. A review committee composed of MIT faculty and staff selected Sampath and Yun from a competitive field of outstanding graduate students working in water and food who were nominated by their faculty advisors. Sampath and Yun will receive one academic semester of funding, along with opportunities for networking and mentoring to advance their research.

    “Both Yun and Sampath have demonstrated excellence in their research,” says J-WAFS executive director Renee J. Robins. “They also stood out in their communication skills and their passion to work on issues of agricultural sustainability and resilience and access to safe water. We are so pleased to have them join our inspiring group of J-WAFS fellows,” she adds.

    Using behavioral health strategies to address the arsenic crisis in India and Bangladesh

    Gokul Sampath’s research centers on ways to improve access to safe drinking water in developing countries. A PhD candidate in the International Development Group in the Department of Urban Studies and Planning, Sampath is currently examining the issue of arsenic in drinking water sources in India and Bangladesh. In eastern India, millions of shallow tube wells provide rural households a personal water source that is convenient, free, and mostly safe from cholera. Unfortunately, it is now known that one in four of these wells is contaminated with naturally occurring arsenic at levels dangerous to human health. As a result, approximately 40 million people across the region are at elevated risk of cancer, stroke, and heart disease from arsenic consumed through drinking water and cooked food.

    Since the discovery of arsenic in wells in the late 1980s, governments and nongovernmental organizations have sought to address the problem in rural villages by providing safe community water sources. Yet despite access to safe alternatives, many households still consume water from their contaminated home wells. Sampath’s research seeks to understand the constraints and trade-offs that account for why many villagers don’t collect water from arsenic-safe government wells in the village, even when they know their own wells at home could be contaminated.

    Before coming to MIT, Sampath received a master’s degree in Middle East, South Asian, and African studies from Columbia University, as well as a bachelor’s degree in microbiology and history from the University of California at Davis. He has long worked on water management in India, beginning in 2015 as a Fulbright scholar studying households’ water source choices in arsenic-affected areas of the state of West Bengal. He also served as a senior research associate with the Abdul Latif Jameel Poverty Action Lab, where he conducted randomized evaluations of market incentives for groundwater conservation in Gujarat, India. Sampath’s advisor, Bishwapriya Sanyal, the Ford International Professor of Urban Development and Planning at MIT, says Sampath has shown “remarkable hard work and dedication.” In addition to his classes and research, Sampath taught the department’s undergraduate Introduction to International Development course, for which he received standout evaluations from students.

    This summer, Sampath will travel to India to conduct field work in four arsenic-affected villages in West Bengal to understand how social influence shapes villagers’ choices between arsenic-safe and unsafe water sources. Through longitudinal surveys, he hopes to connect data on the social ties between families in villages and the daily water source choices they make. Exclusionary practices in Indian village communities, especially the segregation of water sources on the basis of caste and religion, have long been suspected to be a barrier to equitable drinking water access in Indian villages. Yet planners seeking to expand safe water access in diverse Indian villages have rarely considered the way social divisions within communities might be working against their efforts. Sampath hopes to test whether the injunctive norms enabled by caste ties constrain villagers’ ability to choose the safest water source among those shared within the village. When he returns to MIT in the fall, he plans to dive into analyzing his survey data and start work on a publication.

    Understanding plant responses to stress to improve crop drought resistance and yield

    Plants, including crops, play a fundamental role in Earth’s ecosystems through their effects on climate, air quality, and water availability. At the same time, plants grown for agriculture put a burden on the environment, as they require energy, irrigation, and chemical inputs. Understanding plant-environment interactions is becoming increasingly important as intensifying drought strains agricultural systems. Jie Yun, a PhD student in the Department of Civil and Environmental Engineering, is studying plant responses to drought stress in the hopes of improving agricultural sustainability and yield under climate change.

    Yun’s research focuses on genotype-by-environment interaction (GxE), the observation that plant varieties respond differently to environmental changes. Breeders can exploit GxE because differing environmental responses among varieties let them select genotypes with high stress tolerance under particular growing conditions. Yun bases her studies on Brachypodium, a model grass species related to wheat, oat, barley, rye, and perennial forage grasses, so findings from her experiments can be directly applied to cereal and forage crop improvement.

    For the first part of her thesis, Yun collaborated with Professor Caroline Uhler’s group in the Department of Electrical Engineering and Computer Science and the Institute for Data, Systems, and Society. Uhler’s computational tools helped Yun evaluate gene regulatory networks and how they relate to plant resilience and environmental adaptation. This work will help identify the types of genes and pathways that drive differences in drought stress response among plant varieties.

    David Des Marais, the Cecil and Ida Green Career Development Professor in the Department of Civil and Environmental Engineering, is Yun’s advisor. He notes, “throughout Jie’s time [at MIT] I have been struck by her intellectual curiosity, verging on fearlessness.” When she’s not mentoring undergraduate students in Des Marais’ lab, Yun is working on the second part of her project: how carbon allocation and growth in plants are affected by soil drying. One result of this work will be to identify which populations of plants harbor the genetic diversity needed to adapt or acclimate to climate change. Another likely impact is identifying targets for the genetic improvement of crop species, to increase crop yields with less water.

    Growing up in China, Yun witnessed environmental problems stemming from the development of the steel industry, which contaminated rivers in her hometown. On one visit to her aunt’s house in rural China, she learned that water pollution was widespread after noticing that wastewater was piped out of the house into nearby farmland without being treated. These experiences led Yun to study water supply and sewage engineering for her undergraduate degree at Shenyang Jianzhu University. She then completed a master’s program in civil and environmental engineering at Carnegie Mellon University. It was there that Yun discovered a passion for plant-environment interactions; during an independent study on perfluorooctane sulfonate, she realized the remarkable ability of plants to adapt to environmental changes, toxins, and stresses. Her goal is to continue researching plant-environment interactions and to translate the latest scientific findings into applications that can improve food security.

  •

    Study: Shutting down nuclear power could increase air pollution

    Nearly 20 percent of today’s electricity in the United States comes from nuclear power. The U.S. has the largest nuclear fleet in the world, with 92 reactors scattered around the country. Many of these power plants have run for more than half a century and are approaching the end of their expected lifetimes.

    Policymakers are debating whether to retire the aging reactors or reinforce their structures to continue producing nuclear energy, which many consider a low-carbon alternative to climate-warming coal, oil, and natural gas.

    Now, MIT researchers say there’s another factor to consider in weighing the future of nuclear power: air quality. In addition to being a low carbon-emitting source, nuclear power is relatively clean in terms of the air pollution it generates. Without nuclear power, how would the pattern of air pollution shift, and who would feel its effects?

    The MIT team took on these questions in a new study appearing today in Nature Energy. They lay out a scenario in which every nuclear power plant in the country has shut down, and consider how other sources such as coal, natural gas, and renewable energy would fill the resulting energy needs throughout an entire year.

    Their analysis reveals that indeed, air pollution would increase, as coal, gas, and oil sources ramp up to compensate for nuclear power’s absence. This in itself may not be surprising, but the team has put numbers to the prediction, estimating that the increase in air pollution would have serious health effects, resulting in an additional 5,200 pollution-related deaths over a single year.

    If, however, more renewable energy sources become available to supply the energy grid, as they are expected to by the year 2030, air pollution would be curtailed, though not entirely. The team found that even under this heartier renewable scenario, there is still a slight increase in air pollution in some parts of the country, resulting in a total of 260 pollution-related deaths over one year.

    When they looked at the populations directly affected by the increased pollution, they found that Black or African American communities — a disproportionate number of whom live near fossil-fuel plants — experienced the greatest exposure.

    “This adds one more layer to the environmental health and social impacts equation when you’re thinking about nuclear shutdowns, where the conversation often focuses on local risks due to accidents and mining or long-term climate impacts,” says lead author Lyssa Freese, a graduate student in MIT’s Department of Earth, Atmospheric and Planetary Sciences (EAPS).

    “In the debate over keeping nuclear power plants open, air quality has not been a focus of that discussion,” adds study author Noelle Selin, a professor in MIT’s Institute for Data, Systems, and Society (IDSS) and EAPS. “What we found was that air pollution from fossil fuel plants is so damaging that anything that increases it, such as a nuclear shutdown, is going to have substantial impacts, and for some people more than others.”

    The study’s MIT-affiliated co-authors also include Principal Research Scientist Sebastian Eastham and Guillaume Chossière SM ’17, PhD ’20, along with Alan Jenn of the University of California at Davis.

    Future phase-outs

    When nuclear power plants have closed in the past, fossil fuel use increased in response. In 1985, the closure of reactors in the Tennessee Valley prompted a spike in coal use, while the 2012 shutdown of a plant in California led to an increase in natural gas. In Germany, where nuclear power has almost completely been phased out, coal-fired power initially increased to fill the gap.

    Noting these trends, the MIT team wondered how the U.S. energy grid would respond if nuclear power were completely phased out.

    “We wanted to think about what future changes were expected in the energy grid,” Freese says. “We knew that coal use was declining, and there was a lot of work already looking at the impact that would have on air quality. But no one had looked at air quality and nuclear power, which we also noticed was on the decline.”

    In the new study, the team used an energy grid dispatch model developed by Jenn to assess how the U.S. energy system would respond to a shutdown of nuclear power. The model simulates the production of every power plant in the country and runs continuously to estimate, hour by hour, the energy demands in 64 regions across the country.

    Much like the way the actual energy market operates, the model chooses to turn a plant’s production up or down based on cost: Plants producing the cheapest energy at any given time are given priority to supply the grid over more costly energy sources.
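
    As a rough illustration of this merit-order logic (a toy Python sketch with made-up plants, capacities, and costs, not the dispatch model developed by Jenn), the cheapest plants are committed first each hour until demand is met:

        # Toy merit-order dispatch: commit the cheapest plants first each hour.
        plants = [
            {"name": "nuclear", "capacity_mw": 1000, "cost_per_mwh": 10},
            {"name": "gas",     "capacity_mw": 800,  "cost_per_mwh": 40},
            {"name": "coal",    "capacity_mw": 1200, "cost_per_mwh": 55},
        ]

        def dispatch(demand_mw, plants, exclude=()):
            """Meet one hour of demand with the cheapest available plants first."""
            available = sorted((p for p in plants if p["name"] not in exclude),
                               key=lambda p: p["cost_per_mwh"])
            generation, remaining = {}, demand_mw
            for p in available:
                used = min(p["capacity_mw"], remaining)
                if used > 0:
                    generation[p["name"]] = used
                remaining -= used
            return generation

        print(dispatch(1800, plants))                       # baseline: nuclear + gas
        print(dispatch(1800, plants, exclude={"nuclear"}))  # shutdown: gas and coal ramp up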

    The team fed the model available data on each plant’s changing emissions and energy costs throughout an entire year. They then ran the model under different scenarios, including an energy grid with no nuclear power, a baseline grid similar to today’s that includes nuclear power, and a grid with no nuclear power that also incorporates the additional renewable sources that are expected to be added by 2030.

    They combined each simulation with an atmospheric chemistry model to simulate how each plant’s various emissions travel around the country and to overlay these tracks onto maps of population density. For populations in the path of pollution, they calculated the risk of premature death based on their degree of exposure.

    System response

    Video: Courtesy of the researchers, edited by MIT News

    Their analysis showed a clear pattern: Without nuclear power, air pollution worsened in general, mainly affecting regions on the East Coast, where nuclear power plants are mostly concentrated. Without those plants, the team observed an uptick in production from coal and gas plants, resulting in 5,200 pollution-related deaths across the country, compared to the baseline scenario.

    They also calculated that more people are likely to die prematurely due to climate impacts from the increase in carbon dioxide emissions, as the grid compensates for nuclear power’s absence. The climate-related effects from this additional influx of carbon dioxide could lead to 160,000 additional deaths over the next century.

    “We need to be thoughtful about how we’re retiring nuclear power plants if we are trying to think about them as part of an energy system,” Freese says. “Shutting down something that doesn’t have direct emissions itself can still lead to increases in emissions, because the grid system will respond.”

    “This might mean that we need to deploy even more renewables, in order to fill the hole left by nuclear, which is essentially a zero-emissions energy source,” Selin adds. “Otherwise we will have a reduction in air quality that we weren’t necessarily counting on.”

    This study was supported, in part, by the U.S. Environmental Protection Agency.

  •

    A method for designing neural networks optimally suited for certain tasks

    Neural networks, a type of machine-learning model, are being used to help humans complete a wide variety of tasks, from predicting if someone’s credit score is high enough to qualify for a loan to diagnosing whether a patient has a certain disease. But researchers still have only a limited understanding of how these models work. Whether a given model is optimal for a certain task remains an open question.

    MIT researchers have found some answers. They conducted an analysis of neural networks and proved that they can be designed so they are “optimal,” meaning they minimize the probability of misclassifying borrowers or patients into the wrong category when the networks are given a lot of labeled training data. To achieve optimality, these networks must be built with a specific architecture.

    The researchers discovered that, in certain situations, the building blocks that enable a neural network to be optimal are not the ones developers use in practice. These optimal building blocks, derived through the new analysis, are unconventional and haven’t been considered before, the researchers say.

    In a paper published this week in the Proceedings of the National Academy of Sciences, they describe these optimal building blocks, called activation functions, and show how they can be used to design neural networks that achieve better performance on any dataset. The results hold even as the neural networks grow very large. This work could help developers select the correct activation function, enabling them to build neural networks that classify data more accurately in a wide range of application areas, explains senior author Caroline Uhler, a professor in the Department of Electrical Engineering and Computer Science (EECS).

    “While these are new activation functions that have never been used before, they are simple functions that someone could actually implement for a particular problem. This work really shows the importance of having theoretical proofs. If you go after a principled understanding of these models, that can actually lead you to new activation functions that you would otherwise never have thought of,” says Uhler, who is also co-director of the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, and a researcher at MIT’s Laboratory for Information and Decision Systems (LIDS) and its Institute for Data, Systems and Society (IDSS).

    Joining Uhler on the paper are lead author Adityanarayanan Radhakrishnan, an EECS graduate student and an Eric and Wendy Schmidt Center Fellow, and Mikhail Belkin, a professor in the Halicioğlu Data Science Institute at the University of California at San Diego.

    Activation investigation

    A neural network is a type of machine-learning model that is loosely based on the human brain. Many layers of interconnected nodes, or neurons, process data. Researchers train a network to complete a task by showing it millions of examples from a dataset.

    For instance, a network that has been trained to classify images into categories, say dogs and cats, is given an image that has been encoded as numbers. The network performs a series of complex multiplication operations, layer by layer, until the result is just one number. If that number is positive, the network classifies the image as a dog, and if it is negative, as a cat.

    Activation functions help the network learn complex patterns in the input data. They do this by applying a transformation to the output of one layer before data are sent to the next layer. When researchers build a neural network, they select one activation function to use. They also choose the width of the network (how many neurons are in each layer) and the depth (how many layers are in the network).
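
    As a minimal NumPy sketch of these design choices (the ReLU activation and layer sizes below are arbitrary placeholders, not the functions or architectures studied in the paper), the depth is the number of weight matrices, the width is their size, and the activation is applied between layers:

        import numpy as np

        def relu(z):
            # A common activation function; the paper's optimal functions differ.
            return np.maximum(0.0, z)

        def forward(x, weights, activation=relu):
            """Forward pass of a fully connected network.

            The number of matrices in `weights` sets the depth, their shapes set
            the width, and `activation` transforms each layer's output before it
            is passed to the next layer.
            """
            h = x
            for W in weights[:-1]:
                h = activation(W @ h)
            return (weights[-1] @ h).item()  # one score: positive -> dog, negative -> cat

        rng = np.random.default_rng(0)
        weights = [rng.normal(size=(16, 8)),   # layer 1: width 16, input dimension 8
                   rng.normal(size=(16, 16)),  # layer 2: width 16
                   rng.normal(size=(1, 16))]   # output layer: a single score
        print("dog" if forward(rng.normal(size=8), weights) > 0 else "cat")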

    “It turns out that, if you take the standard activation functions that people use in practice, and keep increasing the depth of the network, it gives you really terrible performance. We show that if you design with different activation functions, as you get more data, your network will get better and better,” says Radhakrishnan.

    He and his collaborators studied a situation in which a neural network is infinitely deep and wide — which means the network is built by continually adding more layers and more nodes — and is trained to perform classification tasks. In classification, the network learns to place data inputs into separate categories.

    “A clean picture”

    After conducting a detailed analysis, the researchers determined that there are only three ways this kind of network can learn to classify inputs. One method classifies an input based on the majority of inputs in the training data; if there are more dogs than cats, it will decide every new input is a dog. Another method classifies by choosing the label (dog or cat) of the training data point that most resembles the new input.

    The third method classifies a new input based on a weighted average of all the training data points that are similar to it. Their analysis shows that this is the only method of the three that leads to optimal performance. They identified a set of activation functions that always use this optimal classification method.
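
    A toy NumPy sketch of the three behaviors, using labels of +1 (dog) and -1 (cat) and a Gaussian similarity weight as a stand-in for whatever similarity a trained network actually induces (the paper's exact weighting differs), might look like this:

        import numpy as np

        def majority_vote(y_train):
            # Method 1: always predict the most common training label.
            return 1 if np.sum(y_train == 1) >= np.sum(y_train == -1) else -1

        def nearest_neighbor(X_train, y_train, x):
            # Method 2: copy the label of the single most similar training point.
            dists = np.linalg.norm(X_train - x, axis=1)
            return y_train[np.argmin(dists)]

        def weighted_average(X_train, y_train, x, bandwidth=1.0):
            # Method 3: weight every training label by its similarity to the new
            # input -- the behavior the analysis identifies as optimal.
            dists = np.linalg.norm(X_train - x, axis=1)
            w = np.exp(-(dists / bandwidth) ** 2)
            return 1 if np.sum(w * y_train) >= 0 else -1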

    “That was one of the most surprising things — no matter what you choose for an activation function, it is just going to be one of these three classifiers. We have formulas that will tell you explicitly which of these three it is going to be. It is a very clean picture,” he says.

    They tested this theory on several classification benchmarks and found that it led to improved performance in many cases. Neural network builders could use their formulas to select an activation function that yields improved classification performance, Radhakrishnan says.

    In the future, the researchers want to use what they’ve learned to analyze situations where they have a limited amount of data and for networks that are not infinitely wide or deep. They also want to apply this analysis to situations where data do not have labels.

    “In deep learning, we want to build theoretically grounded models so we can reliably deploy them in some mission-critical setting. This is a promising approach at getting toward something like that — building architectures in a theoretically grounded way that translates into better results in practice,” he says.

    This work was supported, in part, by the National Science Foundation, Office of Naval Research, the MIT-IBM Watson AI Lab, the Eric and Wendy Schmidt Center at the Broad Institute, and a Simons Investigator Award.

  •

    Boosting passenger experience and increasing connectivity at the Hong Kong International Airport

    Recently, a cohort of 36 students from MIT and universities across Hong Kong came together for the MIT Entrepreneurship and Maker Skills Integrator (MEMSI), an intense two-week startup boot camp hosted at the MIT Hong Kong Innovation Node.

    “We’re very excited to be in Hong Kong,” said Professor Charles Sodini, LeBel Professor of Electrical Engineering and faculty director of the Node. “The dream always was to bring MIT and Hong Kong students together.”

    Students collaborated on six teams to meet real-world industry challenges through action learning, defining a problem, designing a solution, and crafting a business plan. The experience culminated in the MEMSI Showcase, where each team presented its process and unique solution to a panel of judges. “The MEMSI program is a great demonstration of important international educational goals for MIT,” says Professor Richard Lester, associate provost for international activities and chair of the Node Steering Committee at MIT. “It creates opportunities for our students to solve problems in a particular and distinctive cultural context, and to learn how innovations can cross international boundaries.” 

    Meeting an urgent challenge in the travel and tourism industry

    The Hong Kong Airport Authority (AAHK) served as the program’s industry partner for the third consecutive year, challenging students to conceive innovative ideas to make passenger travel more personalized from end-to-end while increasing connectivity. As the travel industry resuscitates profitability and welcomes crowds back amidst ongoing delays and labor shortages, the need for a more passenger-centric travel ecosystem is urgent.

    Hong Kong International Airport is the world’s third-busiest international passenger airport and its busiest for cargo traffic. Students took an insider’s tour of the airport to gain on-the-ground orientation. They observed firsthand the complex logistics, possibilities, and constraints of operating with a team of 78,000 employees who serve 71.5 million passengers with unique needs and itineraries.

    Throughout the program, the cohort was coached and supported by MEMSI alumni, travel industry mentors, and MIT faculty such as Richard de Neufville, professor of engineering systems.

    The mood inside the open-plan MIT Hong Kong Innovation Node was nonstop energetic excitement for the entire program. Each of the six teams was composed of students from MIT and from Hong Kong universities. They learned to work together under time pressure, develop solutions, receive feedback from industry mentors, and iterate around the clock.

    “MEMSI was an enriching and amazing opportunity to learn about entrepreneurship while collaborating with a diverse team to solve a complex problem,” says Maria Li, a junior majoring in computer science, economics, and data science at MIT. “It was incredible to see the ideas we initially came up with as a team turn into a single, thought-out solution by the end.”

    Unsurprisingly given MIT’s focus on piloting the latest technology and the tech-savvy culture of Hong Kong as a global center, many team projects focused on virtual reality, apps, and wearable technology designed to make passengers’ journeys more individualized, efficient, or enjoyable.

    After observing geospatial patterns charting passengers’ movement through an airport, one team realized that many people on long trips aim to meet fitness goals by consciously getting their daily steps power walking the expansive terminals. The team’s prototype, FitAir, is a smart virtual coach, integrated with a biometric token, that plans walking routes within the airport to promote passenger health and wellness.

    Another team noted a common frustration among frequent travelers who manage multiple mileage rewards program profiles, passwords, and status reports. They proposed AirPoint, a digital wallet that consolidates different rewards programs and presents passengers with all their airport redemption opportunities in one place.

    “Today, there is no loser,” said Vivian Cheung, chief operating officer of AAHK, who served as one of the judges. “Everyone is a winner. I am a winner, too. I have learned a lot from the showcase. Some of the ideas, I believe, can really become a business.”

    Cheung noted that in just 12 days, all teams observed and solved her organization’s pain points and successfully designed solutions to address them.

    More than a competition

    Although many of the models pitched are inventive enough to potentially shape the future of travel, the main focus of MEMSI isn’t to act as yet another startup challenge and incubator.

    “What we’re really focusing on is giving students the ability to learn entrepreneurial thinking,” explains Marina Chan, senior director and head of education at the Node. “It’s the dynamic experience in a highly connected environment that makes being in Hong Kong truly unique. When students can adapt and apply theory to an international context, it builds deeper cultural competency.”

    From an aerial view, the boot camp produced many entrepreneurs in the making, lasting friendships, and respect for other cultural backgrounds and operating environments.

    “I learned the overarching process of how to make a startup pitch, all the way from idea generation, market research, and making business models, to the pitch itself and the presentation,” says Arun Wongprommoon, a senior double majoring in computer science and engineering and linguistics.  “It was all a black box to me before I came into the program.”

    He said he gained tremendous respect for the startup world and the pure hard work and collaboration required to get ahead.

    Spearheaded by the Node, MEMSI is a collaboration among the MIT Innovation Initiative, the Martin Trust Center for Entrepreneurship, the MIT International Science and Technology Initiatives, and Project Manus. Learn more about applying to MEMSI.

  •

    Strengthening trust in machine-learning models

    Probabilistic machine learning methods are becoming increasingly powerful tools in data analysis, informing a range of critical decisions across disciplines and applications, from forecasting election results to predicting the impact of microloans on addressing poverty.

    This class of methods uses sophisticated concepts from probability theory to handle uncertainty in decision-making. But the math is only one piece of the puzzle in determining their accuracy and effectiveness. In a typical data analysis, researchers make many subjective choices, or potentially introduce human error, that must also be assessed in order to cultivate users’ trust in the quality of decisions based on these methods.

    To address this issue, MIT computer scientist Tamara Broderick, associate professor in the Department of Electrical Engineering and Computer Science (EECS) and a member of the Laboratory for Information and Decision Systems (LIDS), and a team of researchers have developed a classification system — a “taxonomy of trust” — that defines where trust might break down in a data analysis and identifies strategies to strengthen trust at each step. The other researchers on the project are Professor Anna Smith at the University of Kentucky, professors Tian Zheng and Andrew Gelman at Columbia University, and Professor Rachael Meager at the London School of Economics. The team’s hope is to highlight concerns that are already well-studied and those that need more attention.

    In their paper, published in February in Science Advances, the researchers begin by detailing the steps in the data analysis process where trust might break down: Analysts make choices about what data to collect and which models, or mathematical representations, most closely mirror the real-life problem or question they are aiming to answer. They select algorithms to fit the model and use code to run those algorithms. Each of these steps poses unique challenges around building trust. Some components can be checked for accuracy in measurable ways. “Does my code have bugs?”, for example, is a question that can be tested against objective criteria. Other times, problems are more subjective, with no clear-cut answers; analysts are confronted with numerous strategies to gather data and decide whether a model reflects the real world.

    “What I think is nice about making this taxonomy, is that it really highlights where people are focusing. I think a lot of research naturally focuses on this level of ‘are my algorithms solving a particular mathematical problem?’ in part because it’s very objective, even if it’s a hard problem,” Broderick says.

    “I think it’s really hard to answer ‘is it reasonable to mathematize an important applied problem in a certain way?’ because it’s somehow getting into a harder space, it’s not just a mathematical problem anymore.”

    Capturing real life in a model

    The researchers’ work in categorizing where trust breaks down, though it may seem abstract, is rooted in real-world application.

    Meager, a co-author on the paper, analyzed whether microfinance programs can have a positive effect in a community. The project became a case study for where trust could break down, and for ways to reduce this risk.

    At first look, measuring the impact of microfinancing might seem like a straightforward endeavor. But like any analysis, researchers meet challenges at each step in the process that can affect trust in the outcome. Microfinancing — in which individuals or small businesses receive small loans and other financial services in lieu of conventional banking — can offer different services, depending on the program. For the analysis, Meager gathered datasets from microfinance programs in countries across the globe, including in Mexico, Mongolia, Bosnia, and the Philippines.

    When combining conspicuously distinct datasets, in this case from multiple countries and across different cultures and geographies, researchers must evaluate whether specific case studies can reflect broader trends. It is also important to contextualize the data on hand. For example, in rural Mexico, owning goats may be counted as an investment.

    “It’s hard to measure the quality of life of an individual. People measure things like, ‘What’s the business profit of the small business?’ Or ‘What’s the consumption level of a household?’ There’s this potential for mismatch between what you ultimately really care about, and what you’re measuring,” Broderick says. “Before we get to the mathematical level, what data and what assumptions are we leaning on?”

    With data on hand, analysts must define the real-world questions they seek to answer. In the case of evaluating the benefits of microfinancing, analysts must define what they consider a positive outcome. It is standard in economics, for example, to measure the average financial gain per business in communities where a microfinance program is introduced. But reporting an average might suggest a net positive effect even if only a few (or even one) person benefited, instead of the community as a whole.

    “What you really wanted was that a lot of people are benefiting,” Broderick says. “It sounds simple. Why didn’t we measure the thing that we cared about? But I think it’s really common that practitioners use standard machine learning tools, for a lot of reasons. And these tools might report a proxy that doesn’t always agree with the quantity of interest.”
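
    A toy numerical illustration of this proxy mismatch, with made-up profit figures, shows how the average gain per business can look clearly positive even though only one business in ten benefited:

        import numpy as np

        # Nine businesses see no gain and one sees a large gain after a
        # hypothetical microfinance program.
        gains = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 500])

        print(np.mean(gains))      # 50.0 -> "average gain per business is positive"
        print(np.mean(gains > 0))  # 0.1  -> only 10 percent of businesses benefited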

    Analysts may consciously or subconsciously favor models they are familiar with, especially after investing a great deal of time learning their ins and outs. “Someone might be hesitant to try a nonstandard method because they might be less certain they will use it correctly. Or peer review might favor certain familiar methods, even if a researcher might like to use nonstandard methods,” Broderick says. “There are a lot of reasons, sociologically. But this can be a concern for trust.”

    Final step: checking the code

    While distilling a real-life problem into a model can be a big-picture, amorphous problem, checking the code that runs an algorithm can feel “prosaic,” Broderick says. But it is another potentially overlooked area where trust can be strengthened.

    In some cases, checking a coding pipeline that executes an algorithm might be considered outside the purview of an analyst’s job, especially when there is the option to use standard software packages.

    One way to catch bugs is to test whether code is reproducible. Depending on the field, however, sharing code alongside published work is not always a requirement or the norm. As models increase in complexity over time, it becomes harder to recreate code from scratch. Reproducing a model becomes difficult or even impossible.

    “Let’s just start with every journal requiring you to release your code. Maybe it doesn’t get totally double-checked, and everything isn’t absolutely perfect, but let’s start there,” Broderick says, as one step toward building trust.

    Paper co-author Gelman worked on an analysis that forecast the 2020 U.S. presidential election using state and national polls in real time. The team published daily updates in The Economist magazine, while also publishing their code online for anyone to download and run. Throughout the season, outsiders pointed out both bugs and conceptual problems in the model, ultimately contributing to a stronger analysis.

    The researchers acknowledge that while there is no single solution to create a perfect model, analysts and scientists have the opportunity to reinforce trust at nearly every turn.

    “I don’t think we expect any of these things to be perfect,” Broderick says, “but I think we can expect them to be better or to be as good as possible.”

  •

    Festival of Learning 2023 underscores importance of well-designed learning environments

    During its first in-person gathering since 2020, MIT’s Festival of Learning 2023 explored how the learning sciences can inform the Institute on how to best support students. Co-sponsored by MIT Open Learning and the Office of the Vice Chancellor (OVC), this annual event celebrates teaching and learning innovations with MIT instructors, students, and staff.

    Bror Saxberg SM ’85, PhD ’89, founder of LearningForge LLC and former chief learning officer at Kaplan, Inc., was invited as keynote speaker, with opening remarks by MIT Chancellor Melissa Nobles and Vice President for Open Learning Eric Grimson, and discussion moderated by Senior Associate Dean of Open Learning Christopher Capozzola. This year’s festival focused on how creating well-designed learning environments using learning engineering can increase learning success.

    Video: 2023 Festival of Learning: Highlights

    Well-designed learning environments are key

    In his keynote speech “Learning Engineering: What We Know, What We Can Do,” Saxberg defined “learning engineering” as the practical application of learning sciences to real-world problems at scale. He said, “High levels can be reached by all learners, given access to well-designed instruction and motivation for enough practice opportunities.”

    Informed by decades of empirical evidence from the field of learning science, Saxberg’s own research, and insights from Kaplan, Inc., Saxberg finds that a hands-on strategy he calls “prepare, practice, perform” delivers better learning outcomes than a traditional “read, write, discuss” approach. Saxberg recommends educators devote at least 60 percent of learning time to hands-on approaches, such as producing, creating, and engaging. Only 20-30 percent of learning time should be spent in the more passive “knowledge acquisition” modes of listening and reading.

    “Here at MIT, a place that relies on data to make informed decisions, learning engineering can provide a framework for us to center in on the learner to identify the challenges associated with learning, and to apply the learning sciences in data-driven ways to improve instructional approaches,” said Nobles. During their opening remarks, Nobles and Grimson both emphasized how learning engineering at MIT is informed by the Institute’s commitment to educating the whole student, which encompasses student well-being and belonging in addition to academic rigor. “What lessons can we take away to change the way we think about education moving forward? This is a chance to iterate,” said Grimson.

    Well-designed learning environments are informed by understanding motivation, considering the connection between long-term and working memory, identifying the range of learners’ prior experience, grounding practice in authentic contexts (i.e., work environments), and using data-driven instructional approaches to iterate and improve.

    Video: 2023 Festival of Learning: Keynote by Bror Saxberg

    Understand learner motivation

    Saxberg asserted that before developing course structures and teaching approaches known to encourage learning, educators must first examine learner motivation. Motivation doesn’t require enjoyment of the subject or task to spur engagement. Similar to how a well-designed physical training program can change your muscle cells, if a learner starts, persists, and exerts mental effort in a well-designed learning environment, they can change their neurons — they learn. Saxberg described four main barriers to learner motivation, and solutions for each:

    1. The learner doesn’t see the value of the lesson. Ways to address this include helping the learner find value; leveraging the learner’s expertise in another area to better understand the topic at hand; and making the activity itself enjoyable. “Finding value” could be as simple as explaining the practical applications of this knowledge in future work in the field, or how this lesson prepares learners for their advanced-level courses.
    2. The learner lacks self-efficacy and doesn’t think they’re capable. Educators can point to parallel experiences with similar goals that students may have already achieved in another context. Alternatively, educators can share stories of professionals who have successfully transitioned from one area of expertise to another.
    3. “Something” is in the learner’s way, such as not having the time, space, or correct materials. This is an opportunity to demonstrate how a learner can use problem-solving skills to find a solution to their perceived problem. As with the barrier of self-efficacy, educators can assure learners that they are in control of the situation by sharing similar stories of those who’ve encountered the same problem and the solution they devised.
    4. The learner’s emotional state. This is no small barrier to motivation. If a learner is angry, depressed, scared, or grieving, it will be challenging for them to switch their mindset into learning mode. A wide array of emotions requires a wide array of possible solutions, from structured conversation techniques to recommending professional help.

    Consider the cognitive load

    Saxberg has found that learning occurs when we use working memory to problem-solve, but our working memory can only process three to five verbal or conscious thoughts at a time. Long-term memory stores knowledge that can be accessed non-verbally and non-consciously, which is why experts appear to remember information effortlessly. Until a learner develops that expertise, extraneous information in a lesson will occupy space in their working memory, running the risk of distracting the learner from the desired learning outcome.

    To accommodate learners’ finite cognitive load, Saxberg suggested the solution of reevaluating which material is essential, then simplifying the exercise or removing unnecessary material accordingly. “That notion of, ‘what do we really need students to be able to do?’ helps you focus,” said Saxberg.

    Another solution is to leverage the knowledge, skills, and interests learners already bring to the course — these long-term memories can scaffold the new material. “What do you have in your head already, what do you love, what’s easy to draw from long-term memory? That would be the starting point for challenging new skills. It’s not the ending point because you want to use your new skills to then find out new things,” Saxberg said.

    Finally, consider how your course engages with the syllabus. Do you explain the reasoning behind the course structure? Do you show how the exercises or material will be applied to future courses or the field? Do you share best practices for engaging working memory and learning? By acknowledging and empathizing with the practical challenges that learners face, you can remove a barrier from their cognitive load.

    Ground practice in authentic contexts

    Saxberg stated that few experts read textbooks to learn new information — they discover what they need to know while working in the field, using those relevant facts in context. As such, students will have an easier time remembering facts if they’re practicing in relevant or similar environments to their future work.

    If students can practice classifying problems in real work contexts rather than theoretical practice problems, they can build a framework to classify what’s important. That helps students recognize the type of problem they’re trying to solve before trying to solve the problem itself. With enough hands-on practice and examples of how experts use processes and identify which principles are relevant, learners can holistically learn entire procedures. And that learning continues once learners graduate to the workforce: professionals often meet to exchange knowledge at conferences, charrettes, and other gatherings.

    Enhancing teaching at MIT

    The Festival of Learning furthers the Office of the Vice Chancellor’s mission to advance academic innovation that will foster the growth of MIT students. The festival also aligns with MIT Open Learning’s Residential Education team’s goal of making MIT education more effective and efficient. Throughout the year, the team offers continuous support to MIT faculty and instructors using digital technologies to augment and transform how they teach.

    “We are doubling down on our commitment to continuous growth in how we teach,” said Nobles.

  •

    Helping the cause of environmental resilience

    Haruko Wainwright, the Norman C. Rasmussen Career Development Professor in Nuclear Science and Engineering (NSE) and assistant professor in civil and environmental engineering at MIT, grew up in rural Japan, where many nuclear facilities are located. She remembers worrying about the facilities as a child. Wainwright was only 6 at the time of the Chernobyl accident in 1986, but still recollects it vividly.

    Those early memories have contributed to Wainwright’s determination to research how technologies can mold environmental resilience — the capability of mitigating the consequences of accidents and recovering from contamination.

    Wainwright believes that environmental monitoring can help improve resilience. She co-leads the U.S. Department of Energy (DOE)’s Advanced Long-term Environmental Monitoring Systems (ALTEMIS) project, which integrates technologies such as in situ sensors, geophysics, remote sensing, simulations, and artificial intelligence to establish new paradigms for monitoring. The project focuses on soil and groundwater contamination at more than 100 U.S. sites that were used for nuclear weapons production.

    As part of this research, which was featured last year in the journal Environmental Science & Technology, Wainwright is working on a machine learning framework for improving environmental monitoring strategies. She hopes the ALTEMIS project will enable the rapid detection of anomalies while ensuring the stability of residual contamination and waste disposal facilities.
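
    ALTEMIS integrates many data streams, and its framework is not reproduced here; as a minimal stand-in, a rolling z-score over a single sensor’s readings illustrates the kind of rapid anomaly flagging described above (the window and threshold below are assumed values for illustration):

        import numpy as np

        def rolling_zscore_anomalies(readings, window=30, threshold=3.0):
            """Flag sensor readings that deviate sharply from the recent baseline."""
            readings = np.asarray(readings, dtype=float)
            flags = np.zeros(len(readings), dtype=bool)
            for i in range(window, len(readings)):
                baseline = readings[i - window:i]
                mu, sigma = baseline.mean(), baseline.std()
                if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
                    flags[i] = True
            return flags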

    Childhood in rural Japan

    Even as a child, Wainwright was interested in physics, history, and a variety of other subjects.

    But growing up in a rural area was not ideal for someone interested in STEM. There were no engineers or scientists in the community and no science museums, either. “It was not so cool to be interested in science, and I never talked about my interest with anyone,” Wainwright recalls.

    Television and books were the only door to the world of science. “I did not study English until middle school and I had never been on a plane until college. I sometimes find it miraculous that I am now working in the U.S. and teaching at MIT,” she says.

    As she grew a little older, Wainwright heard a lot of discussions about nuclear facilities in the region and many stories about Hiroshima and Nagasaki.

    At the same time, giants like Marie Curie inspired her to pursue science. Nuclear physics was particularly fascinating. “At some point during high school, I started wondering ‘what are radiations, what is radioactivity, what is light,’” she recalls. Reading Richard Feynman’s books and trying to understand quantum mechanics made her want to study physics in college.

    Pursuing research in the United States

    Wainwright pursued an undergraduate degree in engineering physics at Kyoto University. After two research internships in the United States, Wainwright was impressed by the dynamic and fast-paced research environment in the country.

    And compared to Japan, there were “more women in science and engineering,” Wainwright says. She enrolled at the University of California at Berkeley in 2005, where she completed her doctorate in nuclear engineering with minors in statistics and civil and environmental engineering.

    Before moving to MIT NSE in 2022, Wainwright was a staff scientist in the Earth and Environmental Area at Lawrence Berkeley National Laboratory (LBNL). She worked on a variety of topics, including radioactive contamination, climate science, CO2 sequestration, precision agriculture, and watershed science. Her time at LBNL helped Wainwright build a solid foundation in environmental sensors and in monitoring and simulation methods across different earth science disciplines.

    Empowering communities through monitoring

    One of the most compelling takeaways from Wainwright’s early research: People trust actual measurements and data as facts, even though they are skeptical about models and predictions. “I talked with many people living in Fukushima prefecture. Many of them have dosimeters and measure radiation levels on their own. They might not trust the government, but they trust their own data and are then convinced that it is safe to live there and to eat local food,” Wainwright says.

    She has been impressed that area citizens have gained significant knowledge about radiation and radioactivity through these efforts. “But they are often frustrated that people living far away, in cities like Tokyo, still avoid agricultural products from Fukushima,” Wainwright says.

    Wainwright thinks that data derived from environmental monitoring — through proper visualization and communication — can address misconceptions and fake news that often hurt people near contaminated sites.

    Wainwright is now interested in how these technologies — tested with real data at contaminated sites — can be proactively used for existing and future nuclear facilities “before contamination happens,” as she explored for Nuclear News. “I don’t think it is a good idea to simply dismiss someone’s concern as irrational. Showing credible data has been much more effective to provide assurance. Or a proper monitoring network would enable us to minimize contamination or support emergency responses when accidents happen,” she says.

    Educating communities and students

    Part of empowering communities involves improving their ability to process science-based information. “Potentially hazardous facilities always end up in rural regions; minorities’ concerns are often ignored. The problem is that these regions don’t produce so many scientists or policymakers; they don’t have a voice,” Wainwright says. “I am determined to dedicate my time to improve STEM education in rural regions and to increase the voice in these regions.”

    In a project funded by the DOE, she collaborates with a team of researchers at the University of Alaska — the Alaska Center for Energy and Power and the Teaching Through Technology program — aiming to improve STEM education for rural and Indigenous communities. “Alaska is an important place for energy transition and environmental justice,” Wainwright says. Micro-nuclear reactors can potentially improve the lives of rural communities that bear the brunt of high fuel and transportation costs. However, there is distrust of nuclear technologies, stemming from past nuclear weapons testing. At the same time, Alaska has vast metal mining resources for renewable energy and batteries, and there are concerns about environmental contamination from mining and other sources. The team’s vision is much broader, she points out. “The focus is on broader environmental monitoring technologies and relevant STEM education, addressing general water and air quality,” Wainwright says.

    The issues also weave into the courses Wainwright teaches at MIT. “I think it is important for engineering students to be aware of environmental justice related to energy waste and mining as well as past contamination events and their recovery,” she says. “It is not OK just to send waste to, or develop mines in, rural regions, which could be a special place for some people. We need to make sure that these developments will not harm the environment and health of local communities.” Wainwright also hopes that this knowledge will ultimately encourage students to think creatively about engineering designs that minimize waste or recycle material.

    The last question of the final quiz of one of her recent courses was: Assume that you store high-level radioactive waste in your “backyard.” What technical strategies would make you and your family feel safe? “All students thought about this question seriously and many suggested excellent points, including those addressing environmental monitoring,” Wainwright says. “That made me hopeful about the future.”

  •

    Learning to grow machine-learning models

    It’s no secret that OpenAI’s ChatGPT has some incredible capabilities — for instance, the chatbot can write poetry that resembles Shakespearean sonnets or debug code for a computer program. These abilities are made possible by the massive machine-learning model that ChatGPT is built upon. Researchers have found that when these types of models become large enough, extraordinary capabilities emerge.

    But bigger models also require more time and money to train. The training process involves showing hundreds of billions of examples to a model. Gathering so much data is an involved process in itself. Then come the monetary and environmental costs of running many powerful computers for days or weeks to train a model that may have billions of parameters. 

    “It’s been estimated that training models at the scale of what ChatGPT is hypothesized to run on could take millions of dollars, just for a single training run. Can we improve the efficiency of these training methods, so we can still get good models in less time and for less money? We propose to do this by leveraging smaller language models that have previously been trained,” says Yoon Kim, an assistant professor in MIT’s Department of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    Rather than discarding a previous version of a model, Kim and his collaborators use it as the building blocks for a new model. Using machine learning, their method learns to “grow” a larger model from a smaller model in a way that encodes knowledge the smaller model has already gained. This enables faster training of the larger model.

    Their technique saves about 50 percent of the computational cost required to train a large model, compared to methods that train a new model from scratch. Plus, the models trained using the MIT method performed as well as, or better than, models trained with other techniques that also use smaller models to enable faster training of larger models.

    Reducing the time it takes to train huge models could help researchers make advancements faster with less expense, while also reducing the carbon emissions generated during the training process. It could also enable smaller research groups to work with these massive models, potentially opening the door to many new advances.

    “As we look to democratize these types of technologies, making training faster and less expensive will become more important,” says Kim, senior author of a paper on this technique.

    Kim and his graduate student Lucas Torroba Hennigen wrote the paper with lead author Peihao Wang, a graduate student at the University of Texas at Austin, as well as others at the MIT-IBM Watson AI Lab and Columbia University. The research will be presented at the International Conference on Learning Representations.

    The bigger the better

    Large language models like GPT-3, which is at the core of ChatGPT, are built using a neural network architecture called a transformer. A neural network, loosely based on the human brain, is composed of layers of interconnected nodes, or “neurons.” Each neuron contains parameters, which are variables learned during the training process that the neuron uses to process data.

    Transformer architectures are unique because, as these types of neural network models get bigger, they achieve much better results.

    “This has led to an arms race of companies trying to train larger and larger transformers on larger and larger datasets. More so than other architectures, it seems that transformer networks get much better with scaling. We’re just not exactly sure why this is the case,” Kim says.

    These models often have hundreds of millions or billions of learnable parameters. Training all these parameters from scratch is expensive, so researchers seek to accelerate the process.

    One effective technique is known as model growth. Using the model growth method, researchers can increase the size of a transformer by copying neurons, or even entire layers of a previous version of the network, then stacking them on top. They can make a network wider by adding new neurons to a layer or make it deeper by adding additional layers of neurons.

    In contrast to previous approaches for model growth, parameters associated with the new neurons in the expanded transformer are not just copies of the smaller network’s parameters, Kim explains. Rather, they are learned combinations of the parameters of the smaller model.

    Learning to grow

    Kim and his collaborators use machine learning to learn a linear mapping of the parameters of the smaller model. This linear map is a mathematical operation that transforms a set of input values, in this case the smaller model’s parameters, to a set of output values, in this case the parameters of the larger model.

    Their method, which they call a learned Linear Growth Operator (LiGO), learns to expand the width and depth of a larger network from the parameters of a smaller network in a data-driven way.

    But the smaller model may actually be quite large — perhaps it has a hundred million parameters — and researchers might want to make a model with a billion parameters. So the LiGO technique breaks the linear map into smaller pieces that a machine-learning algorithm can handle.

    LiGO also expands width and depth simultaneously, which makes it more efficient than other methods. A user can tune how wide and deep they want the larger model to be when they input the smaller model and its parameters, Kim explains.
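
    For a single weight matrix, the idea can be sketched as follows (the random expansion maps below are placeholders for maps that LiGO would actually learn from data, and the real operator is further factorized and also grows depth):

        import numpy as np

        rng = np.random.default_rng(0)
        small_W = rng.normal(size=(64, 64))  # one weight matrix from the small, pretrained model

        # Width-expansion maps from 64 to 128 units; in LiGO these are learned.
        row_map = rng.normal(size=(128, 64)) / np.sqrt(64)
        col_map = rng.normal(size=(128, 64)) / np.sqrt(64)

        # Each entry of the large matrix is a linear combination of the small
        # matrix's entries, rather than a plain copy of them.
        large_W = row_map @ small_W @ col_map.T
        print(small_W.shape, "->", large_W.shape)  # (64, 64) -> (128, 128)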

    When they compared their technique to the process of training a new model from scratch, as well as to model-growth methods, it was faster than all the baselines. Their method saves about 50 percent of the computational costs required to train both vision and language models, while often improving performance.

    The researchers also found they could use LiGO to accelerate transformer training even when they didn’t have access to a smaller, pretrained model.

    “I was surprised by how much better all the methods, including ours, did compared to the random-initialization, train-from-scratch baselines,” Kim says.

    In the future, Kim and his collaborators are looking forward to applying LiGO to even larger models.

    The work was funded, in part, by the MIT-IBM Watson AI Lab, Amazon, the IBM Research AI Hardware Center, the Center for Computational Innovation at Rensselaer Polytechnic Institute, and the U.S. Army Research Office.