More stories

  •

    Celebrating open data

    The inaugural MIT Prize for Open Data, which included a $2,500 cash prize, was recently awarded to 10 individual and group research projects. Presented jointly by the School of Science and the MIT Libraries, the prize recognizes MIT-affiliated researchers who make their data openly accessible and reusable by others. The prize winners and 16 honorable mention recipients were honored at the Open Data @ MIT event held Oct. 28 at Hayden Library. 

    “By making data open, researchers create opportunities for novel uses of their data and for new insights to be gleaned,” says Chris Bourg, director of MIT Libraries. “Open data accelerates scholarly progress and discovery, advances equity in scholarly participation, and increases transparency, replicability, and trust in science.” 

    Recognizing shared values

    Spearheaded by Bourg and Rebecca Saxe, associate dean of the School of Science and John W. Jarve (1978) Professor of Brain and Cognitive Sciences, the MIT Prize for Open Data was launched to highlight the value of open data at MIT and to encourage the next generation of researchers. Nominations were solicited from across the Institute, with a focus on trainees: research technicians, undergraduate or graduate students, or postdocs.

    “By launching an MIT-wide prize and event, we aimed to create visibility for the scholars who create, use, and advocate for open data,” says Saxe. “Highlighting this research and creating opportunities for networking would also help open-data advocates across campus find each other.” 

    Recognizing researchers who share data was also one of the recommendations of the Ad Hoc Task Force on Open Access to MIT’s Research, which Bourg co-chaired with Hal Abelson, Class of 1922 Professor, Department of Electrical Engineering and Computer Science. An annual award was one of the strategies put forth by the task force to further the Institute’s mission to disseminate the fruits of its research and scholarship as widely as possible.

    Strong competition

    Winners and honorable mentions were chosen from more than 70 nominees, representing all five schools, the MIT Schwarzman College of Computing, and several research centers across MIT. A committee composed of faculty, staff, and a graduate student made the selections:

    Yunsie Chung, graduate student in the Department of Chemical Engineering, won for SolProp, the largest open-source dataset with temperature-dependent solubility values of organic compounds. 
    Matthew Groh, graduate student, MIT Media Lab, accepted on behalf of the team behind the Fitzpatrick 17k dataset, an open dataset consisting of nearly 17,000 images of skin disease alongside skin disease and skin tone annotations. 
    Tom Pollard, research scientist at the Institute for Medical Engineering and Science, accepted on behalf of the PhysioNet team. This data-sharing platform enables thousands of clinical and machine-learning research studies each year and allows researchers to share sensitive resources that would not be possible through typical data-sharing platforms. 
    Joseph Replogle, graduate student with the Whitehead Institute for Biomedical Research, was recognized for the Genome-wide Perturb-seq dataset, the largest publicly available, single-cell transcriptional dataset collected to date. 
    Pedro Reynolds-Cuéllar, graduate student with the MIT Media Lab/Art, Culture, and Technology, and Diana Duarte, co-founder at Diversa, won for Retos, an open-data platform for detailed documentation and sharing of local innovations from under-resourced settings. 
    Maanas Sharma, an undergraduate student, led States of Emergency, a nationwide project analyzing and grading the responses of prison systems to Covid-19 using data scraped from public databases and manually collected data. 
    Djuna von Maydell, graduate student in the Department of Brain and Cognitive Sciences, created the first publicly available dataset of single-cell gene expression from postmortem human brain tissue of patients who are carriers of APOE4, the major Alzheimer’s disease risk gene. 
    Raechel Walker, graduate researcher in the MIT Media Lab, and her collaborators created a Data Activism Curriculum for high school students through the Mayor’s Summer Youth Employment Program in Cambridge, Massachusetts. Students learned how to use data science to recognize, mitigate, and advocate for people who are disproportionately impacted by systemic inequality. 
    Suyeol Yun, graduate student in the Department of Political Science, was recognized for DeepWTO, a project creating open data for use in legal natural language processing research using cases from the World Trade Organization. 
    Jonathan Zheng, graduate student in the Department of Chemical Engineering, won for an open IUPAC dataset for acid dissociation constants, or “pKas,” physicochemical properties that govern how acidic a chemical is in a solution.
    A full list of winners and honorable mentions is available on the Open Data @ MIT website.
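    As context for the pKa entry above (an illustration, not part of the winning dataset), the Henderson-Hasselbalch relation shows how a pKa value determines how much of an acid is ionized at a given pH:

    ```python
    def fraction_ionized(pka: float, ph: float) -> float:
        """Fraction of an acid HA present as its ionized form A- at a given
        pH, from the Henderson-Hasselbalch equation:
        pH = pKa + log10([A-]/[HA])."""
        ratio = 10 ** (ph - pka)      # [A-]/[HA]
        return ratio / (1 + ratio)

    # Acetic acid (pKa ~ 4.76) is almost fully ionized at physiological pH 7.4
    print(round(fraction_ionized(4.76, 7.4), 3))
    ```

    At pH equal to pKa the acid is exactly half ionized, which is why pKa is the natural summary statistic for acidity in solution.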

    A campus-wide celebration

    Awards were presented at a celebratory event held in the Nexus in Hayden Library during International Open Access Week. School of Science Dean Nergis Mavalvala kicked off the program by describing the long and proud history of open scholarship at MIT, citing the Institute-wide faculty open access policy and the launch of the open-source digital repository DSpace. “When I was a graduate student, we were trying to figure out how to share our theses during the days of the nascent internet,” she said. “With DSpace, MIT was figuring it out for us.” 

    The centerpiece of the program was a series of five-minute presentations from the prize winners on their research. Presenters detailed the ways they created, used, or advocated for open data, and the value that openness brings to their respective fields. Winner Djuna von Maydell, a graduate student in Professor Li-Huei Tsai’s lab who studies the genetic causes of neurodegeneration, underscored why it is important to share data, particularly data obtained from postmortem human brains. 

    “This is data generated from human brains, so every data point stems from a living, breathing human being, who presumably made this donation in the hope that we would use it to advance knowledge and uncover truth,” von Maydell said. “To maximize the probability of that happening, we have to make it available to the scientific community.” 

    MIT community members who would like to learn more about making their research data open can consult MIT Libraries’ Data Services team.

  •

    Coordinating climate and air-quality policies to improve public health

    As America’s largest investment to fight climate change, the Inflation Reduction Act positions the country to reduce its greenhouse gas emissions by an estimated 40 percent below 2005 levels by 2030. But as it edges the United States closer to achieving its international climate commitment, the legislation is also expected to yield significant — and more immediate — improvements in the nation’s health. If successful in accelerating the transition from fossil fuels to clean energy alternatives, the IRA will sharply reduce atmospheric concentrations of fine particulates known to exacerbate respiratory and cardiovascular disease and cause premature deaths, along with other air pollutants that degrade human health. One recent study shows that eliminating air pollution from fossil fuels in the contiguous United States would prevent more than 50,000 premature deaths and avoid more than $600 billion in health costs each year.

    While national climate policies such as those advanced by the IRA can simultaneously help mitigate climate change and improve air quality, their results may vary widely when it comes to improving public health. That’s because the potential health benefits associated with air quality improvements are much greater in some regions and economic sectors than in others. Those benefits can be maximized, however, through a prudent combination of climate and air-quality policies.

    Several past studies have evaluated the likely health impacts of various policy combinations, but their usefulness has been limited by a reliance on a small set of standard policy scenarios. More versatile tools are needed to model a wide range of climate and air-quality policy combinations and assess their collective effects on air quality and human health. Now researchers at the MIT Joint Program on the Science and Policy of Global Change and the MIT Institute for Data, Systems, and Society (IDSS) have developed a publicly available, flexible scenario tool that does just that.

    In a study published in the journal Geoscientific Model Development, the MIT team introduces its Tool for Air Pollution Scenarios (TAPS), which can be used to estimate the likely air-quality and health outcomes of a wide range of climate and air-quality policies at the regional, sectoral, and fuel-based level. 

    “This tool can help integrate the siloed sustainability issues of air pollution and climate action,” says the study’s lead author William Atkinson, who recently served as a Biogen Graduate Fellow and research assistant at the IDSS Technology and Policy Program’s (TPP) Research to Policy Engagement Initiative. “Climate action does not guarantee a clean air future, and vice versa — but the issues have similar sources that imply shared solutions if done right.”

    The study’s initial application of TAPS shows that with current air-quality policies and near-term Paris Agreement climate pledges alone, short-term pollution reductions give way to long-term increases — given the expected growth of emissions-intensive industrial and agricultural processes in developing regions. More ambitious climate and air-quality policies could be complementary, each reducing different pollutants substantially to give tremendous near- and long-term health benefits worldwide.

    “The significance of this work is that we can more confidently identify the long-term emission reduction strategies that also support air quality improvements,” says MIT Joint Program Deputy Director C. Adam Schlosser, a co-author of the study. “This is a win-win for setting climate targets that are also healthy targets.”

    TAPS projects air quality and health outcomes based on three integrated components: a recent global inventory of detailed emissions resulting from human activities (e.g., fossil fuel combustion, land-use change, industrial processes); multiple scenarios of emissions-generating human activities between now and the year 2100, produced by the MIT Economic Projection and Policy Analysis model; and emissions intensity (emissions per unit of activity) scenarios based on recent data from the Greenhouse Gas and Air Pollution Interactions and Synergies model.
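    The three-component structure described above can be sketched schematically. All names and numbers below are made up for illustration; they are not drawn from TAPS, EPPA, or GAINS themselves:

    ```python
    # Schematic of a TAPS-style projection: future emissions are the base
    # inventory scaled by activity growth (from an economic projection model)
    # and by emissions-intensity change (emissions per unit of activity),
    # keyed by (region, sector, fuel). Values are purely illustrative.

    base_inventory = {("region_a", "industry", "coal"): 120.0}   # kt/yr today

    activity_growth = {("region_a", "industry", "coal"): 1.8}    # 2050 vs. today
    intensity_change = {("region_a", "industry", "coal"): 0.4}   # 60% cleaner

    def project_emissions(inventory, growth, intensity):
        """Scale a base-year inventory by activity and intensity scenarios."""
        return {
            key: value * growth[key] * intensity[key]
            for key, value in inventory.items()
        }

    projected = project_emissions(base_inventory, activity_growth,
                                  intensity_change)
    print(projected)  # 120 * 1.8 * 0.4 = 86.4 kt/yr despite activity growth
    ```

    The structure makes the study’s headline finding easy to see: if activity growth outpaces intensity improvements, emissions rise over the long term even when near-term policies cut them.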

    “We see the climate crisis as a health crisis, and believe that evidence-based approaches are key to making the most of this historic investment in the future, particularly for vulnerable communities,” says Johanna Jobin, global head of corporate reputation and responsibility at Biogen. “The scientific community has spoken with unanimity and alarm that not all climate-related actions deliver equal health benefits. We’re proud of our collaboration with the MIT Joint Program to develop this tool that can be used to bridge research-to-policy gaps, support policy decisions to promote health among vulnerable communities, and train the next generation of scientists and leaders for far-reaching impact.”

    The tool can inform decision-makers about a wide range of climate and air-quality policies. Policy scenarios can be applied to specific regions, sectors, or fuels to investigate policy combinations at a more granular level, or to target short-term actions with high-impact benefits.

    TAPS could be further developed to account for additional emissions sources and trends.

    “Our new tool could be used to examine a large range of both climate and air quality scenarios. As the framework is expanded, we can add detail for specific regions, as well as additional pollutants such as air toxics,” says study supervising co-author Noelle Selin, professor at IDSS and the MIT Department of Earth, Atmospheric and Planetary Sciences, and director of TPP.    

    This research was supported by the U.S. Environmental Protection Agency and its Science to Achieve Results (STAR) program; Biogen; TPP’s Leading Technology and Policy Initiative; and TPP’s Research to Policy Engagement Initiative.

  •

    Deep learning with light

    Ask a smart home device for the weather forecast, and it takes several seconds for the device to respond. One reason this latency occurs is because connected devices don’t have enough memory or power to store and run the enormous machine-learning models needed for the device to understand what a user is asking of it. The model is stored in a data center that may be hundreds of miles away, where the answer is computed and sent to the device.

    MIT researchers have created a new method for computing directly on these devices, which drastically reduces this latency. Their technique shifts the memory-intensive steps of running a machine-learning model to a central server where components of the model are encoded onto light waves.

    The waves are transmitted to a connected device using fiber optics, which enables tons of data to be sent lightning-fast through a network. The receiver then employs a simple optical device that rapidly performs computations using the parts of a model carried by those light waves.

    This technique leads to more than a hundredfold improvement in energy efficiency when compared to other methods. It could also improve security, since a user’s data do not need to be transferred to a central location for computation.

    This method could enable a self-driving car to make decisions in real time while using just a tiny percentage of the energy currently required by power-hungry computers. It could also allow a user to have a latency-free conversation with their smart home device, be used for live video processing over cellular networks, or even enable high-speed image classification on a spacecraft millions of miles from Earth.

    “Every time you want to run a neural network, you have to run the program, and how fast you can run the program depends on how fast you can pipe the program in from memory. Our pipe is massive — it corresponds to sending a full feature-length movie over the internet every millisecond or so. That is how fast data comes into our system. And it can compute as fast as that,” says senior author Dirk Englund, an associate professor in the Department of Electrical Engineering and Computer Science (EECS) and member of the MIT Research Laboratory of Electronics.

    Joining Englund on the paper are lead author and EECS graduate student Alexander Sludds, EECS graduate student Saumil Bandyopadhyay, and research scientist Ryan Hamerly, as well as others from MIT, MIT Lincoln Laboratory, and Nokia Corporation. The research is published today in Science.

    Lightening the load

    Neural networks are machine-learning models that use layers of connected nodes, or neurons, to recognize patterns in datasets and perform tasks, like classifying images or recognizing speech. But these models can contain billions of weight parameters, which are numeric values that transform input data as they are processed. These weights must be stored in memory. At the same time, the data transformation process involves billions of algebraic computations, which require a great deal of power to perform.

    The process of fetching data (the weights of the neural network, in this case) from memory and moving them to the parts of a computer that do the actual computation is one of the biggest limiting factors to speed and energy efficiency, says Sludds.

    “So our thought was, why don’t we take all that heavy lifting — the process of fetching billions of weights from memory — move it away from the edge device and put it someplace where we have abundant access to power and memory, which gives us the ability to fetch those weights quickly?” he says.

    The neural network architecture they developed, Netcast, involves storing weights in a central server that is connected to a novel piece of hardware called a smart transceiver. This smart transceiver, a thumb-sized chip that can receive and transmit data, uses technology known as silicon photonics to fetch trillions of weights from memory each second.

    It receives weights as electrical signals and imprints them onto light waves. Since the weight data are encoded as bits (1s and 0s), the transceiver converts them by switching lasers: a laser is turned on for a 1 and off for a 0. It combines these light waves and then periodically transfers them through a fiber-optic network so a client device doesn’t need to query the server to receive them.

    “Optics is great because there are many ways to carry data within optics. For instance, you can put data on different colors of light, and that enables a much higher data throughput and greater bandwidth than with electronics,” explains Bandyopadhyay.
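    The on/off laser switching described above is, in digital-communications terms, on-off keying. The sketch below shows the idea in software; the actual Netcast bit formats and quantization are not described at this level of detail in the article, so everything here is an illustrative assumption:

    ```python
    def weights_to_bits(weights, bits=8):
        """Quantize weights in [0, 1) to fixed-point and emit the bit stream
        that would drive the laser: 1 = laser on, 0 = laser off.
        The quantization scheme is a stand-in, not Netcast's actual format."""
        stream = []
        for w in weights:
            q = int(w * (2 ** bits))                      # fixed-point value
            stream.extend((q >> i) & 1                    # most significant
                          for i in reversed(range(bits))) # bit first
        return stream

    # Two 4-bit weights become an 8-bit on/off pattern for the laser
    print(weights_to_bits([0.5, 0.25], bits=4))
    ```

    Putting such streams on different wavelengths, as Bandyopadhyay notes, is what multiplies the throughput: each “color” of light carries its own independent bit stream down the same fiber.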

    Trillions per second

    Once the light waves arrive at the client device, a simple optical component known as a broadband “Mach-Zehnder” modulator uses them to perform super-fast, analog computation. This involves encoding input data from the device, such as sensor information, onto the weights. Then it sends each individual wavelength to a receiver that detects the light and measures the result of the computation.

    The researchers devised a way to use this modulator to do trillions of multiplications per second, which vastly increases the speed of computation on the device while using only a tiny amount of power.   
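    Functionally, each modulator step multiplies an incoming weight by a locally encoded input, and the detector reads out the product. A numerical sketch of that behavior, deliberately ignoring the optical physics (interference, loss, noise) that a real Mach-Zehnder device involves:

    ```python
    def mzm_multiply(weight_wave, local_input):
        """Idealized Mach-Zehnder step: the incoming optical amplitude
        (carrying a weight) is scaled by the modulator's transmission,
        which is set by the local input value; the detector measures the
        product. Real devices realize this via interference, not a bare
        multiplication -- this captures only the functional behavior."""
        return weight_wave * local_input

    # A neuron's weighted sum, accumulated one analog multiply at a time
    weights = [0.2, -0.5, 0.9]   # streamed from the server as light
    inputs = [1.0, 0.4, 0.5]     # sensor data encoded on the modulator
    result = sum(mzm_multiply(w, x) for w, x in zip(weights, inputs))
    print(round(result, 3))
    ```

    Because the multiply happens as light passes through a passive component, the energy cost per operation is tiny, which is how the system reaches trillions of multiplications per second at about a milliwatt.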

    “In order to make something faster, you need to make it more energy efficient. But there is a trade-off. We’ve built a system that can operate with about a milliwatt of power but still do trillions of multiplications per second. In terms of both speed and energy efficiency, that is a gain of orders of magnitude,” Sludds says.

    They tested this architecture by sending weights over an 86-kilometer fiber that connects their lab to MIT Lincoln Laboratory. Netcast enabled machine learning with high accuracy — 98.7 percent for image classification and 98.8 percent for digit recognition — at rapid speeds.

    “We had to do some calibration, but I was surprised by how little work we had to do to achieve such high accuracy out of the box. We were able to get commercially relevant accuracy,” adds Hamerly.

    Moving forward, the researchers want to iterate on the smart transceiver chip to achieve even better performance. They also want to miniaturize the receiver, which is currently the size of a shoe box, down to the size of a single chip so it could fit onto a smart device like a cell phone.

    “Using photonics and light as a platform for computing is a really exciting area of research with potentially huge implications on the speed and efficiency of our information technology landscape,” says Euan Allen, a Royal Academy of Engineering Research Fellow at the University of Bath, who was not involved with this work. “The work of Sludds et al. is an exciting step toward seeing real-world implementations of such devices, introducing a new and practical edge-computing scheme whilst also exploring some of the fundamental limitations of computation at very low (single-photon) light levels.”

    The research is funded, in part, by NTT Research, the National Science Foundation, the Air Force Office of Scientific Research, the Air Force Research Laboratory, and the Army Research Office.

  •

    Ad hoc committee releases report on remote teaching best practices for on-campus education

    The Ad Hoc Committee on Leveraging Best Practices from Remote Teaching for On-Campus Education has released a report that captures how instructors are weaving lessons learned from remote teaching into in-person classes. Despite the challenges imposed by teaching and learning remotely during the Covid-19 pandemic, the report says, “there were seeds planted then that, we hope, will bear fruit in the coming years.”

    “In the long run, one of the best things about having lived through our remote learning experience may be the intense and broad focus on pedagogy that it necessitated,” the report continues. “In a moment when nobody could just teach the way they had always done before, all of us had to go back to first principles and ask ourselves: What are our learning goals for our students? How can we best help them to achieve these goals?”

    The committee’s work is a direct response to one of the Refinement and Implementation Committees (RIC) formed as part of Task Force 2021 and Beyond. Led by co-chairs Krishna Rajagopal, the William A. M. Burden Professor of Physics, and Janet Rankin, director of the MIT Teaching + Learning Lab, the committee engaged with faculty and instructional staff, associate department heads, and undergraduate and graduate officers across MIT.

    The findings are distilled into four broad themes:

    Community, Well-being, and Belonging. Conversations revealed new ways that instructors cultivated these key interrelated concepts, all of which are fundamental to student learning and success. Many instructors focused more on supporting well-being and building community and belonging during the height of the pandemic precisely because the MIT community, and everyone in it, was under such great stress. Some of the resulting practices are continuing, the committee found. Examples include introducing simple gestures, such as start-of-class welcoming practices, and providing extensions and greater flexibility on student assignments. Also, many across MIT felt that the week-long Thanksgiving break offered in 2020 should become a permanent fixture in the academic calendar, because it enhances the well-being of both students and instructors at a time in the fall semester when everyone’s batteries need recharging. 
    Enhancing Engagement. The committee found a variety of practices that have enhanced engagement between students and instructors; among students; and among instructors. For example, many instructors have continued to offer some office hours on Zoom, which seems to reduce barriers to participation for many students, while offering in-person office hours for those who want to take advantage of opportunities for more open-ended conversations. Several departments increased their usage of undergraduate teaching assistants (UTAs) in ways that make students’ learning experience more engaging and give the UTAs a real teaching experience. In addition, many instructors are leveraging out-of-class communication spaces like Slack, Perusall, and Piazza so students can work together, ask questions, and share ideas. 
    Enriching and Augmenting the Learning Environment. The report presents two ways in which instructors have enhanced learning within the classroom: through blended learning and by incorporating authentic experiences. Although blended learning techniques are not new at MIT, after having made it through remote teaching many faculty have found new ways to combine synchronous in-person teaching with asynchronous activities for on-campus students, such as pre-class or pre-lab sequences of videos with exercises interspersed, take-home lab kits, auto-graded online problems that give students immediate feedback, and recorded lab experiences for subsequent review. In addition, instructors found many creative ways to make students’ learning more authentic by going on virtual field trips, using Zoom to bring experts from around the world into MIT classrooms or to enable interactions with students at other universities, and live-streaming experiments that students could not otherwise experience since they cannot be performed in a teaching lab.   
     Assessing Learning. For all its challenges, the report notes, remote teaching prompted instructors to take a step back and think about what they wanted students to learn, how to support it, and how to measure it. The committee found a variety of examples of alternatives to traditional assessments, such as papers or timed, written exams, that instructors tried during the pandemic and are continuing to use. These alternatives include shorter, more frequent, lower-stakes assessments; oral exams or debates; asynchronous, open-book/notes exams; virtual poster sessions; alternate grading schemes; and uploading paper psets and exams into Gradescope to use its logistics and rubrics to improve grading effectiveness and efficiency.
    A large portion of the report is devoted to an extensive, annotated list of best practices from remote instruction that are being used in the classroom. Interestingly, Rankin says, “so many of the strategies and practices developed and used during the pandemic are based on, and supported by, solid educational research.”

    The report concludes with one broad recommendation: that all faculty and instructors read the findings and experiment with some of the best practices in their own instruction. “Our hope is that the practices shared in the report will continue to be adopted, adapted, and expanded by members of the teaching community at MIT, and that instructors’ openness in sharing and learning from each other will continue,” Rankin says.

    Two additional, specific recommendations are included in the report. First, the committee endorses the RIC 16 recommendation that a Classroom Advisory Board be created to provide strategic input grounded in evolving pedagogy about future classroom use and technology needs. In its conversations, the committee found a number of ways that remote teaching and learning have impacted students’ and instructors’ perceptions as they have returned to the classroom. For example, during the pandemic students benefited from being able to see everyone else’s faces on Zoom. As a result, some instructors would prefer classrooms that enable students to face each other, such as semi-circular classrooms instead of rectangular ones.

    More generally, the committee concluded, MIT needs classrooms with seats and tables that can be quickly and flexibly reconfigured to facilitate varying pedagogical objectives. The Classroom Advisory Board could also examine classroom technology; this includes the role of videoconferencing to create authentic engagement between MIT students and people far from campus, and blended learning that allows students to experience more of the in-classroom engagement with their peers and instructors from which the “magic of MIT” originates.

    Second, the committee recommends that an implementation group be formed to investigate the possibility of changing the MIT academic calendar to create a one-week break over Thanksgiving. “Finalizing an implementation plan will require careful consideration of various significant logistical challenges,” the report says. “However, the resulting gains to both well-being and learning from this change to the fall calendar make doing so worthwhile.”

    Rankin notes that the report findings dovetail with the recently released MIT Strategic Action Plan for Belonging, Achievement and Composition. “I believe that one of the most important things that became really apparent during remote teaching was that community, inclusion, and belonging really matter and are necessary for both learning and teaching, and that instructors can and should play a central role in creating structures and processes to support them in their classrooms and other learning environments,” she says.

    Rajagopal finds it inspiring that “during a time of intense stress — that nobody ever wants to relive — there was such an intense focus on how we teach and how our students learn that, today, in essentially every direction we look we see colleagues improving on-campus education for tomorrow. I hope that the report will help instructors across the Institute, and perhaps elsewhere, learn from each other. Its readers will see, as our committee did, new ways in which students and instructors are finding those moments, those interactions, where the magic of MIT is created.”

    In addition to the report, the co-chairs recommend two other valuable remote teaching resources: a video interview series, TLL’s Fresh Perspectives, and Open Learning’s collection of examples of how MIT faculty and instructors leveraged digital technology to support and transform teaching and learning during the heart of the pandemic.

  •

    Four from MIT receive NIH New Innovator Awards for 2022

    The National Institutes of Health (NIH) has awarded grants to four MIT faculty members as part of its High-Risk, High-Reward Research program.

    The program supports unconventional approaches to challenges in biomedical, behavioral, and social sciences. Each year, NIH Director’s Awards are granted to program applicants who propose high-risk, high-impact research in areas relevant to the NIH’s mission. In doing so, the NIH encourages innovative proposals that, due to their inherent risk, might struggle in the traditional peer-review process.

    This year, Lindsay Case, Siniša Hrvatin, Deblina Sarkar, and Caroline Uhler have been chosen to receive the New Innovator Award, which funds exceptionally creative research from early-career investigators. The award, which was established in 2007, supports researchers who are within 10 years of their final degree or clinical residency and have not yet received a research project grant or equivalent NIH grant.

    Lindsay Case, the Irwin and Helen Sizer Department of Biology Career Development Professor and an extramural member of the Koch Institute for Integrative Cancer Research, uses biochemistry and cell biology to study the spatial organization of signal transduction. Her work focuses on understanding how signaling molecules assemble into compartments with unique biochemical and biophysical properties to enable cells to sense and respond to information in their environment. Earlier this year, Case was one of two MIT assistant professors named as Searle Scholars.

    Siniša Hrvatin, who joined the School of Science faculty this past winter, is an assistant professor in the Department of Biology and a core member at the Whitehead Institute for Biomedical Research. He studies how animals and cells enter, regulate, and survive states of dormancy such as torpor and hibernation, aiming to harness the potential of these states therapeutically.

    Deblina Sarkar is an assistant professor and the AT&T Career Development Professor at the MIT Media Lab. Her research combines the interdisciplinary fields of nanoelectronics, applied physics, and biology to invent disruptive technologies for energy-efficient nanoelectronics and merge such next-generation technologies with living matter to create a new paradigm for life-machine symbiosis. Her high-risk, high-reward proposal received a rare perfect impact score of 10, the highest score awarded by the NIH.

    Caroline Uhler is a professor in the Department of Electrical Engineering and Computer Science and the Institute for Data, Systems, and Society. In addition, she is a core institute member at the Broad Institute of MIT and Harvard, where she co-directs the Eric and Wendy Schmidt Center. By combining machine learning, statistics, and genomics, she develops representation learning and causal inference methods to elucidate gene regulation in health and disease.

    The High-Risk, High-Reward Research program is supported by the NIH Common Fund, which oversees programs that pursue major opportunities and gaps in biomedical research that require collaboration across NIH Institutes and Centers. In addition to the New Innovator Award, the NIH also issues three other awards each year: the Pioneer Award, which supports bold and innovative research projects with unusually broad scientific impact; the Transformative Research Award, which supports risky and untested projects with transformative potential; and the Early Independence Award, which allows especially impressive junior scientists to skip the traditional postdoctoral training program to launch independent research careers.

    This year, the High-Risk, High-Reward Research program is awarding 103 awards, including eight Pioneer Awards, 72 New Innovator Awards, nine Transformative Research Awards, and 14 Early Independence Awards. These 103 awards total approximately $285 million in support from the institutes, centers, and offices across NIH over five years. “The science advanced by these researchers is poised to blaze new paths of discovery in human health,” says Lawrence A. Tabak, DDS, PhD, who is performing the duties of the director of NIH. “This unique cohort of scientists will transform what is known in the biological and behavioral world. We are privileged to support this innovative science.”

  • in

    Learning on the edge

    Microcontrollers, miniature computers that can run simple commands, are the basis for billions of connected devices, from internet-of-things (IoT) devices to sensors in automobiles. But cheap, low-power microcontrollers have extremely limited memory and no operating system, making it challenging to train artificial intelligence models on “edge devices” that work independently from central computing resources.

    Training a machine-learning model on an intelligent edge device allows it to adapt to new data and make better predictions. For instance, training a model on a smart keyboard could enable the keyboard to continually learn from the user’s writing. However, the training process requires so much memory that it is typically done using powerful computers at a data center, before the model is deployed on a device. This is more costly and raises privacy issues since user data must be sent to a central server.

    To address this problem, researchers at MIT and the MIT-IBM Watson AI Lab developed a new technique that enables on-device training using less than a quarter of a megabyte of memory. Other training solutions designed for connected devices can use more than 500 megabytes of memory, greatly exceeding the 256-kilobyte capacity of most microcontrollers (there are 1,024 kilobytes in one megabyte).

    The intelligent algorithms and framework the researchers developed reduce the amount of computation required to train a model, which makes the process faster and more memory efficient. Their technique can be used to train a machine-learning model on a microcontroller in a matter of minutes.

    This technique also preserves privacy by keeping data on the device, which could be especially beneficial when data are sensitive, such as in medical applications. It also could enable customization of a model based on the needs of users. Moreover, the framework preserves or improves the accuracy of the model when compared to other training approaches.

    “Our study enables IoT devices to not only perform inference but also continuously update the AI models to newly collected data, paving the way for lifelong on-device learning. The low resource utilization makes deep learning more accessible and can have a broader reach, especially for low-power edge devices,” says Song Han, an associate professor in the Department of Electrical Engineering and Computer Science (EECS), a member of the MIT-IBM Watson AI Lab, and senior author of the paper describing this innovation.

    Joining Han on the paper are co-lead authors and EECS PhD students Ji Lin and Ligeng Zhu, as well as MIT postdocs Wei-Ming Chen and Wei-Chen Wang, and Chuang Gan, a principal research staff member at the MIT-IBM Watson AI Lab. The research will be presented at the Conference on Neural Information Processing Systems.

    Han and his team previously addressed the memory and computational bottlenecks that exist when trying to run machine-learning models on tiny edge devices, as part of their TinyML initiative.

    Lightweight training

    A common type of machine-learning model is known as a neural network. Loosely based on the human brain, these models contain layers of interconnected nodes, or neurons, that process data to complete a task, such as recognizing people in photos. The model must be trained first, which involves showing it millions of examples so it can learn the task. As it learns, the model increases or decreases the strength of the connections between neurons, which are known as weights.

    The model may undergo hundreds of updates as it learns, and the intermediate activations must be stored during each round. In a neural network, activations are the intermediate results each layer produces as data pass through the model. Because there may be millions of weights and activations, training a model requires much more memory than running a pre-trained model, Han explains.
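To make the memory cost concrete, here is a minimal, hypothetical sketch (pure Python, not the researchers' code) of one training step for a two-weight network. The forward pass must hold on to the intermediate activation `h` solely so the backward pass can use it; inference alone could discard it immediately:

```python
# Illustrative sketch: a single training step for a tiny two-layer network,
# showing why training needs more memory than inference -- intermediate
# activations must be retained for the backward pass, on top of the weights.

def forward(x, w1, w2):
    # h is an intermediate activation: inference could discard it,
    # but backpropagation needs it to compute the gradient for w2.
    h = max(0.0, w1 * x)           # ReLU hidden unit
    y = w2 * h
    return y, h                    # h is kept only because we will train

def train_step(x, target, w1, w2, lr=0.01):
    y, h = forward(x, w1, w2)
    err = y - target
    # The backward pass reuses the stored activation h.
    grad_w2 = err * h
    grad_w1 = err * w2 * x if h > 0 else 0.0
    return w1 - lr * grad_w1, w2 - lr * grad_w2

w1, w2 = 0.5, 0.5
for _ in range(200):
    w1, w2 = train_step(2.0, 3.0, w1, w2)
out, _ = forward(2.0, w1, w2)
print(round(out, 2))  # converges toward the target of 3.0
```

Scaled up to millions of weights, these retained activations are exactly what blows past a microcontroller's few hundred kilobytes of memory.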

    Han and his collaborators employed two algorithmic solutions to make the training process more efficient and less memory-intensive. The first, known as sparse update, uses an algorithm that identifies the most important weights to update at each round of training. The algorithm starts freezing the weights one at a time until it sees the accuracy dip to a set threshold, then it stops. The remaining weights are updated, while the activations corresponding to the frozen weights don’t need to be stored in memory.

    “Updating the whole model is very expensive because there are a lot of activations, so people tend to update only the last layer, but as you can imagine, this hurts the accuracy. For our method, we selectively update those important weights and make sure the accuracy is fully preserved,” Han says.
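As an illustration of the sparse-update idea (a simplified stand-in, not the authors' actual algorithm, which freezes weights iteratively while watching for an accuracy dip), the sketch below ranks weights by gradient magnitude and updates only the most important fraction:

```python
# Hedged sketch of sparse update: rank weights by gradient magnitude, freeze
# the least important ones, and update only the rest. Frozen weights need no
# stored activations, so training memory shrinks with the frozen fraction.

def sparse_update(weights, grads, lr=0.1, update_fraction=0.25):
    # Treat the indices with the largest-magnitude gradients as "important".
    k = max(1, int(len(weights) * update_fraction))
    important = sorted(range(len(weights)),
                       key=lambda i: abs(grads[i]), reverse=True)[:k]
    frozen = set(range(len(weights))) - set(important)
    new_weights = [
        w if i in frozen else w - lr * grads[i]
        for i, w in enumerate(weights)
    ]
    return new_weights, frozen

weights = [0.5, -1.2, 0.3, 2.0]
grads   = [0.01, -0.9, 0.02, 0.4]
new_w, frozen = sparse_update(weights, grads)
# Only the weight with the largest gradient moves; the rest stay frozen.
```

Because frozen weights need no stored activations, memory use falls roughly in proportion to the frozen fraction.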

    Their second solution involves quantized training and simplifying the weights, which are typically 32 bits. An algorithm rounds the weights so they are only eight bits, through a process known as quantization, which cuts the amount of memory for both training and inference. Inference is the process of applying a model to a dataset and generating a prediction. Then the algorithm applies a technique called quantization-aware scaling (QAS), which acts like a multiplier to adjust the ratio between weight and gradient, to avoid any drop in accuracy that may come from quantized training.
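A minimal sketch of the 8-bit rounding step might look like the following (an illustration of symmetric int8 quantization in general, not the paper's exact scheme; the QAS multiplier is described only in the comments):

```python
# Illustrative sketch of 8-bit weight quantization: scale 32-bit floats into
# the signed int8 range [-127, 127], round, then dequantize with the same
# scale. A QAS-style multiplier would additionally rescale gradients to
# compensate for the precision lost here.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.82, -0.31, 0.05, -1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# q holds small integers (1 byte each instead of 4); restored values are
# close to, but not exactly, the originals -- the quantization error that
# quantization-aware scaling is designed to counteract during training.
```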

    The researchers developed a system, called a tiny training engine, that can run these algorithmic innovations on a simple microcontroller that lacks an operating system. This system changes the order of steps in the training process so more work is completed in the compilation stage, before the model is deployed on the edge device.

    “We push a lot of the computation, such as auto-differentiation and graph optimization, to compile time. We also aggressively prune the redundant operators to support sparse updates. Once at runtime, we have much less workload to do on the device,” Han explains.

    A successful speedup

    Their optimization only required 157 kilobytes of memory to train a machine-learning model on a microcontroller, whereas other techniques designed for lightweight training would still need between 300 and 600 megabytes.

    They tested their framework by training a computer vision model to detect people in images. After only 10 minutes of training, it learned to complete the task successfully. Their method was able to train a model more than 20 times faster than other approaches.

    Now that they have demonstrated the success of these techniques for computer vision models, the researchers want to apply them to language models and different types of data, such as time-series data. At the same time, they want to use what they’ve learned to shrink the size of larger models without sacrificing accuracy, which could help reduce the carbon footprint of training large-scale machine-learning models.

    “AI model adaptation/training on a device, especially on embedded controllers, is an open challenge. This research from MIT has not only successfully demonstrated the capabilities, but also opened up new possibilities for privacy-preserving device personalization in real-time,” says Nilesh Jain, a principal engineer at Intel who was not involved with this work. “Innovations in the publication have broader applicability and will ignite new systems-algorithm co-design research.”

    “On-device learning is the next major advance we are working toward for the connected intelligent edge. Professor Song Han’s group has shown great progress in demonstrating the effectiveness of edge devices for training,” adds Jilei Hou, vice president and head of AI research at Qualcomm. “Qualcomm has awarded his team an Innovation Fellowship for further innovation and advancement in this area.”

    This work is funded by the National Science Foundation, the MIT-IBM Watson AI Lab, the MIT AI Hardware Program, Amazon, Intel, Qualcomm, Ford Motor Company, and Google.

  • in

    Investigating at the interface of data science and computing

    A visual model of Guy Bresler’s research would probably look something like a Venn diagram. He works at the four-way intersection where theoretical computer science, statistics, probability, and information theory collide.

    “There are always new things to be done at the interface. There are always opportunities for entirely new questions to ask,” says Bresler, an associate professor who recently earned tenure in MIT’s Department of Electrical Engineering and Computer Science (EECS).

    A theoretician, he aims to understand the delicate interplay between structure in data, the complexity of models, and the amount of computation needed to learn those models. Recently, his biggest focus has been trying to unveil fundamental phenomena that are broadly responsible for determining the computational complexity of statistics problems — and finding the “sweet spot” where available data and computation resources enable researchers to effectively solve a problem.

    When trying to solve a complex statistics problem, there is often a tug-of-war between data and computation. Without enough data, the computation needed to solve a statistical problem can be intractable, or at least consume a staggering amount of resources. But get just enough data and suddenly the intractable becomes solvable; the amount of computation needed to come up with a solution drops dramatically.

    The majority of modern statistical problems exhibit this sort of trade-off between computation and data, with applications ranging from drug development to weather prediction. Another well-studied and practically important example is cryo-electron microscopy, Bresler says. With this technique, researchers use an electron microscope to take images of molecules in different orientations. The central challenge is how to solve the inverse problem — determining the molecule’s structure given the noisy data. Many statistical problems can be formulated as inverse problems of this sort.

    One aim of Bresler’s work is to elucidate relationships between the wide variety of different statistics problems currently being studied. The dream is to classify statistical problems into equivalence classes, as has been done for other types of computational problems in the field of computational complexity. Showing these sorts of relationships means that, instead of trying to understand each problem in isolation, researchers can transfer their understanding from a well-studied problem to a poorly understood one, he says.

    Adopting a theoretical approach

    For Bresler, a desire to theoretically understand various basic phenomena inspired him to follow a path into academia.

    Both of his parents worked as professors and showed how fulfilling academia can be, he says. His earliest introduction to the theoretical side of engineering came from his father, who is an electrical engineer and theoretician studying signal processing. Bresler was inspired by his work from an early age. As an undergraduate at the University of Illinois at Urbana-Champaign, he bounced between physics, math, and computer science courses. But no matter the topic, he gravitated toward the theoretical viewpoint.

    In graduate school at the University of California at Berkeley, Bresler enjoyed the opportunity to work in a wide variety of topics spanning probability, theoretical computer science, and mathematics. His driving motivator was a love of learning new things.

    “Working at the interface of multiple fields with new questions, there is a feeling that one had better learn as much as possible if one is to have any chance of finding the right tools to answer those questions,” he says.

    That curiosity led him to MIT for a postdoc in the Laboratory for Information and Decision Systems (LIDS) in 2013, and then he joined the faculty two years later as an assistant professor in EECS. He was named an associate professor in 2019.

    Bresler says he was drawn to the intellectual atmosphere at MIT, as well as the supportive environment for launching bold research quests and trying to make progress in new areas of study.

    Opportunities for collaboration

    “What really struck me was how vibrant and energetic and collaborative MIT is. I have this mental list of more than 20 people here who I would love to have lunch with every single week and collaborate with on research. So just based on sheer numbers, joining MIT was a clear win,” he says.

    He’s especially enjoyed collaborating with his students, who continually teach him new things and ask deep questions that drive exciting research projects. One such student, Matthew Brennan, who was one of Bresler’s closest collaborators, tragically and unexpectedly passed away in January 2021.

    The shock from Brennan’s death is still raw for Bresler, and it derailed his research for a time.

    “Beyond his own prodigious capabilities and creativity, he had this amazing ability to listen to an idea of mine that was almost completely wrong, extract from it a useful piece, and then pass the ball back,” he says. “We had the same vision for what we wanted to achieve in the work, and we were driven to try to tell a certain story. At the time, almost nobody was pursuing this particular line of work, and it was in a way kind of lonely. But he trusted me, and we encouraged one another to keep at it when things seemed bleak.”

    Those lessons in perseverance fuel Bresler as he and his students continue exploring questions that, by their nature, are difficult to answer.

    One area he’s worked in on-and-off for over a decade involves learning graphical models from data. Models of certain types of data, such as time-series data consisting of temperature readings, are often constructed by domain experts who have relevant knowledge and can build a reasonable model, he explains.

    But for many types of data with complex dependencies, such as social network or biological data, it is not at all clear what structure a model should take. Bresler’s work seeks to estimate a structured model from data, which could then be used for downstream applications like making recommendations or better predicting the weather.

    The basic question of identifying good models, whether algorithmically in a complex setting or analytically, by specifying a useful toy model for theoretical analysis, connects the abstract work with engineering practice, he says.

    “In general, modeling is an art. Real life is complicated and if you write down some super-complicated model that tries to capture every feature of a problem, it is doomed,” says Bresler. “You have to think about the problem and understand the practical side of things on some level to identify the correct features of the problem to be modeled, so that you can hope to actually solve it and gain insight into what one should do in practice.”

    Outside the lab, Bresler often finds himself solving very different kinds of problems. He is an avid rock climber and spends much of his free time bouldering throughout New England.

    “I really love it. It is a good excuse to get outside and get sucked into a whole different world. Even though there is problem solving involved, and there are similarities at the philosophical level, it is totally orthogonal to sitting down and doing math,” he says.

  • in

    Neurodegenerative disease can progress in newly identified patterns

    Neurodegenerative diseases — like amyotrophic lateral sclerosis (ALS, or Lou Gehrig’s disease), Alzheimer’s, and Parkinson’s — are complicated, chronic ailments that can present with a variety of symptoms, worsen at different rates, and have many underlying genetic and environmental causes, some of which are unknown. ALS, in particular, affects voluntary muscle movement and is always fatal, but while most people survive for only a few years after diagnosis, others live with the disease for decades. Manifestations of ALS can also vary significantly; slower disease development often correlates with onset in the limbs, which affects fine motor skills, while the more serious bulbar form of ALS impacts swallowing, speaking, breathing, and mobility. Therefore, understanding the progression of diseases like ALS is critical to enrollment in clinical trials, analysis of potential interventions, and discovery of root causes.

    However, assessing disease evolution is far from straightforward. Current clinical studies typically assume that health declines on a downward linear trajectory on a symptom rating scale, and use these linear models to evaluate whether drugs are slowing disease progression. However, data indicate that ALS often follows nonlinear trajectories, with periods where symptoms are stable alternating with periods when they are rapidly changing. Since data can be sparse, and health assessments often rely on subjective rating metrics measured at uneven time intervals, comparisons across patient populations are difficult. These heterogeneous data and progression patterns, in turn, complicate analyses of intervention effectiveness and can potentially mask disease origins.

    Now, a new machine-learning method developed by researchers from MIT, IBM Research, and elsewhere aims to better characterize ALS disease progression patterns to inform clinical trial design.

    “There are groups of individuals that share progression patterns. For example, some seem to have really fast-progressing ALS and others that have slow-progressing ALS that varies over time,” says Divya Ramamoorthy PhD ’22, a research specialist at MIT and lead author of a new paper on the work that was published this month in Nature Computational Science. “The question we were asking is: can we use machine learning to identify if, and to what extent, those types of consistent patterns across individuals exist?”

    Their technique, indeed, identified discrete and robust clinical patterns in ALS progression, many of which are nonlinear. Further, these disease progression subtypes were consistent across patient populations and disease metrics. The team additionally found that their method can be applied to Alzheimer’s and Parkinson’s diseases as well.

    Joining Ramamoorthy on the paper are MIT-IBM Watson AI Lab members Ernest Fraenkel, a professor in the MIT Department of Biological Engineering; Research Scientist Soumya Ghosh of IBM Research; and Principal Research Scientist Kenney Ng, also of IBM Research. Additional authors include Kristen Severson PhD ’18, a senior researcher at Microsoft Research and former member of the Watson Lab and of IBM Research; Karen Sachs PhD ’06 of Next Generation Analytics; a team of researchers with Answer ALS; Jonathan D. Glass and Christina N. Fournier of the Emory University School of Medicine; the Pooled Resource Open-Access ALS Clinical Trials Consortium; ALS/MND Natural History Consortium; Todd M. Herrington of Massachusetts General Hospital (MGH) and Harvard Medical School; and James D. Berry of MGH.

    Play video

    MIT Professor Ernest Fraenkel describes early stages of his research looking at root causes of amyotrophic lateral sclerosis (ALS).

    Reshaping health decline

    After consulting with clinicians, the team of machine learning researchers and neurologists let the data speak for itself. They designed an unsupervised machine-learning model that employed two methods: Gaussian process regression and Dirichlet process clustering. These inferred the health trajectories directly from patient data and automatically grouped similar trajectories together without prescribing the number of clusters or the shape of the curves, forming ALS progression “subtypes.” Their method incorporated prior clinical knowledge in the form of a bias for negative trajectories — consistent with expectations for neurodegenerative disease progressions — but did not assume any linearity. “We know that linearity is not reflective of what’s actually observed,” says Ng. “The methods and models that we use here were more flexible, in the sense that they capture what was seen in the data,” without the need for expensive labeled data and prescription of parameters.
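As a toy illustration of the key property — the number of subtypes is discovered from the data rather than fixed in advance — the following hypothetical sketch assigns each trajectory to the nearest existing cluster only if it is similar enough, and otherwise opens a new one (a greatly simplified stand-in for Dirichlet process clustering; the trajectory values are invented):

```python
# Simplified stand-in for nonparametric trajectory clustering: trajectories
# join the nearest existing cluster if they are similar enough, otherwise
# they seed a new cluster -- so the number of "subtypes" is not fixed upfront.

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cluster_trajectories(trajectories, threshold=1.0):
    clusters = []  # each cluster is a list of member trajectories
    for traj in trajectories:
        best = None
        for c in clusters:
            # Compare against the cluster's first member as a cheap centroid.
            d = distance(traj, c[0])
            if d < threshold and (best is None or d < distance(traj, best[0])):
                best = c
        if best is None:
            clusters.append([traj])   # open a new cluster (new "subtype")
        else:
            best.append(traj)
    return clusters

# Invented ALSFRS-R-like score trajectories (48 = unimpaired, declining).
fast  = [48, 40, 30, 18, 10]
fast2 = [48, 41, 29, 17, 9]
slow  = [48, 47, 46, 45, 44]
slow2 = [48, 46, 46, 44, 43]
groups = cluster_trajectories([fast, slow, fast2, slow2], threshold=5.0)
# Two clusters emerge: fast progressors and slow progressors.
```

The actual model infers clusters probabilistically rather than by a hard distance threshold, but the same principle applies: similar trajectories pool together, and new patterns earn their own group.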

    Primarily, they applied the model to five longitudinal datasets from ALS clinical trials and observational studies. These used the gold standard to measure symptom development: the ALS functional rating scale revised (ALSFRS-R), which captures a global picture of patient neurological impairment but can be a bit of a “messy metric.” Additionally, survival probabilities, forced vital capacity (a measurement of respiratory function), and ALSFRS-R subscores, which track individual bodily functions, were incorporated.

    New regimes of progression and utility

    When their population-level model was trained and tested on these metrics, four dominant patterns of disease popped out of the many trajectories — sigmoidal fast progression, stable slow progression, unstable slow progression, and unstable moderate progression — many with strong nonlinear characteristics. Notably, it captured trajectories where patients experienced a sudden loss of ability, called a functional cliff, which would significantly impact treatments, enrollment in clinical trials, and quality of life.

    The researchers compared their method against other commonly used linear and nonlinear approaches in the field to separate the contribution of clustering and linearity to the model’s accuracy. The new work outperformed them, even patient-specific models, and found that subtype patterns were consistent across measures. Impressively, when data were withheld, the model was able to interpolate missing values, and, critically, could forecast future health measures. The model could also be trained on one ALSFRS-R dataset and predict cluster membership in others, making it robust, generalizable, and accurate with scarce data. So long as 6-12 months of data were available, health trajectories could be inferred with higher confidence than conventional methods.
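The interpolation idea can be sketched with a bare-bones Gaussian process regression in pure Python (an illustration of the general technique with invented scores, not the study's implementation): given noisy scores at observed months, the GP's posterior mean fills in a held-out month smoothly:

```python
# Minimal Gaussian-process regression sketch: fit invented ALSFRS-R-like
# scores at observed months, then predict the score at a missing month.

import math

def rbf(a, b, length=6.0):
    # Squared-exponential kernel: nearby months are highly correlated.
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(A, b):
    # Gaussian elimination with partial pivoting (small dense systems only).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(xs, ys, x_new, noise=0.01):
    # Center the scores so the zero-mean GP models only the deviations.
    mean = sum(ys) / len(ys)
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, [y - mean for y in ys])
    return mean + sum(rbf(x_new, xs[i]) * alpha[i] for i in range(n))

# Scores observed at months 0-10, with month 6 missing.
months = [0, 2, 4, 8, 10]
scores = [48, 45, 41, 33, 29]
pred = gp_predict(months, scores, 6)
# The GP fills in the gap smoothly, near the neighboring observations.
```

The study's model layers Dirichlet process clustering on top of such regressions and imposes the negative-trajectory bias described earlier; this sketch shows only the interpolation mechanism.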

    The researchers’ approach also provided insights into Alzheimer’s and Parkinson’s diseases, both of which can have a range of symptom presentations and progression. For Alzheimer’s, the new technique could identify distinct disease patterns, in particular variations in the rates of conversion of mild to severe disease. The Parkinson’s analysis demonstrated a relationship between progression trajectories for off-medication scores and disease phenotypes, such as the tremor-dominant or postural instability/gait difficulty forms of Parkinson’s disease.

    The work makes significant strides to find the signal amongst the noise in the time-series of complex neurodegenerative disease. “The patterns that we see are reproducible across studies, which I don’t believe had been shown before, and that may have implications for how we subtype the [ALS] disease,” says Fraenkel. As the FDA has been considering the impact of non-linearity in clinical trial designs, the team notes that their work is particularly pertinent.

    As new ways to understand disease mechanisms come online, this model provides another tool to pick apart illnesses like ALS, Alzheimer’s, and Parkinson’s from a systems biology perspective.

    “We have a lot of molecular data from the same patients, and so our long-term goal is to see whether there are subtypes of the disease,” says Fraenkel, whose lab looks at cellular changes to understand the etiology of diseases and possible targets for cures. “One approach is to start with the symptoms … and see if people with different patterns of disease progression are also different at the molecular level. That might lead you to a therapy. Then there’s the bottom-up approach, where you start with the molecules” and try to reconstruct biological pathways that might be affected. “We’re going [to be tackling this] from both ends … and finding if something meets in the middle.”

    This research was supported, in part, by the MIT-IBM Watson AI Lab, the Muscular Dystrophy Association, the Department of Veterans Affairs Office of Research and Development, the Department of Defense, the NSF Graduate Research Fellowship Program, the Siebel Scholars Fellowship, Answer ALS, the United States Army Medical Research Acquisition Activity, the National Institutes of Health, and the NIH/NINDS.