More stories

  • in

    Busy GPUs: Sampling and pipelining method speeds up deep learning on large graphs

    Graphs, a potentially extensive web of nodes connected by edges, can be used to express and interrogate relationships between data, like social connections, financial transactions, traffic, energy grids, and molecular interactions. As researchers collect more data and build out these graphical pictures, researchers will need faster and more efficient methods, as well as more computational power, to conduct deep learning on them, in the way of graph neural networks (GNN).  

    Now, a new method, called SALIENT (SAmpling, sLIcing, and data movemeNT), developed by researchers at MIT and IBM Research, improves the training and inference performance by addressing three key bottlenecks in computation. This dramatically cuts down on the runtime of GNNs on large datasets, which, for example, contain on the scale of 100 million nodes and 1 billion edges. Further, the team found that the technique scales well when computational power is added from one to 16 graphical processing units (GPUs). The work was presented at the Fifth Conference on Machine Learning and Systems.

    “We started to look at the challenges current systems experienced when scaling state-of-the-art machine learning techniques for graphs to really big datasets. It turned out there was a lot of work to be done, because a lot of the existing systems were achieving good performance primarily on smaller datasets that fit into GPU memory,” says Tim Kaler, the lead author and a postdoc in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

    By vast datasets, experts mean scales like the entire Bitcoin network, where certain patterns and data relationships could spell out trends or foul play. “There are nearly a billion Bitcoin transactions on the blockchain, and if we want to identify illicit activities inside such a joint network, then we are facing a graph of such a scale,” says co-author Jie Chen, senior research scientist and manager of IBM Research and the MIT-IBM Watson AI Lab. “We want to build a system that is able to handle that kind of graph and allows processing to be as efficient as possible, because every day we want to keep up with the pace of the new data that are generated.”

    Kaler and Chen’s co-authors include Nickolas Stathas MEng ’21 of Jump Trading, who developed SALIENT as part of his graduate work; former MIT-IBM Watson AI Lab intern and MIT graduate student Anne Ouyang; MIT CSAIL postdoc Alexandros-Stavros Iliopoulos; MIT CSAIL Research Scientist Tao B. Schardl; and Charles E. Leiserson, the Edwin Sibley Webster Professor of Electrical Engineering at MIT and a researcher with the MIT-IBM Watson AI Lab.     

    For this problem, the team took a systems-oriented approach in developing their method: SALIENT, says Kaler. To do this, the researchers implemented what they saw as important, basic optimizations of components that fit into existing machine-learning frameworks, such as PyTorch Geometric and the deep graph library (DGL), which are interfaces for building a machine-learning model. Stathas says the process is like swapping out engines to build a faster car. Their method was designed to fit into existing GNN architectures, so that domain experts could easily apply this work to their specified fields to expedite model training and tease out insights during inference faster. The trick, the team determined, was to keep all of the hardware (CPUs, data links, and GPUs) busy at all times: while the CPU samples the graph and prepares mini-batches of data that will then be transferred through the data link, the more critical GPU is working to train the machine-learning model or conduct inference. 

    The researchers began by analyzing the performance of a commonly used machine-learning library for GNNs (PyTorch Geometric), which showed a startlingly low utilization of available GPU resources. Applying simple optimizations, the researchers improved GPU utilization from 10 to 30 percent, resulting in a 1.4 to two times performance improvement relative to public benchmark codes. This fast baseline code could execute one complete pass over a large training dataset through the algorithm (an epoch) in 50.4 seconds.                          

    Seeking further performance improvements, the researchers set out to examine the bottlenecks that occur at the beginning of the data pipeline: the algorithms for graph sampling and mini-batch preparation. Unlike other neural networks, GNNs perform a neighborhood aggregation operation, which computes information about a node using information present in other nearby nodes in the graph — for example, in a social network graph, information from friends of friends of a user. As the number of layers in the GNN increase, the number of nodes the network has to reach out to for information can explode, exceeding the limits of a computer. Neighborhood sampling algorithms help by selecting a smaller random subset of nodes to gather; however, the researchers found that current implementations of this were too slow to keep up with the processing speed of modern GPUs. In response, they identified a mix of data structures, algorithmic optimizations, and so forth that improved sampling speed, ultimately improving the sampling operation alone by about three times, taking the per-epoch runtime from 50.4 to 34.6 seconds. They also found that sampling, at an appropriate rate, can be done during inference, improving overall energy efficiency and performance, a point that had been overlooked in the literature, the team notes.      

    In previous systems, this sampling step was a multi-process approach, creating extra data and unnecessary data movement between the processes. The researchers made their SALIENT method more nimble by creating a single process with lightweight threads that kept the data on the CPU in shared memory. Further, SALIENT takes advantage of a cache of modern processors, says Stathas, parallelizing feature slicing, which extracts relevant information from nodes of interest and their surrounding neighbors and edges, within the shared memory of the CPU core cache. This again reduced the overall per-epoch runtime from 34.6 to 27.8 seconds.

    The last bottleneck the researchers addressed was to pipeline mini-batch data transfers between the CPU and GPU using a prefetching step, which would prepare data just before it’s needed. The team calculated that this would maximize bandwidth usage in the data link and bring the method up to perfect utilization; however, they only saw around 90 percent. They identified and fixed a performance bug in a popular PyTorch library that caused unnecessary round-trip communications between the CPU and GPU. With this bug fixed, the team achieved a 16.5 second per-epoch runtime with SALIENT.

    “Our work showed, I think, that the devil is in the details,” says Kaler. “When you pay close attention to the details that impact performance when training a graph neural network, you can resolve a huge number of performance issues. With our solutions, we ended up being completely bottlenecked by GPU computation, which is the ideal goal of such a system.”

    SALIENT’s speed was evaluated on three standard datasets ogbn-arxiv, ogbn-products, and ogbn-papers100M, as well as in multi-machine settings, with different levels of fanout (amount of data that the CPU would prepare for the GPU), and across several architectures, including the most recent state-of-the-art one, GraphSAGE-RI. In each setting, SALIENT outperformed PyTorch Geometric, most notably on the large ogbn-papers100M dataset, containing 100 million nodes and over a billion edges Here, it was three times faster, running on one GPU, than the optimized baseline that was originally created for this work; with 16 GPUs, SALIENT was an additional eight times faster. 

    While other systems had slightly different hardware and experimental setups, so it wasn’t always a direct comparison, SALIENT still outperformed them. Among systems that achieved similar accuracy, representative performance numbers include 99 seconds using one GPU and 32 CPUs, and 13 seconds using 1,536 CPUs. In contrast, SALIENT’s runtime using one GPU and 20 CPUs was 16.5 seconds and was just two seconds with 16 GPUs and 320 CPUs. “If you look at the bottom-line numbers that prior work reports, our 16 GPU runtime (two seconds) is an order of magnitude faster than other numbers that have been reported previously on this dataset,” says Kaler. The researchers attributed their performance improvements, in part, to their approach of optimizing their code for a single machine before moving to the distributed setting. Stathas says that the lesson here is that for your money, “it makes more sense to use the hardware you have efficiently, and to its extreme, before you start scaling up to multiple computers,” which can provide significant savings on cost and carbon emissions that can come with model training.

    This new capacity will now allow researchers to tackle and dig deeper into bigger and bigger graphs. For example, the Bitcoin network that was mentioned earlier contained 100,000 nodes; the SALIENT system can capably handle a graph 1,000 times (or three orders of magnitude) larger.

    “In the future, we would be looking at not just running this graph neural network training system on the existing algorithms that we implemented for classifying or predicting the properties of each node, but we also want to do more in-depth tasks, such as identifying common patterns in a graph (subgraph patterns), [which] may be actually interesting for indicating financial crimes,” says Chen. “We also want to identify nodes in a graph that are similar in a sense that they possibly would be corresponding to the same bad actor in a financial crime. These tasks would require developing additional algorithms, and possibly also neural network architectures.”

    This research was supported by the MIT-IBM Watson AI Lab and in part by the U.S. Air Force Research Laboratory and the U.S. Air Force Artificial Intelligence Accelerator. More

  • in

    Breaking the scaling limits of analog computing

    As machine-learning models become larger and more complex, they require faster and more energy-efficient hardware to perform computations. Conventional digital computers are struggling to keep up.

    An analog optical neural network could perform the same tasks as a digital one, such as image classification or speech recognition, but because computations are performed using light instead of electrical signals, optical neural networks can run many times faster while consuming less energy.

    However, these analog devices are prone to hardware errors that can make computations less precise. Microscopic imperfections in hardware components are one cause of these errors. In an optical neural network that has many connected components, errors can quickly accumulate.

    Even with error-correction techniques, due to fundamental properties of the devices that make up an optical neural network, some amount of error is unavoidable. A network that is large enough to be implemented in the real world would be far too imprecise to be effective.

    MIT researchers have overcome this hurdle and found a way to effectively scale an optical neural network. By adding a tiny hardware component to the optical switches that form the network’s architecture, they can reduce even the uncorrectable errors that would otherwise accumulate in the device.

    Their work could enable a super-fast, energy-efficient, analog neural network that can function with the same accuracy as a digital one. With this technique, as an optical circuit becomes larger, the amount of error in its computations actually decreases.  

    “This is remarkable, as it runs counter to the intuition of analog systems, where larger circuits are supposed to have higher errors, so that errors set a limit on scalability. This present paper allows us to address the scalability question of these systems with an unambiguous ‘yes,’” says lead author Ryan Hamerly, a visiting scientist in the MIT Research Laboratory for Electronics (RLE) and Quantum Photonics Laboratory and senior scientist at NTT Research.

    Hamerly’s co-authors are graduate student Saumil Bandyopadhyay and senior author Dirk Englund, an associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS), leader of the Quantum Photonics Laboratory, and member of the RLE. The research is published today in Nature Communications.

    Multiplying with light

    An optical neural network is composed of many connected components that function like reprogrammable, tunable mirrors. These tunable mirrors are called Mach-Zehnder Inferometers (MZI). Neural network data are encoded into light, which is fired into the optical neural network from a laser.

    A typical MZI contains two mirrors and two beam splitters. Light enters the top of an MZI, where it is split into two parts which interfere with each other before being recombined by the second beam splitter and then reflected out the bottom to the next MZI in the array. Researchers can leverage the interference of these optical signals to perform complex linear algebra operations, known as matrix multiplication, which is how neural networks process data.

    But errors that can occur in each MZI quickly accumulate as light moves from one device to the next. One can avoid some errors by identifying them in advance and tuning the MZIs so earlier errors are cancelled out by later devices in the array.

    “It is a very simple algorithm if you know what the errors are. But these errors are notoriously difficult to ascertain because you only have access to the inputs and outputs of your chip,” says Hamerly. “This motivated us to look at whether it is possible to create calibration-free error correction.”

    Hamerly and his collaborators previously demonstrated a mathematical technique that went a step further. They could successfully infer the errors and correctly tune the MZIs accordingly, but even this didn’t remove all the error.

    Due to the fundamental nature of an MZI, there are instances where it is impossible to tune a device so all light flows out the bottom port to the next MZI. If the device loses a fraction of light at each step and the array is very large, by the end there will only be a tiny bit of power left.

    “Even with error correction, there is a fundamental limit to how good a chip can be. MZIs are physically unable to realize certain settings they need to be configured to,” he says.

    So, the team developed a new type of MZI. The researchers added an additional beam splitter to the end of the device, calling it a 3-MZI because it has three beam splitters instead of two. Due to the way this additional beam splitter mixes the light, it becomes much easier for an MZI to reach the setting it needs to send all light from out through its bottom port.

    Importantly, the additional beam splitter is only a few micrometers in size and is a passive component, so it doesn’t require any extra wiring. Adding additional beam splitters doesn’t significantly change the size of the chip.

    Bigger chip, fewer errors

    When the researchers conducted simulations to test their architecture, they found that it can eliminate much of the uncorrectable error that hampers accuracy. And as the optical neural network becomes larger, the amount of error in the device actually drops — the opposite of what happens in a device with standard MZIs.

    Using 3-MZIs, they could potentially create a device big enough for commercial uses with error that has been reduced by a factor of 20, Hamerly says.

    The researchers also developed a variant of the MZI design specifically for correlated errors. These occur due to manufacturing imperfections — if the thickness of a chip is slightly wrong, the MZIs may all be off by about the same amount, so the errors are all about the same. They found a way to change the configuration of an MZI to make it robust to these types of errors. This technique also increased the bandwidth of the optical neural network so it can run three times faster.

    Now that they have showcased these techniques using simulations, Hamerly and his collaborators plan to test these approaches on physical hardware and continue driving toward an optical neural network they can effectively deploy in the real world.

    This research is funded, in part, by a National Science Foundation graduate research fellowship and the U.S. Air Force Office of Scientific Research. More

  • in

    A far-sighted approach to machine learning

    Picture two teams squaring off on a football field. The players can cooperate to achieve an objective, and compete against other players with conflicting interests. That’s how the game works.

    Creating artificial intelligence agents that can learn to compete and cooperate as effectively as humans remains a thorny problem. A key challenge is enabling AI agents to anticipate future behaviors of other agents when they are all learning simultaneously.

    Because of the complexity of this problem, current approaches tend to be myopic; the agents can only guess the next few moves of their teammates or competitors, which leads to poor performance in the long run. 

    Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere have developed a new approach that gives AI agents a farsighted perspective. Their machine-learning framework enables cooperative or competitive AI agents to consider what other agents will do as time approaches infinity, not just over a few next steps. The agents then adapt their behaviors accordingly to influence other agents’ future behaviors and arrive at an optimal, long-term solution.

    This framework could be used by a group of autonomous drones working together to find a lost hiker in a thick forest, or by self-driving cars that strive to keep passengers safe by anticipating future moves of other vehicles driving on a busy highway.

    “When AI agents are cooperating or competing, what matters most is when their behaviors converge at some point in the future. There are a lot of transient behaviors along the way that don’t matter very much in the long run. Reaching this converged behavior is what we really care about, and we now have a mathematical way to enable that,” says Dong-Ki Kim, a graduate student in the MIT Laboratory for Information and Decision Systems (LIDS) and lead author of a paper describing this framework.

    The senior author is Jonathan P. How, the Richard C. Maclaurin Professor of Aeronautics and Astronautics and a member of the MIT-IBM Watson AI Lab. Co-authors include others at the MIT-IBM Watson AI Lab, IBM Research, Mila-Quebec Artificial Intelligence Institute, and Oxford University. The research will be presented at the Conference on Neural Information Processing Systems.

    Play video

    In this demo video, the red robot, which has been trained using the researchers’ machine-learning system, is able to defeat the green robot by learning more effective behaviors that take advantage of the constantly changing strategy of its opponent.

    More agents, more problems

    The researchers focused on a problem known as multiagent reinforcement learning. Reinforcement learning is a form of machine learning in which an AI agent learns by trial and error. Researchers give the agent a reward for “good” behaviors that help it achieve a goal. The agent adapts its behavior to maximize that reward until it eventually becomes an expert at a task.

    But when many cooperative or competing agents are simultaneously learning, things become increasingly complex. As agents consider more future steps of their fellow agents, and how their own behavior influences others, the problem soon requires far too much computational power to solve efficiently. This is why other approaches only focus on the short term.

    “The AIs really want to think about the end of the game, but they don’t know when the game will end. They need to think about how to keep adapting their behavior into infinity so they can win at some far time in the future. Our paper essentially proposes a new objective that enables an AI to think about infinity,” says Kim.

    But since it is impossible to plug infinity into an algorithm, the researchers designed their system so agents focus on a future point where their behavior will converge with that of other agents, known as equilibrium. An equilibrium point determines the long-term performance of agents, and multiple equilibria can exist in a multiagent scenario. Therefore, an effective agent actively influences the future behaviors of other agents in such a way that they reach a desirable equilibrium from the agent’s perspective. If all agents influence each other, they converge to a general concept that the researchers call an “active equilibrium.”

    The machine-learning framework they developed, known as FURTHER (which stands for FUlly Reinforcing acTive influence witH averagE Reward), enables agents to learn how to adapt their behaviors as they interact with other agents to achieve this active equilibrium.

    FURTHER does this using two machine-learning modules. The first, an inference module, enables an agent to guess the future behaviors of other agents and the learning algorithms they use, based solely on their prior actions.

    This information is fed into the reinforcement learning module, which the agent uses to adapt its behavior and influence other agents in a way that maximizes its reward.

    “The challenge was thinking about infinity. We had to use a lot of different mathematical tools to enable that, and make some assumptions to get it to work in practice,” Kim says.

    Winning in the long run

    They tested their approach against other multiagent reinforcement learning frameworks in several different scenarios, including a pair of robots fighting sumo-style and a battle pitting two 25-agent teams against one another. In both instances, the AI agents using FURTHER won the games more often.

    Since their approach is decentralized, which means the agents learn to win the games independently, it is also more scalable than other methods that require a central computer to control the agents, Kim explains.

    The researchers used games to test their approach, but FURTHER could be used to tackle any kind of multiagent problem. For instance, it could be applied by economists seeking to develop sound policy in situations where many interacting entitles have behaviors and interests that change over time.

    Economics is one application Kim is particularly excited about studying. He also wants to dig deeper into the concept of an active equilibrium and continue enhancing the FURTHER framework.

    This research is funded, in part, by the MIT-IBM Watson AI Lab. More

  • in

    Celebrating open data

    The inaugural MIT Prize for Open Data, which included a $2,500 cash prize, was recently awarded to 10 individual and group research projects. Presented jointly by the School of Science and the MIT Libraries, the prize recognizes MIT-affiliated researchers who make their data openly accessible and reusable by others. The prize winners and 16 honorable mention recipients were honored at the Open Data @ MIT event held Oct. 28 at Hayden Library. 

    “By making data open, researchers create opportunities for novel uses of their data and for new insights to be gleaned,” says Chris Bourg, director of MIT Libraries. “Open data accelerates scholarly progress and discovery, advances equity in scholarly participation, and increases transparency, replicability, and trust in science.” 

    Recognizing shared values

    Spearheaded by Bourg and Rebecca Saxe, associate dean of the School of Science and John W. Jarve (1978) Professor of Brain and Cognitive Sciences, the MIT Prize for Open Data was launched to highlight the value of open data at MIT and to encourage the next generation of researchers. Nominations were solicited from across the Institute, with a focus on trainees: research technicians, undergraduate or graduate students, or postdocs.

    “By launching an MIT-wide prize and event, we aimed to create visibility for the scholars who create, use, and advocate for open data,” says Saxe. “Highlighting this research and creating opportunities for networking would also help open-data advocates across campus find each other.” 

    Recognizing researchers who share data was also one of the recommendations of the Ad Hoc Task Force on Open Access to MIT’s Research, which Bourg co-chaired with Hal Abelson, Class of 1922 Professor, Department of Electrical Engineering and Computer Science. An annual award was one of the strategies put forth by the task force to further the Institute’s mission to disseminate the fruits of its research and scholarship as widely as possible.

    Strong competition

    Winners and honorable mentions were chosen from more than 70 nominees, representing all five schools, the MIT Schwarzman College of Computing, and several research centers across MIT. A committee composed of faculty, staff, and a graduate student made the selections:

    Yunsie Chung, graduate student in the Department of Chemical Engineering, won for SolProp, the largest open-source dataset with temperature-dependent solubility values of organic compounds. 
    Matthew Groh, graduate student, MIT Media Lab, accepted on behalf of the team behind the Fitzpatrick 17k dataset, an open dataset consisting of nearly 17,000 images of skin disease alongside skin disease and skin tone annotations. 
    Tom Pollard, research scientist at the Institute for Medical Engineering and Science, accepted on behalf of the PhysioNet team. This data-sharing platform enables thousands of clinical and machine-learning research studies each year and allows researchers to share sensitive resources that would not be possible through typical data sharing platforms. 
    Joseph Replogle, graduate student with the Whitehead Institute for Biomedical Research, was recognized for the Genome-wide Perturb-seq dataset, the largest publicly available, single-cell transcriptional dataset collected to date. 
    Pedro Reynolds-Cuéllar, graduate student with the MIT Media Lab/Art, Culture, and Technology, and Diana Duarte, co-founder at Diversa, won for Retos, an open-data platform for detailed documentation and sharing of local innovations from under-resourced settings. 
    Maanas Sharma, an undergraduate student, led States of Emergency, a nationwide project analyzing and grading the responses of prison systems to Covid-19 using data scraped from public databases and manually collected data. 
    Djuna von Maydell, graduate student in the Department of Brain and Cognitive Sciences, created the first publicly available dataset of single-cell gene expression from postmortem human brain tissue of patients who are carriers of APOE4, the major Alzheimer’s disease risk gene. 
    Raechel Walker, graduate researcher in the MIT Media Lab, and her collaborators created a Data Activism Curriculum for high school students through the Mayor’s Summer Youth Employment Program in Cambridge, Massachusetts. Students learned how to use data science to recognize, mitigate, and advocate for people who are disproportionately impacted by systemic inequality. 
    Suyeol Yun, graduate student in the Department of Political Science, was recognized for DeepWTO, a project creating open data for use in legal natural language processing research using cases from the World Trade Organization. 
    Jonathan Zheng, graduate student in the Department of Chemical Engineering, won for an open IUPAC dataset for acid dissociation constants, or “pKas,” physicochemical properties that govern how acidic a chemical is in a solution.
    A full list of winners and honorable mentions is available on the Open Data @ MIT website.

    A campus-wide celebration

    Awards were presented at a celebratory event held in the Nexus in Hayden Library during International Open Access Week. School of Science Dean Nergis Mavalvala kicked off the program by describing the long and proud history of open scholarship at MIT, citing the Institute-wide faculty open access policy and the launch of the open-source digital repository DSpace. “When I was a graduate student, we were trying to figure out how to share our theses during the days of the nascent internet,” she said, “With DSpace, MIT was figuring it out for us.” 

    The centerpiece of the program was a series of five-minute presentations from the prize winners on their research. Presenters detailed the ways they created, used, or advocated for open data, and the value that openness brings to their respective fields. Winner Djuna von Maydell, a graduate student in Professor Li-Huei Tsai’s lab who studies the genetic causes of neurodegeneration, underscored why it is important to share data, particularly data obtained from postmortem human brains. 

    “This is data generated from human brains, so every data point stems from a living, breathing human being, who presumably made this donation in the hope that we would use it to advance knowledge and uncover truth,” von Maydell said. “To maximize the probability of that happening, we have to make it available to the scientific community.” 

    MIT community members who would like to learn more about making their research data open can consult MIT Libraries’ Data Services team.  More

  • in

    Can your phone tell if a bridge is in good shape?

    Want to know if the Golden Gate Bridge is holding up well? There could be an app for that.

    A new study involving MIT researchers shows that mobile phones placed in vehicles, equipped with special software, can collect useful structural integrity data while crossing bridges. In so doing, they could become a less expensive alternative to sets of sensors attached to bridges themselves.

    “The core finding is that information about structural health of bridges can be extracted from smartphone-collected accelerometer data,” says Carlo Ratti, director of the MIT Sensable City Laboratory and co-author of a new paper summarizing the study’s findings.

    The research was conducted, in part, on the Golden Gate Bridge itself. The study showed that mobile devices can capture the same kind of information about bridge vibrations that stationary sensors compile. The researchers also estimate that, depending on the age of a road bridge, mobile-device monitoring could add from 15 percent to 30 percent more years to the structure’s lifespan.

    “These results suggest that massive and inexpensive datasets collected by smartphones could play an important role in monitoring the health of existing transportation infrastructure,” the authors write in their new paper.

    The study, “Crowdsourcing Bridge Vital Signs with Smartphone Vehicle Trips,” is being published in Communications Engineering.

    The authors are Thomas J. Matarazzo, an assistant professor of civil and mechanical engineering at the United States Military Academy at West Point; Daniel Kondor, a postdoc at the Complexity Science Hub in Vienna; Sebastiano Milardo, a researcher at the Senseable City Lab; Soheil S. Eshkevari, a senior research scientist at DiDi Labs and a former member of Senseable City Lab; Paolo Santi, principal research scientist at the Senseable City Lab and research director at the Italian National Research Council; Shamim N. Pakzad, a professor and chair of the Department of Civil and Environmental Engineering at Lehigh University; Markus J. Buehler, the Jerry McAfee Professor in Engineering and professor of civil and environmental engineering and of mechanical engineering at MIT; and Ratti, who is also professor of the practice in MIT’s Department of Urban Studies and Planning.

    Bridges naturally vibrate, and to study the essential “modal frequencies” of those vibrations in many directions, engineers typically place sensors, such as accelerometers, on bridges themselves. Changes in the modal frequencies over time may indicate changes in a bridge’s structural integrity.

    To conduct the study, the researchers developed an Android-based mobile phone application to collect accelerometer data when the devices were placed in vehicles passing over the bridge. They could then see how well those data matched up with data record by sensors on bridges themselves, to see if the mobile-phone method worked.

    “In our work, we designed a methodology for extracting modal vibration frequencies from noisy data collected from smartphones,” Santi says. “As data from multiple trips over a bridge are recorded, noise generated by engine, suspension and traffic vibrations, [and] asphalt, tend to cancel out, while the underlying dominant frequencies emerge.”

    In the case of the Golden Gate Bridge, the researchers drove over the bridge 102 times with their devices running, and the team used 72 trips by Uber drivers with activated phones as well. The team then compared the resulting data to that from a group of 240 sensors that had been placed on the Golden Gate Bridge for three months.

    The outcome was that the data from the phones converged with that from the bridge’s sensors; for 10 particular types of low-frequency vibrations engineers measure on the bridge, there was a close match, and in five cases, there was no discrepancy between the methods at all.

    “We were able to show that many of these frequencies correspond very accurately to the prominent modal frequencies of the bridge,” Santi says.  

    However, only 1 percent of all bridges in the U.S. are suspension bridges. About 41 percent are much smaller concrete span bridges. So, the researchers also examined how well their method would fare in that setting.

    To do so, they studied a bridge in Ciampino, Italy, comparing 280 vehicle trips over the bridge to six sensors that had been placed on the bridge for seven months. Here, the researchers were also encouraged by the findings, though they found up to a 2.3 percent divergence between methods for certain modal frequencies over all 280 trips, and a 5.5 percent divergence over a smaller sample. That suggests a larger volume of trips could yield more useful data.

    “Our initial results suggest that only a [modest amount] of trips over the span of a few weeks are sufficient to obtain useful information about bridge modal frequencies,” Santi says.

    Looking at the method as a whole, Buehler observes, “Vibrational signatures are emerging as a powerful tool to assess properties of large and complex systems, ranging from viral properties of pathogens to structural integrity of bridges as shown in this study. It’s a universal signal found widely in the natural and built environment that we’re just now beginning to explore as a diagnostic and generative tool in engineering.”

    As Ratti acknowledges, there are ways to refine and expand the research, including accounting for the effects of the smartphone mount in the vehicle, the influence of the vehicle type on the data, and more.

    “We still have work to do, but we believe that our approach could be scaled up easily — all the way to the level of an entire country,” Ratti says. “It might not reach the accuracy that one can get using fixed sensors installed on a bridge, but it could become a very interesting early-warning system. Small anomalies could then suggest when to carry out further analyses.”

    The researchers received support from Anas S.p.A., Allianz, Brose, Cisco, Dover Corporation, Ford, the Amsterdam Institute for Advanced Metropolitan Solutions, the Fraunhofer Institute, the former Kuwait-MIT Center for Natural Resources and the Environment, Lab Campus, RATP, Singapore–MIT Alliance for Research and Technology (SMART), SNCF Gares & Connexions, UBER, and the U.S. Department of Defense High-Performance Computing Modernization Program. More

  • in

    Methane research takes on new urgency at MIT

    One of the most notable climate change provisions in the 2022 Inflation Reduction Act is the first U.S. federal tax on a greenhouse gas (GHG). That the fee targets methane (CH4), rather than carbon dioxide (CO2), emissions is indicative of the urgency the scientific community has placed on reducing this short-lived but powerful gas. Methane persists in the air about 12 years — compared to more than 1,000 years for CO2 — yet it immediately causes about 120 times more warming upon release. The gas is responsible for at least a quarter of today’s gross warming. 

    “Methane has a disproportionate effect on near-term warming,” says Desiree Plata, the director of MIT Methane Network. “CH4 does more damage than CO2 no matter how long you run the clock. By removing methane, we could potentially avoid critical climate tipping points.” 

    Because GHGs have a runaway effect on climate, reductions made now will have a far greater impact than the same reductions made in the future. Cutting methane emissions will slow the thawing of permafrost, which could otherwise lead to massive methane releases, as well as reduce increasing emissions from wetlands.  

    “The goal of MIT Methane Network is to reduce methane emissions by 45 percent by 2030, which would save up to 0.5 degree C of warming by 2100,” says Plata, an associate professor of civil and environmental engineering at MIT and director of the Plata Lab. “When you consider that governments are trying for a 1.5-degree reduction of all GHGs by 2100, this is a big deal.” 

    Under normal concentrations, methane, like CO2, poses no health risks. Yet methane assists in the creation of high levels of ozone. In the lower atmosphere, ozone is a key component of air pollution, which leads to “higher rates of asthma and increased emergency room visits,” says Plata. 

    Methane-related projects at the Plata Lab include a filter made of zeolite — the same clay-like material used in cat litter — designed to convert methane into CO2 at dairy farms and coal mines. At first glance, the technology would appear to be a bit of a hard sell, since it converts one GHG into another. Yet the zeolite filter’s low carbon and dollar costs, combined with the disproportionate warming impact of methane, make it a potential game-changer.

    The sense of urgency about methane has been amplified by recent studies that show humans are generating far more methane emissions than previously estimated, and that the rates are rising rapidly. Exactly how much methane is in the air is uncertain. Current methods for measuring atmospheric methane, such as ground, drone, and satellite sensors, “are not readily abundant and do not always agree with each other,” says Plata.  

    The Plata Lab is collaborating with Tim Swager in the MIT Department of Chemistry to develop low-cost methane sensors. “We are developing chemiresisitive sensors that cost about a dollar that you could place near energy infrastructure to back-calculate where leaks are coming from,” says Plata.  

    The researchers are working on improving the accuracy of the sensors using machine learning techniques and are planning to integrate internet-of-things technology to transmit alerts. Plata and Swager are not alone in focusing on data collection: the Inflation Reduction Act adds significant funding for methane sensor research. 

    Other research at the Plata Lab includes the development of nanomaterials and heterogeneous catalysis techniques for environmental applications. The lab also explores mitigation solutions for industrial waste, particularly those related to the energy transition. Plata is the co-founder of an lithium-ion battery recycling startup called Nth Cycle. 

    On a more fundamental level, the Plata Lab is exploring how to develop products with environmental and social sustainability in mind. “Our overarching mission is to change the way that we invent materials and processes so that environmental objectives are incorporated along with traditional performance and cost metrics,” says Plata. “It is important to do that rigorous assessment early in the design process.”

    Play video

    MIT amps up methane research 

    The MIT Methane Network brings together 26 researchers from MIT along with representatives of other institutions “that are dedicated to the idea that we can reduce methane levels in our lifetime,” says Plata. The organization supports research such as Plata’s zeolite and sensor projects, as well as designing pipeline-fixing robots, developing methane-based fuels for clean hydrogen, and researching the capture and conversion of methane into liquid chemical precursors for pharmaceuticals and plastics. Other members are researching policies to encourage more sustainable agriculture and land use, as well as methane-related social justice initiatives. 

    “Methane is an especially difficult problem because it comes from all over the place,” says Plata. A recent Global Carbon Project study estimated that half of methane emissions are caused by humans. This is led by waste and agriculture (28 percent), including cow and sheep belching, rice paddies, and landfills.  

    Fossil fuels represent 18 percent of the total budget. Of this, about 63 percent is derived from oil and gas production and pipelines, 33 percent from coal mining activities, and 5 percent from industry and transportation. Human-caused biomass burning, primarily from slash-and-burn agriculture, emits about 4 percent of the global total.  

    The other half of the methane budget includes natural methane emissions from wetlands (20 percent) and other natural sources (30 percent). The latter includes permafrost melting and natural biomass burning, such as forest fires started by lightning.  

    With increases in global warming and population, the line between anthropogenic and natural causes is getting fuzzier. “Human activities are accelerating natural emissions,” says Plata. “Climate change increases the release of methane from wetlands and permafrost and leads to larger forest and peat fires.”  

    The calculations can get complicated. For example, wetlands provide benefits from CO2 capture, biological diversity, and sea level rise resiliency that more than compensate for methane releases. Meanwhile, draining swamps for development increases emissions. 

    Over 100 nations have signed onto the U.N.’s Global Methane Pledge to reduce at least 30 percent of anthropogenic emissions within the next 10 years. The U.N. report estimates that this goal can be achieved using proven technologies and that about 60 percent of these reductions can be accomplished at low cost. 

    Much of the savings would come from greater efficiencies in fossil fuel extraction, processing, and delivery. The methane fees in the Inflation Reduction Act are primarily focused on encouraging fossil fuel companies to accelerate ongoing efforts to cap old wells, flare off excess emissions, and tighten pipeline connections.  

    Fossil fuel companies have already made far greater pledges to reduce methane than they have with CO2, which is central to their business. This is due, in part, to the potential savings, as well as in preparation for methane regulations expected from the Environmental Protection Agency in late 2022. The regulations build upon existing EPA oversight of drilling operations, and will likely be exempt from the U.S. Supreme Court’s ruling that limits the federal government’s ability to regulate GHGs. 

    Zeolite filter targets methane in dairy and coal 

    The “low-hanging fruit” of gas stream mitigation addresses most of the 20 percent of total methane emissions in which the gas is released in sufficiently high concentrations for flaring. Plata’s zeolite filter aims to address the thornier challenge of reducing the 80 percent of non-flammable dilute emissions. 

    Plata found inspiration in decades-old catalysis research for turning methane into methanol. One strategy has been to use an abundant, low-cost aluminosilicate clay called zeolite.  

    “The methanol creation process is challenging because you need to separate a liquid, and it has very low efficiency,” says Plata. “Yet zeolite can be very efficient at converting methane into CO2, and it is much easier because it does not require liquid separation. Converting methane to CO2 sounds like a bad thing, but there is a major anti-warming benefit. And because methane is much more dilute than CO2, the relative CO2 contribution is minuscule.”  

    Using zeolite to create methanol requires highly concentrated methane, high temperatures and pressures, and industrial processing conditions. Yet Plata’s process, which dopes the zeolite with copper, operates in the presence of oxygen at much lower temperatures under typical pressures. “We let the methane proceed the way it wants from a thermodynamic perspective from methane to methanol down to CO2,” says Plata. 

    Researchers around the world are working on other dilute methane removal technologies. Projects include spraying iron salt aerosols into sea air where they react with natural chlorine or bromine radicals, thereby capturing methane. Most of these geoengineering solutions, however, are difficult to measure and would require massive scale to make a difference.  

    Plata is focusing her zeolite filters on environments where concentrations are high, but not so high as to be flammable. “We are trying to scale zeolite into filters that you could snap onto the side of a cross-ventilation fan in a dairy barn or in a ventilation air shaft in a coal mine,” says Plata. “For every packet of air we bring in, we take a lot of methane out, so we get more bang for our buck.”  

    The major challenge is creating a filter that can handle high flow rates without getting clogged or falling apart. Dairy barn air handlers can push air at up to 5,000 cubic feet per minute and coal mine handlers can approach 500,000 CFM. 

    Plata is exploring engineering options including fluidized bed reactors with floating catalyst particles. Another filter solution, based in part on catalytic converters, features “higher-order geometric structures where you have a porous material with a long path length where the gas can interact with the catalyst,” says Plata. “This avoids the challenge with fluidized beds of containing catalyst particles in the reactor. Instead, they are fixed within a structured material.”  

    Competing technologies for removing methane from mine shafts “operate at temperatures of 1,000 to 1,200 degrees C, requiring a lot of energy and risking explosion,” says Plata. “Our technology avoids safety concerns by operating at 300 to 400 degrees C. It reduces energy use and provides more tractable deployment costs.” 

    Potentially, energy and dollar costs could be further reduced in coal mines by capturing the heat generated by the conversion process. “In coal mines, you have enrichments above a half-percent methane, but below the 4 percent flammability threshold,” says Plata. “The excess heat from the process could be used to generate electricity using off-the-shelf converters.” 

    Plata’s dairy barn research is funded by the Gerstner Family Foundation and the coal mining project by the U.S. Department of Energy. “The DOE would like us to spin out the technology for scale-up within three years,” says Plata. “We cannot guarantee we will hit that goal, but we are trying to develop this as quickly as possible. Our society needs to start reducing methane emissions now.”  More

  • in

    Deep learning with light

    Ask a smart home device for the weather forecast, and it takes several seconds for the device to respond. One reason this latency occurs is because connected devices don’t have enough memory or power to store and run the enormous machine-learning models needed for the device to understand what a user is asking of it. The model is stored in a data center that may be hundreds of miles away, where the answer is computed and sent to the device.

    MIT researchers have created a new method for computing directly on these devices, which drastically reduces this latency. Their technique shifts the memory-intensive steps of running a machine-learning model to a central server where components of the model are encoded onto light waves.

    The waves are transmitted to a connected device using fiber optics, which enables tons of data to be sent lightning-fast through a network. The receiver then employs a simple optical device that rapidly performs computations using the parts of a model carried by those light waves.

    This technique leads to more than a hundredfold improvement in energy efficiency when compared to other methods. It could also improve security, since a user’s data do not need to be transferred to a central location for computation.

    This method could enable a self-driving car to make decisions in real-time while using just a tiny percentage of the energy currently required by power-hungry computers. It could also allow a user to have a latency-free conversation with their smart home device, be used for live video processing over cellular networks, or even enable high-speed image classification on a spacecraft millions of miles from Earth.

    “Every time you want to run a neural network, you have to run the program, and how fast you can run the program depends on how fast you can pipe the program in from memory. Our pipe is massive — it corresponds to sending a full feature-length movie over the internet every millisecond or so. That is how fast data comes into our system. And it can compute as fast as that,” says senior author Dirk Englund, an associate professor in the Department of Electrical Engineering and Computer Science (EECS) and member of the MIT Research Laboratory of Electronics.

    Joining Englund on the paper is lead author and EECS grad student Alexander Sludds; EECS grad student Saumil Bandyopadhyay, Research Scientist Ryan Hamerly, as well as others from MIT, the MIT Lincoln Laboratory, and Nokia Corporation. The research is published today in Science.

    Lightening the load

    Neural networks are machine-learning models that use layers of connected nodes, or neurons, to recognize patterns in datasets and perform tasks, like classifying images or recognizing speech. But these models can contain billions of weight parameters, which are numeric values that transform input data as they are processed. These weights must be stored in memory. At the same time, the data transformation process involves billions of algebraic computations, which require a great deal of power to perform.

    The process of fetching data (the weights of the neural network, in this case) from memory and moving them to the parts of a computer that do the actual computation is one of the biggest limiting factors to speed and energy efficiency, says Sludds.

    “So our thought was, why don’t we take all that heavy lifting — the process of fetching billions of weights from memory — move it away from the edge device and put it someplace where we have abundant access to power and memory, which gives us the ability to fetch those weights quickly?” he says.

    The neural network architecture they developed, Netcast, involves storing weights in a central server that is connected to a novel piece of hardware called a smart transceiver. This smart transceiver, a thumb-sized chip that can receive and transmit data, uses technology known as silicon photonics to fetch trillions of weights from memory each second.

    It receives weights as electrical signals and imprints them onto light waves. Since the weight data are encoded as bits (1s and 0s) the transceiver converts them by switching lasers; a laser is turned on for a 1 and off for a 0. It combines these light waves and then periodically transfers them through a fiber optic network so a client device doesn’t need to query the server to receive them.

    “Optics is great because there are many ways to carry data within optics. For instance, you can put data on different colors of light, and that enables a much higher data throughput and greater bandwidth than with electronics,” explains Bandyopadhyay.

    Trillions per second

    Once the light waves arrive at the client device, a simple optical component known as a broadband “Mach-Zehnder” modulator uses them to perform super-fast, analog computation. This involves encoding input data from the device, such as sensor information, onto the weights. Then it sends each individual wavelength to a receiver that detects the light and measures the result of the computation.

    The researchers devised a way to use this modulator to do trillions of multiplications per second, which vastly increases the speed of computation on the device while using only a tiny amount of power.   

    “In order to make something faster, you need to make it more energy efficient. But there is a trade-off. We’ve built a system that can operate with about a milliwatt of power but still do trillions of multiplications per second. In terms of both speed and energy efficiency, that is a gain of orders of magnitude,” Sludds says.

    They tested this architecture by sending weights over an 86-kilometer fiber that connects their lab to MIT Lincoln Laboratory. Netcast enabled machine-learning with high accuracy — 98.7 percent for image classification and 98.8 percent for digit recognition — at rapid speeds.

    “We had to do some calibration, but I was surprised by how little work we had to do to achieve such high accuracy out of the box. We were able to get commercially relevant accuracy,” adds Hamerly.

    Moving forward, the researchers want to iterate on the smart transceiver chip to achieve even better performance. They also want to miniaturize the receiver, which is currently the size of a shoe box, down to the size of a single chip so it could fit onto a smart device like a cell phone.

    “Using photonics and light as a platform for computing is a really exciting area of research with potentially huge implications on the speed and efficiency of our information technology landscape,” says Euan Allen, a Royal Academy of Engineering Research Fellow at the University of Bath, who was not involved with this work. “The work of Sludds et al. is an exciting step toward seeing real-world implementations of such devices, introducing a new and practical edge-computing scheme whilst also exploring some of the fundamental limitations of computation at very low (single-photon) light levels.”

    The research is funded, in part, by NTT Research, the National Science Foundation, the Air Force Office of Scientific Research, the Air Force Research Laboratory, and the Army Research Office. More

  • in

    Ad hoc committee releases report on remote teaching best practices for on-campus education

    The Ad Hoc Committee on Leveraging Best Practices from Remote Teaching for On-Campus Education has released a report that captures how instructors are weaving lessons learned from remote teaching into in-person classes. Despite the challenges imposed by teaching and learning remotely during the Covid-19 pandemic, the report says, “there were seeds planted then that, we hope, will bear fruit in the coming years.”

    “In the long run, one of the best things about having lived through our remote learning experience may be the intense and broad focus on pedagogy that it necessitated,” the report continues. “In a moment when nobody could just teach the way they had always done before, all of us had to go back to first principles and ask ourselves: What are our learning goals for our students? How can we best help them to achieve these goals?”

    The committee’s work is a direct response to one of the Refinement and Implementation Committees (RIC) formed as part of Task Force 2021 and Beyond. Led by co-chairs Krishna Rajagopal, the William A. M. Burden Professor of Physics, and Janet Rankin, director of the MIT Teaching + Learning Lab, the committee engaged with faculty and instructional staff, associate department heads, and undergraduate and graduate officers across MIT.

    The findings are distilled into four broad themes:

    Community, Well-being, and Belonging. Conversations revealed new ways that instructors cultivated these key interrelated concepts, all of which are fundamental to student learning and success. Many instructors focused more on supporting well-being and building community and belonging during the height of the pandemic precisely because the MIT community, and everyone in it, was under such great stress. Some of the resulting practices are continuing, the committee found. Examples include introducing simple gestures, such as start-of-class welcoming practices, and providing extensions and greater flexibility on student assignments. Also, many across MIT felt that the week-long Thanksgiving break offered in 2020 should become a permanent fixture in the academic calendar, because it enhances the well-being of both students and instructors at a time in the fall semester when everyone’s batteries need recharging. 
    Enhancing Engagement. The committee found a variety of practices that have enhanced engagement between students and instructors; among students; and among instructors. For example, many instructors have continued to offer some office hours on Zoom, which seems to reduce barriers to participation for many students, while offering in-person office hours for those who want to take advantage of opportunities for more open-ended conversations. Several departments increased their usage of undergraduate teaching assistants (UTAs) in ways that make students’ learning experience more engaging and give the UTAs a real teaching experience. In addition, many instructors are leveraging out-of-class communication spaces like Slack, Perusall, and Piazza so students can work together, ask questions, and share ideas. 
    Enriching and Augmenting the Learning Environment. The report presents two ways in which instructors have enhanced learning within the classroom: through blended learning and by incorporating authentic experiences. Although blended learning techniques are not new at MIT, after having made it through remote teaching many faculty have found new ways to combine synchronous in-person teaching with asynchronous activities for on-campus students, such as pre-class or pre-lab sequences of videos with exercises interspersed, take-home lab kits, auto-graded online problems that give students immediate feedback, and recorded lab experiences for subsequent review. In addition, instructors found many creative ways to make students’ learning more authentic by going on virtual field trips, using Zoom to bring experts from around the world into MIT classrooms or to enable interactions with students at other universities, and live-streaming experiments that students could not otherwise experience since they cannot be performed in a teaching lab.   
     Assessing Learning. For all its challenges, the report notes, remote teaching prompted instructors to take a step back and think about what they wanted students to learn, how to support it, and how to measure it. The committee found a variety of examples of alternatives to traditional assessments, such as papers or timed, written exams, that instructors tried during the pandemic and are continuing to use. These alternatives include shorter, more frequent, lower-stakes assessments; oral exams or debates; asynchronous, open-book/notes exams; virtual poster sessions; alternate grading schemes; and uploading paper psets and exams into Gradescope to use its logistics and rubrics to improve grading effectiveness and efficiency.
    A large portion of the report is devoted to an extensive, annotated list of best practices from remote instruction that are being used in the classroom. Interestingly, Rankin says, “so many of the strategies and practices developed and used during the pandemic are based on, and supported by, solid educational research.”

    The report concludes with one broad recommendation: that all faculty and instructors read the findings and experiment with some of the best practices in their own instruction. “Our hope is that the practices shared in the report will continue to be adopted, adapted, and expanded by members of the teaching community at MIT, and that instructors’ openness in sharing and learning from each will continue,” Rankin says.

    Two additional, specific recommendations are included in the report. First, the committee endorses the RIC 16 recommendation that a Classroom Advisory Board be created to provide strategic input grounded in evolving pedagogy about future classroom use and technology needs. In its conversations, the committee found a number of ways that remote teaching and learning have impacted students’ and instructors’ perceptions as they have returned to the classroom. For example, during the pandemic students benefited from being able to see everyone else’s faces on Zoom. As a result, some instructors would prefer classrooms that enable students to face each other, such as semi-circular classrooms instead of rectangular ones.

    More generally, the committee concluded, MIT needs classrooms with seats and tables that can be quickly and flexibly reconfigured to facilitate varying pedagogical objectives. The Classroom Advisory Board could also examine classroom technology; this includes the role of videoconferencing to create authentic engagement between MIT students and people far from campus, and blended learning that allows students to experience more of the in-classroom engagement with their peers and instructors from which the “magic of MIT” originates.

    Second, the committee recommends that an implementation group be formed to investigate the possibility of changing the MIT academic calendar to create a one-week break over Thanksgiving. “Finalizing an implementation plan will require careful consideration of various significant logistical challenges,” the report says. “However, the resulting gains to both well-being and learning from this change to the fall calendar make doing so worthwhile.”

    Rankin notes that the report findings dovetail with the recently released MIT Strategic Action Plan for Belonging, Achievement and Composition. “I believe that one of the most important things that became really apparent during remote teaching was that community, inclusion, and belonging really matter and are necessary for both learning and teaching, and that instructors can and should play a central role in creating structures and processes to support them in their classrooms and other learning environments,” she says.

    Rajagopal finds it inspiring that “during a time of intense stress — that nobody ever wants to relive — there was such an intense focus on how we teach and how our students learn that, today, in essentially every direction we look we see colleagues improving on-campus education for tomorrow. I hope that the report will help instructors across the Institute, and perhaps elsewhere, learn from each other. Its readers will see, as our committee did, new ways in which students and instructors are finding those moments, those interactions, where the magic of MIT is created.”

    In addition to the report, the co-chairs recommend two other valuable remote teaching resources: a video interview series, TLL’s Fresh Perspectives, and Open Learning’s collection of examples of how MIT faculty and instructors leveraged digital technology to support and transform teaching and learning during the heart of the pandemic. More