More stories

  • Meet the 2022-23 Accenture Fellows

    Launched in October 2020, the MIT and Accenture Convergence Initiative for Industry and Technology underscores the ways in which industry and technology can collaborate to spur innovation. The five-year initiative aims to achieve its mission through research, education, and fellowships. To that end, Accenture has once again awarded five annual fellowships to MIT graduate students working on research in industry and technology convergence who are underrepresented, including by race, ethnicity, and gender.

    This year’s Accenture Fellows work across research areas including telemonitoring, human-computer interactions, operations research, AI-mediated socialization, and chemical transformations. Their research covers a wide array of projects, including designing low-power processing hardware for telehealth applications; applying machine learning to streamline and improve business operations; improving mental health care through artificial intelligence; and using machine learning to understand the environmental and health consequences of complex chemical reactions.

    As part of the application process, student nominations were invited from each unit within the School of Engineering, as well as from the Institute’s four other schools and the MIT Schwarzman College of Computing. Five exceptional students were selected as fellows for the initiative’s third year.

    Drew Buzzell is a doctoral candidate in electrical engineering and computer science whose research concerns telemonitoring, a fast-growing sphere of telehealth in which information is collected through internet-of-things (IoT) connected devices and transmitted to the cloud. Currently, the high volume of information involved in telemonitoring — and the time and energy costs of processing it — make data analysis difficult. Buzzell’s work is focused on edge computing, a new computing architecture that seeks to address these challenges by managing data closer to the source, in a distributed network of IoT devices. Buzzell earned his BS in physics and engineering science and his MS in engineering science from the Pennsylvania State University.

    Mengying (Cathy) Fang is a master’s student in the MIT School of Architecture and Planning. Her research focuses on augmented reality and virtual reality platforms. Fang is developing novel sensors and machine components that combine computation, materials science, and engineering. Moving forward, she will explore topics including soft robotics techniques that could be integrated with clothes and wearable devices and haptic feedback in order to develop interactions with digital objects. Fang earned a BS in mechanical engineering and human-computer interaction from Carnegie Mellon University.

    Xiaoyue Gong is a doctoral candidate in operations research at the MIT Sloan School of Management. Her research aims to harness the power of machine learning and data science to reduce inefficiencies in the operation of businesses, organizations, and society. With the support of an Accenture Fellowship, Gong seeks to find solutions to operational problems by designing reinforcement learning methods and other machine learning techniques for embedded operational problems. Gong earned a BS in honors mathematics and interactive media arts from New York University.

    Ruby Liu is a doctoral candidate in medical engineering and medical physics. Their research addresses the growing pandemic of loneliness among older adults, which leads to poor health outcomes and presents particularly high risks for historically marginalized people, including members of the LGBTQ+ community and people of color. Liu is designing a network of interconnected AI agents that foster connections between user and agent, offering mental health care while strengthening and facilitating human-human connections. Liu received a BS in biomedical engineering from Johns Hopkins University.

    Joules Provenzano is a doctoral candidate in chemical engineering. Their work integrates machine learning and liquid chromatography-high resolution mass spectrometry (LC-HRMS) to improve our understanding of complex chemical reactions in the environment. As an Accenture Fellow, Provenzano will build upon recent advances in machine learning and LC-HRMS, including novel algorithms for processing real, experimental HR-MS data and new approaches to extracting structure-transformation rules and kinetics. Their research could speed the pace of discovery in the chemical sciences and benefit industries including oil and gas, pharmaceuticals, and agriculture. Provenzano earned a BS in chemical engineering and international and global studies from the Rochester Institute of Technology.

  • A faster way to preserve privacy online

    Searching the internet can reveal information a user would rather keep private. For instance, when someone looks up medical symptoms online, they could reveal their health conditions to Google, an online medical database like WebMD, and perhaps hundreds of these companies’ advertisers and business partners.

    For decades, researchers have been crafting techniques that enable users to search for and retrieve information from a database privately, but these methods remain too slow to be effectively used in practice.

    MIT researchers have now developed a scheme for private information retrieval that is about 30 times faster than other comparable methods. Their technique enables a user to search an online database without revealing their query to the server. Moreover, it is driven by a simple algorithm that would be easier to implement than the more complicated approaches from previous work.

    Their technique could enable private communication by preventing a messaging app from knowing what users are saying or who they are talking to. It could also be used to fetch relevant online ads without advertising servers learning a user’s interests.

    “This work is really about giving users back some control over their own data. In the long run, we’d like browsing the web to be as private as browsing a library. This work doesn’t achieve that yet, but it starts building the tools to let us do this sort of thing quickly and efficiently in practice,” says Alexandra Henzinger, a computer science graduate student and lead author of a paper introducing the technique.

    Co-authors include Matthew Hong, an MIT computer science graduate student; Henry Corrigan-Gibbs, the Douglas Ross Career Development Professor of Software Technology in the MIT Department of Electrical Engineering and Computer Science (EECS) and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); Sarah Meiklejohn, a professor in cryptography and security at University College London and a staff research scientist at Google; and senior author Vinod Vaikuntanathan, an EECS professor and principal investigator in CSAIL. The research will be presented at the 2023 USENIX Security Symposium. 

    Preserving privacy

    The first schemes for private information retrieval were developed in the 1990s, partly by researchers at MIT. These techniques enable a user to communicate with a remote server that holds a database, and read records from that database without the server knowing what the user is reading.

    To preserve privacy, these techniques force the server to touch every single item in the database, so it can’t tell which entry a user is searching for. If any entry were left untouched, the server would learn that the client is not interested in that item. But touching every item when there may be millions of database entries slows down the query process.

    To speed things up, the MIT researchers developed a protocol, known as Simple PIR, in which the server performs much of the underlying cryptographic work in advance, before a client even sends a query. This preprocessing step produces a data structure that holds compressed information about the database contents, and which the client downloads before sending a query.

    In a sense, this data structure is like a hint for the client about what is in the database.

    “Once the client has this hint, it can make an unbounded number of queries, and these queries are going to be much smaller in both the size of the messages you are sending and the work that you need the server to do. This is what makes Simple PIR so much faster,” Henzinger explains.
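
    To make the preprocess-then-query structure concrete, here is a minimal numerical sketch, in Python, of a hint-based single-server scheme in the spirit of Simple PIR. It is only an illustration of the general idea: the parameters, noise distribution, and variable names are assumptions chosen for readability, the arithmetic is nowhere near cryptographically secure, and the real construction is specified in the team's paper.

        import numpy as np

        # Toy sketch of a hint-based, single-server PIR (in the spirit of Simple PIR).
        # Parameters are illustrative only and NOT cryptographically secure.
        q = 2**32            # ciphertext modulus (so uint64 wraparound stays consistent mod q)
        p = 256              # plaintext modulus: each database entry is one byte
        n = 64               # LWE secret dimension (toy value)
        delta = q // p       # scaling factor between plaintext and ciphertext

        rng = np.random.default_rng(0)
        side = 32            # database laid out as a square matrix of entries (side x side)
        D = rng.integers(0, p, size=(side, side), dtype=np.uint64)

        # Offline, query-independent preprocessing: the server computes the "hint",
        # which the client downloads once before making any queries.
        A = rng.integers(0, q, size=(side, n), dtype=np.uint64)   # public random matrix
        hint = (D @ A) % q

        # Online query: the client hides which column it wants behind LWE-style noise.
        col = 7                                              # secret index of the desired column
        s = rng.integers(0, q, size=n, dtype=np.uint64)      # client's secret key
        e = rng.integers(0, 4, size=side, dtype=np.uint64)   # small noise (toy distribution)
        u = np.zeros(side, dtype=np.uint64)
        u[col] = 1
        query = (A @ s + e + delta * u) % q                  # looks random to the server

        # Server answer: note that it must multiply against EVERY database entry.
        answer = (D @ query) % q

        # Client recovery: subtract the hint's contribution, then round away the noise.
        noisy = (answer - hint @ s) % q
        recovered = np.rint(noisy / delta).astype(np.uint64) % p
        assert np.array_equal(recovered, D[:, col])          # the whole column, retrieved privately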

    But the hint can be relatively large in size. For example, to query a 1-gigabyte database, the client would need to download a 124-megabyte hint. This drives up communication costs, which could make the technique difficult to implement on real-world devices.

    To reduce the size of the hint, the researchers developed a second technique, known as Double PIR, that basically involves running the Simple PIR scheme twice. This produces a much more compact hint that is fixed in size for any database.

    Using Double PIR, the hint for a 1-gigabyte database would be only 16 megabytes.

    “Our Double PIR scheme runs a little bit slower, but it will have much lower communication costs. For some applications, this is going to be a desirable tradeoff,” Henzinger says.

    Hitting the speed limit

    They tested the Simple PIR and Double PIR schemes by applying them to a task in which a client seeks to audit a specific piece of information about a website to ensure that website is safe to visit. To preserve privacy, the client cannot reveal the website it is auditing.

    The researchers’ fastest technique was able to successfully preserve privacy while running at about 10 gigabytes per second. Previous schemes could only achieve a throughput of about 300 megabytes per second.

    They show that their method approaches the theoretical speed limit for private information retrieval — it is nearly the fastest possible scheme one can build in which the server touches every record in the database, adds Corrigan-Gibbs.

    In addition, their method only requires a single server, making it much simpler than many top-performing techniques that require two separate servers with identical databases. Their method outperformed these more complex protocols.

    “I’ve been thinking about these schemes for some time, and I never thought this could be possible at this speed. The folklore was that any single-server scheme is going to be really slow. This work turns that whole notion on its head,” Corrigan-Gibbs says.

    While the researchers have shown that they can make PIR schemes much faster, there is still work to do before they would be able to deploy their techniques in real-world scenarios, says Henzinger. They would like to cut the communication costs of their schemes while still enabling them to achieve high speeds. In addition, they want to adapt their techniques to handle more complex queries, such as general SQL queries, and more demanding applications, such as a general Wikipedia search. And in the long run, they hope to develop better techniques that can preserve privacy without requiring a server to touch every database item. 

    “I’ve heard people emphatically claiming that PIR will never be practical. But I would never bet against technology. That is an optimistic lesson to learn from this work. There are always ways to innovate,” Vaikuntanathan says.

    “This work makes a major improvement to the practical cost of private information retrieval. While it was known that low-bandwidth PIR schemes imply public-key cryptography, which is typically orders of magnitude slower than private-key cryptography, this work develops an ingenious method to bridge the gap. This is done by making a clever use of special properties of a public-key encryption scheme due to Regev to push the vast majority of the computational work to a precomputation step, in which the server computes a short ‘hint’ about the database,” says Yuval Ishai, a professor of computer science at Technion (the Israel Institute of Technology), who was not involved in the study. “What makes their approach particularly appealing is that the same hint can be used an unlimited number of times, by any number of clients. This renders the (moderate) cost of computing the hint insignificant in a typical scenario where the same database is accessed many times.”

    This work is funded, in part, by the National Science Foundation, Google, Facebook, MIT’s Fintech@CSAIL Initiative, an NSF Graduate Research Fellowship, an EECS Great Educators Fellowship, the National Institutes of Health, the Defense Advanced Research Projects Agency, the MIT-IBM Watson AI Lab, Analog Devices, Microsoft, and a Thornton Family Faculty Research Innovation Fellowship.

  • A healthy wind

    Nearly 10 percent of today’s electricity in the United States comes from wind power. The renewable energy source benefits climate, air quality, and public health by displacing emissions of greenhouse gases and air pollutants that would otherwise be produced by fossil-fuel-based power plants.

    A new MIT study finds that the health benefits associated with wind power could more than quadruple if operators prioritized turning down output from the most polluting fossil-fuel-based power plants when energy from wind is available.

    In the study, published today in Science Advances, researchers analyzed the hourly activity of wind turbines, as well as the reported emissions from every fossil-fuel-based power plant in the country, between the years 2011 and 2017. They traced emissions across the country and mapped the pollutants to affected demographic populations. They then calculated the regional air quality and associated health costs to each community.

    The researchers found that in 2014, wind power that was associated with state-level policies improved air quality overall, resulting in $2 billion in health benefits across the country. However, only roughly 30 percent of these health benefits reached disadvantaged communities.

    The team further found that if the electricity industry were to reduce the output of the most polluting fossil-fuel-based power plants, rather than the most cost-saving plants, in times of wind-generated power, the overall health benefits could quadruple to $8.4 billion nationwide. However, the results would have a similar demographic breakdown.

    “We found that prioritizing health is a great way to maximize benefits in a widespread way across the U.S., which is a very positive thing. But it suggests it’s not going to address disparities,” says study co-author Noelle Selin, a professor in the Institute for Data, Systems, and Society and the Department of Earth, Atmospheric and Planetary Sciences at MIT. “In order to address air pollution disparities, you can’t just focus on the electricity sector or renewables and count on the overall air pollution benefits addressing these real and persistent racial and ethnic disparities. You’ll need to look at other air pollution sources, as well as the underlying systemic factors that determine where plants are sited and where people live.”

    Selin’s co-authors are lead author and former MIT graduate student Minghao Qiu PhD ’21, now at Stanford University, and Corwin Zigler at the University of Texas at Austin.

    Turn-down service

    In their new study, the team looked for patterns between periods of wind power generation and the activity of fossil-fuel-based power plants, to see how regional electricity markets adjusted the output of power plants in response to influxes of renewable energy.

    “One of the technical challenges, and the contribution of this work, is trying to identify which are the power plants that respond to this increasing wind power,” Qiu notes.

    To do so, the researchers compared two historical datasets from the period between 2011 and 2017: an hour-by-hour record of energy output of wind turbines across the country, and a detailed record of emissions measurements from every fossil-fuel-based power plant in the U.S. The datasets covered each of seven major regional electricity markets, each market providing energy to one or multiple states.

    “California and New York are each their own market, whereas the New England market covers around seven states, and the Midwest covers more,” Qiu explains. “We also cover about 95 percent of all the wind power in the U.S.”

    In general, they observed that, in times when wind power was available, markets adjusted by essentially scaling back the power output of natural gas and sub-bituminous coal-fired power plants. They noted that the plants that were turned down were likely chosen for cost-saving reasons, as certain plants were less costly to turn down than others.

    The team then used a sophisticated atmospheric chemistry model to simulate the wind patterns and chemical transport of emissions across the country, and determined where and at what concentrations the emissions generated fine particulates and ozone — two pollutants that are known to damage air quality and human health. Finally, the researchers mapped the general demographic populations across the country, based on U.S. census data, and applied a standard epidemiological approach to calculate a population’s health cost as a result of their pollution exposure.
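
    The final step, turning a modeled change in pollutant concentrations into a dollar figure, typically combines a concentration-response function with a monetary value per avoided death. The sketch below shows that generic calculation only; the populations, coefficient, and dollar values are placeholder assumptions, not numbers from the study.

        import numpy as np

        # Generic concentration-response health-benefit calculation (illustrative only;
        # all numbers are placeholders, not values from the MIT study).
        population = np.array([500_000, 1_200_000, 300_000])   # people in each modeled area
        baseline_mortality = 0.008                              # annual deaths per person (assumed)
        delta_pm25 = np.array([-0.4, -0.1, -0.8])               # change in PM2.5 (ug/m3) from wind displacement
        beta = 0.0058            # log-linear risk coefficient per ug/m3 (assumed)
        value_per_death = 9e6    # dollar value assigned to one avoided death (assumed)

        # Negative concentration changes mean cleaner air, hence avoided deaths.
        avoided_deaths = population * baseline_mortality * (1.0 - np.exp(beta * delta_pm25))
        health_benefit = float((avoided_deaths * value_per_death).sum())
        print(f"Estimated health benefit: ${health_benefit / 1e6:.1f} million per year")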

    This analysis revealed that, in the year 2014, a general cost-saving approach to displacing fossil-fuel-based energy in times of wind energy resulted in $2 billion in health benefits, or savings, across the country. A smaller share of these benefits went to disadvantaged populations, such as communities of color and low-income communities, though this disparity varied by state.

    “It’s a more complex story than we initially thought,” Qiu says. “Certain population groups are exposed to a higher level of air pollution, and those would be low-income people and racial minority groups. What we see is, developing wind power could reduce this gap in certain states but further increase it in other states, depending on which fossil-fuel plants are displaced.”

    Tweaking power

    The researchers then examined how the pattern of emissions and the associated health benefits would change if they prioritized turning down different fossil-fuel-based plants in times of wind-generated power. They tweaked the emissions data to reflect several alternative scenarios: one in which the most health-damaging, polluting power plants are turned down first; and two other scenarios in which plants producing the most sulfur dioxide and carbon dioxide, respectively, are first to reduce their output.
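
    Structurally, these scenarios just change the order in which fossil plants are backed down when wind energy becomes available. The toy comparison below, with entirely made-up plant data, shows how a cost-based ordering and a health-based ordering can yield very different benefits even though the same amount of wind energy is absorbed.

        # Toy dispatch comparison; the plant names, costs, and damages are invented.
        plants = [
            # (name, marginal cost in $/MWh, health damage in $/MWh)
            ("coal_A", 22.0, 95.0),
            ("coal_B", 25.0, 60.0),
            ("gas_A",  30.0, 12.0),
            ("gas_B",  35.0,  8.0),
        ]
        capacity_mwh = {"coal_A": 400, "coal_B": 300, "gas_A": 500, "gas_B": 500}
        wind_mwh = 900   # wind energy available to displace fossil generation this hour

        def displaced_health_benefit(order):
            """Back plants down in the given order until the wind energy is used up."""
            remaining, benefit = wind_mwh, 0.0
            for name, _cost, damage in order:
                taken = min(remaining, capacity_mwh[name])
                benefit += taken * damage          # avoided health damage, in dollars
                remaining -= taken
                if remaining == 0:
                    break
            return benefit

        by_cost = sorted(plants, key=lambda plant: -plant[1])     # priciest plants backed down first
        by_health = sorted(plants, key=lambda plant: -plant[2])   # most damaging plants backed down first
        print("cost-based ordering:  ", displaced_health_benefit(by_cost))
        print("health-based ordering:", displaced_health_benefit(by_health))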

    They found that while each scenario increased health benefits overall, and the first scenario in particular could quadruple health benefits, the original disparity persisted: Communities of color and low-income communities still experienced smaller health benefits than more well-off communities.

    “We got to the end of the road and said, there’s no way we can address this disparity by being smarter in deciding which plants to displace,” Selin says.

    Nevertheless, the study can help identify ways to improve the health of the general population, says Julian Marshall, a professor of environmental engineering at the University of Washington.

    “The detailed information provided by the scenarios in this paper can offer a roadmap to electricity-grid operators and to state air-quality regulators regarding which power plants are highly damaging to human health and also are likely to noticeably reduce emissions if wind-generated electricity increases,” says Marshall, who was not involved in the study.

    “One of the things that makes me optimistic about this area is, there’s a lot more attention to environmental justice and equity issues,” Selin concludes. “Our role is to figure out the strategies that are most impactful in addressing those challenges.”

    This work was supported, in part, by the U.S. Environmental Protection Agency, and by the National Institutes of Health.

  • Large language models help decipher clinical notes

    Electronic health records (EHRs) need a new public relations manager. Ten years ago, the U.S. government passed a law that required hospitals to digitize their health records with the intent of improving and streamlining care. The enormous amount of information in these now-digital records could be used to answer very specific questions beyond the scope of clinical trials: What’s the right dose of this medication for patients with this height and weight? What about patients with a specific genomic profile?

    Unfortunately, most of the data that could answer these questions is trapped in doctors’ notes, full of jargon and abbreviations. These notes are hard for computers to understand using current techniques — extracting information requires training multiple machine learning models. Moreover, models trained at one hospital don’t work well at others, and training each model requires domain experts to label large amounts of data, a time-consuming and expensive process.

    An ideal system would use a single model that can extract many types of information, work well at multiple hospitals, and learn from a small amount of labeled data. But how? Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) believed that to disentangle the data, they needed to call on something bigger: large language models. To pull that important medical information, they used a very big, GPT-3 style model to do tasks like expand overloaded jargon and acronyms and extract medication regimens. 

    For example, the system takes an input, in this case a clinical note, and “prompts” the model with a question about the note, such as “expand this abbreviation, C-T-A.” The system returns an output such as “clear to auscultation,” as opposed to, say, a CT angiography. The objective of extracting this clean data, the team says, is to eventually enable more personalized clinical recommendations.
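
    In code, that prompting pattern looks roughly like the sketch below. The complete() function is a stand-in for whichever large language model API is being called, and the prompt wording is a guess at the general style rather than the exact prompts used in the paper.

        # Rough sketch of zero-shot clinical abbreviation expansion with an LLM.
        # complete() is a placeholder for a GPT-3-style completion call; the prompt
        # text is illustrative, not the exact prompt from the paper.

        def complete(prompt: str) -> str:
            raise NotImplementedError("wire this up to your LLM provider of choice")

        def expand_abbreviation(note: str, abbreviation: str) -> str:
            prompt = (
                "You are reading a clinical note.\n"
                f"Note: {note}\n"
                f'Expand the abbreviation "{abbreviation}" as it is used in this note. '
                "Answer with the expansion only, quoting the note where possible."
            )
            return complete(prompt).strip()

        # Hypothetical usage (expected output shown as a comment):
        # expand_abbreviation("Lungs CTA bilaterally, no wheezes.", "CTA")
        # -> "clear to auscultation"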

    Medical data is, understandably, a pretty tricky resource to navigate freely. There’s plenty of red tape around using public resources for testing the performance of large models because of data use restrictions, so the team decided to scrape together their own. Using a set of short, publicly available clinical snippets, they cobbled together a small dataset to enable evaluation of the extraction performance of large language models. 

    “It’s challenging to develop a single general-purpose clinical natural language processing system that will solve everyone’s needs and be robust to the huge variation seen across health datasets. As a result, until today, most clinical notes are not used in downstream analyses or for live decision support in electronic health records. These large language model approaches could potentially transform clinical natural language processing,” says David Sontag, MIT professor of electrical engineering and computer science, principal investigator in CSAIL and the Institute for Medical Engineering and Science, and supervising author on a paper about the work, which will be presented at the Conference on Empirical Methods in Natural Language Processing. “The research team’s advances in zero-shot clinical information extraction make scaling possible. Even if you have hundreds of different use cases, no problem — you can build each model with a few minutes of work, versus having to label a ton of data for that particular task.”

    For example, without any labels at all, the researchers found these models could achieve 86 percent accuracy at expanding overloaded acronyms, and the team developed additional methods to boost this further to 90 percent accuracy, with still no labels required.

    Imprisoned in an EHR 

    Experts have been steadily building up large language models (LLMs) for quite some time, but they burst onto the mainstream with GPT-3’s widely covered ability to complete sentences. These LLMs are trained on a huge amount of text from the internet to finish sentences and predict the next most likely word. 

    While previous, smaller models like earlier GPT iterations or BERT have pulled off a good performance for extracting medical data, they still require substantial manual data-labeling effort. 

    For example, a note, “pt will dc vanco due to n/v” means that this patient (pt) was taking the antibiotic vancomycin (vanco) but experienced nausea and vomiting (n/v) severe enough for the care team to discontinue (dc) the medication. The team’s research avoids the status quo of training separate machine learning models for each task (extracting medication, side effects from the record, disambiguating common abbreviations, etc.). In addition to expanding abbreviations, they investigated four other tasks, including whether the models could parse clinical trials and extract detail-rich medication regimens.

    “Prior work has shown that these models are sensitive to the prompt’s precise phrasing. Part of our technical contribution is a way to format the prompt so that the model gives you outputs in the correct format,” says Hunter Lang, CSAIL PhD student and author on the paper. “For these extraction problems, there are structured output spaces. The output space is not just a string. It can be a list. It can be a quote from the original input. So there’s more structure than just free text. Part of our research contribution is encouraging the model to give you an output with the correct structure. That significantly cuts down on post-processing time.”

    The approach can’t be applied to out-of-the-box health data at a hospital: that requires sending private patient information across the open internet to an LLM provider like OpenAI. The authors showed that it’s possible to work around this by distilling the model into a smaller one that could be used on-site.

    The model — sometimes just like humans — is not always beholden to the truth. Here’s what a potential problem might look like: Let’s say you’re asking the reason why someone took medication. Without proper guardrails and checks, the model might just output the most common reason for that medication, if nothing is explicitly mentioned in the note. This led to the team’s efforts to force the model to extract more quotes from data and less free text.

    Future work for the team includes extending to languages other than English, creating additional methods for quantifying uncertainty in the model, and pulling off similar results with open-sourced models. 

    “Clinical information buried in unstructured clinical notes has unique challenges compared to general domain text mostly due to large use of acronyms, and inconsistent textual patterns used across different health care facilities,” says Sadid Hasan, AI lead at Microsoft and former executive director of AI at CVS Health, who was not involved in the research. “To this end, this work sets forth an interesting paradigm of leveraging the power of general domain large language models for several important zero-/few-shot clinical NLP tasks. Specifically, the proposed guided prompt design of LLMs to generate more structured outputs could lead to further developing smaller deployable models by iteratively utilizing the model generated pseudo-labels.”

    “AI has accelerated in the last five years to the point at which these large models can predict contextualized recommendations with benefits rippling out across a variety of domains such as suggesting novel drug formulations, understanding unstructured text, code recommendations or create works of art inspired by any number of human artists or styles,” says Parminder Bhatia, who was formerly Head of Machine Learning at AWS Health AI and is currently Head of ML for low-code applications leveraging large language models at AWS AI Labs. “One of the applications of these large models [the team has] recently launched is Amazon CodeWhisperer, which is [an] ML-powered coding companion that helps developers in building applications.”

    As part of the MIT Abdul Latif Jameel Clinic for Machine Learning in Health, Agrawal, Sontag, and Lang wrote the paper alongside Yoon Kim, MIT assistant professor and CSAIL principal investigator, and Stefan Hegselmann, a visiting PhD student from the University of Muenster. First-author Agrawal’s research was supported by a Takeda Fellowship, the MIT Deshpande Center for Technological Innovation, and the MLA@CSAIL Initiatives.

  • Busy GPUs: Sampling and pipelining method speeds up deep learning on large graphs

    Graphs, a potentially extensive web of nodes connected by edges, can be used to express and interrogate relationships between data, like social connections, financial transactions, traffic, energy grids, and molecular interactions. As researchers collect more data and build out these graphical pictures, they will need faster and more efficient methods, as well as more computational power, to conduct deep learning on them in the form of graph neural networks (GNNs).

    Now, a new method, called SALIENT (SAmpling, sLIcing, and data movemeNT), developed by researchers at MIT and IBM Research, improves the training and inference performance by addressing three key bottlenecks in computation. This dramatically cuts down on the runtime of GNNs on large datasets, which, for example, contain on the scale of 100 million nodes and 1 billion edges. Further, the team found that the technique scales well when computational power is added from one to 16 graphical processing units (GPUs). The work was presented at the Fifth Conference on Machine Learning and Systems.

    “We started to look at the challenges current systems experienced when scaling state-of-the-art machine learning techniques for graphs to really big datasets. It turned out there was a lot of work to be done, because a lot of the existing systems were achieving good performance primarily on smaller datasets that fit into GPU memory,” says Tim Kaler, the lead author and a postdoc in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

    By vast datasets, experts mean scales like the entire Bitcoin network, where certain patterns and data relationships could spell out trends or foul play. “There are nearly a billion Bitcoin transactions on the blockchain, and if we want to identify illicit activities inside such a joint network, then we are facing a graph of such a scale,” says co-author Jie Chen, senior research scientist and manager of IBM Research and the MIT-IBM Watson AI Lab. “We want to build a system that is able to handle that kind of graph and allows processing to be as efficient as possible, because every day we want to keep up with the pace of the new data that are generated.”

    Kaler and Chen’s co-authors include Nickolas Stathas MEng ’21 of Jump Trading, who developed SALIENT as part of his graduate work; former MIT-IBM Watson AI Lab intern and MIT graduate student Anne Ouyang; MIT CSAIL postdoc Alexandros-Stavros Iliopoulos; MIT CSAIL Research Scientist Tao B. Schardl; and Charles E. Leiserson, the Edwin Sibley Webster Professor of Electrical Engineering at MIT and a researcher with the MIT-IBM Watson AI Lab.     

    For this problem, the team took a systems-oriented approach in developing their method: SALIENT, says Kaler. To do this, the researchers implemented what they saw as important, basic optimizations of components that fit into existing machine-learning frameworks, such as PyTorch Geometric and the deep graph library (DGL), which are interfaces for building a machine-learning model. Stathas says the process is like swapping out engines to build a faster car. Their method was designed to fit into existing GNN architectures, so that domain experts could easily apply this work to their specified fields to expedite model training and tease out insights during inference faster. The trick, the team determined, was to keep all of the hardware (CPUs, data links, and GPUs) busy at all times: while the CPU samples the graph and prepares mini-batches of data that will then be transferred through the data link, the more critical GPU is working to train the machine-learning model or conduct inference. 

    The researchers began by analyzing the performance of a commonly used machine-learning library for GNNs (PyTorch Geometric), which showed a startlingly low utilization of available GPU resources. Applying simple optimizations, the researchers improved GPU utilization from 10 to 30 percent, resulting in a 1.4 to two times performance improvement relative to public benchmark codes. This fast baseline code could execute one complete pass over a large training dataset through the algorithm (an epoch) in 50.4 seconds.                          

    Seeking further performance improvements, the researchers set out to examine the bottlenecks that occur at the beginning of the data pipeline: the algorithms for graph sampling and mini-batch preparation. Unlike other neural networks, GNNs perform a neighborhood aggregation operation, which computes information about a node using information present in other nearby nodes in the graph — for example, in a social network graph, information from friends of friends of a user. As the number of layers in the GNN increases, the number of nodes the network has to reach out to for information can explode, exceeding the limits of a computer. Neighborhood sampling algorithms help by selecting a smaller random subset of nodes to gather; however, the researchers found that current implementations of this were too slow to keep up with the processing speed of modern GPUs. In response, they identified a mix of data structures, algorithmic optimizations, and so forth that improved sampling speed, ultimately improving the sampling operation alone by about three times, taking the per-epoch runtime from 50.4 to 34.6 seconds. They also found that sampling, at an appropriate rate, can be done during inference, improving overall energy efficiency and performance, a point that had been overlooked in the literature, the team notes.
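
    The sampling step itself is conceptually simple; the hard part is making it fast enough to keep up with the GPU. A bare-bones version of multi-hop neighborhood sampling, written for clarity rather than speed, might look like the Python below (the adjacency list, fanouts, and node IDs are illustrative).

        import random

        # Bare-bones multi-hop neighborhood sampling for a mini-batch of seed nodes.
        # adj maps each node ID to a list of neighbors; fanouts gives how many
        # neighbors to sample per GNN layer (e.g., [15, 10] for a two-layer model).
        def sample_neighborhood(adj, seed_nodes, fanouts):
            layers = []                       # one sampled edge list per layer
            frontier = list(seed_nodes)
            for fanout in fanouts:
                sampled_edges = []
                next_frontier = set()
                for node in frontier:
                    neighbors = adj.get(node, [])
                    picked = neighbors if len(neighbors) <= fanout else random.sample(neighbors, fanout)
                    for nbr in picked:
                        sampled_edges.append((nbr, node))    # message flows neighbor -> node
                        next_frontier.add(nbr)
                layers.append(sampled_edges)
                frontier = list(next_frontier)
            return layers

        # Tiny example graph and a mini-batch of two seed nodes.
        adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3, 4], 3: [0, 2], 4: [2]}
        print(sample_neighborhood(adj, seed_nodes=[0, 4], fanouts=[2, 2]))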

    In previous systems, this sampling step was a multi-process approach, creating extra data and unnecessary data movement between the processes. The researchers made their SALIENT method more nimble by creating a single process with lightweight threads that kept the data on the CPU in shared memory. Further, SALIENT takes advantage of a cache of modern processors, says Stathas, parallelizing feature slicing, which extracts relevant information from nodes of interest and their surrounding neighbors and edges, within the shared memory of the CPU core cache. This again reduced the overall per-epoch runtime from 34.6 to 27.8 seconds.

    The last bottleneck the researchers addressed was to pipeline mini-batch data transfers between the CPU and GPU using a prefetching step, which would prepare data just before it’s needed. The team calculated that this would maximize bandwidth usage in the data link and bring the method up to perfect utilization; however, they only saw around 90 percent. They identified and fixed a performance bug in a popular PyTorch library that caused unnecessary round-trip communications between the CPU and GPU. With this bug fixed, the team achieved a 16.5 second per-epoch runtime with SALIENT.
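
    The pipelining idea, overlapping mini-batch preparation and data transfer with GPU compute, can be shown with a generic producer-consumer sketch. This is a schematic of the technique rather than SALIENT's implementation; prepare_batch, transfer_to_gpu, and train_step are placeholder functions.

        import queue
        import threading

        # Schematic prefetching pipeline: a background thread prepares and transfers
        # upcoming mini-batches while the main thread keeps the GPU busy training.
        def prepare_batch(i):          # stands in for CPU-side sampling + feature slicing
            return {"batch_id": i}

        def transfer_to_gpu(batch):    # stands in for the host-to-device copy
            return batch

        def train_step(batch):         # stands in for the GPU compute
            return 0.0

        def producer(num_batches, out_q):
            for i in range(num_batches):
                out_q.put(transfer_to_gpu(prepare_batch(i)))   # runs ahead of the consumer
            out_q.put(None)                                    # sentinel: no more batches

        def train(num_batches, prefetch_depth=2):
            q = queue.Queue(maxsize=prefetch_depth)            # bounded so we never run too far ahead
            threading.Thread(target=producer, args=(num_batches, q), daemon=True).start()
            while (batch := q.get()) is not None:
                train_step(batch)                              # overlaps with the producer's prefetching

        train(num_batches=8)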

    “Our work showed, I think, that the devil is in the details,” says Kaler. “When you pay close attention to the details that impact performance when training a graph neural network, you can resolve a huge number of performance issues. With our solutions, we ended up being completely bottlenecked by GPU computation, which is the ideal goal of such a system.”

    SALIENT’s speed was evaluated on three standard datasets (ogbn-arxiv, ogbn-products, and ogbn-papers100M), as well as in multi-machine settings, with different levels of fanout (amount of data that the CPU would prepare for the GPU), and across several architectures, including the most recent state-of-the-art one, GraphSAGE-RI. In each setting, SALIENT outperformed PyTorch Geometric, most notably on the large ogbn-papers100M dataset, containing 100 million nodes and over a billion edges. Here, it was three times faster, running on one GPU, than the optimized baseline that was originally created for this work; with 16 GPUs, SALIENT was an additional eight times faster.

    Other systems had slightly different hardware and experimental setups, so it wasn’t always a direct comparison, but SALIENT still outperformed them. Among systems that achieved similar accuracy, representative performance numbers include 99 seconds using one GPU and 32 CPUs, and 13 seconds using 1,536 CPUs. In contrast, SALIENT’s runtime using one GPU and 20 CPUs was 16.5 seconds and was just two seconds with 16 GPUs and 320 CPUs. “If you look at the bottom-line numbers that prior work reports, our 16 GPU runtime (two seconds) is an order of magnitude faster than other numbers that have been reported previously on this dataset,” says Kaler. The researchers attributed their performance improvements, in part, to their approach of optimizing their code for a single machine before moving to the distributed setting. Stathas says that the lesson here is that for your money, “it makes more sense to use the hardware you have efficiently, and to its extreme, before you start scaling up to multiple computers,” which can provide significant savings on cost and carbon emissions that can come with model training.

    This new capacity will now allow researchers to tackle and dig deeper into bigger and bigger graphs. For example, the Bitcoin network that was mentioned earlier contained 100,000 nodes; the SALIENT system can capably handle a graph 1,000 times (or three orders of magnitude) larger.

    “In the future, we would be looking at not just running this graph neural network training system on the existing algorithms that we implemented for classifying or predicting the properties of each node, but we also want to do more in-depth tasks, such as identifying common patterns in a graph (subgraph patterns), [which] may be actually interesting for indicating financial crimes,” says Chen. “We also want to identify nodes in a graph that are similar in a sense that they possibly would be corresponding to the same bad actor in a financial crime. These tasks would require developing additional algorithms, and possibly also neural network architectures.”

    This research was supported by the MIT-IBM Watson AI Lab and in part by the U.S. Air Force Research Laboratory and the U.S. Air Force Artificial Intelligence Accelerator.

  • Breaking the scaling limits of analog computing

    As machine-learning models become larger and more complex, they require faster and more energy-efficient hardware to perform computations. Conventional digital computers are struggling to keep up.

    An analog optical neural network could perform the same tasks as a digital one, such as image classification or speech recognition, but because computations are performed using light instead of electrical signals, optical neural networks can run many times faster while consuming less energy.

    However, these analog devices are prone to hardware errors that can make computations less precise. Microscopic imperfections in hardware components are one cause of these errors. In an optical neural network that has many connected components, errors can quickly accumulate.

    Even with error-correction techniques, due to fundamental properties of the devices that make up an optical neural network, some amount of error is unavoidable. A network that is large enough to be implemented in the real world would be far too imprecise to be effective.

    MIT researchers have overcome this hurdle and found a way to effectively scale an optical neural network. By adding a tiny hardware component to the optical switches that form the network’s architecture, they can reduce even the uncorrectable errors that would otherwise accumulate in the device.

    Their work could enable a super-fast, energy-efficient, analog neural network that can function with the same accuracy as a digital one. With this technique, as an optical circuit becomes larger, the amount of error in its computations actually decreases.  

    “This is remarkable, as it runs counter to the intuition of analog systems, where larger circuits are supposed to have higher errors, so that errors set a limit on scalability. This present paper allows us to address the scalability question of these systems with an unambiguous ‘yes,’” says lead author Ryan Hamerly, a visiting scientist in the MIT Research Laboratory for Electronics (RLE) and Quantum Photonics Laboratory and senior scientist at NTT Research.

    Hamerly’s co-authors are graduate student Saumil Bandyopadhyay and senior author Dirk Englund, an associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS), leader of the Quantum Photonics Laboratory, and member of the RLE. The research is published today in Nature Communications.

    Multiplying with light

    An optical neural network is composed of many connected components that function like reprogrammable, tunable mirrors. These tunable mirrors are called Mach-Zehnder interferometers (MZIs). Neural network data are encoded into light, which is fired into the optical neural network from a laser.

    A typical MZI contains two mirrors and two beam splitters. Light enters the top of an MZI, where it is split into two parts which interfere with each other before being recombined by the second beam splitter and then reflected out the bottom to the next MZI in the array. Researchers can leverage the interference of these optical signals to perform complex linear algebra operations, known as matrix multiplication, which is how neural networks process data.
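
    To see the linear-algebra connection, a single MZI built from two 50:50 beam splitters and two phase shifters acts as a programmable 2-by-2 unitary matrix, and a mesh of MZIs composes into larger matrix multiplications. The sketch below uses one common textbook convention for the component matrices; other conventions differ by phase factors.

        import numpy as np

        # A single Mach-Zehnder interferometer as a programmable 2x2 unitary:
        # beam splitter -> internal phase shift -> beam splitter -> output phase shift.
        def beam_splitter():
            return (1 / np.sqrt(2)) * np.array([[1, 1j],
                                                [1j, 1]])

        def phase_shift(phi):
            return np.array([[np.exp(1j * phi), 0],
                             [0, 1]])

        def mzi(theta, phi):
            return phase_shift(phi) @ beam_splitter() @ phase_shift(theta) @ beam_splitter()

        T = mzi(theta=0.7, phi=1.3)
        print(np.allclose(T.conj().T @ T, np.eye(2)))   # True: the MZI is unitary (lossless)

        # Encode a two-element signal in the optical amplitudes and apply the MZI:
        x = np.array([1.0, 0.5])
        print(T @ x)    # the interference implements a 2x2 matrix-vector product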

    But errors that can occur in each MZI quickly accumulate as light moves from one device to the next. One can avoid some errors by identifying them in advance and tuning the MZIs so earlier errors are cancelled out by later devices in the array.

    “It is a very simple algorithm if you know what the errors are. But these errors are notoriously difficult to ascertain because you only have access to the inputs and outputs of your chip,” says Hamerly. “This motivated us to look at whether it is possible to create calibration-free error correction.”

    Hamerly and his collaborators previously demonstrated a mathematical technique that went a step further. They could successfully infer the errors and correctly tune the MZIs accordingly, but even this didn’t remove all the error.

    Due to the fundamental nature of an MZI, there are instances where it is impossible to tune a device so all light flows out the bottom port to the next MZI. If the device loses a fraction of light at each step and the array is very large, by the end there will only be a tiny bit of power left.

    “Even with error correction, there is a fundamental limit to how good a chip can be. MZIs are physically unable to realize certain settings they need to be configured to,” he says.

    So, the team developed a new type of MZI. The researchers added an additional beam splitter to the end of the device, calling it a 3-MZI because it has three beam splitters instead of two. Due to the way this additional beam splitter mixes the light, it becomes much easier for an MZI to reach the setting it needs to send all light out through its bottom port.

    Importantly, the additional beam splitter is only a few micrometers in size and is a passive component, so it doesn’t require any extra wiring. Adding additional beam splitters doesn’t significantly change the size of the chip.

    Bigger chip, fewer errors

    When the researchers conducted simulations to test their architecture, they found that it can eliminate much of the uncorrectable error that hampers accuracy. And as the optical neural network becomes larger, the amount of error in the device actually drops — the opposite of what happens in a device with standard MZIs.

    Using 3-MZIs, they could potentially create a device big enough for commercial uses with error that has been reduced by a factor of 20, Hamerly says.

    The researchers also developed a variant of the MZI design specifically for correlated errors. These occur due to manufacturing imperfections — if the thickness of a chip is slightly wrong, the MZIs may all be off by about the same amount, so the errors are all about the same. They found a way to change the configuration of an MZI to make it robust to these types of errors. This technique also increased the bandwidth of the optical neural network so it can run three times faster.

    Now that they have showcased these techniques using simulations, Hamerly and his collaborators plan to test these approaches on physical hardware and continue driving toward an optical neural network they can effectively deploy in the real world.

    This research is funded, in part, by a National Science Foundation graduate research fellowship and the U.S. Air Force Office of Scientific Research.

  • A far-sighted approach to machine learning

    Picture two teams squaring off on a football field. The players can cooperate to achieve an objective, and compete against other players with conflicting interests. That’s how the game works.

    Creating artificial intelligence agents that can learn to compete and cooperate as effectively as humans remains a thorny problem. A key challenge is enabling AI agents to anticipate future behaviors of other agents when they are all learning simultaneously.

    Because of the complexity of this problem, current approaches tend to be myopic; the agents can only guess the next few moves of their teammates or competitors, which leads to poor performance in the long run. 

    Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere have developed a new approach that gives AI agents a farsighted perspective. Their machine-learning framework enables cooperative or competitive AI agents to consider what other agents will do as time approaches infinity, not just over the next few steps. The agents then adapt their behaviors accordingly to influence other agents’ future behaviors and arrive at an optimal, long-term solution.

    This framework could be used by a group of autonomous drones working together to find a lost hiker in a thick forest, or by self-driving cars that strive to keep passengers safe by anticipating future moves of other vehicles driving on a busy highway.

    “When AI agents are cooperating or competing, what matters most is when their behaviors converge at some point in the future. There are a lot of transient behaviors along the way that don’t matter very much in the long run. Reaching this converged behavior is what we really care about, and we now have a mathematical way to enable that,” says Dong-Ki Kim, a graduate student in the MIT Laboratory for Information and Decision Systems (LIDS) and lead author of a paper describing this framework.

    The senior author is Jonathan P. How, the Richard C. Maclaurin Professor of Aeronautics and Astronautics and a member of the MIT-IBM Watson AI Lab. Co-authors include others at the MIT-IBM Watson AI Lab, IBM Research, Mila-Quebec Artificial Intelligence Institute, and Oxford University. The research will be presented at the Conference on Neural Information Processing Systems.


    In this demo video, the red robot, which has been trained using the researchers’ machine-learning system, is able to defeat the green robot by learning more effective behaviors that take advantage of the constantly changing strategy of its opponent.

    More agents, more problems

    The researchers focused on a problem known as multiagent reinforcement learning. Reinforcement learning is a form of machine learning in which an AI agent learns by trial and error. Researchers give the agent a reward for “good” behaviors that help it achieve a goal. The agent adapts its behavior to maximize that reward until it eventually becomes an expert at a task.

    But when many cooperative or competing agents are simultaneously learning, things become increasingly complex. As agents consider more future steps of their fellow agents, and how their own behavior influences others, the problem soon requires far too much computational power to solve efficiently. This is why other approaches only focus on the short term.

    “The AIs really want to think about the end of the game, but they don’t know when the game will end. They need to think about how to keep adapting their behavior into infinity so they can win at some far time in the future. Our paper essentially proposes a new objective that enables an AI to think about infinity,” says Kim.

    But since it is impossible to plug infinity into an algorithm, the researchers designed their system so agents focus on a future point where their behavior will converge with that of other agents, known as equilibrium. An equilibrium point determines the long-term performance of agents, and multiple equilibria can exist in a multiagent scenario. Therefore, an effective agent actively influences the future behaviors of other agents in such a way that they reach a desirable equilibrium from the agent’s perspective. If all agents influence each other, they converge to a general concept that the researchers call an “active equilibrium.”

    The machine-learning framework they developed, known as FURTHER (which stands for FUlly Reinforcing acTive influence witH averagE Reward), enables agents to learn how to adapt their behaviors as they interact with other agents to achieve this active equilibrium.

    FURTHER does this using two machine-learning modules. The first, an inference module, enables an agent to guess the future behaviors of other agents and the learning algorithms they use, based solely on their prior actions.

    This information is fed into the reinforcement learning module, which the agent uses to adapt its behavior and influence other agents in a way that maximizes its reward.
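
    At a very high level, the interaction between the two modules can be pictured as the loop sketched below. This is only a schematic of the structure just described, with invented class and method names; it is not the actual FURTHER implementation.

        # Schematic of the two-module loop; names and structure are illustrative only.
        class InferenceModule:
            def update(self, others_actions):
                """Refine beliefs about the other agents' policies and learning rules."""
                ...

            def predict_future_behavior(self):
                """Return a model of how the other agents are expected to keep adapting."""
                ...

        class ActiveEquilibriumAgent:
            def __init__(self):
                self.inference = InferenceModule()

            def learn(self, observation, reward, others_actions):
                # 1) Infer how the other agents are learning from their observed actions.
                self.inference.update(others_actions)
                opponent_model = self.inference.predict_future_behavior()
                # 2) Update this agent's policy toward long-run (average) reward,
                #    accounting for how its behavior steers the others' future learning.
                self._policy_update(observation, reward, opponent_model)

            def _policy_update(self, observation, reward, opponent_model):
                ...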

    “The challenge was thinking about infinity. We had to use a lot of different mathematical tools to enable that, and make some assumptions to get it to work in practice,” Kim says.

    Winning in the long run

    They tested their approach against other multiagent reinforcement learning frameworks in several different scenarios, including a pair of robots fighting sumo-style and a battle pitting two 25-agent teams against one another. In both instances, the AI agents using FURTHER won the games more often.

    Since their approach is decentralized, which means the agents learn to win the games independently, it is also more scalable than other methods that require a central computer to control the agents, Kim explains.

    The researchers used games to test their approach, but FURTHER could be used to tackle any kind of multiagent problem. For instance, it could be applied by economists seeking to develop sound policy in situations where many interacting entities have behaviors and interests that change over time.

    Economics is one application Kim is particularly excited about studying. He also wants to dig deeper into the concept of an active equilibrium and continue enhancing the FURTHER framework.

    This research is funded, in part, by the MIT-IBM Watson AI Lab.

  • Methane research takes on new urgency at MIT

    One of the most notable climate change provisions in the 2022 Inflation Reduction Act is the first U.S. federal tax on a greenhouse gas (GHG). That the fee targets methane (CH4), rather than carbon dioxide (CO2), emissions is indicative of the urgency the scientific community has placed on reducing this short-lived but powerful gas. Methane persists in the air about 12 years — compared to more than 1,000 years for CO2 — yet it immediately causes about 120 times more warming upon release. The gas is responsible for at least a quarter of today’s gross warming. 

    “Methane has a disproportionate effect on near-term warming,” says Desiree Plata, the director of MIT Methane Network. “CH4 does more damage than CO2 no matter how long you run the clock. By removing methane, we could potentially avoid critical climate tipping points.” 

    Because GHGs have a runaway effect on climate, reductions made now will have a far greater impact than the same reductions made in the future. Cutting methane emissions will slow the thawing of permafrost, which could otherwise lead to massive methane releases, as well as reduce increasing emissions from wetlands.  

    “The goal of MIT Methane Network is to reduce methane emissions by 45 percent by 2030, which would save up to 0.5 degree C of warming by 2100,” says Plata, an associate professor of civil and environmental engineering at MIT and director of the Plata Lab. “When you consider that governments are trying for a 1.5-degree reduction of all GHGs by 2100, this is a big deal.” 

    Under normal concentrations, methane, like CO2, poses no health risks. Yet methane assists in the creation of high levels of ozone. In the lower atmosphere, ozone is a key component of air pollution, which leads to “higher rates of asthma and increased emergency room visits,” says Plata. 

    Methane-related projects at the Plata Lab include a filter made of zeolite — the same clay-like material used in cat litter — designed to convert methane into CO2 at dairy farms and coal mines. At first glance, the technology would appear to be a bit of a hard sell, since it converts one GHG into another. Yet the zeolite filter’s low carbon and dollar costs, combined with the disproportionate warming impact of methane, make it a potential game-changer.

    The sense of urgency about methane has been amplified by recent studies that show humans are generating far more methane emissions than previously estimated, and that the rates are rising rapidly. Exactly how much methane is in the air is uncertain. Current methods for measuring atmospheric methane, such as ground, drone, and satellite sensors, “are not readily abundant and do not always agree with each other,” says Plata.  

    The Plata Lab is collaborating with Tim Swager in the MIT Department of Chemistry to develop low-cost methane sensors. “We are developing chemiresistive sensors that cost about a dollar that you could place near energy infrastructure to back-calculate where leaks are coming from,” says Plata.

    The researchers are working on improving the accuracy of the sensors using machine learning techniques and are planning to integrate internet-of-things technology to transmit alerts. Plata and Swager are not alone in focusing on data collection: the Inflation Reduction Act adds significant funding for methane sensor research. 

    Other research at the Plata Lab includes the development of nanomaterials and heterogeneous catalysis techniques for environmental applications. The lab also explores mitigation solutions for industrial waste, particularly those related to the energy transition. Plata is the co-founder of a lithium-ion battery recycling startup called Nth Cycle.

    On a more fundamental level, the Plata Lab is exploring how to develop products with environmental and social sustainability in mind. “Our overarching mission is to change the way that we invent materials and processes so that environmental objectives are incorporated along with traditional performance and cost metrics,” says Plata. “It is important to do that rigorous assessment early in the design process.”


    MIT amps up methane research 

    The MIT Methane Network brings together 26 researchers from MIT along with representatives of other institutions “that are dedicated to the idea that we can reduce methane levels in our lifetime,” says Plata. The organization supports research such as Plata’s zeolite and sensor projects, as well as designing pipeline-fixing robots, developing methane-based fuels for clean hydrogen, and researching the capture and conversion of methane into liquid chemical precursors for pharmaceuticals and plastics. Other members are researching policies to encourage more sustainable agriculture and land use, as well as methane-related social justice initiatives. 

    “Methane is an especially difficult problem because it comes from all over the place,” says Plata. A recent Global Carbon Project study estimated that half of methane emissions are caused by humans. This is led by waste and agriculture (28 percent), including cow and sheep belching, rice paddies, and landfills.  

    Fossil fuels represent 18 percent of the total budget. Of this, about 63 percent is derived from oil and gas production and pipelines, 33 percent from coal mining activities, and 5 percent from industry and transportation. Human-caused biomass burning, primarily from slash-and-burn agriculture, emits about 4 percent of the global total.  

    The other half of the methane budget includes natural methane emissions from wetlands (20 percent) and other natural sources (30 percent). The latter includes permafrost melting and natural biomass burning, such as forest fires started by lightning.  

    With increases in global warming and population, the line between anthropogenic and natural causes is getting fuzzier. “Human activities are accelerating natural emissions,” says Plata. “Climate change increases the release of methane from wetlands and permafrost and leads to larger forest and peat fires.”  

    The calculations can get complicated. For example, wetlands provide benefits from CO2 capture, biological diversity, and sea level rise resiliency that more than compensate for methane releases. Meanwhile, draining swamps for development increases emissions. 

    Over 100 nations have signed onto the U.N.’s Global Methane Pledge to reduce at least 30 percent of anthropogenic emissions within the next 10 years. The U.N. report estimates that this goal can be achieved using proven technologies and that about 60 percent of these reductions can be accomplished at low cost. 

    Much of the savings would come from greater efficiencies in fossil fuel extraction, processing, and delivery. The methane fees in the Inflation Reduction Act are primarily focused on encouraging fossil fuel companies to accelerate ongoing efforts to cap old wells, flare off excess emissions, and tighten pipeline connections.  

    Fossil fuel companies have already made far greater pledges to reduce methane than they have with CO2, which is central to their business. This is due, in part, to the potential savings, as well as in preparation for methane regulations expected from the Environmental Protection Agency in late 2022. The regulations build upon existing EPA oversight of drilling operations, and will likely be exempt from the U.S. Supreme Court’s ruling that limits the federal government’s ability to regulate GHGs. 

    Zeolite filter targets methane in dairy and coal 

    The “low-hanging fruit” of gas stream mitigation addresses most of the 20 percent of total methane emissions in which the gas is released in sufficiently high concentrations for flaring. Plata’s zeolite filter aims to address the thornier challenge of reducing the 80 percent of non-flammable dilute emissions. 

    Plata found inspiration in decades-old catalysis research for turning methane into methanol. One strategy has been to use an abundant, low-cost aluminosilicate clay called zeolite.  

    “The methanol creation process is challenging because you need to separate a liquid, and it has very low efficiency,” says Plata. “Yet zeolite can be very efficient at converting methane into CO2, and it is much easier because it does not require liquid separation. Converting methane to CO2 sounds like a bad thing, but there is a major anti-warming benefit. And because methane is much more dilute than CO2, the relative CO2 contribution is minuscule.”  

    Using zeolite to create methanol requires highly concentrated methane, high temperatures and pressures, and industrial processing conditions. Yet Plata’s process, which dopes the zeolite with copper, operates in the presence of oxygen at much lower temperatures under typical pressures. “We let the methane proceed the way it wants from a thermodynamic perspective from methane to methanol down to CO2,” says Plata. 

    Researchers around the world are working on other dilute methane removal technologies. Projects include spraying iron salt aerosols into sea air where they react with natural chlorine or bromine radicals, thereby capturing methane. Most of these geoengineering solutions, however, are difficult to measure and would require massive scale to make a difference.  

    Plata is focusing her zeolite filters on environments where concentrations are high, but not so high as to be flammable. “We are trying to scale zeolite into filters that you could snap onto the side of a cross-ventilation fan in a dairy barn or in a ventilation air shaft in a coal mine,” says Plata. “For every packet of air we bring in, we take a lot of methane out, so we get more bang for our buck.”  

    The major challenge is creating a filter that can handle high flow rates without getting clogged or falling apart. Dairy barn air handlers can push air at up to 5,000 cubic feet per minute and coal mine handlers can approach 500,000 CFM. 

    Plata is exploring engineering options including fluidized bed reactors with floating catalyst particles. Another filter solution, based in part on catalytic converters, features “higher-order geometric structures where you have a porous material with a long path length where the gas can interact with the catalyst,” says Plata. “This avoids the challenge with fluidized beds of containing catalyst particles in the reactor. Instead, they are fixed within a structured material.”  

    Competing technologies for removing methane from mine shafts “operate at temperatures of 1,000 to 1,200 degrees C, requiring a lot of energy and risking explosion,” says Plata. “Our technology avoids safety concerns by operating at 300 to 400 degrees C. It reduces energy use and provides more tractable deployment costs.” 

    Potentially, energy and dollar costs could be further reduced in coal mines by capturing the heat generated by the conversion process. “In coal mines, you have enrichments above a half-percent methane, but below the 4 percent flammability threshold,” says Plata. “The excess heat from the process could be used to generate electricity using off-the-shelf converters.” 

    Plata’s dairy barn research is funded by the Gerstner Family Foundation and the coal mining project by the U.S. Department of Energy. “The DOE would like us to spin out the technology for scale-up within three years,” says Plata. “We cannot guarantee we will hit that goal, but we are trying to develop this as quickly as possible. Our society needs to start reducing methane emissions now.”