More stories

  • in

    Large language models help decipher clinical notes

    Electronic health records (EHRs) need a new public relations manager. Ten years ago, the U.S. government passed a law that required hospitals to digitize their health records with the intent of improving and streamlining care. The enormous amount of information in these now-digital records could be used to answer very specific questions beyond the scope of clinical trials: What’s the right dose of this medication for patients with this height and weight? What about patients with a specific genomic profile?

    Unfortunately, most of the data that could answer these questions is trapped in doctors’ notes, full of jargon and abbreviations. These notes are hard for computers to understand using current techniques: extracting information requires training multiple machine learning models. Models trained for one hospital also don’t work well at others, and training each model requires domain experts to label lots of data, a time-consuming and expensive process.

    An ideal system would use a single model that can extract many types of information, work well at multiple hospitals, and learn from a small amount of labeled data. But how? Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) believed that to disentangle the data, they needed to call on something bigger: large language models. To pull that important medical information, they used a very big, GPT-3 style model to do tasks like expand overloaded jargon and acronyms and extract medication regimens. 

    For example, the system takes an input, which in this case is a clinical note, and “prompts” the model with a question about the note, such as “expand this abbreviation, C-T-A.” The system returns an output such as “clear to auscultation,” as opposed to, say, a CT angiography. The objective of extracting this clean data, the team says, is to eventually enable more personalized clinical recommendations.
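
    The prompting step can be pictured as plain string construction. The sketch below is only an illustration, not the team’s actual prompt: the template wording, the function name `build_expansion_prompt`, and the sample note are all assumptions made for this example, and no model is called.

```python
def build_expansion_prompt(note: str, abbreviation: str) -> str:
    """Build a zero-shot prompt asking a language model to expand one abbreviation.

    The template wording is hypothetical; the researchers' prompts may differ.
    """
    return (
        f"Clinical note: {note.strip()}\n"
        f"Question: In the note above, what does the abbreviation "
        f'"{abbreviation}" stand for?\n'
        "Answer:"
    )


# Hypothetical note: "CTA" could mean "clear to auscultation" or "CT angiography",
# so the surrounding text is what lets a model pick the right expansion.
example_note = "Lungs CTA bilaterally, no wheezes."
prompt = build_expansion_prompt(example_note, "CTA")
print(prompt)
```

    Ending the prompt with “Answer:” nudges a sentence-completion model to continue with the expansion itself rather than with more of the note.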

    Medical data is, understandably, a pretty tricky resource to navigate freely. There’s plenty of red tape around using public resources for testing the performance of large models because of data use restrictions, so the team decided to scrape together their own. Using a set of short, publicly available clinical snippets, they cobbled together a small dataset to enable evaluation of the extraction performance of large language models. 

    “It’s challenging to develop a single general-purpose clinical natural language processing system that will solve everyone’s needs and be robust to the huge variation seen across health datasets. As a result, until today, most clinical notes are not used in downstream analyses or for live decision support in electronic health records. These large language model approaches could potentially transform clinical natural language processing,” says David Sontag, MIT professor of electrical engineering and computer science, principal investigator in CSAIL and the Institute for Medical Engineering and Science, and supervising author on a paper about the work, which will be presented at the Conference on Empirical Methods in Natural Language Processing. “The research team’s advances in zero-shot clinical information extraction makes scaling possible. Even if you have hundreds of different use cases, no problem — you can build each model with a few minutes of work, versus having to label a ton of data for that particular task.”

    For example, without any labels at all, the researchers found these models could achieve 86 percent accuracy at expanding overloaded acronyms, and the team developed additional methods to boost this further to 90 percent accuracy, with still no labels required.

    Imprisoned in an EHR 

    Experts have been steadily building up large language models (LLMs) for quite some time, but they burst into the mainstream with GPT-3’s widely covered ability to complete sentences. These LLMs are trained on a huge amount of text from the internet to finish sentences and predict the next most likely word.

    While previous, smaller models like earlier GPT iterations or BERT have performed well at extracting medical data, they still require substantial manual data-labeling effort.

    For example, a note, “pt will dc vanco due to n/v” means that this patient (pt) was taking the antibiotic vancomycin (vanco) but experienced nausea and vomiting (n/v) severe enough for the care team to discontinue (dc) the medication. The team’s research avoids the status quo of training separate machine learning models for each task (extracting medications and side effects from the record, disambiguating common abbreviations, etc.). In addition to expanding abbreviations, they investigated four other tasks, including whether the models could parse clinical trials and extract detail-rich medication regimens.

    “Prior work has shown that these models are sensitive to the prompt’s precise phrasing. Part of our technical contribution is a way to format the prompt so that the model gives you outputs in the correct format,” says Hunter Lang, CSAIL PhD student and author on the paper. “For these extraction problems, there are structured output spaces. The output space is not just a string. It can be a list. It can be a quote from the original input. So there’s more structure than just free text. Part of our research contribution is encouraging the model to give you an output with the correct structure. That significantly cuts down on post-processing time.”

    The approach can’t be applied out of the box to health data at a hospital: that would require sending private patient information across the open internet to an LLM provider like OpenAI. The authors showed that it’s possible to work around this by distilling the model into a smaller one that could be used on-site.

    The model — sometimes just like humans — is not always beholden to the truth. Here’s what a potential problem might look like: Let’s say you’re asking the reason why someone took medication. Without proper guardrails and checks, the model might just output the most common reason for that medication, if nothing is explicitly mentioned in the note. This led to the team’s efforts to force the model to extract more quotes from data and less free text.
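
    One simple check in the spirit of “more quotes, less free text” is to accept an answer only when it appears verbatim in the note. The sketch below is an assumption about how such a guardrail could look, not the authors’ method; the function names and the normalization rules (lowercasing, collapsing whitespace) are hypothetical.

```python
from typing import Optional


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def grounded_quote(note: str, model_output: str) -> Optional[str]:
    """Return the model's answer only if it is a verbatim span of the note.

    Anything that cannot be found in the note is rejected (None) rather than
    trusted, which filters out "most common reason" guesses.
    """
    answer = _normalize(model_output)
    if answer and answer in _normalize(note):
        return answer
    return None


note = "pt will dc vanco due to n/v"
print(grounded_quote(note, "n/v"))        # a span that appears in the note
print(grounded_quote(note, "infection"))  # plausible guess, absent from the note
```

    A substring test is deliberately strict: it rejects paraphrases as well as fabrications, so a real system would likely pair it with a fallback for answers the note states indirectly.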

    Future work for the team includes extending to languages other than English, creating additional methods for quantifying uncertainty in the model, and pulling off similar results with open-sourced models. 

    “Clinical information buried in unstructured clinical notes has unique challenges compared to general domain text mostly due to large use of acronyms, and inconsistent textual patterns used across different health care facilities,” says Sadid Hasan, AI lead at Microsoft and former executive director of AI at CVS Health, who was not involved in the research. “To this end, this work sets forth an interesting paradigm of leveraging the power of general domain large language models for several important zero-/few-shot clinical NLP tasks. Specifically, the proposed guided prompt design of LLMs to generate more structured outputs could lead to further developing smaller deployable models by iteratively utilizing the model generated pseudo-labels.”

    “AI has accelerated in the last five years to the point at which these large models can predict contextualized recommendations with benefits rippling out across a variety of domains such as suggesting novel drug formulations, understanding unstructured text, code recommendations or create works of art inspired by any number of human artists or styles,” says Parminder Bhatia, who was formerly Head of Machine Learning at AWS Health AI and is currently Head of ML for low-code applications leveraging large language models at AWS AI Labs. “One of the applications of these large models [the team has] recently launched is Amazon CodeWhisperer, which is [an] ML-powered coding companion that helps developers in building applications.”

    As part of the MIT Abdul Latif Jameel Clinic for Machine Learning in Health, Agrawal, Sontag, and Lang wrote the paper alongside Yoon Kim, MIT assistant professor and CSAIL principal investigator, and Stefan Hegselmann, a visiting PhD student from the University of Muenster. First-author Agrawal’s research was supported by a Takeda Fellowship, the MIT Deshpande Center for Technological Innovation, and the MLA@CSAIL Initiatives.

  • in

    Communications system achieves fastest laser link from space yet

    In May 2022, the TeraByte InfraRed Delivery (TBIRD) payload onboard a small CubeSat satellite was launched into orbit 300 miles above Earth’s surface. Since then, TBIRD has delivered terabytes of data at record-breaking rates of up to 100 gigabits per second — 100 times faster than the fastest internet speeds in most cities — via an optical communication link to a ground-based receiver in California. This data rate is more than 1,000 times higher than that of the radio-frequency links traditionally used for satellite communication and the highest ever achieved by a laser link from space to ground. And these record-setting speeds were all made possible by a communications payload roughly the size of a tissue box.

    MIT Lincoln Laboratory conceptualized the TBIRD mission in 2014 as a means of providing unprecedented capability to science missions at low cost. Science instruments in space today routinely generate more data than can be returned to Earth over typical space-to-ground communications links. With small, low-cost space and ground terminals, TBIRD can enable scientists from around the world to fully take advantage of laser communications to downlink all the data they could ever dream of.

    Designed and built at Lincoln Laboratory, the TBIRD communications payload was integrated onto a CubeSat manufactured by Terran Orbital as part of NASA’s Pathfinder Technology Demonstrator program. NASA Ames Research Center established this program to develop a CubeSat bus (the “vehicle” that powers and steers the payload) for bringing science and technology demonstrators into orbit more quickly and inexpensively. Weighing approximately 25 pounds and the size of two stacked cereal boxes, the CubeSat was launched into low-Earth orbit (LEO) aboard SpaceX’s Transporter-5 rideshare mission from Cape Canaveral Space Force Station in Florida in May 2022. The optical ground station is located at Table Mountain, California, where most weather takes place below the mountain’s summit, making this part of the sky relatively clear for laser communication. This ground station leverages the one-meter telescope and adaptive optics (to correct for distortions caused by atmospheric turbulence) at the NASA Jet Propulsion Laboratory Optical Communications Telescope Laboratory, with Lincoln Laboratory providing the TBIRD-specific ground communications hardware.

    “We’ve demonstrated a higher data rate than ever before in a smaller package than ever before,” says Jade Wang, the laboratory’s program manager for the TBIRD payload and ground communications and assistant leader of the Optical and Quantum Communications Technology Group. “While sending data from space using lasers may sound futuristic, the same technical concept is behind the fiber-optic internet we use every day. The difference is that the laser transmissions are taking place in the open atmosphere, rather than in contained fibers.”

    From radio waves to laser light

    Whether video conferencing, gaming, or streaming movies in high definition, you are using high-data-rate links that run across optical fibers made of glass (or sometimes plastic). About the diameter of a strand of human hair, these fibers are bundled into cables, which transmit data via fast-traveling pulses of light from a laser or other source. Fiber-optic communications are paramount to the internet age, in which large amounts of data must be quickly and reliably distributed across the globe every day.

    For satellites, however, a high-speed internet based on laser communications does not yet exist. Since the beginning of spaceflight in the 1950s, missions have relied on radio frequencies to send data to and from space. Compared to radio waves, the infrared light employed in laser communications has a much higher frequency (or shorter wavelength), which allows more data to be packed into each transmission. Laser communications will enable scientists to send 100 to 1,000 times more data than today’s radio-frequency systems — akin to our terrestrial switch from dial-up to high-speed internet.

    From Earth observation to space exploration, many science missions will benefit from this speedup, especially as instrument capabilities advance to capture larger troves of high-resolution data, experiments involve more remote control, and spacecraft voyage further from Earth into deep space.  

    However, laser-based space communication comes with several engineering challenges. Unlike radio waves, laser light forms a narrow beam. For successful data transmission, this narrow beam must be pointed precisely toward a receiver (e.g., telescope) located on the ground. And though laser light can travel long distances in space, laser beams can be distorted because of atmospheric effects and weather conditions. This distortion causes the beam to experience power loss, which can result in data loss.

    For the past 40 years, Lincoln Laboratory has been tackling these and related challenges through various programs. At this point, these challenges have been reliably solved, and laser communications is rapidly becoming widely adopted. Industry has begun a proliferation of LEO cross-links using laser communications, with the intent to enhance the existing terrestrial backbone, as well as to provide a potential internet backbone to serve users in rural locations. Last year, NASA launched the Laser Communications Relay Demonstration (LCRD), a two-way optical communications system based on a laboratory design. In upcoming missions, a laboratory-developed laser communications terminal will be launched to the International Space Station, where the terminal will “talk” to LCRD, and support Artemis II, a crewed mission that will fly by the moon in advance of a future crewed lunar landing.

    “With the expanding interest and development in space-based laser communications, Lincoln Laboratory continues to push the envelope of what is possible,” says Wang. “TBIRD heralds a new approach with the potential to further increase data rate capabilities; shrink size, weight, and power; and reduce lasercom mission costs.”

    One way that TBIRD aims to reduce these costs is by utilizing commercial off-the-shelf components originally developed for terrestrial fiber-optic networks. However, terrestrial components are not designed to survive the rigors of space, and their operation can be impacted by atmospheric effects. With TBIRD, the laboratory developed solutions to both challenges.

    Commercial components adapted for space

    The TBIRD payload integrates three key commercial off-the-shelf components: a high-rate optical modem, a large high-speed storage drive, and an optical signal amplifier.

    All these hardware components underwent shock and vibration, thermal-vacuum, and radiation testing to inform how the hardware might fare in space, where it would be subject to powerful forces, extreme temperatures, and high radiation levels. When the team first tested the amplifier through a thermal test simulating the space environment, the fibers melted. As Wang explains, in vacuum, no atmosphere exists, so heat gets trapped and cannot be released by convection. The team worked with the vendor to modify the amplifier to release heat through conduction instead.

    To deal with data loss from atmospheric effects, the laboratory developed its own version of Automatic Repeat Request (ARQ), a protocol for controlling errors in data transmission over a communications link. With ARQ, the receiver (in this case, the ground terminal) alerts the sender (satellite) through a low-rate uplink signal to re-transmit any block of data (frame) that has been lost or damaged.

    “If the signal drops out, data can be re-transmitted, but if done inefficiently — meaning you spend all your time sending repeat data instead of new data — you can lose a lot of throughput,” explains TBIRD system engineer Curt Schieler, a technical staff member in Wang’s group. “With our ARQ protocol, the receiver tells the payload which frames it received correctly, so the payload knows which ones to re-transmit.”
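
    The idea of re-sending only what was lost can be illustrated with a toy simulation. This is a generic selective-retransmission sketch, not Lincoln Laboratory’s protocol: the function name, the random loss model, the round-based feedback, and the frame labels are all assumptions made for the example.

```python
import random


def send_with_selective_arq(frames, loss_probability, seed=0, max_rounds=50):
    """Deliver all frames over a lossy link, re-sending only the missing ones.

    Returns (received, transmissions): the reassembled frames in order and
    the total number of frame transmissions used.
    """
    rng = random.Random(seed)            # seeded so the simulation is repeatable
    received = {}                        # frame index -> payload at the receiver
    pending = list(range(len(frames)))   # indices the sender still has to deliver
    transmissions = 0

    for _ in range(max_rounds):
        if not pending:
            break
        for index in pending:
            transmissions += 1
            if rng.random() >= loss_probability:   # frame survived the link
                received[index] = frames[index]
        # Feedback over the low-rate uplink: which frames arrived intact.
        acknowledged = set(received)
        pending = [i for i in pending if i not in acknowledged]

    if pending:
        raise RuntimeError(f"frames {pending} undelivered after {max_rounds} rounds")
    return [received[i] for i in range(len(frames))], transmissions


frames = [f"frame-{i}" for i in range(20)]
delivered, sent = send_with_selective_arq(frames, loss_probability=0.3)
print(delivered == frames, sent)
```

    Because each round re-sends only the unacknowledged frames, the transmission count stays close to the frame count divided by the link’s success rate, instead of growing with every full repeat of the data.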

    Another aspect of TBIRD that is new is its lack of a gimbal, a mechanism for pointing the narrow laser beam. Instead, TBIRD relies on a laboratory-developed error-signaling concept for precision body pointing of the spacecraft. Error signals are provided to the CubeSat bus so it knows exactly how to point the body of the entire satellite toward the ground station. Without a gimbal, the payload can be even further miniaturized.

    “We intended to demonstrate a low-cost technology capable of quickly downlinking a large volume of data from LEO to Earth, in support of science missions,” says Wang. “In just a few weeks of operations, we have already accomplished this goal, achieving unprecedented transmission rates of up to 100 gigabits per second. Next, we plan to exercise additional features of the TBIRD system, including increasing rates to 200 gigabits per second, enabling the downlink of more than 2 terabytes of data — equivalent to 1,000 high-definition movies — in a single five-minute pass over a ground station.”
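
    The figures in that quote can be sanity-checked with back-of-the-envelope arithmetic. The sketch below computes an upper bound only: it assumes the link runs at the stated rate for the entire five-minute pass and uses decimal units (1 TB = 1,000 GB), neither of which the article specifies.

```python
def pass_volume_terabytes(rate_gbps: float, pass_seconds: float) -> float:
    """Upper bound on data moved in one pass, in decimal terabytes.

    Assumes the link sustains the stated rate for the whole pass, which a
    real pass (acquisition time, dropouts, re-transmissions) would not.
    """
    gigabits = rate_gbps * pass_seconds
    gigabytes = gigabits / 8           # 8 bits per byte
    return gigabytes / 1000            # 1 TB = 1000 GB (decimal units)


five_minutes = 5 * 60                  # seconds in one pass
at_100 = pass_volume_terabytes(100, five_minutes)
at_200 = pass_volume_terabytes(200, five_minutes)
gb_per_movie = 2 * 1000 / 1000         # 2 TB spread over 1,000 movies, in GB

print(at_100)        # 3.75
print(at_200)        # 7.5
print(gb_per_movie)  # 2.0
```

    Under these assumptions the ceiling at 200 gigabits per second is 7.5 TB per pass, so “more than 2 terabytes” leaves ample margin for the time the link is not running at full rate; the article does not say how that margin is spent.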

    Lincoln Laboratory developed the TBIRD mission and technology in partnership with NASA Goddard Space Flight Center.

  • in

    3 Questions: Why cybersecurity is on the agenda for corporate boards of directors

    Organizations of every size and in every industry are vulnerable to cybersecurity risks — a dynamic landscape of threats and vulnerabilities and a corresponding overload of possible mitigating controls. MIT Senior Lecturer Keri Pearlson, who is also the executive director of the research consortium Cybersecurity at MIT Sloan (CAMS) and an instructor for the new MIT Sloan Executive Education course Cybersecurity Governance for the Board of Directors, knows how business can get ahead of this risk. Here, she describes the current threat and explores how boards can mitigate their risk against cybercrime.

    Q: What does the current state of cyberattacks mean for businesses in 2023?

    A: Last year we were discussing how the pandemic heightened fear, uncertainty, doubt and chaos, opening new doors for malicious actors to do their cyber mischief in our organizations and our families. We saw an increase in ransomware and other cyber attacks, and we saw an increase in concern from operating executives and boards of directors wondering how to keep the organization secure. Since then, we have seen a continued escalation of cyber incidents, many of which no longer make the headlines unless they are wildly unique, damaging, or different than previous incidents. For every new technology that cybersecurity professionals invent, it’s only a matter of time until malicious actors find a way around it. New leadership approaches are needed for 2023 as we move into the next phase of securing our organizations.

    In great part, this means ensuring deep cybersecurity competencies on our boards of directors. Cyber risk is so significant that a responsible board can no longer ignore it or just delegate it to risk management experts. In fact, an organization’s board of directors holds a uniquely vital role in safeguarding data and systems for the future because of their fiduciary responsibility to shareholders and their responsibility to oversee and mitigate business risk.

    As these cyber threats increase, and as companies bolster their cybersecurity budgets accordingly, the regulatory community is also advancing new requirements of companies. In March of this year, the SEC issued a proposed rule titled Cybersecurity Risk Management, Strategy, Governance, and Incident Disclosure. In it, the SEC describes its intention to require public companies to disclose whether their boards have members with cybersecurity expertise. Specifically, registrants will be required to disclose whether the entire board, a specific board member, or a board committee is responsible for the oversight of cyber risks; the processes by which the board is informed about cyber risks, and the frequency of its discussions on this topic; and whether and how the board or specified board committee considers cyber risks as part of its business strategy, risk management, and financial oversight.

    Q: How can boards help their organizations mitigate cyber risk?

    A: According to the studies I’ve conducted with my CAMS colleagues, most organizations focus on cyber protection rather than cyber resilience, and we believe that is a mistake. A company that invests only in protection is not managing the risk associated with getting up and running again in the event of a cyber incident, and they are not going to be able to respond appropriately to new regulations, either. Resiliency means having a practical plan for recovery and business continuation.

    Certainly, protection is part of the resilience equation, but if the pandemic taught us anything, it taught us that resilience is the ability to weather an attack and recover quickly with minimal impact to our operations. The ultimate goal of a cyber-resilient organization would be zero disruption from a cyber breach — no impact on operations, finances, technologies, supply chain or reputation. Board members should ask, What would it take for this to be the case? And they should ensure that executives and managers have made proper and appropriate preparations to respond and recover.

    Being a knowledgeable board member does not mean becoming a cybersecurity expert, but it does mean understanding basic concepts, risks, frameworks, and approaches. And it means having the ability to assess whether management appropriately comprehends related threats, has an appropriate cyber strategy, and can measure its effectiveness. Board members today require focused training on these critical areas to carry out their mission. Unfortunately, many enterprises fail to leverage their boards of directors in this capacity or prepare board members to actively contribute to strategy, protocols, and emergency action plans.

    Alongside my CAMS colleagues Stuart Madnick and Kevin Powers, I’m teaching a new MIT Sloan Executive Education course, Cybersecurity Governance for the Board of Directors, designed to help organizations and their boards get up to speed. Participants will explore the board’s role in cybersecurity, as well as breach planning, response, and mitigation. And we will discuss the impact and requirements of the many new regulations coming forward, not just from the SEC, but also from the White House, Congress, and most states and countries around the world, which are imposing more high-level responsibilities on companies.

    Q: What are some examples of how companies, and specifically boards of directors, have successfully upped their cybersecurity game?

    A: To ensure boardroom skills reflect the patterns of the marketplace, companies such as FedEx, Hasbro, PNC, and UPS have transformed their approach to governing cyber risk, starting with board cyber expertise. In companies like these, building resiliency started with a clear plan — from the boardroom — built on business and economic analysis.

    In one company we looked at, the CEO realized his board was not well versed in the business context or financial exposure risk from a cyber attack, so he hired a third-party consulting firm to conduct a cybersecurity maturity assessment. The company CISO presented the results of the report to the enterprise risk management subcommittee, creating a productive dialogue around the business and financial impact of different investments in cybersecurity.  

    Another organization focused their board on the alignment of their cybersecurity program and operational risk. The CISO, chief risk officer, and board collaborated to understand the exposure of the organization from a risk perspective, resulting in optimizing their cyber insurance policy to mitigate the newly understood risk.

    One important takeaway from these examples is the importance of using the language of risk, resiliency, and reputation to bridge the gaps between technical cybersecurity needs and the oversight responsibilities executed by boards. Boards need to understand the financial exposure resulting from cyber risk, not just the technical components typically found in cyber presentations.

    Cyber risk is not going away. It’s escalating and becoming more sophisticated every day. Getting your board “on board” is key to meeting new guidelines, providing sufficient oversight to cybersecurity plans, and making organizations more resilient.

  • in

    Busy GPUs: Sampling and pipelining method speeds up deep learning on large graphs

    Graphs, potentially extensive webs of nodes connected by edges, can be used to express and interrogate relationships between data, like social connections, financial transactions, traffic, energy grids, and molecular interactions. As researchers collect more data and build out these graphical pictures, they will need faster and more efficient methods, as well as more computational power, to conduct deep learning on them by way of graph neural networks (GNNs).

    Now, a new method, called SALIENT (SAmpling, sLIcing, and data movemeNT), developed by researchers at MIT and IBM Research, improves training and inference performance by addressing three key bottlenecks in computation. This dramatically cuts down on the runtime of GNNs on large datasets, which can contain on the order of 100 million nodes and 1 billion edges. Further, the team found that the technique scales well when computational power is increased from one to 16 graphics processing units (GPUs). The work was presented at the Fifth Conference on Machine Learning and Systems.

    “We started to look at the challenges current systems experienced when scaling state-of-the-art machine learning techniques for graphs to really big datasets. It turned out there was a lot of work to be done, because a lot of the existing systems were achieving good performance primarily on smaller datasets that fit into GPU memory,” says Tim Kaler, the lead author and a postdoc in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

    By vast datasets, experts mean scales like the entire Bitcoin network, where certain patterns and data relationships could spell out trends or foul play. “There are nearly a billion Bitcoin transactions on the blockchain, and if we want to identify illicit activities inside such a joint network, then we are facing a graph of such a scale,” says co-author Jie Chen, senior research scientist and manager of IBM Research and the MIT-IBM Watson AI Lab. “We want to build a system that is able to handle that kind of graph and allows processing to be as efficient as possible, because every day we want to keep up with the pace of the new data that are generated.”

    Kaler and Chen’s co-authors include Nickolas Stathas MEng ’21 of Jump Trading, who developed SALIENT as part of his graduate work; former MIT-IBM Watson AI Lab intern and MIT graduate student Anne Ouyang; MIT CSAIL postdoc Alexandros-Stavros Iliopoulos; MIT CSAIL Research Scientist Tao B. Schardl; and Charles E. Leiserson, the Edwin Sibley Webster Professor of Electrical Engineering at MIT and a researcher with the MIT-IBM Watson AI Lab.     

    For this problem, the team took a systems-oriented approach in developing their method, SALIENT, says Kaler. To do this, the researchers implemented what they saw as important, basic optimizations of components that fit into existing machine-learning frameworks, such as PyTorch Geometric and the Deep Graph Library (DGL), which are interfaces for building a machine-learning model. Stathas says the process is like swapping out engines to build a faster car. Their method was designed to fit into existing GNN architectures, so that domain experts could easily apply this work to their specific fields to expedite model training and tease out insights during inference faster. The trick, the team determined, was to keep all of the hardware (CPUs, data links, and GPUs) busy at all times: while the CPU samples the graph and prepares mini-batches of data that will then be transferred through the data link, the more critical GPU is working to train the machine-learning model or conduct inference.
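
    The keep-everything-busy idea is a classic producer-consumer pipeline. The sketch below is a toy stand-in, not SALIENT’s code: `time.sleep` substitutes for real CPU sampling and GPU compute, and the function name, timings, and `prefetch_depth` parameter are assumptions for illustration.

```python
import queue
import threading
import time


def run_pipeline(num_batches: int, prepare_seconds: float, train_seconds: float,
                 prefetch_depth: int = 2) -> float:
    """Overlap batch preparation (producer) with training (consumer).

    Returns elapsed wall-clock seconds for the whole run.
    """
    batches = queue.Queue(maxsize=prefetch_depth)   # bounded prefetch buffer
    done = object()                                 # sentinel marking end of data

    def producer():
        for batch_id in range(num_batches):
            time.sleep(prepare_seconds)             # stand-in for sampling + slicing
            batches.put(batch_id)                   # blocks if the buffer is full
        batches.put(done)

    start = time.perf_counter()
    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    processed = 0
    while True:
        batch = batches.get()
        if batch is done:
            break
        time.sleep(train_seconds)                   # stand-in for a training step
        processed += 1
    worker.join()
    assert processed == num_batches
    return time.perf_counter() - start


serial_estimate = 10 * (0.02 + 0.02)                # prepare, then train, no overlap
overlapped = run_pipeline(10, prepare_seconds=0.02, train_seconds=0.02)
print(f"serial estimate {serial_estimate:.2f}s, pipelined {overlapped:.2f}s")
```

    With preparation and training taking equal time, the pipelined run should approach half the serial estimate, since only the first batch’s preparation is not hidden behind a training step.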

    The researchers began by analyzing the performance of a commonly used machine-learning library for GNNs (PyTorch Geometric), which showed a startlingly low utilization of available GPU resources. Applying simple optimizations, the researchers improved GPU utilization from 10 to 30 percent, resulting in a 1.4 to two times performance improvement relative to public benchmark codes. This fast baseline code could execute one complete pass over a large training dataset through the algorithm (an epoch) in 50.4 seconds.                          

    Seeking further performance improvements, the researchers set out to examine the bottlenecks that occur at the beginning of the data pipeline: the algorithms for graph sampling and mini-batch preparation. Unlike other neural networks, GNNs perform a neighborhood aggregation operation, which computes information about a node using information present in other nearby nodes in the graph (for example, in a social network graph, information from friends of friends of a user). As the number of layers in the GNN increases, the number of nodes the network has to reach out to for information can explode, exceeding the limits of a computer. Neighborhood sampling algorithms help by selecting a smaller random subset of nodes to gather; however, the researchers found that current implementations of this were too slow to keep up with the processing speed of modern GPUs. In response, they identified a mix of data structures, algorithmic optimizations, and so forth that improved sampling speed, ultimately improving the sampling operation alone by about three times, taking the per-epoch runtime from 50.4 to 34.6 seconds. They also found that sampling, at an appropriate rate, can be done during inference, improving overall energy efficiency and performance, a point that had been overlooked in the literature, the team notes.
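
    Neighborhood sampling can be sketched in a few lines. This is a plain illustrative version of fanout-limited sampling, not SALIENT’s optimized implementation; the adjacency-dictionary format, the function name, and the toy graph are assumptions for the example.

```python
import random


def sample_neighborhood(adjacency, seed_nodes, fanouts, rng=None):
    """Sample a multi-hop neighborhood with a per-layer cap on neighbors.

    adjacency: dict mapping node -> list of neighbor nodes.
    fanouts:   max neighbors to keep per node at each hop, e.g. [3, 2].
    Returns the set of all nodes whose features a mini-batch would need.
    """
    rng = rng or random.Random(0)
    needed = set(seed_nodes)
    frontier = list(seed_nodes)
    for fanout in fanouts:
        next_frontier = []
        for node in frontier:
            neighbors = adjacency.get(node, [])
            # Keep every neighbor if there are few; otherwise a random subset.
            if len(neighbors) <= fanout:
                kept = neighbors
            else:
                kept = rng.sample(neighbors, fanout)
            for neighbor in kept:
                if neighbor not in needed:
                    needed.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return needed


# Toy graph: 50 nodes, every node connected to every other node.
graph = {i: [j for j in range(50) if j != i] for i in range(50)}
full = sample_neighborhood(graph, [0], fanouts=[100, 100])   # no effective cap
sampled = sample_neighborhood(graph, [0], fanouts=[3, 2])
print(len(full), len(sampled))
```

    With fanouts of 3 and 2, a single seed node can pull in at most 1 + 3 + 3 × 2 = 10 nodes, no matter how dense the graph is, which is what keeps a two-layer mini-batch from growing to the whole graph.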

    In previous systems, this sampling step was a multi-process approach, creating extra data and unnecessary data movement between the processes. The researchers made their SALIENT method more nimble by creating a single process with lightweight threads that kept the data on the CPU in shared memory. Further, SALIENT takes advantage of the cache in modern processors, says Stathas, parallelizing feature slicing, which extracts relevant information from nodes of interest and their surrounding neighbors and edges, within the shared memory of the CPU core cache. This again reduced the overall per-epoch runtime from 34.6 to 27.8 seconds.

    The last bottleneck the researchers addressed was to pipeline mini-batch data transfers between the CPU and GPU using a prefetching step, which would prepare data just before it’s needed. The team calculated that this would maximize bandwidth usage in the data link and bring the method up to perfect utilization; however, they only saw around 90 percent. They identified and fixed a performance bug in a popular PyTorch library that caused unnecessary round-trip communications between the CPU and GPU. With this bug fixed, the team achieved a 16.5 second per-epoch runtime with SALIENT.
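
    The per-epoch runtimes reported so far can be turned into step-by-step and cumulative speedups. The runtimes below come from the article; the short stage labels are shorthand chosen for this example.

```python
# Per-epoch runtimes in seconds, as reported in the article.
stages = [
    ("optimized baseline", 50.4),
    ("faster sampling", 34.6),
    ("shared-memory slicing", 27.8),
    ("pipelined transfers + bug fix", 16.5),
]

baseline = stages[0][1]
previous = baseline
for name, seconds in stages:
    step = previous / seconds        # speedup over the previous stage
    total = baseline / seconds       # cumulative speedup over the baseline
    print(f"{name:32s} {seconds:5.1f}s  step x{step:.2f}  total x{total:.2f}")
    previous = seconds
```

    The final cumulative figure works out to roughly 3.05 times the optimized baseline. It also shows why a roughly threefold gain in the sampling operation alone moved the epoch time by only about 1.46 times: sampling is just one part of each epoch.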

    “Our work showed, I think, that the devil is in the details,” says Kaler. “When you pay close attention to the details that impact performance when training a graph neural network, you can resolve a huge number of performance issues. With our solutions, we ended up being completely bottlenecked by GPU computation, which is the ideal goal of such a system.”

    SALIENT’s speed was evaluated on three standard datasets (ogbn-arxiv, ogbn-products, and ogbn-papers100M), as well as in multi-machine settings, with different levels of fanout (amount of data that the CPU would prepare for the GPU), and across several architectures, including the most recent state-of-the-art one, GraphSAGE-RI. In each setting, SALIENT outperformed PyTorch Geometric, most notably on the large ogbn-papers100M dataset, containing 100 million nodes and over a billion edges. Here, it was three times faster, running on one GPU, than the optimized baseline that was originally created for this work; with 16 GPUs, SALIENT was an additional eight times faster.

    While other systems had slightly different hardware and experimental setups, so it wasn’t always a direct comparison, SALIENT still outperformed them. Among systems that achieved similar accuracy, representative performance numbers include 99 seconds using one GPU and 32 CPUs, and 13 seconds using 1,536 CPUs. In contrast, SALIENT’s runtime using one GPU and 20 CPUs was 16.5 seconds and was just two seconds with 16 GPUs and 320 CPUs. “If you look at the bottom-line numbers that prior work reports, our 16 GPU runtime (two seconds) is an order of magnitude faster than other numbers that have been reported previously on this dataset,” says Kaler. The researchers attributed their performance improvements, in part, to their approach of optimizing their code for a single machine before moving to the distributed setting. Stathas says that the lesson here is that for your money, “it makes more sense to use the hardware you have efficiently, and to its extreme, before you start scaling up to multiple computers,” which can provide significant savings on cost and carbon emissions that can come with model training.
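
    The runtimes quoted in this story can be checked against the "three times" and "eight times" figures with a short calculation. The sketch assumes that the 50.4-second starting point is the optimized baseline mentioned above; the stage labels are paraphrases, not names used by the researchers.

```python
# Per-epoch runtimes in seconds, as quoted in the text for one GPU.
single_gpu_stages = [
    ("starting point", 50.4),
    ("faster neighborhood sampling", 34.6),
    ("shared-memory feature slicing", 27.8),
    ("pipelined transfers and bug fix", 16.5),
]
multi_gpu_runtime = 2.0  # seconds with 16 GPUs, as quoted


def speedup(before, after):
    """How many times faster `after` is than `before`."""
    return before / after


def cumulative_speedups(stages):
    """Speedup of every stage relative to the first one."""
    baseline = stages[0][1]
    return [(name, speedup(baseline, seconds)) for name, seconds in stages]


if __name__ == "__main__":
    for name, factor in cumulative_speedups(single_gpu_stages):
        print(f"{name}: {factor:.2f}x")
    print(f"16 GPUs vs. 1 GPU: {speedup(16.5, multi_gpu_runtime):.2f}x")
```

    Under that assumption the single-GPU optimizations give about 3.05 times, and going from 16.5 seconds to two seconds gives a further 8.25 times, consistent with the rounded figures in the text.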

    This new capacity will now allow researchers to tackle and dig deeper into bigger and bigger graphs. For example, the Bitcoin network that was mentioned earlier contained 100,000 nodes; the SALIENT system can capably handle a graph 1,000 times (or three orders of magnitude) larger.

    “In the future, we would be looking at not just running this graph neural network training system on the existing algorithms that we implemented for classifying or predicting the properties of each node, but we also want to do more in-depth tasks, such as identifying common patterns in a graph (subgraph patterns), [which] may be actually interesting for indicating financial crimes,” says Chen. “We also want to identify nodes in a graph that are similar in a sense that they possibly would be corresponding to the same bad actor in a financial crime. These tasks would require developing additional algorithms, and possibly also neural network architectures.”

    This research was supported by the MIT-IBM Watson AI Lab and in part by the U.S. Air Force Research Laboratory and the U.S. Air Force Artificial Intelligence Accelerator.

  • in

    A breakthrough on “loss and damage,” but also disappointment, at UN climate conference

    As the 2022 United Nations climate change conference, known as COP27, stretched into its final hours on Saturday, Nov. 19, it was uncertain what kind of agreement might emerge from two weeks of intensive international negotiations.

    In the end, COP27 produced mixed results: on the one hand, a historic agreement for wealthy countries to compensate low-income countries for “loss and damage,” but on the other, limited progress on new plans for reducing the greenhouse gas emissions that are warming the planet.

    “We need to drastically reduce emissions now — and this is an issue this COP did not address,” said U.N. Secretary-General António Guterres in a statement at the conclusion of COP27. “A fund for loss and damage is essential — but it’s not an answer if the climate crisis washes a small island state off the map — or turns an entire African country to desert.”

    Throughout the two weeks of the conference, a delegation of MIT students, faculty, and staff was at the Sharm El-Sheikh International Convention Center to observe the negotiations, conduct and share research, participate in panel discussions, and forge new connections with researchers, policymakers, and advocates from around the world.

    Loss and damage

    A key issue coming in to COP27 (COP stands for “conference of the parties” to the U.N. Framework Convention on Climate Change, held for the 27th time) was loss and damage: a term used by the U.N. to refer to harms caused by climate change — either through acute catastrophes like extreme weather events or slower-moving impacts like sea level rise — to which communities and countries are unable to adapt. 

    Ultimately, a deal on loss and damage proved to be COP27’s most prominent accomplishment. Negotiators reached an eleventh-hour agreement to “establish new funding arrangements for assisting developing countries that are particularly vulnerable to the adverse effects of climate change.” 

    “Providing financial assistance to developing countries so they can better respond to climate-related loss and damage is not only a moral issue, but also a pragmatic one,” said Michael Mehling, deputy director of the MIT Center for Energy and Environmental Policy Research, who attended COP27 and participated in side events. “Future emissions growth will be squarely centered in the developing world, and offering support through different channels is key to building the trust needed for more robust global cooperation on mitigation.”

    Youssef Shaker, a graduate student in the MIT Technology and Policy Program and a research assistant with the MIT Energy Initiative, attended the second week of the conference, where he followed the negotiations over loss and damage closely. 

    “While the creation of a fund is certainly an achievement,” Shaker said, “significant questions remain to be answered, such as the size of the funding available as well as which countries receive access to it.” A loss-and-damage fund that is not adequately funded, Shaker noted, “would not be an impactful outcome.” 

    The agreement on loss and damage created a new committee, made up of 24 country representatives, to “operationalize” the new funding arrangements, including identifying funding sources. The committee is tasked with delivering a set of recommendations at COP28, which will take place next year in Dubai.

    Advising the U.N. on net zero

    Though the decisions reached at COP27 did not include major new commitments on reducing emissions from the combustion of fossil fuels, the transition to a clean global energy system was nevertheless a key topic of conversation throughout the conference.

    The Council of Engineers for the Energy Transition (CEET), an independent, international body of engineers and energy systems experts formed to provide advice to the U.N. on achieving net-zero emissions globally by 2050, convened for the first time at COP27. Jessika Trancik, a professor in the MIT Institute for Data, Systems, and Society and a member of CEET, spoke on a U.N.-sponsored panel on solutions for the transition to clean energy.

    Trancik noted that the energy transition will look different in different regions of the world. “As engineers, we need to understand those local contexts and design solutions around those local contexts — that’s absolutely essential to support a rapid and equitable energy transition.”

    At the same time, Trancik noted that there is now a set of “low-cost, ready-to-scale tools” available to every region — tools that resulted from a globally competitive process of innovation, stimulated by public policies in different countries, that dramatically drove down the costs of technologies like solar energy and lithium-ion batteries. The key, Trancik said, is for regional transition strategies to “tap into global processes of innovation.”

    Reinventing climate adaptation

    Elfatih Eltahir, the H. M. King Bhumibol Professor of Hydrology and Climate, traveled to COP27 to present plans for the Jameel Observatory Climate Resilience Early Warning System (CREWSnet), one of the five projects selected in April 2022 as a flagship in MIT’s Climate Grand Challenges initiative. CREWSnet focuses on climate adaptation, the term for adapting to climate impacts that are unavoidable.

    The aim of CREWSnet, Eltahir told the audience during a panel discussion, is “nothing short of reinventing the process of climate change adaptation,” so that it is proactive rather than reactive; community-led; data-driven and evidence-based; and so that it integrates different climate risks, from heat waves to sea level rise, rather than treating them individually.

    “However, it’s easy to talk about these changes,” said Eltahir. “The real challenge, which we are now just launching and engaging in, is to demonstrate that on the ground.” Eltahir said that early demonstrations will happen in a couple of key locations, including southwest Bangladesh, where multiple climate risks — rising sea levels, increasing soil salinity, and intensifying heat waves and cyclones — are combining to threaten the area’s agricultural production.

    Building on COP26

    Some members of MIT’s delegation attended COP27 to advance efforts that had been formally announced at last year’s U.N. climate conference, COP26, in Glasgow, Scotland.

    At an official U.N. side event co-organized by MIT on Nov. 11, Greg Sixt, the director of the Food and Climate Systems Transformation (FACT) Alliance led by the Abdul Latif Jameel Water and Food Systems Lab, provided an update on the alliance’s work since its launch at COP26.

    Food systems are a major source of greenhouse gas emissions — and are increasingly vulnerable to climate impacts. The FACT Alliance works to better connect researchers to farmers, food businesses, policymakers, and other food systems stakeholders to make food systems (which include food production, consumption, and waste) more sustainable and resilient. 

    Sixt told the audience that the FACT Alliance now counts over 20 research and stakeholder institutions around the world among its members, but also collaborates with other institutions in an “open network model” to advance work in key areas — such as a new research project exploring how climate scenarios could affect global food supply chains.

    Marcela Angel, research program director for the Environmental Solutions Initiative (ESI), helped convene a meeting at COP27 of the Afro-InterAmerican Forum on Climate Change, which also launched at COP26. The forum works with Afro-descendant leaders across the Americas to address significant environmental issues, including climate risks and biodiversity loss. 

    At the event — convened with the Colombian government and the nonprofit Conservation International — ESI brought together leaders from six countries in the Americas and presented recent work that estimates that there are over 178 million individuals who identify as Afro-descendant living in the Americas, in lands of global environmental importance. 

    “There is a significant overlap between biodiversity hot spots, protected areas, and areas of high Afro-descendant presence,” said Angel. “But the role and climate contributions of these communities is understudied, and often made invisible.”    

    Limiting methane emissions

    Methane is a short-lived but potent greenhouse gas: When released into the atmosphere, it immediately traps about 120 times more heat than carbon dioxide does. More than 150 countries have now signed the Global Methane Pledge, launched at COP26, which aims to reduce methane emissions by at least 30 percent by 2030 compared to 2020 levels.

    Sergey Paltsev, the deputy director of the Joint Program on the Science and Policy of Global Change and a senior research scientist at the MIT Energy Initiative, gave the keynote address at a Nov. 17 event on methane, where he noted the importance of methane reductions from the oil and gas sector to meeting the 2030 goal.

    “The oil and gas sector is where methane emissions reductions could be achieved the fastest,” said Paltsev. “We also need to employ an integrated approach to address methane emissions in all sectors and all regions of the world because methane emissions reductions provide a near-term pathway to avoiding dangerous tipping points in the global climate system.”

    “Keep fighting relentlessly”

    Arina Khotimsky, a senior majoring in materials science and engineering and a co-president of the MIT Energy and Climate Club, attended the first week of COP27. She reflected on the experience in a social media post after returning home. 

    “COP will always have its haters. Is there greenwashing? Of course! Is everyone who should have a say in this process in the room? Not even close,” wrote Khotimsky. “So what does it take for COP to matter? It takes everyone who attended to not only put ‘climate’ on front-page news for two weeks, but to return home and keep fighting relentlessly against climate change. I know that I will.”

  • in

    Breaking the scaling limits of analog computing

    As machine-learning models become larger and more complex, they require faster and more energy-efficient hardware to perform computations. Conventional digital computers are struggling to keep up.

    An analog optical neural network could perform the same tasks as a digital one, such as image classification or speech recognition, but because computations are performed using light instead of electrical signals, optical neural networks can run many times faster while consuming less energy.

    However, these analog devices are prone to hardware errors that can make computations less precise. Microscopic imperfections in hardware components are one cause of these errors. In an optical neural network that has many connected components, errors can quickly accumulate.

    Even with error-correction techniques, due to fundamental properties of the devices that make up an optical neural network, some amount of error is unavoidable. A network that is large enough to be implemented in the real world would be far too imprecise to be effective.

    MIT researchers have overcome this hurdle and found a way to effectively scale an optical neural network. By adding a tiny hardware component to the optical switches that form the network’s architecture, they can reduce even the uncorrectable errors that would otherwise accumulate in the device.

    Their work could enable a super-fast, energy-efficient, analog neural network that can function with the same accuracy as a digital one. With this technique, as an optical circuit becomes larger, the amount of error in its computations actually decreases.  

    “This is remarkable, as it runs counter to the intuition of analog systems, where larger circuits are supposed to have higher errors, so that errors set a limit on scalability. This present paper allows us to address the scalability question of these systems with an unambiguous ‘yes,’” says lead author Ryan Hamerly, a visiting scientist in the MIT Research Laboratory for Electronics (RLE) and Quantum Photonics Laboratory and senior scientist at NTT Research.

    Hamerly’s co-authors are graduate student Saumil Bandyopadhyay and senior author Dirk Englund, an associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS), leader of the Quantum Photonics Laboratory, and member of the RLE. The research is published today in Nature Communications.

    Multiplying with light

    An optical neural network is composed of many connected components that function like reprogrammable, tunable mirrors. These tunable mirrors are called Mach-Zehnder interferometers (MZIs). Neural network data are encoded into light, which is fired into the optical neural network from a laser.

    A typical MZI contains two mirrors and two beam splitters. Light enters the top of an MZI, where it is split into two parts which interfere with each other before being recombined by the second beam splitter and then reflected out the bottom to the next MZI in the array. Researchers can leverage the interference of these optical signals to perform complex linear algebra operations, known as matrix multiplication, which is how neural networks process data.
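
    To make the matrix picture concrete, here is a minimal sketch of one MZI as a 2x2 transfer matrix. It assumes a common textbook idealization, not the paper's exact model: two beam splitters with a tunable phase shift `theta` between them, an input phase `phi`, and a hypothetical `error` parameter that pushes a splitter away from a perfect 50:50 split.

```python
import cmath
import math


def beam_splitter(error=0.0):
    """2x2 transfer matrix of a beam splitter; error=0 is an ideal 50:50."""
    angle = math.pi / 4 + error
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 1j * s], [1j * s, c]]


def phase_shifter(theta):
    """Phase shift applied to the top arm only."""
    return [[cmath.exp(1j * theta), 0], [0, 1]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mzi(theta, phi, err1=0.0, err2=0.0):
    """Light passes: input phase, splitter 1, internal phase, splitter 2."""
    m = phase_shifter(phi)
    m = matmul(beam_splitter(err1), m)
    m = matmul(phase_shifter(theta), m)
    m = matmul(beam_splitter(err2), m)
    return m


def cross_power(theta, err1=0.0, err2=0.0):
    """Fraction of light entering the top port that exits the bottom port."""
    return abs(mzi(theta, 0.0, err1, err2)[1][0]) ** 2


def best_cross_power(err1, err2, steps=3600):
    """Largest top-to-bottom transfer reachable by tuning theta."""
    return max(cross_power(2 * math.pi * k / steps, err1, err2)
               for k in range(steps))


if __name__ == "__main__":
    print(best_cross_power(0.0, 0.0))    # ideal splitters
    print(best_cross_power(0.05, 0.05))  # imperfect splitters
```

    In this model, ideal splitters let `theta` route all of the light to the bottom port, while splitters that are both off by 0.05 radians cap the transfer at cos²(0.1), about 99 percent, whatever the tuning.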

    But errors that can occur in each MZI quickly accumulate as light moves from one device to the next. One can avoid some errors by identifying them in advance and tuning the MZIs so earlier errors are cancelled out by later devices in the array.

    “It is a very simple algorithm if you know what the errors are. But these errors are notoriously difficult to ascertain because you only have access to the inputs and outputs of your chip,” says Hamerly. “This motivated us to look at whether it is possible to create calibration-free error correction.”

    Hamerly and his collaborators previously demonstrated a mathematical technique that went a step further. They could successfully infer the errors and correctly tune the MZIs accordingly, but even this didn’t remove all the error.

    Due to the fundamental nature of an MZI, there are instances where it is impossible to tune a device so all light flows out the bottom port to the next MZI. If the device loses a fraction of light at each step and the array is very large, by the end there will only be a tiny bit of power left.
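
    The compounding described here is simple exponential decay, which a two-line calculation makes vivid. The numbers below are hypothetical: a device that passes 99 percent of the light to the next stage, and an array 500 stages deep.

```python
def remaining_fraction(per_stage_fraction, num_stages):
    """Fraction of the input light still on the intended path after
    num_stages devices, each passing on per_stage_fraction of it."""
    if not 0.0 <= per_stage_fraction <= 1.0:
        raise ValueError("per_stage_fraction must be between 0 and 1")
    return per_stage_fraction ** num_stages


if __name__ == "__main__":
    # Hypothetical values: 1 percent lost per stage, 500 stages.
    print(f"{remaining_fraction(0.99, 500):.4f}")
```

    Under these assumed values, less than 1 percent of the light survives, even though each individual device looks nearly perfect.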

    “Even with error correction, there is a fundamental limit to how good a chip can be. MZIs are physically unable to realize certain settings they need to be configured to,” he says.

    So, the team developed a new type of MZI. The researchers added an additional beam splitter to the end of the device, calling it a 3-MZI because it has three beam splitters instead of two. Due to the way this additional beam splitter mixes the light, it becomes much easier for an MZI to reach the setting it needs to send all light out through its bottom port.

    Importantly, the additional beam splitter is only a few micrometers in size and is a passive component, so it doesn’t require any extra wiring. Adding additional beam splitters doesn’t significantly change the size of the chip.

    Bigger chip, fewer errors

    When the researchers conducted simulations to test their architecture, they found that it can eliminate much of the uncorrectable error that hampers accuracy. And as the optical neural network becomes larger, the amount of error in the device actually drops — the opposite of what happens in a device with standard MZIs.

    Using 3-MZIs, they could potentially create a device big enough for commercial uses with error that has been reduced by a factor of 20, Hamerly says.

    The researchers also developed a variant of the MZI design specifically for correlated errors. These occur due to manufacturing imperfections — if the thickness of a chip is slightly wrong, the MZIs may all be off by about the same amount, so the errors are all about the same. They found a way to change the configuration of an MZI to make it robust to these types of errors. This technique also increased the bandwidth of the optical neural network so it can run three times faster.

    Now that they have showcased these techniques using simulations, Hamerly and his collaborators plan to test these approaches on physical hardware and continue driving toward an optical neural network they can effectively deploy in the real world.

    This research is funded, in part, by a National Science Foundation graduate research fellowship and the U.S. Air Force Office of Scientific Research.

  • in

    MIT Policy Hackathon produces new solutions for technology policy challenges

    Almost three years ago, the Covid-19 pandemic changed the world. Many are still looking to uncover a “new normal.”

    “Instead of going back to normal, [there’s a new generation that] wants to build back something different, something better,” says Jorge Sandoval, a second-year graduate student in MIT’s Technology and Policy Program (TPP) at the Institute for Data, Systems and Society (IDSS). “How do we communicate this mindset to others, that the world cannot be the same as before?”

    This was the inspiration behind “A New (Re)generation,” this year’s theme for the IDSS-student-run MIT Policy Hackathon, which Sandoval helped to organize as the event chair. The Policy Hackathon is a weekend-long, interdisciplinary competition that brings together participants from around the globe to explore potential solutions to some of society’s greatest challenges. 

    Unlike other competitions of its kind, Sandoval says MIT’s event emphasizes a humanistic approach. “The idea of our hackathon is to promote applications of technology that are humanistic or human-centered,” he says. “We take the opportunity to examine aspects of technology in the spaces where they tend to interact with society and people, an opportunity most technical competitions don’t offer because their primary focus is on the technology.”

    The competition started with 50 teams spread across four challenge categories. This year’s categories included Internet and Cybersecurity, Environmental Justice, Logistics, and Housing and City Planning. While some people come into the challenge with friends, Sandoval said most teams form organically during an online networking meeting hosted by MIT.

    “We encourage people to pair up with others outside of their country and to form teams of different diverse backgrounds and ages,” Sandoval says. “We try to give people who are often not invited to the decision-making table the opportunity to be a policymaker, bringing in those with backgrounds in not only law, policy, or politics, but also medicine, and people who have careers in engineering or experience working in nonprofits.”

    Once an in-person event, the Policy Hackathon has gone through its own regeneration process these past three years, according to Sandoval. After going entirely online during the pandemic’s height, last year they successfully hosted the first hybrid version of the event, which served as their model again this year.

    “The hybrid version of the event gives us the opportunity to allow people to connect in a way that is lost if it is only online, while also keeping the wide range of accessibility, allowing people to join from anywhere in the world, regardless of nationality or income, to provide their input,” Sandoval says.

    For Swetha Tadisina, an undergraduate computer science major at Lafayette College and participant in the internet and cybersecurity category, the hackathon was a unique opportunity to meet and work with people much more advanced in their careers. “I was surprised how such a diverse team that had never met before was able to work so efficiently and creatively,” Tadisina says.

    Erika Spangler, a public high school teacher from Massachusetts and member of the environmental justice category’s winning team, says that while each member of “Team Slime Mold” came to the table with a different set of skills, they managed to be in sync from the start — even working across the nine-and-a-half-hour time difference the four-person team faced when working with policy advocate Shruti Nandy from Calcutta, India.

    “We divided the project into data, policy, and research and trusted each other’s expertise,” Spangler says. “Despite having separate areas of focus, we made sure to have regular check-ins to problem-solve and cross-pollinate ideas.”

    During the 48-hour period, her team proposed the creation of an algorithm to identify high-quality brownfields that could be cleaned up and used as sites for building renewable energy. Their corresponding policy sought to mandate additional requirements for renewable energy businesses seeking tax credits from the Inflation Reduction Act.

    “Their policy memo had the most in-depth technical assessment, including deep dives in a few key cities to show the impact of their proposed approach for site selection at a very granular level,” says Amanda Levin, director of policy analysis for the Natural Resources Defense Council (NRDC). Levin acted as both a judge and challenge provider for the environmental justice category.

    “They also presented their policy recommendations in the memo in a well-thought-out way, clearly noting the relevant actor,” she adds. “This clarity around what can be done, and who would be responsible for those actions, is highly valuable for those in policy.”

    Levin says the NRDC, one of the largest environmental nonprofits in the United States, provided five “challenge questions,” making it clear that teams did not need to address all of them. She notes that this gave teams significant leeway, bringing a wide variety of recommendations to the table. 

    “As a challenge partner, the work put together by all the teams is already being used to help inform discussions about the implementation of the Inflation Reduction Act,” Levin says. “Being able to tap into the collective intelligence of the hackathon helped uncover new perspectives and policy solutions that can help make an impact in addressing the important policy challenges we face today.”

    While having partners with experience in data science and policy definitely helped, fellow Team Slime Mold member Sara Sheffels, a PhD candidate in MIT’s biomaterials program, says she was surprised how much her experiences outside of science and policy were relevant to the challenge: “My experience organizing MIT’s Graduate Student Union shaped my ideas about more meaningful community involvement in renewables projects on brownfields. It is not meaningful to merely educate people about the importance of renewables or ask them to sign off on a pre-planned project without addressing their other needs.”

    “I wanted to test my limits, gain exposure, and expand my world,” Tadisina adds. “The exposure, friendships, and experiences you gain in such a short period of time are incredible.”

    For Willy R. Vasquez, an electrical and computer engineering PhD student at the University of Texas, the hackathon is not to be missed. “If you’re interested in the intersection of tech, society, and policy, then this is a must-do experience.”

  • in

    A far-sighted approach to machine learning

    Picture two teams squaring off on a football field. The players can cooperate to achieve an objective, and compete against other players with conflicting interests. That’s how the game works.

    Creating artificial intelligence agents that can learn to compete and cooperate as effectively as humans remains a thorny problem. A key challenge is enabling AI agents to anticipate future behaviors of other agents when they are all learning simultaneously.

    Because of the complexity of this problem, current approaches tend to be myopic; the agents can only guess the next few moves of their teammates or competitors, which leads to poor performance in the long run. 

    Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere have developed a new approach that gives AI agents a farsighted perspective. Their machine-learning framework enables cooperative or competitive AI agents to consider what other agents will do as time approaches infinity, not just over the next few steps. The agents then adapt their behaviors accordingly to influence other agents’ future behaviors and arrive at an optimal, long-term solution.

    This framework could be used by a group of autonomous drones working together to find a lost hiker in a thick forest, or by self-driving cars that strive to keep passengers safe by anticipating future moves of other vehicles driving on a busy highway.

    “When AI agents are cooperating or competing, what matters most is when their behaviors converge at some point in the future. There are a lot of transient behaviors along the way that don’t matter very much in the long run. Reaching this converged behavior is what we really care about, and we now have a mathematical way to enable that,” says Dong-Ki Kim, a graduate student in the MIT Laboratory for Information and Decision Systems (LIDS) and lead author of a paper describing this framework.

    The senior author is Jonathan P. How, the Richard C. Maclaurin Professor of Aeronautics and Astronautics and a member of the MIT-IBM Watson AI Lab. Co-authors include others at the MIT-IBM Watson AI Lab, IBM Research, Mila-Quebec Artificial Intelligence Institute, and Oxford University. The research will be presented at the Conference on Neural Information Processing Systems.

    In this demo video, the red robot, which has been trained using the researchers’ machine-learning system, is able to defeat the green robot by learning more effective behaviors that take advantage of the constantly changing strategy of its opponent.

    More agents, more problems

    The researchers focused on a problem known as multiagent reinforcement learning. Reinforcement learning is a form of machine learning in which an AI agent learns by trial and error. Researchers give the agent a reward for “good” behaviors that help it achieve a goal. The agent adapts its behavior to maximize that reward until it eventually becomes an expert at a task.
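
    The trial-and-error loop can be shown with the simplest reinforcement learning setting, a multi-armed bandit with an epsilon-greedy agent. This is a standard textbook sketch, not the FURTHER framework; the function name, the reward values, and the exploration rate are assumptions.

```python
import random


def run_bandit(reward_fn, n_arms, steps, epsilon, rng):
    """Learn the average reward of each action by trial and error.

    reward_fn : callable taking an action index and returning a reward
    epsilon   : probability of trying a random action instead of the best
    rng       : random.Random instance, passed in for reproducibility
    """
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    for t in range(steps):
        if t < n_arms:
            arm = t  # try every action once before trusting estimates
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = reward_fn(arm)
        counts[arm] += 1
        # Incremental update of the running mean for this action.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


if __name__ == "__main__":
    rng = random.Random(0)
    win_rates = [0.2, 0.8, 0.5]  # hypothetical chance of reward per action

    def noisy_reward(arm):
        return 1.0 if rng.random() < win_rates[arm] else 0.0

    estimates, counts = run_bandit(noisy_reward, 3, 2000, 0.1, rng)
    print(estimates, counts)
```

    The agent mostly repeats the action with the best estimated reward and occasionally tries others so that its estimates stay honest.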

    But when many cooperative or competing agents are simultaneously learning, things become increasingly complex. As agents consider more future steps of their fellow agents, and how their own behavior influences others, the problem soon requires far too much computational power to solve efficiently. This is why other approaches only focus on the short term.

    “The AIs really want to think about the end of the game, but they don’t know when the game will end. They need to think about how to keep adapting their behavior into infinity so they can win at some far time in the future. Our paper essentially proposes a new objective that enables an AI to think about infinity,” says Kim.

    But since it is impossible to plug infinity into an algorithm, the researchers designed their system so agents focus on a future point where their behavior will converge with that of other agents, known as equilibrium. An equilibrium point determines the long-term performance of agents, and multiple equilibria can exist in a multiagent scenario. Therefore, an effective agent actively influences the future behaviors of other agents in such a way that they reach a desirable equilibrium from the agent’s perspective. If all agents influence each other, they converge to a general concept that the researchers call an “active equilibrium.”
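
    That multiple equilibria can coexist is easy to see in a small two-player game. The sketch below enumerates pure-strategy Nash equilibria, a standard game-theory notion used here only as a stand-in; it is not the "active equilibrium" the researchers define, and the payoff numbers are a hypothetical "stag hunt" game.

```python
def pure_nash_equilibria(row_payoff, col_payoff):
    """Return every (row action, column action) pair where neither player
    can gain by changing only their own action."""
    n_rows, n_cols = len(row_payoff), len(row_payoff[0])
    equilibria = []
    for r in range(n_rows):
        for c in range(n_cols):
            row_best = all(row_payoff[r][c] >= row_payoff[r2][c]
                           for r2 in range(n_rows))
            col_best = all(col_payoff[r][c] >= col_payoff[r][c2]
                           for c2 in range(n_cols))
            if row_best and col_best:
                equilibria.append((r, c))
    return equilibria


# Hypothetical payoffs: action 0 = cooperate on a big prize,
# action 1 = settle for a safe small prize.
stag_hunt_row = [[4, 0], [3, 3]]
stag_hunt_col = [[4, 3], [0, 3]]

if __name__ == "__main__":
    print(pure_nash_equilibria(stag_hunt_row, stag_hunt_col))
```

    Both "everyone cooperates" and "everyone plays safe" are stable in this game, but the first pays more, which is why an agent that can steer others toward the better equilibrium has an advantage.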

    The machine-learning framework they developed, known as FURTHER (which stands for FUlly Reinforcing acTive influence witH averagE Reward), enables agents to learn how to adapt their behaviors as they interact with other agents to achieve this active equilibrium.

    FURTHER does this using two machine-learning modules. The first, an inference module, enables an agent to guess the future behaviors of other agents and the learning algorithms they use, based solely on their prior actions.

    This information is fed into the reinforcement learning module, which the agent uses to adapt its behavior and influence other agents in a way that maximizes its reward.

    “The challenge was thinking about infinity. We had to use a lot of different mathematical tools to enable that, and make some assumptions to get it to work in practice,” Kim says.

    Winning in the long run

    They tested their approach against other multiagent reinforcement learning frameworks in several different scenarios, including a pair of robots fighting sumo-style and a battle pitting two 25-agent teams against one another. In both instances, the AI agents using FURTHER won the games more often.

    Since their approach is decentralized, which means the agents learn to win the games independently, it is also more scalable than other methods that require a central computer to control the agents, Kim explains.

    The researchers used games to test their approach, but FURTHER could be used to tackle any kind of multiagent problem. For instance, it could be applied by economists seeking to develop sound policy in situations where many interacting entities have behaviors and interests that change over time.

    Economics is one application Kim is particularly excited about studying. He also wants to dig deeper into the concept of an active equilibrium and continue enhancing the FURTHER framework.

    This research is funded, in part, by the MIT-IBM Watson AI Lab.