More stories

  • in

    Zero-trust architecture may hold the answer to cybersecurity insider threats

    For years, organizations have taken a defensive “castle-and-moat” approach to cybersecurity, seeking to secure the perimeters of their networks to block out any malicious actors. Individuals with the right credentials were assumed to be trustworthy and allowed access to a network’s systems and data without having to reauthorize themselves at each access attempt. However, organizations today increasingly store data in the cloud and allow employees to connect to the network remotely, both of which create vulnerabilities to this traditional approach. A more secure future may require a “zero-trust architecture,” in which users must prove their authenticity each time they access a network application or data.

    In May 2021, President Joe Biden’s Executive Order on Improving the Nation’s Cybersecurity outlined a goal for federal agencies to implement zero-trust security. Since then, MIT Lincoln Laboratory has been performing a study on zero-trust architectures, with the goals of reviewing their implementation in government and industry, identifying technical gaps and opportunities, and developing a set of recommendations for the United States’ approach to a zero-trust system.

    The study team’s first step was to define the term “zero trust” and understand the misperceptions in the field surrounding the concept. Some of these misperceptions suggest that a zero-trust architecture requires entirely new equipment to implement, or that it makes systems so “locked down” they’re not usable. 

    “Part of the reason why there is a lot of confusion about what zero trust is, is because it takes what the cybersecurity world has known about for many years and applies it in a different way,” says Jeffrey Gottschalk, the assistant head of Lincoln Laboratory’s Cyber Security and Information Sciences Division and study’s co-lead. “It is a paradigm shift in terms of how to think about security, but holistically it takes a lot of things that we already know how to do — such as multi-factor authentication, encryption, and software-defined networking­ — and combines them in different ways.”

    Play video

    Presentation: Overview of Zero Trust Architectures

    Recent high-profile cybersecurity incidents — such as those involving the National Security Agency, the U.S. Office of Personnel Management, Colonial Pipeline, SolarWinds, and Sony Pictures — highlight the vulnerability of systems and the need to rethink cybersecurity approaches.

    The study team reviewed recent, impactful cybersecurity incidents to identify which security principles were most responsible for the scale and impact of the attack. “We noticed that while a number of these attacks exploited previously unknown implementation vulnerabilities (also known as ‘zero-days’), the vast majority actually were due to the exploitation of operational security principles,” says Christopher Roeser, study co-lead and the assistant head of the Homeland Protection and Air Traffic Control Division, “that is, the gaining of individuals’ credentials, and the movement within a well-connected network that allows users to gather a significant amount of information or have very widespread effects.”

    In other words, the malicious actor had “breached the moat” and effectively became an insider.

    Zero-trust security principles could protect against this type of insider threat by treating every component, service, and user of a system as continuously exposed to and potentially compromised by a malicious actor. A user’s identity is verified each time that they request to access a new resource, and every access is mediated, logged, and analyzed. It’s like putting trip wires all over the inside of a network system, says Gottschalk. “So, when an adversary trips over that trip wire, you’ll get a signal and can validate that signal and see what’s going on.”

    In practice, a zero-trust approach could look like replacing a single-sign-on system, which lets users sign in just once for access to multiple applications, with a cloud-based identity that is known and verified. “Today, a lot of organizations have different ways that people authenticate and log onto systems, and many of those have been aggregated for expediency into single-sign-on capabilities, just to make it easier for people to log onto their systems. But we envision a future state that embraces zero trust, where identity verification is enabled by cloud-based identity that’s portable and ubiquitous, and very secure itself.”

    While conducting their study, the team spoke to approximately 10 companies and government organizations that have adopted zero-trust implementations — either through cloud services, in-house management, or a combination of both. They found the hybrid approach to be a good model for government organizations to adopt. They also found that the implementation could take from three to five years. “We talked to organizations that have actually done implementations of zero trust, and all of them have indicated that significant organizational commitment and change was required to be able to implement them,” Gottschalk says.

    But a key takeaway from the study is that there isn’t a one-size-fits-all approach to zero trust. “It’s why we think that having test-bed and pilot efforts are going to be very important to balance out zero-trust security with the mission needs of those systems,” Gottschalk says. The team also recognizes the importance of conducting ongoing research and development beyond initial zero-trust implementations, to continue to address evolving threats.

    Lincoln Laboratory will present further findings from the study at its upcoming Cyber Technology for National Security conference, which will be held June 28-29. The conference will also offer a short course for attendees to learn more about the benefits and implementations of zero-trust architectures.  More

  • in

    On the road to cleaner, greener, and faster driving

    No one likes sitting at a red light. But signalized intersections aren’t just a minor nuisance for drivers; vehicles consume fuel and emit greenhouse gases while waiting for the light to change.

    What if motorists could time their trips so they arrive at the intersection when the light is green? While that might be just a lucky break for a human driver, it could be achieved more consistently by an autonomous vehicle that uses artificial intelligence to control its speed.

    In a new study, MIT researchers demonstrate a machine-learning approach that can learn to control a fleet of autonomous vehicles as they approach and travel through a signalized intersection in a way that keeps traffic flowing smoothly.

    Using simulations, they found that their approach reduces fuel consumption and emissions while improving average vehicle speed. The technique gets the best results if all cars on the road are autonomous, but even if only 25 percent use their control algorithm, it still leads to substantial fuel and emissions benefits.

    “This is a really interesting place to intervene. No one’s life is better because they were stuck at an intersection. With a lot of other climate change interventions, there is a quality-of-life difference that is expected, so there is a barrier to entry there. Here, the barrier is much lower,” says senior author Cathy Wu, the Gilbert W. Winslow Career Development Assistant Professor in the Department of Civil and Environmental Engineering and a member of the Institute for Data, Systems, and Society (IDSS) and the Laboratory for Information and Decision Systems (LIDS).

    The lead author of the study is Vindula Jayawardana, a graduate student in LIDS and the Department of Electrical Engineering and Computer Science. The research will be presented at the European Control Conference.

    Intersection intricacies

    While humans may drive past a green light without giving it much thought, intersections can present billions of different scenarios depending on the number of lanes, how the signals operate, the number of vehicles and their speeds, the presence of pedestrians and cyclists, etc.

    Typical approaches for tackling intersection control problems use mathematical models to solve one simple, ideal intersection. That looks good on paper, but likely won’t hold up in the real world, where traffic patterns are often about as messy as they come.

    Wu and Jayawardana shifted gears and approached the problem using a model-free technique known as deep reinforcement learning. Reinforcement learning is a trial-and-error method where the control algorithm learns to make a sequence of decisions. It is rewarded when it finds a good sequence. With deep reinforcement learning, the algorithm leverages assumptions learned by a neural network to find shortcuts to good sequences, even if there are billions of possibilities.

    This is useful for solving a long-horizon problem like this; the control algorithm must issue upwards of 500 acceleration instructions to a vehicle over an extended time period, Wu explains.

    “And we have to get the sequence right before we know that we have done a good job of mitigating emissions and getting to the intersection at a good speed,” she adds.

    But there’s an additional wrinkle. The researchers want the system to learn a strategy that reduces fuel consumption and limits the impact on travel time. These goals can be conflicting.

    “To reduce travel time, we want the car to go fast, but to reduce emissions, we want the car to slow down or not move at all. Those competing rewards can be very confusing to the learning agent,” Wu says.

    While it is challenging to solve this problem in its full generality, the researchers employed a workaround using a technique known as reward shaping. With reward shaping, they give the system some domain knowledge it is unable to learn on its own. In this case, they penalized the system whenever the vehicle came to a complete stop, so it would learn to avoid that action.

    Traffic tests

    Once they developed an effective control algorithm, they evaluated it using a traffic simulation platform with a single intersection. The control algorithm is applied to a fleet of connected autonomous vehicles, which can communicate with upcoming traffic lights to receive signal phase and timing information and observe their immediate surroundings. The control algorithm tells each vehicle how to accelerate and decelerate.

    Their system didn’t create any stop-and-go traffic as vehicles approached the intersection. (Stop-and-go traffic occurs when cars are forced to come to a complete stop due to stopped traffic ahead). In simulations, more cars made it through in a single green phase, which outperformed a model that simulates human drivers. When compared to other optimization methods also designed to avoid stop-and-go traffic, their technique resulted in larger fuel consumption and emissions reductions. If every vehicle on the road is autonomous, their control system can reduce fuel consumption by 18 percent and carbon dioxide emissions by 25 percent, while boosting travel speeds by 20 percent.

    “A single intervention having 20 to 25 percent reduction in fuel or emissions is really incredible. But what I find interesting, and was really hoping to see, is this non-linear scaling. If we only control 25 percent of vehicles, that gives us 50 percent of the benefits in terms of fuel and emissions reduction. That means we don’t have to wait until we get to 100 percent autonomous vehicles to get benefits from this approach,” she says.

    Down the road, the researchers want to study interaction effects between multiple intersections. They also plan to explore how different intersection set-ups (number of lanes, signals, timings, etc.) can influence travel time, emissions, and fuel consumption. In addition, they intend to study how their control system could impact safety when autonomous vehicles and human drivers share the road. For instance, even though autonomous vehicles may drive differently than human drivers, slower roadways and roadways with more consistent speeds could improve safety, Wu says.

    While this work is still in its early stages, Wu sees this approach as one that could be more feasibly implemented in the near-term.

    “The aim in this work is to move the needle in sustainable mobility. We want to dream, as well, but these systems are big monsters of inertia. Identifying points of intervention that are small changes to the system but have significant impact is something that gets me up in the morning,” she says.  

    This work was supported, in part, by the MIT-IBM Watson AI Lab. More

  • in

    Technique protects privacy when making online recommendations

    Algorithms recommend products while we shop online or suggest songs we might like as we listen to music on streaming apps.

    These algorithms work by using personal information like our past purchases and browsing history to generate tailored recommendations. The sensitive nature of such data makes preserving privacy extremely important, but existing methods for solving this problem rely on heavy cryptographic tools requiring enormous amounts of computation and bandwidth.

    MIT researchers may have a better solution. They developed a privacy-preserving protocol that is so efficient it can run on a smartphone over a very slow network. Their technique safeguards personal data while ensuring recommendation results are accurate.

    In addition to user privacy, their protocol minimizes the unauthorized transfer of information from the database, known as leakage, even if a malicious agent tries to trick a database into revealing secret information.

    The new protocol could be especially useful in situations where data leaks could violate user privacy laws, like when a health care provider uses a patient’s medical history to search a database for other patients who had similar symptoms or when a company serves targeted advertisements to users under European privacy regulations.

    “This is a really hard problem. We relied on a whole string of cryptographic and algorithmic tricks to arrive at our protocol,” says Sacha Servan-Schreiber, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of the paper that presents this new protocol.

    Servan-Schreiber wrote the paper with fellow CSAIL graduate student Simon Langowski and their advisor and senior author Srinivas Devadas, the Edwin Sibley Webster Professor of Electrical Engineering. The research will be presented at the IEEE Symposium on Security and Privacy.

    The data next door

    The technique at the heart of algorithmic recommendation engines is known as a nearest neighbor search, which involves finding the data point in a database that is closest to a query point. Data points that are mapped nearby share similar attributes and are called neighbors.

    These searches involve a server that is linked with an online database which contains concise representations of data point attributes. In the case of a music streaming service, those attributes, known as feature vectors, could be the genre or popularity of different songs.

    To find a song recommendation, the client (user) sends a query to the server that contains a certain feature vector, like a genre of music the user likes or a compressed history of their listening habits. The server then provides the ID of a feature vector in the database that is closest to the client’s query, without revealing the actual vector. In the case of music streaming, that ID would likely be a song title. The client learns the recommended song title without learning the feature vector associated with it.

    “The server has to be able to do this computation without seeing the numbers it is doing the computation on. It can’t actually see the features, but still needs to give you the closest thing in the database,” says Langowski.

    To achieve this, the researchers created a protocol that relies on two separate servers that access the same database. Using two servers makes the process more efficient and enables the use of a cryptographic technique known as private information retrieval. This technique allows a client to query a database without revealing what it is searching for, Servan-Schreiber explains.

    Overcoming security challenges

    But while private information retrieval is secure on the client side, it doesn’t provide database privacy on its own. The database offers a set of candidate vectors — possible nearest neighbors — for the client, which are typically winnowed down later by the client using brute force. However, doing so can reveal a lot about the database to the client. The additional privacy challenge is to prevent the client from learning those extra vectors. 

    The researchers employed a tuning technique that eliminates many of the extra vectors in the first place, and then used a different trick, which they call oblivious masking, to hide any additional data points except for the actual nearest neighbor. This efficiently preserves database privacy, so the client won’t learn anything about the feature vectors in the database.  

    Once they designed this protocol, they tested it with a nonprivate implementation on four real-world datasets to determine how to tune the algorithm to maximize accuracy. Then, they used their protocol to conduct private nearest neighbor search queries on those datasets.

    Their technique requires a few seconds of server processing time per query and less than 10 megabytes of communication between the client and servers, even with databases that contained more than 10 million items. By contrast, other secure methods can require gigabytes of communication or hours of computation time. With each query, their method achieved greater than 95 percent accuracy (meaning that nearly every time it found the actual approximate nearest neighbor to the query point). 

    The techniques they used to enable database privacy will thwart a malicious client even if it sends false queries to try and trick the server into leaking information.

    “A malicious client won’t learn much more information than an honest client following protocol. And it protects against malicious servers, too. If one deviates from protocol, you might not get the right result, but they will never learn what the client’s query was,” Langowski says.

    In the future, the researchers plan to adjust the protocol so it can preserve privacy using only one server. This could enable it to be applied in more real-world situations, since it would not require the use of two noncolluding entities (which don’t share information with each other) to manage the database.  

    “Nearest neighbor search undergirds many critical machine-learning driven applications, from providing users with content recommendations to classifying medical conditions. However, it typically requires sharing a lot of data with a central system to aggregate and enable the search,” says Bayan Bruss, head of applied machine-learning research at Capital One, who was not involved with this work. “This research provides a key step towards ensuring that the user receives the benefits from nearest neighbor search while having confidence that the central system will not use their data for other purposes.” More

  • in

    MIT to launch new Office of Research Computing and Data

    As the computing and data needs of MIT’s research community continue to grow — both in their quantity and complexity — the Institute is launching a new effort to ensure that researchers have access to the advanced computing resources and data management services they need to do their best work. 

    At the core of this effort is the creation of the new Office of Research Computing and Data (ORCD), to be led by Professor Peter Fisher, who will step down as head of the Department of Physics to serve as the office’s inaugural director. The office, which formally opens in September, will build on and replace the MIT Research Computing Project, an initiative supported by the Office of the Vice President for Research, which contributed in recent years to improving the computing resources available to MIT researchers.

    “Almost every scientific field makes use of research computing to carry out our mission at MIT — and computing needs vary between different research groups. In my world, high-energy physics experiments need large amounts of storage and many identical general-purpose CPUs, while astrophysical theorists simulating the formation of galaxy clusters need relatively little storage, but many CPUs with high-speed connections between them,” says Fisher, the Thomas A. Frank (1977) Professor of Physics, who will take up the mantle of ORCD director on Sept. 1.

    “I envision ORCD to be, at a minimum, a centralized system with a spectrum of different capabilities to allow our MIT researchers to start their projects and understand the computational resources needed to execute them,” Fisher adds.

    The Office of Research Computing and Data will provide services spanning hardware, software, and cloud solutions, including data storage and retrieval, and offer advice, training, documentation, and data curation for MIT’s research community. It will also work to develop innovative solutions that address emerging or highly specialized needs, and it will advance strategic collaborations with industry.

    The exceptional performance of MIT’s endowment last year has provided a unique opportunity for MIT to distribute endowment funds to accelerate progress on an array of Institute priorities in fiscal year 2023, beginning July 1, 2022. On the basis of community input and visiting committee feedback, MIT’s leadership identified research computing as one such priority, enabling the expanded effort that the Institute commenced today. Future operation of ORCD will incorporate a cost-recovery model.

    In his new role, Fisher will report to Maria Zuber, MIT’s vice president for research, and coordinate closely with MIT Information Systems and Technology (IS&T), MIT Libraries, and the deans of the five schools and the MIT Schwarzman College of Computing, among others. He will also work closely with Provost Cindy Barnhart.

    “I am thrilled that Peter has agreed to take on this important role,” says Zuber. “Under his leadership, I am confident that we’ll be able to build on the important progress of recent years to deliver to MIT researchers best-in-class infrastructure, services, and expertise so they can maximize the performance of their research.”

    MIT’s research computing capabilities have grown significantly in recent years. Ten years ago, the Institute joined with a number of other Massachusetts universities to establish the Massachusetts Green High-Performance Computing Center (MGHPCC) in Holyoke to provide the high-performance, low-carbon computing power necessary to carry out cutting-edge research while reducing its environmental impact. MIT’s capacity at the MGHPCC is now almost fully utilized, however, and an expansion is underway.

    The need for more advanced computing capacity is not the only issue to be addressed. Over the last decade, there have been considerable advances in cloud computing, which is increasingly used in research computing, requiring the Institute to take a new look at how it works with cloud services providers and then allocates cloud resources to departments, labs, and centers. And MIT’s longstanding model for research computing — which has been mostly decentralized — can lead to inefficiencies and inequities among departments, even as it offers flexibility.

    The Institute has been carefully assessing how to address these issues for several years, including in connection with the establishment of the MIT Schwarzman College of Computing. In August 2019, a college task force on computing infrastructure found a “campus-wide preference for an overarching organizational model of computing infrastructure that transcends a college or school and most logically falls under senior leadership.” The task force’s report also addressed the need for a better balance between centralized and decentralized research computing resources.

    “The needs for computing infrastructure and support vary considerably across disciplines,” says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing and the Henry Ellis Warren Professor of Electrical Engineering and Computer Science. “With the new Office of Research Computing and Data, the Institute is seizing the opportunity to transform its approach to supporting research computing and data, including not only hardware and cloud computing but also expertise. This move is a critical step forward in supporting MIT’s research and scholarship.”

    Over time, ORCD (pronounced “orchid”) aims to recruit a staff of professionals, including data scientists and engineers and system and hardware administrators, who will enhance, support, and maintain MIT’s research computing infrastructure, and ensure that all researchers on campus have access to a minimum level of advanced computing and data management.

    The new research computing and data effort is part of a broader push to modernize MIT’s information technology infrastructure and systems. “We are at an inflection point, where we have a significant opportunity to invest in core needs, replace or upgrade aging systems, and respond fully to the changing needs of our faculty, students, and staff,” says Mark Silis, MIT’s vice president for information systems and technology. “We are thrilled to have a new partner in the Office of Research Computing and Data as we embark on this important work.” More

  • in

    Artificial intelligence system learns concepts shared across video, audio, and text

    Humans observe the world through a combination of different modalities, like vision, hearing, and our understanding of language. Machines, on the other hand, interpret the world through data that algorithms can process.

    So, when a machine “sees” a photo, it must encode that photo into data it can use to perform a task like image classification. This process becomes more complicated when inputs come in multiple formats, like videos, audio clips, and images.

    “The main challenge here is, how can a machine align those different modalities? As humans, this is easy for us. We see a car and then hear the sound of a car driving by, and we know these are the same thing. But for machine learning, it is not that straightforward,” says Alexander Liu, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and first author of a paper tackling this problem. 

    Liu and his collaborators developed an artificial intelligence technique that learns to represent data in a way that captures concepts which are shared between visual and audio modalities. For instance, their method can learn that the action of a baby crying in a video is related to the spoken word “crying” in an audio clip.

    Using this knowledge, their machine-learning model can identify where a certain action is taking place in a video and label it.

    It performs better than other machine-learning methods at cross-modal retrieval tasks, which involve finding a piece of data, like a video, that matches a user’s query given in another form, like spoken language. Their model also makes it easier for users to see why the machine thinks the video it retrieved matches their query.

    This technique could someday be utilized to help robots learn about concepts in the world through perception, more like the way humans do.

    Joining Liu on the paper are CSAIL postdoc SouYoung Jin; grad students Cheng-I Jeff Lai and Andrew Rouditchenko; Aude Oliva, senior research scientist in CSAIL and MIT director of the MIT-IBM Watson AI Lab; and senior author James Glass, senior research scientist and head of the Spoken Language Systems Group in CSAIL. The research will be presented at the Annual Meeting of the Association for Computational Linguistics.

    Learning representations

    The researchers focus their work on representation learning, which is a form of machine learning that seeks to transform input data to make it easier to perform a task like classification or prediction.

    The representation learning model takes raw data, such as videos and their corresponding text captions, and encodes them by extracting features, or observations about objects and actions in the video. Then it maps those data points in a grid, known as an embedding space. The model clusters similar data together as single points in the grid. Each of these data points, or vectors, is represented by an individual word.

    For instance, a video clip of a person juggling might be mapped to a vector labeled “juggling.”

    The researchers constrain the model so it can only use 1,000 words to label vectors. The model can decide which actions or concepts it wants to encode into a single vector, but it can only use 1,000 vectors. The model chooses the words it thinks best represent the data.

    Rather than encoding data from different modalities onto separate grids, their method employs a shared embedding space where two modalities can be encoded together. This enables the model to learn the relationship between representations from two modalities, like video that shows a person juggling and an audio recording of someone saying “juggling.”

    To help the system process data from multiple modalities, they designed an algorithm that guides the machine to encode similar concepts into the same vector.

    “If there is a video about pigs, the model might assign the word ‘pig’ to one of the 1,000 vectors. Then if the model hears someone saying the word ‘pig’ in an audio clip, it should still use the same vector to encode that,” Liu explains.

    A better retriever

    They tested the model on cross-modal retrieval tasks using three datasets: a video-text dataset with video clips and text captions, a video-audio dataset with video clips and spoken audio captions, and an image-audio dataset with images and spoken audio captions.

    For example, in the video-audio dataset, the model chose 1,000 words to represent the actions in the videos. Then, when the researchers fed it audio queries, the model tried to find the clip that best matched those spoken words.

    “Just like a Google search, you type in some text and the machine tries to tell you the most relevant things you are searching for. Only we do this in the vector space,” Liu says.

    Not only was their technique more likely to find better matches than the models they compared it to, it is also easier to understand.

    Because the model could only use 1,000 total words to label vectors, a user can more see easily which words the machine used to conclude that the video and spoken words are similar. This could make the model easier to apply in real-world situations where it is vital that users understand how it makes decisions, Liu says.

    The model still has some limitations they hope to address in future work. For one, their research focused on data from two modalities at a time, but in the real world humans encounter many data modalities simultaneously, Liu says.

    “And we know 1,000 words works on this kind of dataset, but we don’t know if it can be generalized to a real-world problem,” he adds.

    Plus, the images and videos in their datasets contained simple objects or straightforward actions; real-world data are much messier. They also want to determine how well their method scales up when there is a wider diversity of inputs.

    This research was supported, in part, by the MIT-IBM Watson AI Lab and its member companies, Nexplore and Woodside, and by the MIT Lincoln Laboratory. More

  • in

    What words can convey

    From search engines to voice assistants, computers are getting better at understanding what we mean. That’s thanks to language-processing programs that make sense of a staggering number of words, without ever being told explicitly what those words mean. Such programs infer meaning instead through statistics — and a new study reveals that this computational approach can assign many kinds of information to a single word, just like the human brain.

    The study, published April 14 in the journal Nature Human Behavior, was co-led by Gabriel Grand, a graduate student in electrical engineering and computer science who is affiliated with MIT’s Computer Science and Artificial Intelligence Laboratory, and Idan Blank PhD ’16, an assistant professor at the University of California at Los Angeles. The work was supervised by McGovern Institute for Brain Research investigator Ev Fedorenko, a cognitive neuroscientist who studies how the human brain uses and understands language, and Francisco Pereira at the National Institute of Mental Health. Fedorenko says the rich knowledge her team was able to find within computational language models demonstrates just how much can be learned about the world through language alone.

    The research team began its analysis of statistics-based language processing models in 2015, when the approach was new. Such models derive meaning by analyzing how often pairs of words co-occur in texts and using those relationships to assess the similarities of words’ meanings. For example, such a program might conclude that “bread” and “apple” are more similar to one another than they are to “notebook,” because “bread” and “apple” are often found in proximity to words like “eat” or “snack,” whereas “notebook” is not.

    The models were clearly good at measuring words’ overall similarity to one another. But most words carry many kinds of information, and their similarities depend on which qualities are being evaluated. “Humans can come up with all these different mental scales to help organize their understanding of words,” explains Grand, a former undergraduate researcher in the Fedorenko lab. For example, he says, “dolphins and alligators might be similar in size, but one is much more dangerous than the other.”

    Grand and Blank, who was then a graduate student at the McGovern Institute, wanted to know whether the models captured that same nuance. And if they did, how was the information organized?

    To learn how the information in such a model stacked up to humans’ understanding of words, the team first asked human volunteers to score words along many different scales: Were the concepts those words conveyed big or small, safe or dangerous, wet or dry? Then, having mapped where people position different words along these scales, they looked to see whether language processing models did the same.

    Grand explains that distributional semantic models use co-occurrence statistics to organize words into a huge, multidimensional matrix. The more similar words are to one another, the closer they are within that space. The dimensions of the space are vast, and there is no inherent meaning built into its structure. “In these word embeddings, there are hundreds of dimensions, and we have no idea what any dimension means,” he says. “We’re really trying to peer into this black box and say, ‘is there structure in here?’”

    Specifically, they asked whether the semantic scales they had asked their volunteers use were represented in the model. So they looked to see where words in the space lined up along vectors defined by the extremes of those scales. Where did dolphins and tigers fall on line from “big” to “small,” for example? And were they closer together along that line than they were on a line representing danger (“safe” to “dangerous”)?

    Across more than 50 sets of world categories and semantic scales, they found that the model had organized words very much like the human volunteers. Dolphins and tigers were judged to be similar in terms of size, but far apart on scales measuring danger or wetness. The model had organized the words in a way that represented many kinds of meaning — and it had done so based entirely on the words’ co-occurrences.

    That, Fedorenko says, tells us something about the power of language. “The fact that we can recover so much of this rich semantic information from just these simple word co-occurrence statistics suggests that this is one very powerful source of learning about things that you may not even have direct perceptual experience with.” More

  • in

    Engineers use artificial intelligence to capture the complexity of breaking waves

    Waves break once they swell to a critical height, before cresting and crashing into a spray of droplets and bubbles. These waves can be as large as a surfer’s point break and as small as a gentle ripple rolling to shore. For decades, the dynamics of how and when a wave breaks have been too complex to predict.

    Now, MIT engineers have found a new way to model how waves break. The team used machine learning along with data from wave-tank experiments to tweak equations that have traditionally been used to predict wave behavior. Engineers typically rely on such equations to help them design resilient offshore platforms and structures. But until now, the equations have not been able to capture the complexity of breaking waves.

    The updated model made more accurate predictions of how and when waves break, the researchers found. For instance, the model estimated a wave’s steepness just before breaking, and its energy and frequency after breaking, more accurately than the conventional wave equations.

    Their results, published today in the journal Nature Communications, will help scientists understand how a breaking wave affects the water around it. Knowing precisely how these waves interact can help hone the design of offshore structures. It can also improve predictions for how the ocean interacts with the atmosphere. Having better estimates of how waves break can help scientists predict, for instance, how much carbon dioxide and other atmospheric gases the ocean can absorb.

    “Wave breaking is what puts air into the ocean,” says study author Themis Sapsis, an associate professor of mechanical and ocean engineering and an affiliate of the Institute for Data, Systems, and Society at MIT. “It may sound like a detail, but if you multiply its effect over the area of the entire ocean, wave breaking starts becoming fundamentally important to climate prediction.”

    The study’s co-authors include lead author and MIT postdoc Debbie Eeltink, Hubert Branger and Christopher Luneau of Aix-Marseille University, Amin Chabchoub of Kyoto University, Jerome Kasparian of the University of Geneva, and T.S. van den Bremer of Delft University of Technology.

    Learning tank

    To predict the dynamics of a breaking wave, scientists typically take one of two approaches: They either attempt to precisely simulate the wave at the scale of individual molecules of water and air, or they run experiments to try and characterize waves with actual measurements. The first approach is computationally expensive and difficult to simulate even over a small area; the second requires a huge amount of time to run enough experiments to yield statistically significant results.

    The MIT team instead borrowed pieces from both approaches to develop a more efficient and accurate model using machine learning. The researchers started with a set of equations that is considered the standard description of wave behavior. They aimed to improve the model by “training” the model on data of breaking waves from actual experiments.

    “We had a simple model that doesn’t capture wave breaking, and then we had the truth, meaning experiments that involve wave breaking,” Eeltink explains. “Then we wanted to use machine learning to learn the difference between the two.”

    The researchers obtained wave breaking data by running experiments in a 40-meter-long tank. The tank was fitted at one end with a paddle which the team used to initiate each wave. The team set the paddle to produce a breaking wave in the middle of the tank. Gauges along the length of the tank measured the water’s height as waves propagated down the tank.

    “It takes a lot of time to run these experiments,” Eeltink says. “Between each experiment you have to wait for the water to completely calm down before you launch the next experiment, otherwise they influence each other.”

    Safe harbor

    In all, the team ran about 250 experiments, the data from which they used to train a type of machine-learning algorithm known as a neural network. Specifically, the algorithm is trained to compare the real waves in experiments with the predicted waves in the simple model, and based on any differences between the two, the algorithm tunes the model to fit reality.

    After training the algorithm on their experimental data, the team introduced the model to entirely new data — in this case, measurements from two independent experiments, each run at separate wave tanks with different dimensions. In these tests, they found the updated model made more accurate predictions than the simple, untrained model, for instance making better estimates of a breaking wave’s steepness.

    The new model also captured an essential property of breaking waves known as the “downshift,” in which the frequency of a wave is shifted to a lower value. The speed of a wave depends on its frequency. For ocean waves, lower frequencies move faster than higher frequencies. Therefore, after the downshift, the wave will move faster. The new model predicts the change in frequency, before and after each breaking wave, which could be especially relevant in preparing for coastal storms.

    “When you want to forecast when high waves of a swell would reach a harbor, and you want to leave the harbor before those waves arrive, then if you get the wave frequency wrong, then the speed at which the waves are approaching is wrong,” Eeltink says.

    The team’s updated wave model is in the form of an open-source code that others could potentially use, for instance in climate simulations of the ocean’s potential to absorb carbon dioxide and other atmospheric gases. The code can also be worked into simulated tests of offshore platforms and coastal structures.

    “The number one purpose of this model is to predict what a wave will do,” Sapsis says. “If you don’t model wave breaking right, it would have tremendous implications for how structures behave. With this, you could simulate waves to help design structures better, more efficiently, and without huge safety factors.”

    This research is supported, in part, by the Swiss National Science Foundation, and by the U.S. Office of Naval Research. More

  • in

    Seven from MIT elected to American Academy of Arts and Sciences for 2022

    Seven MIT faculty members are among more than 250 leaders from academia, the arts, industry, public policy, and research elected to the American Academy of Arts and Sciences, the academy announced Thursday.

    One of the nation’s most prestigious honorary societies, the academy is also a leading center for independent policy research. Members contribute to academy publications, as well as studies of science and technology policy, energy and global security, social policy and American institutions, the humanities and culture, and education.

    Those elected from MIT this year are:

    Alberto Abadie, professor of economics and associate director of the Institute for Data, Systems, and Society
    Regina Barzilay, the School of Engineering Distinguished Professor for AI and Health
    Roman Bezrukavnikov, professor of mathematics
    Michale S. Fee, the Glen V. and Phyllis F. Dorflinger Professor and head of the Department of Brain and Cognitive Sciences
    Dina Katabi, the Thuan and Nicole Pham Professor
    Ronald T. Raines, the Roger and Georges Firmenich Professor of Natural Products Chemistry
    Rebecca R. Saxe, the John W. Jarve Professor of Brain and Cognitive Sciences

    “We are celebrating a depth of achievements in a breadth of areas,” says David Oxtoby, president of the American Academy. “These individuals excel in ways that excite us and inspire us at a time when recognizing excellence, commending expertise, and working toward the common good is absolutely essential to realizing a better future.”

    Since its founding in 1780, the academy has elected leading thinkers from each generation, including George Washington and Benjamin Franklin in the 18th century, Maria Mitchell and Daniel Webster in the 19th century, and Toni Morrison and Albert Einstein in the 20th century. The current membership includes more than 250 Nobel and Pulitzer Prize winners. More