More stories


    MIT to launch new Office of Research Computing and Data

    As the computing and data needs of MIT’s research community continue to grow — both in their quantity and complexity — the Institute is launching a new effort to ensure that researchers have access to the advanced computing resources and data management services they need to do their best work. 

    At the core of this effort is the creation of the new Office of Research Computing and Data (ORCD), to be led by Professor Peter Fisher, who will step down as head of the Department of Physics to serve as the office’s inaugural director. The office, which formally opens in September, will build on and replace the MIT Research Computing Project, an initiative supported by the Office of the Vice President for Research, which contributed in recent years to improving the computing resources available to MIT researchers.

    “Almost every scientific field makes use of research computing to carry out our mission at MIT — and computing needs vary between different research groups. In my world, high-energy physics experiments need large amounts of storage and many identical general-purpose CPUs, while astrophysical theorists simulating the formation of galaxy clusters need relatively little storage, but many CPUs with high-speed connections between them,” says Fisher, the Thomas A. Frank (1977) Professor of Physics, who will take up the mantle of ORCD director on Sept. 1.

    “I envision ORCD to be, at a minimum, a centralized system with a spectrum of different capabilities to allow our MIT researchers to start their projects and understand the computational resources needed to execute them,” Fisher adds.

    The Office of Research Computing and Data will provide services spanning hardware, software, and cloud solutions, including data storage and retrieval, and offer advice, training, documentation, and data curation for MIT’s research community. It will also work to develop innovative solutions that address emerging or highly specialized needs, and it will advance strategic collaborations with industry.

    The exceptional performance of MIT’s endowment last year has provided a unique opportunity for MIT to distribute endowment funds to accelerate progress on an array of Institute priorities in fiscal year 2023, beginning July 1, 2022. On the basis of community input and visiting committee feedback, MIT’s leadership identified research computing as one such priority, enabling the expanded effort that the Institute commenced today. Future operation of ORCD will incorporate a cost-recovery model.

    In his new role, Fisher will report to Maria Zuber, MIT’s vice president for research, and coordinate closely with MIT Information Systems and Technology (IS&T), MIT Libraries, and the deans of the five schools and the MIT Schwarzman College of Computing, among others. He will also work closely with Provost Cindy Barnhart.

    “I am thrilled that Peter has agreed to take on this important role,” says Zuber. “Under his leadership, I am confident that we’ll be able to build on the important progress of recent years to deliver to MIT researchers best-in-class infrastructure, services, and expertise so they can maximize the performance of their research.”

    MIT’s research computing capabilities have grown significantly in recent years. Ten years ago, the Institute joined with a number of other Massachusetts universities to establish the Massachusetts Green High-Performance Computing Center (MGHPCC) in Holyoke to provide the high-performance, low-carbon computing power necessary to carry out cutting-edge research while reducing its environmental impact. MIT’s capacity at the MGHPCC is now almost fully utilized, however, and an expansion is underway.

    The need for more advanced computing capacity is not the only issue to be addressed. Over the last decade, there have been considerable advances in cloud computing, which is increasingly used in research computing, requiring the Institute to take a new look at how it works with cloud service providers and how it allocates cloud resources to departments, labs, and centers. And MIT’s longstanding model for research computing, which has been mostly decentralized, offers flexibility but can lead to inefficiencies and inequities among departments.

    The Institute has been carefully assessing how to address these issues for several years, including in connection with the establishment of the MIT Schwarzman College of Computing. In August 2019, a college task force on computing infrastructure found a “campus-wide preference for an overarching organizational model of computing infrastructure that transcends a college or school and most logically falls under senior leadership.” The task force’s report also addressed the need for a better balance between centralized and decentralized research computing resources.

    “The needs for computing infrastructure and support vary considerably across disciplines,” says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing and the Henry Ellis Warren Professor of Electrical Engineering and Computer Science. “With the new Office of Research Computing and Data, the Institute is seizing the opportunity to transform its approach to supporting research computing and data, including not only hardware and cloud computing but also expertise. This move is a critical step forward in supporting MIT’s research and scholarship.”

    Over time, ORCD (pronounced “orchid”) aims to recruit a staff of professionals, including data scientists and engineers and system and hardware administrators, who will enhance, support, and maintain MIT’s research computing infrastructure, and ensure that all researchers on campus have access to a minimum level of advanced computing and data management.

    The new research computing and data effort is part of a broader push to modernize MIT’s information technology infrastructure and systems. “We are at an inflection point, where we have a significant opportunity to invest in core needs, replace or upgrade aging systems, and respond fully to the changing needs of our faculty, students, and staff,” says Mark Silis, MIT’s vice president for information systems and technology. “We are thrilled to have a new partner in the Office of Research Computing and Data as we embark on this important work.”


    Artificial intelligence system learns concepts shared across video, audio, and text

    Humans observe the world through a combination of different modalities, like vision, hearing, and our understanding of language. Machines, on the other hand, interpret the world through data that algorithms can process.

    So, when a machine “sees” a photo, it must encode that photo into data it can use to perform a task like image classification. This process becomes more complicated when inputs come in multiple formats, like videos, audio clips, and images.

    “The main challenge here is, how can a machine align those different modalities? As humans, this is easy for us. We see a car and then hear the sound of a car driving by, and we know these are the same thing. But for machine learning, it is not that straightforward,” says Alexander Liu, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and first author of a paper tackling this problem. 

    Liu and his collaborators developed an artificial intelligence technique that learns to represent data in a way that captures concepts which are shared between visual and audio modalities. For instance, their method can learn that the action of a baby crying in a video is related to the spoken word “crying” in an audio clip.

    Using this knowledge, their machine-learning model can identify where a certain action is taking place in a video and label it.

    It performs better than other machine-learning methods at cross-modal retrieval tasks, which involve finding a piece of data, like a video, that matches a user’s query given in another form, like spoken language. Their model also makes it easier for users to see why the machine thinks the video it retrieved matches their query.

    This technique could someday be utilized to help robots learn about concepts in the world through perception, more like the way humans do.

    Joining Liu on the paper are CSAIL postdoc SouYoung Jin; grad students Cheng-I Jeff Lai and Andrew Rouditchenko; Aude Oliva, senior research scientist in CSAIL and MIT director of the MIT-IBM Watson AI Lab; and senior author James Glass, senior research scientist and head of the Spoken Language Systems Group in CSAIL. The research will be presented at the Annual Meeting of the Association for Computational Linguistics.

    Learning representations

    The researchers focus their work on representation learning, which is a form of machine learning that seeks to transform input data to make it easier to perform a task like classification or prediction.

    The representation learning model takes raw data, such as videos and their corresponding text captions, and encodes them by extracting features, or observations about objects and actions in the video. Then it maps those data points in a grid, known as an embedding space. The model clusters similar data together as single points in the grid. Each of these data points, or vectors, is represented by an individual word.

    For instance, a video clip of a person juggling might be mapped to a vector labeled “juggling.”

    The researchers constrain the model so it can only use 1,000 words to label vectors. The model can decide which actions or concepts it wants to encode into a single vector, but it can only use 1,000 vectors. The model chooses the words it thinks best represent the data.

    Rather than encoding data from different modalities onto separate grids, their method employs a shared embedding space where two modalities can be encoded together. This enables the model to learn the relationship between representations from two modalities, like video that shows a person juggling and an audio recording of someone saying “juggling.”

    To help the system process data from multiple modalities, they designed an algorithm that guides the machine to encode similar concepts into the same vector.

    “If there is a video about pigs, the model might assign the word ‘pig’ to one of the 1,000 vectors. Then if the model hears someone saying the word ‘pig’ in an audio clip, it should still use the same vector to encode that,” Liu explains.
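    To make the shared-vocabulary idea concrete, here is a minimal Python sketch that assumes plain nearest-neighbor vector quantization against one codebook used by both modalities; the authors' actual training objective may differ. `CODEBOOK`, `quantize`, and the feature values are hypothetical stand-ins for the learned encoders and the learned 1,000-word vocabulary.

```python
import math

# Hypothetical shared codebook: each entry pairs a label word with a vector.
# The model in the article learns 1,000 such entries; three 2-D ones keep this readable.
CODEBOOK = {
    "juggling": (1.0, 0.0),
    "crying": (0.0, 1.0),
    "pig": (-1.0, 0.0),
}


def quantize(feature):
    """Return the label of the codebook vector nearest (Euclidean) to `feature`."""
    return min(CODEBOOK, key=lambda word: math.dist(feature, CODEBOOK[word]))


# Made-up encoder outputs from two modalities: a video of pigs and a spoken "pig".
video_feature = (-0.9, 0.1)
audio_feature = (-0.8, -0.2)
print(quantize(video_feature), quantize(audio_feature))  # pig pig
```

    Because both modalities are matched against the same entries, a video and a spoken word that express the same concept end up with the same label, which is the behavior Liu describes.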

    A better retriever

    They tested the model on cross-modal retrieval tasks using three datasets: a video-text dataset with video clips and text captions, a video-audio dataset with video clips and spoken audio captions, and an image-audio dataset with images and spoken audio captions.

    For example, in the video-audio dataset, the model chose 1,000 words to represent the actions in the videos. Then, when the researchers fed it audio queries, the model tried to find the clip that best matched those spoken words.

    “Just like a Google search, you type in some text and the machine tries to tell you the most relevant things you are searching for. Only we do this in the vector space,” Liu says.

    Not only was their technique more likely to find better matches than the models they compared it to, it was also easier to interpret.

    Because the model could only use 1,000 total words to label vectors, a user can more easily see which words the machine used to conclude that the video and spoken words are similar. This could make the model easier to apply in real-world situations where it is vital that users understand how it makes decisions, Liu says.
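    A rough sketch of retrieval over such discrete labels, assuming each clip and each spoken query has already been reduced to a bag of codeword labels and that items are ranked by cosine similarity (an assumption; the paper's scoring may differ). The clip names and words are hypothetical. Returning the overlapping words mirrors the interpretability point: a user can see which labels drove the match.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-codeword Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query_words, library):
    """Return the best-matching item id and the codewords it shares with the query."""
    query = Counter(query_words)
    best = max(library, key=lambda item: cosine(query, Counter(library[item])))
    return best, sorted(set(query) & set(library[best]))


# Hypothetical clips, each already reduced to the codewords it was mapped to.
videos = {
    "clip_a": ["baby", "crying", "crib"],
    "clip_b": ["dog", "park", "grass"],
    "clip_c": ["juggling", "person", "park"],
}
spoken_query = ["person", "juggling", "balls"]
print(retrieve(spoken_query, videos))  # ('clip_c', ['juggling', 'person'])
```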

    The model still has some limitations they hope to address in future work. For one, their research focused on data from two modalities at a time, but in the real world humans encounter many data modalities simultaneously, Liu says.

    “And we know 1,000 words works on this kind of dataset, but we don’t know if it can be generalized to a real-world problem,” he adds.

    Plus, the images and videos in their datasets contained simple objects or straightforward actions; real-world data are much messier. They also want to determine how well their method scales up when there is a wider diversity of inputs.

    This research was supported, in part, by the MIT-IBM Watson AI Lab and its member companies, Nexplore and Woodside, and by the MIT Lincoln Laboratory.


    What words can convey

    From search engines to voice assistants, computers are getting better at understanding what we mean. That’s thanks to language-processing programs that make sense of a staggering number of words, without ever being told explicitly what those words mean. Such programs infer meaning instead through statistics — and a new study reveals that this computational approach can assign many kinds of information to a single word, just like the human brain.

    The study, published April 14 in the journal Nature Human Behaviour, was co-led by Gabriel Grand, a graduate student in electrical engineering and computer science who is affiliated with MIT’s Computer Science and Artificial Intelligence Laboratory, and Idan Blank PhD ’16, an assistant professor at the University of California at Los Angeles. The work was supervised by McGovern Institute for Brain Research investigator Ev Fedorenko, a cognitive neuroscientist who studies how the human brain uses and understands language, and Francisco Pereira at the National Institute of Mental Health. Fedorenko says the rich knowledge her team was able to find within computational language models demonstrates just how much can be learned about the world through language alone.

    The research team began its analysis of statistics-based language processing models in 2015, when the approach was new. Such models derive meaning by analyzing how often pairs of words co-occur in texts and using those relationships to assess the similarities of words’ meanings. For example, such a program might conclude that “bread” and “apple” are more similar to one another than they are to “notebook,” because “bread” and “apple” are often found in proximity to words like “eat” or “snack,” whereas “notebook” is not.
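    The co-occurrence idea can be sketched in a few lines of Python. This is a minimal count-based illustration with a made-up five-sentence corpus and a hypothetical window of two words on each side; the models in the study were trained on far larger corpora and use learned embeddings rather than raw counts.

```python
import math
from collections import Counter, defaultdict

# A made-up toy corpus; real models use billions of words.
corpus = [
    "eat bread for a snack",
    "eat an apple for a snack",
    "write notes in a notebook",
    "buy bread and an apple",
    "open the notebook and write",
]


def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, how often every other word appears within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["bread"], vecs["apple"]), 3))     # 0.866
print(round(cosine(vecs["bread"], vecs["notebook"]), 3))  # 0.333
```

    "Bread" and "apple" share neighbors such as "eat" and "for", so their count vectors point in similar directions, while "notebook" keeps different company.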

    The models were clearly good at measuring words’ overall similarity to one another. But most words carry many kinds of information, and their similarities depend on which qualities are being evaluated. “Humans can come up with all these different mental scales to help organize their understanding of words,” explains Grand, a former undergraduate researcher in the Fedorenko lab. For example, he says, “dolphins and alligators might be similar in size, but one is much more dangerous than the other.”

    Grand and Blank, who was then a graduate student at the McGovern Institute, wanted to know whether the models captured that same nuance. And if they did, how was the information organized?

    To learn how the information in such a model stacked up to humans’ understanding of words, the team first asked human volunteers to score words along many different scales: Were the concepts those words conveyed big or small, safe or dangerous, wet or dry? Then, having mapped where people position different words along these scales, they looked to see whether language processing models did the same.

    Grand explains that distributional semantic models use co-occurrence statistics to organize words into a huge, multidimensional matrix. The more similar words are to one another, the closer they are within that space. The dimensions of the space are vast, and there is no inherent meaning built into its structure. “In these word embeddings, there are hundreds of dimensions, and we have no idea what any dimension means,” he says. “We’re really trying to peer into this black box and say, ‘is there structure in here?’”

    Specifically, they asked whether the semantic scales they had asked their volunteers to use were represented in the model. So they looked to see where words in the space lined up along vectors defined by the extremes of those scales. Where did dolphins and tigers fall on a line from “big” to “small,” for example? And were they closer together along that line than they were on a line representing danger (“safe” to “dangerous”)?
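    A simplified version of this projection, assuming hypothetical 3-D word vectors and a single antonym pair per scale (the study's embeddings have hundreds of dimensions): a word's position on a scale is the projection of its vector onto the line running from one pole to the other, rescaled so the low pole maps to 0 and the high pole to 1.

```python
# Hypothetical 3-D vectors; real embeddings have hundreds of dimensions.
WORDS = {
    "small": (0.0, 1.0, 0.0),
    "big": (1.0, 1.0, 0.0),
    "safe": (0.0, 0.0, 0.0),
    "dangerous": (0.0, 0.0, 1.0),
    "dolphin": (0.8, 0.5, 0.1),
    "tiger": (0.7, 0.4, 0.9),
}


def project(word_vec, low_vec, high_vec):
    """Position of word_vec along the low-to-high axis (0 at the low pole, 1 at the high pole)."""
    axis = [h - l for h, l in zip(high_vec, low_vec)]
    offset = [w - l for w, l in zip(word_vec, low_vec)]
    return sum(o * a for o, a in zip(offset, axis)) / sum(a * a for a in axis)


for animal in ("dolphin", "tiger"):
    size = project(WORDS[animal], WORDS["small"], WORDS["big"])
    danger = project(WORDS[animal], WORDS["safe"], WORDS["dangerous"])
    print(animal, round(size, 2), round(danger, 2))
# dolphin 0.8 0.1
# tiger 0.7 0.9
```

    With these made-up vectors the two animals sit close together on the size scale but far apart on the danger scale, the pattern the researchers tested for.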

    Across more than 50 sets of world categories and semantic scales, they found that the model had organized words very much like the human volunteers. Dolphins and tigers were judged to be similar in terms of size, but far apart on scales measuring danger or wetness. The model had organized the words in a way that represented many kinds of meaning — and it had done so based entirely on the words’ co-occurrences.

    That, Fedorenko says, tells us something about the power of language. “The fact that we can recover so much of this rich semantic information from just these simple word co-occurrence statistics suggests that this is one very powerful source of learning about things that you may not even have direct perceptual experience with.”


    Engineers use artificial intelligence to capture the complexity of breaking waves

    Waves break once they swell to a critical height, before cresting and crashing into a spray of droplets and bubbles. These waves can be as large as a surfer’s point break and as small as a gentle ripple rolling to shore. For decades, the dynamics of how and when a wave breaks have been too complex to predict.

    Now, MIT engineers have found a new way to model how waves break. The team used machine learning along with data from wave-tank experiments to tweak equations that have traditionally been used to predict wave behavior. Engineers typically rely on such equations to help them design resilient offshore platforms and structures. But until now, the equations have not been able to capture the complexity of breaking waves.

    The updated model made more accurate predictions of how and when waves break, the researchers found. For instance, the model estimated a wave’s steepness just before breaking, and its energy and frequency after breaking, more accurately than the conventional wave equations.

    Their results, published today in the journal Nature Communications, will help scientists understand how a breaking wave affects the water around it. Knowing precisely how these waves interact can help hone the design of offshore structures. It can also improve predictions for how the ocean interacts with the atmosphere. Having better estimates of how waves break can help scientists predict, for instance, how much carbon dioxide and other atmospheric gases the ocean can absorb.

    “Wave breaking is what puts air into the ocean,” says study author Themis Sapsis, an associate professor of mechanical and ocean engineering and an affiliate of the Institute for Data, Systems, and Society at MIT. “It may sound like a detail, but if you multiply its effect over the area of the entire ocean, wave breaking starts becoming fundamentally important to climate prediction.”

    The study’s co-authors include lead author and MIT postdoc Debbie Eeltink, Hubert Branger and Christopher Luneau of Aix-Marseille University, Amin Chabchoub of Kyoto University, Jerome Kasparian of the University of Geneva, and T.S. van den Bremer of Delft University of Technology.

    Learning tank

    To predict the dynamics of a breaking wave, scientists typically take one of two approaches: They either attempt to precisely simulate the wave at the scale of individual molecules of water and air, or they run experiments to try to characterize waves with actual measurements. The first approach is computationally expensive and difficult to carry out even over a small area; the second requires a huge amount of time to run enough experiments to yield statistically significant results.

    The MIT team instead borrowed pieces from both approaches to develop a more efficient and accurate model using machine learning. The researchers started with a set of equations that is considered the standard description of wave behavior. They aimed to improve the model by “training” the model on data of breaking waves from actual experiments.

    “We had a simple model that doesn’t capture wave breaking, and then we had the truth, meaning experiments that involve wave breaking,” Eeltink explains. “Then we wanted to use machine learning to learn the difference between the two.”

    The researchers obtained wave breaking data by running experiments in a 40-meter-long tank. The tank was fitted at one end with a paddle which the team used to initiate each wave. The team set the paddle to produce a breaking wave in the middle of the tank. Gauges along the length of the tank measured the water’s height as waves propagated down the tank.

    “It takes a lot of time to run these experiments,” Eeltink says. “Between each experiment you have to wait for the water to completely calm down before you launch the next experiment, otherwise they influence each other.”

    Safe harbor

    In all, the team ran about 250 experiments, the data from which they used to train a type of machine-learning algorithm known as a neural network. Specifically, the algorithm is trained to compare the real waves in experiments with the predicted waves in the simple model, and based on any differences between the two, the algorithm tunes the model to fit reality.
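    The "learn the difference" idea is a form of residual learning. The sketch below swaps the neural network for a one-coefficient least-squares fit so that it stays short and runnable; `simple_model`, the quadratic correction, and the synthetic measurements are all hypothetical and are not the team's wave equations or tank data.

```python
def simple_model(x):
    """Hypothetical stand-in for the conventional (non-breaking) wave equations."""
    return 0.5 * x


# Synthetic "measurements" containing an extra quadratic effect that the
# simple model misses; it plays the role of wave breaking in this toy.
inputs = [0.1 * i for i in range(1, 21)]
measured = [0.5 * x + 0.2 * x * x for x in inputs]

# Residuals: the difference between "truth" and the simple model.
residuals = [m - simple_model(x) for x, m in zip(inputs, measured)]

# Learn correction(x) = a * x**2 by least squares (closed form for one coefficient).
a = sum(r * x * x for r, x in zip(residuals, inputs)) / sum(x ** 4 for x in inputs)


def corrected_model(x):
    """Simple model plus the learned correction term."""
    return simple_model(x) + a * x * x


# Evaluate on an input outside the training range.
print(simple_model(2.5), round(corrected_model(2.5), 6))  # 1.25 2.5
```

    Learning only the correction keeps the physics of the simple model in place, so the learned part has less to explain than a model trained from scratch would.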

    After training the algorithm on their experimental data, the team introduced the model to entirely new data — in this case, measurements from two independent experiments, each run at separate wave tanks with different dimensions. In these tests, they found the updated model made more accurate predictions than the simple, untrained model, for instance making better estimates of a breaking wave’s steepness.

    The new model also captured an essential property of breaking waves known as the “downshift,” in which the frequency of a wave is shifted to a lower value. The speed of a wave depends on its frequency. For ocean waves, lower frequencies move faster than higher frequencies. Therefore, after the downshift, the wave will move faster. The new model predicts the change in frequency, before and after each breaking wave, which could be especially relevant in preparing for coastal storms.

    “When you want to forecast when high waves of a swell would reach a harbor, and you want to leave the harbor before those waves arrive, then if you get the wave frequency wrong, then the speed at which the waves are approaching is wrong,” Eeltink says.
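    To see why a downshift changes arrival times, here is a back-of-the-envelope calculation using linear deep-water wave theory (dispersion relation ω² = gk), in which the group speed, the speed at which wave energy travels, is g/(4πf). The 100 km distance and the 0.10 Hz and 0.09 Hz frequencies are illustrative assumptions, not values from the study.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def deep_water_group_speed(frequency_hz):
    """Group speed in m/s from linear deep-water theory: c_g = g / (4 * pi * f)."""
    return G / (4 * math.pi * frequency_hz)


def arrival_hours(distance_m, frequency_hz):
    """Hours for wave energy at this frequency to travel distance_m."""
    return distance_m / deep_water_group_speed(frequency_hz) / 3600


# Illustrative numbers: a swell 100 km offshore, before and after a small downshift.
before = arrival_hours(100_000, 0.10)
after = arrival_hours(100_000, 0.09)
print(f"{before:.2f} h before downshift, {after:.2f} h after")
```

    With these numbers the downshifted swell arrives roughly 20 minutes earlier (about 3.20 hours instead of 3.56), which is the kind of error Eeltink warns about if the frequency change is ignored.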

    The team’s updated wave model is in the form of an open-source code that others could potentially use, for instance in climate simulations of the ocean’s potential to absorb carbon dioxide and other atmospheric gases. The code can also be worked into simulated tests of offshore platforms and coastal structures.

    “The number one purpose of this model is to predict what a wave will do,” Sapsis says. “If you don’t model wave breaking right, it would have tremendous implications for how structures behave. With this, you could simulate waves to help design structures better, more efficiently, and without huge safety factors.”

    This research is supported, in part, by the Swiss National Science Foundation, and by the U.S. Office of Naval Research.


    Estimating the informativeness of data

    Not all data are created equal. But how much information is any piece of data likely to contain? This question is central to medical testing, designing scientific experiments, and even to everyday human learning and thinking. MIT researchers have developed a new way to solve this problem, opening up new applications in medicine, scientific discovery, cognitive science, and artificial intelligence.

    In theory, the 1948 paper, “A Mathematical Theory of Communication,” by the late MIT Professor Emeritus Claude Shannon answered this question definitively. One of Shannon’s breakthrough results is the idea of entropy, which lets us quantify the amount of information inherent in any random object, including random variables that model observed data. Shannon’s results created the foundations of information theory and modern telecommunications. The concept of entropy has also proven central to computer science and machine learning.

    The challenge of estimating entropy

    Unfortunately, the use of Shannon’s formula can quickly become computationally intractable. It requires precisely calculating the probability of the data, which in turn requires calculating every possible way the data could have arisen under a probabilistic model. If the data-generating process is very simple, for example a single toss of a coin or roll of a loaded die, then calculating entropies is straightforward. But consider the problem of medical testing, where a positive test result arises from hundreds of interacting variables, all unknown. With just 10 unknowns, there are already 1,000 possible explanations for the data. With a few hundred, there are more possible explanations than atoms in the known universe, which makes calculating the entropy exactly an unmanageable problem.
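    A short Python sketch of both halves of this argument: Shannon's formula, H = -Σ p log p, is trivial to apply to a coin or a loaded die, while an exact method must visit every joint setting of the unknowns. The loaded-die probabilities are hypothetical, and the sketch assumes each unknown is binary, which is one reading of the article's figures (2^10 = 1,024, roughly 1,000).

```python
import math
from itertools import product


def shannon_entropy(probs, base=2):
    """H = -sum(p * log p), skipping zero-probability outcomes."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def count_explanations(num_binary_unknowns):
    """Brute-force count of joint settings an exact method would have to visit."""
    return sum(1 for _ in product((0, 1), repeat=num_binary_unknowns))


coin = [0.5, 0.5]
loaded_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]  # hypothetical loading
print(shannon_entropy(coin))                  # 1.0 (bits)
print(round(shannon_entropy(loaded_die), 3))  # 2.161 (bits)
print(count_explanations(10))                 # 1024
print(2 ** 300 > 10 ** 80)                    # True: far too many to enumerate
```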

    MIT researchers have developed a new method to estimate good approximations to many information quantities such as Shannon entropy by using probabilistic inference. The work appears in a paper presented at AISTATS 2022 by authors Feras Saad ’16, MEng ’16, a PhD candidate in electrical engineering and computer science; Marco Cusumano-Towner PhD ’21; and Vikash Mansinghka ’05, MEng ’09, PhD ’09, a principal research scientist in the Department of Brain and Cognitive Sciences. The key insight is that, rather than enumerating all explanations, one can use probabilistic inference algorithms to first infer which explanations are probable and then use those probable explanations to construct high-quality entropy estimates. The paper shows that this inference-based approach can be much faster and more accurate than previous approaches.

    Estimating entropy and information in a probabilistic model is fundamentally hard because it often requires solving a high-dimensional integration problem. Many previous works have developed estimators of these quantities for certain special cases, but the new estimators of entropy via inference (EEVI) offer the first approach that can deliver sharp upper and lower bounds on a broad set of information-theoretic quantities. Having both bounds means that although we don’t know the true entropy, we can get one number that is smaller than it and another that is larger.

    “The upper and lower bounds on entropy delivered by our method are particularly useful for three reasons,” says Saad. “First, the difference between the upper and lower bounds gives a quantitative sense of how confident we should be about the estimates. Second, by using more computational effort we can drive the difference between the two bounds to zero, which ‘squeezes’ the true value with a high degree of accuracy. Third, we can compose these bounds to form estimates of many other quantities that tell us how informative different variables in a model are of one another.”
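    A toy illustration of such sandwich bounds, written in the spirit of estimating entropy via inference but not the authors' EEVI code; the model, its parameters, and the sample sizes are hypothetical. Simulating (z, x) jointly from the generative model yields an exact posterior sample of z for that x, so importance sampling can be run two ways. Using only fresh proposals gives an estimate of log p(x) that is too low on average (an entropy upper bound); mixing in the exact posterior sample gives one that is too high on average (an entropy lower bound). Raising `k` spends more computation and narrows the gap.

```python
import math
import random

# Hypothetical toy model: latent z ~ Bernoulli(P_Z); observation x | z ~ Bernoulli(THETA[z]).
P_Z = 0.3
THETA = {0: 0.2, 1: 0.9}


def likelihood(x, z):
    """p(x | z) for binary x."""
    return THETA[z] if x == 1 else 1 - THETA[z]


def exact_entropy():
    """Exact entropy of x in nats; feasible here only because z has two values."""
    h = 0.0
    for x in (0, 1):
        px = (1 - P_Z) * likelihood(x, 0) + P_Z * likelihood(x, 1)
        h -= px * math.log(px)
    return h


def sample_z(rng):
    return 1 if rng.random() < P_Z else 0


def log_mean_weight(x, zs):
    """Log of the importance-sampling estimate of p(x) with the prior as proposal,
    so each weight p(x, z) / q(z) reduces to p(x | z)."""
    return math.log(sum(likelihood(x, z) for z in zs) / len(zs))


def entropy_bounds(k, num_outer=20000, seed=0):
    """Monte Carlo estimates of a lower and an upper bound on the entropy of x."""
    rng = random.Random(seed)
    lower_total = upper_total = 0.0
    for _ in range(num_outer):
        # Simulate (z, x) jointly, so z_true is an exact posterior sample given x.
        z_true = sample_z(rng)
        x = 1 if rng.random() < THETA[z_true] else 0
        fresh = [sample_z(rng) for _ in range(k)]
        # Proposals only: E[log p_hat(x)] <= log p(x), so this overestimates entropy.
        upper_total -= log_mean_weight(x, fresh)
        # With the exact posterior sample included: E[log p_hat(x)] >= log p(x).
        lower_total -= log_mean_weight(x, [z_true] + fresh[1:])
    return lower_total / num_outer, upper_total / num_outer


print(round(exact_entropy(), 3))  # 0.677
for k in (2, 16):
    print(k, entropy_bounds(k))
```

    The bounds are themselves Monte Carlo estimates of expectations, so they hold on average; with enough outer samples the k = 2 pair brackets the exact value with a visible gap, and the k = 16 pair is much tighter.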

    Solving fundamental problems with data-driven expert systems

    Saad says he is most excited about the possibilities this method opens up for querying probabilistic models in areas like machine-assisted medical diagnosis. He says one goal of the EEVI method is to be able to solve new queries using rich generative models for things like liver disease and diabetes that have already been developed by experts in the medical domain. For example, suppose we have a patient with a set of observed attributes (height, weight, age, etc.) and observed symptoms (nausea, blood pressure, etc.). Given these attributes and symptoms, EEVI can be used to help determine which medical tests for symptoms the physician should conduct to maximize information about the absence or presence of a given liver disease (like cirrhosis or primary biliary cholangitis).
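    For a single binary disease and a single binary test, the quantity such a query ranks tests by, the mutual information between disease status and test result, can be computed exactly, as in the Python sketch below. The prior, sensitivities, and specificities are hypothetical; EEVI targets the realistic case where many interacting variables make this exact enumeration intractable.

```python
import math


def mutual_information(prior, sensitivity, specificity):
    """I(D; T) in bits for a binary disease D and a binary test result T."""
    joint = {
        (1, 1): prior * sensitivity,
        (1, 0): prior * (1 - sensitivity),
        (0, 1): (1 - prior) * (1 - specificity),
        (0, 0): (1 - prior) * specificity,
    }
    p_d = {1: prior, 0: 1 - prior}
    p_t = {t: joint[(1, t)] + joint[(0, t)] for t in (0, 1)}
    return sum(
        p * math.log2(p / (p_d[d] * p_t[t]))
        for (d, t), p in joint.items()
        if p > 0
    )


PRIOR = 0.2  # hypothetical prevalence given the patient's attributes
TESTS = {  # hypothetical (sensitivity, specificity) pairs
    "test_a": (0.95, 0.60),
    "test_b": (0.80, 0.95),
    "test_c": (0.55, 0.55),
}
scores = {name: mutual_information(PRIOR, *params) for name, params in TESTS.items()}
best = max(scores, key=scores.get)
print(best, {name: round(s, 3) for name, s in scores.items()})
```

    Here the highly specific `test_b` carries the most information even though `test_a` is more sensitive, and a coin-flip test (sensitivity and specificity of 0.5) would carry none.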

    For insulin diagnosis, the authors showed how to use the method for computing optimal times to take blood glucose measurements that maximize information about a patient’s insulin sensitivity, given an expert-built probabilistic model of insulin metabolism and the patient’s personalized meal and medication schedule. As routine medical tracking like glucose monitoring moves away from doctor’s offices and toward wearable devices, there are even more opportunities to improve data acquisition, if the value of the data can be estimated accurately in advance.

    Vikash Mansinghka, senior author on the paper, adds, “We’ve shown that probabilistic inference algorithms can be used to estimate rigorous bounds on information measures that AI engineers often think of as intractable to calculate. This opens up many new applications. It also shows that inference may be more computationally fundamental than we thought. It also helps to explain how human minds might be able to estimate the value of information so pervasively, as a central building block of everyday cognition, and help us engineer AI expert systems that have these capabilities.”

    The paper, “Estimators of Entropy and Information via Inference in Probabilistic Models,” was presented at AISTATS 2022.


    A new state of the art for unsupervised vision

    Labeling data can be a chore. It’s the main source of sustenance for computer-vision models; without it, they’d have a lot of difficulty identifying objects, people, and other important image characteristics. Yet producing just an hour of tagged and labeled data can take a whopping 800 hours of human time. Our high-fidelity understanding of the world develops as machines can better perceive and interact with our surroundings. But they need more help.

    Scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), Microsoft, and Cornell University have attempted to solve this problem plaguing vision models by creating “STEGO,” an algorithm that can jointly discover and segment objects without any human labels at all, down to the pixel.

    STEGO learns something called “semantic segmentation” — fancy speak for the process of assigning a label to every pixel in an image. Semantic segmentation is an important skill for today’s computer-vision systems because images can be cluttered with objects. Even more challenging is that these objects don’t always fit into literal boxes; algorithms tend to work better for discrete “things” like people and cars as opposed to “stuff” like vegetation, sky, and mashed potatoes. A previous system might simply perceive a nuanced scene of a dog playing in the park as just a dog, but by assigning every pixel of the image a label, STEGO can break the image into its main ingredients: a dog, sky, grass, and its owner.


    Assigning every single pixel of the world a label is ambitious — especially without any kind of feedback from humans. The majority of algorithms today get their knowledge from mounds of labeled data, which can take painstaking human-hours to source. Just imagine the excitement of labeling every pixel of 100,000 images! To discover these objects without a human’s helpful guidance, STEGO looks for similar objects that appear throughout a dataset. It then associates these similar objects together to construct a consistent view of the world across all of the images it learns from.
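    A heavily simplified sketch of the "find similar things across the dataset" step: pool per-pixel feature vectors from every image, cluster them, and give each pixel its cluster index, so the same kind of region gets the same label in every image. This is plain k-means with deterministic farthest-point seeding on hypothetical 2-D features, not STEGO's energy-based training on DINO features; the feature values and image sizes are made up.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def nearest(feature, centers):
    """Index of the center closest to `feature`."""
    return min(range(len(centers)), key=lambda c: math.dist(feature, centers[c]))


def kmeans(points, k, iters=20):
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers


def segment(image, centers):
    """Label every pixel with the index of its nearest cluster center."""
    return [[nearest(f, centers) for f in row] for row in image]


# Two hypothetical 2x2 "images" of per-pixel features: near (0, 1) looks like grass,
# near (1, 0) looks like sky.
image_1 = [[(0.0, 1.0), (0.1, 0.9)], [(0.9, 0.1), (1.0, 0.0)]]
image_2 = [[(0.95, 0.05), (0.05, 0.95)], [(1.0, 0.1), (0.0, 0.9)]]

# Pool pixels from the whole dataset so labels stay consistent across images.
pooled = [f for image in (image_1, image_2) for row in image for f in row]
centers = kmeans(pooled, k=2)
print(segment(image_1, centers))  # [[0, 0], [1, 1]]
print(segment(image_2, centers))  # [[1, 0], [1, 0]]
```

    Clustering the pooled pixels, rather than each image separately, is what keeps "grass" as label 0 in both images.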

    Seeing the world

    Machines that can “see” are crucial for a wide array of new and emerging technologies like self-driving cars and predictive modeling for medical diagnostics. Since STEGO can learn without labels, it can detect objects in many different domains, even those that humans don’t yet understand fully. 

    “If you’re looking at oncological scans, the surface of planets, or high-resolution biological images, it’s hard to know what objects to look for without expert knowledge. In emerging domains, sometimes even human experts don’t know what the right objects should be,” says Mark Hamilton, a PhD student in electrical engineering and computer science at MIT, research affiliate of MIT CSAIL, software engineer at Microsoft, and lead author on a new paper about STEGO. “In these types of situations where you want to design a method to operate at the boundaries of science, you can’t rely on humans to figure it out before machines do.”

    STEGO was tested on a slew of visual domains spanning general images, driving images, and high-altitude aerial photographs. In each domain, STEGO was able to identify and segment relevant objects that were closely aligned with human judgments. STEGO’s most diverse benchmark was the COCO-Stuff dataset, which is made up of images from all over the world, from indoor scenes to people playing sports to trees and cows. In most cases, the previous state-of-the-art system could capture a low-resolution gist of a scene, but struggled on fine-grained details: A human was a blob, a motorcycle was captured as a person, and it couldn’t recognize any geese. On the same scenes, STEGO doubled the performance of previous systems and discovered concepts like animals, buildings, people, furniture, and many others.

    Beyond COCO-Stuff, STEGO made similar leaps forward in other visual domains. When applied to driverless car datasets, STEGO successfully segmented out roads, people, and street signs with much higher resolution and granularity than previous systems. On images from space, the system broke down every single square foot of the surface of the Earth into roads, vegetation, and buildings.

    Connecting the pixels

    STEGO — which stands for “Self-supervised Transformer with Energy-based Graph Optimization” — builds on top of the DINO algorithm, which learned about the world through 14 million images from the ImageNet database. STEGO refines the DINO backbone through a learning process that mimics our own way of stitching together pieces of the world to make meaning. 

    For example, you might consider two images of dogs walking in the park. Even though they’re different dogs, with different owners, in different parks, STEGO can tell (without humans) how each scene’s objects relate to each other. The authors even probe STEGO’s mind to see how each little, brown, furry thing in the images is similar, and likewise with other shared objects like grass and people. By connecting objects across images, STEGO builds a consistent view of the world.
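    Relating objects across images rests on comparing feature vectors. A minimal sketch, assuming each image has already been reduced to a few named regions with feature vectors (the numbers below are invented for illustration, not DINO or STEGO outputs), matches every region in one park photo to its most similar region in another by cosine similarity:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_patches(patches_a, patches_b):
    """For each region in image A, return the name of the most similar region in image B."""
    return {
        name_a: max(patches_b, key=lambda name_b: cosine_similarity(vec_a, patches_b[name_b]))
        for name_a, vec_a in patches_a.items()
    }


# Hypothetical features: image A's regions are named; image B's are anonymous.
park_1 = {"dog": [0.9, 0.1, 0.0], "grass": [0.1, 0.9, 0.1], "person": [0.0, 0.2, 0.9]}
park_2 = {"r1": [0.0, 1.0, 0.2], "r2": [0.1, 0.1, 1.0], "r3": [0.8, 0.2, 0.1]}
print(match_patches(park_1, park_2))  # {'dog': 'r3', 'grass': 'r1', 'person': 'r2'}
```

    Cosine similarity ignores vector length and compares only direction, which is a common choice when feature magnitudes vary from image to image.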

    “The idea is that these types of algorithms can find consistent groupings in a largely automated fashion so we don’t have to do that ourselves,” says Hamilton. “It might have taken years to understand complex visual datasets like biological imagery, but if we can avoid spending 1,000 hours combing through data and labeling it, we can find and discover new information that we might have missed. We hope this will help us understand the visual world in a more empirically grounded way.”

    Looking ahead

    Despite its improvements, STEGO still faces certain challenges. One is that labels can be arbitrary. For example, the labels of the COCO-Stuff dataset distinguish between “food-things” like bananas and chicken wings, and “food-stuff” like grits and pasta. STEGO doesn’t see much of a distinction there. In other cases, STEGO was confused by odd images — like one of a banana sitting on a phone receiver — where the receiver was labeled “foodstuff,” instead of “raw material.” 

    In upcoming work, the researchers plan to give STEGO more flexibility than labeling each pixel with one of a fixed number of classes, because things in the real world can sometimes be multiple things at the same time (like “food,” “plant,” and “fruit”). The authors hope this will give the algorithm room for uncertainty, trade-offs, and more abstract thinking.

    “In making a general tool for understanding potentially complicated datasets, we hope that this type of an algorithm can automate the scientific process of object discovery from images. There’s a lot of different domains where human labeling would be prohibitively expensive, or humans simply don’t even know the specific structure, like in certain biological and astrophysical domains. We hope that future work enables application to a very broad scope of datasets. Since you don’t need any human labels, we can now start to apply ML tools more broadly,” says Hamilton.

    “STEGO is simple, elegant, and very effective. I consider unsupervised segmentation to be a benchmark for progress in image understanding, and a very difficult problem. The research community has made terrific progress in unsupervised image understanding with the adoption of transformer architectures,” says Andrea Vedaldi, professor of computer vision and machine learning and a co-lead of the Visual Geometry Group at the engineering science department of the University of Oxford. “This research provides perhaps the most direct and effective demonstration of this progress on unsupervised segmentation.” 

    Hamilton wrote the paper alongside MIT CSAIL PhD student Zhoutong Zhang, Assistant Professor Bharath Hariharan of Cornell University, Associate Professor Noah Snavely of Cornell Tech, and MIT professor William T. Freeman. They will present the paper at the 2022 International Conference on Learning Representations (ICLR).

  • in

    Looking forward to forecast the risks of a changing climate

    On April 11, MIT announced five multiyear flagship projects in the first-ever Climate Grand Challenges, a new initiative to tackle complex climate problems and deliver breakthrough solutions to the world as quickly as possible. This article is the third in a five-part series highlighting the most promising concepts to emerge from the competition, and the interdisciplinary research teams behind them.

    Extreme weather events that were once considered rare have become noticeably less so, from intensifying hurricane activity in the North Atlantic to wildfires generating massive clouds of ozone-damaging smoke. But current climate models are unprepared when it comes to estimating the risk that these increasingly extreme events pose — and without adequate modeling, governments are left unable to take necessary precautions to protect their communities.

    MIT Department of Earth, Atmospheric and Planetary Sciences (EAPS) Professor Paul O’Gorman researches this trend by studying how climate affects the atmosphere and incorporating what he learns into climate models to improve their accuracy. One particular focus for O’Gorman has been changes in extreme precipitation and midlatitude storms that hit areas like New England.

    “These extreme events are having a lot of impact, but they’re also difficult to model or study,” he says. Seeing the pressing need for better climate models that can be used to develop preparedness plans and climate change mitigation strategies, O’Gorman and collaborators Kerry Emanuel, the Cecil and Ida Green Professor of Atmospheric Science in EAPS, and Miho Mazereeuw, associate professor in MIT’s Department of Architecture, are leading an interdisciplinary group of scientists, engineers, and designers to tackle this problem with their MIT Climate Grand Challenges flagship project, “Preparing for a new world of weather and climate extremes.”

    “We know already from observations and from climate model predictions that weather and climate extremes are changing and will change more,” O’Gorman says. “The grand challenge is preparing for those changing extremes.”

    Their proposal is one of five flagship projects recently announced by the MIT Climate Grand Challenges initiative — an Institute-wide effort catalyzing novel research and engineering innovations to address the climate crisis. Selected from a field of almost 100 submissions, the team will receive additional funding and exposure to help accelerate and scale their project goals. Other MIT collaborators on the proposal include researchers from the School of Engineering, the School of Architecture and Planning, the Office of Sustainability, the Center for Global Change Science, and the Institute for Data, Systems and Society.

    Weather risk modeling

    Fifteen years ago, Kerry Emanuel developed a simple hurricane model. It was based on physics equations, rather than statistics, and could run in real time, making it useful for risk assessment. Emanuel wondered if similar models could be used for long-term risk assessment of other things, such as changes in extreme weather because of climate change.

    “I discovered, somewhat to my surprise and dismay, that almost all extant estimates of long-term weather risks in the United States are based not on physical models, but on historical statistics of the hazards,” says Emanuel. “The problem with relying on historical records is that they’re too short; while they can help estimate common events, they don’t contain enough information to make predictions for more rare events.”
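    Emanuel’s point about short records can be made concrete with the empirical return period, a standard hydrology estimate that ranks observed annual maxima. With the Weibull plotting position T = (n + 1) / m, where n is the number of years on record and m is an event’s rank (1 = largest), the longest return period a record can assign is n + 1 years, so a 30-year record cannot speak directly to a 100-year event. The function name and the synthetic rainfall record below are illustrative assumptions, not data from the team’s work.

```python
def empirical_return_periods(annual_maxima):
    """Return (value, return period in years) pairs, largest event first.

    Uses the Weibull plotting position T = (n + 1) / m, where n is the
    number of years on record and m is the rank of the event (1 = largest).
    """
    n = len(annual_maxima)
    ranked = sorted(annual_maxima, reverse=True)
    return [(value, (n + 1) / rank) for rank, value in enumerate(ranked, start=1)]


# Hypothetical 30-year record of annual maximum daily rainfall (mm); synthetic values.
record = [40 + (7 * year) % 31 for year in range(30)]
periods = empirical_return_periods(record)
print(periods[0])  # (70, 31.0): nothing rarer than a ~31-year event can be estimated
```

    Extrapolating beyond that ceiling requires fitting an assumed distribution to the record, and that fit still presumes the climate that produced the record is unchanged, which is the limitation the next paragraph describes.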

    Another limitation of weather risk models that rely heavily on statistics: They have a built-in assumption that the climate is static.

    “Historical records rely on the climate at the time they were recorded; they can’t say anything about how hurricanes grow in a warmer climate,” says Emanuel. The models rely on fixed relationships between events; they assume that hurricane activity will stay the same, even while science is showing that warmer temperatures will most likely push typical hurricane activity beyond the tropics and into a much wider band of latitudes.

    The flagship project aims to eliminate this reliance on the historical record by emphasizing physical principles (e.g., the laws of thermodynamics and fluid mechanics) in next-generation models. The downside is that many variables have to be included. Not only are there planetary-scale systems to consider, such as the global circulation of the atmosphere, but there are also small-scale, extremely localized events, like thunderstorms, that influence predictive outcomes.

    Trying to compute all of these at once is costly and time-consuming — and the results often can’t tell you the risk in a specific location. But there is a way to correct for this: “What’s done is to use a global model, and then use a method called downscaling, which tries to infer what would happen on very small scales that aren’t properly resolved by the global model,” explains O’Gorman. The team hopes to improve downscaling techniques so that they can be used to calculate the risk of very rare but impactful weather events.

    Global climate models, or general circulation models (GCMs), Emanuel explains, are constructed a bit like a jungle gym. Like the playground bars, the Earth is sectioned into an interconnected three-dimensional framework — only it’s divided into cells roughly 100 to 200 kilometers across. Each node comprises a set of computations for characteristics like wind, rainfall, atmospheric pressure, and temperature within its bounds; the outputs of each node are connected to its neighbors. This framework is useful for creating a big-picture idea of Earth’s climate system, but if you tried to zoom in on a specific location — like, say, to see what’s happening in Miami or Mumbai — the connecting nodes are too far apart to make predictions on anything specific to those areas.

    Scientists work around this problem by using downscaling. They use the same blueprint of the jungle gym, but within the nodes they weave a mesh of smaller features, incorporating equations for things like topography and vegetation or regional meteorological models to fill in the blanks. By creating a finer mesh over smaller areas, they can predict local effects without needing to run the entire global model.
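    A toy statistical cousin of this idea fits in a few lines: interpolate a coarse grid onto a finer one, then correct each fine cell for its local terrain. The sketch below uses bilinear interpolation and a fixed temperature lapse rate of 6.5 °C per kilometer of elevation (a common textbook average). The function names, the lapse-rate constant, and the toy grids are illustrative assumptions; the project’s own downscaling relies on far richer physics and regional models.

```python
LAPSE_RATE_C_PER_M = 0.0065  # assumed average cooling with height: 6.5 °C per km


def bilinear_upsample(coarse, factor):
    """Interpolate a coarse 2D grid (at least 2 x 2) onto a grid `factor` times finer."""
    rows, cols = len(coarse), len(coarse[0])
    fine = []
    for i in range((rows - 1) * factor + 1):
        y = i / factor
        r0 = min(int(y), rows - 2)  # top row of the enclosing coarse cell
        ty = y - r0
        row = []
        for j in range((cols - 1) * factor + 1):
            x = j / factor
            c0 = min(int(x), cols - 2)  # left column of the enclosing coarse cell
            tx = x - c0
            top = coarse[r0][c0] * (1 - tx) + coarse[r0][c0 + 1] * tx
            bottom = coarse[r0 + 1][c0] * (1 - tx) + coarse[r0 + 1][c0 + 1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        fine.append(row)
    return fine


def downscale_temperature(coarse_temp, coarse_elev, fine_elev, factor):
    """Interpolate temperature, then adjust each fine cell for its true elevation."""
    temp = bilinear_upsample(coarse_temp, factor)
    smooth_elev = bilinear_upsample(coarse_elev, factor)  # terrain the coarse model "sees"
    return [
        [t - LAPSE_RATE_C_PER_M * (h_fine - h_smooth)
         for t, h_fine, h_smooth in zip(t_row, f_row, s_row)]
        for t_row, f_row, s_row in zip(temp, fine_elev, smooth_elev)
    ]


# Hypothetical example: uniform 20 °C at sea level, with a 1 km hill the coarse grid misses.
coarse_temp = [[20.0, 20.0], [20.0, 20.0]]
coarse_elev = [[0.0, 0.0], [0.0, 0.0]]
fine_elev = [[0.0, 0.0, 0.0], [0.0, 1000.0, 0.0], [0.0, 0.0, 0.0]]
local = downscale_temperature(coarse_temp, coarse_elev, fine_elev, factor=2)
print(round(local[1][1], 2))  # 13.5
```

    Interpolation alone would report 20 °C on the hilltop; the terrain correction adds local information the coarse grid cannot resolve, which is the essence of downscaling even in this simplified form.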

    Of course, even this finer-resolution solution has its trade-offs. Nesting models within models can give a clearer picture of what’s happening in a specific region, but crunching all that data at once is still a computing challenge. The price is paid in expense and time, or in predictions limited to shorter windows: where GCMs can be run over decades or centuries, a particularly complex local model may be restricted to predictions spanning just a few years at a time.

    “I’m afraid that most of the downscaling at present is brute force, but I think there’s room to do it in better ways,” says Emanuel, who sees finding novel methods of achieving this goal as an intellectual challenge. “I hope that through the Grand Challenges project we might be able to get students, postdocs, and others interested in doing this in a very creative way.”

    Adapting to weather extremes for cities and renewable energy

    Improving climate modeling is more than a scientific exercise in creativity, however. There’s a very real application for models that can accurately forecast risk in localized regions.

    Another problem is that progress in climate modeling has not kept up with the need for climate mitigation plans, especially in some of the most vulnerable communities around the globe.

    “It is critical for stakeholders to have access to this data for their own decision-making process. Every community is composed of a diverse population with diverse needs, and each locality is affected by extreme weather events in unique ways,” says Mazereeuw, the director of the MIT Urban Risk Lab. 

    A key piece of the team’s project is building on partnerships the Urban Risk Lab has developed with several cities to test their models once they have a usable product up and running. The cities were selected based on their vulnerability to increasing extreme weather events, such as tropical cyclones in Broward County, Florida, and Toa Baja, Puerto Rico, and extratropical storms in Boston, Massachusetts, and Cape Town, South Africa.

    In their proposal, the team outlines a variety of deliverables that the cities can ultimately use in their climate change preparations, with ideas such as online interactive platforms and workshops with stakeholders (such as local governments, developers, nonprofits, and residents) to learn directly what specific tools they need for their local communities. By doing so, the cities can craft plans for different scenarios in their region, such as sea-level rise or heat waves, while also gaining the information and means to develop the most effective and efficient adaptation strategies for their infrastructure under these conditions.

    “We are acutely aware of the inequity of resources both in mitigating impacts and recovering from disasters. Working with diverse communities through workshops allows us to engage a lot of people, listen, discuss, and collaboratively design solutions,” says Mazereeuw.

    By the end of five years, the team hopes to have better risk assessment and preparedness tool kits, not just for the cities they’re partnering with, but for others as well.

    “MIT is well-positioned to make progress in this area,” says O’Gorman, “and I think it’s an important problem where we can make a difference.”

  • in

    Frequent encounters build familiarity

    Do better spatial networks make for better neighbors? There is evidence that they do, according to Paige Bollen, a sixth-year political science graduate student at MIT. The networks Bollen works with are not virtual but physical, part of the built environment in which we are all embedded. Her research on urban spaces suggests that the routes bringing people together or keeping them apart factor significantly in whether individuals see each other as friend or foe.

    “We all live in networks of streets, and come across different types of people,” says Bollen. “Just passing by others provides information that informs our political and social views of the world.” In her doctoral research, Bollen is revealing how physical context helps determine why some ordinary encounters engender suspicion or even hostility, while others lead to cooperation and tolerance.

    Through her in-depth studies mapping the movement of people in urban communities in Ghana and South Africa, Bollen is demonstrating that even in diverse communities, “when people repeatedly come into contact, even if that contact is casual, they can build understanding that can lead to cooperation and positive outcomes,” she says. “My argument is that frequent, casual contact, facilitated by street networks, can make people feel more comfortable with those unlike themselves.”

    Mapping urban networks

    Bollen’s case for the benefits of casual contact emerged from her pursuit of several related questions: Why do people in urban areas who regard other ethnic groups with prejudice and economic envy nevertheless manage to collaborate for a collective good? How do you reduce fears that arise from differences? How do the configuration of space and the built environment influence contact patterns among people?

    While other social science research suggests that there are weak ties in ethnically mixed urban communities, with casual contact exacerbating hostility, Bollen noted that there were plenty of examples of “cooperation across ethnic divisions in ethnically mixed communities.” She absorbed the work of psychologist Stanley Milgram, whose 1972 research showed that strangers seen frequently in certain places become familiar — less anonymous or threatening. So she set out to understand precisely how “the built environment of a neighborhood interacts with its demography to create distinct patterns of contact between social groups.”

    With the support of the MIT Global Diversity Lab and MIT GOV/LAB, Bollen set out to develop measures of intergroup contact in cities in Ghana and South Africa. She uses street network data to predict contact patterns based on features of the built environment and then combines these measures with mobility data on people’s actual movement.

    “I created a huge dataset for every intersection in these cities, to determine the central nodes where many people are passing through,” she says. She combined these datasets with census data to determine which social groups were most likely to use specific intersections based on their position in a particular street network. She mapped these measures of casual contact to outcomes, such as inter-ethnic cooperation in Ghana and voting behavior in South Africa.
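    One standard way to find intersections “where many people are passing through” from street network data alone is betweenness centrality, which counts how many shortest routes between other pairs of intersections pass through a given one (splitting the credit when several routes tie). The article does not say which measure Bollen uses, so treat this as an illustrative assumption. The sketch implements Brandes’ algorithm for an unweighted street graph stored as a hypothetical adjacency dictionary.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    `graph` maps each intersection to a list of directly connected intersections.
    """
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search: count shortest paths (sigma) and record predecessors.
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's dependency.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected route was counted once from each end.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical neighborhood: four streets meet only at a central market junction.
streets = {
    "market": ["north", "south", "east", "west"],
    "north": ["market"], "south": ["market"], "east": ["market"], "west": ["market"],
}
print(betweenness(streets)["market"])  # 6.0: all 6 pairs of outer points route through it
```

    In a real city the graph would come from street network data and have thousands of nodes; combining such scores with census data on who lives nearby, as the paragraph above describes, is a separate step not shown here.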

    “My analysis [in Ghana] showed that in areas that are more ethnically heterogeneous and where there are more people passing through intersections, we find more interconnections among people and more cooperation within communities in community development efforts,” she says.

    In a related survey experiment conducted on Facebook with 1,200 subjects, Bollen asked Accra residents if they would help an unknown non-co-ethnic in need with a financial gift. She found that the likelihood of offering such help was strongly linked to the frequency of interactions. “Helping behavior occurred when the subjects believed they would see this person again, even when they did not know the person in need well,” says Bollen. “They figured if they helped, they could count on this person’s reciprocity in the future.”

    For Bollen, this was “a powerful gut check” for her hypothesis that “frequency builds familiarity, because frequency provides information and drives expectations, which means it can reduce uncertainty and fear of the other.”

    In research underway in South Africa, a nation increasingly dealing with anti-immigrant violence, Bollen is investigating whether frequency of contact reduces prejudice against foreigners. Using her detailed street maps, 1.1 billion unique geolocated cellphone pings, and election data, she finds that frequent contact opportunities with immigrants are associated with lower support for anti-immigrant party voting.

    Passion for places and spaces

    Bollen never anticipated becoming a political scientist. The daughter of two academics, she was “bent on becoming a data scientist.” But she was also “always interested in why people behave in certain ways and how this influences macro trends.”

    As an undergraduate at Tufts University, she became interested in international affairs. But it was her 2013 fieldwork studying women-only carriages in the metro system of Delhi, India, that proved formative. “I interviewed women for a month, talking to them about how these cars enabled them to participate in public life,” she recalls. Another project involving informal transportation routes in Cape Town, South Africa, immersed her more deeply in the questions of people’s experience of public space. “I left college thinking about mobility and public space, and I discovered how much I love geographic information systems,” she says.

    A gig with the Commonwealth of Massachusetts to improve the 911 emergency service — updating and cleaning geolocations of addresses using Google Street View — further piqued her interest. “The job was tedious, but I realized you can really understand a place, and how people move around, from these images.” Bollen began thinking about a career in urban planning.

    Then a two-year stint as a researcher at MIT GOV/LAB brought Bollen firmly into the political science fold. Working with Lily Tsai, the Ford Professor of Political Science, on civil society partnerships in the developing world, Bollen realized that “political science wasn’t what I thought it was,” she says. “You could bring psychology, economics, and sociology into thinking about politics.” Her decision to join the doctoral program was simple: “I knew and loved the people I was with at MIT.”

    Bollen has not regretted that decision. “All the things I’ve been interested in are finally coming together in my dissertation,” she says. Due to the pandemic, questions involving space, mobility, and contact became sharper to her. “I shifted my research emphasis from asking people about inter-ethnic differences and inequality through surveys, to using contact and context information to measure these variables.”

    She sees a number of applications for her work, including working with civil society organizations in communities touched by ethnic or other frictions “to rethink what we know about contact, challenging some of the classic things we think we know.”

    As she moves into the final phases of her dissertation, which she hopes to publish as a book, Bollen also relishes teaching comparative politics to undergraduates. “There’s something so fun engaging with them, and making their arguments stronger,” she says. Amid the long process of earning a PhD, teaching helps her “enjoy what she is doing every single day.”