More stories

    A new state of the art for unsupervised vision

    Labeling data can be a chore. It’s the main source of sustenance for computer-vision models; without it, they’d have a lot of difficulty identifying objects, people, and other important image characteristics. Yet producing just an hour of tagged and labeled data can take a whopping 800 hours of human time. Our high-fidelity understanding of the world develops as machines can better perceive and interact with our surroundings. But they need more help.

    Scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), Microsoft, and Cornell University have attempted to solve this problem plaguing vision models by creating “STEGO,” an algorithm that can jointly discover and segment objects without any human labels at all, down to the pixel.

    STEGO learns something called “semantic segmentation” — fancy speak for the process of assigning a label to every pixel in an image. Semantic segmentation is an important skill for today’s computer-vision systems because images can be cluttered with objects. Even more challenging is that these objects don’t always fit into literal boxes; algorithms tend to work better for discrete “things” like people and cars as opposed to “stuff” like vegetation, sky, and mashed potatoes. A previous system might simply perceive a nuanced scene of a dog playing in the park as just a dog, but by assigning every pixel of the image a label, STEGO can break the image into its main ingredients: a dog, sky, grass, and its owner.
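
    To make "a label for every pixel" concrete, here is a minimal sketch of what a segmentation output looks like as data. The class names and the tiny 4-by-6 grid are made up for illustration and are not STEGO output.

```python
from collections import Counter

# Hypothetical class names for a toy 4x6 "image"; not STEGO output.
CLASSES = {0: "sky", 1: "grass", 2: "dog", 3: "person"}

# A semantic segmentation is a grid holding one class id per pixel.
label_map = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 3, 0, 0],
    [1, 2, 2, 3, 1, 1],
    [1, 2, 2, 1, 1, 1],
]

def class_fractions(labels):
    """Return the fraction of pixels assigned to each class name."""
    counts = Counter(pixel for row in labels for pixel in row)
    total = sum(counts.values())
    return {CLASSES[class_id]: n / total for class_id, n in counts.items()}

fractions = class_fractions(label_map)
for name, share in sorted(fractions.items()):
    print(f"{name}: {share:.2f}")
```

    Unlike a bounding box, the grid can follow irregular "stuff" such as sky or grass, because every pixel carries its own label.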

    Assigning every single pixel of the world a label is ambitious — especially without any kind of feedback from humans. The majority of algorithms today get their knowledge from mounds of labeled data, which can take painstaking human-hours to source. Just imagine the excitement of labeling every pixel of 100,000 images! To discover these objects without a human’s helpful guidance, STEGO looks for similar objects that appear throughout a dataset. It then associates these similar objects together to construct a consistent view of the world across all of the images it learns from.

    Seeing the world

    Machines that can “see” are crucial for a wide array of new and emerging technologies like self-driving cars and predictive modeling for medical diagnostics. Since STEGO can learn without labels, it can detect objects in many different domains, even those that humans don’t yet understand fully. 

    “If you’re looking at oncological scans, the surface of planets, or high-resolution biological images, it’s hard to know what objects to look for without expert knowledge. In emerging domains, sometimes even human experts don’t know what the right objects should be,” says Mark Hamilton, a PhD student in electrical engineering and computer science at MIT, research affiliate of MIT CSAIL, software engineer at Microsoft, and lead author on a new paper about STEGO. “In these types of situations where you want to design a method to operate at the boundaries of science, you can’t rely on humans to figure it out before machines do.”

    STEGO was tested on a slew of visual domains spanning general images, driving images, and high-altitude aerial photographs. In each domain, STEGO was able to identify and segment relevant objects that were closely aligned with human judgments. STEGO’s most diverse benchmark was the COCO-Stuff dataset, which is made up of diverse images from all over the world, from indoor scenes to people playing sports to trees and cows. In most cases, the previous state-of-the-art system could capture a low-resolution gist of a scene, but struggled on fine-grained details: A human was a blob, a motorcycle was captured as a person, and it couldn’t recognize any geese. On the same scenes, STEGO doubled the performance of previous systems and discovered concepts like animals, buildings, people, furniture, and many others.

    STEGO not only doubled the performance of prior systems on the COCO-Stuff benchmark, but made similar leaps forward in other visual domains. When applied to driverless car datasets, STEGO successfully segmented out roads, people, and street signs with much higher resolution and granularity than previous systems. On images from space, the system broke down every single square foot of the surface of the Earth into roads, vegetation, and buildings. 

    Connecting the pixels

    STEGO — which stands for “Self-supervised Transformer with Energy-based Graph Optimization” — builds on top of the DINO algorithm, which learned about the world through 14 million images from the ImageNet database. STEGO refines the DINO backbone through a learning process that mimics our own way of stitching together pieces of the world to make meaning. 

    For example, you might consider two images of dogs walking in the park. Even though they’re different dogs, with different owners, in different parks, STEGO can tell (without humans) how each scene’s objects relate to each other. The authors even probe STEGO’s mind to see how each little, brown, furry thing in the images is similar, and likewise with other shared objects like grass and people. By connecting objects across images, STEGO builds a consistent view of the world.
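
    One way to picture "connecting objects across images" is to compare feature vectors from two images and pair up the most similar ones. The sketch below is only an illustration of that idea: the three-number vectors and region names are invented, cosine similarity is an assumed choice of measure, and none of this is STEGO's actual training procedure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Made-up 3-D features for regions of two park photos (hypothetical values);
# a real system would take such features from a pretrained backbone.
image_a = {"dog": [0.9, 0.1, 0.0], "grass": [0.1, 0.9, 0.1], "person": [0.0, 0.2, 0.9]}
image_b = {"other dog": [0.8, 0.2, 0.1], "lawn": [0.2, 0.8, 0.0], "owner": [0.1, 0.1, 0.95]}

def best_matches(feats_a, feats_b):
    """For each region in image A, find the most similar region in image B."""
    return {
        name_a: max(feats_b, key=lambda name_b: cosine(vec_a, feats_b[name_b]))
        for name_a, vec_a in feats_a.items()
    }

matches = best_matches(image_a, image_b)
print(matches)
```

    Regions that end up paired across many images are candidates for the same label, even though no human ever named them.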

    “The idea is that these types of algorithms can find consistent groupings in a largely automated fashion so we don’t have to do that ourselves,” says Hamilton. “It might have taken years to understand complex visual datasets like biological imagery, but if we can avoid spending 1,000 hours combing through data and labeling it, we can find and discover new information that we might have missed. We hope this will help us understand the visual world in a more empirically grounded way.”

    Looking ahead

    Despite its improvements, STEGO still faces certain challenges. One is that labels can be arbitrary. For example, the labels of the COCO-Stuff dataset distinguish between “food-things” like bananas and chicken wings, and “food-stuff” like grits and pasta. STEGO doesn’t see much of a distinction there. In other cases, STEGO was confused by odd images — like one of a banana sitting on a phone receiver — where the receiver was labeled “foodstuff,” instead of “raw material.” 

    For upcoming work, they’re planning to explore giving STEGO a bit more flexibility than just labeling pixels into a fixed number of classes as things in the real world can sometimes be multiple things at the same time (like “food”, “plant” and “fruit”). The authors hope this will give the algorithm room for uncertainty, trade-offs, and more abstract thinking.

    “In making a general tool for understanding potentially complicated datasets, we hope that this type of an algorithm can automate the scientific process of object discovery from images. There’s a lot of different domains where human labeling would be prohibitively expensive, or humans simply don’t even know the specific structure, like in certain biological and astrophysical domains. We hope that future work enables application to a very broad scope of datasets. Since you don’t need any human labels, we can now start to apply ML tools more broadly,” says Hamilton.

    “STEGO is simple, elegant, and very effective. I consider unsupervised segmentation to be a benchmark for progress in image understanding, and a very difficult problem. The research community has made terrific progress in unsupervised image understanding with the adoption of transformer architectures,” says Andrea Vedaldi, professor of computer vision and machine learning and a co-lead of the Visual Geometry Group at the engineering science department of the University of Oxford. “This research provides perhaps the most direct and effective demonstration of this progress on unsupervised segmentation.” 

    Hamilton wrote the paper alongside MIT CSAIL PhD student Zhoutong Zhang, Assistant Professor Bharath Hariharan of Cornell University, Associate Professor Noah Snavely of Cornell Tech, and MIT professor William T. Freeman. They will present the paper at the 2022 International Conference on Learning Representations (ICLR).

    Generating new molecules with graph grammar

    Chemical engineers and materials scientists are constantly looking for the next revolutionary material, chemical, and drug. The rise of machine-learning approaches is expediting the discovery process, which could otherwise take years. “Ideally, the goal is to train a machine-learning model on a few existing chemical samples and then allow it to produce as many manufacturable molecules of the same class as possible, with predictable physical properties,” says Wojciech Matusik, professor of electrical engineering and computer science at MIT. “If you have all these components, you can build new molecules with optimal properties, and you also know how to synthesize them. That’s the overall vision that people in that space want to achieve.”

    However, current techniques, mainly deep learning, require extensive datasets for training models, and many class-specific chemical datasets contain a handful of example compounds, limiting their ability to generalize and generate physical molecules that could be created in the real world.

    Now, a new paper from researchers at MIT and IBM tackles this problem using a generative graph model to build new synthesizable molecules within the same chemical class as their training data. To do this, they treat the formation of atoms and chemical bonds as a graph and develop a graph grammar (a linguistics analogy of systems and structures for word ordering) that contains a sequence of rules for building molecules, such as monomers and polymers. Using the grammar and production rules that were inferred from the training set, the model can not only reverse engineer its examples, but can create new compounds in a systematic and data-efficient way. “We basically built a language for creating molecules,” says Matusik. “This grammar essentially is the generative model.”

    Matusik’s co-authors include MIT graduate students Minghao Guo, who is the lead author, and Beichen Li as well as Veronika Thost, Payal Das, and Jie Chen, research staff members with IBM Research. Matusik, Thost, and Chen are affiliated with the MIT-IBM Watson AI Lab. Their method, which they’ve called data-efficient graph grammar (DEG), will be presented at the International Conference on Learning Representations.

    “We want to use this grammar representation for monomer and polymer generation, because this grammar is explainable and expressive,” says Guo. “With only a few number of the production rules, we can generate many kinds of structures.”

    A molecular structure can be thought of as a symbolic representation in a graph — a string of atoms (nodes) joined together by chemical bonds (edges). In this method, the researchers allow the model to take the chemical structure and collapse a substructure of the molecule down to one node; this may be two atoms connected by a bond, a short sequence of bonded atoms, or a ring of atoms. This is done repeatedly, creating the production rules as it goes, until a single node remains. The rules and grammar then could be applied in the reverse order to recreate the training set from scratch or combined in different combinations to produce new molecules of the same chemical class.
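
    The collapse-and-record loop described above can be sketched on a toy graph. This is a simplified illustration under stated assumptions, not the DEG algorithm: it only ever merges one bonded pair at a time, it picks the pair by a fixed ordering rather than learning which substructure to collapse, and its rules record only which labels were merged, not how the pieces attach to the rest of the molecule. The names `contract_once`, `learn_rules`, and `expand` are hypothetical.

```python
def contract_once(nodes, edges, rule_id):
    """Collapse one bonded pair into a single node and return the rule used."""
    a, b = min(edges)                      # fixed, arbitrary choice (an assumption)
    new = f"N{rule_id}"                    # name of the new collapsed node
    rule = (new, (nodes[a], nodes[b]))     # new node -> the labels it replaces
    new_nodes = {k: v for k, v in nodes.items() if k not in (a, b)}
    new_nodes[new] = new
    new_edges = set()
    for u, v in edges:
        u2 = new if u in (a, b) else u
        v2 = new if v in (a, b) else v
        if u2 != v2:                       # drop the bond that was collapsed
            new_edges.add(tuple(sorted((u2, v2))))
    return new_nodes, new_edges, rule

def learn_rules(nodes, edges):
    """Contract repeatedly until one node remains, recording a rule each time."""
    rules = []
    while len(nodes) > 1:
        nodes, edges, rule = contract_once(nodes, edges, len(rules))
        rules.append(rule)
    return rules

def expand(symbol, rules):
    """Apply the rules in reverse to list the atoms a symbol stands for."""
    table = dict(rules)
    if symbol not in table:
        return [symbol]
    return [atom for part in table[symbol] for atom in expand(part, rules)]

# Toy connected chain C-C-O (heavy atoms only); ids are arbitrary strings.
atoms = {"c1": "C", "c2": "C", "o1": "O"}
bonds = {("c1", "c2"), ("c2", "o1")}

rules = learn_rules(atoms, bonds)
print(rules)
print(expand(rules[-1][0], rules))
```

    Running the rules backward from the last collapsed node recovers the original atoms, which mirrors the idea of replaying production rules to rebuild a training example.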

    “Existing graph generation methods would produce one node or one edge sequentially at a time, but we are looking at higher-level structures and, specifically, exploiting chemistry knowledge, so that we don’t treat the individual atoms and bonds as the unit. This simplifies the generation process and also makes it more data-efficient to learn,” says Chen.

    Further, the researchers optimized the technique so that the bottom-up grammar was relatively simple and straightforward, such that it fabricated molecules that could be made.

    “If we switch the order of applying these production rules, we would get another molecule; what’s more, we can enumerate all the possibilities and generate tons of them,” says Chen. “Some of these molecules are valid and some of them not, so the learning of the grammar itself is actually to figure out a minimal collection of production rules, such that the percentage of molecules that can actually be synthesized is maximized.” While the researchers concentrated on three training sets of fewer than 33 samples each (acrylates, chain extenders, and isocyanates), they note that the process could be applied to any chemical class.

    To see how their method performed, the researchers tested DEG against other state-of-the-art models and techniques, looking at percentages of chemically valid and unique molecules, diversity of those created, success rate of retrosynthesis, and percentage of molecules belonging to the training data’s monomer class.

    “We clearly show that, for the synthesizability and membership, our algorithm outperforms all the existing methods by a very large margin, while it’s comparable for some other widely-used metrics,” says Guo. Further, “what is amazing about our algorithm is that we only need about 0.15 percent of the original dataset to achieve very similar results compared to state-of-the-art approaches that train on tens of thousands of samples. Our algorithm can specifically handle the problem of data sparsity.”

    In the immediate future, the team plans to address scaling up this grammar learning process to be able to generate large graphs, as well as produce and identify chemicals with desired properties.

    Down the road, the researchers see many applications for the DEG method, which they note is adaptable beyond generating new chemical structures. A graph is a very flexible representation, and many entities can be symbolized in this form: robots, vehicles, buildings, and electronic circuits, for example. “Essentially, our goal is to build up our grammar, so that our graphic representation can be widely used across many different domains,” says Guo. “DEG can automate the design of novel entities and structures,” adds Chen.

    This research was supported, in part, by the MIT-IBM Watson AI Lab and Evonik.

    Improving predictions of sea level rise for the next century

    When we think of climate change, one of the most dramatic images that comes to mind is the loss of glacial ice. As the Earth warms, these enormous rivers of ice become a casualty of the rising temperatures. But, as ice sheets retreat, they also become an important contributor to one of the more dangerous outcomes of climate change: sea-level rise. At MIT, an interdisciplinary team of scientists is determined to improve sea level rise predictions for the next century, in part by taking a closer look at the physics of ice sheets.

    Last month, two research proposals on the topic, led by Brent Minchew, the Cecil and Ida Green Career Development Professor in the Department of Earth, Atmospheric and Planetary Sciences (EAPS), were announced as finalists in the MIT Climate Grand Challenges initiative. Launched in July 2020, Climate Grand Challenges fielded almost 100 project proposals from collaborators across the Institute who heeded the bold charge: to develop research and innovations that will deliver game-changing advances in the world’s efforts to address the climate challenge.

    As finalists, Minchew and his collaborators from the departments of Urban Studies and Planning, Economics, Civil and Environmental Engineering, the Haystack Observatory, and external partners, received $100,000 to develop their research plans. A subset of the 27 proposals tapped as finalists will be announced next month, making up a portfolio of multiyear “flagship” projects receiving additional funding and support.

    One goal of both Minchew proposals is to more fully understand the most fundamental processes that govern rapid changes in glacial ice, and to use that understanding to build next-generation models that are more predictive of ice sheet behavior as they respond to, and influence, climate change.

    “We need to develop more accurate and computationally efficient models that provide testable projections of sea-level rise over the coming decades. To do so quickly, we want to make better and more frequent observations and learn the physics of ice sheets from these data,” says Minchew. “For example, how much stress do you have to apply to ice before it breaks?”

    Currently, Minchew’s Glacier Dynamics and Remote Sensing group uses satellites to observe the ice sheets on Greenland and Antarctica primarily with interferometric synthetic aperture radar (InSAR). But the data are often collected over long intervals of time, which only gives them “before and after” snapshots of big events. By taking more frequent measurements on shorter time scales, such as hours or days, they can get a more detailed picture of what is happening in the ice.

    “Many of the key unknowns in our projections of what ice sheets are going to look like in the future, and how they’re going to evolve, involve the dynamics of glaciers, or our understanding of how the flow speed and the resistances to flow are related,” says Minchew.

    At the heart of the two proposals is the creation of SACOS, the Stratospheric Airborne Climate Observatory System. The group envisions developing solar-powered drones that can fly in the stratosphere for months at a time, taking more frequent measurements using a new lightweight, low-power radar and other high-resolution instrumentation. They also propose air-dropping sensors directly onto the ice, equipped with seismometers and GPS trackers to measure high-frequency vibrations in the ice and pinpoint the motions of its flow.

    How glaciers contribute to sea level rise

    Current climate models predict an increase in sea levels over the next century, but by just how much is still unclear. Estimates are anywhere from 20 centimeters to two meters, which is a large difference when it comes to enacting policy or mitigation. Minchew points out that response measures will be different, depending on which end of the scale it falls toward. If it’s closer to 20 centimeters, coastal barriers can be built to protect low-level areas. But with higher surges, such measures become too expensive and inefficient to be viable, as entire portions of cities and millions of people would have to be relocated.

    “If we’re looking at a future where we could get more than a meter of sea level rise by the end of the century, then we need to know about that sooner rather than later so that we can start to plan and to do our best to prepare for that scenario,” he says.

    There are two ways glaciers and ice sheets contribute to rising sea levels: direct melting of the ice and accelerated transport of ice to the oceans. In Antarctica, warming waters melt the margins of the ice sheets, which tends to reduce the resistive stresses and allow ice to flow more quickly to the ocean. This thinning can also cause the ice shelves to be more prone to fracture, facilitating the calving of icebergs — events which sometimes cause even further acceleration of ice flow.

    Using data collected by SACOS, Minchew and his group can better understand what material properties in the ice allow for fracturing and calving of icebergs, and build a more complete picture of how ice sheets respond to climate forces. 

    “What I want is to reduce and quantify the uncertainties in projections of sea level rise out to the year 2100,” he says.

    From that more complete picture, the team — which also includes economists, engineers, and urban planning specialists — can work on developing predictive models and methods to help communities and governments estimate the costs associated with sea level rise, develop sound infrastructure strategies, and spur engineering innovation.

    Understanding glacier dynamics

    More frequent radar measurements and the collection of higher-resolution seismic and GPS data will allow Minchew and the team to develop a better understanding of the broad category of glacier dynamics — including calving, an important process in setting the rate of sea level rise which is currently not well understood.  

    “Some of what we’re doing is quite similar to what seismologists do,” he says. “They measure seismic waves following an earthquake, or a volcanic eruption, or things of this nature and use those observations to better understand the mechanisms that govern these phenomena.”

    Air-droppable sensors will help them collect information about ice sheet movement, but this method comes with drawbacks, such as installation and maintenance, which are difficult to do out on a massive ice sheet that is moving and melting. Also, the instruments can each only take measurements at a single location. Minchew equates it to a bobber in water: All it can tell you is how the bobber moves as the waves disturb it.

    But by also taking continuous radar measurements from the air, Minchew’s team can collect observations both in space and in time. Instead of just watching the bobber in the water, they can effectively make a movie of the waves propagating out, as well as visualize processes like iceberg calving happening in multiple dimensions.

    Once the bobbers are in place and the movies recorded, the next step is developing machine learning algorithms to help analyze all the new data being collected. While this data-driven kind of discovery has been a hot topic in other fields, this is the first time it has been applied to glacier research.

    “We’ve developed this new methodology to ingest this huge amount of data,” he says, “and from that create an entirely new way of analyzing the system to answer these fundamental and critically important questions.”

    Security tool guarantees privacy in surveillance footage

    Surveillance cameras have an identity problem, fueled by an inherent tension between utility and privacy. As these powerful little devices have cropped up seemingly everywhere, the use of machine learning tools has automated video content analysis at a massive scale — but with increasing mass surveillance, there are currently no legally enforceable rules to limit privacy invasions. 

    Security cameras can do a lot — they’ve become smarter and supremely more competent than their ghosts of grainy pictures past, the ofttimes “hero tool” in crime media. (“See that little blurry blue blob in the right hand corner of that densely populated corner — we got him!”) Now, video surveillance can help health officials measure the fraction of people wearing masks, enable transportation departments to monitor the density and flow of vehicles, bikes, and pedestrians, and provide businesses with a better understanding of shopping behaviors. But why has privacy remained a weak afterthought? 

    The status quo is to retrofit video with blurred faces or black boxes. Not only does this prevent analysts from asking some genuine queries (e.g., Are people wearing masks?), it also doesn’t always work; the system may miss some faces and leave them unblurred for the world to see. Dissatisfied with this status quo, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), in collaboration with other institutions, came up with a system to better guarantee privacy in video footage from surveillance cameras. Called “Privid,” the system lets analysts submit video data queries, and adds a little bit of noise (extra data) to the end result to ensure that an individual can’t be identified. The system builds on a formal definition of privacy — “differential privacy” — which allows access to aggregate statistics about private data without revealing personally identifiable information.

    Typically, analysts would just have access to the entire video to do whatever they want with it, but Privid makes sure the video isn’t a free buffet. Honest analysts can get access to the information they need, but that access is restrictive enough that malicious analysts can’t do too much with it. To enable this, rather than running the code over the entire video in one shot, Privid breaks the video into small pieces and runs processing code over each chunk. Instead of returning the result from each chunk, Privid aggregates the per-chunk results and adds noise to that aggregate. (Analysts also get an error bound for the result, for example a 2 percent error margin, given the noise added.)

    For example, the code might output the number of people observed in each video chunk, and the aggregation might be the “sum,” to count the total number of people wearing face coverings, or the “average” to estimate the density of crowds. 
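
    The chunk, aggregate, and add-noise flow can be sketched in a few lines. This is a minimal illustration, not Privid's implementation: the frame data is invented, Laplace noise is the standard choice in differential privacy but is an assumption here, and clamping each chunk's output to a fixed range is also an assumption made so that one chunk cannot dominate the sum. The names `laplace_noise` and `noisy_sum` are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_sum(frames, chunk_len, per_chunk, max_per_chunk, noise_scale, rng):
    """Split frames into chunks, run analyst code on each, sum, and add noise."""
    chunks = [frames[i:i + chunk_len] for i in range(0, len(frames), chunk_len)]
    # Clamp each chunk's output to [0, max_per_chunk] (an assumed policy).
    outputs = [min(max(per_chunk(chunk), 0), max_per_chunk) for chunk in chunks]
    true_total = sum(outputs)
    # Only the noisy value would be released; true_total is returned for the demo.
    return true_total + laplace_noise(noise_scale, rng), true_total

# Toy "video": each entry is the number of people visible in that frame.
frames = [0, 2, 3, 1, 0, 0, 4, 4]
rng = random.Random(0)

# The analyst's per-chunk code here is simply "most people seen in the chunk".
noisy_total, true_total = noisy_sum(frames, 2, max, 5, 3.0, rng)
print(true_total, round(noisy_total, 2))
```

    The analyst only ever sees the noisy total, never the per-chunk outputs, which is what limits how much a query can reveal about any single moment in the footage.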

    Privid allows analysts to use their own deep neural networks that are commonplace for video analytics today. This gives analysts the flexibility to ask questions that the designers of Privid did not anticipate. Across a variety of videos and queries, Privid was accurate within 79 to 99 percent of a non-private system.

    “We’re at a stage right now where cameras are practically ubiquitous. If there’s a camera on every street corner, every place you go, and if someone could actually process all of those videos in aggregate, you can imagine that entity building a very precise timeline of when and where a person has gone,” says MIT CSAIL PhD student Frank Cangialosi, the lead author on a paper about Privid. “People are already worried about location privacy with GPS; video data in aggregate could capture not only your location history, but also moods, behaviors, and more at each location.”

    Privid introduces a new notion of “duration-based privacy,” which decouples the definition of privacy from its enforcement. With obfuscation, if your privacy goal is to protect all people, the enforcement mechanism needs to do some work to find the people to protect, which it may or may not do perfectly. With duration-based privacy, you don’t need to fully specify everything, and you’re not hiding more information than you need to.

    Let’s say we have a video overlooking a street. Two analysts, Alice and Bob, both claim they want to count the number of people that pass by each hour, so they submit a video processing module and ask for a sum aggregation.

    The first analyst is the city planning department, which hopes to use this information to understand footfall patterns and plan sidewalks for the city. Their model counts people and outputs this count for each video chunk.

    The other analyst is malicious. They hope to identify every time “Charlie” passes by the camera. Their model only looks for Charlie’s face, and outputs a large number if Charlie is present (i.e., the “signal” they’re trying to extract), or zero otherwise. Their hope is that the sum will be non-zero if Charlie was present. 

    From Privid’s perspective, these two queries look identical. It’s hard to reliably determine what their models might be doing internally, or what the analyst hopes to use the data for. This is where the noise comes in. Privid executes both of the queries, and adds the same amount of noise for each. In the first case, because Alice was counting all people, this noise will have only a small impact on the result and likely won’t affect its usefulness.

    In the second case, since Bob was looking for a specific signal (Charlie was only visible for a few chunks), the noise is enough to prevent Bob from knowing whether Charlie was there. If Bob sees a non-zero result, it might be because Charlie was actually there, or because the model output zero and the noise made the result non-zero. Privid didn’t need to know anything about when or where Charlie appeared; the system just needed a rough upper bound on how long Charlie might appear, which is easier to specify than the exact locations that prior methods rely on.
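
    A rough worked example shows why the same noise barely affects Alice but hides Charlie from Bob. The arithmetic below is a sketch under assumptions that are not in the article: a privacy parameter `epsilon` as used in standard differential privacy, a cap on each chunk's output, and the rule that a person visible for a bounded duration can touch at most one more chunk than the duration spans. All numbers are invented.

```python
import math

def sensitivity(max_duration_s, chunk_len_s, max_per_chunk):
    """Upper bound on how much one person can change the summed result."""
    # A stay of max_duration_s can overlap this many chunks at most.
    chunks_touched = math.ceil(max_duration_s / chunk_len_s) + 1
    return chunks_touched * max_per_chunk

# Hypothetical policy: nobody is visible for more than 30 s, chunks last 10 s,
# and each chunk's output is capped at 10.
sens = sensitivity(max_duration_s=30, chunk_len_s=10, max_per_chunk=10)
epsilon = 1.0                      # assumed privacy parameter
scale = sens / epsilon             # noise scale applied to both queries

alice_true_total = 5000            # hypothetical pedestrian count
alice_relative_noise = scale / alice_true_total

bob_max_signal = sens              # the most Charlie alone can add to the sum
bob_signal_to_noise = bob_max_signal / scale

print(f"noise scale: {scale}")
print(f"Alice: noise is about {alice_relative_noise:.1%} of her total")
print(f"Bob: Charlie's signal is at most {bob_signal_to_noise:.1f}x the noise scale")
```

    Under these assumed numbers, the noise is a fraction of a percent of Alice's crowd count, while Bob's entire signal is no larger than the typical noise, so a non-zero result tells him little.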

    The challenge is determining how much noise to add — Privid wants to add just enough to hide everyone, but not so much that it would be useless for analysts. Adding noise to the data and insisting on queries over time windows means that your result isn’t going to be as accurate as it could be, but the results are still useful while providing better privacy. 

    Cangialosi wrote the paper with Princeton PhD student Neil Agarwal, MIT CSAIL PhD student Venkat Arun, assistant professor at the University of Chicago Junchen Jiang, assistant professor at Rutgers University and former MIT CSAIL postdoc Srinivas Narayana, associate professor at Rutgers University Anand Sarwate, and assistant professor at Princeton University Ravi Netravali SM ’15, PhD ’18. Cangialosi will present the paper at the USENIX Symposium on Networked Systems Design and Implementation in April in Renton, Washington.

    This work was partially supported by a Sloan Research Fellowship and National Science Foundation grants.

    A tool for predicting the future

    Whether someone is trying to predict tomorrow’s weather, forecast future stock prices, identify missed opportunities for sales in retail, or estimate a patient’s risk of developing a disease, they will likely need to interpret time-series data, which are a collection of observations recorded over time.

    Making predictions using time-series data typically requires several data-processing steps and the use of complex machine-learning algorithms, which have such a steep learning curve they aren’t readily accessible to nonexperts.

    To make these powerful tools more user-friendly, MIT researchers developed a system that directly integrates prediction functionality on top of an existing time-series database. Their simplified interface, which they call tspDB (time series predict database), does all the complex modeling behind the scenes so a nonexpert can easily generate a prediction in only a few seconds.

    The new system is more accurate and more efficient than state-of-the-art deep learning methods when performing two tasks: predicting future values and filling in missing data points.

    One reason tspDB is so successful is that it incorporates a novel time-series-prediction algorithm, explains electrical engineering and computer science (EECS) graduate student Abdullah Alomar, an author of a recent research paper in which he and his co-authors describe the algorithm. This algorithm is especially effective at making predictions on multivariate time-series data, which are data that have more than one time-dependent variable. In a weather database, for instance, temperature, dew point, and cloud cover each depend on their past values.

    The algorithm also estimates the volatility of a multivariate time series to provide the user with a confidence level for its predictions.

    “Even as the time-series data becomes more and more complex, this algorithm can effectively capture any time-series structure out there. It feels like we have found the right lens to look at the model complexity of time-series data,” says senior author Devavrat Shah, the Andrew and Erna Viterbi Professor in EECS and a member of the Institute for Data, Systems, and Society and of the Laboratory for Information and Decision Systems.

    Joining Alomar and Shah on the paper is lead author Anish Agarwal, a former EECS graduate student who is currently a postdoc at the Simons Institute at the University of California at Berkeley. The research will be presented at the ACM SIGMETRICS conference.

    Adapting a new algorithm

    Shah and his collaborators have been working on the problem of interpreting time-series data for years, adapting different algorithms and integrating them into tspDB as they built the interface.

    About four years ago, they learned about a particularly powerful classical algorithm, called singular spectrum analysis (SSA), that imputes and forecasts single time series. Imputation is the process of replacing missing values or correcting past values. While this algorithm required manual parameter selection, the researchers suspected it could enable their interface to make effective predictions using time-series data. In earlier work, they removed the need for this manual intervention.

    The algorithm for a single time series transformed the series into a matrix and applied matrix estimation procedures. The key intellectual challenge was how to adapt it to multiple time series. After a few years of struggle, they realized the answer was something very simple: “Stack” the matrices for each individual time series, treat the result as one big matrix, and then apply the single time-series algorithm to it.

    This utilizes information across multiple time series naturally — both across the time series and across time, which they describe in their new paper.
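
    The stacking idea can be illustrated with a minimal sketch. It is not the authors' implementation or the tspDB API: the function names are hypothetical, and it assumes each series is cut into non-overlapping segments to form its matrix, that missing values are filled with zeros and rescaled by the observed fraction, and that "matrix estimation" means a truncated singular value decomposition.

```python
import numpy as np


def page_matrix(series, rows):
    """Cut a 1-D series into consecutive length-`rows` segments, one per column."""
    values = np.asarray(series, dtype=float)
    cols = len(values) // rows
    return values[: rows * cols].reshape(cols, rows).T


def stacked_low_rank_estimate(series_list, rows, rank):
    """Stack per-series matrices, denoise with a truncated SVD, and unstack."""
    blocks = [page_matrix(s, rows) for s in series_list]
    stacked = np.hstack(blocks)  # the "one big matrix"

    # Assumption: zero-fill missing entries and rescale by the observed fraction.
    observed = ~np.isnan(stacked)
    fraction_observed = observed.mean()
    filled = np.where(observed, stacked, 0.0)

    # Keep only the top `rank` singular directions shared by all series.
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank] / fraction_observed

    # Split the big matrix back into one block per series and flatten each.
    widths = [b.shape[1] for b in blocks]
    pieces = np.split(approx, np.cumsum(widths)[:-1], axis=1)
    return [piece.T.reshape(-1) for piece in pieces]


# Two toy series that share a 20-step cycle; one has missing readings.
t = np.arange(200)
temperature = np.sin(2 * np.pi * t / 20)
dew_point = 2.0 * np.cos(2 * np.pi * t / 20)
temperature_with_gaps = temperature.copy()
temperature_with_gaps[[5, 50, 120]] = np.nan

estimates = stacked_low_rank_estimate(
    [temperature_with_gaps, dew_point], rows=10, rank=2
)
print(np.isnan(estimates[0]).any())  # whether any gaps remain unfilled
```

    Because both toy series share the same cycle, their stacked matrix has low rank, so the second series helps fill the gaps in the first. Forecasting and confidence intervals are left out of this sketch.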

    This recent publication also discusses interesting alternatives, where instead of transforming the multivariate time series into a big matrix, it is viewed as a three-dimensional tensor. A tensor is a multi-dimensional array, or grid, of numbers. This established a promising connection between the classical field of time series analysis and the growing field of tensor estimation, Alomar says.

    “The variant of mSSA that we introduced actually captures all of that beautifully. So, not only does it provide the most likely estimation, but a time-varying confidence interval, as well,” Shah says.

    The simpler, the better

    They tested the adapted mSSA against other state-of-the-art algorithms, including deep-learning methods, on real-world time-series datasets with inputs drawn from the electricity grid, traffic patterns, and financial markets.

    Their algorithm outperformed all the others on imputation and it outperformed all but one of the other algorithms when it came to forecasting future values. The researchers also demonstrated that their tweaked version of mSSA can be applied to any kind of time-series data.

    “One reason I think this works so well is that the model captures a lot of time series dynamics, but at the end of the day, it is still a simple model. When you are working with something simple like this, instead of a neural network that can easily overfit the data, you can actually perform better,” Alomar says.

    The impressive performance of mSSA is what makes tspDB so effective, Shah explains. Now, their goal is to make this algorithm accessible to everyone.

    Once a user installs tspDB on top of an existing database, they can run a prediction query with just a few keystrokes in about 0.9 milliseconds, as compared to 0.5 milliseconds for a standard search query. The confidence intervals are also designed to help nonexperts make more informed decisions by incorporating the degree of uncertainty of the predictions into their decision making.

    For instance, the system could enable a nonexpert to predict future stock prices with high accuracy in just a few minutes, even if the time-series dataset contains missing values.

    Now that the researchers have shown why mSSA works so well, they are targeting new algorithms that can be incorporated into tspDB. One of these algorithms utilizes the same model to automatically enable change point detection, so if the user believes their time series will change its behavior at some point, the system will automatically detect that change and incorporate that into its predictions.

    They also want to continue gathering feedback from current tspDB users to see how they can improve the system’s functionality and user-friendliness, Shah says.

    “Our interest at the highest level is to make tspDB a success in the form of a broadly utilizable, open-source system. Time-series data are very important, and this is a beautiful concept of actually building prediction functionalities directly into the database. It has never been done before, and so we want to make sure the world uses it,” he says.

    “This work is very interesting for a number of reasons. It provides a practical variant of mSSA which requires no hand tuning, they provide the first known analysis of mSSA, and the authors demonstrate the real-world value of their algorithm by being competitive with or out-performing several known algorithms for imputations and predictions in (multivariate) time series for several real-world data sets,” says Vishal Misra, a professor of computer science at Columbia University who was not involved with this research. “At the heart of it all is the beautiful modeling work where they cleverly exploit correlations across time (within a time series) and space (across time series) to create a low-rank spatiotemporal factor representation of a multivariate time series. Importantly this model connects the field of time series analysis to that of the rapidly evolving topic of tensor completion, and I expect a lot of follow-on research spurred by this paper.”

  • in

    How artificial intelligence can help combat systemic racism

    In 2020, Detroit police arrested a Black man for shoplifting almost $4,000 worth of watches from an upscale boutique. He was handcuffed in front of his family and spent a night in lockup. After some questioning, however, it became clear that they had the wrong man. So why did they arrest him in the first place?

    The reason: a facial recognition algorithm had matched the photo on his driver’s license to grainy security camera footage.

    Facial recognition algorithms — which have repeatedly been demonstrated to be less accurate for people with darker skin — are just one example of how racial bias gets replicated within and perpetuated by emerging technologies.

    “There’s an urgency as AI is used to make really high-stakes decisions,” says MLK Visiting Professor S. Craig Watkins, whose academic home for his time at MIT is the Institute for Data, Systems, and Society (IDSS). “The stakes are higher because new systems can replicate historical biases at scale.”

    Watkins, a professor at the University of Texas at Austin and the founding director of the Institute for Media Innovation, researches the impacts of media and data-based systems on human behavior, with a specific concentration on issues related to systemic racism. “One of the fundamental questions of the work is: how do we build AI models that deal with systemic inequality more effectively?”

    Artificial Intelligence and the Future of Racial Justice | S. Craig Watkins | TEDxMIT

    Ethical AI

    Inequality is perpetuated by technology in many ways across many sectors. One broad domain is health care, where Watkins says inequity shows up in both quality of and access to care. The demand for mental health care, for example, far outstrips the capacity for services in the United States. That demand has been exacerbated by the pandemic, and access to care is harder for communities of color.

    For Watkins, taking the bias out of the algorithm is just one component of building more ethical AI. He works also to develop tools and platforms that can address inequality outside of tech head-on. In the case of mental health access, this entails developing a tool to help mental health providers deliver care more efficiently.

    “We are building a real-time data collection platform that looks at activities and behaviors and tries to identify patterns and contexts in which certain mental states emerge,” says Watkins. “The goal is to provide data-informed insights to care providers in order to deliver higher-impact services.”

    Watkins is no stranger to the privacy concerns such an app would raise. He takes a user-centered approach to the development that is grounded in data ethics. “Data rights are a significant component,” he argues. “You have to give the user complete control over how their data is shared and used and what data a care provider sees. No one else has access.”

    Combating systemic racism

    Here at MIT, Watkins has joined the newly launched Initiative on Combatting Systemic Racism (ICSR), an IDSS research collaboration that brings together faculty and researchers from the MIT Stephen A. Schwarzman College of Computing and beyond. The aim of the ICSR is to develop and harness computational tools that can help effect structural and normative change toward racial equity.

    The ICSR collaboration has separate project teams researching systemic racism in different sectors of society, including health care. Each of these “verticals” addresses different but interconnected issues, from sustainability to employment to gaming. Watkins is a part of two ICSR groups, policing and housing, that aim to better understand the processes that lead to discriminatory practices in both sectors. “Discrimination in housing contributes significantly to the racial wealth gap in the U.S.,” says Watkins.

    The policing team examines patterns in how different populations get policed. “There is obviously a significant and charged history to policing and race in America,” says Watkins. “This is an attempt to understand, to identify patterns, and note regional differences.”

    Watkins and the policing team are building models using data that details police interventions, responses, and race, among other variables. The ICSR is a good fit for this kind of research, says Watkins, who notes the interdisciplinary focus of both IDSS and the Schwarzman College of Computing.

    “Systemic change requires a collaborative model and different expertise,” says Watkins. “We are trying to maximize influence and potential on the computational side, but we won’t get there with computation alone.”

    Opportunities for change

    Models can also predict outcomes, but Watkins is careful to point out that no algorithm alone will solve racial challenges.

    “Models in my view can inform policy and strategy that we as humans have to create. Computational models can inform and generate knowledge, but that doesn’t equate with change.” It takes additional work — and additional expertise in policy and advocacy — to use knowledge and insights to strive toward progress.

    One important lever of change, he argues, will be building a more AI-literate society through access to information and opportunities to understand AI and its impact in a more dynamic way. He hopes to see greater data rights and greater understanding of how societal systems impact our lives.

    “I was inspired by the response of younger people to the murders of George Floyd and Breonna Taylor,” he says. “Their tragic deaths shine a bright light on the real-world implications of structural racism and has forced the broader society to pay more attention to this issue, which creates more opportunities for change.”

  • in

    Computational modeling guides development of new materials

    Metal-organic frameworks, a class of materials with porous molecular structures, have a variety of possible applications, such as capturing harmful gases and catalyzing chemical reactions. Made of metal atoms linked by organic molecules, they can be configured in hundreds of thousands of different ways.

    To help researchers sift through all of the possible metal-organic framework (MOF) structures and help identify the ones that would be most practical for a particular application, a team of MIT computational chemists has developed a model that can analyze the features of a MOF structure and predict if it will be stable enough to be useful.

    The researchers hope that these computational predictions will help cut the development time of new MOFs.

    “This will allow researchers to test the promise of specific materials before they go through the trouble of synthesizing them,” says Heather Kulik, an associate professor of chemical engineering at MIT.

    The MIT team is now working to develop MOFs that could be used to capture methane gas and convert it to useful compounds such as fuels.

    The researchers described their new model in two papers, one in the Journal of the American Chemical Society and one in Scientific Data. Graduate students Aditya Nandy and Gianmarco Terrones are the lead authors of the Scientific Data paper, and Nandy is also the lead author of the JACS paper. Kulik is the senior author of both papers.

    Modeling structure

    MOFs consist of metal atoms joined by organic molecules called linkers to create a rigid, cage-like structure. The materials also have many pores, which makes them useful for catalyzing reactions involving gases but can also make them less structurally stable.

    “The limitation in seeing MOFs realized at industrial scale is that although we can control their properties by controlling where each atom is in the structure, they’re not necessarily that stable, as far as materials go,” Kulik says. “They’re very porous and they can degrade under realistic conditions that we need for catalysis.”

    Scientists have been working on designing MOFs for more than 20 years, and thousands of possible structures have been published. A centralized repository contains about 10,000 of these structures but is not linked to any of the published findings on the properties of those structures.

    Kulik, who specializes in using computational modeling to discover structure-property relationships of materials, wanted to take a more systematic approach to analyzing and classifying the properties of MOFs.

    “When people make these now, it’s mostly trial and error. The MOF dataset is really promising because there are so many people excited about MOFs, so there’s so much to learn from what everyone’s been working on, but at the same time, it’s very noisy and it’s not systematic the way it’s reported,” she says.

    Kulik and her colleagues set out to analyze published reports of MOF structures and properties using a natural-language-processing algorithm. Using this algorithm, they scoured nearly 4,000 published papers, extracting information on the temperature at which a given MOF would break down. They also pulled out data on whether particular MOFs can withstand the conditions needed to remove solvents used to synthesize them and make sure they become porous.

    Once the researchers had this information, they used it to train two neural networks to predict MOFs’ thermal stability and stability during solvent removal, based on the molecules’ structure.

    “Before you start working with a material and thinking about scaling it up for different applications, you want to know will it hold up, or is it going to degrade in the conditions I would want to use it in?” Kulik says. “Our goal was to get better at predicting what makes a stable MOF.”

    Better stability

    Using the model, the researchers were able to identify certain features that influence stability. In general, simpler linkers with fewer chemical groups attached to them are more stable. Pore size is also important: Before the researchers did their analysis, it had been thought that MOFs with larger pores might be too unstable. However, the MIT team found that large-pore MOFs can be stable if other aspects of their structure counteract the large pore size.

    “Since MOFs have so many things that can vary at the same time, such as the metal, the linkers, the connectivity, and the pore size, it is difficult to nail down what governs stability across different families of MOFs,” Nandy says. “Our models enable researchers to make predictions on existing or new materials, many of which have yet to be made.”

    The researchers have made their data and models available online. Scientists interested in using the models can get recommendations for strategies to make an existing MOF more stable, and they can also add their own data and feedback on the predictions of the models.

    The MIT team is now using the model to try to identify MOFs that could be used to catalyze the conversion of methane gas to methanol, which could be used as fuel. Kulik also plans to use the model to create a new dataset of hypothetical MOFs that haven’t been built before but are predicted to have high stability. Researchers could then screen this dataset for a variety of properties.

    “People are interested in MOFs for things like quantum sensing and quantum computing, all sorts of different applications where you need metals distributed in this atomically precise way,” Kulik says.

    The research was funded by DARPA, the U.S. Office of Naval Research, the U.S. Department of Energy, a National Science Foundation Graduate Research Fellowship, a Career Award at the Scientific Interface from the Burroughs Wellcome Fund, and an AAAS Marion Milligan Mason Award.

  • in

    Unlocking new doors to artificial intelligence

    Artificial intelligence research is constantly developing new hypotheses that have the potential to benefit society and industry; however, sometimes these benefits are not fully realized due to a lack of engineering tools. To help bridge this gap, graduate students in the MIT Department of Electrical Engineering and Computer Science’s 6-A Master of Engineering (MEng) Thesis Program work with some of the most innovative companies in the world and collaborate on cutting-edge projects, while contributing to and completing their MEng thesis.

    During a portion of the last year, four 6-A MEng students teamed up and completed an internship with IBM Research’s advanced prototyping team through the MIT-IBM Watson AI Lab on AI projects, often developing web applications to address real-world issues or business use cases. Here, the students worked alongside AI engineers, user experience engineers, full-stack researchers, and generalists to accommodate project requests and receive thesis advice, says Lee Martie, IBM research staff member and 6-A manager. The students’ projects ranged from generating synthetic data to allow for privacy-sensitive data analysis to using computer vision to identify actions in video, which allows for monitoring human safety and tracking build progress on a construction site.

    “I appreciated all of the expertise from the team and the feedback,” says 6-A graduate Violetta Jusiega ’21, who participated in the program. “I think that working in industry gives the lens of making sure that the project’s needs are satisfied and [provides the opportunity] to ground research and make sure that it is helpful for some use case in the future.”

    Jusiega’s research intersected the fields of computer vision and design to focus on data visualization and user interfaces for the medical field. Working with IBM, she built an application programming interface (API) that let clinicians interact with a medical treatment strategy AI model, which was deployed in the cloud. Her interface provided a medical decision tree, as well as some prescribed treatment plans. After receiving feedback on her design from physicians at a local hospital, Jusiega developed iterations of the API and how the results were displayed, visually, so that it would be user-friendly and understandable for clinicians, who don’t usually code. She says that “these tools are often not acquired into the field because they lack some of these API principles which become more important in an industry where everything is already very fast paced, so there’s little time to incorporate a new technology.” But this project might eventually allow for industry deployment. “I think this application has a bunch of potential, whether it does get picked up by clinicians or whether it’s simply used in research. It’s very promising and very exciting to see how technology can help us modify, or I can improve, the health-care field to be even more custom-tailored towards patients and giving them the best care possible,” she says.

    Another 6-A graduate student, Spencer Compton, was also considering aiding professionals to make more informed decisions, for use in settings including health care, but he was tackling it from a causal perspective. When given a set of related variables, Compton was investigating if there was a way to determine not just correlation, but the cause-and-effect relationship between them (the direction of the interaction) from the data alone. For this, he and his collaborators from IBM Research and Purdue University turned to a field of math called information theory. With the goal of designing an algorithm to learn complex networks of causal relationships, Compton used ideas relating to entropy, the randomness in a system, to help determine if a causal relationship is present and how variables might be interacting. “When judging an explanation, people often default to Occam’s razor,” says Compton. “We’re more inclined to believe a simpler explanation than a more complex one.” In many cases, he says, the approach seemed to perform well. For instance, they were able to consider variables such as lung cancer, pollution, and X-ray findings. He was pleased that his research allowed him to help create a framework of “entropic causal inference” that could aid in safe and smart decisions in the future, in a satisfying way. “The math is really surprisingly deep, interesting, and complex,” says Compton. “We’re basically asking, ‘when is the simplest explanation correct?’ but as a math question.”
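
    Entropy, described above as the randomness in a system, can be made concrete with Shannon's formula for a discrete variable. The sketch below shows only that basic quantity, estimated from observed samples. It is not the entropic causal inference algorithm itself, and the function name and toy data are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(samples):
    """Estimate entropy in bits from the empirical frequencies of the samples."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A variable that never changes carries no randomness;
# a fair two-valued variable carries one bit.
constant = ["smoker"] * 4
balanced = ["smoker", "nonsmoker", "smoker", "nonsmoker"]

print(shannon_entropy(constant))
print(shannon_entropy(balanced))
```

    In this framing, a "simpler" explanation corresponds to one whose unexplained randomness has lower entropy.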

    Determining relationships within data can sometimes require large volumes of it to suss out patterns, but for data that may contain sensitive information, this may not be available. For her master’s work, Ivy Huang worked with IBM Research to generate synthetic tabular data using a natural language processing tool called a transformer model, which can learn and predict future values from past values. Trained on real data, the model can produce new data with similar patterns, properties, and relationships without restrictions like privacy, availability, and access that might come with real data in financial transactions and electronic medical records. Further, she created an API and deployed the model in an IBM cluster, which allowed users increased access to the model and abilities to query it without compromising the original data.

    Working with the advanced prototyping team, MEng candidate Brandon Perez also considered how to gather and investigate data with restrictions, but in his case it was to use computer vision frameworks, centered on an action recognition model, to identify construction site happenings. The team based their work on the Moments in Time dataset, which contains over a million three-second video clips with about 300 attached classification labels, and has performed well during AI training. However, the group needed more construction-based video data. For this, they used YouTube-8M. Perez built a framework for testing and fine-tuning existing object detection models and action recognition models that could plug into an automatic spatial and temporal localization tool — how they would identify and label particular actions in a video timeline. “I was satisfied that I was able to explore what made me curious, and I was grateful for the autonomy that I was given with this project,” says Perez. “I felt like I was always supported, and my mentor was a great support to the project.”

    “The kind of collaborations that we have seen between our MEng students and IBM researchers are exactly what the 6-A MEng Thesis program at MIT is all about,” says Tomas Palacios, professor of electrical engineering and faculty director of the MIT 6-A MEng Thesis program. “For more than 100 years, 6-A has been connecting MIT students with industry to solve together some of the most important problems in the world.”