More stories

  • Image recognition accuracy: An unseen challenge confounding today’s AI

    Imagine you are scrolling through the photos on your phone and you come across an image that at first you can’t recognize. It looks like maybe something fuzzy on the couch; could it be a pillow or a coat? After a couple of seconds it clicks — of course! That ball of fluff is your friend’s cat, Mocha. Some of your photos can be understood in an instant, so why was this cat photo so much more difficult?

    MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers were surprised to find that despite the critical importance of understanding visual data in pivotal areas ranging from health care to transportation to household devices, the notion of an image’s recognition difficulty for humans has been almost entirely ignored. One of the major drivers of progress in deep learning-based AI has been datasets, yet we know little about how data drives progress in large-scale deep learning beyond that bigger is better.

    In real-world applications that require understanding visual data, humans outperform object recognition models despite the fact that models perform well on current datasets, including those explicitly designed to challenge machines with debiased images or distribution shifts. This problem persists, in part, because we have no guidance on the absolute difficulty of an image or dataset. Without controlling for the difficulty of images used for evaluation, it’s hard to objectively assess progress toward human-level performance, to cover the range of human abilities, and to increase the challenge posed by a dataset.

    To fill in this knowledge gap, David Mayo, an MIT PhD student in electrical engineering and computer science and a CSAIL affiliate, delved into the deep world of image datasets, exploring why certain images are more difficult for humans and machines to recognize than others. “Some images inherently take longer to recognize, and it’s essential to understand the brain’s activity during this process and its relation to machine learning models. Perhaps there are complex neural circuits or unique mechanisms missing in our current models, visible only when tested with challenging visual stimuli. This exploration is crucial for comprehending and enhancing machine vision models,” says Mayo, a lead author of a new paper on the work.

    This led to the development of a new metric, the “minimum viewing time” (MVT), which quantifies the difficulty of recognizing an image based on how long a person needs to view it before making a correct identification. Using a subset of ImageNet, a popular dataset in machine learning, and ObjectNet, a dataset designed to test object recognition robustness, the team showed images to participants for varying durations from as short as 17 milliseconds to as long as 10 seconds, and asked them to choose the correct object from a set of 50 options. After over 200,000 image presentation trials, the team found that existing test sets, including ObjectNet, appeared skewed toward easier, shorter MVT images, with the vast majority of benchmark performance derived from images that are easy for humans.
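
    To make the metric concrete, one simple way to estimate an MVT from raw trial records is sketched below in Python; the data layout and the accuracy threshold are illustrative assumptions rather than the paper’s exact estimator.

        from collections import defaultdict

        def minimum_viewing_time(trials, accuracy_threshold=0.5):
            """Estimate a minimum viewing time (MVT) per image from presentation trials.

            `trials` is an iterable of (image_id, duration_ms, correct) tuples. The MVT
            here is the shortest exposure at which the fraction of correct responses
            reaches `accuracy_threshold` -- a simplified stand-in, not the paper's method.
            """
            by_image = defaultdict(lambda: defaultdict(list))
            for image_id, duration_ms, correct in trials:
                by_image[image_id][duration_ms].append(bool(correct))

            mvt = {}
            for image_id, per_duration in by_image.items():
                for duration in sorted(per_duration):
                    answers = per_duration[duration]
                    if sum(answers) / len(answers) >= accuracy_threshold:
                        mvt[image_id] = duration
                        break
                else:
                    mvt[image_id] = None  # never reliably recognized, even at 10 seconds
            return mvt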

    The project identified interesting trends in model performance — particularly in relation to scaling. Larger models showed considerable improvement on simpler images but made less progress on more challenging images. The CLIP models, which incorporate both language and vision, stood out as they moved in the direction of more human-like recognition.

    “Traditionally, object recognition datasets have been skewed towards less-complex images, a practice that has led to an inflation in model performance metrics, not truly reflective of a model’s robustness or its ability to tackle complex visual tasks. Our research reveals that harder images pose a more acute challenge, causing a distribution shift that is often not accounted for in standard evaluations,” says Mayo. “We released image sets tagged by difficulty along with tools to automatically compute MVT, enabling MVT to be added to existing benchmarks and extended to various applications. These include measuring test set difficulty before deploying real-world systems, discovering neural correlates of image difficulty, and advancing object recognition techniques to close the gap between benchmark and real-world performance.”

    “One of my biggest takeaways is that we now have another dimension to evaluate models on. We want models that are able to recognize any image even if — perhaps especially if — it’s hard for a human to recognize. We’re the first to quantify what this would mean. Our results show that not only is this not the case with today’s state of the art, but also that our current evaluation methods don’t have the ability to tell us when it is the case because standard datasets are so skewed toward easy images,” says Jesse Cummings, an MIT graduate student in electrical engineering and computer science and co-first author with Mayo on the paper.

    From ObjectNet to MVT

    A few years ago, the team behind this project identified a significant challenge in the field of machine learning: Models were struggling with out-of-distribution images, or images that were not well represented in the training data. Enter ObjectNet, a dataset composed of images collected from real-life settings. The dataset helped illuminate the performance gap between machine learning models and human recognition abilities by eliminating spurious correlations present in other benchmarks — for example, between an object and its background. ObjectNet exposed the gap between how machine vision models perform on benchmark datasets and how they perform in real-world applications, and its adoption by many researchers and developers subsequently helped improve model performance.

    Fast forward to the present, and the team has taken their research a step further with MVT. Unlike traditional methods that focus on absolute performance, this new approach assesses how models perform by contrasting their responses to the easiest and hardest images. The study further explored how image difficulty could be explained and tested for similarity to human visual processing. Using metrics like c-score, prediction depth, and adversarial robustness, the team found that harder images are processed differently by networks. “While there are observable trends, such as easier images being more prototypical, a comprehensive semantic explanation of image difficulty continues to elude the scientific community,” says Mayo.
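
    One way to use such per-image difficulty labels, sketched below, is to split a test set into easy and hard buckets by MVT and compare a model’s accuracy on each; the one-second threshold is an arbitrary placeholder, not a value from the study.

        def accuracy_by_difficulty(predictions, labels, mvt_ms, threshold_ms=1000):
            """Report model accuracy separately on 'easy' images (MVT below the
            threshold) and 'hard' images (long or undefined MVT). Illustrative only."""
            easy_hits = hard_hits = easy_n = hard_n = 0
            for pred, label, mvt in zip(predictions, labels, mvt_ms):
                if mvt is None or mvt >= threshold_ms:
                    hard_n += 1
                    hard_hits += int(pred == label)
                else:
                    easy_n += 1
                    easy_hits += int(pred == label)
            return {
                "easy_accuracy": easy_hits / max(easy_n, 1),
                "hard_accuracy": hard_hits / max(hard_n, 1),
            }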

    In the realm of health care, for example, the pertinence of understanding visual complexity becomes even more pronounced. The ability of AI models to interpret medical images, such as X-rays, is subject to the diversity and difficulty distribution of the images. The researchers advocate for a meticulous analysis of difficulty distribution tailored for professionals, ensuring AI systems are evaluated based on expert standards, rather than layperson interpretations.

    Mayo and Cummings are currently looking at neurological underpinnings of visual recognition as well, probing into whether the brain exhibits differential activity when processing easy versus challenging images. The study aims to unravel whether complex images recruit additional brain areas not typically associated with visual processing, hopefully helping demystify how our brains accurately and efficiently decode the visual world.

    Toward human-level performance

    Looking ahead, the researchers are not only exploring ways to enhance AI’s ability to predict image difficulty; the team is also working on identifying correlations with viewing-time difficulty in order to generate harder or easier versions of images.

    Despite the study’s significant strides, the researchers acknowledge limitations, particularly in terms of separating object recognition from visual search tasks. The current methodology concentrates on object recognition, leaving out the complexities introduced by cluttered images.

    “This comprehensive approach addresses the long-standing challenge of objectively assessing progress towards human-level performance in object recognition and opens new avenues for understanding and advancing the field,” says Mayo. “With the potential to adapt the Minimum Viewing Time difficulty metric for a variety of visual tasks, this work paves the way for more robust, human-like performance in object recognition, ensuring that models are truly put to the test and are ready for the complexities of real-world visual understanding.”

    “This is a fascinating study of how human perception can be used to identify weaknesses in the ways AI vision models are typically benchmarked, which overestimate AI performance by concentrating on easy images,” says Alan L. Yuille, Bloomberg Distinguished Professor of Cognitive Science and Computer Science at Johns Hopkins University, who was not involved in the paper. “This will help develop more realistic benchmarks leading not only to improvements to AI but also make fairer comparisons between AI and human perception.” 

    “It’s widely claimed that computer vision systems now outperform humans, and on some benchmark datasets, that’s true,” says Anthropic technical staff member Simon Kornblith PhD ’17, who was also not involved in this work. “However, a lot of the difficulty in those benchmarks comes from the obscurity of what’s in the images; the average person just doesn’t know enough to classify different breeds of dogs. This work instead focuses on images that people can only get right if given enough time. These images are generally much harder for computer vision systems, but the best systems are only a bit worse than humans.”

    Mayo, Cummings, and Xinyu Lin MEng ’22 wrote the paper alongside CSAIL Research Scientist Andrei Barbu, CSAIL Principal Research Scientist Boris Katz, and MIT-IBM Watson AI Lab Principal Researcher Dan Gutfreund. The researchers are affiliates of the MIT Center for Brains, Minds, and Machines.

    The team is presenting their work at the 2023 Conference on Neural Information Processing Systems (NeurIPS).

  • Automated system teaches users when to collaborate with an AI assistant

    Artificial intelligence models that pick out patterns in images can often do so better than human eyes — but not always. If a radiologist is using an AI model to help her determine whether a patient’s X-rays show signs of pneumonia, when should she trust the model’s advice and when should she ignore it?

    A customized onboarding process could help this radiologist answer that question, according to researchers at MIT and the MIT-IBM Watson AI Lab. They designed a system that teaches a user when to collaborate with an AI assistant.

    In this case, the training method might find situations where the radiologist trusts the model’s advice — except she shouldn’t because the model is wrong. The system automatically learns rules for how she should collaborate with the AI, and describes them with natural language.

    During onboarding, the radiologist practices collaborating with the AI using training exercises based on these rules, receiving feedback about her performance and the AI’s performance.

    The researchers found that this onboarding procedure led to about a 5 percent improvement in accuracy when humans and AI collaborated on an image prediction task. Their results also show that just telling the user when to trust the AI, without training, led to worse performance.

    Importantly, the researchers’ system is fully automated, so it learns to create the onboarding process based on data from the human and AI performing a specific task. It can also adapt to different tasks, so it can be scaled up and used in many situations where humans and AI models work together, such as in social media content moderation, writing, and programming.

    “So often, people are given these AI tools to use without any training to help them figure out when it is going to be helpful. That’s not what we do with nearly every other tool that people use — there is almost always some kind of tutorial that comes with it. But for AI, this seems to be missing. We are trying to tackle this problem from a methodological and behavioral perspective,” says Hussein Mozannar, a graduate student in the Social and Engineering Systems doctoral program within the Institute for Data, Systems, and Society (IDSS) and lead author of a paper about this training process.

    The researchers envision that such onboarding will be a crucial part of training for medical professionals.

    “One could imagine, for example, that doctors making treatment decisions with the help of AI will first have to do training similar to what we propose. We may need to rethink everything from continuing medical education to the way clinical trials are designed,” says senior author David Sontag, a professor of EECS, a member of the MIT-IBM Watson AI Lab and the MIT Jameel Clinic, and the leader of the Clinical Machine Learning Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    Mozannar, who is also a researcher with the Clinical Machine Learning Group, is joined on the paper by Jimin J. Lee, an undergraduate in electrical engineering and computer science; Dennis Wei, a senior research scientist at IBM Research; and Prasanna Sattigeri and Subhro Das, research staff members at the MIT-IBM Watson AI Lab. The paper will be presented at the Conference on Neural Information Processing Systems.

    Training that evolves

    Existing onboarding methods for human-AI collaboration are often composed of training materials produced by human experts for specific use cases, making them difficult to scale up. Some related techniques rely on explanations, where the AI tells the user its confidence in each decision, but research has shown that explanations are rarely helpful, Mozannar says.

    “The AI model’s capabilities are constantly evolving, so the use cases where the human could potentially benefit from it are growing over time. At the same time, the user’s perception of the model continues changing. So, we need a training procedure that also evolves over time,” he adds.

    To accomplish this, their onboarding method is automatically learned from data. It is built from a dataset that contains many instances of a task, such as detecting the presence of a traffic light from a blurry image.

    The system’s first step is to collect data on the human and AI performing this task. In this case, the human would try to predict, with the help of AI, whether blurry images contain traffic lights.

    The system embeds these data points onto a latent space, which is a representation of data in which similar data points are closer together. It uses an algorithm to discover regions of this space where the human collaborates incorrectly with the AI. These regions capture instances where the human trusted the AI’s prediction but the prediction was wrong, and vice versa.
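
    A rough sketch of that region-discovery step, assuming precomputed latent embeddings and using simple k-means clustering as a stand-in for the paper’s actual algorithm:

        import numpy as np
        from sklearn.cluster import KMeans

        def find_error_regions(embeddings, human_followed_ai, ai_correct,
                               n_regions=10, error_rate_threshold=0.5):
            """Cluster task instances in the latent space and flag clusters where the
            human-AI team tends to collaborate incorrectly (trusting a wrong AI answer,
            or ignoring a correct one). A simplified illustration, not the paper's method.

            embeddings:        (N, d) array of latent representations
            human_followed_ai: (N,) boolean array, True if the human used the AI answer
            ai_correct:        (N,) boolean array, True if the AI answer was right
            """
            mistakes = human_followed_ai != ai_correct
            clusters = KMeans(n_clusters=n_regions, n_init=10).fit_predict(embeddings)

            bad_regions = []
            for c in range(n_regions):
                mask = clusters == c
                if mask.any() and mistakes[mask].mean() >= error_rate_threshold:
                    bad_regions.append({"cluster": c, "indices": np.where(mask)[0]})
            return bad_regions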

    Perhaps the human mistakenly trusts the AI when images show a highway at night.

    After discovering the regions, a second algorithm utilizes a large language model to describe each region as a rule, using natural language. The algorithm iteratively fine-tunes that rule by finding contrasting examples. It might describe this region as “ignore AI when it is a highway during the night.”
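
    The rule-description step might look roughly like the sketch below, where `llm` is a hypothetical callable mapping a prompt string to a completion string; the prompts and the refinement loop are illustrative, not the authors’ own.

        def describe_region(llm, inside_examples, outside_examples, max_rounds=3):
            """Summarize an error region as a natural-language rule and refine it with
            contrasting examples that lie just outside the region."""
            rule = llm(
                "These situations belong to a region where the user should NOT trust the AI:\n"
                f"{inside_examples}\n"
                "These nearby situations do NOT belong to that region:\n"
                f"{outside_examples}\n"
                "Write one short rule describing when to ignore the AI."
            )
            for _ in range(max_rounds):  # iterative refinement against counterexamples
                revised = llm(
                    f"Current rule: {rule}\n"
                    f"Counterexamples the rule must NOT cover:\n{outside_examples}\n"
                    "If the rule wrongly covers any counterexample, rewrite it; "
                    "otherwise repeat it unchanged."
                )
                if revised.strip() == rule.strip():
                    break
                rule = revised
            return rule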

    These rules are used to build training exercises. The onboarding system shows an example to the human, in this case a blurry highway scene at night, as well as the AI’s prediction, and asks the user if the image shows traffic lights. The user can answer yes, no, or use the AI’s prediction.

    If the human is wrong, they are shown the correct answer and performance statistics for the human and AI on these instances of the task. The system does this for each region, and at the end of the training process, repeats the exercises the human got wrong.

    “After that, the human has learned something about these regions that we hope they will take away in the future to make more accurate predictions,” Mozannar says.

    Onboarding boosts accuracy

    The researchers tested this system with users on two tasks — detecting traffic lights in blurry images and answering multiple choice questions from many domains (such as biology, philosophy, computer science, etc.).

    They first showed users a card with information about the AI model, how it was trained, and a breakdown of its performance on broad categories. Users were split into five groups: Some were only shown the card, some went through the researchers’ onboarding procedure, some went through a baseline onboarding procedure, some went through the researchers’ onboarding procedure and were given recommendations of when they should or should not trust the AI, and others were only given the recommendations.

    Only the researchers’ onboarding procedure without recommendations improved users’ accuracy significantly, boosting their performance on the traffic light prediction task by about 5 percent without slowing them down. However, onboarding was not as effective for the question-answering task. The researchers believe this is because the AI model, ChatGPT, provided explanations with each answer that conveyed whether it should be trusted.

    But providing recommendations without onboarding had the opposite effect — users not only performed worse, they took more time to make predictions.

    “When you only give someone recommendations, it seems like they get confused and don’t know what to do. It derails their process. People also don’t like being told what to do, so that is a factor as well,” Mozannar says.

    Providing recommendations alone could harm the user if those recommendations are wrong, he adds. With onboarding, on the other hand, the biggest limitation is the amount of available data. If there aren’t enough data, the onboarding stage won’t be as effective, he says.

    In the future, he and his collaborators want to conduct larger studies to evaluate the short- and long-term effects of onboarding. They also want to leverage unlabeled data for the onboarding process, and find methods to effectively reduce the number of regions without omitting important examples.

    “People are adopting AI systems willy-nilly, and indeed AI offers great potential, but these AI agents still sometimes make mistakes. Thus, it’s crucial for AI developers to devise methods that help humans know when it’s safe to rely on the AI’s suggestions,” says Dan Weld, professor emeritus at the Paul G. Allen School of Computer Science and Engineering at the University of Washington, who was not involved with this research. “Mozannar et al. have created an innovative method for identifying situations where the AI is trustworthy, and (importantly) to describe them to people in a way that leads to better human-AI team interactions.”

    This work is funded, in part, by the MIT-IBM Watson AI Lab.

  • AI accelerates problem-solving in complex scenarios

    While Santa Claus may have a magical sleigh and nine plucky reindeer to help him deliver presents, for companies like FedEx, the optimization problem of efficiently routing holiday packages is so complicated that they often employ specialized software to find a solution.

    This software, called a mixed-integer linear programming (MILP) solver, splits a massive optimization problem into smaller pieces and uses generic algorithms to try and find the best solution. However, the solver could take hours — or even days — to arrive at a solution.

    The process is so onerous that a company often must stop the software partway through, accepting a solution that is not ideal but the best that could be generated in a set amount of time.

    Researchers from MIT and ETH Zurich used machine learning to speed things up.

    They identified a key intermediate step in MILP solvers that has so many potential solutions it takes an enormous amount of time to unravel, which slows the entire process. The researchers employed a filtering technique to simplify this step, then used machine learning to find the optimal solution for a specific type of problem.

    Their data-driven approach enables a company to use its own data to tailor a general-purpose MILP solver to the problem at hand.

    This new technique sped up MILP solvers between 30 and 70 percent, without any drop in accuracy. One could use this method to obtain an optimal solution more quickly or, for especially complex problems, a better solution in a tractable amount of time.

    This approach could be used wherever MILP solvers are employed, such as by ride-hailing services, electric grid operators, vaccination distributors, or any entity faced with a thorny resource-allocation problem.

    “Sometimes, in a field like optimization, it is very common for folks to think of solutions as either purely machine learning or purely classical. I am a firm believer that we want to get the best of both worlds, and this is a really strong instantiation of that hybrid approach,” says senior author Cathy Wu, the Gilbert W. Winslow Career Development Assistant Professor in Civil and Environmental Engineering (CEE), and a member of the Laboratory for Information and Decision Systems (LIDS) and the Institute for Data, Systems, and Society (IDSS).

    Wu wrote the paper with co-lead authors Sirui Li, an IDSS graduate student, and Wenbin Ouyang, a CEE graduate student; as well as Max Paulus, a graduate student at ETH Zurich. The research will be presented at the Conference on Neural Information Processing Systems.

    Tough to solve

    MILP problems have an exponential number of potential solutions. For instance, say a traveling salesperson wants to find the shortest path to visit several cities and then return to their city of origin. If there are many cities which could be visited in any order, the number of potential solutions might be greater than the number of atoms in the universe.  
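
    A quick back-of-the-envelope count makes that claim concrete: fixing the starting city and ignoring direction, a tour of n cities can be ordered in (n - 1)!/2 ways, which overtakes the roughly 10^80 atoms in the observable universe at around 61 cities.

        import math

        # Number of distinct round trips through n cities: (n - 1)! / 2.
        for n in (10, 20, 61):
            tours = math.factorial(n - 1) // 2
            print(f"{n} cities -> about 10^{len(str(tours)) - 1} possible tours")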

    “These problems are called NP-hard, which means it is very unlikely there is an efficient algorithm to solve them. When the problem is big enough, we can only hope to achieve some suboptimal performance,” Wu explains.

    An MILP solver employs an array of techniques and practical tricks that can achieve reasonable solutions in a tractable amount of time.

    A typical solver uses a divide-and-conquer approach, first splitting the space of potential solutions into smaller pieces with a technique called branching. Then, the solver employs a technique called cutting to tighten up these smaller pieces so they can be searched faster.

    Cutting uses a set of rules that tighten the search space without removing any feasible solutions. These rules are generated by a few dozen algorithms, known as separators, that have been created for different kinds of MILP problems. 
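
    For readers unfamiliar with the problem class, here is a toy mixed-integer linear program solved with SciPy’s milp routine; the numbers are invented and unrelated to any routing instance, but the integrality requirement is what makes such problems explode at scale.

        import numpy as np
        from scipy.optimize import LinearConstraint, milp

        # Maximize profit 3*x1 + 4*x2 subject to two capacity limits, with x1 and x2
        # required to be non-negative integers (milp minimizes, so profit is negated).
        c = np.array([-3.0, -4.0])
        constraints = LinearConstraint([[2, 3], [4, 1]], ub=[12, 10])
        integrality = np.ones_like(c)

        result = milp(c, constraints=constraints, integrality=integrality)
        print(result.x, -result.fun)  # optimal plan and its profit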

    Wu and her team found that the process of identifying the ideal combination of separator algorithms to use is, in itself, a problem with an exponential number of solutions.

    “Separator management is a core part of every solver, but this is an underappreciated aspect of the problem space. One of the contributions of this work is identifying the problem of separator management as a machine learning task to begin with,” she says.

    Shrinking the solution space

    She and her collaborators devised a filtering mechanism that reduces this separator search space from more than 130,000 potential combinations to around 20 options. This filtering mechanism draws on the principle of diminishing marginal returns, which says that the most benefit would come from a small set of algorithms, and adding additional algorithms won’t bring much extra improvement.

    Then they use a machine-learning model to pick the best combination of algorithms from among the 20 remaining options.

    This model is trained with a dataset specific to the user’s optimization problem, so it learns to choose algorithms that best suit the user’s particular task. Since a company like FedEx has solved routing problems many times before, using real data gleaned from past experience should lead to better solutions than starting from scratch each time.

    The model learns through an iterative process known as contextual bandits, a form of reinforcement learning, which involves picking a potential solution, getting feedback on how good it was, and then trying again to find a better one.
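
    The two-stage idea can be sketched as follows: first filter separator configurations by their estimated marginal gain, then let a simple epsilon-greedy bandit loop (standing in for the paper’s learned contextual-bandit policy) pick among the survivors using solve-time feedback. The solve_time oracle and all names here are placeholders.

        import random

        def filter_by_marginal_gain(configurations, estimated_gain, keep=20):
            """Stage 1: keep only the configurations with the largest estimated marginal
            benefit, shrinking the search space per the diminishing-returns argument."""
            return sorted(configurations, key=estimated_gain, reverse=True)[:keep]

        def epsilon_greedy_selection(options, solve_time, instances, epsilon=0.1):
            """Stage 2: for each MILP instance, either explore a random configuration or
            exploit the one with the best average observed solve time so far.
            `solve_time(option, instance)` is a placeholder for running the solver."""
            totals = {o: 0.0 for o in options}
            counts = {o: 1e-9 for o in options}
            for instance in instances:
                if random.random() < epsilon:
                    choice = random.choice(options)
                else:
                    choice = min(options, key=lambda o: totals[o] / counts[o])
                totals[choice] += solve_time(choice, instance)  # feedback signal
                counts[choice] += 1
            return min(options, key=lambda o: totals[o] / counts[o])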

    This data-driven approach accelerated MILP solvers between 30 and 70 percent without any drop in accuracy. Moreover, the speedup was similar when they applied it to a simpler, open-source solver and a more powerful, commercial solver.

    In the future, Wu and her collaborators want to apply this approach to even more complex MILP problems, where gathering labeled data to train the model could be especially challenging. Perhaps they can train the model on a smaller dataset and then tweak it to tackle a much larger optimization problem, she says. The researchers are also interested in interpreting the learned model to better understand the effectiveness of different separator algorithms.

    This research is supported, in part, by Mathworks, the National Science Foundation (NSF), the MIT Amazon Science Hub, and MIT’s Research Support Committee.

  • Synthetic imagery sets new bar in AI training efficiency

    Data is the new soil, and in this fertile new ground, MIT researchers are planting more than just pixels. By using synthetic images to train machine learning models, a team of scientists recently surpassed results obtained from traditional “real-image” training methods. 

    At the core of the approach is a system called StableRep, which doesn’t just use any synthetic images; it generates them through ultra-popular text-to-image models like Stable Diffusion. It’s like creating worlds with words. 

    So what’s in StableRep’s secret sauce? A strategy called “multi-positive contrastive learning.”

    “We’re teaching the model to learn more about high-level concepts through context and variance, not just feeding it data,” says Lijie Fan, an MIT PhD student in electrical engineering, an affiliate of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), and lead researcher on the work. “When multiple images, all generated from the same text, are treated as depictions of the same underlying thing, the model dives deeper into the concepts behind the images, say the object, not just their pixels.”

    This approach considers multiple images spawned from identical text prompts as positive pairs, providing additional information during training, not just adding more diversity but specifying to the vision system which images are alike and which are different. Remarkably, StableRep outshone the prowess of top-tier models trained on real images, such as SimCLR and CLIP, in extensive datasets.
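
    A minimal PyTorch sketch of a multi-positive contrastive loss, in which every other image generated from the same caption counts as a positive; this illustrates the general idea rather than StableRep’s exact objective.

        import torch
        import torch.nn.functional as F

        def multi_positive_contrastive_loss(embeddings, caption_ids, temperature=0.1):
            # Normalize embeddings and compute pairwise similarities.
            z = F.normalize(embeddings, dim=1)
            logits = z @ z.t() / temperature
            logits.fill_diagonal_(-1e9)  # exclude self-similarity

            # Target distribution: uniform over the other images from the same caption.
            same = caption_ids.unsqueeze(0) == caption_ids.unsqueeze(1)
            same.fill_diagonal_(False)
            target = same.float()
            target = target / target.sum(dim=1, keepdim=True).clamp(min=1)

            return -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

        # Example: 8 embeddings covering 4 captions, with 2 synthetic images per caption.
        embeddings = torch.randn(8, 128)
        caption_ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
        print(multi_positive_contrastive_loss(embeddings, caption_ids))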

    “While StableRep helps mitigate the challenges of data acquisition in machine learning, it also ushers in a stride towards a new era of AI training techniques. The capacity to produce high-caliber, diverse synthetic images on command could help curtail cumbersome expenses and resources,” says Fan. 

    The process of data collection has never been straightforward. Back in the 1990s, researchers had to manually capture photographs to assemble datasets for objects and faces. The 2000s saw individuals scouring the internet for data. However, this raw, uncurated data often contained discrepancies when compared to real-world scenarios and reflected societal biases, presenting a distorted view of reality. The task of cleansing datasets through human intervention is not only expensive, but also exceedingly challenging. Imagine, though, if this arduous data collection could be distilled down to something as simple as issuing a command in natural language. 

    A pivotal aspect of StableRep’s triumph is the adjustment of the “guidance scale” in the generative model, which ensures a delicate balance between the synthetic images’ diversity and fidelity. When finely tuned, synthetic images used in training these self-supervised models were found to be as effective, if not more so, than real images.
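
    In practice, producing several training images per caption while sweeping the guidance scale might look like the sketch below, assuming the Hugging Face diffusers API and a Stable Diffusion v1.5 checkpoint (both assumptions, not details taken from the paper).

        import torch
        from diffusers import StableDiffusionPipeline

        pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
        ).to("cuda")

        caption = "a golden retriever catching a frisbee in a park"
        for guidance_scale in (2.0, 7.5, 12.0):  # lower favors diversity, higher favors fidelity
            images = pipe(
                caption,
                num_images_per_prompt=4,  # multiple positives per caption
                guidance_scale=guidance_scale,
            ).images
            for i, image in enumerate(images):
                image.save(f"dog_gs{guidance_scale}_{i}.png")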

    Taking it a step forward, language supervision was added to the mix, creating an enhanced variant: StableRep+. When trained with 20 million synthetic images, StableRep+ not only achieved superior accuracy but also displayed remarkable efficiency compared to CLIP models trained with a staggering 50 million real images.

    Yet, the path ahead isn’t without its potholes. The researchers candidly address several limitations, including the current slow pace of image generation, semantic mismatches between text prompts and the resultant images, potential amplification of biases, and complexities in image attribution, all of which are imperative to address for future advancements. Another issue is that StableRep requires first training the generative model on large-scale real data. The team acknowledges that starting with real data remains a necessity; however, when you have a good generative model, you can repurpose it for new tasks, like training recognition models and visual representations.

    While StableRep offers a good solution by diminishing the dependency on vast real-image collections, it brings to the fore concerns regarding hidden biases within the uncurated data used for these text-to-image models. The choice of text prompts, integral to the image synthesis process, is not entirely free from bias, “indicating the essential role of meticulous text selection or possible human curation,” says Fan. 

    “Using the latest text-to-image models, we’ve gained unprecedented control over image generation, allowing for a diverse range of visuals from a single text input. This surpasses real-world image collection in efficiency and versatility. It proves especially useful in specialized tasks, like balancing image variety in long-tail recognition, presenting a practical supplement to using real images for training,” says Fan. “Our work signifies a step forward in visual learning, towards the goal of offering cost-effective training alternatives while highlighting the need for ongoing improvements in data quality and synthesis.”

    “One dream of generative model learning has long been to be able to generate data useful for discriminative model training,” says Google DeepMind researcher and University of Toronto professor of computer science David Fleet, who was not involved in the paper. “While we have seen some signs of life, the dream has been elusive, especially on large-scale complex domains like high-resolution images. This paper provides compelling evidence, for the first time to my knowledge, that the dream is becoming a reality. They show that contrastive learning from massive amounts of synthetic image data can produce representations that outperform those learned from real data at scale, with the potential to improve myriad downstream vision tasks.”

    Fan is joined by Yonglong Tian PhD ’22 as lead authors of the paper, as well as MIT associate professor of electrical engineering and computer science and CSAIL principal investigator Phillip Isola; Google researcher and OpenAI technical staff member Huiwen Chang; and Google staff research scientist Dilip Krishnan. The team will present StableRep at the 2023 Conference on Neural Information Processing Systems (NeurIPS) in New Orleans. More

  • Rewarding excellence in open data

    The second annual MIT Prize for Open Data, which included a $2,500 cash prize, was recently awarded to 10 individual and group research projects. Presented jointly by the School of Science and the MIT Libraries, the prize highlights the value of open data — research data that is openly accessible and reusable — at the Institute. The prize winners and 12 honorable mention recipients were honored at the Open Data @ MIT event held Oct. 24 at Hayden Library. 

    Conceived by Chris Bourg, director of MIT Libraries, and Rebecca Saxe, associate dean of the School of Science and the John W. Jarve (1978) Professor of Brain and Cognitive Sciences, the prize program was launched in 2022. It recognizes MIT-affiliated researchers who use or share open data, create infrastructure for open data sharing, or theorize about open data. Nominations were solicited from across the Institute, with a focus on trainees: undergraduate and graduate students, postdocs, and research staff. 

    “The prize is explicitly aimed at early-career researchers,” says Bourg. “Supporting and encouraging the next generation of researchers will help ensure that the future of scholarship is characterized by a norm of open sharing.”

    The 2023 awards were presented at a celebratory event held during International Open Access Week. Winners gave five-minute presentations on their projects and the role that open data plays in their research. The program also included remarks from Bourg and Anne White, School of Engineering Distinguished Professor of Engineering, vice provost, and associate vice president for research administration. White reflected on the ways in which MIT has demonstrated its values with the open sharing of research and scholarship and acknowledged the efforts of the honorees and advocates gathered at the event: “Thank you for the active role you’re all playing in building a culture of openness in research,” she said. “It benefits us all.” 

    Winners were chosen from more than 80 nominees, representing all five MIT schools, the MIT Schwarzman College of Computing, and several research centers across the Institute. A committee composed of faculty, staff, and graduate students made the selections:

    Hammaad Adam, graduate student in the Institute for Data, Systems, and Society, accepted on behalf of the team behind Organ Retrieval and Collection of Health Information for Donation (ORCHID), the first ever multi-center dataset dedicated to the organ procurement process. ORCHID provides the first opportunity to quantitatively analyze organ procurement organization decisions and identify operational inefficiencies.
    Adam Atanas, postdoc in the Department of Brain and Cognitive Sciences (BCS), and Jungsoo Kim, graduate student in BCS, created WormWideWeb.org. The site, allowing researchers to easily browse and download C. elegans whole-brain datasets, will be useful to C. elegans neuroscientists and theoretical/computational neuroscientists. 
    Paul Berube, research scientist in the Department of Civil and Environmental Engineering, and Steven Biller, assistant professor of biological sciences at Wellesley College, won for “Unlocking Marine Microbiomes with Open Data.” Open data of genomes and metagenomes for marine ecosystems, with a focus on cyanobacteria, leverage the power of contemporaneous data from GEOTRACES and other long-standing ocean time-series programs to provide underlying information to answer questions about marine ecosystem function. 
    Jack Cavanagh, Sarah Kopper, and Diana Horvath of the Abdul Latif Jameel Poverty Action Lab (J-PAL) were recognized for J-PAL’s Data Publication Infrastructure, which includes a trusted repository of open-access datasets, a dedicated team of data curators, and coding tools and training materials to help other teams publish data in an efficient and ethical manner. 
    Jerome Patrick Cruz, graduate student in the Department of Political Science, won for OpenAudit, leveraging advances in natural language processing and machine learning to make data in public audit reports more usable for academics and policy researchers, as well as governance practitioners, watchdogs, and reformers. This work was done in collaboration with colleagues at Ateneo de Manila University in the Philippines. 
    Undergraduate student Daniel Kurlander created a tool for planetary scientists to rapidly access and filter images of the comet 67P/Churyumov-Gerasimenko. The web-based tool enables searches by location and other properties, does not require a time-intensive download of a massive dataset, allows analysis of the data independent of the speed of one’s computer, and does not require installation of a complex set of programs. 
    Halie Olson, postdoc in BCS, was recognized for sharing data from a functional magnetic resonance imaging (fMRI) study on language processing. The study used video clips from “Sesame Street” in which researchers manipulated the comprehensibility of the speech stream, allowing them to isolate a “language response” in the brain.
    Thomas González Roberts, graduate student in the Department of Aeronautics and Astronautics, won for the International Telecommunication Union Compliance Assessment Monitor. This tool combats the heritage of secrecy in outer space operations by creating human- and machine-readable datasets that succinctly describe the international agreements that govern satellite operations. 
    Melissa Kline Struhl, research scientist in BCS, was recognized for Children Helping Science, a free, open-source platform for remote studies with babies and children that makes it possible for researchers at more than 100 institutions to conduct reproducible studies. 
    JS Tan, graduate student in the Department of Urban Studies and Planning, developed the Collective Action in Tech Archive in collaboration with Nataliya Nedzhvetskaya of the University of California at Berkeley. It is an open database of all publicly recorded collective actions taken by workers in the global tech industry. 
    A complete list of winning projects and honorable mentions, including links to the research data, is available on the MIT Libraries website.

  • Technique enables AI on edge devices to keep learning over time

    Personalized deep-learning models can enable artificial intelligence chatbots that adapt to understand a user’s accent or smart keyboards that continuously update to better predict the next word based on someone’s typing history. This customization requires constant fine-tuning of a machine-learning model with new data.

    Because smartphones and other edge devices lack the memory and computational power necessary for this fine-tuning process, user data are typically uploaded to cloud servers where the model is updated. But data transmission uses a great deal of energy, and sending sensitive user data to a cloud server poses a security risk.  

    Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere developed a technique that enables deep-learning models to efficiently adapt to new sensor data directly on an edge device.

    Their on-device training method, called PockEngine, determines which parts of a huge machine-learning model need to be updated to improve accuracy, and only stores and computes with those specific pieces. It performs the bulk of these computations while the model is being prepared, before runtime, which minimizes computational overhead and boosts the speed of the fine-tuning process.    

    When compared to other methods, PockEngine significantly sped up on-device training, performing up to 15 times faster on some hardware platforms. Moreover, PockEngine didn’t cause models to have any dip in accuracy. The researchers also found that their fine-tuning method enabled a popular AI chatbot to answer complex questions more accurately.

    “On-device fine-tuning can enable better privacy, lower costs, customization ability, and also lifelong learning, but it is not easy. Everything has to happen with a limited number of resources. We want to be able to run not only inference but also training on an edge device. With PockEngine, now we can,” says Song Han, an associate professor in the Department of Electrical Engineering and Computer Science (EECS), a member of the MIT-IBM Watson AI Lab, a distinguished scientist at NVIDIA, and senior author of an open-access paper describing PockEngine.

    Han is joined on the paper by lead author Ligeng Zhu, an EECS graduate student, as well as others at MIT, the MIT-IBM Watson AI Lab, and the University of California San Diego. The paper was recently presented at the IEEE/ACM International Symposium on Microarchitecture.

    Layer by layer

    Deep-learning models are based on neural networks, which comprise many interconnected layers of nodes, or “neurons,” that process data to make a prediction. When the model is run, a process called inference, a data input (such as an image) is passed from layer to layer until the prediction (perhaps the image label) is output at the end. During inference, each layer no longer needs to be stored after it processes the input.

    But during training and fine-tuning, the model undergoes a process known as backpropagation. In backpropagation, the output is compared to the correct answer, and then the model is run in reverse. Each layer is updated as the model’s output gets closer to the correct answer.

    Because each layer may need to be updated, the entire model and intermediate results must be stored, making fine-tuning more memory-intensive than inference.

    However, not all layers in the neural network are important for improving accuracy. And even for layers that are important, the entire layer may not need to be updated. Those layers, and pieces of layers, don’t need to be stored. Furthermore, one may not need to go all the way back to the first layer to improve accuracy — the process could be stopped somewhere in the middle.
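
    The idea can be illustrated with a small PyTorch sketch (not PockEngine itself): freeze the early layers so that gradients, and the activations needed to compute them, are only required for the layers that are actually updated.

        import torch
        from torch import nn

        model = nn.Sequential(
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 10),
        )

        for layer in list(model.children())[:4]:  # freeze everything except the final layer
            for p in layer.parameters():
                p.requires_grad = False

        optimizer = torch.optim.SGD(
            [p for p in model.parameters() if p.requires_grad], lr=1e-2
        )

        x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()   # gradients are computed only for the unfrozen parameters
        optimizer.step()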

    PockEngine takes advantage of these factors to speed up the fine-tuning process and cut down on the amount of computation and memory required.

    The system first fine-tunes each layer, one at a time, on a certain task and measures the accuracy improvement after each individual layer. In this way, PockEngine identifies the contribution of each layer, as well as trade-offs between accuracy and fine-tuning cost, and automatically determines the percentage of each layer that needs to be fine-tuned.

    “This method matches the accuracy very well compared to full backpropagation on different tasks and different neural networks,” Han adds.

    A pared-down model

    Conventionally, the backpropagation graph is generated during runtime, which involves a great deal of computation. Instead, PockEngine does this during compile time, while the model is being prepared for deployment.

    PockEngine deletes bits of code to remove unnecessary layers or pieces of layers, creating a pared-down graph of the model to be used during runtime. It then performs other optimizations on this graph to further improve efficiency.

    Since all this only needs to be done once, it saves on computational overhead for runtime.

    “It is like before setting out on a hiking trip. At home, you would do careful planning — which trails are you going to go on, which trails are you going to ignore. So then at execution time, when you are actually hiking, you already have a very careful plan to follow,” Han explains.

    When they applied PockEngine to deep-learning models on different edge devices, including Apple M1 Chips and the digital signal processors common in many smartphones and Raspberry Pi computers, it performed on-device training up to 15 times faster, without any drop in accuracy. PockEngine also significantly slashed the amount of memory required for fine-tuning.

    The team also applied the technique to the large language model Llama-V2. With large language models, the fine-tuning process involves providing many examples, and it’s crucial for the model to learn how to interact with users, Han says. The process is also important for models tasked with solving complex problems or reasoning about solutions.

    For instance, Llama-V2 models that were fine-tuned using PockEngine answered the question “What was Michael Jackson’s last album?” correctly, while models that weren’t fine-tuned failed. PockEngine cut the time it took for each iteration of the fine-tuning process from about seven seconds to less than one second on an NVIDIA Jetson Orin, an edge GPU platform.

    In the future, the researchers want to use PockEngine to fine-tune even larger models designed to process text and images together.

    “This work addresses growing efficiency challenges posed by the adoption of large AI models such as LLMs across diverse applications in many different industries. It not only holds promise for edge applications that incorporate larger models, but also for lowering the cost of maintaining and updating large AI models in the cloud,” says Ehry MacRostie, a senior manager in Amazon’s Artificial General Intelligence division who was not involved in this study but works with MIT on related AI research through the MIT-Amazon Science Hub.

    This work was supported, in part, by the MIT-IBM Watson AI Lab, the MIT AI Hardware Program, the MIT-Amazon Science Hub, the National Science Foundation (NSF), and the Qualcomm Innovation Fellowship.

  • Forging climate connections across the Institute

    Climate change is the ultimate cross-cutting issue: Not limited to any one discipline, it ranges across science, technology, policy, culture, human behavior, and well beyond. The response to it likewise requires an all-of-MIT effort.

    Now, to strengthen such an effort, a new grant program spearheaded by the Climate Nucleus, the faculty committee charged with the oversight and implementation of Fast Forward: MIT’s Climate Action Plan for the Decade, aims to build up MIT’s climate leadership capacity while also supporting innovative scholarship on diverse climate-related topics and forging new connections across the Institute.

    Called the Fast Forward Faculty Fund (F^4 for short), the program has named its first cohort of six faculty members after issuing its inaugural call for proposals in April 2023. The cohort will come together throughout the year for climate leadership development programming and networking. The program provides financial support for graduate students who will work with the faculty members on the projects — the students will also participate in leadership-building activities — as well as $50,000 in flexible, discretionary funding to be used to support related activities. 

    “Climate change is a crisis that truly touches every single person on the planet,” says Noelle Selin, co-chair of the nucleus and interim director of the Institute for Data, Systems, and Society. “It’s therefore essential that we build capacity for every member of the MIT community to make sense of the problem and help address it. Through the Fast Forward Faculty Fund, our aim is to have a cohort of climate ambassadors who can embed climate everywhere at the Institute.”

    F^4 supports both faculty who would like to begin doing climate-related work, as well as faculty members who are interested in deepening their work on climate. The program has the core goal of developing cohorts of F^4 faculty and graduate students who, in addition to conducting their own research, will become climate leaders at MIT, proactively looking for ways to forge new climate connections across schools, departments, and disciplines.

    One of the projects, “Climate Crisis and Real Estate: Science-based Mitigation and Adaptation Strategies,” led by Professor Siqi Zheng of the MIT Center for Real Estate in collaboration with colleagues from the MIT Sloan School of Management, focuses on the roughly 40 percent of carbon dioxide emissions that come from the buildings and real estate sector. Zheng notes that this sector has been slow to respond to climate change, but says that is starting to change, thanks in part to the rising awareness of climate risks and new local regulations aimed at reducing emissions from buildings.

    Using a data-driven approach, the project seeks to understand the efficient and equitable market incentives, technology solutions, and public policies that are most effective at transforming the real estate industry. Johnattan Ontiveros, a graduate student in the Technology and Policy Program, is working with Zheng on the project.

    “We were thrilled at the incredible response we received from the MIT faculty to our call for proposals, which speaks volumes about the depth and breadth of interest in climate at MIT,” says Anne White, nucleus co-chair and vice provost and associate vice president for research. “This program makes good on key commitments of the Fast Forward plan, supporting cutting-edge new work by faculty and graduate students while helping to deepen the bench of climate leaders at MIT.”

    During the 2023-24 academic year, the F^4 faculty and graduate student cohorts will come together to discuss their projects, explore opportunities for collaboration, participate in climate leadership development, and think proactively about how to deepen interdisciplinary connections among MIT community members interested in climate change.

    The six inaugural F^4 awardees are:

    Professor Tristan Brown, History Section: Humanistic Approaches to the Climate Crisis  

    With this project, Brown aims to create a new community of practice around narrative-centric approaches to environmental and climate issues. Part of a broader humanities initiative at MIT, it brings together a global working group of interdisciplinary scholars, including Serguei Saavedra (Department of Civil and Environmental Engineering) and Or Porath (Tel Aviv University; Religion), collectively focused on examining the historical and present links between sacred places and biodiversity for the purposes of helping governments and nongovernmental organizations formulate better sustainability goals. Boyd Ruamcharoen, a PhD student in the History, Anthropology, and Science, Technology, and Society (HASTS) program, will work with Brown on this project.

    Professor Kerri Cahoy, departments of Aeronautics and Astronautics (AeroAstro) and Earth, Atmospheric, and Planetary Sciences: Onboard Autonomous AI-driven Satellite Sensor Fusion for Coastal Region Monitoring

    The motivation for this project is the need for much better data collection from satellites, where technology can be “20 years behind,” says Cahoy. As part of this project, Cahoy will pursue research in the area of autonomous artificial intelligence-enabled rapid sensor fusion (which combines data from different sensors, such as radar and cameras) onboard satellites to improve understanding of the impacts of climate change, specifically sea-level rise and hurricanes and flooding in coastal regions. Graduate students Madeline Anderson, a PhD student in electrical engineering and computer science (EECS), and Mary Dahl, a PhD student in AeroAstro, will work with Cahoy on this project.

    Professor Priya Donti, Department of Electrical Engineering and Computer Science: Robust Reinforcement Learning for High-Renewables Power Grids 

    With renewables like wind and solar making up a growing share of electricity generation on power grids, Donti’s project focuses on improving control methods for these distributed sources of electricity. The research will aim to create a realistic representation of the characteristics of power grid operations, and eventually inform scalable operational improvements in power systems. It will “give power systems operators faith that, OK, this conceptually is good, but it also actually works on this grid,” says Donti. PhD candidate Ana Rivera from EECS is the F^4 graduate student on the project.

    Professor Jason Jackson, Department of Urban Studies and Planning (DUSP): Political Economy of the Climate Crisis: Institutions, Power and Global Governance

    This project takes a political economy approach to the climate crisis, offering a distinct lens to examine, first, the political governance challenge of mobilizing climate action and designing new institutional mechanisms to address the global and intergenerational distributional aspects of climate change; second, the economic challenge of devising new institutional approaches to equitably finance climate action; and third, the cultural challenge — and opportunity — of empowering an adaptive socio-cultural ecology through traditional knowledge and local-level social networks to achieve environmental resilience. Graduate students Chen Chu and Mrinalini Penumaka, both PhD students in DUSP, are working with Jackson on the project.

    Professor Haruko Wainwright, departments of Nuclear Science and Engineering (NSE) and Civil and Environmental Engineering: Low-cost Environmental Monitoring Network Technologies in Rural Communities for Addressing Climate Justice 

    This project will establish a community-based climate and environmental monitoring network in addition to a data visualization and analysis infrastructure in rural marginalized communities to better understand and address climate justice issues. The project team plans to work with rural communities in Alaska to install low-cost air and water quality, weather, and soil sensors. Graduate students Kay Whiteaker, an MS candidate in NSE, and Amandeep Singh, an MS candidate in System Design and Management at Sloan, are working with Wainwright on the project, as is David McGee, professor in earth, atmospheric, and planetary sciences.

    Professor Siqi Zheng, MIT Center for Real Estate and DUSP: Climate Crisis and Real Estate: Science-based Mitigation and Adaptation Strategies 

    See the text above for the details on this project.

  • Accelerating AI tasks while preserving data security

    With the proliferation of computationally intensive machine-learning applications, such as chatbots that perform real-time language translation, device manufacturers often incorporate specialized hardware components to rapidly move and process the massive amounts of data these systems demand.

    Choosing the best design for these components, known as deep neural network accelerators, is challenging because they can have an enormous range of design options. This difficult problem becomes even thornier when a designer seeks to add cryptographic operations to keep data safe from attackers.

    Now, MIT researchers have developed a search engine that can efficiently identify optimal designs for deep neural network accelerators that preserve data security while boosting performance.

    Their search tool, known as SecureLoop, is designed to consider how the addition of data encryption and authentication measures will impact the performance and energy usage of the accelerator chip. An engineer could use this tool to obtain the optimal design of an accelerator tailored to their neural network and machine-learning task.

    When compared to conventional scheduling techniques that don’t consider security, SecureLoop can improve performance of accelerator designs while keeping data protected.  

    Using SecureLoop could help a user improve the speed and performance of demanding AI applications, such as autonomous driving or medical image classification, while ensuring sensitive user data remains safe from some types of attacks.

    “If you are interested in doing a computation where you are going to preserve the security of the data, the rules that we used before for finding the optimal design are now broken. So all of that optimization needs to be customized for this new, more complicated set of constraints. And that is what [lead author] Kyungmi has done in this paper,” says Joel Emer, an MIT professor of the practice in computer science and electrical engineering and co-author of a paper on SecureLoop.

    Emer is joined on the paper by lead author Kyungmi Lee, an electrical engineering and computer science graduate student; Mengjia Yan, the Homer A. Burnell Career Development Assistant Professor of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and senior author Anantha Chandrakasan, dean of the MIT School of Engineering and the Vannevar Bush Professor of Electrical Engineering and Computer Science. The research will be presented at the IEEE/ACM International Symposium on Microarchitecture.

    “The community passively accepted that adding cryptographic operations to an accelerator will introduce overhead. They thought it would introduce only a small variance in the design trade-off space. But, this is a misconception. In fact, cryptographic operations can significantly distort the design space of energy-efficient accelerators. Kyungmi did a fantastic job identifying this issue,” Yan adds.

    Secure acceleration

    A deep neural network consists of many layers of interconnected nodes that process data. Typically, the output of one layer becomes the input of the next layer. Data are grouped into units called tiles for processing and transfer between off-chip memory and the accelerator. Each layer of the neural network can have its own data tiling configuration.

    A deep neural network accelerator is a processor with an array of computational units that parallelizes operations, like multiplication, in each layer of the network. The accelerator schedule describes how data are moved and processed.

    Since space on an accelerator chip is at a premium, most data are stored in off-chip memory and fetched by the accelerator when needed. But because data are stored off-chip, they are vulnerable to an attacker who could steal information or change some values, causing the neural network to malfunction.

    “As a chip manufacturer, you can’t guarantee the security of external devices or the overall operating system,” Lee explains.

    Manufacturers can protect data by adding authenticated encryption to the accelerator. Encryption scrambles the data using a secret key. Then authentication cuts the data into uniform chunks and assigns a cryptographic hash to each chunk of data, which is stored along with the data chunk in off-chip memory.

    When the accelerator fetches an encrypted chunk of data, known as an authentication block, it uses a secret key to recover and verify the original data before processing it.

    But the sizes of authentication blocks and tiles of data don’t match up, so there could be multiple tiles in one block, or a tile could be split between two blocks. The accelerator can’t arbitrarily grab a fraction of an authentication block, so it may end up grabbing extra data, which uses additional energy and slows down computation.

    Plus, the accelerator still must run the cryptographic operation on each authentication block, adding even more computational cost.

    An efficient search engine

    With SecureLoop, the MIT researchers sought a method that could identify the fastest and most energy efficient accelerator schedule — one that minimizes the number of times the device needs to access off-chip memory to grab extra blocks of data because of encryption and authentication.  

    They began by augmenting an existing search engine Emer and his collaborators previously developed, called Timeloop. First, they added a model that could account for the additional computation needed for encryption and authentication.

    Then, they reformulated the search problem into a simple mathematical expression, which enables SecureLoop to find the ideal authentication block size in a much more efficient manner than searching through all possible options.

    “Depending on how you assign this block, the amount of unnecessary traffic might increase or decrease. If you assign the cryptographic block cleverly, then you can just fetch a small amount of additional data,” Lee says.
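
    The trade-off can be captured with a toy cost model: for each candidate authentication block size, add the redundant bytes fetched because tiles and blocks do not line up to a per-block cryptographic cost, and keep the size with the lowest total. The numbers and the cost model below are illustrative, not SecureLoop’s.

        import math

        def extra_traffic_bytes(tile_bytes, block_bytes, tiles_per_layer):
            """Redundant off-chip traffic when every tile fetch must pull in whole
            authentication blocks (a simplified model for illustration)."""
            fetched = math.ceil(tile_bytes / block_bytes) * block_bytes
            return (fetched - tile_bytes) * tiles_per_layer

        def best_block_size(tile_bytes, candidates, tiles_per_layer, crypto_cost_per_block):
            """Pick the block size minimizing wasted traffic plus per-block crypto cost."""
            def total_cost(block):
                blocks = math.ceil(tile_bytes / block) * tiles_per_layer
                return (extra_traffic_bytes(tile_bytes, block, tiles_per_layer)
                        + blocks * crypto_cost_per_block)
            return min(candidates, key=total_cost)

        # Example: 6 KB tiles, candidate block sizes from 512 B to 8 KB.
        print(best_block_size(6144, [512, 1024, 2048, 4096, 8192],
                              tiles_per_layer=100, crypto_cost_per_block=64))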

    Finally, they incorporated a heuristic technique that ensures SecureLoop identifies a schedule which maximizes the performance of the entire deep neural network, rather than only a single layer.

    At the end, the search engine outputs an accelerator schedule, which includes the data tiling strategy and the size of the authentication blocks, that provides the best possible speed and energy efficiency for a specific neural network.

    “The design spaces for these accelerators are huge. What Kyungmi did was figure out some very pragmatic ways to make that search tractable so she could find good solutions without needing to exhaustively search the space,” says Emer.

    When tested in a simulator, SecureLoop identified schedules that were up to 33.2 percent faster and exhibited 50.2 percent better energy delay product (a metric related to energy efficiency) than other methods that didn’t consider security.

    The researchers also used SecureLoop to explore how the design space for accelerators changes when security is considered. They learned that allocating a bit more of the chip’s area for the cryptographic engine and sacrificing some space for on-chip memory can lead to better performance, Lee says.

    In the future, the researchers want to use SecureLoop to find accelerator designs that are resilient to side-channel attacks, which occur when an attacker has access to physical hardware. For instance, an attacker could monitor the power consumption pattern of a device to obtain secret information, even if the data have been encrypted. They are also extending SecureLoop so it could be applied to other kinds of computation.

    This work is funded, in part, by Samsung Electronics and the Korea Foundation for Advanced Studies.