More stories

  • AI accelerates problem-solving in complex scenarios

    While Santa Claus may have a magical sleigh and nine plucky reindeer to help him deliver presents, for companies like FedEx, the optimization problem of efficiently routing holiday packages is so complicated that they often employ specialized software to find a solution.

    This software, called a mixed-integer linear programming (MILP) solver, splits a massive optimization problem into smaller pieces and uses generic algorithms to try to find the best solution. However, the solver could take hours — or even days — to arrive at a solution.

    The process is so onerous that a company often must stop the software partway through, accepting a solution that is not ideal but the best that could be generated in a set amount of time.

    Researchers from MIT and ETH Zurich used machine learning to speed things up.

    They identified a key intermediate step in MILP solvers that has so many potential solutions it takes an enormous amount of time to unravel, which slows the entire process. The researchers employed a filtering technique to simplify this step, then used machine learning to find the optimal solution for a specific type of problem.

    Their data-driven approach enables a company to use its own data to tailor a general-purpose MILP solver to the problem at hand.

    This new technique sped up MILP solvers between 30 and 70 percent, without any drop in accuracy. One could use this method to obtain an optimal solution more quickly or, for especially complex problems, a better solution in a tractable amount of time.

    This approach could be used wherever MILP solvers are employed, such as by ride-hailing services, electric grid operators, vaccination distributors, or any entity faced with a thorny resource-allocation problem.

    “Sometimes, in a field like optimization, it is very common for folks to think of solutions as either purely machine learning or purely classical. I am a firm believer that we want to get the best of both worlds, and this is a really strong instantiation of that hybrid approach,” says senior author Cathy Wu, the Gilbert W. Winslow Career Development Assistant Professor in Civil and Environmental Engineering (CEE), and a member of the Laboratory for Information and Decision Systems (LIDS) and the Institute for Data, Systems, and Society (IDSS).

    Wu wrote the paper with co-lead authors Sirui Li, an IDSS graduate student, and Wenbin Ouyang, a CEE graduate student; as well as Max Paulus, a graduate student at ETH Zurich. The research will be presented at the Conference on Neural Information Processing Systems.

    Tough to solve

    MILP problems have an exponential number of potential solutions. For instance, say a traveling salesperson wants to find the shortest path to visit several cities and then return to their city of origin. If there are many cities which could be visited in any order, the number of potential solutions might be greater than the number of atoms in the universe.  
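
    To put that in perspective, here is a quick back-of-the-envelope calculation (not from the paper): the number of distinct round trips through n cities grows as (n - 1)!/2, which passes the commonly cited estimate of roughly 10^80 atoms in the observable universe at around 60 cities.

    ```python
    # Count the distinct tours for a symmetric traveling-salesperson problem:
    # fix the starting city and ignore direction, giving (n - 1)! / 2 tours.
    from math import factorial

    def number_of_tours(n_cities: int) -> int:
        return factorial(n_cities - 1) // 2

    for n in (10, 30, 61):
        print(n, number_of_tours(n))
    # 61 cities already yields about 4 * 10^81 tours, more than the ~10^80 atoms
    # estimated in the observable universe.
    ```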

    “These problems are called NP-hard, which means it is very unlikely there is an efficient algorithm to solve them. When the problem is big enough, we can only hope to achieve some suboptimal performance,” Wu explains.

    An MILP solver employs an array of techniques and practical tricks that can achieve reasonable solutions in a tractable amount of time.

    A typical solver uses a divide-and-conquer approach, first splitting the space of potential solutions into smaller pieces with a technique called branching. Then, the solver employs a technique called cutting to tighten up these smaller pieces so they can be searched faster.
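
    To make the branching step concrete, here is a minimal branch-and-bound sketch, far simpler than a production MILP solver and not taken from the paper: it repeatedly solves a linear-programming relaxation with SciPy's linprog and splits on a variable whose relaxed value is fractional. Cutting planes, presolve, and the heuristics that real solvers rely on are all omitted, and the variables are assumed to be bounded and nonnegative.

    ```python
    from math import floor

    from scipy.optimize import linprog

    def branch_and_bound(c, A_ub, b_ub, bounds):
        """Minimize c @ x subject to A_ub @ x <= b_ub with every x_i integer."""
        best_obj, best_x = float("inf"), None
        stack = [list(bounds)]                  # each node carries its own variable bounds
        while stack:
            node_bounds = stack.pop()
            relaxed = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=node_bounds, method="highs")
            if not relaxed.success or relaxed.fun >= best_obj:
                continue                        # infeasible, or cannot beat the incumbent: prune
            fractional = [(i, v) for i, v in enumerate(relaxed.x) if abs(v - round(v)) > 1e-6]
            if not fractional:                  # all-integer solution: new incumbent
                best_obj, best_x = relaxed.fun, relaxed.x
                continue
            i, v = fractional[0]                # branch: x_i <= floor(v)  or  x_i >= floor(v) + 1
            lo, hi = node_bounds[i]
            left, right = list(node_bounds), list(node_bounds)
            left[i], right[i] = (lo, floor(v)), (floor(v) + 1, hi)
            stack.extend([left, right])
        return best_obj, best_x

    # Toy problem: maximize x + y (i.e., minimize -x - y) subject to 2x + 2y <= 5,
    # with 0 <= x, y <= 3. The relaxation gives x + y = 2.5, so branching is needed;
    # the best integer objective is 2.
    print(branch_and_bound([-1, -1], [[2, 2]], [5], [(0, 3), (0, 3)]))
    ```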

    Cutting uses a set of rules that tighten the search space without removing any feasible solutions. These rules are generated by a few dozen algorithms, known as separators, that have been created for different kinds of MILP problems. 

    Wu and her team found that the process of identifying the ideal combination of separator algorithms to use is, in itself, a problem with an exponential number of solutions.

    “Separator management is a core part of every solver, but this is an underappreciated aspect of the problem space. One of the contributions of this work is identifying the problem of separator management as a machine learning task to begin with,” she says.

    Shrinking the solution space

    She and her collaborators devised a filtering mechanism that reduces this separator search space from more than 130,000 potential combinations to around 20 options. This filtering mechanism draws on the principle of diminishing marginal returns, which says that the most benefit would come from a small set of algorithms, and adding additional algorithms won’t bring much extra improvement.
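
    As an illustration of that principle (not the paper's actual procedure), a greedy routine might add one separator at a time and stop once the marginal gain becomes negligible; the separator names and the score function, which stands in for some measure of solver performance on sample instances, are hypothetical.

    ```python
    def filter_separators(all_separators, score, min_gain=0.01, max_size=4):
        """Greedily build a small separator set, stopping when marginal gains fade."""
        chosen, best = [], score([])
        while len(chosen) < max_size:
            gains = {s: score(chosen + [s]) - best
                     for s in all_separators if s not in chosen}
            if not gains:
                break
            separator, gain = max(gains.items(), key=lambda kv: kv[1])
            if gain < min_gain:
                break                           # diminishing returns: more separators barely help
            chosen.append(separator)
            best += gain
        return chosen
    ```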

    Then they use a machine-learning model to pick the best combination of algorithms from among the 20 remaining options.

    This model is trained with a dataset specific to the user’s optimization problem, so it learns to choose algorithms that best suit the user’s particular task. Since a company like FedEx has solved routing problems many times before, using real data gleaned from past experience should lead to better solutions than starting from scratch each time.

    The model is trained using an iterative process known as contextual bandits, a form of reinforcement learning, which involves picking a potential solution, getting feedback on how good it was, and then trying again to find a better solution.
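
    A stripped-down version of that loop is sketched below as a plain epsilon-greedy bandit; the real method also conditions its choice on features of the problem instance (the "context"), and solve_and_score is a hypothetical stand-in for running the solver and returning a reward such as negative solve time.

    ```python
    import random

    def bandit_tuning(configurations, solve_and_score, instances, epsilon=0.1):
        """Learn which separator configuration works best by trial and feedback."""
        value = {c: 0.0 for c in configurations}    # running reward estimates
        count = {c: 0 for c in configurations}
        for instance in instances:
            if random.random() < epsilon:           # explore a random configuration
                choice = random.choice(configurations)
            else:                                   # exploit the best estimate so far
                choice = max(configurations, key=lambda c: value[c])
            reward = solve_and_score(choice, instance)
            count[choice] += 1
            value[choice] += (reward - value[choice]) / count[choice]
        return max(configurations, key=lambda c: value[c])
    ```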

    This data-driven approach accelerated MILP solvers between 30 and 70 percent without any drop in accuracy. Moreover, the speedup was similar when they applied it to a simpler, open-source solver and a more powerful, commercial solver.

    In the future, Wu and her collaborators want to apply this approach to even more complex MILP problems, where gathering labeled data to train the model could be especially challenging. Perhaps they can train the model on a smaller dataset and then tweak it to tackle a much larger optimization problem, she says. The researchers are also interested in interpreting the learned model to better understand the effectiveness of different separator algorithms.

    This research is supported, in part, by Mathworks, the National Science Foundation (NSF), the MIT Amazon Science Hub, and MIT’s Research Support Committee.

  • Search algorithm reveals nearly 200 new kinds of CRISPR systems

    Microbial sequence databases contain a wealth of information about enzymes and other molecules that could be adapted for biotechnology. But these databases have grown so large in recent years that they’ve become difficult to search efficiently for enzymes of interest.

    Now, scientists at the McGovern Institute for Brain Research at MIT, the Broad Institute of MIT and Harvard, and the National Center for Biotechnology Information (NCBI) at the National Institutes of Health have developed a new search algorithm that has identified 188 kinds of new rare CRISPR systems in bacterial genomes, encompassing thousands of individual systems. The work appears today in Science.

    The algorithm, which comes from the lab of pioneering CRISPR researcher Professor Feng Zhang, uses big-data clustering approaches to rapidly search massive amounts of genomic data. The team used their algorithm, called Fast Locality-Sensitive Hashing-based clustering (FLSHclust), to mine three major public databases that contain data from a wide range of unusual bacteria, including ones found in coal mines, breweries, Antarctic lakes, and dog saliva. The scientists found a surprising number and diversity of CRISPR systems, including ones that could make edits to DNA in human cells, others that can target RNA, and many with a variety of other functions.

    The new systems could potentially be harnessed to edit mammalian cells with fewer off-target effects than current Cas9 systems. They could also one day be used as diagnostics or serve as molecular records of activity inside cells.

    The researchers say their search highlights an unprecedented level of diversity and flexibility of CRISPR and that there are likely many more rare systems yet to be discovered as databases continue to grow.

    “Biodiversity is such a treasure trove, and as we continue to sequence more genomes and metagenomic samples, there is a growing need for better tools, like FLSHclust, to search that sequence space to find the molecular gems,” says Zhang, a co-senior author on the study and the James and Patricia Poitras Professor of Neuroscience at MIT with joint appointments in the departments of Brain and Cognitive Sciences and Biological Engineering. Zhang is also an investigator at the McGovern Institute for Brain Research at MIT, a core institute member at the Broad, and an investigator at the Howard Hughes Medical Institute. Eugene Koonin, a distinguished investigator at the NCBI, is co-senior author on the study as well.

    Searching for CRISPR

    CRISPR, which stands for clustered regularly interspaced short palindromic repeats, is a bacterial defense system that has been engineered into many tools for genome editing and diagnostics.

    To mine databases of protein and nucleic acid sequences for novel CRISPR systems, the researchers developed an algorithm based on an approach borrowed from the big data community. This technique, called locality-sensitive hashing, clusters together objects that are similar but not exactly identical. Using this approach allowed the team to probe billions of protein and DNA sequences — from the NCBI, its Whole Genome Shotgun database, and the Joint Genome Institute — in weeks, whereas previous methods that look for identical objects would have taken months. They designed their algorithm to look for genes associated with CRISPR.
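
    The snippet below shows the locality-sensitive hashing idea in miniature on toy protein sequences: similar sequences tend to collide in the same hash buckets, so candidate groupings emerge without comparing every pair. It is only an illustration of the principle, not the FLSHclust implementation, and the sequences are made up.

    ```python
    import hashlib
    from collections import defaultdict

    def kmers(seq, k=5):
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    def minhash_signature(seq, num_hashes=32, k=5):
        """MinHash: similar k-mer sets tend to produce similar signatures."""
        return tuple(
            min(int(hashlib.sha1(f"{h}:{kmer}".encode()).hexdigest(), 16)
                for kmer in kmers(seq, k))
            for h in range(num_hashes)
        )

    def lsh_buckets(sequences, bands=8, rows=4):
        """Band the signatures: sequences sharing any band land in the same bucket."""
        buckets = defaultdict(list)
        for name, seq in sequences.items():
            signature = minhash_signature(seq, num_hashes=bands * rows)
            for b in range(bands):
                band = signature[b * rows:(b + 1) * rows]
                buckets[(b, band)].append(name)
        return {key: names for key, names in buckets.items() if len(names) > 1}

    toy_sequences = {
        "protA": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
        "protB": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVA",   # near-duplicate of protA
        "protC": "MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERS",   # unrelated
    }
    print(lsh_buckets(toy_sequences))   # protA and protB will usually share a bucket
    ```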

    “This new algorithm allows us to parse through data in a time frame that’s short enough that we can actually recover results and make biological hypotheses,” says Soumya Kannan PhD ’23, who is a co-first author on the study. Kannan was a graduate student in Zhang’s lab when the study began and is currently a postdoc and Junior Fellow at Harvard University. Han Altae-Tran PhD ’23, a graduate student in Zhang’s lab during the study and currently a postdoc at the University of Washington, was the study’s other co-first author.

    “This is a testament to what you can do when you improve on the methods for exploration and use as much data as possible,” says Altae-Tran. “It’s really exciting to be able to improve the scale at which we search.”

    New systems

    In their analysis, Altae-Tran, Kannan, and their colleagues noticed that the thousands of CRISPR systems they found fell into a few existing and many new categories. They studied several of the new systems in greater detail in the lab.

    They found several new variants of known Type I CRISPR systems, which use a guide RNA that is 32 base pairs long rather than the 20-nucleotide guide of Cas9. Because of their longer guide RNAs, these Type I systems could potentially be used to develop more precise gene-editing technology that is less prone to off-target editing. Zhang’s team showed that two of these systems could make short edits in the DNA of human cells. And because these Type I systems are similar in size to CRISPR-Cas9, they could likely be delivered to cells in animals or humans using the same gene-delivery technologies being used today for CRISPR.

    One of the Type I systems also showed “collateral activity” — broad degradation of nucleic acids after the CRISPR protein binds its target. Scientists have used similar systems to make infectious disease diagnostics such as SHERLOCK, a tool capable of rapidly sensing a single molecule of DNA or RNA. Zhang’s team thinks the new systems could be adapted for diagnostic technologies as well.

    The researchers also uncovered new mechanisms of action for some Type IV CRISPR systems, and a Type VII system that precisely targets RNA, which could potentially be used in RNA editing. Other systems could potentially be used as recording tools — a molecular document of when a gene was expressed — or as sensors of specific activity in a living cell.

    Mining data

    The scientists say their algorithm could aid in the search for other biochemical systems. “This search algorithm could be used by anyone who wants to work with these large databases for studying how proteins evolve or discovering new genes,” Altae-Tran says.

    The researchers add that their findings illustrate not only how diverse CRISPR systems are, but also that most are rare and only found in unusual bacteria. “Some of these microbial systems were exclusively found in water from coal mines,” Kannan says. “If someone hadn’t been interested in that, we may never have seen those systems. Broadening our sampling diversity is really important to continue expanding the diversity of what we can discover.”

    This work was supported by the Howard Hughes Medical Institute; the K. Lisa Yang and Hock E. Tan Molecular Therapeutics Center at MIT; Broad Institute Programmable Therapeutics Gift Donors; The Pershing Square Foundation, William Ackman and Neri Oxman; James and Patricia Poitras; BT Charitable Foundation; Asness Family Foundation; Kenneth C. Griffin; the Phillips family; David Cheng; and Robert Metcalfe.

  • Synthetic imagery sets new bar in AI training efficiency

    Data is the new soil, and in this fertile new ground, MIT researchers are planting more than just pixels. By using synthetic images to train machine learning models, a team of scientists recently surpassed results obtained from traditional “real-image” training methods. 

    At the core of the approach is a system called StableRep, which doesn’t just use any synthetic images; it generates them through ultra-popular text-to-image models like Stable Diffusion. It’s like creating worlds with words. 

    So what’s in StableRep’s secret sauce? A strategy called “multi-positive contrastive learning.”

    “We’re teaching the model to learn more about high-level concepts through context and variance, not just feeding it data,” says Lijie Fan, an MIT PhD student in electrical engineering, an affiliate of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), and lead researcher on the work. “When multiple images, all generated from the same text, are all treated as depictions of the same underlying thing, the model dives deeper into the concepts behind the images, say the object, not just their pixels.”

    This approach considers multiple images spawned from identical text prompts as positive pairs, providing additional information during training, not just adding more diversity but specifying to the vision system which images are alike and which are different. Remarkably, StableRep outshone the prowess of top-tier models trained on real images, such as SimCLR and CLIP, in extensive datasets.
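
    Below is a hedged PyTorch sketch of a multi-positive contrastive objective in that spirit: every image generated from the same caption counts as a positive for the others, and the training target becomes a uniform distribution over those positives rather than a single pair. It illustrates the idea only, is not the StableRep training code, and assumes each caption in the batch has at least two generated images.

    ```python
    import torch
    import torch.nn.functional as F

    def multi_positive_contrastive_loss(embeddings, caption_ids, temperature=0.1):
        """embeddings: (B, D) image features; caption_ids: (B,) id of the source prompt."""
        z = F.normalize(embeddings, dim=1)
        logits = z @ z.t() / temperature                 # pairwise similarities
        logits.fill_diagonal_(float("-inf"))             # exclude self-comparisons

        # images sharing a caption are positives for one another
        positives = (caption_ids[:, None] == caption_ids[None, :]).float()
        positives.fill_diagonal_(0.0)
        targets = positives / positives.sum(dim=1, keepdim=True)

        log_probs = F.log_softmax(logits, dim=1)
        return -(targets * log_probs).sum(dim=1).mean()

    # Example: a batch of 8 embeddings covering 4 captions, 2 images per caption.
    features = torch.randn(8, 128)
    prompts = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
    print(multi_positive_contrastive_loss(features, prompts))
    ```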

    “While StableRep helps mitigate the challenges of data acquisition in machine learning, it also ushers in a stride towards a new era of AI training techniques. The capacity to produce high-caliber, diverse synthetic images on command could help curtail cumbersome expenses and resources,” says Fan. 

    The process of data collection has never been straightforward. Back in the 1990s, researchers had to manually capture photographs to assemble datasets for objects and faces. The 2000s saw individuals scouring the internet for data. However, this raw, uncurated data often contained discrepancies when compared to real-world scenarios and reflected societal biases, presenting a distorted view of reality. The task of cleansing datasets through human intervention is not only expensive, but also exceedingly challenging. Imagine, though, if this arduous data collection could be distilled down to something as simple as issuing a command in natural language. 

    A pivotal aspect of StableRep’s triumph is the adjustment of the “guidance scale” in the generative model, which ensures a delicate balance between the synthetic images’ diversity and fidelity. When finely tuned, synthetic images used in training these self-supervised models were found to be as effective, if not more so, than real images.

    Taking it a step forward, language supervision was added to the mix, creating an enhanced variant: StableRep+. When trained with 20 million synthetic images, StableRep+ not only achieved superior accuracy but also displayed remarkable efficiency compared to CLIP models trained with a staggering 50 million real images.

    Yet, the path ahead isn’t without its potholes. The researchers candidly address several limitations, including the current slow pace of image generation, semantic mismatches between text prompts and the resultant images, potential amplification of biases, and complexities in image attribution, all of which are imperative to address for future advancements. Another issue is that StableRep requires first training the generative model on large-scale real data. The team acknowledges that starting with real data remains a necessity; however, when you have a good generative model, you can repurpose it for new tasks, like training recognition models and visual representations. 

    While StableRep offers a good solution by diminishing the dependency on vast real-image collections, it brings to the fore concerns regarding hidden biases within the uncurated data used for these text-to-image models. The choice of text prompts, integral to the image synthesis process, is not entirely free from bias, “indicating the essential role of meticulous text selection or possible human curation,” says Fan. 

    “Using the latest text-to-image models, we’ve gained unprecedented control over image generation, allowing for a diverse range of visuals from a single text input. This surpasses real-world image collection in efficiency and versatility. It proves especially useful in specialized tasks, like balancing image variety in long-tail recognition, presenting a practical supplement to using real images for training,” says Fan. “Our work signifies a step forward in visual learning, towards the goal of offering cost-effective training alternatives while highlighting the need for ongoing improvements in data quality and synthesis.”

    “One dream of generative model learning has long been to be able to generate data useful for discriminative model training,” says Google DeepMind researcher and University of Toronto professor of computer science David Fleet, who was not involved in the paper. “While we have seen some signs of life, the dream has been elusive, especially on large-scale complex domains like high-resolution images. This paper provides compelling evidence, for the first time to my knowledge, that the dream is becoming a reality. They show that contrastive learning from massive amounts of synthetic image data can produce representations that outperform those learned from real data at scale, with the potential to improve myriad downstream vision tasks.”

    Fan is joined by Yonglong Tian PhD ’22 as lead authors of the paper, as well as MIT associate professor of electrical engineering and computer science and CSAIL principal investigator Phillip Isola; Google researcher and OpenAI technical staff member Huiwen Chang; and Google staff research scientist Dilip Krishnan. The team will present StableRep at the 2023 Conference on Neural Information Processing Systems (NeurIPS) in New Orleans.

  • Technique enables AI on edge devices to keep learning over time

    Personalized deep-learning models can enable artificial intelligence chatbots that adapt to understand a user’s accent or smart keyboards that continuously update to better predict the next word based on someone’s typing history. This customization requires constant fine-tuning of a machine-learning model with new data.

    Because smartphones and other edge devices lack the memory and computational power necessary for this fine-tuning process, user data are typically uploaded to cloud servers where the model is updated. But data transmission uses a great deal of energy, and sending sensitive user data to a cloud server poses a security risk.  

    Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere developed a technique that enables deep-learning models to efficiently adapt to new sensor data directly on an edge device.

    Their on-device training method, called PockEngine, determines which parts of a huge machine-learning model need to be updated to improve accuracy, and only stores and computes with those specific pieces. It performs the bulk of these computations while the model is being prepared, before runtime, which minimizes computational overhead and boosts the speed of the fine-tuning process.    

    When compared to other methods, PockEngine significantly sped up on-device training, performing up to 15 times faster on some hardware platforms. Moreover, PockEngine didn’t cause models to have any dip in accuracy. The researchers also found that their fine-tuning method enabled a popular AI chatbot to answer complex questions more accurately.

    “On-device fine-tuning can enable better privacy, lower costs, customization ability, and also lifelong learning, but it is not easy. Everything has to happen with a limited number of resources. We want to be able to run not only inference but also training on an edge device. With PockEngine, now we can,” says Song Han, an associate professor in the Department of Electrical Engineering and Computer Science (EECS), a member of the MIT-IBM Watson AI Lab, a distinguished scientist at NVIDIA, and senior author of an open-access paper describing PockEngine.

    Han is joined on the paper by lead author Ligeng Zhu, an EECS graduate student, as well as others at MIT, the MIT-IBM Watson AI Lab, and the University of California San Diego. The paper was recently presented at the IEEE/ACM International Symposium on Microarchitecture.

    Layer by layer

    Deep-learning models are based on neural networks, which comprise many interconnected layers of nodes, or “neurons,” that process data to make a prediction. When the model is run, a process called inference, a data input (such as an image) is passed from layer to layer until the prediction (perhaps the image label) is output at the end. During inference, each layer no longer needs to be stored after it processes the input.

    But during training and fine-tuning, the model undergoes a process known as backpropagation. In backpropagation, the output is compared to the correct answer, and then the model is run in reverse. Each layer is updated as the model’s output gets closer to the correct answer.

    Because each layer may need to be updated, the entire model and intermediate results must be stored, making fine-tuning more memory-demanding than inference.

    However, not all layers in the neural network are important for improving accuracy. And even for layers that are important, the entire layer may not need to be updated. Those layers, and pieces of layers, don’t need to be stored. Furthermore, one may not need to go all the way back to the first layer to improve accuracy — the process could be stopped somewhere in the middle.

    PockEngine takes advantage of these factors to speed up the fine-tuning process and cut down on the amount of computation and memory required.

    The system first fine-tunes each layer, one at a time, on a certain task and measures the accuracy improvement after each individual layer. In this way, PockEngine identifies the contribution of each layer, as well as trade-offs between accuracy and fine-tuning cost, and automatically determines the percentage of each layer that needs to be fine-tuned.
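
    A rough sketch of that sensitivity analysis appears below, for illustration only; it is not the PockEngine implementation. The PyTorch model and the train_briefly and evaluate callables are assumed to be supplied by the user, and a real system would also weigh each layer's memory and compute cost, not just its accuracy contribution.

    ```python
    import copy

    def layer_contributions(model, train_briefly, evaluate, layer_names):
        """Fine-tune one layer at a time and record how much each improves accuracy."""
        baseline = evaluate(model)
        contributions = {}
        for name in layer_names:
            candidate = copy.deepcopy(model)
            for p in candidate.parameters():              # freeze everything...
                p.requires_grad = False
            for p in dict(candidate.named_modules())[name].parameters():
                p.requires_grad = True                    # ...except the layer under test
            train_briefly(candidate)                      # a few fine-tuning steps
            contributions[name] = evaluate(candidate) - baseline
        return contributions
    ```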

    “This method matches the accuracy very well compared to full backpropagation on different tasks and different neural networks,” Han adds.

    A pared-down model

    Conventionally, the backpropagation graph is generated during runtime, which involves a great deal of computation. Instead, PockEngine does this during compile time, while the model is being prepared for deployment.

    PockEngine deletes bits of code to remove unnecessary layers or pieces of layers, creating a pared-down graph of the model to be used during runtime. It then performs other optimizations on this graph to further improve efficiency.

    Since all this only needs to be done once, it saves on computational overhead for runtime.

    “It is like before setting out on a hiking trip. At home, you would do careful planning — which trails are you going to go on, which trails are you going to ignore. So then at execution time, when you are actually hiking, you already have a very careful plan to follow,” Han explains.

    When they applied PockEngine to deep-learning models on different edge devices, including Apple M1 chips and the digital signal processors common in many smartphones and Raspberry Pi computers, it performed on-device training up to 15 times faster, without any drop in accuracy. PockEngine also significantly slashed the amount of memory required for fine-tuning.

    The team also applied the technique to the large language model Llama-V2. With large language models, the fine-tuning process involves providing many examples, and it’s crucial for the model to learn how to interact with users, Han says. The process is also important for models tasked with solving complex problems or reasoning about solutions.

    For instance, Llama-V2 models that were fine-tuned using PockEngine answered the question “What was Michael Jackson’s last album?” correctly, while models that weren’t fine-tuned failed. PockEngine cut the time it took for each iteration of the fine-tuning process from about seven seconds to less than one second on an NVIDIA Jetson Orin, an edge GPU platform.

    In the future, the researchers want to use PockEngine to fine-tune even larger models designed to process text and images together.

    “This work addresses growing efficiency challenges posed by the adoption of large AI models such as LLMs across diverse applications in many different industries. It not only holds promise for edge applications that incorporate larger models, but also for lowering the cost of maintaining and updating large AI models in the cloud,” says Ehry MacRostie, a senior manager in Amazon’s Artificial General Intelligence division who was not involved in this study but works with MIT on related AI research through the MIT-Amazon Science Hub.

    This work was supported, in part, by the MIT-IBM Watson AI Lab, the MIT AI Hardware Program, the MIT-Amazon Science Hub, the National Science Foundation (NSF), and the Qualcomm Innovation Fellowship.

  • 2023-24 Takeda Fellows: Advancing research at the intersection of AI and health

    The School of Engineering has selected 13 new Takeda Fellows for the 2023-24 academic year. With support from Takeda, the graduate students will conduct pathbreaking research ranging from remote health monitoring for virtual clinical trials to ingestible devices for at-home, long-term diagnostics.

    Now in its fourth year, the MIT-Takeda Program, a collaboration between MIT’s School of Engineering and Takeda, fuels the development and application of artificial intelligence capabilities to benefit human health and drug development. Part of the Abdul Latif Jameel Clinic for Machine Learning in Health, the program coalesces disparate disciplines, merges theory and practical implementation, combines algorithm and hardware innovations, and creates multidimensional collaborations between academia and industry.

    The 2023-24 Takeda Fellows are:

    Adam Gierlach

    Adam Gierlach is a PhD candidate in the Department of Electrical Engineering and Computer Science. Gierlach’s work combines innovative biotechnology with machine learning to create ingestible devices for advanced diagnostics and delivery of therapeutics. In his previous work, Gierlach developed a non-invasive, ingestible device for long-term gastric recordings in free-moving patients. With the support of a Takeda Fellowship, he will build on this pathbreaking work by developing smart, energy-efficient, ingestible devices powered by application-specific integrated circuits for at-home, long-term diagnostics. These revolutionary devices — capable of identifying, characterizing, and even correcting gastrointestinal diseases — represent the leading edge of biotechnology. Gierlach’s innovative contributions will help to advance fundamental research on the enteric nervous system and help develop a better understanding of gut-brain axis dysfunctions in Parkinson’s disease, autism spectrum disorder, and other prevalent disorders and conditions.

    Vivek Gopalakrishnan

    Vivek Gopalakrishnan is a PhD candidate in the Harvard-MIT Program in Health Sciences and Technology. Gopalakrishnan’s goal is to develop biomedical machine-learning methods to improve the study and treatment of human disease. Specifically, he employs computational modeling to advance new approaches for minimally invasive, image-guided neurosurgery, offering a safe alternative to open brain and spinal procedures. With the support of a Takeda Fellowship, Gopalakrishnan will develop real-time computer vision algorithms that deliver high-quality, 3D intraoperative image guidance by extracting and fusing information from multimodal neuroimaging data. These algorithms could allow surgeons to reconstruct 3D neurovasculature from X-ray angiography, thereby enhancing the precision of device deployment and enabling more accurate localization of healthy versus pathologic anatomy.

    Hao He

    Hao He is a PhD candidate in the Department of Electrical Engineering and Computer Science. His research interests lie at the intersection of generative AI, machine learning, and their applications in medicine and human health, with a particular emphasis on passive, continuous, remote health monitoring to support virtual clinical trials and health-care management. More specifically, He aims to develop trustworthy AI models that promote equitable access and deliver fair performance independent of race, gender, and age. In his past work, He has developed monitoring systems applied in clinical studies of Parkinson’s disease, Alzheimer’s disease, and epilepsy. Supported by a Takeda Fellowship, He will develop a novel technology for the passive monitoring of sleep stages (using radio signaling) that seeks to address existing gaps in performance across different demographic groups. His project will tackle the problem of imbalance in available datasets and account for intrinsic differences across subpopulations, using generative AI and multi-modality/multi-domain learning, with the goal of learning robust features that are invariant to different subpopulations. He’s work holds great promise for delivering advanced, equitable health-care services to all people and could significantly impact health care and AI.

    Chengyi Long

    Chengyi Long is a PhD candidate in the Department of Civil and Environmental Engineering. Long’s interdisciplinary research integrates the methodology of physics, mathematics, and computer science to investigate questions in ecology. Specifically, Long is developing a series of potentially groundbreaking techniques to explain and predict the temporal dynamics of ecological systems, including human microbiota, which are essential subjects in health and medical research. His current work, supported by a Takeda Fellowship, is focused on developing a conceptual, mathematical, and practical framework to understand the interplay between external perturbations and internal community dynamics in microbial systems, which may serve as a key step toward finding bio solutions to health management. A broader perspective of his research is to develop AI-assisted platforms to anticipate the changing behavior of microbial systems, which may help to differentiate between healthy and unhealthy hosts and design probiotics for the prevention and mitigation of pathogen infections. By creating novel methods to address these issues, Long’s research has the potential to offer powerful contributions to medicine and global health.

    Omar Mohd

    Omar Mohd is a PhD candidate in the Department of Electrical Engineering and Computer Science. Mohd’s research is focused on developing new technologies for the spatial profiling of microRNAs, with potentially important applications in cancer research. Through innovative combinations of micro-technologies and AI-enabled image analysis to measure the spatial variations of microRNAs within tissue samples, Mohd hopes to gain new insights into drug resistance in cancer. This work, supported by a Takeda Fellowship, falls within the emerging field of spatial transcriptomics, which seeks to understand cancer and other diseases by examining the relative locations of cells and their contents within tissues. The ultimate goal of Mohd’s current project is to find multidimensional patterns in tissues that may have prognostic value for cancer patients. One valuable component of his work is an open-source AI program developed with collaborators at Beth Israel Deaconess Medical Center and Harvard Medical School to auto-detect cancer epithelial cells from other cell types in a tissue sample and to correlate their abundance with the spatial variations of microRNAs. Through his research, Mohd is making innovative contributions at the interface of microsystem technology, AI-based image analysis, and cancer treatment, which could significantly impact medicine and human health.

    Sanghyun Park

    Sanghyun Park is a PhD candidate in the Department of Mechanical Engineering. Park specializes in the integration of AI and biomedical engineering to address complex challenges in human health. Drawing on his expertise in polymer physics, drug delivery, and rheology, his research focuses on the pioneering field of in-situ forming implants (ISFIs) for drug delivery. Supported by a Takeda Fellowship, Park is currently developing an injectable formulation designed for long-term drug delivery. The primary goal of his research is to unravel the compaction mechanism of drug particles in ISFI formulations through comprehensive modeling and in-vitro characterization studies utilizing advanced AI tools. He aims to gain a thorough understanding of this unique compaction mechanism and apply it to drug microcrystals to achieve properties optimal for long-term drug delivery. Beyond these fundamental studies, Park’s research also focuses on translating this knowledge into practical applications in a clinical setting through animal studies specifically aimed at extending drug release duration and improving mechanical properties. The innovative use of AI in developing advanced drug delivery systems, coupled with Park’s valuable insights into the compaction mechanism, could contribute to improving long-term drug delivery. This work has the potential to pave the way for effective management of chronic diseases, benefiting patients, clinicians, and the pharmaceutical industry.

    Huaiyao Peng

    Huaiyao Peng is a PhD candidate in the Department of Biological Engineering. Peng’s research interests are focused on engineered tissue, microfabrication platforms, cancer metastasis, and the tumor microenvironment. Specifically, she is advancing novel AI techniques for the development of pre-cancer organoid models of high-grade serous ovarian cancer (HGSOC), an especially lethal and difficult-to-treat cancer, with the goal of gaining new insights into progression and effective treatments. Peng’s project, supported by a Takeda Fellowship, will be one of the first to use cells from serous tubal intraepithelial carcinoma lesions found in the fallopian tubes of many HGSOC patients. By examining the cellular and molecular changes that occur in response to treatment with small molecule inhibitors, she hopes to identify potential biomarkers and promising therapeutic targets for HGSOC, including personalized treatment options for HGSOC patients, ultimately improving their clinical outcomes. Peng’s work has the potential to bring about important advances in cancer treatment and spur innovative new applications of AI in health care. 

    Priyanka Raghavan

    Priyanka Raghavan is a PhD candidate in the Department of Chemical Engineering. Raghavan’s research interests lie at the frontier of predictive chemistry, integrating computational and experimental approaches to build powerful new predictive tools for societally important applications, including drug discovery. Specifically, Raghavan is developing novel models to predict small-molecule substrate reactivity and compatibility in regimes where little data is available (the most realistic regimes). A Takeda Fellowship will enable Raghavan to push the boundaries of her research, making innovative use of low-data and multi-task machine learning approaches, synthetic chemistry, and robotic laboratory automation, with the goal of creating an autonomous, closed-loop system for the discovery of high-yielding organic small molecules in the context of underexplored reactions. Raghavan’s work aims to identify new, versatile reactions to broaden a chemist’s synthetic toolbox with novel scaffolds and substrates that could form the basis of essential drugs. Her work has the potential for far-reaching impacts in early-stage, small-molecule discovery and could help make the lengthy drug-discovery process significantly faster and cheaper.

    Zhiye Song

    Zhiye “Zoey” Song is a PhD candidate in the Department of Electrical Engineering and Computer Science. Song’s research integrates cutting-edge approaches in machine learning (ML) and hardware optimization to create next-generation, wearable medical devices. Specifically, Song is developing novel approaches for the energy-efficient implementation of ML computation in low-power medical devices, including a wearable ultrasound “patch” that captures and processes images for real-time decision-making capabilities. Her recent work, conducted in collaboration with clinicians, has centered on bladder volume monitoring; other potential applications include blood pressure monitoring, muscle diagnosis, and neuromodulation. With the support of a Takeda Fellowship, Song will build on that promising work and pursue key improvements to existing wearable device technologies, including developing low-compute and low-memory ML algorithms and low-power chips to enable ML on smart wearable devices. The technologies emerging from Song’s research could offer exciting new capabilities in health care, enabling powerful and cost-effective point-of-care diagnostics and expanding individual access to autonomous and continuous medical monitoring.

    Peiqi Wang

    Peiqi Wang is a PhD candidate in the Department of Electrical Engineering and Computer Science. Wang’s research aims to develop machine learning methods for learning and interpretation from medical images and associated clinical data to support clinical decision-making. He is developing a multimodal representation learning approach that aligns knowledge captured in large amounts of medical image and text data to transfer this knowledge to new tasks and applications. Supported by a Takeda Fellowship, Wang will advance this promising line of work to build robust tools that interpret images, learn from sparse human feedback, and reason like doctors, with potentially major benefits to important stakeholders in health care.

    Oscar Wu

    Haoyang “Oscar” Wu is a PhD candidate in the Department of Chemical Engineering. Wu’s research integrates quantum chemistry and deep learning methods to accelerate the process of small-molecule screening in the development of new drugs. By identifying and automating reliable methods for finding transition state geometries and calculating barrier heights for new reactions, Wu’s work could make it possible to conduct the high-throughput ab initio calculations of reaction rates needed to screen the reactivity of large numbers of active pharmaceutical ingredients (APIs). A Takeda Fellowship will support his current project to: (1) develop open-source software for high-throughput quantum chemistry calculations, focusing on the reactivity of drug-like molecules, and (2) develop deep learning models that can quantitatively predict the oxidative stability of APIs. The tools and insights resulting from Wu’s research could help to transform and accelerate the drug-discovery process, offering significant benefits to the pharmaceutical and medical fields and to patients.

    Soojung Yang

    Soojung Yang is a PhD candidate in the Department of Materials Science and Engineering. Yang’s research applies cutting-edge methods in geometric deep learning and generative modeling, along with atomistic simulations, to better understand and model protein dynamics. Specifically, Yang is developing novel tools in generative AI to explore protein conformational landscapes that offer greater speed and detail than physics-based simulations at a substantially lower cost. With the support of a Takeda Fellowship, she will build upon her successful work on the reverse transformation of coarse-grained proteins to the all-atom resolution, aiming to build machine-learning models that bridge multiple size scales of protein conformation diversity (all-atom, residue-level, and domain-level). Yang’s research holds the potential to provide a powerful and widely applicable new tool for researchers who seek to understand the complex protein functions at work in human diseases and to design drugs to treat and cure those diseases.

    Yuzhe Yang

    Yuzhe Yang is a PhD candidate in the Department of Electrical Engineering and Computer Science. Yang’s research interests lie at the intersection of machine learning and health care. In his past and current work, Yang has developed and applied innovative machine-learning models that address key challenges in disease diagnosis and tracking. His many notable achievements include the creation of one of the first machine learning-based solutions using nocturnal breathing signals to detect Parkinson’s disease (PD), estimate disease severity, and track PD progression. With the support of a Takeda Fellowship, Yang will expand this promising work to develop an AI-based diagnosis model for Alzheimer’s disease (AD) using sleep-breathing data that is significantly more reliable, flexible, and economical than current diagnostic tools. This passive, in-home, contactless monitoring system — resembling a simple home Wi-Fi router — will also enable remote disease assessment and continuous progression tracking. Yang’s groundbreaking work has the potential to advance the diagnosis and treatment of prevalent diseases like PD and AD, and it offers exciting possibilities for addressing many health challenges with reliable, affordable machine-learning tools.

  • Generating opportunities with generative AI

    Talking with retail executives back in 2010, Rama Ramakrishnan came to two realizations. First, although retail systems that offered customers personalized recommendations were getting a great deal of attention, these systems often provided little payoff for retailers. Second, for many of the firms, most customers shopped only once or twice a year, so companies didn’t really know much about them.

    “But by being very diligent about noting down the interactions a customer has with a retailer or an e-commerce site, we can create a very nice and detailed composite picture of what that person does and what they care about,” says Ramakrishnan, professor of the practice at the MIT Sloan School of Management. “Once you have that, then you can apply proven algorithms from machine learning.”

    These realizations led Ramakrishnan to found CQuotient, a startup whose software has now become the foundation for Salesforce’s widely adopted AI e-commerce platform. “On Black Friday alone, CQuotient technology probably sees and interacts with over a billion shoppers on a single day,” he says.

    After a highly successful entrepreneurial career, in 2019 Ramakrishnan returned to MIT Sloan, where he had earned master’s and PhD degrees in operations research in the 1990s. He teaches students “not just how these amazing technologies work, but also how do you take these technologies and actually put them to use pragmatically in the real world,” he says.

    Additionally, Ramakrishnan enjoys participating in MIT executive education. “This is a great opportunity for me to convey the things that I have learned, but also as importantly, to learn what’s on the minds of these senior executives, and to guide them and nudge them in the right direction,” he says.

    For example, executives are understandably concerned about the need for massive amounts of data to train machine learning systems. He can now guide them to a wealth of models that are pre-trained for specific tasks. “The ability to use these pre-trained AI models, and very quickly adapt them to your particular business problem, is an incredible advance,” says Ramakrishnan.

    Video: “Rama Ramakrishnan – Utilizing AI in Real World Applications for Intelligent Work” (MIT Industrial Liaison Program)

    Understanding AI categories

    “AI is the quest to imbue computers with the ability to do cognitive tasks that typically only humans can do,” he says. Understanding the history of this complex, supercharged landscape aids in exploiting the technologies.

    The traditional approach to AI, which basically solved problems by applying if/then rules learned from humans, proved useful for relatively few tasks. “One reason is that we can do lots of things effortlessly, but if asked to explain how we do them, we can’t actually articulate how we do them,” Ramakrishnan comments. Also, those systems may be baffled by new situations that don’t match up to the rules enshrined in the software.

    Machine learning takes a dramatically different approach, with the software fundamentally learning by example. “You give it lots of examples of inputs and outputs, questions and answers, tasks and responses, and get the computer to automatically learn how to go from the input to the output,” he says. Credit scoring, loan decision-making, disease prediction, and demand forecasting are among the many tasks conquered by machine learning.

    But machine learning only worked well when the input data was structured, for instance in a spreadsheet. “If the input data was unstructured, such as images, video, audio, ECGs, or X-rays, it wasn’t very good at going from that to a predicted output,” Ramakrishnan says. That means humans had to manually structure the unstructured data to train the system.

    Around 2010 deep learning began to overcome that limitation, delivering the ability to directly work with unstructured input data, he says. Based on a longstanding AI strategy known as neural networks, deep learning became practical due to the global flood tide of data, the availability of extraordinarily powerful parallel processing hardware called graphics processing units (originally invented for video games) and advances in algorithms and math.

    Finally, within deep learning, the generative AI software packages appearing last year can create unstructured outputs, such as human-sounding text, images of dogs, and three-dimensional models. Large language models (LLMs) such as OpenAI’s ChatGPT go from text inputs to text outputs, while text-to-image models such as OpenAI’s DALL-E can churn out realistic-appearing images.

    Video: “Rama Ramakrishnan – Making Note of Little Data to Improve Customer Service” (MIT Industrial Liaison Program)

    What generative AI can (and can’t) do

    Trained on the unimaginably vast text resources of the internet, an LLM’s “fundamental capability is to predict the next most likely, most plausible word,” Ramakrishnan says. “Then it attaches the word to the original sentence, predicts the next word again, and keeps on doing it.”
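
    In code, that loop is only a few lines. The sketch below uses a hypothetical predict_next_token function as a stand-in for the model; real LLMs operate on subword tokens and typically sample from a probability distribution rather than always taking the single most plausible continuation.

    ```python
    def generate(prompt, predict_next_token, max_new_tokens=50, stop_token="<eos>"):
        """Repeatedly predict the next word, attach it, and predict again."""
        tokens = prompt.split()
        for _ in range(max_new_tokens):
            next_token = predict_next_token(tokens)   # the most plausible next word
            if next_token == stop_token:
                break
            tokens.append(next_token)
        return " ".join(tokens)
    ```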

    “To the surprise of many, including a lot of researchers, an LLM can do some very complicated things,” he says. “It can compose beautifully coherent poetry, write Seinfeld episodes, and solve some kinds of reasoning problems. It’s really quite remarkable how next-word prediction can lead to these amazing capabilities.”

    “But you have to always keep in mind that what it is doing is not so much finding the correct answer to your question as finding a plausible answer to your question,” Ramakrishnan emphasizes. Its content may be factually inaccurate, irrelevant, toxic, biased, or offensive.

    That puts the burden on users to make sure that the output is correct, relevant, and useful for the task at hand. “You have to make sure there is some way for you to check its output for errors and fix them before it goes out,” he says.

    Intense research is underway to find techniques to address these shortcomings, adds Ramakrishnan, who expects many innovative tools to do so.

    Finding the right corporate roles for LLMs

    Given the astonishing progress in LLMs, how should industry think about applying the software to tasks such as generating content?

    First, Ramakrishnan advises, consider costs: “Is it a much less expensive effort to have a draft that you correct, versus you creating the whole thing?” Second, if the LLM makes a mistake that slips by, and the mistaken content is released to the outside world, can you live with the consequences?

    “If you have an application which satisfies both considerations, then it’s good to do a pilot project to see whether these technologies can actually help you with that particular task,” says Ramakrishnan. He stresses the need to treat the pilot as an experiment rather than as a normal IT project.

    Right now, software development is the most mature corporate LLM application. “ChatGPT and other LLMs are text-in, text-out, and a software program is just text-out,” he says. “Programmers can go from English text-in to Python text-out, just as you can go from English to English or English to German. There are lots of tools which help you write code using these technologies.”

    Of course, programmers must make sure the result does the job properly. Fortunately, software development already offers infrastructure for testing and verifying code. “This is a beautiful sweet spot,” he says, “where it’s much cheaper to have the technology write code for you, because you can very quickly check and verify it.”

    Another major LLM use is content generation, such as writing marketing copy or e-commerce product descriptions. “Again, it may be much cheaper to fix ChatGPT’s draft than for you to write the whole thing,” Ramakrishnan says. “However, companies must be very careful to make sure there is a human in the loop.”

    LLMs also are spreading quickly as in-house tools to search enterprise documents. Unlike conventional search algorithms, an LLM chatbot can offer a conversational search experience, because it remembers each question you ask. “But again, it will occasionally make things up,” he says. “In terms of chatbots for external customers, these are very early days, because of the risk of saying something wrong to the customer.”

    Overall, Ramakrishnan notes, we’re living in a remarkable time to grapple with AI’s rapidly evolving potentials and pitfalls. “I help companies figure out how to take these very transformative technologies and put them to work, to make products and services much more intelligent, employees much more productive, and processes much more efficient,” he says.

  • Accelerating AI tasks while preserving data security

    With the proliferation of computationally intensive machine-learning applications, such as chatbots that perform real-time language translation, device manufacturers often incorporate specialized hardware components to rapidly move and process the massive amounts of data these systems demand.

    Choosing the best design for these components, known as deep neural network accelerators, is challenging because they can have an enormous range of design options. This difficult problem becomes even thornier when a designer seeks to add cryptographic operations to keep data safe from attackers.

    Now, MIT researchers have developed a search engine that can efficiently identify optimal designs for deep neural network accelerators that preserve data security while boosting performance.

    Their search tool, known as SecureLoop, is designed to consider how the addition of data encryption and authentication measures will impact the performance and energy usage of the accelerator chip. An engineer could use this tool to obtain the optimal design of an accelerator tailored to their neural network and machine-learning task.

    When compared to conventional scheduling techniques that don’t consider security, SecureLoop can improve performance of accelerator designs while keeping data protected.  

    Using SecureLoop could help a user improve the speed and performance of demanding AI applications, such as autonomous driving or medical image classification, while ensuring sensitive user data remains safe from some types of attacks.

    “If you are interested in doing a computation where you are going to preserve the security of the data, the rules that we used before for finding the optimal design are now broken. So all of that optimization needs to be customized for this new, more complicated set of constraints. And that is what [lead author] Kyungmi has done in this paper,” says Joel Emer, an MIT professor of the practice in computer science and electrical engineering and co-author of a paper on SecureLoop.

    Emer is joined on the paper by lead author Kyungmi Lee, an electrical engineering and computer science graduate student; Mengjia Yan, the Homer A. Burnell Career Development Assistant Professor of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and senior author Anantha Chandrakasan, dean of the MIT School of Engineering and the Vannevar Bush Professor of Electrical Engineering and Computer Science. The research will be presented at the IEEE/ACM International Symposium on Microarchitecture.

    “The community passively accepted that adding cryptographic operations to an accelerator will introduce overhead. They thought it would introduce only a small variance in the design trade-off space. But, this is a misconception. In fact, cryptographic operations can significantly distort the design space of energy-efficient accelerators. Kyungmi did a fantastic job identifying this issue,” Yan adds.

    Secure acceleration

    A deep neural network consists of many layers of interconnected nodes that process data. Typically, the output of one layer becomes the input of the next layer. Data are grouped into units called tiles for processing and transfer between off-chip memory and the accelerator. Each layer of the neural network can have its own data tiling configuration.

    A deep neural network accelerator is a processor with an array of computational units that parallelizes operations, like multiplication, in each layer of the network. The accelerator schedule describes how data are moved and processed.

    Since space on an accelerator chip is at a premium, most data are stored in off-chip memory and fetched by the accelerator when needed. But because data are stored off-chip, they are vulnerable to an attacker who could steal information or change some values, causing the neural network to malfunction.

    “As a chip manufacturer, you can’t guarantee the security of external devices or the overall operating system,” Lee explains.

    Manufacturers can protect data by adding authenticated encryption to the accelerator. Encryption scrambles the data using a secret key. Then authentication cuts the data into uniform chunks and assigns a cryptographic hash to each chunk of data, which is stored along with the data chunk in off-chip memory.

    When the accelerator fetches an encrypted chunk of data, known as an authentication block, it uses a secret key to recover and verify the original data before processing it.
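
    For illustration, the snippet below applies off-the-shelf authenticated encryption (AES-GCM from the Python cryptography package) to fixed-size chunks of data. It is a software analogy for the hardware operations SecureLoop models, not part of the tool, and the 64-byte chunk size is arbitrary.

    ```python
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=128)
    aesgcm = AESGCM(key)

    def encrypt_chunks(data, chunk_size=64):
        """Split data into chunks; each gets its own nonce and authentication tag."""
        protected = []
        for i in range(0, len(data), chunk_size):
            nonce = os.urandom(12)
            protected.append((nonce, aesgcm.encrypt(nonce, data[i:i + chunk_size], None)))
        return protected

    def fetch_and_verify(protected, index):
        """Decrypt one chunk; a tampered ciphertext or tag raises InvalidTag."""
        nonce, ciphertext = protected[index]
        return aesgcm.decrypt(nonce, ciphertext, None)

    blocks = encrypt_chunks(os.urandom(256))
    print(len(blocks), len(fetch_and_verify(blocks, 0)))   # 4 chunks of 64 plaintext bytes
    ```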

    But the sizes of authentication blocks and tiles of data don’t match up, so there could be multiple tiles in one block, or a tile could be split between two blocks. The accelerator can’t arbitrarily grab a fraction of an authentication block, so it may end up grabbing extra data, which uses additional energy and slows down computation.
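
    The extra traffic can be quantified with simple arithmetic, as in the sketch below; the offsets and sizes are made up for illustration.

    ```python
    def redundant_traffic(tile_start, tile_bytes, auth_block_bytes):
        """Bytes fetched and decrypted beyond what the tile actually needs."""
        first_block = tile_start // auth_block_bytes
        last_block = (tile_start + tile_bytes - 1) // auth_block_bytes
        fetched = (last_block - first_block + 1) * auth_block_bytes
        return fetched - tile_bytes

    # A 96-byte tile starting at offset 32, with 64-byte authentication blocks:
    # blocks 0 and 1 must both be fetched (128 bytes) for 96 useful bytes.
    print(redundant_traffic(32, 96, 64))    # -> 32
    ```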

    Plus, the accelerator still must run the cryptographic operation on each authentication block, adding even more computational cost.
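
    A back-of-the-envelope sketch of that mismatch, with made-up sizes: a tile that straddles an authentication-block boundary forces the accelerator to fetch, and authenticate, whole extra blocks.

    TILE_BYTES = 96        # one tile of a layer's data (assumed)
    BLOCK_BYTES = 64       # one authentication block (assumed)

    def blocks_touched(tile_start: int, tile_bytes: int, block_bytes: int) -> int:
        """Whole authentication blocks the accelerator must fetch to cover a tile
        that begins at byte offset tile_start in off-chip memory."""
        first = tile_start // block_bytes
        last = (tile_start + tile_bytes - 1) // block_bytes
        return last - first + 1

    # A 96-byte tile starting at offset 32 spans two 64-byte blocks,
    # so 128 bytes must be fetched and authenticated to use 96 bytes of data.
    n = blocks_touched(tile_start=32, tile_bytes=TILE_BYTES, block_bytes=BLOCK_BYTES)
    extra_bytes = n * BLOCK_BYTES - TILE_BYTES   # 32 wasted bytes for this tile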

    An efficient search engine

    With SecureLoop, the MIT researchers sought a method that could identify the fastest and most energy-efficient accelerator schedule: one that minimizes the number of times the device needs to access off-chip memory to grab extra blocks of data because of encryption and authentication.

    They began by augmenting Timeloop, an existing search engine that Emer and his collaborators previously developed. First, they added a model that accounts for the additional computation needed for encryption and authentication.

    Then, they reformulated the search problem into a simple mathematical expression, which enables SecureLoop to find the ideal authentication block size far more efficiently than searching through all possible options.

    “Depending on how you assign this block, the amount of unnecessary traffic might increase or decrease. If you assign the cryptographic block cleverly, then you can just fetch a small amount of additional data,” Lee says.
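
    The paper's actual closed-form expression is not reproduced here, but the idea can be sketched as scoring each candidate authentication block size by the redundant traffic and extra cryptographic work it would cause for a layer, then keeping the cheapest. All costs and candidate sizes below are illustrative.

    CRYPTO_COST_PER_BLOCK = 40    # assumed relative cost of authenticating one block

    def schedule_cost(tile_bytes: int, block_bytes: int, num_tiles: int) -> int:
        # Worst case, an unaligned tile touches one extra block beyond the ceiling.
        blocks_per_tile = -(-tile_bytes // block_bytes) + 1
        extra_bytes = blocks_per_tile * block_bytes - tile_bytes
        return num_tiles * (extra_bytes + blocks_per_tile * CRYPTO_COST_PER_BLOCK)

    def best_block_size(tile_bytes: int, num_tiles: int, candidates) -> int:
        return min(candidates, key=lambda b: schedule_cost(tile_bytes, b, num_tiles))

    # Pick among a few power-of-two block sizes for a layer with 1,000 tiles.
    print(best_block_size(tile_bytes=96, num_tiles=1000, candidates=[32, 64, 128, 256]))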

    Finally, they incorporated a heuristic technique that ensures SecureLoop identifies a schedule which maximizes the performance of the entire deep neural network, rather than only a single layer.

    In the end, the search engine outputs an accelerator schedule that provides the best possible speed and energy efficiency for a specific neural network; the schedule includes the data tiling strategy and the size of the authentication blocks.
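
    What such an output might look like in miniature, with invented layer names and sizes (real tools such as Timeloop describe schedules in far more detail, including loop orderings and spatial mappings):

    from dataclasses import dataclass

    @dataclass
    class LayerSchedule:
        layer_name: str            # hypothetical layer identifier
        tile_shape: tuple          # how this layer's data are chopped into tiles
        auth_block_bytes: int      # authentication block size chosen for this layer

    secure_schedule = [
        LayerSchedule("conv1", tile_shape=(1, 16, 56, 56), auth_block_bytes=64),
        LayerSchedule("conv2", tile_shape=(1, 32, 28, 28), auth_block_bytes=128),
    ]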

    “The design spaces for these accelerators are huge. What Kyungmi did was figure out some very pragmatic ways to make that search tractable so she could find good solutions without needing to exhaustively search the space,” says Emer.

    When tested in a simulator, SecureLoop identified schedules that were up to 33.2 percent faster and exhibited 50.2 percent better energy-delay product (a metric related to energy efficiency) than other methods that didn’t consider security.
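
    For reference, the energy-delay product multiplies the energy a design consumes by the time it takes to finish a workload, so a lower value is better on both counts:

    \mathrm{EDP} = \text{energy consumed} \times \text{execution time}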

    The researchers also used SecureLoop to explore how the design space for accelerators changes when security is considered. They learned that allocating a bit more of the chip’s area for the cryptographic engine and sacrificing some space for on-chip memory can lead to better performance, Lee says.

    In the future, the researchers want to use SecureLoop to find accelerator designs that are resilient to side-channel attacks, which occur when an attacker has access to physical hardware. For instance, an attacker could monitor the power consumption pattern of a device to obtain secret information, even if the data have been encrypted. They are also extending SecureLoop so it could be applied to other kinds of computation.

    This work is funded, in part, by Samsung Electronics and the Korea Foundation for Advanced Studies. More

  • in

    New techniques efficiently accelerate sparse tensors for massive AI models

    Researchers from MIT and NVIDIA have developed two techniques that accelerate the processing of sparse tensors, a type of data structure that’s used for high-performance computing tasks. The complementary techniques could result in significant improvements to the performance and energy-efficiency of systems like the massive machine-learning models that drive generative artificial intelligence.

    Tensors are data structures used by machine-learning models. Both of the new methods seek to efficiently exploit what’s known as sparsity (zero values) in the tensors. When processing these tensors, the hardware can skip over the zeros and save on both computation and memory. For instance, anything multiplied by zero is zero, so that operation can be skipped entirely. The hardware can also compress the tensor (zeros don’t need to be stored) so a larger portion fits in on-chip memory.
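
    A toy sketch of both savings, using a simple coordinate-style compression (store each nonzero value with its position); real accelerators use richer formats and do this in hardware.

    def compress(dense):
        """Store only the nonzero values and where they live."""
        return [(i, v) for i, v in enumerate(dense) if v != 0]

    def sparse_dot(compressed, other):
        """Multiply-accumulate that skips every zero in the compressed operand."""
        return sum(v * other[i] for i, v in compressed)

    weights = [0, 0, 3, 0, 5, 0, 0, 2]       # 62.5 percent of the values are zero
    activations = [1, 2, 3, 4, 5, 6, 7, 8]
    packed = compress(weights)               # 3 stored entries instead of 8
    print(sparse_dot(packed, activations))   # 3*3 + 5*5 + 2*8 = 50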

    However, there are several challenges to exploiting sparsity. Finding the nonzero values in a large tensor is no easy task. Existing approaches often limit the locations of nonzero values by enforcing a sparsity pattern to simplify the search, but this limits the variety of sparse tensors that can be processed efficiently.

    Another challenge is that the number of nonzero values can vary in different regions of the tensor. This makes it difficult to determine how much space is required to store different regions in memory. To make sure the region fits, more space is often allocated than is needed, causing the storage buffer to be underutilized. This increases off-chip memory traffic, which increases energy consumption.

    The MIT and NVIDIA researchers crafted two solutions to address these problems. For one, they developed a technique that allows the hardware to efficiently find the nonzero values for a wider variety of sparsity patterns.

    For the other solution, they created a method that can handle the case where the data do not fit in memory, which increases the utilization of the storage buffer and reduces off-chip memory traffic.

    Both methods boost the performance and reduce the energy demands of hardware accelerators specifically designed to speed up the processing of sparse tensors.

    “Typically, when you use more specialized or domain-specific hardware accelerators, you lose the flexibility that you would get from a more general-purpose processor, like a CPU. What stands out with these two works is that we show that you can still maintain flexibility and adaptability while being specialized and efficient,” says Vivienne Sze, associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS), a member of the Research Laboratory of Electronics (RLE), and co-senior author of papers on both advances.

    Her co-authors include lead authors Yannan Nellie Wu PhD ’23 and Zi Yu Xue, an electrical engineering and computer science graduate student; and co-senior author Joel Emer, an MIT professor of the practice in computer science and electrical engineering and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), as well as others at NVIDIA. Both papers will be presented at the IEEE/ACM International Symposium on Microarchitecture.

    HighLight: Efficiently finding zero values

    Sparsity can arise in the tensor for a variety of reasons. For example, researchers sometimes “prune” unnecessary pieces of the machine-learning models by replacing some values in the tensor with zeros, creating sparsity. The degree of sparsity (percentage of zeros) and the locations of the zeros can vary for different models.

    To make it easier to find the remaining nonzero values in a model with billions of individual values, researchers often restrict the location of the nonzero values so they fall into a certain pattern. However, each hardware accelerator is typically designed to support one specific sparsity pattern, limiting its flexibility.  

    By contrast, the hardware accelerator the MIT researchers designed, called HighLight, can handle a wide variety of sparsity patterns and still perform well when running models that don’t have any zero values.

    They use a technique they call “hierarchical structured sparsity” to efficiently represent a wide variety of sparsity patterns that are composed of several simple sparsity patterns. This approach divides the values in a tensor into smaller blocks, where each block has its own simple sparsity pattern (perhaps two zeros and two nonzeros in a block with four values).

    Then, they combine the blocks into a hierarchy, where each collection of blocks also has its own simple sparsity pattern (perhaps one zero block and three nonzero blocks in a level with four blocks). They continue combining blocks into larger levels, but the patterns remain simple at each step.
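
    A hedged sketch of that two-level idea, following the example patterns in the text (at most two nonzeros per four-value block, at most three nonzero blocks per four-block group); HighLight's actual patterns and metadata layout are more involved.

    def check_hierarchical(values, block=4, group=4, max_nz_per_block=2, max_nz_blocks=3):
        """Return True if a flat list of values fits the two-level pattern."""
        blocks = [values[i:i + block] for i in range(0, len(values), block)]
        for b in blocks:
            if sum(v != 0 for v in b) > max_nz_per_block:
                return False                   # level 1: simple pattern inside a block
        for g in range(0, len(blocks), group):
            nonzero_blocks = sum(any(v != 0 for v in blk) for blk in blocks[g:g + group])
            if nonzero_blocks > max_nz_blocks:
                return False                   # level 2: simple pattern across blocks
        return True

    # 16 values = 4 blocks of 4; one block is all zeros, the rest have <= 2 nonzeros.
    print(check_hierarchical([0, 3, 0, 1,  0, 0, 0, 0,  2, 0, 0, 4,  0, 0, 5, 0]))  # True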

    This simplicity enables HighLight to more efficiently find and skip zeros, so it can take full advantage of the opportunity to cut excess computation. On average, their accelerator design had about six times better energy-delay product (a metric related to energy efficiency) than other approaches.

    “In the end, the HighLight accelerator is able to efficiently accelerate dense models because it does not introduce a lot of overhead, and at the same time it is able to exploit workloads with different amounts of zero values based on hierarchical structured sparsity,” Wu explains.

    In the future, she and her collaborators want to apply hierarchical structured sparsity to more types of machine-learning models and different types of tensors in the models.

    Tailors and Swiftiles: Effectively “overbooking” to accelerate workloads

    Researchers can also leverage sparsity to more efficiently move and process data on a computer chip.

    Since the tensors are often larger than what can be stored in the memory buffer on chip, the chip only grabs and processes a chunk of the tensor at a time. The chunks are called tiles.

    To maximize the utilization of that buffer and limit the number of times the chip must access off-chip memory, which often dominates energy consumption and limits processing speed, researchers seek to use the largest tile that will fit into the buffer.

    But in a sparse tensor, many of the data values are zero, so an even larger tile can fit into the buffer than one might expect based on its capacity. Zero values don’t need to be stored.

    But the number of zero values can vary across different regions of the tensor, so they can also vary for each tile. This makes it difficult to determine a tile size that will fit in the buffer. As a result, existing approaches often conservatively assume there are no zeros and end up selecting a smaller tile, which results in wasted blank spaces in the buffer.

    To address this uncertainty, the researchers propose the use of “overbooking” to allow them to increase the tile size, as well as a way to tolerate it if the tile doesn’t fit the buffer.

    It works the same way an airline overbooks tickets for a flight: if all the passengers show up, the airline must compensate the ones who are bumped from the plane, but usually not all of the passengers show up.

    In a sparse tensor, a tile size can be chosen such that usually the tiles will have enough zeros that most still fit into the buffer. But occasionally, a tile will have more nonzero values than will fit. In this case, those data are bumped out of the buffer.

    The researchers enable the hardware to only re-fetch the bumped data without grabbing and processing the entire tile again. They modify the “tail end” of the buffer to handle this, hence the name of this technique, Tailors.

    Then they also created an approach for finding the size for tiles that takes advantage of overbooking. This method, called Swiftiles, swiftly estimates the ideal tile size so that a specific percentage of tiles, set by the user, are overbooked. (The names “Tailors” and “Swiftiles” pay homage to Taylor Swift, whose recent Eras tour was fraught with overbooked presale codes for tickets).
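
    In the spirit of that approach, the sketch below picks the largest tile size whose expected overflow rate stays under a user-set overbooking budget; the sampled densities and the estimation strategy are placeholders, not the Swiftiles method itself.

    def pick_tile_size(tile_densities, buffer_capacity, overbook_fraction=0.1):
        """tile_densities: estimated fraction of nonzeros in each sampled tile."""
        best = buffer_capacity                 # the conservative, "no zeros" choice
        for size in range(buffer_capacity, 16 * buffer_capacity, buffer_capacity):
            # A tile of this size overflows when its nonzeros exceed the buffer.
            overflows = sum(d * size > buffer_capacity for d in tile_densities)
            if overflows / len(tile_densities) <= overbook_fraction:
                best = size                    # larger tile with an acceptable overflow rate
        return best

    sampled = [0.05, 0.10, 0.20, 0.08, 0.12, 0.30, 0.07, 0.09]   # assumed density samples
    print(pick_tile_size(sampled, buffer_capacity=1024, overbook_fraction=0.125))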

    Swiftiles reduces the number of times the hardware needs to check the tensor to identify an ideal tile size, saving on computation. The combination of Tailors and Swiftiles more than doubles processing speed while requiring only half the energy of existing hardware accelerators that cannot handle overbooking.

    “Swiftiles allows us to estimate how large these tiles need to be without requiring multiple iterations to refine the estimate. This only works because overbooking is supported. Even if you are off by a decent amount, you can still extract a fair bit of speedup because of the way the non-zeros are distributed,” Xue says.

    In the future, the researchers want to apply the idea of overbooking to other aspects in computer architecture and also work to improve the process for estimating the optimal level of overbooking.

    This research is funded, in part, by the MIT AI Hardware Program. More