More stories

  • in

    Synthetic imagery sets new bar in AI training efficiency

    Data is the new soil, and in this fertile new ground, MIT researchers are planting more than just pixels. By using synthetic images to train machine learning models, a team of scientists recently surpassed results obtained from traditional “real-image” training methods. 

    At the core of the approach is a system called StableRep, which doesn’t just use any synthetic images; it generates them through ultra-popular text-to-image models like Stable Diffusion. It’s like creating worlds with words. 

    So what’s in StableRep’s secret sauce? A strategy called “multi-positive contrastive learning.”

    “We’re teaching the model to learn more about high-level concepts through context and variance, not just feeding it data,” says Lijie Fan, MIT PhD student in electrical engineering, affiliate of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), lead researcher on the work. “When multiple images, all generated from the same text, all treated as depictions of the same underlying thing, the model dives deeper into the concepts behind the images, say the object, not just their pixels.”

    This approach considers multiple images spawned from identical text prompts as positive pairs, providing additional information during training, not just adding more diversity but specifying to the vision system which images are alike and which are different. Remarkably, StableRep outshone the prowess of top-tier models trained on real images, such as SimCLR and CLIP, in extensive datasets.

    “While StableRep helps mitigate the challenges of data acquisition in machine learning, it also ushers in a stride towards a new era of AI training techniques. The capacity to produce high-caliber, diverse synthetic images on command could help curtail cumbersome expenses and resources,” says Fan. 

    The process of data collection has never been straightforward. Back in the 1990s, researchers had to manually capture photographs to assemble datasets for objects and faces. The 2000s saw individuals scouring the internet for data. However, this raw, uncurated data often contained discrepancies when compared to real-world scenarios and reflected societal biases, presenting a distorted view of reality. The task of cleansing datasets through human intervention is not only expensive, but also exceedingly challenging. Imagine, though, if this arduous data collection could be distilled down to something as simple as issuing a command in natural language. 

    A pivotal aspect of StableRep’s triumph is the adjustment of the “guidance scale” in the generative model, which ensures a delicate balance between the synthetic images’ diversity and fidelity. When finely tuned, synthetic images used in training these self-supervised models were found to be as effective, if not more so, than real images.

    Taking it a step forward, language supervision was added to the mix, creating an enhanced variant: StableRep+. When trained with 20 million synthetic images, StableRep+ not only achieved superior accuracy but also displayed remarkable efficiency compared to CLIP models trained with a staggering 50 million real images.

    Yet, the path ahead isn’t without its potholes. The researchers candidly address several limitations, including the current slow pace of image generation, semantic mismatches between text prompts and the resultant images, potential amplification of biases, and complexities in image attribution, all of which are imperative to address for future advancements. Another issue is that StableRep requires first training the generative model on large-scale real data. The team acknowledges that starting with real data remains a necessity; however, when you have a good generative model, you can repurpose it for new tasks, like training recognition models and visual representations. 

    The team notes that they haven’t gotten around the need to start with real data; it’s just that once you have a good generative model you can repurpose it for new tasks, like training recognition models and visual representations. 

    While StableRep offers a good solution by diminishing the dependency on vast real-image collections, it brings to the fore concerns regarding hidden biases within the uncurated data used for these text-to-image models. The choice of text prompts, integral to the image synthesis process, is not entirely free from bias, “indicating the essential role of meticulous text selection or possible human curation,” says Fan. 

    “Using the latest text-to-image models, we’ve gained unprecedented control over image generation, allowing for a diverse range of visuals from a single text input. This surpasses real-world image collection in efficiency and versatility. It proves especially useful in specialized tasks, like balancing image variety in long-tail recognition, presenting a practical supplement to using real images for training,” says Fan. “Our work signifies a step forward in visual learning, towards the goal of offering cost-effective training alternatives while highlighting the need for ongoing improvements in data quality and synthesis.”

    “One dream of generative model learning has long been to be able to generate data useful for discriminative model training,” says Google DeepMind researcher and University of Toronto professor of computer science David Fleet, who was not involved in the paper. “While we have seen some signs of life, the dream has been elusive, especially on large-scale complex domains like high-resolution images. This paper provides compelling evidence, for the first time to my knowledge, that the dream is becoming a reality. They show that contrastive learning from massive amounts of synthetic image data can produce representations that outperform those learned from real data at scale, with the potential to improve myriad downstream vision tasks.”

    Fan is joined by Yonglong Tian PhD ’22 as lead authors of the paper, as well as MIT associate professor of electrical engineering and computer science and CSAIL principal investigator Phillip Isola; Google researcher and OpenAI technical staff member Huiwen Chang; and Google staff research scientist Dilip Krishnan. The team will present StableRep at the 2023 Conference on Neural Information Processing Systems (NeurIPS) in New Orleans. More

  • in

    Rewarding excellence in open data

    The second annual MIT Prize for Open Data, which included a $2,500 cash prize, was recently awarded to 10 individual and group research projects. Presented jointly by the School of Science and the MIT Libraries, the prize highlights the value of open data — research data that is openly accessible and reusable — at the Institute. The prize winners and 12 honorable mention recipients were honored at the Open Data @ MIT event held Oct. 24 at Hayden Library. 

    Conceived by Chris Bourg, director of MIT Libraries, and Rebecca Saxe, associate dean of the School of Science and the John W. Jarve (1978) Professor of Brain and Cognitive Sciences, the prize program was launched in 2022. It recognizes MIT-affiliated researchers who use or share open data, create infrastructure for open data sharing, or theorize about open data. Nominations were solicited from across the Institute, with a focus on trainees: undergraduate and graduate students, postdocs, and research staff. 

    “The prize is explicitly aimed at early-career researchers,” says Bourg. “Supporting and encouraging the next generation of researchers will help ensure that the future of scholarship is characterized by a norm of open sharing.”

    The 2023 awards were presented at a celebratory event held during International Open Access Week. Winners gave five-minute presentations on their projects and the role that open data plays in their research. The program also included remarks from Bourg and Anne White, School of Engineering Distinguished Professor of Engineering, vice provost, and associate vice president for research administration. White reflected on the ways in which MIT has demonstrated its values with the open sharing of research and scholarship and acknowledged the efforts of the honorees and advocates gathered at the event: “Thank you for the active role you’re all playing in building a culture of openness in research,” she said. “It benefits us all.” 

    Winners were chosen from more than 80 nominees, representing all five MIT schools, the MIT Schwarzman College of Computing, and several research centers across the Institute. A committee composed of faculty, staff, and graduate students made the selections:

    Hammaad Adam, graduate student in the Institute for Data, Systems, and Society, accepted on behalf of the team behind Organ Retrieval and Collection of Health Information for Donation (ORCHID), the first ever multi-center dataset dedicated to the organ procurement process. ORCHID provides the first opportunity to quantitatively analyze organ procurement organization decisions and identify operational inefficiencies.
    Adam Atanas, postdoc in the Department of Brain and Cognitive Sciences (BCS), and Jungsoo Kim, graduate student in BCS, created WormWideWeb.org. The site, allowing researchers to easily browse and download C. elegans whole-brain datasets, will be useful to C. elegans neuroscientists and theoretical/computational neuroscientists. 
    Paul Berube, research scientist in the Department of Civil and Environmental Engineering, and Steven Biller, assistant professor of biological sciences at Wellesley College, won for “Unlocking Marine Microbiomes with Open Data.” Open data of genomes and metagenomes for marine ecosystems, with a focus on cyanobacteria, leverage the power of contemporaneous data from GEOTRACES and other long-standing ocean time-series programs to provide underlying information to answer questions about marine ecosystem function. 
    Jack Cavanagh, Sarah Kopper, and Diana Horvath of the Abdul Latif Jameel Poverty Action Lab (J-PAL) were recognized for J-PAL’s Data Publication Infrastructure, which includes a trusted repository of open-access datasets, a dedicated team of data curators, and coding tools and training materials to help other teams publish data in an efficient and ethical manner. 
    Jerome Patrick Cruz, graduate student in the Department of Political Science, won for OpenAudit, leveraging advances in natural language processing and machine learning to make data in public audit reports more usable for academics and policy researchers, as well as governance practitioners, watchdogs, and reformers. This work was done in collaboration with colleagues at Ateneo de Manila University in the Philippines. 
    Undergraduate student Daniel Kurlander created a tool for planetary scientists to rapidly access and filter images of the comet 67P/Churyumov-Gerasimenko. The web-based tool enables searches by location and other properties, does not require a time-intensive download of a massive dataset, allows analysis of the data independent of the speed of one’s computer, and does not require installation of a complex set of programs. 
    Halie Olson, postdoc in BCS, was recognized for sharing data from a functional magnetic resonance imaging (fMRI) study on language processing. The study used video clips from “Sesame Street” in which researchers manipulated the comprehensibility of the speech stream, allowing them to isolate a “language response” in the brain.
    Thomas González Roberts, graduate student in the Department of Aeronautics and Astronautics, won for the International Telecommunication Union Compliance Assessment Monitor. This tool combats the heritage of secrecy in outer space operations by creating human- and machine-readable datasets that succinctly describe the international agreements that govern satellite operations. 
    Melissa Kline Struhl, research scientist in BCS, was recognized for Children Helping Science, a free, open-source platform for remote studies with babies and children that makes it possible for researchers at more than 100 institutions to conduct reproducible studies. 
    JS Tan, graduate student in the Department of Urban Studies and Planning, developed the Collective Action in Tech Archive in collaboration with Nataliya Nedzhvetskaya of the University of California at Berkeley. It is an open database of all publicly recorded collective actions taken by workers in the global tech industry. 
    A complete list of winning projects and honorable mentions, including links to the research data, is available on the MIT Libraries website. More

  • in

    Technique enables AI on edge devices to keep learning over time

    Personalized deep-learning models can enable artificial intelligence chatbots that adapt to understand a user’s accent or smart keyboards that continuously update to better predict the next word based on someone’s typing history. This customization requires constant fine-tuning of a machine-learning model with new data.

    Because smartphones and other edge devices lack the memory and computational power necessary for this fine-tuning process, user data are typically uploaded to cloud servers where the model is updated. But data transmission uses a great deal of energy, and sending sensitive user data to a cloud server poses a security risk.  

    Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere developed a technique that enables deep-learning models to efficiently adapt to new sensor data directly on an edge device.

    Their on-device training method, called PockEngine, determines which parts of a huge machine-learning model need to be updated to improve accuracy, and only stores and computes with those specific pieces. It performs the bulk of these computations while the model is being prepared, before runtime, which minimizes computational overhead and boosts the speed of the fine-tuning process.    

    When compared to other methods, PockEngine significantly sped up on-device training, performing up to 15 times faster on some hardware platforms. Moreover, PockEngine didn’t cause models to have any dip in accuracy. The researchers also found that their fine-tuning method enabled a popular AI chatbot to answer complex questions more accurately.

    “On-device fine-tuning can enable better privacy, lower costs, customization ability, and also lifelong learning, but it is not easy. Everything has to happen with a limited number of resources. We want to be able to run not only inference but also training on an edge device. With PockEngine, now we can,” says Song Han, an associate professor in the Department of Electrical Engineering and Computer Science (EECS), a member of the MIT-IBM Watson AI Lab, a distinguished scientist at NVIDIA, and senior author of an open-access paper describing PockEngine.

    Han is joined on the paper by lead author Ligeng Zhu, an EECS graduate student, as well as others at MIT, the MIT-IBM Watson AI Lab, and the University of California San Diego. The paper was recently presented at the IEEE/ACM International Symposium on Microarchitecture.

    Layer by layer

    Deep-learning models are based on neural networks, which comprise many interconnected layers of nodes, or “neurons,” that process data to make a prediction. When the model is run, a process called inference, a data input (such as an image) is passed from layer to layer until the prediction (perhaps the image label) is output at the end. During inference, each layer no longer needs to be stored after it processes the input.

    But during training and fine-tuning, the model undergoes a process known as backpropagation. In backpropagation, the output is compared to the correct answer, and then the model is run in reverse. Each layer is updated as the model’s output gets closer to the correct answer.

    Because each layer may need to be updated, the entire model and intermediate results must be stored, making fine-tuning more memory demanding than inference

    However, not all layers in the neural network are important for improving accuracy. And even for layers that are important, the entire layer may not need to be updated. Those layers, and pieces of layers, don’t need to be stored. Furthermore, one may not need to go all the way back to the first layer to improve accuracy — the process could be stopped somewhere in the middle.

    PockEngine takes advantage of these factors to speed up the fine-tuning process and cut down on the amount of computation and memory required.

    The system first fine-tunes each layer, one at a time, on a certain task and measures the accuracy improvement after each individual layer. In this way, PockEngine identifies the contribution of each layer, as well as trade-offs between accuracy and fine-tuning cost, and automatically determines the percentage of each layer that needs to be fine-tuned.

    “This method matches the accuracy very well compared to full back propagation on different tasks and different neural networks,” Han adds.

    A pared-down model

    Conventionally, the backpropagation graph is generated during runtime, which involves a great deal of computation. Instead, PockEngine does this during compile time, while the model is being prepared for deployment.

    PockEngine deletes bits of code to remove unnecessary layers or pieces of layers, creating a pared-down graph of the model to be used during runtime. It then performs other optimizations on this graph to further improve efficiency.

    Since all this only needs to be done once, it saves on computational overhead for runtime.

    “It is like before setting out on a hiking trip. At home, you would do careful planning — which trails are you going to go on, which trails are you going to ignore. So then at execution time, when you are actually hiking, you already have a very careful plan to follow,” Han explains.

    When they applied PockEngine to deep-learning models on different edge devices, including Apple M1 Chips and the digital signal processors common in many smartphones and Raspberry Pi computers, it performed on-device training up to 15 times faster, without any drop in accuracy. PockEngine also significantly slashed the amount of memory required for fine-tuning.

    The team also applied the technique to the large language model Llama-V2. With large language models, the fine-tuning process involves providing many examples, and it’s crucial for the model to learn how to interact with users, Han says. The process is also important for models tasked with solving complex problems or reasoning about solutions.

    For instance, Llama-V2 models that were fine-tuned using PockEngine answered the question “What was Michael Jackson’s last album?” correctly, while models that weren’t fine-tuned failed. PockEngine cut the time it took for each iteration of the fine-tuning process from about seven seconds to less than one second on a NVIDIA Jetson Orin, an edge GPU platform.

    In the future, the researchers want to use PockEngine to fine-tune even larger models designed to process text and images together.

    “This work addresses growing efficiency challenges posed by the adoption of large AI models such as LLMs across diverse applications in many different industries. It not only holds promise for edge applications that incorporate larger models, but also for lowering the cost of maintaining and updating large AI models in the cloud,” says Ehry MacRostie, a senior manager in Amazon’s Artificial General Intelligence division who was not involved in this study but works with MIT on related AI research through the MIT-Amazon Science Hub.

    This work was supported, in part, by the MIT-IBM Watson AI Lab, the MIT AI Hardware Program, the MIT-Amazon Science Hub, the National Science Foundation (NSF), and the Qualcomm Innovation Fellowship. More

  • in

    2023-24 Takeda Fellows: Advancing research at the intersection of AI and health

    The School of Engineering has selected 13 new Takeda Fellows for the 2023-24 academic year. With support from Takeda, the graduate students will conduct pathbreaking research ranging from remote health monitoring for virtual clinical trials to ingestible devices for at-home, long-term diagnostics.

    Now in its fourth year, the MIT-Takeda Program, a collaboration between MIT’s School of Engineering and Takeda, fuels the development and application of artificial intelligence capabilities to benefit human health and drug development. Part of the Abdul Latif Jameel Clinic for Machine Learning in Health, the program coalesces disparate disciplines, merges theory and practical implementation, combines algorithm and hardware innovations, and creates multidimensional collaborations between academia and industry.

    The 2023-24 Takeda Fellows are:

    Adam Gierlach

    Adam Gierlach is a PhD candidate in the Department of Electrical Engineering and Computer Science. Gierlach’s work combines innovative biotechnology with machine learning to create ingestible devices for advanced diagnostics and delivery of therapeutics. In his previous work, Gierlach developed a non-invasive, ingestible device for long-term gastric recordings in free-moving patients. With the support of a Takeda Fellowship, he will build on this pathbreaking work by developing smart, energy-efficient, ingestible devices powered by application-specific integrated circuits for at-home, long-term diagnostics. These revolutionary devices — capable of identifying, characterizing, and even correcting gastrointestinal diseases — represent the leading edge of biotechnology. Gierlach’s innovative contributions will help to advance fundamental research on the enteric nervous system and help develop a better understanding of gut-brain axis dysfunctions in Parkinson’s disease, autism spectrum disorder, and other prevalent disorders and conditions.

    Vivek Gopalakrishnan

    Vivek Gopalakrishnan is a PhD candidate in the Harvard-MIT Program in Health Sciences and Technology. Gopalakrishnan’s goal is to develop biomedical machine-learning methods to improve the study and treatment of human disease. Specifically, he employs computational modeling to advance new approaches for minimally invasive, image-guided neurosurgery, offering a safe alternative to open brain and spinal procedures. With the support of a Takeda Fellowship, Gopalakrishnan will develop real-time computer vision algorithms that deliver high-quality, 3D intraoperative image guidance by extracting and fusing information from multimodal neuroimaging data. These algorithms could allow surgeons to reconstruct 3D neurovasculature from X-ray angiography, thereby enhancing the precision of device deployment and enabling more accurate localization of healthy versus pathologic anatomy.

    Hao He

    Hao He is a PhD candidate in the Department of Electrical Engineering and Computer Science. His research interests lie at the intersection of generative AI, machine learning, and their applications in medicine and human health, with a particular emphasis on passive, continuous, remote health monitoring to support virtual clinical trials and health-care management. More specifically, He aims to develop trustworthy AI models that promote equitable access and deliver fair performance independent of race, gender, and age. In his past work, He has developed monitoring systems applied in clinical studies of Parkinson’s disease, Alzheimer’s disease, and epilepsy. Supported by a Takeda Fellowship, He will develop a novel technology for the passive monitoring of sleep stages (using radio signaling) that seeks to address existing gaps in performance across different demographic groups. His project will tackle the problem of imbalance in available datasets and account for intrinsic differences across subpopulations, using generative AI and multi-modality/multi-domain learning, with the goal of learning robust features that are invariant to different subpopulations. He’s work holds great promise for delivering advanced, equitable health-care services to all people and could significantly impact health care and AI.

    Chengyi Long

    Chengyi Long is a PhD candidate in the Department of Civil and Environmental Engineering. Long’s interdisciplinary research integrates the methodology of physics, mathematics, and computer science to investigate questions in ecology. Specifically, Long is developing a series of potentially groundbreaking techniques to explain and predict the temporal dynamics of ecological systems, including human microbiota, which are essential subjects in health and medical research. His current work, supported by a Takeda Fellowship, is focused on developing a conceptual, mathematical, and practical framework to understand the interplay between external perturbations and internal community dynamics in microbial systems, which may serve as a key step toward finding bio solutions to health management. A broader perspective of his research is to develop AI-assisted platforms to anticipate the changing behavior of microbial systems, which may help to differentiate between healthy and unhealthy hosts and design probiotics for the prevention and mitigation of pathogen infections. By creating novel methods to address these issues, Long’s research has the potential to offer powerful contributions to medicine and global health.

    Omar Mohd

    Omar Mohd is a PhD candidate in the Department of Electrical Engineering and Computer Science. Mohd’s research is focused on developing new technologies for the spatial profiling of microRNAs, with potentially important applications in cancer research. Through innovative combinations of micro-technologies and AI-enabled image analysis to measure the spatial variations of microRNAs within tissue samples, Mohd hopes to gain new insights into drug resistance in cancer. This work, supported by a Takeda Fellowship, falls within the emerging field of spatial transcriptomics, which seeks to understand cancer and other diseases by examining the relative locations of cells and their contents within tissues. The ultimate goal of Mohd’s current project is to find multidimensional patterns in tissues that may have prognostic value for cancer patients. One valuable component of his work is an open-source AI program developed with collaborators at Beth Israel Deaconess Medical Center and Harvard Medical School to auto-detect cancer epithelial cells from other cell types in a tissue sample and to correlate their abundance with the spatial variations of microRNAs. Through his research, Mohd is making innovative contributions at the interface of microsystem technology, AI-based image analysis, and cancer treatment, which could significantly impact medicine and human health.

    Sanghyun Park

    Sanghyun Park is a PhD candidate in the Department of Mechanical Engineering. Park specializes in the integration of AI and biomedical engineering to address complex challenges in human health. Drawing on his expertise in polymer physics, drug delivery, and rheology, his research focuses on the pioneering field of in-situ forming implants (ISFIs) for drug delivery. Supported by a Takeda Fellowship, Park is currently developing an injectable formulation designed for long-term drug delivery. The primary goal of his research is to unravel the compaction mechanism of drug particles in ISFI formulations through comprehensive modeling and in-vitro characterization studies utilizing advanced AI tools. He aims to gain a thorough understanding of this unique compaction mechanism and apply it to drug microcrystals to achieve properties optimal for long-term drug delivery. Beyond these fundamental studies, Park’s research also focuses on translating this knowledge into practical applications in a clinical setting through animal studies specifically aimed at extending drug release duration and improving mechanical properties. The innovative use of AI in developing advanced drug delivery systems, coupled with Park’s valuable insights into the compaction mechanism, could contribute to improving long-term drug delivery. This work has the potential to pave the way for effective management of chronic diseases, benefiting patients, clinicians, and the pharmaceutical industry.

    Huaiyao Peng

    Huaiyao Peng is a PhD candidate in the Department of Biological Engineering. Peng’s research interests are focused on engineered tissue, microfabrication platforms, cancer metastasis, and the tumor microenvironment. Specifically, she is advancing novel AI techniques for the development of pre-cancer organoid models of high-grade serous ovarian cancer (HGSOC), an especially lethal and difficult-to-treat cancer, with the goal of gaining new insights into progression and effective treatments. Peng’s project, supported by a Takeda Fellowship, will be one of the first to use cells from serous tubal intraepithelial carcinoma lesions found in the fallopian tubes of many HGSOC patients. By examining the cellular and molecular changes that occur in response to treatment with small molecule inhibitors, she hopes to identify potential biomarkers and promising therapeutic targets for HGSOC, including personalized treatment options for HGSOC patients, ultimately improving their clinical outcomes. Peng’s work has the potential to bring about important advances in cancer treatment and spur innovative new applications of AI in health care. 

    Priyanka Raghavan

    Priyanka Raghavan is a PhD candidate in the Department of Chemical Engineering. Raghavan’s research interests lie at the frontier of predictive chemistry, integrating computational and experimental approaches to build powerful new predictive tools for societally important applications, including drug discovery. Specifically, Raghavan is developing novel models to predict small-molecule substrate reactivity and compatibility in regimes where little data is available (the most realistic regimes). A Takeda Fellowship will enable Raghavan to push the boundaries of her research, making innovative use of low-data and multi-task machine learning approaches, synthetic chemistry, and robotic laboratory automation, with the goal of creating an autonomous, closed-loop system for the discovery of high-yielding organic small molecules in the context of underexplored reactions. Raghavan’s work aims to identify new, versatile reactions to broaden a chemist’s synthetic toolbox with novel scaffolds and substrates that could form the basis of essential drugs. Her work has the potential for far-reaching impacts in early-stage, small-molecule discovery and could help make the lengthy drug-discovery process significantly faster and cheaper.

    Zhiye Song

    Zhiye “Zoey” Song is a PhD candidate in the Department of Electrical Engineering and Computer Science. Song’s research integrates cutting-edge approaches in machine learning (ML) and hardware optimization to create next-generation, wearable medical devices. Specifically, Song is developing novel approaches for the energy-efficient implementation of ML computation in low-power medical devices, including a wearable ultrasound “patch” that captures and processes images for real-time decision-making capabilities. Her recent work, conducted in collaboration with clinicians, has centered on bladder volume monitoring; other potential applications include blood pressure monitoring, muscle diagnosis, and neuromodulation. With the support of a Takeda Fellowship, Song will build on that promising work and pursue key improvements to existing wearable device technologies, including developing low-compute and low-memory ML algorithms and low-power chips to enable ML on smart wearable devices. The technologies emerging from Song’s research could offer exciting new capabilities in health care, enabling powerful and cost-effective point-of-care diagnostics and expanding individual access to autonomous and continuous medical monitoring.

    Peiqi Wang

    Peiqi Wang is a PhD candidate in the Department of Electrical Engineering and Computer Science. Wang’s research aims to develop machine learning methods for learning and interpretation from medical images and associated clinical data to support clinical decision-making. He is developing a multimodal representation learning approach that aligns knowledge captured in large amounts of medical image and text data to transfer this knowledge to new tasks and applications. Supported by a Takeda Fellowship, Wang will advance this promising line of work to build robust tools that interpret images, learn from sparse human feedback, and reason like doctors, with potentially major benefits to important stakeholders in health care.

    Oscar Wu

    Haoyang “Oscar” Wu is a PhD candidate in the Department of Chemical Engineering. Wu’s research integrates quantum chemistry and deep learning methods to accelerate the process of small-molecule screening in the development of new drugs. By identifying and automating reliable methods for finding transition state geometries and calculating barrier heights for new reactions, Wu’s work could make it possible to conduct the high-throughput ab initio calculations of reaction rates needed to screen the reactivity of large numbers of active pharmaceutical ingredients (APIs). A Takeda Fellowship will support his current project to: (1) develop open-source software for high-throughput quantum chemistry calculations, focusing on the reactivity of drug-like molecules, and (2) develop deep learning models that can quantitatively predict the oxidative stability of APIs. The tools and insights resulting from Wu’s research could help to transform and accelerate the drug-discovery process, offering significant benefits to the pharmaceutical and medical fields and to patients.

    Soojung Yang

    Soojung Yang is a PhD candidate in the Department of Materials Science and Engineering. Yang’s research applies cutting-edge methods in geometric deep learning and generative modeling, along with atomistic simulations, to better understand and model protein dynamics. Specifically, Yang is developing novel tools in generative AI to explore protein conformational landscapes that offer greater speed and detail than physics-based simulations at a substantially lower cost. With the support of a Takeda Fellowship, she will build upon her successful work on the reverse transformation of coarse-grained proteins to the all-atom resolution, aiming to build machine-learning models that bridge multiple size scales of protein conformation diversity (all-atom, residue-level, and domain-level). Yang’s research holds the potential to provide a powerful and widely applicable new tool for researchers who seek to understand the complex protein functions at work in human diseases and to design drugs to treat and cure those diseases.

    Yuzhe Yang

    Yuzhe Yang is a PhD candidate in the Department of Electrical Engineering and Computer Science. Yang’s research interests lie at the intersection of machine learning and health care. In his past and current work, Yang has developed and applied innovative machine-learning models that address key challenges in disease diagnosis and tracking. His many notable achievements include the creation of one of the first machine learning-based solutions using nocturnal breathing signals to detect Parkinson’s disease (PD), estimate disease severity, and track PD progression. With the support of a Takeda Fellowship, Yang will expand this promising work to develop an AI-based diagnosis model for Alzheimer’s disease (AD) using sleep-breathing data that is significantly more reliable, flexible, and economical than current diagnostic tools. This passive, in-home, contactless monitoring system — resembling a simple home Wi-Fi router — will also enable remote disease assessment and continuous progression tracking. Yang’s groundbreaking work has the potential to advance the diagnosis and treatment of prevalent diseases like PD and AD, and it offers exciting possibilities for addressing many health challenges with reliable, affordable machine-learning tools.  More

  • in

    Accelerating AI tasks while preserving data security

    With the proliferation of computationally intensive machine-learning applications, such as chatbots that perform real-time language translation, device manufacturers often incorporate specialized hardware components to rapidly move and process the massive amounts of data these systems demand.

    Choosing the best design for these components, known as deep neural network accelerators, is challenging because they can have an enormous range of design options. This difficult problem becomes even thornier when a designer seeks to add cryptographic operations to keep data safe from attackers.

    Now, MIT researchers have developed a search engine that can efficiently identify optimal designs for deep neural network accelerators, that preserve data security while boosting performance.

    Their search tool, known as SecureLoop, is designed to consider how the addition of data encryption and authentication measures will impact the performance and energy usage of the accelerator chip. An engineer could use this tool to obtain the optimal design of an accelerator tailored to their neural network and machine-learning task.

    When compared to conventional scheduling techniques that don’t consider security, SecureLoop can improve performance of accelerator designs while keeping data protected.  

    Using SecureLoop could help a user improve the speed and performance of demanding AI applications, such as autonomous driving or medical image classification, while ensuring sensitive user data remains safe from some types of attacks.

    “If you are interested in doing a computation where you are going to preserve the security of the data, the rules that we used before for finding the optimal design are now broken. So all of that optimization needs to be customized for this new, more complicated set of constraints. And that is what [lead author] Kyungmi has done in this paper,” says Joel Emer, an MIT professor of the practice in computer science and electrical engineering and co-author of a paper on SecureLoop.

    Emer is joined on the paper by lead author Kyungmi Lee, an electrical engineering and computer science graduate student; Mengjia Yan, the Homer A. Burnell Career Development Assistant Professor of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and senior author Anantha Chandrakasan, dean of the MIT School of Engineering and the Vannevar Bush Professor of Electrical Engineering and Computer Science. The research will be presented at the IEEE/ACM International Symposium on Microarchitecture.

    “The community passively accepted that adding cryptographic operations to an accelerator will introduce overhead. They thought it would introduce only a small variance in the design trade-off space. But, this is a misconception. In fact, cryptographic operations can significantly distort the design space of energy-efficient accelerators. Kyungmi did a fantastic job identifying this issue,” Yan adds.

    Secure acceleration

    A deep neural network consists of many layers of interconnected nodes that process data. Typically, the output of one layer becomes the input of the next layer. Data are grouped into units called tiles for processing and transfer between off-chip memory and the accelerator. Each layer of the neural network can have its own data tiling configuration.

    A deep neural network accelerator is a processor with an array of computational units that parallelizes operations, like multiplication, in each layer of the network. The accelerator schedule describes how data are moved and processed.

    Since space on an accelerator chip is at a premium, most data are stored in off-chip memory and fetched by the accelerator when needed. But because data are stored off-chip, they are vulnerable to an attacker who could steal information or change some values, causing the neural network to malfunction.

    “As a chip manufacturer, you can’t guarantee the security of external devices or the overall operating system,” Lee explains.

    Manufacturers can protect data by adding authenticated encryption to the accelerator. Encryption scrambles the data using a secret key. Then authentication cuts the data into uniform chunks and assigns a cryptographic hash to each chunk of data, which is stored along with the data chunk in off-chip memory.

    When the accelerator fetches an encrypted chunk of data, known as an authentication block, it uses a secret key to recover and verify the original data before processing it.

    But the sizes of authentication blocks and tiles of data don’t match up, so there could be multiple tiles in one block, or a tile could be split between two blocks. The accelerator can’t arbitrarily grab a fraction of an authentication block, so it may end up grabbing extra data, which uses additional energy and slows down computation.

    Plus, the accelerator still must run the cryptographic operation on each authentication block, adding even more computational cost.

    An efficient search engine

    With SecureLoop, the MIT researchers sought a method that could identify the fastest and most energy efficient accelerator schedule — one that minimizes the number of times the device needs to access off-chip memory to grab extra blocks of data because of encryption and authentication.  

    They began by augmenting an existing search engine Emer and his collaborators previously developed, called Timeloop. First, they added a model that could account for the additional computation needed for encryption and authentication.

    Then, they reformulated the search problem into a simple mathematical expression, which enables SecureLoop to find the ideal authentical block size in a much more efficient manner than searching through all possible options.

    “Depending on how you assign this block, the amount of unnecessary traffic might increase or decrease. If you assign the cryptographic block cleverly, then you can just fetch a small amount of additional data,” Lee says.

    Finally, they incorporated a heuristic technique that ensures SecureLoop identifies a schedule which maximizes the performance of the entire deep neural network, rather than only a single layer.

    At the end, the search engine outputs an accelerator schedule, which includes the data tiling strategy and the size of the authentication blocks, that provides the best possible speed and energy efficiency for a specific neural network.

    “The design spaces for these accelerators are huge. What Kyungmi did was figure out some very pragmatic ways to make that search tractable so she could find good solutions without needing to exhaustively search the space,” says Emer.

    When tested in a simulator, SecureLoop identified schedules that were up to 33.2 percent faster and exhibited 50.2 percent better energy delay product (a metric related to energy efficiency) than other methods that didn’t consider security.

    The researchers also used SecureLoop to explore how the design space for accelerators changes when security is considered. They learned that allocating a bit more of the chip’s area for the cryptographic engine and sacrificing some space for on-chip memory can lead to better performance, Lee says.

    In the future, the researchers want to use SecureLoop to find accelerator designs that are resilient to side-channel attacks, which occur when an attacker has access to physical hardware. For instance, an attacker could monitor the power consumption pattern of a device to obtain secret information, even if the data have been encrypted. They are also extending SecureLoop so it could be applied to other kinds of computation.

    This work is funded, in part, by Samsung Electronics and the Korea Foundation for Advanced Studies. More

  • in

    New techniques efficiently accelerate sparse tensors for massive AI models

    Researchers from MIT and NVIDIA have developed two techniques that accelerate the processing of sparse tensors, a type of data structure that’s used for high-performance computing tasks. The complementary techniques could result in significant improvements to the performance and energy-efficiency of systems like the massive machine-learning models that drive generative artificial intelligence.

    Tensors are data structures used by machine-learning models. Both of the new methods seek to efficiently exploit what’s known as sparsity — zero values — in the tensors. When processing these tensors, one can skip over the zeros and save on both computation and memory. For instance, anything multiplied by zero is zero, so it can skip that operation. And it can compress the tensor (zeros don’t need to be stored) so a larger portion can be stored in on-chip memory.

    However, there are several challenges to exploiting sparsity. Finding the nonzero values in a large tensor is no easy task. Existing approaches often limit the locations of nonzero values by enforcing a sparsity pattern to simplify the search, but this limits the variety of sparse tensors that can be processed efficiently.

    Another challenge is that the number of nonzero values can vary in different regions of the tensor. This makes it difficult to determine how much space is required to store different regions in memory. To make sure the region fits, more space is often allocated than is needed, causing the storage buffer to be underutilized. This increases off-chip memory traffic, which increases energy consumption.

    The MIT and NVIDIA researchers crafted two solutions to address these problems. For one, they developed a technique that allows the hardware to efficiently find the nonzero values for a wider variety of sparsity patterns.

    For the other solution, they created a method that can handle the case where the data do not fit in memory, which increases the utilization of the storage buffer and reduces off-chip memory traffic.

    Both methods boost the performance and reduce the energy demands of hardware accelerators specifically designed to speed up the processing of sparse tensors.

    “Typically, when you use more specialized or domain-specific hardware accelerators, you lose the flexibility that you would get from a more general-purpose processor, like a CPU. What stands out with these two works is that we show that you can still maintain flexibility and adaptability while being specialized and efficient,” says Vivienne Sze, associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS), a member of the Research Laboratory of Electronics (RLE), and co-senior author of papers on both advances.

    Her co-authors include lead authors Yannan Nellie Wu PhD ’23 and Zi Yu Xue, an electrical engineering and computer science graduate student; and co-senior author Joel Emer, an MIT professor of the practice in computer science and electrical engineering and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), as well as others at NVIDIA. Both papers will be presented at the IEEE/ACM International Symposium on Microarchitecture.

    HighLight: Efficiently finding zero values

    Sparsity can arise in the tensor for a variety of reasons. For example, researchers sometimes “prune” unnecessary pieces of the machine-learning models by replacing some values in the tensor with zeros, creating sparsity. The degree of sparsity (percentage of zeros) and the locations of the zeros can vary for different models.

    To make it easier to find the remaining nonzero values in a model with billions of individual values, researchers often restrict the location of the nonzero values so they fall into a certain pattern. However, each hardware accelerator is typically designed to support one specific sparsity pattern, limiting its flexibility.  

    By contrast, the hardware accelerator the MIT researchers designed, called HighLight, can handle a wide variety of sparsity patterns and still perform well when running models that don’t have any zero values.

    They use a technique they call “hierarchical structured sparsity” to efficiently represent a wide variety of sparsity patterns that are composed of several simple sparsity patterns. This approach divides the values in a tensor into smaller blocks, where each block has its own simple, sparsity pattern (perhaps two zeros and two nonzeros in a block with four values).

    Then, they combine the blocks into a hierarchy, where each collection of blocks also has its own simple, sparsity pattern (perhaps one zero block and three nonzero blocks in a level with four blocks). They continue combining blocks into larger levels, but the patterns remain simple at each step.

    This simplicity enables HighLight to more efficiently find and skip zeros, so it can take full advantage of the opportunity to cut excess computation. On average, their accelerator design had about six times better energy-delay product (a metric related to energy efficiency) than other approaches.

    “In the end, the HighLight accelerator is able to efficiently accelerate dense models because it does not introduce a lot of overhead, and at the same time it is able to exploit workloads with different amounts of zero values based on hierarchical structured sparsity,” Wu explains.

    In the future, she and her collaborators want to apply hierarchical structured sparsity to more types of machine-learning models and different types of tensors in the models.

    Tailors and Swiftiles: Effectively “overbooking” to accelerate workloads

    Researchers can also leverage sparsity to more efficiently move and process data on a computer chip.

    Since the tensors are often larger than what can be stored in the memory buffer on chip, the chip only grabs and processes a chunk of the tensor at a time. The chunks are called tiles.

    To maximize the utilization of that buffer and limit the number of times the chip must access off-chip memory, which often dominates energy consumption and limits processing speed, researchers seek to use the largest tile that will fit into the buffer.

    But in a sparse tensor, many of the data values are zero, so an even larger tile can fit into the buffer than one might expect based on its capacity. Zero values don’t need to be stored.

    But the number of zero values can vary across different regions of the tensor, so they can also vary for each tile. This makes it difficult to determine a tile size that will fit in the buffer. As a result, existing approaches often conservatively assume there are no zeros and end up selecting a smaller tile, which results in wasted blank spaces in the buffer.

    To address this uncertainty, the researchers propose the use of “overbooking” to allow them to increase the tile size, as well as a way to tolerate it if the tile doesn’t fit the buffer.

    The same way an airline overbooks tickets for a flight, if all the passengers show up, the airline must compensate the ones who are bumped from the plane. But usually all the passengers don’t show up.

    In a sparse tensor, a tile size can be chosen such that usually the tiles will have enough zeros that most still fit into the buffer. But occasionally, a tile will have more nonzero values than will fit. In this case, those data are bumped out of the buffer.

    The researchers enable the hardware to only re-fetch the bumped data without grabbing and processing the entire tile again. They modify the “tail end” of the buffer to handle this, hence the name of this technique, Tailors.

    Then they also created an approach for finding the size for tiles that takes advantage of overbooking. This method, called Swiftiles, swiftly estimates the ideal tile size so that a specific percentage of tiles, set by the user, are overbooked. (The names “Tailors” and “Swiftiles” pay homage to Taylor Swift, whose recent Eras tour was fraught with overbooked presale codes for tickets).

    Swiftiles reduces the number of times the hardware needs to check the tensor to identify an ideal tile size, saving on computation. The combination of Tailors and Swiftiles more than doubles the speed while requiring only half the energy demands of existing hardware accelerators which cannot handle overbooking.

    “Swiftiles allows us to estimate how large these tiles need to be without requiring multiple iterations to refine the estimate. This only works because overbooking is supported. Even if you are off by a decent amount, you can still extract a fair bit of speedup because of the way the non-zeros are distributed,” Xue says.

    In the future, the researchers want to apply the idea of overbooking to other aspects in computer architecture and also work to improve the process for estimating the optimal level of overbooking.

    This research is funded, in part, by the MIT AI Hardware Program. More

  • in

    Making genetic prediction models more inclusive

    While any two human genomes are about 99.9 percent identical, genetic variation in the remaining 0.1 percent plays an important role in shaping human diversity, including a person’s risk for developing certain diseases.

    Measuring the cumulative effect of these small genetic differences can provide an estimate of an individual’s genetic risk for a particular disease or their likelihood of having a particular trait. However, the majority of models used to generate these “polygenic scores” are based on studies done in people of European descent, and do not accurately gauge the risk for people of non-European ancestry or people whose genomes contain a mixture of chromosome regions inherited from previously isolated populations, also known as admixed ancestry.

    In an effort to make these genetic scores more inclusive, MIT researchers have created a new model that takes into account genetic information from people from a wider diversity of genetic ancestries across the world. Using this model, they showed that they could increase the accuracy of genetics-based predictions for a variety of traits, especially for people from populations that have been traditionally underrepresented in genetic studies.

    “For people of African ancestry, our model proved to be about 60 percent more accurate on average,” says Manolis Kellis, a professor of computer science in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and a member of the Broad Institute of MIT and Harvard. “For people of admixed genetic backgrounds more broadly, who have been excluded from most previous models, the accuracy of our model increased by an average of about 18 percent.”

    The researchers hope their more inclusive modeling approach could help improve health outcomes for a wider range of people and promote health equity by spreading the benefits of genomic sequencing more widely across the globe.

    “What we have done is created a method that allows you to be much more accurate for admixed and ancestry-diverse individuals, and ensure the results and the benefits of human genetics research are equally shared by everyone,” says MIT postdoc Yosuke Tanigawa, the lead and co-corresponding author of the paper, which appears today in open-access form in the American Journal of Human Genetics. The researchers have made all of their data publicly available for the broader scientific community to use.

    More inclusive models

    The work builds on the Human Genome Project, which mapped all of the genes found in the human genome, and on subsequent large-scale, cohort-based studies of how genetic variants in the human genome are linked to disease risk and other differences between individuals.

    These studies showed that the effect of any individual genetic variant on its own is typically very small. Together, these small effects add up and influence the risk of developing heart disease or diabetes, having a stroke, or being diagnosed with psychiatric disorders such as schizophrenia.

    “We have hundreds of thousands of genetic variants that are associated with complex traits, each of which is individually playing a weak effect, but together they are beginning to be predictive for disease predispositions,” Kellis says.

    However, most of these genome-wide association studies included few people of non-European descent, so polygenic risk models based on them translate poorly to non-European populations. People from different geographic areas can have different patterns of genetic variation, shaped by stochastic drift, population history, and environmental factors — for example, in people of African descent, genetic variants that protect against malaria are more common than in other populations. Those variants also affect other traits involving the immune system, such as counts of neutrophils, a type of immune cell. That variation would not be well-captured in a model based on genetic analysis of people of European ancestry alone.

    “If you are an individual of African descent, of Latin American descent, of Asian descent, then you are currently being left out by the system,” Kellis says. “This inequity in the utilization of genetic information for predicting risk of patients can cause unnecessary burden, unnecessary deaths, and unnecessary lack of prevention, and that’s where our work comes in.”

    Some researchers have begun trying to address these disparities by creating distinct models for people of European descent, of African descent, or of Asian descent. These emerging approaches assign individuals to distinct genetic ancestry groups, aggregate the data to create an association summary, and make genetic prediction models. However, these approaches still don’t represent people of admixed genetic backgrounds well.

    “Our approach builds on the previous work without requiring researchers to assign individuals or local genomic segments of individuals to predefined distinct genetic ancestry groups,” Tanigawa says. “Instead, we develop a single model for everybody by directly working on individuals across the continuum of their genetic ancestries.”

    In creating their new model, the MIT team used computational and statistical techniques that enabled them to study each individual’s unique genetic profile instead of grouping individuals by population. This methodological advancement allowed the researchers to include people of admixed ancestry, who made up nearly 10 percent of the UK Biobank dataset used for this study and currently account for about one in seven newborns in the United States.

    “Because we work at the individual level, there is no need for computing summary-level data for different populations,” Kellis says. “Thus, we did not need to exclude individuals of admixed ancestry, increasing our power by including more individuals and representing contributions from all populations in our combined model.”

    Better predictions

    To create their new model, the researchers used genetic data from more than 280,000 people, which was collected by UK Biobank, a large-scale biomedical database and research resource containing de-identified genetic, lifestyle, and health information from half a million U.K. participants. Using another set of about 81,000 held-out individuals from the UK Biobank, the researchers evaluated their model across 60 traits, which included traits related to body size and shape, such as height and body mass index, as well as blood traits such as white blood cell count and red blood cell count, which also have a genetic basis.

    The researchers found that, compared to models trained only on European-ancestry individuals, their model’s predictions are more accurate for all genetic ancestry groups. The most notable gain was for people of African ancestry, who showed 61 percent average improvements, even though they only made up about 1.5 percent of samples in UK Biobank. The researchers also saw improvements of 11 percent for people of South Asian descent and 5 percent for white British people. Predictions for people of admixed ancestry improved by about 18 percent.

    “When you bring all the individuals together in the training set, everybody contributes to the training of the polygenic score modeling on equal footing,” Tanigawa says. “Combined with increasingly more inclusive data collection efforts, our method can help leverage these efforts to improve predictive accuracy for all.”

    The MIT team hopes its approach can eventually be incorporated into tests of an individual’s risk of a variety of diseases. Such tests could be combined with conventional risk factors and used to help doctors diagnose disease or to help people manage their risk for certain diseases before they develop.

    “Our work highlights the power of diversity, equity, and inclusion efforts in the context of genomics research,” Tanigawa says.

    The researchers now hope to add even more data to their model, including data from the United States, and to apply it to additional traits that they didn’t analyze in this study.

    “This is just the start,” Kellis says. “We can’t wait to see more people join our effort to propel inclusive human genetics research.”

    The research was funded by the National Institutes of Health. More

  • in

    To excel at engineering design, generative AI must learn to innovate, study finds

    ChatGPT and other deep generative models are proving to be uncanny mimics. These AI supermodels can churn out poems, finish symphonies, and create new videos and images by automatically learning from millions of examples of previous works. These enormously powerful and versatile tools excel at generating new content that resembles everything they’ve seen before.

    But as MIT engineers say in a new study, similarity isn’t enough if you want to truly innovate in engineering tasks.

    “Deep generative models (DGMs) are very promising, but also inherently flawed,” says study author Lyle Regenwetter, a mechanical engineering graduate student at MIT. “The objective of these models is to mimic a dataset. But as engineers and designers, we often don’t want to create a design that’s already out there.”

    He and his colleagues make the case that if mechanical engineers want help from AI to generate novel ideas and designs, they will have to first refocus those models beyond “statistical similarity.”

    “The performance of a lot of these models is explicitly tied to how statistically similar a generated sample is to what the model has already seen,” says co-author Faez Ahmed, assistant professor of mechanical engineering at MIT. “But in design, being different could be important if you want to innovate.”

    In their study, Ahmed and Regenwetter reveal the pitfalls of deep generative models when they are tasked with solving engineering design problems. In a case study of bicycle frame design, the team shows that these models end up generating new frames that mimic previous designs but falter on engineering performance and requirements.

    When the researchers presented the same bicycle frame problem to DGMs that they specifically designed with engineering-focused objectives, rather than only statistical similarity, these models produced more innovative, higher-performing frames.

    The team’s results show that similarity-focused AI models don’t quite translate when applied to engineering problems. But, as the researchers also highlight in their study, with some careful planning of task-appropriate metrics, AI models could be an effective design “co-pilot.”

    “This is about how AI can help engineers be better and faster at creating innovative products,” Ahmed says. “To do that, we have to first understand the requirements. This is one step in that direction.”

    The team’s new study appeared recently online, and will be in the December print edition of the journal Computer Aided Design. The research is a collaboration between computer scientists at MIT-IBM Watson AI Lab and mechanical engineers in MIT’s DeCoDe Lab. The study’s co-authors include Akash Srivastava and Dan Gutreund at the MIT-IBM Watson AI Lab.

    Framing a problem

    As Ahmed and Regenwetter write, DGMs are “powerful learners, boasting unparalleled ability” to process huge amounts of data. DGM is a broad term for any machine-learning model that is trained to learn distribution of data and then use that to generate new, statistically similar content. The enormously popular ChatGPT is one type of deep generative model known as a large language model, or LLM, which incorporates natural language processing capabilities into the model to enable the app to generate realistic imagery and speech in response to conversational queries. Other popular models for image generation include DALL-E and Stable Diffusion.

    Because of their ability to learn from data and generate realistic samples, DGMs have been increasingly applied in multiple engineering domains. Designers have used deep generative models to draft new aircraft frames, metamaterial designs, and optimal geometries for bridges and cars. But for the most part, the models have mimicked existing designs, without improving the performance on existing designs.

    “Designers who are working with DGMs are sort of missing this cherry on top, which is adjusting the model’s training objective to focus on the design requirements,” Regenwetter says. “So, people end up generating designs that are very similar to the dataset.”

    In the new study, he outlines the main pitfalls in applying DGMs to engineering tasks, and shows that the fundamental objective of standard DGMs does not take into account specific design requirements. To illustrate this, the team invokes a simple case of bicycle frame design and demonstrates that problems can crop up as early as the initial learning phase. As a model learns from thousands of existing bike frames of various sizes and shapes, it might consider two frames of similar dimensions to have similar performance, when in fact a small disconnect in one frame — too small to register as a significant difference in statistical similarity metrics — makes the frame much weaker than the other, visually similar frame.

    Beyond “vanilla”
    An animation depicting transformations across common bicycle designs. Credit: Courtesy of the researchers

    The researchers carried the bicycle example forward to see what designs a DGM would actually generate after having learned from existing designs. They first tested a conventional “vanilla” generative adversarial network, or GAN — a model that has widely been used in image and text synthesis, and is tuned simply to generate statistically similar content. They trained the model on a dataset of thousands of bicycle frames, including commercially manufactured designs and less conventional, one-off frames designed by hobbyists.

    Once the model learned from the data, the researchers asked it to generate hundreds of new bike frames. The model produced realistic designs that resembled existing frames. But none of the designs showed significant improvement in performance, and some were even a bit inferior, with heavier, less structurally sound frames.

    The team then carried out the same test with two other DGMs that were specifically designed for engineering tasks. The first model is one that Ahmed previously developed to generate high-performing airfoil designs. He built this model to prioritize statistical similarity as well as functional performance. When applied to the bike frame task, this model generated realistic designs that also were lighter and stronger than existing designs. But it also produced physically “invalid” frames, with components that didn’t quite fit or overlapped in physically impossible ways.

    “We saw designs that were significantly better than the dataset, but also designs that were geometrically incompatible because the model wasn’t focused on meeting design constraints,” Regenwetter says.

    The last model the team tested was one that Regenwetter built to generate new geometric structures. This model was designed with the same priorities as the previous models, with the added ingredient of design constraints, and prioritizing physically viable frames, for instance, with no disconnections or overlapping bars. This last model produced the highest-performing designs, that were also physically feasible.

    “We found that when a model goes beyond statistical similarity, it can come up with designs that are better than the ones that are already out there,” Ahmed says. “It’s a proof of what AI can do, if it is explicitly trained on a design task.”

    For instance, if DGMs can be built with other priorities, such as performance, design constraints, and novelty, Ahmed foresees “numerous engineering fields, such as molecular design and civil infrastructure, would greatly benefit. By shedding light on the potential pitfalls of relying solely on statistical similarity, we hope to inspire new pathways and strategies in generative AI applications outside multimedia.” More