More stories

  • in

    Multiple AI models help robots execute complex plans more transparently

    Your daily to-do list is likely pretty straightforward: wash the dishes, buy groceries, and other minutiae. It’s unlikely you wrote out “pick up the first dirty dish,” or “wash that plate with a sponge,” because each of these miniature steps within the chore feels intuitive. While we can routinely complete each step without much thought, a robot requires a complex plan that involves more detailed outlines.

    MIT’s Improbable AI Lab, a group within the Computer Science and Artificial Intelligence Laboratory (CSAIL), has offered these machines a helping hand with a new multimodal framework: Compositional Foundation Models for Hierarchical Planning (HiP), which develops detailed, feasible plans with the expertise of three different foundation models. Like OpenAI’s GPT-4, the foundation model that ChatGPT and Bing Chat were built upon, these foundation models are trained on massive quantities of data for applications like generating images, translating text, and robotics.

    Unlike RT2 and other multimodal models that are trained on paired vision, language, and action data, HiP uses three different foundation models, each trained on a different data modality. Each foundation model captures a different part of the decision-making process, and the three then work together when it’s time to make decisions. HiP removes the need for access to paired vision, language, and action data, which is difficult to obtain. HiP also makes the reasoning process more transparent.

    What’s considered a daily chore for a human can be a robot’s “long-horizon goal” — an overarching objective that involves completing many smaller steps first — requiring sufficient data to plan, understand, and execute objectives. While computer vision researchers have attempted to build monolithic foundation models for this problem, pairing language, visual, and action data is expensive. Instead, HiP represents a different, multimodal recipe: a trio that cheaply incorporates linguistic, physical, and environmental intelligence into a robot.

    “Foundation models do not have to be monolithic,” says NVIDIA AI researcher Jim Fan, who was not involved in the paper. “This work decomposes the complex task of embodied agent planning into three constituent models: a language reasoner, a visual world model, and an action planner. It makes a difficult decision-making problem more tractable and transparent.”

    The team believes that their system could help these machines accomplish household chores, such as putting away a book or placing a bowl in the dishwasher. Additionally, HiP could assist with multistep construction and manufacturing tasks, like stacking and placing different materials in specific sequences.

    Evaluating HiP

    The CSAIL team tested HiP’s acuity on three manipulation tasks, outperforming comparable frameworks. The system reasoned by developing intelligent plans that adapt to new information.

    First, the researchers requested that it stack different-colored blocks on each other and then place others nearby. The catch: Some of the correct colors weren’t present, so the robot had to place white blocks in a color bowl to paint them. HiP often adjusted to these changes accurately, especially compared to state-of-the-art task planning systems like Transformer BC and Action Diffuser, by adjusting its plans to stack and place each square as needed.

    Another test: arranging objects such as candy and a hammer in a brown box while ignoring other items. Some of the objects it needed to move were dirty, so HiP adjusted its plans to place them in a cleaning box, and then into the brown container. In a third demonstration, the bot was able to ignore unnecessary objects to complete kitchen sub-goals such as opening a microwave, clearing a kettle out of the way, and turning on a light. Some of the prompted steps had already been completed, so the robot adapted by skipping those directions.

    A three-pronged hierarchy

    HiP’s three-pronged planning process operates as a hierarchy, with the ability to pre-train each of its components on different sets of data, including information outside of robotics. At the bottom of that order is a large language model (LLM), which starts to ideate by capturing all the symbolic information needed and developing an abstract task plan. Applying the common sense knowledge it finds on the internet, the model breaks its objective into sub-goals. For example, “making a cup of tea” turns into “filling a pot with water,” “boiling the pot,” and the subsequent actions required.

    “All we want to do is take existing pre-trained models and have them successfully interface with each other,” says Anurag Ajay, a PhD student in the MIT Department of Electrical Engineering and Computer Science (EECS) and a CSAIL affiliate. “Instead of pushing for one model to do everything, we combine multiple ones that leverage different modalities of internet data. When used in tandem, they help with robotic decision-making and can potentially aid with tasks in homes, factories, and construction sites.”

    These models also need some form of “eyes” to understand the environment they’re operating in and correctly execute each sub-goal. The team used a large video diffusion model to augment the initial planning completed by the LLM, which collects geometric and physical information about the world from footage on the internet. In turn, the video model generates an observation trajectory plan, refining the LLM’s outline to incorporate new physical knowledge.

    This process, known as iterative refinement, allows HiP to reason about its ideas, taking in feedback at each stage to generate a more practical outline. The flow of feedback is similar to writing an article, where an author may send their draft to an editor, and with those revisions incorporated in, the publisher reviews for any last changes and finalizes.

    In this case, the top of the hierarchy is an egocentric action model, or a sequence of first-person images that infer which actions should take place based on its surroundings. During this stage, the observation plan from the video model is mapped over the space visible to the robot, helping the machine decide how to execute each task within the long-horizon goal. If a robot uses HiP to make tea, this means it will have mapped out exactly where the pot, sink, and other key visual elements are, and begin completing each sub-goal.

    Still, the multimodal work is limited by the lack of high-quality video foundation models. Once available, they could interface with HiP’s small-scale video models to further enhance visual sequence prediction and robot action generation. A higher-quality version would also reduce the current data requirements of the video models.

    That being said, the CSAIL team’s approach only used a tiny bit of data overall. Moreover, HiP was cheap to train and demonstrated the potential of using readily available foundation models to complete long-horizon tasks. “What Anurag has demonstrated is proof-of-concept of how we can take models trained on separate tasks and data modalities and combine them into models for robotic planning. In the future, HiP could be augmented with pre-trained models that can process touch and sound to make better plans,” says senior author Pulkit Agrawal, MIT assistant professor in EECS and director of the Improbable AI Lab. The group is also considering applying HiP to solving real-world long-horizon tasks in robotics.

    Ajay and Agrawal are lead authors on a paper describing the work. They are joined by MIT professors and CSAIL principal investigators Tommi Jaakkola, Joshua Tenenbaum, and Leslie Pack Kaelbling; CSAIL research affiliate and MIT-IBM AI Lab research manager Akash Srivastava; graduate students Seungwook Han and Yilun Du ’19; former postdoc Abhishek Gupta, who is now assistant professor at University of Washington; and former graduate student Shuang Li PhD ’23.

    The team’s work was supported, in part, by the National Science Foundation, the U.S. Defense Advanced Research Projects Agency, the U.S. Army Research Office, the U.S. Office of Naval Research Multidisciplinary University Research Initiatives, and the MIT-IBM Watson AI Lab. Their findings were presented at the 2023 Conference on Neural Information Processing Systems (NeurIPS).

  • in

    Technique could efficiently solve partial differential equations for numerous applications

    In fields such as physics and engineering, partial differential equations (PDEs) are used to model complex physical processes to generate insight into how some of the most complicated physical and natural systems in the world function.

    To solve these difficult equations, researchers use high-fidelity numerical solvers, which can be very time-consuming and computationally expensive to run. The current simplified alternative, data-driven surrogate models, computes the goal property of a solution to PDEs rather than the whole solution. These models are trained on a set of data generated by the high-fidelity solver to predict the output of the PDEs for new inputs. This approach is data-intensive and expensive because complex physical systems require a large number of simulations to generate enough data.

    In a new paper, “Physics-enhanced deep surrogates for partial differential equations,” published in December in Nature Machine Intelligence, a new method is proposed for developing data-driven surrogate models for complex physical systems in such fields as mechanics, optics, thermal transport, fluid dynamics, physical chemistry, and climate models.

    The paper was authored by MIT’s professor of applied mathematics Steven G. Johnson along with Payel Das and Youssef Mroueh of the MIT-IBM Watson AI Lab and IBM Research; Chris Rackauckas of Julia Lab; and Raphaël Pestourie, a former MIT postdoc who is now at Georgia Tech. The authors call their method “physics-enhanced deep surrogate” (PEDS), which combines a low-fidelity, explainable physics simulator with a neural network generator. The neural network generator is trained end-to-end to match the output of the high-fidelity numerical solver.

    “My aspiration is to replace the inefficient process of trial and error with systematic, computer-aided simulation and optimization,” says Pestourie. “Recent breakthroughs in AI like the large language model of ChatGPT rely on hundreds of billions of parameters and require vast amounts of resources to train and evaluate. In contrast, PEDS is affordable to all because it is incredibly efficient in computing resources and has a very low barrier in terms of infrastructure needed to use it.”

    In the article, they show that PEDS surrogates can be up to three times more accurate than an ensemble of feedforward neural networks with limited data (approximately 1,000 training points), and reduce the training data needed by at least a factor of 100 to achieve a target error of 5 percent. Developed using the MIT-designed Julia programming language, this scientific machine-learning method is thus efficient in both computing and data.

    The authors also report that PEDS provides a general, data-driven strategy to bridge the gap between a vast array of simplified physical models with corresponding brute-force numerical solvers modeling complex systems. This technique offers accuracy, speed, data efficiency, and physical insights into the process.

    Says Pestourie, “Since the 2000s, as computing capabilities improved, the trend of scientific models has been to increase the number of parameters to fit the data better, sometimes at the cost of a lower predictive accuracy. PEDS does the opposite by choosing its parameters smartly. It leverages the technology of automatic differentiation to train a neural network that makes a model with few parameters accurate.”

    “The main challenge that prevents surrogate models from being used more widely in engineering is the curse of dimensionality — the fact that the needed data to train a model increases exponentially with the number of model variables,” says Pestourie. “PEDS reduces this curse by incorporating information from the data and from the field knowledge in the form of a low-fidelity model solver.”
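
    The exponential growth Pestourie describes can be made concrete with a small arithmetic sketch. It assumes a simple regular grid in which every model variable is sampled at the same number of points; the function name and the choice of 10 points per variable are illustrative and do not come from the paper.

```python
def grid_samples(points_per_variable: int, n_variables: int) -> int:
    """Training points needed to cover every combination on a regular grid."""
    return points_per_variable ** n_variables


# With 10 sample points per variable, the required data grows tenfold
# each time one more variable is added to the model.
for n_variables in (1, 2, 5, 10):
    print(n_variables, grid_samples(10, n_variables))
```

    Under this assumption, a model with 3 variables needs 1,000 points, while one with 10 variables needs 10 billion, which is why reducing the data requirement matters so much for surrogate models.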

    The researchers say that PEDS has the potential to revive a whole body of the pre-2000 literature dedicated to minimal models — intuitive models that PEDS could make more accurate while also being predictive for surrogate model applications.

    “The application of the PEDS framework is beyond what we showed in this study,” says Das. “Complex physical systems governed by PDEs are ubiquitous, from climate modeling to seismic modeling and beyond. Our physics-inspired fast and explainable surrogate models will be of great use in those applications, and play a complementary role to other emerging techniques, like foundation models.”

    The research was supported by the MIT-IBM Watson AI Lab and the U.S. Army Research Office through the Institute for Soldier Nanotechnologies.

  • in

    Automated system teaches users when to collaborate with an AI assistant

    Artificial intelligence models that pick out patterns in images can often do so better than human eyes — but not always. If a radiologist is using an AI model to help her determine whether a patient’s X-rays show signs of pneumonia, when should she trust the model’s advice and when should she ignore it?

    A customized onboarding process could help this radiologist answer that question, according to researchers at MIT and the MIT-IBM Watson AI Lab. They designed a system that teaches a user when to collaborate with an AI assistant.

    In this case, the training method might find situations where the radiologist trusts the model’s advice — except she shouldn’t because the model is wrong. The system automatically learns rules for how she should collaborate with the AI, and describes them with natural language.

    During onboarding, the radiologist practices collaborating with the AI using training exercises based on these rules, receiving feedback about her performance and the AI’s performance.

    The researchers found that this onboarding procedure led to about a 5 percent improvement in accuracy when humans and AI collaborated on an image prediction task. Their results also show that just telling the user when to trust the AI, without training, led to worse performance.

    Importantly, the researchers’ system is fully automated, so it learns to create the onboarding process based on data from the human and AI performing a specific task. It can also adapt to different tasks, so it can be scaled up and used in many situations where humans and AI models work together, such as in social media content moderation, writing, and programming.

    “So often, people are given these AI tools to use without any training to help them figure out when it is going to be helpful. That’s not what we do with nearly every other tool that people use — there is almost always some kind of tutorial that comes with it. But for AI, this seems to be missing. We are trying to tackle this problem from a methodological and behavioral perspective,” says Hussein Mozannar, a graduate student in the Social and Engineering Systems doctoral program within the Institute for Data, Systems, and Society (IDSS) and lead author of a paper about this training process.

    The researchers envision that such onboarding will be a crucial part of training for medical professionals.

    “One could imagine, for example, that doctors making treatment decisions with the help of AI will first have to do training similar to what we propose. We may need to rethink everything from continuing medical education to the way clinical trials are designed,” says senior author David Sontag, a professor of EECS, a member of the MIT-IBM Watson AI Lab and the MIT Jameel Clinic, and the leader of the Clinical Machine Learning Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    Mozannar, who is also a researcher with the Clinical Machine Learning Group, is joined on the paper by Jimin J. Lee, an undergraduate in electrical engineering and computer science; Dennis Wei, a senior research scientist at IBM Research; and Prasanna Sattigeri and Subhro Das, research staff members at the MIT-IBM Watson AI Lab. The paper will be presented at the Conference on Neural Information Processing Systems.

    Training that evolves

    Existing onboarding methods for human-AI collaboration are often composed of training materials produced by human experts for specific use cases, making them difficult to scale up. Some related techniques rely on explanations, where the AI tells the user its confidence in each decision, but research has shown that explanations are rarely helpful, Mozannar says.

    “The AI model’s capabilities are constantly evolving, so the use cases where the human could potentially benefit from it are growing over time. At the same time, the user’s perception of the model continues changing. So, we need a training procedure that also evolves over time,” he adds.

    To accomplish this, their onboarding method is automatically learned from data. It is built from a dataset that contains many instances of a task, such as detecting the presence of a traffic light from a blurry image.

    The system’s first step is to collect data on the human and AI performing this task. In this case, the human would try to predict, with the help of AI, whether blurry images contain traffic lights.

    The system embeds these data points onto a latent space, which is a representation of data in which similar data points are closer together. It uses an algorithm to discover regions of this space where the human collaborates incorrectly with the AI. These regions capture instances where the human trusted the AI’s prediction but the prediction was wrong, and vice versa.
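
    The idea of flagging regions where the human relies on the AI incorrectly can be sketched roughly as follows. This is a simplified illustration, not the researchers’ algorithm: it assumes each trial has already been assigned a region label (standing in for a cluster in the latent space), and the `Trial` fields, function names, and threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    region: str       # hypothetical cluster label in the latent space
    trusted_ai: bool  # did the human go with the AI's prediction?
    ai_correct: bool  # was the AI's prediction actually right?


def misreliance_by_region(trials):
    """Fraction of trials per region where reliance on the AI was wrong."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for t in trials:
        totals[t.region] += 1
        # Over-reliance: trusted a wrong AI. Under-reliance: ignored a correct AI.
        if t.trusted_ai != t.ai_correct:
            errors[t.region] += 1
    return {region: errors[region] / totals[region] for region in totals}


def regions_to_teach(trials, threshold=0.5):
    """Regions whose mis-reliance rate meets the threshold, worst first."""
    rates = misreliance_by_region(trials)
    flagged = [r for r, rate in rates.items() if rate >= threshold]
    return sorted(flagged, key=lambda r: -rates[r])


# Toy data: the human keeps trusting a wrong AI on night-time highway images.
trials = (
    [Trial("highway_night", True, False)] * 3
    + [Trial("highway_night", True, True)]
    + [Trial("city_day", True, True)] * 4
)
print(regions_to_teach(trials))
```

    In this toy data, only the night-time highway region is flagged, since three of its four trials show the human trusting an incorrect prediction.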

    Perhaps the human mistakenly trusts the AI when images show a highway at night.

    After discovering the regions, a second algorithm utilizes a large language model to describe each region as a rule, using natural language. The algorithm iteratively fine-tunes that rule by finding contrasting examples. It might describe this region as “ignore AI when it is a highway during the night.”

    These rules are used to build training exercises. The onboarding system shows an example to the human, in this case a blurry highway scene at night, as well as the AI’s prediction, and asks the user if the image shows traffic lights. The user can answer yes, no, or use the AI’s prediction.

    If the human is wrong, they are shown the correct answer and performance statistics for the human and AI on these instances of the task. The system does this for each region, and at the end of the training process, repeats the exercises the human got wrong.

    “After that, the human has learned something about these regions that we hope they will take away in the future to make more accurate predictions,” Mozannar says.

    Onboarding boosts accuracy

    The researchers tested this system with users on two tasks: detecting traffic lights in blurry images and answering multiple-choice questions from many domains (such as biology, philosophy, and computer science).

    They first showed users a card with information about the AI model, how it was trained, and a breakdown of its performance on broad categories. Users were split into five groups: Some were only shown the card, some went through the researchers’ onboarding procedure, some went through a baseline onboarding procedure, some went through the researchers’ onboarding procedure and were given recommendations of when they should or should not trust the AI, and others were only given the recommendations.

    Only the researchers’ onboarding procedure without recommendations improved users’ accuracy significantly, boosting their performance on the traffic light prediction task by about 5 percent without slowing them down. However, onboarding was not as effective for the question-answering task. The researchers believe this is because the AI model, ChatGPT, provided explanations with each answer that convey whether it should be trusted.

    But providing recommendations without onboarding had the opposite effect — users not only performed worse, they took more time to make predictions.

    “When you only give someone recommendations, it seems like they get confused and don’t know what to do. It derails their process. People also don’t like being told what to do, so that is a factor as well,” Mozannar says.

    Providing recommendations alone could harm the user if those recommendations are wrong, he adds. With onboarding, on the other hand, the biggest limitation is the amount of available data. If there aren’t enough data, the onboarding stage won’t be as effective, he says.

    In the future, he and his collaborators want to conduct larger studies to evaluate the short- and long-term effects of onboarding. They also want to leverage unlabeled data for the onboarding process, and find methods to effectively reduce the number of regions without omitting important examples.

    “People are adopting AI systems willy-nilly, and indeed AI offers great potential, but these AI agents still sometimes make mistakes. Thus, it’s crucial for AI developers to devise methods that help humans know when it’s safe to rely on the AI’s suggestions,” says Dan Weld, professor emeritus at the Paul G. Allen School of Computer Science and Engineering at the University of Washington, who was not involved with this research. “Mozannar et al. have created an innovative method for identifying situations where the AI is trustworthy, and (importantly) to describe them to people in a way that leads to better human-AI team interactions.”

    This work is funded, in part, by the MIT-IBM Watson AI Lab.

  • in

    AI accelerates problem-solving in complex scenarios

    While Santa Claus may have a magical sleigh and nine plucky reindeer to help him deliver presents, for companies like FedEx, the optimization problem of efficiently routing holiday packages is so complicated that they often employ specialized software to find a solution.

    This software, called a mixed-integer linear programming (MILP) solver, splits a massive optimization problem into smaller pieces and uses generic algorithms to try and find the best solution. However, the solver could take hours — or even days — to arrive at a solution.

    The process is so onerous that a company often must stop the software partway through, accepting a solution that is not ideal but the best that could be generated in a set amount of time.

    Researchers from MIT and ETH Zurich used machine learning to speed things up.

    They identified a key intermediate step in MILP solvers that has so many potential solutions it takes an enormous amount of time to unravel, which slows the entire process. The researchers employed a filtering technique to simplify this step, then used machine learning to find the optimal solution for a specific type of problem.

    Their data-driven approach enables a company to use its own data to tailor a general-purpose MILP solver to the problem at hand.

    This new technique sped up MILP solvers by between 30 and 70 percent, without any drop in accuracy. One could use this method to obtain an optimal solution more quickly or, for especially complex problems, a better solution in a tractable amount of time.

    This approach could be used wherever MILP solvers are employed, such as by ride-hailing services, electric grid operators, vaccination distributors, or any entity faced with a thorny resource-allocation problem.

    “Sometimes, in a field like optimization, it is very common for folks to think of solutions as either purely machine learning or purely classical. I am a firm believer that we want to get the best of both worlds, and this is a really strong instantiation of that hybrid approach,” says senior author Cathy Wu, the Gilbert W. Winslow Career Development Assistant Professor in Civil and Environmental Engineering (CEE), and a member of the Laboratory for Information and Decision Systems (LIDS) and the Institute for Data, Systems, and Society (IDSS).

    Wu wrote the paper with co-lead authors Sirui Li, an IDSS graduate student, and Wenbin Ouyang, a CEE graduate student; as well as Max Paulus, a graduate student at ETH Zurich. The research will be presented at the Conference on Neural Information Processing Systems.

    Tough to solve

    MILP problems have an exponential number of potential solutions. For instance, say a traveling salesperson wants to find the shortest path to visit several cities and then return to their city of origin. If there are many cities which could be visited in any order, the number of potential solutions might be greater than the number of atoms in the universe.  
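
    A short calculation shows how quickly this count explodes. The sketch below assumes the symmetric version of the problem, where a round trip and its reverse count as the same tour, so n cities give (n - 1)!/2 distinct tours. It also assumes the commonly cited rough estimate of 10^80 atoms in the observable universe; neither figure appears in the research itself.

```python
import math

ATOMS_IN_UNIVERSE = 10**80  # commonly cited rough estimate, assumed here


def tour_count(n_cities: int) -> int:
    """Distinct round trips through n_cities when distances are symmetric."""
    if n_cities < 3:
        return 1
    # Fix the starting city, order the rest, and halve for direction.
    return math.factorial(n_cities - 1) // 2


def cities_needed_to_exceed(limit: int) -> int:
    """Smallest number of cities whose tour count is greater than limit."""
    n = 3
    while tour_count(n) <= limit:
        n += 1
    return n


print(tour_count(5))                               # 12 tours for 5 cities
print(cities_needed_to_exceed(ATOMS_IN_UNIVERSE))  # 61 cities
```

    Under these assumptions, a tour of just 61 cities already has more possible routes than the estimated number of atoms, so checking every route is out of the question.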

    “These problems are called NP-hard, which means it is very unlikely there is an efficient algorithm to solve them. When the problem is big enough, we can only hope to achieve some suboptimal performance,” Wu explains.

    An MILP solver employs an array of techniques and practical tricks that can achieve reasonable solutions in a tractable amount of time.

    A typical solver uses a divide-and-conquer approach, first splitting the space of potential solutions into smaller pieces with a technique called branching. Then, the solver employs a technique called cutting to tighten up these smaller pieces so they can be searched faster.

    Cutting uses a set of rules that tighten the search space without removing any feasible solutions. These rules are generated by a few dozen algorithms, known as separators, that have been created for different kinds of MILP problems. 

    Wu and her team found that the process of identifying the ideal combination of separator algorithms to use is, in itself, a problem with an exponential number of solutions.

    “Separator management is a core part of every solver, but this is an underappreciated aspect of the problem space. One of the contributions of this work is identifying the problem of separator management as a machine learning task to begin with,” she says.

    Shrinking the solution space

    She and her collaborators devised a filtering mechanism that reduces this separator search space from more than 130,000 potential combinations to around 20 options. This filtering mechanism draws on the principle of diminishing marginal returns, which says that the most benefit would come from a small set of algorithms, and adding additional algorithms won’t bring much extra improvement.
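
    One way to picture a filter built on diminishing marginal returns is a greedy selection that keeps adding the configuration with the largest extra benefit and stops once that benefit becomes negligible. The sketch below is an illustration under stated assumptions, not the authors’ procedure: the configuration names, the toy improvement scores, and the `min_gain` cutoff are all hypothetical.

```python
from typing import Dict, List


def greedy_restrict(scores: Dict[str, List[float]],
                    max_size: int,
                    min_gain: float) -> List[str]:
    """Greedily pick configurations until the marginal gain drops below min_gain.

    scores[c][i] is a (hypothetical) solve-time improvement of configuration c
    on training instance i. A set of configurations is valued by the average,
    over instances, of the best improvement any chosen configuration achieves.
    """
    n_instances = len(next(iter(scores.values())))
    best = [0.0] * n_instances  # best improvement so far, per instance
    chosen: List[str] = []
    while len(chosen) < max_size:
        top_name, top_gain = None, 0.0
        for name, vals in scores.items():
            if name in chosen:
                continue
            # Extra average improvement this configuration would contribute.
            gain = sum(max(v - b, 0.0) for v, b in zip(vals, best)) / n_instances
            if gain > top_gain:
                top_name, top_gain = name, gain
        if top_name is None or top_gain < min_gain:
            break  # diminishing returns: nothing left adds enough
        chosen.append(top_name)
        best = [max(v, b) for v, b in zip(scores[top_name], best)]
    return chosen


# Toy scores for four separator configurations on three problem instances.
toy_scores = {
    "A": [0.5, 0.1, 0.0],
    "B": [0.0, 0.4, 0.4],
    "C": [0.45, 0.1, 0.05],
    "D": [0.1, 0.1, 0.1],
}
print(greedy_restrict(toy_scores, max_size=3, min_gain=0.05))  # ['B', 'A']
```

    In the toy data, two configurations already capture the best result on every instance, so the third slot is left unfilled even though it was allowed.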

    Then they use a machine-learning model to pick the best combination of algorithms from among the 20 remaining options.

    This model is trained with a dataset specific to the user’s optimization problem, so it learns to choose algorithms that best suit the user’s particular task. Since a company like FedEx has solved routing problems many times before, using real data gleaned from past experience should lead to better solutions than starting from scratch each time.

    The model’s iterative learning process, known as contextual bandits, a form of reinforcement learning, involves picking a potential solution, getting feedback on how good it was, and then trying again to find a better solution.
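
    The pick, observe, and retry loop of a contextual bandit can be sketched with a simple epsilon-greedy learner. This is a generic textbook variant, not the model used in the paper: the class, the two problem types used as contexts, and the fixed reward table are hypothetical, and the rewards are deterministic only to keep the example easy to follow.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Keeps a running mean reward for every (context, arm) pair."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(lambda: [0] * n_arms)
        self.means = defaultdict(lambda: [0.0] * n_arms)

    def best(self, context):
        """Arm with the highest estimated reward in this context."""
        means = self.means[context]
        return max(range(self.n_arms), key=lambda arm: means[arm])

    def choose(self, context):
        # Try every arm once per context before trusting the estimates.
        for arm, count in enumerate(self.counts[context]):
            if count == 0:
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)  # explore
        return self.best(context)                   # exploit

    def update(self, context, arm, reward):
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        self.means[context][arm] += (reward - self.means[context][arm]) / n


# Hypothetical speedups of three separator configurations on two problem types.
SPEEDUP = {"routing": [0.1, 0.6, 0.3], "scheduling": [0.5, 0.2, 0.4]}


def train(bandit, rewards, rounds=60):
    contexts = list(rewards)
    for t in range(rounds):
        ctx = contexts[t % len(contexts)]
        arm = bandit.choose(ctx)
        bandit.update(ctx, arm, rewards[ctx][arm])  # feedback on the pick
    return bandit


bandit = train(EpsilonGreedyContextualBandit(n_arms=3), SPEEDUP)
print(bandit.best("routing"), bandit.best("scheduling"))  # 1 0
```

    The key point is that the best choice depends on the context: the learner settles on a different configuration for each problem type instead of one configuration for everything.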

    This data-driven approach accelerated MILP solvers by between 30 and 70 percent without any drop in accuracy. Moreover, the speedup was similar when they applied it to a simpler, open-source solver and a more powerful, commercial solver.

    In the future, Wu and her collaborators want to apply this approach to even more complex MILP problems, where gathering labeled data to train the model could be especially challenging. Perhaps they can train the model on a smaller dataset and then tweak it to tackle a much larger optimization problem, she says. The researchers are also interested in interpreting the learned model to better understand the effectiveness of different separator algorithms.

    This research is supported, in part, by MathWorks, the National Science Foundation (NSF), the MIT Amazon Science Hub, and MIT’s Research Support Committee.

  • in

    Search algorithm reveals nearly 200 new kinds of CRISPR systems

    Microbial sequence databases contain a wealth of information about enzymes and other molecules that could be adapted for biotechnology. But these databases have grown so large in recent years that they’ve become difficult to search efficiently for enzymes of interest.

    Now, scientists at the McGovern Institute for Brain Research at MIT, the Broad Institute of MIT and Harvard, and the National Center for Biotechnology Information (NCBI) at the National Institutes of Health have developed a new search algorithm that has identified 188 kinds of new rare CRISPR systems in bacterial genomes, encompassing thousands of individual systems. The work appears today in Science.

    The algorithm, which comes from the lab of pioneering CRISPR researcher Professor Feng Zhang, uses big-data clustering approaches to rapidly search massive amounts of genomic data. The team used their algorithm, called Fast Locality-Sensitive Hashing-based clustering (FLSHclust) to mine three major public databases that contain data from a wide range of unusual bacteria, including ones found in coal mines, breweries, Antarctic lakes, and dog saliva. The scientists found a surprising number and diversity of CRISPR systems, including ones that could make edits to DNA in human cells, others that can target RNA, and many with a variety of other functions.

    The new systems could potentially be harnessed to edit mammalian cells with fewer off-target effects than current Cas9 systems. They could also one day be used as diagnostics or serve as molecular records of activity inside cells.

    The researchers say their search highlights an unprecedented level of diversity and flexibility of CRISPR and that there are likely many more rare systems yet to be discovered as databases continue to grow.

    “Biodiversity is such a treasure trove, and as we continue to sequence more genomes and metagenomic samples, there is a growing need for better tools, like FLSHclust, to search that sequence space to find the molecular gems,” says Zhang, a co-senior author on the study and the James and Patricia Poitras Professor of Neuroscience at MIT with joint appointments in the departments of Brain and Cognitive Sciences and Biological Engineering. Zhang is also an investigator at the McGovern Institute for Brain Research at MIT, a core institute member at the Broad, and an investigator at the Howard Hughes Medical Institute. Eugene Koonin, a distinguished investigator at the NCBI, is co-senior author on the study as well.

    Searching for CRISPR

    CRISPR, which stands for clustered regularly interspaced short palindromic repeats, is a bacterial defense system that has been engineered into many tools for genome editing and diagnostics.

    To mine databases of protein and nucleic acid sequences for novel CRISPR systems, the researchers developed an algorithm based on an approach borrowed from the big data community. This technique, called locality-sensitive hashing, clusters together objects that are similar but not exactly identical. Using this approach allowed the team to probe billions of protein and DNA sequences — from the NCBI, its Whole Genome Shotgun database, and the Joint Genome Institute — in weeks, whereas previous methods that look for identical objects would have taken months. They designed their algorithm to look for genes associated with CRISPR.
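
    To give a feel for how locality-sensitive hashing groups similar but non-identical sequences, here is a minimal sketch using MinHash signatures with banding, a standard textbook form of the technique. It is not FLSHclust itself; the function names, the toy sequences, and parameters such as the 3-letter shingle size and 8 bands are assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict


def shingles(seq, k=3):
    """Set of overlapping k-letter substrings (requires len(seq) >= k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(seq, n_hashes=32, k=3):
    """MinHash signature: the smallest hash of any shingle, per seed."""
    tokens = shingles(seq, k)
    return [min(_hash(seed, t) for t in tokens) for seed in range(n_hashes)]


def candidate_pairs(seqs, bands=8, n_hashes=32, k=3):
    """Pairs of sequence names that land in the same bucket in any band."""
    rows = n_hashes // bands
    buckets = defaultdict(set)
    for name, seq in seqs.items():
        sig = minhash(seq, n_hashes, k)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    pairs = set()
    for names in buckets.values():
        ordered = sorted(names)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs


# Toy protein-like strings: two identical, one sharing no 3-letter substrings.
toy = {
    "a": "MKVLAAGIVGLLLAQ",
    "a_copy": "MKVLAAGIVGLLLAQ",
    "b": "WYWYWYHHHHHPPPP",
}
print(candidate_pairs(toy))
```

    Only sequences that share a bucket are compared further, which avoids comparing every pair. With b bands of r rows each, two sequences whose shingle sets have Jaccard similarity s share at least one bucket with probability 1 - (1 - s^r)^b, so near-duplicates are very likely to be grouped while unrelated sequences almost never are.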

    “This new algorithm allows us to parse through data in a time frame that’s short enough that we can actually recover results and make biological hypotheses,” says Soumya Kannan PhD ’23, who is a co-first author on the study. Kannan was a graduate student in Zhang’s lab when the study began and is currently a postdoc and Junior Fellow at Harvard University. Han Altae-Tran PhD ’23, a graduate student in Zhang’s lab during the study and currently a postdoc at the University of Washington, was the study’s other co-first author.

    “This is a testament to what you can do when you improve on the methods for exploration and use as much data as possible,” says Altae-Tran. “It’s really exciting to be able to improve the scale at which we search.”

    New systems

    In their analysis, Altae-Tran, Kannan, and their colleagues noticed that the thousands of CRISPR systems they found fell into a few existing and many new categories. They studied several of the new systems in greater detail in the lab.

    They found several new variants of known Type I CRISPR systems, which use a guide RNA that is 32 base pairs long rather than the 20-nucleotide guide of Cas9. Because of their longer guide RNAs, these Type I systems could potentially be used to develop more precise gene-editing technology that is less prone to off-target editing. Zhang’s team showed that two of these systems could make short edits in the DNA of human cells. And because these Type I systems are similar in size to CRISPR-Cas9, they could likely be delivered to cells in animals or humans using the same gene-delivery technologies being used today for CRISPR.

    One of the Type I systems also showed “collateral activity” — broad degradation of nucleic acids after the CRISPR protein binds its target. Scientists have used similar systems to make infectious disease diagnostics such as SHERLOCK, a tool capable of rapidly sensing a single molecule of DNA or RNA. Zhang’s team thinks the new systems could be adapted for diagnostic technologies as well.

    The researchers also uncovered new mechanisms of action for some Type IV CRISPR systems, and a Type VII system that precisely targets RNA, which could potentially be used in RNA editing. Other systems could potentially be used as recording tools — a molecular document of when a gene was expressed — or as sensors of specific activity in a living cell.

    Mining data

    The scientists say their algorithm could aid in the search for other biochemical systems. “This search algorithm could be used by anyone who wants to work with these large databases for studying how proteins evolve or discovering new genes,” Altae-Tran says.

    The researchers add that their findings illustrate not only how diverse CRISPR systems are, but also that most are rare and only found in unusual bacteria. “Some of these microbial systems were exclusively found in water from coal mines,” Kannan says. “If someone hadn’t been interested in that, we may never have seen those systems. Broadening our sampling diversity is really important to continue expanding the diversity of what we can discover.”

    This work was supported by the Howard Hughes Medical Institute; the K. Lisa Yang and Hock E. Tan Molecular Therapeutics Center at MIT; Broad Institute Programmable Therapeutics Gift Donors; The Pershing Square Foundation, William Ackman and Neri Oxman; James and Patricia Poitras; BT Charitable Foundation; Asness Family Foundation; Kenneth C. Griffin; the Phillips family; David Cheng; and Robert Metcalfe.

  • in

    Technique enables AI on edge devices to keep learning over time

    Personalized deep-learning models can enable artificial intelligence chatbots that adapt to understand a user’s accent or smart keyboards that continuously update to better predict the next word based on someone’s typing history. This customization requires constant fine-tuning of a machine-learning model with new data.

    Because smartphones and other edge devices lack the memory and computational power necessary for this fine-tuning process, user data are typically uploaded to cloud servers where the model is updated. But data transmission uses a great deal of energy, and sending sensitive user data to a cloud server poses a security risk.  

    Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere developed a technique that enables deep-learning models to efficiently adapt to new sensor data directly on an edge device.

    Their on-device training method, called PockEngine, determines which parts of a huge machine-learning model need to be updated to improve accuracy, and only stores and computes with those specific pieces. It performs the bulk of these computations while the model is being prepared, before runtime, which minimizes computational overhead and boosts the speed of the fine-tuning process.    

    When compared to other methods, PockEngine significantly sped up on-device training, performing up to 15 times faster on some hardware platforms. Moreover, PockEngine didn’t cause models to have any dip in accuracy. The researchers also found that their fine-tuning method enabled a popular AI chatbot to answer complex questions more accurately.

    “On-device fine-tuning can enable better privacy, lower costs, customization ability, and also lifelong learning, but it is not easy. Everything has to happen with a limited number of resources. We want to be able to run not only inference but also training on an edge device. With PockEngine, now we can,” says Song Han, an associate professor in the Department of Electrical Engineering and Computer Science (EECS), a member of the MIT-IBM Watson AI Lab, a distinguished scientist at NVIDIA, and senior author of an open-access paper describing PockEngine.

    Han is joined on the paper by lead author Ligeng Zhu, an EECS graduate student, as well as others at MIT, the MIT-IBM Watson AI Lab, and the University of California San Diego. The paper was recently presented at the IEEE/ACM International Symposium on Microarchitecture.

    Layer by layer

    Deep-learning models are based on neural networks, which comprise many interconnected layers of nodes, or “neurons,” that process data to make a prediction. When the model is run, a process called inference, a data input (such as an image) is passed from layer to layer until the prediction (perhaps the image label) is output at the end. During inference, each layer no longer needs to be stored after it processes the input.

    But during training and fine-tuning, the model undergoes a process known as backpropagation. In backpropagation, the output is compared to the correct answer, and then the model is run in reverse. Each layer is updated as the model’s output gets closer to the correct answer.

    Because each layer may need to be updated, the entire model and intermediate results must be stored, making fine-tuning more memory demanding than inference.

    However, not all layers in the neural network are important for improving accuracy. And even for layers that are important, the entire layer may not need to be updated. Those layers, and pieces of layers, don’t need to be stored. Furthermore, one may not need to go all the way back to the first layer to improve accuracy — the process could be stopped somewhere in the middle.
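
    A rough back-of-the-envelope accounting shows why this matters. The sketch below is a simplification made for illustration, not a model of PockEngine itself: it assumes a plain chain of layers, ignores weights and optimizer state, and uses made-up tensor sizes in arbitrary units.

```python
def inference_peak(act):
    """Peak activation memory when only one layer's input and output are live.

    act[i] is the size of the tensor entering layer i; act[-1] is the final output.
    """
    return max(act[i] + act[i + 1] for i in range(len(act) - 1))

def training_stored(act, first_updated_layer=0):
    """Activations kept for backpropagation.

    Layers before first_updated_layer are treated as frozen, so the backward
    pass stops there and their inputs do not need to be stored.
    """
    return sum(act[first_updated_layer:])

sizes = [4, 8, 8, 2]  # hypothetical sizes for a three-layer chain
print(inference_peak(sizes))                          # 16
print(training_stored(sizes))                         # 22
print(training_stored(sizes, first_updated_layer=2))  # 10
```

    Under these assumptions, updating only the last layer stores less than half of what full backpropagation would.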

    PockEngine takes advantage of these factors to speed up the fine-tuning process and cut down on the amount of computation and memory required.

    The system first fine-tunes each layer, one at a time, on a certain task and measures the accuracy improvement after each individual layer. In this way, PockEngine identifies the contribution of each layer, as well as trade-offs between accuracy and fine-tuning cost, and automatically determines the percentage of each layer that needs to be fine-tuned.
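
    One simple way to turn per-layer measurements into a selection is a greedy pick by accuracy gain per unit of cost. The article does not say how PockEngine weighs this trade-off, so the heuristic, the budget parameter, and the numbers below are hypothetical and serve only to illustrate the idea.

```python
def select_layers(gains, costs, budget):
    """Greedily choose layers with the best accuracy gain per unit cost.

    gains: measured accuracy improvement from fine-tuning each layer alone.
    costs: fine-tuning cost of each layer (memory or compute, any unit).
    budget: total cost allowed on the device.
    """
    order = sorted(gains, key=lambda name: gains[name] / costs[name], reverse=True)
    chosen, spent = [], 0.0
    for name in order:
        if gains[name] > 0 and spent + costs[name] <= budget:
            chosen.append(name)
            spent += costs[name]
    return chosen

# Hypothetical measurements for a three-layer model.
gains = {"conv1": 0.1, "conv2": 0.5, "fc": 2.0}
costs = {"conv1": 4.0, "conv2": 2.0, "fc": 1.0}
print(select_layers(gains, costs, budget=3.0))  # ['fc', 'conv2']
```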

    “This method matches the accuracy very well compared to full backpropagation on different tasks and different neural networks,” Han adds.

    A pared-down model

    Conventionally, the backpropagation graph is generated during runtime, which involves a great deal of computation. Instead, PockEngine does this during compile time, while the model is being prepared for deployment.

    PockEngine deletes bits of code to remove unnecessary layers or pieces of layers, creating a pared-down graph of the model to be used during runtime. It then performs other optimizations on this graph to further improve efficiency.

    Since all this only needs to be done once, it saves on computational overhead for runtime.
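
    The sketch below shows the general idea of planning the backward pass ahead of time for a simple chain of layers. It is an assumption-laden illustration, not PockEngine's compiler: the step names `weight_grad` and `input_grad` and the layer names are invented here.

```python
def build_backward_plan(layers, trainable):
    """Return the backward steps needed to update only the trainable layers.

    layers: layer names in forward order.
    trainable: set of layer names selected for fine-tuning.
    """
    indices = [i for i, name in enumerate(layers) if name in trainable]
    if not indices:
        return []
    first = min(indices)  # nothing earlier than this layer needs a gradient
    plan = []
    for i in range(len(layers) - 1, first - 1, -1):
        name = layers[i]
        if name in trainable:
            plan.append(("weight_grad", name))
        if i > first:
            # Pass the gradient back to the previous layer.
            plan.append(("input_grad", name))
    return plan

layers = ["l0", "l1", "l2", "l3"]
for step in build_backward_plan(layers, trainable={"l1", "l3"}):
    print(step)
```

    Because the plan is an ordinary list built once before deployment, the device only has to walk through it at runtime; frozen layers such as `l0` never appear in it.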

    “It is like before setting out on a hiking trip. At home, you would do careful planning — which trails are you going to go on, which trails are you going to ignore. So then at execution time, when you are actually hiking, you already have a very careful plan to follow,” Han explains.

    When they applied PockEngine to deep-learning models on different edge devices, including Apple M1 Chips and the digital signal processors common in many smartphones and Raspberry Pi computers, it performed on-device training up to 15 times faster, without any drop in accuracy. PockEngine also significantly slashed the amount of memory required for fine-tuning.

    The team also applied the technique to the large language model Llama-V2. With large language models, the fine-tuning process involves providing many examples, and it’s crucial for the model to learn how to interact with users, Han says. The process is also important for models tasked with solving complex problems or reasoning about solutions.

    For instance, Llama-V2 models that were fine-tuned using PockEngine answered the question “What was Michael Jackson’s last album?” correctly, while models that weren’t fine-tuned failed. PockEngine cut the time it took for each iteration of the fine-tuning process from about seven seconds to less than one second on an NVIDIA Jetson Orin, an edge GPU platform.

    In the future, the researchers want to use PockEngine to fine-tune even larger models designed to process text and images together.

    “This work addresses growing efficiency challenges posed by the adoption of large AI models such as LLMs across diverse applications in many different industries. It not only holds promise for edge applications that incorporate larger models, but also for lowering the cost of maintaining and updating large AI models in the cloud,” says Ehry MacRostie, a senior manager in Amazon’s Artificial General Intelligence division who was not involved in this study but works with MIT on related AI research through the MIT-Amazon Science Hub.

    This work was supported, in part, by the MIT-IBM Watson AI Lab, the MIT AI Hardware Program, the MIT-Amazon Science Hub, the National Science Foundation (NSF), and the Qualcomm Innovation Fellowship.

  • in

    Generating opportunities with generative AI

    Talking with retail executives back in 2010, Rama Ramakrishnan came to two realizations. First, although retail systems that offered customers personalized recommendations were getting a great deal of attention, these systems often provided little payoff for retailers. Second, for many of the firms, most customers shopped only once or twice a year, so companies didn’t really know much about them.

    “But by being very diligent about noting down the interactions a customer has with a retailer or an e-commerce site, we can create a very nice and detailed composite picture of what that person does and what they care about,” says Ramakrishnan, professor of the practice at the MIT Sloan School of Management. “Once you have that, then you can apply proven algorithms from machine learning.”

    These realizations led Ramakrishnan to found CQuotient, a startup whose software has now become the foundation for Salesforce’s widely adopted AI e-commerce platform. “On Black Friday alone, CQuotient technology probably sees and interacts with over a billion shoppers on a single day,” he says.

    After a highly successful entrepreneurial career, in 2019 Ramakrishnan returned to MIT Sloan, where he had earned master’s and PhD degrees in operations research in the 1990s. He teaches students “not just how these amazing technologies work, but also how do you take these technologies and actually put them to use pragmatically in the real world,” he says.

    Additionally, Ramakrishnan enjoys participating in MIT executive education. “This is a great opportunity for me to convey the things that I have learned, but also as importantly, to learn what’s on the minds of these senior executives, and to guide them and nudge them in the right direction,” he says.

    For example, executives are understandably concerned about the need for massive amounts of data to train machine learning systems. He can now guide them to a wealth of models that are pre-trained for specific tasks. “The ability to use these pre-trained AI models, and very quickly adapt them to your particular business problem, is an incredible advance,” says Ramakrishnan.

    Rama Ramakrishnan – Utilizing AI in Real World Applications for Intelligent Work (Video: MIT Industrial Liaison Program)

    Understanding AI categories

    “AI is the quest to imbue computers with the ability to do cognitive tasks that typically only humans can do,” he says. Understanding the history of this complex, supercharged landscape aids in exploiting the technologies.

    The traditional approach to AI, which basically solved problems by applying if/then rules learned from humans, proved useful for relatively few tasks. “One reason is that we can do lots of things effortlessly, but if asked to explain how we do them, we can’t actually articulate how we do them,” Ramakrishnan comments. Also, those systems may be baffled by new situations that don’t match up to the rules enshrined in the software.

    Machine learning takes a dramatically different approach, with the software fundamentally learning by example. “You give it lots of examples of inputs and outputs, questions and answers, tasks and responses, and get the computer to automatically learn how to go from the input to the output,” he says. Credit scoring, loan decision-making, disease prediction, and demand forecasting are among the many tasks conquered by machine learning.

    But machine learning only worked well when the input data was structured, for instance in a spreadsheet. “If the input data was unstructured, such as images, video, audio, ECGs, or X-rays, it wasn’t very good at going from that to a predicted output,” Ramakrishnan says. That means humans had to manually structure the unstructured data to train the system.

    Around 2010, deep learning began to overcome that limitation, delivering the ability to directly work with unstructured input data, he says. Based on a longstanding AI strategy known as neural networks, deep learning became practical due to the global flood tide of data, the availability of extraordinarily powerful parallel processing hardware called graphics processing units (originally invented for video games), and advances in algorithms and math.

    Finally, within deep learning, the generative AI software packages appearing last year can create unstructured outputs, such as human-sounding text, images of dogs, and three-dimensional models. Large language models (LLMs) such as OpenAI’s ChatGPT go from text inputs to text outputs, while text-to-image models such as OpenAI’s DALL-E can churn out realistic-appearing images.

    Rama Ramakrishnan – Making Note of Little Data to Improve Customer Service (Video: MIT Industrial Liaison Program)

    What generative AI can (and can’t) do

    Trained on the unimaginably vast text resources of the internet, an LLM’s “fundamental capability is to predict the next most likely, most plausible word,” Ramakrishnan says. “Then it attaches the word to the original sentence, predicts the next word again, and keeps on doing it.”
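
    The predict-append-repeat loop can be sketched with a toy model. The example below counts which word follows which in a tiny made-up text and always picks the most frequent follower; a real LLM replaces these counts with a neural network that looks at far more context, so this is only an illustration of the loop, not of how ChatGPT works internally.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, how often each other word follows it."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, prompt, max_new_words=5):
    """Repeatedly append the most plausible next word given the last word."""
    words = prompt.split()
    for _ in range(max_new_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # the model has never seen this word followed by anything
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

corpus = "the cat sat on the mat the cat sat on the sofa"
model = train_bigrams(corpus)
print(generate(model, "the", max_new_words=4))  # the cat sat on the
```

    Note that the output is simply the most plausible continuation under the counts; nothing in the loop checks whether it is true.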

    “To the surprise of many, including a lot of researchers, an LLM can do some very complicated things,” he says. “It can compose beautifully coherent poetry, write Seinfeld episodes, and solve some kinds of reasoning problems. It’s really quite remarkable how next-word prediction can lead to these amazing capabilities.”

    “But you have to always keep in mind that what it is doing is not so much finding the correct answer to your question as finding a plausible answer to your question,” Ramakrishnan emphasizes. Its content may be factually inaccurate, irrelevant, toxic, biased, or offensive.

    That puts the burden on users to make sure that the output is correct, relevant, and useful for the task at hand. “You have to make sure there is some way for you to check its output for errors and fix them before it goes out,” he says.

    Intense research is underway to find techniques to address these shortcomings, adds Ramakrishnan, who expects many innovative tools to do so.

    Finding the right corporate roles for LLMs

    Given the astonishing progress in LLMs, how should industry think about applying the software to tasks such as generating content?

    First, Ramakrishnan advises, consider costs: “Is it a much less expensive effort to have a draft that you correct, versus you creating the whole thing?” Second, if the LLM makes a mistake that slips by, and the mistaken content is released to the outside world, can you live with the consequences?

    “If you have an application which satisfies both considerations, then it’s good to do a pilot project to see whether these technologies can actually help you with that particular task,” says Ramakrishnan. He stresses the need to treat the pilot as an experiment rather than as a normal IT project.

    Right now, software development is the most mature corporate LLM application. “ChatGPT and other LLMs are text-in, text-out, and a software program is just text-out,” he says. “Programmers can go from English text-in to Python text-out, as well as you can go from English-to-English or English-to-German. There are lots of tools which help you write code using these technologies.”

    Of course, programmers must make sure the result does the job properly. Fortunately, software development already offers infrastructure for testing and verifying code. “This is a beautiful sweet spot,” he says, “where it’s much cheaper to have the technology write code for you, because you can very quickly check and verify it.”
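
    A minimal sketch of that kind of check appears below: a candidate function, written here by hand to stand in for LLM output, is accepted only if it passes a set of test cases. The helper name `passes_checks` and the sample cases are hypothetical, and running untrusted code with `exec` would need a proper sandbox in practice.

```python
def passes_checks(source, func_name, cases):
    """Run candidate source and return True only if every test case passes.

    cases is a list of (args, expected) pairs for the function named func_name.
    """
    namespace = {}
    try:
        # Assumption: the source is trusted enough to execute; sandbox it otherwise.
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        return False  # syntax errors, missing functions, and crashes all count as failures

good_draft = "def add(a, b):\n    return a + b\n"
bad_draft = "def add(a, b):\n    return a - b\n"
cases = [((1, 2), 3), ((0, 0), 0)]

print(passes_checks(good_draft, "add", cases))  # True
print(passes_checks(bad_draft, "add", cases))   # False
```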

    Another major LLM use is content generation, such as writing marketing copy or e-commerce product descriptions. “Again, it may be much cheaper to fix ChatGPT’s draft than for you to write the whole thing,” Ramakrishnan says. “However, companies must be very careful to make sure there is a human in the loop.”

    LLMs also are spreading quickly as in-house tools to search enterprise documents. Unlike conventional search algorithms, an LLM chatbot can offer a conversational search experience, because it remembers each question you ask. “But again, it will occasionally make things up,” he says. “In terms of chatbots for external customers, these are very early days, because of the risk of saying something wrong to the customer.”

    Overall, Ramakrishnan notes, we’re living in a remarkable time to grapple with AI’s rapidly evolving potentials and pitfalls. “I help companies figure out how to take these very transformative technologies and put them to work, to make products and services much more intelligent, employees much more productive, and processes much more efficient,” he says.

  • in

    AI copilot enhances human precision for safer aviation

    Imagine you’re in an airplane with two pilots, one human and one computer. Both have their “hands” on the controllers, but they’re always looking out for different things. If they’re both paying attention to the same thing, the human gets to steer. But if the human gets distracted or misses something, the computer quickly takes over.

    Meet the Air-Guardian, a system developed by researchers at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). As modern pilots grapple with an onslaught of information from multiple monitors, especially during critical moments, Air-Guardian acts as a proactive copilot: a partnership between human and machine, rooted in understanding attention.

    But how does it determine attention, exactly? For humans, it uses eye-tracking, and for the neural system, it relies on something called “saliency maps,” which pinpoint where attention is directed. The maps serve as visual guides highlighting key regions within an image, aiding in grasping and deciphering the behavior of intricate algorithms. Air-Guardian identifies early signs of potential risks through these attention markers, instead of only intervening during safety breaches like traditional autopilot systems. 
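
    As a simplified illustration of comparing attention, the sketch below scores how closely a human gaze map and a machine saliency map agree and hands control to the guardian when they diverge. The cosine-similarity measure, the threshold, and the hard switch are assumptions made for this example; they are a much cruder stand-in for whatever Air-Guardian actually computes.

```python
import math

def attention_overlap(human_map, machine_map):
    """Cosine similarity between two flattened attention maps (1.0 means same focus)."""
    dot = sum(h * m for h, m in zip(human_map, machine_map))
    norm = (math.sqrt(sum(h * h for h in human_map))
            * math.sqrt(sum(m * m for m in machine_map)))
    return dot / norm if norm else 0.0

def select_command(human_cmd, guardian_cmd, overlap, threshold=0.5):
    """The human steers when attention agrees; otherwise the guardian's command is used."""
    return human_cmd if overlap >= threshold else guardian_cmd

# Hypothetical 2x2 attention maps, flattened row by row.
same_focus = attention_overlap([1, 0, 0, 0], [1, 0, 0, 0])
different_focus = attention_overlap([1, 0, 0, 0], [0, 0, 0, 1])

print(select_command(0.2, -0.4, same_focus))       # 0.2 (human keeps control)
print(select_command(0.2, -0.4, different_focus))  # -0.4 (guardian takes over)
```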

    The broader implications of this system reach beyond aviation. Similar cooperative control mechanisms could one day be used in cars, drones, and a wider spectrum of robotics.

    “An exciting feature of our method is its differentiability,” says MIT CSAIL postdoc Lianhao Yin, a lead author on a new paper about Air-Guardian. “Our cooperative layer and the entire end-to-end process can be trained. We specifically chose the causal continuous-depth neural network model because of its dynamic features in mapping attention. Another unique aspect is adaptability. The Air-Guardian system isn’t rigid; it can be adjusted based on the situation’s demands, ensuring a balanced partnership between human and machine.”

    In field tests, both the pilot and the system made decisions based on the same raw images when navigating to the target waypoint. Air-Guardian’s success was gauged based on the cumulative rewards earned during flight and the shorter path to the waypoint. The guardian reduced the risk level of flights and increased the success rate of navigating to target points.

    “This system represents the innovative approach of human-centric AI-enabled aviation,” adds Ramin Hasani, MIT CSAIL research affiliate and inventor of liquid neural networks. “Our use of liquid neural networks provides a dynamic, adaptive approach, ensuring that the AI doesn’t merely replace human judgment but complements it, leading to enhanced safety and collaboration in the skies.”

    The true strength of Air-Guardian is its foundational technology. It combines an optimization-based cooperative layer, which uses visual attention from both human and machine, with liquid closed-form continuous-time (CfC) neural networks, known for their prowess in deciphering cause-and-effect relationships, to analyze incoming images for vital information. Complementing this is the VisualBackProp algorithm, which identifies the system’s focal points within an image, ensuring clear understanding of its attention maps.

    For future mass adoption, there’s a need to refine the human-machine interface. Feedback suggests an indicator, like a bar, might be more intuitive to signify when the guardian system takes control.

    Air-Guardian heralds a new age of safer skies, offering a reliable safety net for those moments when human attention wavers.

    “The Air-Guardian system highlights the synergy between human expertise and machine learning, furthering the objective of using machine learning to augment pilots in challenging scenarios and reduce operational errors,” says Daniela Rus, the Andrew (1956) and Erna Viterbi Professor of Electrical Engineering and Computer Science at MIT, director of CSAIL, and senior author on the paper.

    “One of the most interesting outcomes of using a visual attention metric in this work is the potential for allowing earlier interventions and greater interpretability by human pilots,” says Stephanie Gil, assistant professor of computer science at Harvard University, who was not involved in the work. “This showcases a great example of how AI can be used to work with a human, lowering the barrier for achieving trust by using natural communication mechanisms between the human and the AI system.”

    This research was partially funded by the U.S. Air Force (USAF) Research Laboratory, the USAF Artificial Intelligence Accelerator, the Boeing Co., and the Office of Naval Research. The findings don’t necessarily reflect the views of the U.S. government or the USAF.