More stories

    How symmetry can come to the aid of machine learning

    Behrooz Tahmasebi — an MIT PhD student in the Department of Electrical Engineering and Computer Science (EECS) and an affiliate of the Computer Science and Artificial Intelligence Laboratory (CSAIL) — was taking a mathematics course on differential equations in late 2021 when a glimmer of inspiration struck. In that class, he learned for the first time about Weyl’s law, which had been formulated 110 years earlier by the German mathematician Hermann Weyl. Tahmasebi realized it might have some relevance to the computer science problem he was then wrestling with, even though the connection appeared — on the surface — to be thin, at best. Weyl’s law, he says, provides a formula that measures the complexity of the spectral information, or data, contained within the fundamental frequencies of a drum head or guitar string.

    Tahmasebi was, at the same time, thinking about measuring the complexity of the input data to a neural network, wondering whether that complexity could be reduced by taking into account some of the symmetries inherent to the dataset. Such a reduction, in turn, could facilitate — as well as speed up — machine learning processes.

    Weyl’s law, conceived about a century before the boom in machine learning, had traditionally been applied to very different physical situations — such as those concerning the vibrations of a string or the spectrum of electromagnetic (black-body) radiation given off by a heated object. Nevertheless, Tahmasebi believed that a customized version of that law might help with the machine learning problem he was pursuing. And if the approach panned out, the payoff could be considerable.

    He spoke with his advisor, Stefanie Jegelka — an associate professor in EECS and affiliate of CSAIL and the MIT Institute for Data, Systems, and Society — who believed the idea was definitely worth looking into. As Tahmasebi saw it, Weyl’s law had to do with gauging the complexity of data, and so did this project. But Weyl’s law, in its original form, said nothing about symmetry.

    He and Jegelka have now succeeded in modifying Weyl’s law so that symmetry can be factored into the assessment of a dataset’s complexity. “To the best of my knowledge,” Tahmasebi says, “this is the first time Weyl’s law has been used to determine how machine learning can be enhanced by symmetry.”

    The paper he and Jegelka wrote earned a “Spotlight” designation when it was presented at the December 2023 conference on Neural Information Processing Systems — widely regarded as the world’s top conference on machine learning.

    This work, comments Soledad Villar, an applied mathematician at Johns Hopkins University, “shows that models that satisfy the symmetries of the problem are not only correct but also can produce predictions with smaller errors, using a small amount of training points. [This] is especially important in scientific domains, like computational chemistry, where training data can be scarce.”

    In their paper, Tahmasebi and Jegelka explored the ways in which symmetries, or so-called “invariances,” could benefit machine learning. Suppose, for example, the goal of a particular computer run is to pick out every image that contains the numeral 3. That task can be a lot easier, and go a lot quicker, if the algorithm can identify the 3 regardless of where it is placed in the box — whether it’s exactly in the center or off to the side — and whether it is pointed right-side up, upside down, or oriented at a random angle. An algorithm equipped with the latter capability can take advantage of the symmetries of translation and rotation, meaning that a 3, or any other object, is not changed in itself by altering its position or by rotating it around an arbitrary axis. It is said to be invariant to those transformations. The same logic can be applied to algorithms charged with identifying dogs or cats. A dog is a dog is a dog, one might say, irrespective of how it is embedded within an image.
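
    One simple way to make a classifier ignore position and orientation is to map every input to a fixed representative of all its transformed copies before classifying it. The sketch below assumes a toy setting the article does not describe: small square binary grids, cyclic (wrap-around) shifts, and quarter-turn rotations only, rather than arbitrary positions and angles. The helper names are hypothetical.

```python
from typing import Iterator, Tuple

Grid = Tuple[Tuple[int, ...], ...]  # square binary image, rows as tuples


def rotate90(g: Grid) -> Grid:
    """Rotate a square grid 90 degrees clockwise."""
    return tuple(zip(*g[::-1]))


def shift(g: Grid, dr: int, dc: int) -> Grid:
    """Cyclically translate a grid by dr rows and dc columns (wrap-around)."""
    n, m = len(g), len(g[0])
    return tuple(
        tuple(g[(r - dr) % n][(c - dc) % m] for c in range(m)) for r in range(n)
    )


def orbit(g: Grid) -> Iterator[Grid]:
    """Yield every copy of g reachable by quarter turns and cyclic shifts."""
    current = g
    for _ in range(4):
        n, m = len(current), len(current[0])
        for dr in range(n):
            for dc in range(m):
                yield shift(current, dr, dc)
        current = rotate90(current)


def canonical(g: Grid) -> Grid:
    """Pick one fixed representative of the orbit (the lexicographic minimum).

    Every shifted or rotated copy of g has the same set of copies, so it maps
    to the same output: the result is invariant to those transformations.
    """
    return min(orbit(g))


if __name__ == "__main__":
    shape = ((1, 1, 0), (0, 1, 0), (0, 0, 0))
    moved = shift(rotate90(shape), 1, 2)
    print(canonical(shape) == canonical(moved))  # True
```

    A downstream model that only ever sees canonical(g) cannot tell a moved or rotated copy from the original, which is the invariance the paragraph describes; the price is examining 4·n² candidate copies per n×n input.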

    The point of the entire exercise, the authors explain, is to exploit a dataset’s intrinsic symmetries in order to reduce the complexity of machine learning tasks. That, in turn, can lead to a reduction in the amount of data needed for learning. Concretely, the new work answers the question: How much less data is needed to train a machine learning model if the data contain symmetries?

    There are two ways of achieving a gain, or benefit, by capitalizing on the symmetries present. The first has to do with the size of the sample to be looked at. Let’s imagine that you are charged, for instance, with analyzing an image that has mirror symmetry — the right side being an exact replica, or mirror image, of the left. In that case, you don’t have to look at every pixel; you can get all the information you need from half of the image — a factor of two improvement. If, on the other hand, the image can be partitioned into 10 identical parts, you can get a factor of 10 improvement. This kind of boosting effect is linear.
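
    The factor-of-two saving for a mirror-symmetric image can be made concrete: store only the left half of each row and regenerate the right half by reflection. This is a minimal sketch using a hypothetical image given as lists of pixel values; for an odd width the middle column must be kept, so the saving is slightly under two.

```python
from typing import List

Image = List[List[int]]


def is_mirror_symmetric(img: Image) -> bool:
    """True if every row reads the same left-to-right and right-to-left."""
    return all(row == row[::-1] for row in img)


def left_half(img: Image) -> Image:
    """Keep the left half of each row, including the middle column if width is odd."""
    width = len(img[0])
    return [row[: (width + 1) // 2] for row in img]


def reconstruct(half: Image, width: int) -> Image:
    """Rebuild the full image by appending each row's mirrored left half."""
    return [row + row[: width // 2][::-1] for row in half]


img = [
    [1, 2, 3, 2, 1],
    [4, 5, 6, 5, 4],
]
half = left_half(img)
print(is_mirror_symmetric(img))     # True
print(reconstruct(half, 5) == img)  # True
print(sum(map(len, half)), "of", sum(map(len, img)), "pixels stored")  # 6 of 10 pixels stored
```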

    To take another example, imagine you are sifting through a dataset, trying to find sequences of blocks that have seven different colors — black, blue, green, purple, red, white, and yellow. Your job becomes much easier if you don’t care about the order in which the blocks are arranged. If the order mattered, there would be 5,040 different combinations to look for. But if all you care about are sequences of blocks in which all seven colors appear, then you have reduced the number of things — or sequences — you are searching for from 5,040 to just one.
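
    The 5,040 figure is 7! = 7 × 6 × 5 × 4 × 3 × 2 × 1, the number of orderings of seven distinct items. The short check below enumerates every ordering of the seven colors and then discards order by treating each sequence as a set.

```python
from itertools import permutations
from math import factorial

COLORS = ["black", "blue", "green", "purple", "red", "white", "yellow"]

# Order matters: every arrangement of the seven blocks is a distinct target.
ordered = set(permutations(COLORS))

# Order ignored: all arrangements collapse to the same set of colors.
unordered = {frozenset(seq) for seq in ordered}

print(len(ordered), factorial(len(COLORS)))  # 5040 5040
print(len(unordered))                        # 1
```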

    Tahmasebi and Jegelka discovered that it is possible to achieve a different kind of gain — one that is exponential — that can be reaped for symmetries that operate over many dimensions. This advantage is related to the notion that the complexity of a learning task grows exponentially with the dimensionality of the data space. Making use of a multidimensional symmetry can therefore yield a disproportionately large return. “This is a new contribution that is basically telling us that symmetries of higher dimension are more important because they can give us an exponential gain,” Tahmasebi says. 
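
    The paper's exact bounds are not reproduced here. As a generic illustration of why dimension matters, the count below uses the textbook curse-of-dimensionality argument: covering the unit cube [0, 1]^d with a grid of k points per axis takes k^d points. A finite symmetry of size |G| (like the mirror or 10-part examples) divides that count by roughly |G|, whereas a symmetry that effectively removes m dimensions divides it by k^m. The resolution and dimensions are hypothetical.

```python
def grid_points(points_per_axis: int, dim: int) -> int:
    """Points in a regular grid with points_per_axis samples along each of dim axes."""
    return points_per_axis ** dim


K = 10    # hypothetical resolution: 10 samples per axis
DIM = 6   # hypothetical input dimension

full = grid_points(K, DIM)            # no symmetry used
finite_group = full // 8              # finite symmetry of size 8: linear saving
lower_dim = grid_points(K, DIM - 3)   # symmetry that removes 3 dimensions

print(full, finite_group, lower_dim)  # 1000000 125000 1000
print(full // lower_dim)              # 1000, i.e. K ** 3
```

    The finite symmetry saves a constant factor of 8, while removing three dimensions saves a factor that itself grows exponentially with the number of dimensions removed.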

    The NeurIPS 2023 paper that he wrote with Jegelka contains two theorems that were proved mathematically. “The first theorem shows that an improvement in sample complexity is achievable with the general algorithm we provide,” Tahmasebi says. The second theorem complements the first, he adds, “showing that this is the best possible gain you can get; nothing else is achievable.”

    He and Jegelka have provided a formula that predicts the gain one can obtain from a particular symmetry in a given application. A virtue of this formula is its generality, Tahmasebi notes. “It works for any symmetry and any input space.” It works not only for symmetries that are known today, but it could also be applied in the future to symmetries that are yet to be discovered. The latter prospect is not too farfetched to consider, given that the search for new symmetries has long been a major thrust in physics. That suggests that, as more symmetries are found, the methodology introduced by Tahmasebi and Jegelka should only get better over time.

    According to Haggai Maron, a computer scientist at Technion (the Israel Institute of Technology) and NVIDIA who was not involved in the work, the approach presented in the paper “diverges substantially from related previous works, adopting a geometric perspective and employing tools from differential geometry. This theoretical contribution lends mathematical support to the emerging subfield of ‘Geometric Deep Learning,’ which has applications in graph learning, 3D data, and more. The paper helps establish a theoretical basis to guide further developments in this rapidly expanding research area.”

    Creating new skills and new connections with MIT’s Quantitative Methods Workshop

    Starting on New Year’s Day, when many people were still clinging to holiday revelry, scores of students and faculty members from about a dozen partner universities instead flipped open their laptops for MIT’s Quantitative Methods Workshop, a jam-packed, weeklong introduction to how computational and mathematical techniques can be applied to neuroscience and biology research. But don’t think of QMW as a “crash course.” Instead the program’s purpose is to help elevate each participant’s scientific outlook, both through the skills and concepts it imparts and the community it creates.

    “It broadens their horizons, it shows them significant applications they’ve never thought of, and introduces them to people whom as researchers they will come to know and perhaps collaborate with one day,” says Susan L. Epstein, a Hunter College computer science professor and education coordinator of MIT’s Center for Brains, Minds, and Machines, which hosts the program with the departments of Biology and Brain and Cognitive Sciences and The Picower Institute for Learning and Memory. “It is a model of interdisciplinary scholarship.”

    This year 83 undergraduates and faculty members from institutions that primarily serve groups underrepresented in STEM fields took part in the QMW, says organizer Mandana Sassanfar, senior lecturer and director of diversity and science outreach across the four hosting MIT entities. Since the workshop launched in 2010, it has engaged more than 1,000 participants, of whom more than 170 have gone on to participate in MIT Summer Research Programs (such as MSRP-BIO), and 39 have come to MIT for graduate school.

    Individual goals, shared experience

    Undergraduates and faculty in various STEM disciplines often come to QMW to gain an understanding of, or expand their expertise in, computational and mathematical data analysis. Computer science- and statistics-minded participants come to learn more about how such techniques can be applied in life sciences fields. In lectures; in hands-on labs where they used the computer programming language Python to process, analyze, and visualize data; and in less formal settings such as tours and lunches with MIT faculty, participants worked and learned together, and informed each other’s perspectives.

    Brain and Cognitive Sciences Professor Nancy Kanwisher delivers a lecture in MIT’s Building 46 on functional brain imaging to QMW participants.

    Photo: Mandana Sassanfar

    And regardless of their field of study, participants made connections with each other and with the MIT students and faculty who taught and spoke over the course of the week.

    Hunter College computer science sophomore Vlad Vostrikov says that while he has already worked with machine learning and other programming concepts, he was interested to “branch out” by seeing how they are used to analyze scientific datasets. He also valued the chance to learn about the experiences of the graduate students who teach QMW’s hands-on labs.

    “This was a good way to explore computational biology and neuroscience,” Vostrikov says. “I also really enjoy hearing from the people who teach us. It’s interesting to hear where they come from and what they are doing.”

    Jariatu Kargbo, a biology and chemistry sophomore at University of Maryland Baltimore County, says when she first learned of the QMW she wasn’t sure it was for her. It seemed very computation-focused. But her advisor Holly Willoughby encouraged Kargbo to attend to learn about how programming could be useful in future research — currently she is taking part in research on the retina at UMBC. More than that, Kargbo also realized it would be a good opportunity to make connections at MIT in advance of perhaps applying for MSRP this summer.

    “I thought this would be a great way to meet up with faculty and see what the environment is like here because I’ve never been to MIT before,” Kargbo says. “It’s always good to meet other people in your field and grow your network.”

    QMW is not just for students. It’s also for their professors, who said they can gain valuable professional education for their research and teaching.

    Fayuan Wen, an assistant professor of biology at Howard University, is no stranger to computational biology, having performed big data genetic analyses of sickle cell disease (SCD). But she’s mostly worked with the R programming language, and QMW’s focus is on Python. As she looks ahead to projects in which she wants to analyze genomic data to help predict disease outcomes in SCD and HIV, she says a QMW session delivered by biology graduate student Hannah Jacobs was perfectly on point.

    “This workshop has the skills I want to have,” Wen says.

    Moreover, Wen says she is looking to start a machine-learning class in the Howard biology department and was inspired by some of the teaching materials she encountered at QMW — for example, online curriculum modules developed by Taylor Baum, an MIT graduate student in electrical engineering and computer science and Picower Institute labs, and Paloma Sánchez-Jáuregui, a coordinator who works with Sassanfar.

    Tiziana Ligorio, a Hunter College computer science doctoral lecturer who together with Epstein teaches a deep machine-learning class at the City University of New York campus, felt similarly. Rather than require a bunch of prerequisites that might drive students away from the class, Ligorio was looking to QMW’s intense but introductory curriculum as a resource for designing a more inclusive way of getting students ready for the class.

    Instructive interactions

    Each day ran from 9 a.m. to 5 p.m., including morning and afternoon lectures and hands-on sessions. Class topics ranged from statistical data analysis and machine learning to brain-computer interfaces, brain imaging, signal processing of neural activity data, and cryogenic electron microscopy.

    “This workshop could not happen without dedicated instructors — grad students, postdocs, and faculty — who volunteer to give lectures, design and teach hands-on computer labs, and meet with students during the very first week of January,” Sassanfar says.

    MIT assistant professor of biology Brady Weissbourd (center) converses with QMW student participants during a lunch break.

    Photo: Mandana Sassanfar

    The sessions surround student lunches with MIT faculty members. For example, at midday Jan. 2, assistant professor of biology Brady Weissbourd, an investigator in the Picower Institute, sat down with seven students on one of Building 46’s curved sofas to field questions about his neuroscience research in jellyfish and how he uses quantitative techniques as part of that work. He also described what it’s like to be a professor, and other topics that came to the students’ minds.

    Then the participants all crossed Vassar Street to Building 26’s Room 152, where they formed different but similarly sized groups for the hands-on lab “Machine learning applications to studying the brain,” taught by Baum. She guided the class through Python exercises she developed illustrating “supervised” and “unsupervised” forms of machine learning, including how the latter method can be used to discern what a person is seeing based on magnetic readings of brain activity.

    As students worked through the exercises, tablemates helped each other by supplementing Baum’s instruction. Ligorio, Vostrikov, and Kayla Blincow, assistant professor of biology at the University of the Virgin Islands, for instance, all leapt to their feet to help at their tables.

    Hunter College lecturer of computer science Tiziana Ligorio (standing) explains a Python programming concept to students at her table during a workshop session.

    Photo: David Orenstein

    At the end of the class, when Baum asked students what they had learned, they offered a litany of new knowledge. Survey data that Sassanfar and Sánchez-Jáuregui use to anonymously track QMW outcomes revealed many more such attestations of the value of the sessions. With a prompt asking how one might apply what they’ve learned, one respondent wrote: “Pursue a research career or endeavor in which I apply the concepts of computer science and neuroscience together.”

    Enduring connections

    While some new QMW attendees might only be able to speculate about how they’ll apply their new skills and relationships, Luis Miguel de Jesús Astacio could testify to how attending QMW as an undergraduate back in 2014 figured into a career where he is now a faculty member in physics at the University of Puerto Rico Rio Piedras Campus. After QMW, he returned to MIT that summer as a student in the lab of neuroscientist and Picower Professor Susumu Tonegawa. He came back again in 2016 to the lab of physicist and Francis Friedman Professor Mehran Kardar. What’s endured for the decade has been his connection to Sassanfar. So while he was once a student at QMW, this year he was back with a cohort of undergraduates as a faculty member.

    Michael Aldarondo-Jeffries, director of academic advancement programs at the University of Central Florida, seconded the value of the networking that takes place at QMW. He has brought students for a decade, including four this year. What he’s observed is that as students come together in settings like QMW or UCF’s McNair program, which helps to prepare students for graduate school, they become inspired about a potential future as researchers.

    “The thing that stands out is just the community that’s formed,” he says. “For many of the students, it’s the first time that they’re in a group that understands what they’re moving toward. They don’t have to explain why they’re excited to read papers on a Friday night.”

    Or why they are excited to spend a week including New Year’s Day at MIT learning how to apply quantitative methods to life sciences data.

    New hope for early pancreatic cancer intervention via AI-based risk prediction

    The first documented case of pancreatic cancer dates back to the 18th century. Since then, researchers have undertaken a protracted and challenging odyssey to understand the elusive and deadly disease. To date, there is no better cancer treatment than early intervention. Unfortunately, the pancreas, nestled deep within the abdomen, is particularly elusive for early detection. 

    MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) scientists, alongside Limor Appelbaum, a staff scientist in the Department of Radiation Oncology at Beth Israel Deaconess Medical Center (BIDMC), were eager to better identify potential high-risk patients. They set out to develop two machine-learning models for early detection of pancreatic ductal adenocarcinoma (PDAC), the most common form of the cancer. To access a broad and diverse database, the team synced up with a federated network company, using electronic health record data from various institutions across the United States. This vast pool of data helped ensure the models’ reliability and generalizability, making them applicable across a wide range of populations, geographical locations, and demographic groups.

    The two models, the PRISM neural network and the logistic regression model (a statistical technique for probability), outperformed current methods. The team’s comparison showed that while standard screening criteria identify about 10 percent of PDAC cases using a five-times higher relative risk threshold, PRISM can detect 35 percent of PDAC cases at this same threshold.

    Using AI to detect cancer risk is not a new phenomenon — algorithms analyze mammograms, CT scans for lung cancer, and assist in the analysis of Pap smear tests and HPV testing, to name a few applications. “The PRISM models stand out for their development and validation on an extensive database of over 5 million patients, surpassing the scale of most prior research in the field,” says Kai Jia, an MIT PhD student in electrical engineering and computer science (EECS), MIT CSAIL affiliate, and first author on an open-access paper in eBioMedicine outlining the new work. “The model uses routine clinical and lab data to make its predictions, and the diversity of the U.S. population is a significant advancement over other PDAC models, which are usually confined to specific geographic regions, like a few health-care centers in the U.S. Additionally, using a unique regularization technique in the training process enhanced the models’ generalizability and interpretability.”

    “This report outlines a powerful approach to use big data and artificial intelligence algorithms to refine our approach to identifying risk profiles for cancer,” says David Avigan, a Harvard Medical School professor and the cancer center director and chief of hematology and hematologic malignancies at BIDMC, who was not involved in the study. “This approach may lead to novel strategies to identify patients with high risk for malignancy that may benefit from focused screening with the potential for early intervention.” 

    Prismatic perspectives

    The journey toward the development of PRISM began over six years ago, fueled by firsthand experiences with the limitations of current diagnostic practices. “Approximately 80-85 percent of pancreatic cancer patients are diagnosed at advanced stages, where cure is no longer an option,” says senior author Appelbaum, who is also a Harvard Medical School instructor as well as a radiation oncologist. “This clinical frustration sparked the idea to delve into the wealth of data available in electronic health records (EHRs).”

    The CSAIL group’s close collaboration with Appelbaum made it possible to understand the combined medical and machine learning aspects of the problem better, eventually leading to a much more accurate and transparent model. “The hypothesis was that these records contained hidden clues — subtle signs and symptoms that could act as early warning signals of pancreatic cancer,” she adds. “This guided our use of federated EHR networks in developing these models, for a scalable approach for deploying risk prediction tools in health care.”

    Both PrismNN and PrismLR models analyze EHR data, including patient demographics, diagnoses, medications, and lab results, to assess PDAC risk. PrismNN uses artificial neural networks to detect intricate patterns in data features like age, medical history, and lab results, yielding a risk score for PDAC likelihood. PrismLR uses logistic regression for a simpler analysis, generating a probability score of PDAC based on these features. Together, the models offer a thorough evaluation of different approaches in predicting PDAC risk from the same EHR data.
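
    To make the logistic-regression side concrete: a model of this kind computes a weighted sum of patient features and passes it through the logistic (sigmoid) function to get a probability. The weights, bias, feature names, and baseline rate below are hypothetical placeholders, not PrismLR's fitted values; the feature choices only echo the risk factors the team mentions (age, diabetes, frequency of physician visits). The relative-risk check mirrors the “five-times higher” threshold used in the comparison with standard screening criteria.

```python
import math
from typing import Dict

# Hypothetical coefficients for illustration only (not the published model).
WEIGHTS: Dict[str, float] = {
    "age_decades": 0.4,       # age in decades
    "diabetes": 0.8,          # 1 if diagnosed, else 0
    "visits_per_year": 0.05,  # physician visits in the past year
}
BIAS = -8.0


def pdac_probability(features: Dict[str, float]) -> float:
    """Logistic regression: sigmoid of bias plus weighted feature sum."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def is_high_risk(prob: float, baseline_rate: float, rr_threshold: float = 5.0) -> bool:
    """Flag a patient whose predicted risk is at least rr_threshold times the baseline."""
    return prob / baseline_rate >= rr_threshold


patient = {"age_decades": 7, "diabetes": 1, "visits_per_year": 12}
p = pdac_probability(patient)  # sigmoid(-3.8), about 0.022
print(round(p, 3), is_high_risk(p, baseline_rate=0.001))
```

    Part of the interpretability appeal is visible here: each coefficient's sign and size say directly how a feature pushes the score, something a neural network does not expose without extra analysis.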

    One paramount point for gaining the trust of physicians, the team notes, is better understanding how the models work, known in the field as interpretability. The scientists pointed out that while logistic regression models are inherently easier to interpret, recent advancements have made deep neural networks somewhat more transparent. This helped the team to refine the thousands of potentially predictive features derived from the EHR of a single patient to approximately 85 critical indicators. These indicators, which include patient age, diabetes diagnosis, and an increased frequency of visits to physicians, are automatically discovered by the model but match physicians’ understanding of risk factors associated with pancreatic cancer.

    The path forward

    Despite the promise of the PRISM models, as with all research, some parts are still a work in progress. U.S. data alone are the current diet for the models, necessitating testing and adaptation for global use. The path forward, the team notes, includes expanding the model’s applicability to international datasets and integrating additional biomarkers for more refined risk assessment.

    “A subsequent aim for us is to facilitate the models’ implementation in routine health care settings. The vision is to have these models function seamlessly in the background of health care systems, automatically analyzing patient data and alerting physicians to high-risk cases without adding to their workload,” says Jia. “A machine-learning model integrated with the EHR system could empower physicians with early alerts for high-risk patients, potentially enabling interventions well before symptoms manifest. We are eager to deploy our techniques in the real world to help all individuals enjoy longer, healthier lives.” 

    Jia wrote the paper alongside Appelbaum and MIT EECS Professor and CSAIL Principal Investigator Martin Rinard, who are both senior authors of the paper. Researchers on the paper were supported during their time at MIT CSAIL, in part, by the Defense Advanced Research Projects Agency, Boeing, the National Science Foundation, and Aarno Labs. TriNetX provided resources for the project, and the Prevent Cancer Foundation also supported the team.

    Self-powered sensor automatically harvests magnetic energy

    MIT researchers have developed a battery-free, self-powered sensor that can harvest energy from its environment.

    Because it requires no battery that must be recharged or replaced, and because it requires no special wiring, such a sensor could be embedded in a hard-to-reach place, like inside the inner workings of a ship’s engine. There, it could automatically gather data on the machine’s power consumption and operations for long periods of time.

    The researchers built a temperature-sensing device that harvests energy from the magnetic field generated in the open air around a wire. One could simply clip the sensor around a wire that carries electricity — perhaps the wire that powers a motor — and it will automatically harvest and store energy which it uses to monitor the motor’s temperature.

    “This is ambient power — energy that I don’t have to make a specific, soldered connection to get. And that makes this sensor very easy to install,” says Steve Leeb, the Emanuel E. Landsman Professor of Electrical Engineering and Computer Science (EECS) and professor of mechanical engineering, a member of the Research Laboratory of Electronics, and senior author of a paper on the energy-harvesting sensor.

    In the paper, which appeared as the featured article in the January issue of the IEEE Sensors Journal, the researchers offer a design guide for an energy-harvesting sensor that lets an engineer balance the available energy in the environment with their sensing needs.

    The paper lays out a roadmap for the key components of a device that can sense and control the flow of energy continually during operation.

    The versatile design framework is not limited to sensors that harvest magnetic field energy, and can be applied to those that use other power sources, like vibrations or sunlight. It could be used to build networks of sensors for factories, warehouses, and commercial spaces that cost less to install and maintain.

    “We have provided an example of a battery-less sensor that does something useful, and shown that it is a practically realizable solution. Now others will hopefully use our framework to get the ball rolling to design their own sensors,” says lead author Daniel Monagle, an EECS graduate student.

    Monagle and Leeb are joined on the paper by EECS graduate student Eric Ponce.

    John Donnal, an associate professor of weapons and controls engineering at the U.S. Naval Academy who was not involved with this work, studies techniques to monitor ship systems. Getting access to power on a ship can be difficult, he says, since there are very few outlets and strict restrictions as to what equipment can be plugged in.

    “Persistently measuring the vibration of a pump, for example, could give the crew real-time information on the health of the bearings and mounts, but powering a retrofit sensor often requires so much additional infrastructure that the investment is not worthwhile,” Donnal adds. “Energy-harvesting systems like this could make it possible to retrofit a wide variety of diagnostic sensors on ships and significantly reduce the overall cost of maintenance.”

    A how-to guide

    The researchers had to meet three key challenges to develop an effective, battery-free, energy-harvesting sensor.

    First, the system must be able to cold start, meaning it can fire up its electronics with no initial voltage. They accomplished this with a network of integrated circuits and transistors that allow the system to store energy until it reaches a certain threshold. The system will only turn on once it has stored enough power to fully operate.
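
    The cold-start behavior can be sketched as a threshold rule on the storage voltage: the electronics stay off while the capacitor charges from zero, and switch on only once the voltage reaches a turn-on level. The lower turn-off level (hysteresis) is an added assumption, a common way to keep such a system from rapidly toggling; the article does not specify it. All currents, voltages, and the capacitance are hypothetical, and the harvester is idealized as a constant current source.

```python
from typing import List, Tuple


def simulate_cold_start(
    harvest_a: float,  # constant harvested current (idealized)
    load_a: float,     # current drawn by the electronics while on
    cap_f: float,      # storage capacitance
    v_on: float,       # turn-on threshold
    v_off: float,      # turn-off threshold (hysteresis, assumed)
    dt_s: float,
    steps: int,
) -> List[Tuple[float, bool]]:
    """Return (capacitor voltage, electronics on?) at each time step."""
    v, on = 0.0, False  # cold start: no initial voltage
    trace = []
    for _ in range(steps):
        draw = load_a if on else 0.0
        v = max(0.0, v + (harvest_a - draw) * dt_s / cap_f)  # dV = I * dt / C
        if not on and v >= v_on:
            on = True   # enough energy banked to operate fully
        elif on and v <= v_off:
            on = False  # reserve exhausted: shut down and recharge
        trace.append((v, on))
    return trace


trace = simulate_cold_start(
    harvest_a=1e-3, load_a=3e-3, cap_f=1e-3, v_on=2.0, v_off=1.0, dt_s=0.01, steps=1000
)
first_on = next(i for i, (_, on) in enumerate(trace) if on)
print(first_on)  # about 200 steps, i.e. C * V_on / I = 2 s
```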

    Second, the system must store and convert the energy it harvests efficiently, and without a battery. While the researchers could have included a battery, that would add extra complexities to the system and could pose a fire risk.

    “You might not even have the luxury of sending out a technician to replace a battery. Instead, our system is maintenance-free. It harvests energy and operates itself,” Monagle adds.

    To avoid using a battery, they incorporate internal energy storage that can include a series of capacitors. Simpler than a battery, a capacitor stores energy in the electric field between conductive plates. Capacitors can be made from a variety of materials, and their capabilities can be tuned to a range of operating conditions, safety requirements, and available space.

    The team carefully designed the capacitors so they are big enough to store the energy the device needs to turn on and start harvesting power, but small enough that the charge-up phase doesn’t take too long.

    In addition, since a sensor might go weeks or even months before turning on to take a measurement, they ensured the capacitors can hold enough energy even if some leaks out over time.
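
    The sizing trade-off follows from the energy stored in a capacitor, E = ½CV². With an idealized constant harvested power P and no conversion losses, charging from empty takes roughly t = E / P, so doubling C doubles both the stored energy and the wait. Leakage is modeled here, as an assumption, by a resistor R across the capacitor, which makes the voltage decay as V0·e^(−t/RC) and the energy as E0·e^(−2t/RC). All component values are hypothetical.

```python
import math


def stored_energy_j(cap_f: float, volts: float) -> float:
    """Energy in a capacitor: E = 1/2 * C * V^2."""
    return 0.5 * cap_f * volts ** 2


def charge_time_s(cap_f: float, volts: float, harvest_w: float) -> float:
    """Idealized time to charge from 0 V at constant power with no losses."""
    return stored_energy_j(cap_f, volts) / harvest_w


def energy_after_leak_j(cap_f: float, v0: float, leak_ohm: float, seconds: float) -> float:
    """Energy left after self-discharge through a parallel leakage resistance."""
    tau = leak_ohm * cap_f  # RC time constant
    return stored_energy_j(cap_f, v0 * math.exp(-seconds / tau))


C, V, P = 0.1, 3.0, 1e-3   # 0.1 F charged to 3 V from 1 mW (hypothetical)
print(stored_energy_j(C, V))   # about 0.45 J
print(charge_time_s(C, V, P))  # about 450 s; doubling C doubles this
month = 30 * 24 * 3600
print(energy_after_leak_j(C, V, leak_ohm=10e6, seconds=month))  # a few mJ remain
```

    With these made-up values, a month of leakage drains most of the stored charge, which illustrates why the designers had to budget for leakage and not just for the turn-on energy.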

    Finally, they developed a series of control algorithms that dynamically measure and budget the energy collected, stored, and used by the device. A microcontroller, the “brain” of the energy management interface, constantly checks how much energy is stored and infers whether to turn the sensor on or off, take a measurement, or kick the harvester into a higher gear so it can gather more energy for more complex sensing needs.

    “Just like when you change gears on a bike, the energy management interface looks at how the harvester is doing, essentially seeing whether it is pedaling too hard or too soft, and then it varies the electronic load so it can maximize the amount of power it is harvesting and match the harvest to the needs of the sensor,” Monagle explains.
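
    The gear-shifting idea resembles maximum power point tracking. One widely used textbook method is perturb-and-observe: nudge the operating point, keep going if harvested power rose, reverse if it fell. The article does not say which algorithm the team uses, so treat this as a stand-in. The harvester is modeled, hypothetically, as a source with open-circuit voltage Voc and internal resistance Rs, whose delivered power V·(Voc − V)/Rs peaks at V = Voc/2.

```python
from typing import Callable


def harvested_power_w(v_op: float, v_oc: float = 4.0, r_source: float = 100.0) -> float:
    """Toy source model: power delivered at operating voltage v_op."""
    if not 0.0 <= v_op <= v_oc:
        return 0.0
    return v_op * (v_oc - v_op) / r_source


def perturb_and_observe(
    power_fn: Callable[[float], float], v_start: float, step: float, iterations: int
) -> float:
    """Climb toward the maximum-power operating point by trial steps."""
    v, p = v_start, power_fn(v_start)
    direction = 1.0
    for _ in range(iterations):
        v_next = v + direction * step
        p_next = power_fn(v_next)
        if p_next < p:
            direction = -direction  # "pedaling" got harder: shift the other way
        v, p = v_next, p_next
    return v


v_final = perturb_and_observe(harvested_power_w, v_start=0.5, step=0.05, iterations=200)
print(round(v_final, 2))  # settles within a step or two of Voc / 2 = 2.0 V
```

    Perturb-and-observe never stops moving; it dithers around the peak by about one step. Smaller steps reduce that ripple but make the tracker slower to follow a changing source.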

    Self-powered sensor

    Using this design framework, they built an energy management circuit for an off-the-shelf temperature sensor. The device harvests magnetic field energy and uses it to continually sample temperature data, which it sends to a smartphone interface using Bluetooth.

    The researchers used super-low-power circuits to design the device, but quickly found that these circuits have tight restrictions on how much voltage they can withstand before breaking down. Harvesting too much power could cause the device to explode.

    To avoid that, their energy harvester operating system in the microcontroller automatically adjusts or reduces the harvest if the amount of stored energy becomes excessive.

    They also found that communication — transmitting data gathered by the temperature sensor — was by far the most power-hungry operation.
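
    The budgeting logic described in this section (decide whether to sleep, measure, or transmit; push the harvester harder when energy is short; throttle it when storage approaches the low-power circuits' limits) can be sketched as a simple rule-based policy. Every threshold and cost below is a hypothetical placeholder; the only relationship taken from the article is that transmitting costs far more than measuring.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EnergyBudget:
    reserve_j: float        # never spend below this floor
    measure_cost_j: float   # one temperature reading
    transmit_cost_j: float  # one radio transmission (the most expensive task)
    max_store_j: float      # above this, throttle to protect low-voltage parts


def plan_step(stored_j: float, b: EnergyBudget) -> Tuple[str, str]:
    """Return (harvester mode, task) for the current stored energy."""
    spendable = stored_j - b.reserve_j
    full_cycle = b.measure_cost_j + b.transmit_cost_j

    if stored_j >= b.max_store_j:
        harvester = "throttle"  # too much stored: reduce the harvest
    elif spendable < full_cycle:
        harvester = "boost"     # short on energy: harvest harder
    else:
        harvester = "normal"

    if spendable >= full_cycle:
        task = "measure_and_transmit"
    elif spendable >= b.measure_cost_j:
        task = "measure_and_buffer"  # keep the reading, send it later
    else:
        task = "sleep"
    return harvester, task


budget = EnergyBudget(reserve_j=0.05, measure_cost_j=0.001, transmit_cost_j=0.02, max_store_j=0.4)
for stored in (0.04, 0.06, 0.2, 0.45):
    print(stored, plan_step(stored, budget))
```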

    “Ensuring the sensor has enough stored energy to transmit data is a constant challenge that involves careful design,” Monagle says.

    In the future, the researchers plan to explore less energy-intensive means of transmitting data, such as using optics or acoustics. They also want to more rigorously model and predict how much energy might be coming into a system, or how much energy a sensor might need to take measurements, so a device could effectively gather even more data.

    “If you only make the measurements you think you need, you may miss something really valuable. With more information, you might be able to learn something you didn’t expect about a device’s operations. Our framework lets you balance those considerations,” Leeb says.  

    “This paper is well-documented regarding what a practical self-powered sensor node should internally entail for realistic scenarios. The overall design guidelines, particularly on the cold-start issue, are very helpful,” says Jinyeong Moon, an assistant professor of electrical and computer engineering at Florida State University College of Engineering who was not involved with this work. “Engineers planning to design a self-powering module for a wireless sensor node will greatly benefit from these guidelines, easily ticking off traditionally cumbersome cold-start-related checklists.”

    The work is supported, in part, by the Office of Naval Research and The Grainger Foundation.

  • in

    Multiple AI models help robots execute complex plans more transparently

    Your daily to-do list is likely pretty straightforward: wash the dishes, buy groceries, and other minutiae. It’s unlikely you wrote out “pick up the first dirty dish,” or “wash that plate with a sponge,” because each of these miniature steps within the chore feels intuitive. While we can routinely complete each step without much thought, a robot requires a complex plan that involves more detailed outlines.

    MIT’s Improbable AI Lab, a group within the Computer Science and Artificial Intelligence Laboratory (CSAIL), has offered these machines a helping hand with a new multimodal framework: Compositional Foundation Models for Hierarchical Planning (HiP), which develops detailed, feasible plans with the expertise of three different foundation models. Like OpenAI’s GPT-4, the foundation model that ChatGPT and Bing Chat were built upon, these foundation models are trained on massive quantities of data for applications like generating images, translating text, and robotics.

    Unlike RT-2 and other multimodal models that are trained on paired vision, language, and action data, HiP uses three different foundation models, each trained on a different data modality. Each model captures a different part of the decision-making process, and the three work together when it’s time to make decisions. HiP removes the need for paired vision, language, and action data, which is difficult to obtain, and it also makes the reasoning process more transparent.

    What’s considered a daily chore for a human can be a robot’s “long-horizon goal” — an overarching objective that involves completing many smaller steps first — requiring sufficient data to plan, understand, and execute objectives. While computer vision researchers have attempted to build monolithic foundation models for this problem, pairing language, visual, and action data is expensive. Instead, HiP represents a different, multimodal recipe: a trio that cheaply incorporates linguistic, physical, and environmental intelligence into a robot.

    “Foundation models do not have to be monolithic,” says NVIDIA AI researcher Jim Fan, who was not involved in the paper. “This work decomposes the complex task of embodied agent planning into three constituent models: a language reasoner, a visual world model, and an action planner. It makes a difficult decision-making problem more tractable and transparent.”

    The team believes that their system could help these machines accomplish household chores, such as putting away a book or placing a bowl in the dishwasher. Additionally, HiP could assist with multistep construction and manufacturing tasks, like stacking and placing different materials in specific sequences.

    Evaluating HiP

    The CSAIL team tested HiP’s acuity on three manipulation tasks, where it outperformed comparable frameworks. The system reasoned by developing intelligent plans that adapt to new information.

    First, the researchers asked it to stack different-colored blocks on each other and then place others nearby. The catch: Some of the correct colors weren’t present, so the robot had to place white blocks in a color bowl to paint them. HiP often adjusted to these changes accurately by revising its plans to stack and place each block as needed, especially when compared with state-of-the-art task planning systems like Transformer BC and Action Diffuser.

    Another test: arranging objects such as candy and a hammer in a brown box while ignoring other items. Some of the objects it needed to move were dirty, so HiP adjusted its plans to place them in a cleaning box, and then into the brown container. In a third demonstration, the bot was able to ignore unnecessary objects to complete kitchen sub-goals such as opening a microwave, clearing a kettle out of the way, and turning on a light. Some of the prompted steps had already been completed, so the robot adapted by skipping those directions.

    A three-pronged hierarchy

    HiP’s three-pronged planning process operates as a hierarchy, with the ability to pre-train each of its components on different sets of data, including information outside of robotics. At the bottom of that order is a large language model (LLM), which starts to ideate by capturing all the symbolic information needed and developing an abstract task plan. Applying the common sense knowledge it finds on the internet, the model breaks its objective into sub-goals. For example, “making a cup of tea” turns into “filling a pot with water,” “boiling the pot,” and the subsequent actions required.

    “All we want to do is take existing pre-trained models and have them successfully interface with each other,” says Anurag Ajay, a PhD student in the MIT Department of Electrical Engineering and Computer Science (EECS) and a CSAIL affiliate. “Instead of pushing for one model to do everything, we combine multiple ones that leverage different modalities of internet data. When used in tandem, they help with robotic decision-making and can potentially aid with tasks in homes, factories, and construction sites.”

    These models also need some form of “eyes” to understand the environment they’re operating in and correctly execute each sub-goal. The team used a large video diffusion model, which collects geometric and physical information about the world from footage on the internet, to augment the initial planning completed by the LLM. In turn, the video model generates an observation trajectory plan, refining the LLM’s outline to incorporate new physical knowledge.

    This process, known as iterative refinement, allows HiP to reason about its ideas, taking in feedback at each stage to generate a more practical outline. The flow of feedback resembles publishing an article: an author sends a draft to an editor, and once those revisions are incorporated, the publisher reviews it for any last changes and finalizes it.
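    To make the hierarchy and its feedback loop concrete, here is a toy sketch of the data flow only. Everything in it is a hypothetical stand-in, not HiP's code or models: `propose_subgoals` plays the language model by returning candidate sub-goal sequences; `imagine_video` plays the video model by rolling out an imagined observation trajectory from hand-written precondition and effect tables and scoring how physically plausible it is; and `infer_action` plays the egocentric action model, which turns consecutive observations into actions. The feedback step is simplified here to keeping the candidate plan the video level scores highest.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Observation = FrozenSet[str]

# Toy "LLM" output: candidate sub-goal sequences for a known goal (hypothetical).
CANDIDATE_PLANS: Dict[str, List[List[str]]] = {
    "make tea": [
        ["boil the pot", "fill the pot with water", "pour the tea"],  # physically wrong order
        ["fill the pot with water", "boil the pot", "pour the tea"],
    ],
}

# Toy "physical knowledge" the video level would learn from footage (hypothetical).
PRECONDITIONS: Dict[str, Set[str]] = {
    "fill the pot with water": set(),
    "boil the pot": {"pot has water"},
    "pour the tea": {"water is boiling"},
}
EFFECTS: Dict[str, Set[str]] = {
    "fill the pot with water": {"pot has water"},
    "boil the pot": {"water is boiling"},
    "pour the tea": {"tea is poured"},
}


def propose_subgoals(goal: str) -> List[List[str]]:
    """Language level: break the goal into candidate sub-goal sequences."""
    return CANDIDATE_PLANS[goal]


def imagine_video(subgoals: List[str]) -> Tuple[List[Observation], float]:
    """Video level: imagine the observation trajectory and score its plausibility."""
    state: Set[str] = set()
    frames: List[Observation] = [frozenset(state)]
    violations = 0
    for subgoal in subgoals:
        if not PRECONDITIONS[subgoal] <= state:
            violations += 1  # this step could not physically happen yet
        state |= EFFECTS[subgoal]
        frames.append(frozenset(state))
    return frames, -float(violations)


def infer_action(before: Observation, after: Observation) -> str:
    """Action level: infer the action that links two consecutive observations."""
    changes = sorted(after - before)
    return "act: " + (", ".join(changes) if changes else "no-op")


def plan(goal: str) -> List[str]:
    candidates = propose_subgoals(goal)
    # Feedback: keep the sub-goal plan the video level finds most plausible.
    best = max(candidates, key=lambda c: imagine_video(c)[1])
    frames, _ = imagine_video(best)
    return [infer_action(a, b) for a, b in zip(frames, frames[1:])]


if __name__ == "__main__":
    for step in plan("make tea"):
        print(step)
```

    Because each level only consumes the previous level's output, the pieces can be swapped independently, which is the property that lets HiP pretrain each model on a different kind of data.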

    In this case, the top of the hierarchy is an egocentric action model, which uses a sequence of first-person images to infer which actions should take place based on the robot’s surroundings. During this stage, the observation plan from the video model is mapped over the space visible to the robot, helping the machine decide how to execute each task within the long-horizon goal. If a robot uses HiP to make tea, this means it will have mapped out exactly where the pot, sink, and other key visual elements are, and can begin completing each sub-goal.

    Still, the multimodal work is limited by the lack of high-quality video foundation models. Once available, they could interface with HiP’s small-scale video models to further enhance visual sequence prediction and robot action generation. A higher-quality version would also reduce the current data requirements of the video models.

    That being said, the CSAIL team’s approach used only a small amount of data overall. Moreover, HiP was cheap to train and demonstrated the potential of using readily available foundation models to complete long-horizon tasks. “What Anurag has demonstrated is proof-of-concept of how we can take models trained on separate tasks and data modalities and combine them into models for robotic planning. In the future, HiP could be augmented with pre-trained models that can process touch and sound to make better plans,” says senior author Pulkit Agrawal, MIT assistant professor in EECS and director of the Improbable AI Lab. The group is also considering applying HiP to solving real-world long-horizon tasks in robotics.

    Ajay and Agrawal are lead authors on a paper describing the work. They are joined by MIT professors and CSAIL principal investigators Tommi Jaakkola, Joshua Tenenbaum, and Leslie Pack Kaelbling; CSAIL research affiliate and MIT-IBM AI Lab research manager Akash Srivastava; graduate students Seungwook Han and Yilun Du ’19; former postdoc Abhishek Gupta, who is now an assistant professor at the University of Washington; and former graduate student Shuang Li PhD ’23.

    The team’s work was supported, in part, by the National Science Foundation, the U.S. Defense Advanced Research Projects Agency, the U.S. Army Research Office, the U.S. Office of Naval Research Multidisciplinary University Research Initiatives, and the MIT-IBM Watson AI Lab. Their findings were presented at the 2023 Conference on Neural Information Processing Systems (NeurIPS).

  • in

    Co-creating climate futures with real-time data and spatial storytelling

    Virtual story worlds and game engines aren’t just for video games anymore. They are now tools for scientists and storytellers to digitally twin existing physical spaces and then turn them into vessels to dream up speculative climate stories and build collective designs of the future. That’s the theory and practice behind the MIT WORLDING initiative.

    Twice this year, WORLDING matched world-class climate story teams working in XR (extended reality) with relevant labs and researchers across MIT. One global group returned for a virtual gathering online in partnership with Unity for Humanity, while another met for one weekend in person, hosted at the MIT Media Lab.

    “We are witnessing the birth of an emergent field that fuses climate science, urban planning, real-time 3D engines, nonfiction storytelling, and speculative fiction, and it is all fueled by the urgency of the climate crises,” says Katerina Cizek, lead designer of the WORLDING initiative at the Co-Creation Studio of MIT Open Documentary Lab. “Interdisciplinary teams are forming and blossoming around the planet to collectively imagine and tell stories of healthy, livable worlds in virtual 3D spaces and then finding direct ways to translate that back to earth, literally.”

    At this year’s virtual version of WORLDING, five multidisciplinary teams were selected from an open call. In a week-long series of research and development gatherings, the teams met with MIT scientists, staff, fellows, students, and graduates, as well as other leading figures in the field. Guests ranged from curators at film festivals such as Sundance and Venice, climate policy specialists, and award-winning media creators to software engineers and renowned Earth and atmosphere scientists. The teams heard from MIT scholars in diverse domains, including geomorphology and urban planning as an act of democracy, as well as from climate researchers at the MIT Media Lab.

    Mapping climate data

    “We are measuring the Earth’s environment in increasingly data-driven ways. Hundreds of terabytes of data are taken every day about our planet in order to study the Earth as a holistic system, so we can address key questions about global climate change,” explains Rachel Connolly, an MIT Media Lab research scientist focused on the “Future Worlds” research theme, in a talk to the group. “Why is this important for your work and storytelling in general? Having the capacity to understand and leverage this data is critical for those who wish to design for and successfully operate in the dynamic Earth environment.”

    Making sense of billions of data points was a key theme during this year’s sessions. In another talk, Taylor Perron, an MIT professor of Earth, atmospheric and planetary sciences, shared how his team uses computational modeling combined with many other scientific processes to better understand how geology, climate, and life intertwine to shape the surfaces of Earth and other planets. His work resonated with one WORLDING team in particular, which aims to digitally reconstruct the pre-Hispanic Lake Texcoco, where Mexico City now stands, as a way to contrast and examine the region’s current water crisis.

    Democratizing the future

    While WORLDING approaches rely on rigorous science and the interrogation of large datasets, they are also grounded in democratic, community-led practice.

    MIT Department of Urban Studies and Planning graduate Lafayette Cruise MCP ’19 met with the teams to discuss how he expanded his own practice as a trained urban planner to include a futurist component involving participatory methods. “I felt we were asking the same limited questions in regards to the future we were wanting to produce. We’re very limited, very constrained, as to whose values and comforts are being centered. There are so many possibilities for how the future could be.”

    Scaling to reach billions

    This work scales from the very local to massive global populations. Climate policymakers are concerned with reaching billions of people in the line of fire. “We have a goal to reach 1 billion people with climate resilience solutions,” says Nidhi Upadhyaya, deputy director at Atlantic Council’s Adrienne Arsht-Rockefeller Foundation Resilience Center. To get that reach, Upadhyaya is turning to games. “There are 3.3 billion-plus people playing video games across the world. Half of these players are women. This industry is worth $300 billion. Africa is currently among the fastest-growing gaming markets in the world, and 55 percent of the global players are in the Asia Pacific region.” She reminded the group that this conversation is about policy and how formats of mass communication can be used for policymaking, bringing about change, changing behavior, and creating empathy within audiences.

    Socially engaged game development is also connected to education at Unity Technologies, a game engine company. “We brought together our education and social impact work because we really see it as a critical flywheel for our business,” said Jessica Lindl, vice president and global head of social impact/education at Unity Technologies, in the opening talk of WORLDING. “We upscale about 900,000 students, in university and high school programs around the world, and about 800,000 adults who are actively learning and reskilling and upskilling in Unity. Ultimately resulting in our mission of the ‘world is a better place with more creators in it,’ millions of creators who reach billions of consumers — telling the world stories, and fostering a more inclusive, sustainable, and equitable world.”

    Access to these technologies is key, especially the hardware. “Accessibility has been missing in XR,” explains Reginé Gilbert, who studies and teaches accessibility and disability in user experience design at New York University. “XR is being used in artificial intelligence, assistive technology, business, retail, communications, education, empathy, entertainment, recreation, events, gaming, health, rehabilitation meetings, navigation, therapy, training, video programming, virtual assistance wayfinding, and so many other uses. This is a fun fact for folks: 97.8 percent of the world hasn’t tried VR [virtual reality] yet, actually.”

    Meanwhile, new hardware is on its way. The WORLDING group got early insights into the highly anticipated Apple Vision Pro headset, which promises to integrate many forms of XR and personal computing in one device. “They’re really pushing this kind of pass-through or mixed reality,” said Dan Miller, a Unity engineer on the PolySpatial team collaborating with Apple, who described the experience of the device this way: “You are viewing the real world. You’re pulling up windows, you’re interacting with content. It’s a kind of spatial computing device where you have multiple apps open, whether it’s your email client next to your messaging client with a 3D game in the middle. You’re interacting with all these things in the same space and at different times.”

    “WORLDING combines our passion for social-impact storytelling and incredible innovative storytelling,” said Paisley Smith of the Unity for Humanity Program at Unity Technologies. She added, “This is an opportunity for creators to incubate their game-changing projects and connect with experts across climate, story, and technology.”

    Meeting at MIT

    In a new in-person iteration of WORLDING this year, organizers collaborated closely with Connolly at the MIT Media Lab to co-design an in-person weekend conference Oct. 25 – Nov. 7 with 45 scholars and professionals who visualize climate data at NASA, the National Oceanic and Atmospheric Administration, planetariums, and museums across the United States.

    A participant said of the event, “An incredible workshop that had had a profound effect on my understanding of climate data storytelling and how to combine different components together for a more [holistic] solution.”

    “With this gathering under our new Future Worlds banner,” says Dava Newman, director of the MIT Media Lab and Apollo Program Professor of Astronautics chair, “the Media Lab seeks to affect human behavior and help societies everywhere to improve life here on Earth and in worlds beyond, so that all — the sentient, natural, and cosmic — worlds may flourish.” 

    “WORLDING’s virtual-only component has been our biggest strength because it has enabled a true, international cohort to gather, build, and create together. But this year, an in-person version showed broader opportunities that spatial interactivity generates — informal Q&As, physical worksheets, and larger-scale ideation, all leading to deeper trust-building,” says WORLDING producer Srushti Kamat SM ’23.

    The future and potential of WORLDING lies in the ongoing dialogue between the virtual and physical, both in the work itself and in the format of the workshops.

  • in

    Technique could efficiently solve partial differential equations for numerous applications

    In fields such as physics and engineering, partial differential equations (PDEs) are used to model complex physical processes, offering insight into how some of the most complicated physical and natural systems in the world function.

    To solve these difficult equations, researchers use high-fidelity numerical solvers, which can be very time-consuming and computationally expensive to run. The current simplified alternative, data-driven surrogate models, computes only a goal property of a PDE’s solution rather than the whole solution. Surrogates are trained on data generated by the high-fidelity solver so that they can predict the PDE output for new inputs. This, too, is data-intensive and expensive, because complex physical systems require a large number of simulations to generate enough data.
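    As a hedged illustration of that workflow, the sketch below treats a fine-grid finite-difference solve of the one-dimensional problem -u'' + a·u = 1 on the interval (0, 1), with u(0) = u(1) = 0, as the “expensive” high-fidelity solver, and builds a surrogate for a single goal property, the midpoint value u(1/2), as a function of the coefficient a. The test problem, the function names, the grid size, and the use of plain linear interpolation as the surrogate are all assumptions chosen for brevity; practical surrogates are usually neural networks or other regression models over many input parameters.

```python
import bisect
from typing import Callable, List


def solve_midpoint(a: float, n: int) -> float:
    """Finite-difference solve of -u'' + a*u = 1 on (0, 1) with u(0) = u(1) = 0.

    Uses n uniform intervals (n must be even) and returns u(1/2), the goal property.
    """
    h = 1.0 / n
    m = n - 1                       # interior unknowns u_1 .. u_{n-1}
    diag = [2.0 / h**2 + a] * m
    off = -1.0 / h**2               # constant sub- and super-diagonal
    rhs = [1.0] * m
    for i in range(1, m):           # Thomas algorithm: forward elimination
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):  # back substitution
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return u[n // 2 - 1]            # interior index of x = 1/2


def build_surrogate(train_inputs: List[float], n_fine: int = 400) -> Callable[[float], float]:
    """Run the high-fidelity solver once per training input, then interpolate between results."""
    xs = sorted(train_inputs)
    ys = [solve_midpoint(a, n_fine) for a in xs]  # each point costs one expensive solve

    def surrogate(a: float) -> float:
        j = min(max(bisect.bisect_right(xs, a), 1), len(xs) - 1)
        x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
        return y0 + (y1 - y0) * (a - x0) / (x1 - x0)

    return surrogate


if __name__ == "__main__":
    surrogate = build_surrogate([float(a) for a in range(11)])
    for a in (2.5, 5.5, 9.5):
        print(a, surrogate(a), solve_midpoint(a, 400))
```

    Even this toy needs one fine-grid solve per training point; with many input parameters, the number of solves needed to cover the input space grows quickly, which is the cost the paragraph above describes.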

    In a new paper, “Physics-enhanced deep surrogates for partial differential equations,” published in December in Nature Machine Intelligence, the authors propose a new method for developing data-driven surrogate models for complex physical systems in fields such as mechanics, optics, thermal transport, fluid dynamics, physical chemistry, and climate modeling.

    The paper was authored by MIT’s professor of applied mathematics Steven G. Johnson along with Payel Das and Youssef Mroueh of the MIT-IBM Watson AI Lab and IBM Research; Chris Rackauckas of Julia Lab; and Raphaël Pestourie, a former MIT postdoc who is now at Georgia Tech. The authors call their method “physics-enhanced deep surrogate” (PEDS), which combines a low-fidelity, explainable physics simulator with a neural network generator. The neural network generator is trained end-to-end to match the output of the high-fidelity numerical solver.
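    The sketch below is a heavily simplified, hypothetical imitation of that structure, not the authors' Julia implementation. It uses a toy problem, -u'' + a·u = 1 on (0, 1) with u(0) = u(1) = 0, whose goal property is the midpoint value u(1/2): the high-fidelity solver uses 400 grid intervals and the low-fidelity solver only 2. A two-weight linear function stands in for the neural-network generator; in this sketch its output is blended 50/50 with the raw input (an illustrative choice) and fed to the coarse solver, and the weights are fit end to end so the coarse solver's output matches high-fidelity data. Central finite-difference gradients stand in for the automatic differentiation the real method relies on.

```python
import math
from typing import List, Sequence, Tuple

HIGH_FI_N = 400  # fine grid: accurate but (in real problems) expensive
LOW_FI_N = 2     # coarse grid: cheap but inaccurate


def solve_midpoint(a: float, n: int) -> float:
    """Finite-difference solve of -u'' + a*u = 1 on (0, 1), u(0) = u(1) = 0; returns u(1/2)."""
    h = 1.0 / n
    m = n - 1
    diag = [2.0 / h**2 + a] * m
    off = -1.0 / h**2
    rhs = [1.0] * m
    for i in range(1, m):  # Thomas algorithm for the tridiagonal system
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return u[n // 2 - 1]


def peds_predict(theta: Sequence[float], a: float, mix: float = 0.5) -> float:
    """Generator output blended with the raw input, then passed through the cheap solver."""
    generated = theta[0] + theta[1] * a  # two-weight stand-in for a neural network
    a_eff = mix * generated + (1.0 - mix) * a
    return solve_midpoint(a_eff, LOW_FI_N)


def loss(theta: Sequence[float], data: List[Tuple[float, float]]) -> float:
    """Mean squared log error against high-fidelity outputs."""
    return sum((math.log(peds_predict(theta, a)) - math.log(y)) ** 2 for a, y in data) / len(data)


def train(data: List[Tuple[float, float]], steps: int = 400, lr: float = 2.0, eps: float = 1e-5) -> List[float]:
    theta = [0.0, 1.0]  # start as the identity: the coarse solver sees the raw input
    for _ in range(steps):
        grad = []
        for k in range(len(theta)):  # central finite differences stand in for autodiff
            up, down = list(theta), list(theta)
            up[k] += eps
            down[k] -= eps
            grad.append((loss(up, data) - loss(down, data)) / (2.0 * eps))
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    data = [(float(a), solve_midpoint(float(a), HIGH_FI_N)) for a in range(11)]
    theta = train(data)
    print("loss before:", loss([0.0, 1.0], data), "after:", loss(theta, data))
```

    The point of the structure is that the coarse solver supplies the physics while the learned part only has to correct its input, so a very small model can be enough; in this toy, two weights bring the 2-interval solver close to the 400-interval result.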

    “My aspiration is to replace the inefficient process of trial and error with systematic, computer-aided simulation and optimization,” says Pestourie. “Recent breakthroughs in AI like the large language model of ChatGPT rely on hundreds of billions of parameters and require vast amounts of resources to train and evaluate. In contrast, PEDS is affordable to all because it is incredibly efficient in computing resources and has a very low barrier in terms of infrastructure needed to use it.”

    In the article, they show that PEDS surrogates can be up to three times more accurate than an ensemble of feedforward neural networks with limited data (approximately 1,000 training points), and reduce the training data needed by at least a factor of 100 to achieve a target error of 5 percent. Developed using the MIT-designed Julia programming language, this scientific machine-learning method is thus efficient in both computing and data.

    The authors also report that PEDS provides a general, data-driven strategy to bridge the gap between a vast array of simplified physical models and the corresponding brute-force numerical solvers that model complex systems. The technique offers accuracy, speed, data efficiency, and physical insight into the process.

    Says Pestourie, “Since the 2000s, as computing capabilities improved, the trend of scientific models has been to increase the number of parameters to fit the data better, sometimes at the cost of a lower predictive accuracy. PEDS does the opposite by choosing its parameters smartly. It leverages the technology of automatic differentiation to train a neural network that makes a model with few parameters accurate.”

    “The main challenge that prevents surrogate models from being used more widely in engineering is the curse of dimensionality — the fact that the needed data to train a model increases exponentially with the number of model variables,” says Pestourie. “PEDS reduces this curse by incorporating information from the data and from the field knowledge in the form of a low-fidelity model solver.”
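    A standard back-of-the-envelope way to see that exponential growth, assuming for illustration only that each of the d input variables is sampled at m values on a full grid (not how the paper generates its data), is that every added variable multiplies the number of required solver runs N by m:

```latex
N = m^{d}, \qquad m = 10:\quad d = 2 \;\Rightarrow\; N = 10^{2} = 100, \qquad d = 6 \;\Rightarrow\; N = 10^{6}.
```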

    The researchers say that PEDS has the potential to revive a whole body of the pre-2000 literature dedicated to minimal models — intuitive models that PEDS could make more accurate while also being predictive for surrogate model applications.

    “The application of the PEDS framework is beyond what we showed in this study,” says Das. “Complex physical systems governed by PDEs are ubiquitous, from climate modeling to seismic modeling and beyond. Our physics-inspired fast and explainable surrogate models will be of great use in those applications, and play a complementary role to other emerging techniques, like foundation models.”

    The research was supported by the MIT-IBM Watson AI Lab and the U.S. Army Research Office through the Institute for Soldier Nanotechnologies.

  • in

    “MIT can give you ‘superpowers’”

    Speaking at the virtual MITx MicroMasters Program Joint Completion Celebration last summer, Diogo da Silva Branco Magalhães described watching a Spider-Man movie with his 8-year-old son and realizing that his son thought MIT was a fictional entity that existed only in the Marvel universe.

    “I had to tell him that MIT also exists in the real world, and that some of the programs are available online for everyone,” says da Silva Branco Magalhães, who earned his credential in the MicroMasters in Statistics and Data Science program. “You don’t need to be a superhero to participate in an MIT program, but MIT can give you ‘superpowers.’ In my case, the superpower that I was looking to acquire was a better understanding of the key technologies that are shaping the future of transportation.”

    Part of MIT Open Learning, the MicroMasters programs have drawn in almost 1.4 million learners, spanning nearly every country in the world. More than 7,500 people have earned their credentials across the MicroMasters programs, including: Statistics and Data Science; Supply Chain Management; Data, Economics, and Design of Policy; Principles of Manufacturing; and Finance. 

    Earning his MicroMasters credential not only gave da Silva Branco Magalhães a strong foundation to tackle more complex transportation problems, but it also opened the door to pursuing an accelerated graduate degree via a Northwestern University online program.

    Learners who earn their MicroMasters credentials gain the opportunity to apply to and continue their studies at a pathway school. The MicroMasters in Statistics and Data Science credential can be applied as credit for a master’s program at more than 30 universities, as well as MIT’s PhD Program in Social and Engineering Systems. Da Silva Branco Magalhães, originally from Portugal and now based in Australia, seized this opportunity and enrolled in Northwestern University’s Master’s in Data Science for MIT MicroMasters Credential Holders. 

    The pathway to an enhanced career

    The pathway model launched in 2016 with the MicroMasters in Supply Chain Management. Now, there are over 50 pathway institutions that offer more than 100 different programs for master’s degrees. With pathway institutions located around the world, MicroMasters credential holders can obtain master’s degrees from local residential or virtual programs, at a location convenient to them. They can receive credit for their MicroMasters courses upon acceptance, providing flexibility for online programs and also shortening the time needed on site for residential programs.

    “The pathways expand opportunities for learners, and also help universities attract a broader range of potential students, which can enrich their programs,” says Dana Doyle, senior director for the MicroMasters Program at MIT Open Learning. “This is a tangible way we can achieve our mission of expanding education access.”

    Da Silva Branco Magalhães began the MicroMasters in Statistics and Data Science program in 2020, ultimately completing the program in 2022.

    “After having worked for 20 years in the transportation sector in various roles, I realized I was no longer equipped as a professional to deal with the new technologies that were set to disrupt the mobility sector,” says da Silva Branco Magalhães. “It became clear to me that data and AI were the driving forces behind new products and services such as autonomous vehicles, on-demand transport, or mobility as a service, but I didn’t really understand how data was being used to achieve these outcomes, so I needed to improve my knowledge.”

    July 2023 MicroMasters Program Joint Completion Celebration for SCM, DEDP, PoM, SDS, and Fin. Video: MIT Open Learning

    The MicroMasters in Statistics and Data Science was developed by the MIT Institute for Data, Systems, and Society and MITx. Credential holders are required to complete four courses equivalent to graduate-level courses in statistics and data science at MIT and a capstone exam comprising four two-hour proctored exams.

    “The content is world-class,” da Silva Branco Magalhães says of the program. “Even the most complex concepts were explained in a very intuitive way. The exercises and the capstone exam are challenging and stimulating — and MIT-level — which makes this credential highly valuable in the market.”

    Da Silva Branco Magalhães also found the discussion forum very useful, and valued conversations with his colleagues, noting that many of these discussions later continued after completion of the program.

    Gaining analysis and leadership skills

    Now in the Northwestern pathway program, da Silva Branco Magalhães finds that the MicroMasters in Statistics and Data Science program prepared him well for this next step in his studies. The nine-course, accelerated, online master’s program is designed to offer the same depth and rigor as Northwestern’s 12-course MS in Data Science program, aiming to help students build essential analysis and leadership skills that can be applied directly in the professional realm. Students learn how to make reliable predictions using traditional statistics and machine learning methods.

    Da Silva Branco Magalhães says he has appreciated the remote nature of the Northwestern program, as he started it in France and then completed the first three courses in Australia. He also values the high number of elective courses, allowing students to design the master’s program according to personal preferences and interests.

    “I want to be prepared to meet the challenges and seize the opportunities that AI and data science technologies will bring to the professional realm,” he says. “With this credential, there are no limits to what you can achieve in the field of data science.”