More stories


    A technique for more effective multipurpose robots

    Let’s say you want to train a robot so it understands how to use tools and can then quickly learn to make repairs around your house with a hammer, wrench, and screwdriver. To do that, you would need an enormous amount of data demonstrating tool use.

    Existing robotic datasets vary widely in modality — some include color images while others are composed of tactile imprints, for instance. Data could also be collected in different domains, like simulation or human demos. And each dataset may capture a unique task and environment.

    It is difficult to efficiently incorporate data from so many sources in one machine-learning model, so many methods use just one type of data to train a robot. But robots trained this way, with a relatively small amount of task-specific data, are often unable to perform new tasks in unfamiliar environments.

    In an effort to train better multipurpose robots, MIT researchers developed a technique to combine multiple sources of data across domains, modalities, and tasks using a type of generative AI known as diffusion models.

    They train a separate diffusion model to learn a strategy, or policy, for completing one task using one specific dataset. Then they combine the policies learned by the diffusion models into a general policy that enables a robot to perform multiple tasks in various settings.

    In simulations and real-world experiments, this training approach enabled a robot to perform multiple tool-use tasks and adapt to new tasks it did not see during training. The method, known as Policy Composition (PoCo), led to a 20 percent improvement in task performance when compared to baseline techniques.

    “Addressing heterogeneity in robotic datasets is like a chicken-egg problem. If we want to use a lot of data to train general robot policies, then we first need deployable robots to get all this data. I think that leveraging all the heterogeneous data available, similar to what researchers have done with ChatGPT, is an important step for the robotics field,” says Lirui Wang, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on PoCo.

    Wang’s coauthors include Jialiang Zhao, a mechanical engineering graduate student; Yilun Du, an EECS graduate student; Edward Adelson, the John and Dorothy Wilson Professor of Vision Science in the Department of Brain and Cognitive Sciences and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and senior author Russ Tedrake, the Toyota Professor of EECS, Aeronautics and Astronautics, and Mechanical Engineering, and a member of CSAIL. The research will be presented at the Robotics: Science and Systems Conference.

    Combining disparate datasets

    A robotic policy is a machine-learning model that takes inputs and uses them to perform an action. One way to think about a policy is as a strategy. In the case of a robotic arm, that strategy might be a trajectory, or a series of poses that move the arm so it picks up a hammer and uses it to pound a nail.

    Datasets used to learn robotic policies are typically small and focused on one particular task and environment, like packing items into boxes in a warehouse.

    “Every single robotic warehouse is generating terabytes of data, but it only belongs to that specific robot installation working on those packages. It is not ideal if you want to use all of these data to train a general machine,” Wang says.
    The MIT researchers developed a technique that can take a series of smaller datasets, like those gathered from many robotic warehouses, learn separate policies from each one, and combine the policies in a way that enables a robot to generalize to many tasks.

    They represent each policy using a type of generative AI model known as a diffusion model. Diffusion models, often used for image generation, learn to create new data samples that resemble samples in a training dataset by iteratively refining their output.

    But rather than teaching a diffusion model to generate images, the researchers teach it to generate a trajectory for a robot. They do this by adding noise to the trajectories in a training dataset. The diffusion model gradually removes the noise and refines its output into a trajectory.

    This technique, known as Diffusion Policy, was previously introduced by researchers at MIT, Columbia University, and the Toyota Research Institute. PoCo builds off this Diffusion Policy work.

    The team trains each diffusion model with a different type of dataset, such as one with human video demonstrations and another gleaned from teleoperation of a robotic arm.

    Then the researchers perform a weighted combination of the individual policies learned by all the diffusion models, iteratively refining the output so the combined policy satisfies the objectives of each individual policy.

    Greater than the sum of its parts

    “One of the benefits of this approach is that we can combine policies to get the best of both worlds. For instance, a policy trained on real-world data might be able to achieve more dexterity, while a policy trained on simulation might be able to achieve more generalization,” Wang says.
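    That weighted, step-by-step combination can be pictured as mixing the outputs of several denoisers. The sketch below illustrates the idea, assuming each trained policy is a callable that predicts the noise to remove from a candidate trajectory at a given refinement step; the dummy policies, weights, and simplified update rule are placeholders for illustration, not the released PoCo code.

```python
import torch

def compose_policies(policies, weights, observation, horizon=16, action_dim=7, steps=50):
    """Denoise a trajectory while mixing the noise predictions of several policies."""
    trajectory = torch.randn(horizon, action_dim)  # start from pure noise
    for t in reversed(range(steps)):
        # Weighted combination of each policy's noise estimate at this step,
        # so the refined trajectory satisfies every policy's objective.
        noise = sum(w * policy(trajectory, t, observation)
                    for policy, w in zip(policies, weights))
        trajectory = trajectory - 0.1 * noise  # simplified update; real samplers follow a noise schedule
    return trajectory

# Dummy stand-ins for a simulation-trained and a real-world-trained policy:
sim_policy = lambda traj, t, obs: torch.randn_like(traj)
real_policy = lambda traj, t, obs: torch.randn_like(traj)
plan = compose_policies([sim_policy, real_policy], weights=[0.5, 0.5], observation=None)
```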

    With policy composition, researchers are able to combine datasets from multiple sources so they can teach a robot to effectively use a wide range of tools, like a hammer, screwdriver, or this spatula. Image: Courtesy of the researchers

    Because the policies are trained separately, one could mix and match diffusion policies to achieve better results for a certain task. A user could also add data in a new modality or domain by training an additional Diffusion Policy with that dataset, rather than starting the entire process from scratch.

    The policy composition technique the researchers developed can be used to effectively teach a robot to use tools even when objects are placed around it to try and distract it from its task, as seen here. Image: Courtesy of the researchers

    The researchers tested PoCo in simulation and on real robotic arms that performed a variety of tool-use tasks, such as using a hammer to pound a nail and flipping an object with a spatula. PoCo led to a 20 percent improvement in task performance compared to baseline methods.

    “The striking thing was that when we finished tuning and visualized it, we can clearly see that the composed trajectory looks much better than either one of them individually,” Wang says.

    In the future, the researchers want to apply this technique to long-horizon tasks where a robot would pick up one tool, use it, then switch to another tool. They also want to incorporate larger robotics datasets to improve performance.

    “We will need all three kinds of data to succeed for robotics: internet data, simulation data, and real robot data. How to combine them effectively will be the million-dollar question. PoCo is a solid step on the right track,” says Jim Fan, senior research scientist at NVIDIA and leader of the AI Agents Initiative, who was not involved with this work.

    This research is funded, in part, by Amazon, the Singapore Defense Science and Technology Agency, the U.S. National Science Foundation, and the Toyota Research Institute.


    Looking for a specific action in a video? This AI-based method can find it for you

    The internet is awash in instructional videos that can teach curious viewers everything from cooking the perfect pancake to performing a life-saving Heimlich maneuver.

    But pinpointing when and where a particular action happens in a long video can be tedious. To streamline the process, scientists are trying to teach computers to perform this task. Ideally, a user could just describe the action they’re looking for, and an AI model would skip to its location in the video.

    However, teaching machine-learning models to do this usually requires a great deal of expensive video data that have been painstakingly hand-labeled.

    A new, more efficient approach from researchers at MIT and the MIT-IBM Watson AI Lab trains a model to perform this task, known as spatio-temporal grounding, using only videos and their automatically generated transcripts.

    The researchers teach a model to understand an unlabeled video in two distinct ways: by looking at small details to figure out where objects are located (spatial information) and looking at the bigger picture to understand when the action occurs (temporal information).

    Compared to other AI approaches, their method more accurately identifies actions in longer videos with multiple activities. Interestingly, they found that simultaneously training on spatial and temporal information makes a model better at identifying each individually.

    In addition to streamlining online learning and virtual training processes, this technique could also be useful in health care settings by rapidly finding key moments in videos of diagnostic procedures, for example.

    “We disentangle the challenge of trying to encode spatial and temporal information all at once and instead think about it like two experts working on their own, which turns out to be a more explicit way to encode the information. Our model, which combines these two separate branches, leads to the best performance,” says Brian Chen, lead author of a paper on this technique.

    Chen, a 2023 graduate of Columbia University who conducted this research while a visiting student at the MIT-IBM Watson AI Lab, is joined on the paper by James Glass, senior research scientist, member of the MIT-IBM Watson AI Lab, and head of the Spoken Language Systems Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL); Hilde Kuehne, a member of the MIT-IBM Watson AI Lab who is also affiliated with Goethe University Frankfurt; and others at MIT, Goethe University, the MIT-IBM Watson AI Lab, and Quality Match GmbH. The research will be presented at the Conference on Computer Vision and Pattern Recognition.

    Global and local learning

    Researchers usually teach models to perform spatio-temporal grounding using videos in which humans have annotated the start and end times of particular tasks.

    Not only is generating these data expensive, but it can be difficult for humans to figure out exactly what to label. If the action is “cooking a pancake,” does that action start when the chef begins mixing the batter or when she pours it into the pan?

    “This time, the task may be about cooking, but next time, it might be about fixing a car. There are so many different domains for people to annotate. But if we can learn everything without labels, it is a more general solution,” Chen says.

    For their approach, the researchers use unlabeled instructional videos and accompanying text transcripts from a website like YouTube as training data. These don’t need any special preparation.

    They split the training process into two pieces. For one, they teach a machine-learning model to look at the entire video to understand what actions happen at certain times. This high-level information is called a global representation.

    For the second, they teach the model to focus on a specific region in parts of the video where action is happening. In a large kitchen, for instance, the model might only need to focus on the wooden spoon a chef is using to mix pancake batter, rather than the entire counter. This fine-grained information is called a local representation.
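    As a rough illustration of those two branches, the sketch below aligns a clip-level video embedding with transcript sentences (the global, temporal expert) and attention-pools region features toward the text (the local, spatial expert). The encoders, feature shapes, and contrastive loss are generic assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE: matching video/text pairs score higher than mismatches."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(len(a))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

text = torch.randn(8, 256)  # embeddings of 8 transcript sentences

# Global branch (temporal): one pooled embedding per clip, aligned with the
# sentence narrated at that time.
video_global = torch.randn(8, 256)
loss_global = contrastive_loss(video_global, text)

# Local branch (spatial): per-region features, attention-pooled toward the text
# so the model learns where in the frame the action happens.
video_regions = torch.randn(8, 49, 256)  # a 7x7 grid of region features per clip
attn = torch.softmax(torch.einsum("bd,brd->br", text, video_regions), dim=-1)
video_local = torch.einsum("br,brd->bd", attn, video_regions)
loss_local = contrastive_loss(video_local, text)

loss = loss_global + loss_local  # train both "experts" jointly
```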
    The researchers incorporate an additional component into their framework to mitigate misalignments that occur between narration and video. Perhaps the chef talks about cooking the pancake first and performs the action later.

    To develop a more realistic solution, the researchers focused on uncut videos that are several minutes long. In contrast, most AI techniques train using few-second clips that someone trimmed to show only one action.

    A new benchmark

    But when they came to evaluate their approach, the researchers couldn’t find an effective benchmark for testing a model on these longer, uncut videos — so they created one.

    To build their benchmark dataset, the researchers devised a new annotation technique that works well for identifying multistep actions. They had users mark the intersection of objects, like the point where a knife edge cuts a tomato, rather than drawing a box around important objects.

    “This is more clearly defined and speeds up the annotation process, which reduces the human labor and cost,” Chen says.

    Plus, having multiple people do point annotation on the same video can better capture actions that occur over time, like the flow of milk being poured. All annotators won’t mark the exact same point in the flow of liquid.

    When they used this benchmark to test their approach, the researchers found that it was more accurate at pinpointing actions than other AI techniques.

    Their method was also better at focusing on human-object interactions. For instance, if the action is “serving a pancake,” many other approaches might focus only on key objects, like a stack of pancakes sitting on a counter. Instead, their method focuses on the actual moment when the chef flips a pancake onto a plate.

    Next, the researchers plan to enhance their approach so models can automatically detect when text and narration are not aligned, and switch focus from one modality to the other. They also want to extend their framework to audio data, since there are usually strong correlations between actions and the sounds objects make.

    “AI research has made incredible progress towards creating models like ChatGPT that understand images. But our progress on understanding video is far behind. This work represents a significant step forward in that direction,” says Kate Saenko, a professor in the Department of Computer Science at Boston University who was not involved with this work.

    This research is funded, in part, by the MIT-IBM Watson AI Lab.


    A community collaboration for progress

    While decades of discriminatory policies and practices continue to fuel the affordable housing crisis in the United States, less than three miles from the MIT campus exists a beacon of innovation and community empowerment.

    “We are very proud to continue MIT’s long-standing partnership with Camfield Estates,” says Catherine D’Ignazio, associate professor of urban science and planning. “Camfield has long been an incubator of creative ideas focused on uplifting their community.”

    D’Ignazio co-leads a research team focused on housing as part of the MIT Initiative on Combatting Systemic Racism (ICSR), led by the Institute for Data, Systems, and Society (IDSS). The group researches the uneven impacts of data, AI, and algorithmic systems on housing in the United States, as well as ways that these same tools could be used to address racial disparities. The Camfield Tenants Association is a research partner providing insight into the issue and relevant data, as well as opportunities for MIT researchers to solve real challenges and make a local impact.


    MIT Initiative on Combatting Systemic Racism – Housing. Video: MIT Sociotechnical Systems Research Center

    Formerly known as “Camfield Gardens,” the 102-unit housing development in Roxbury, Massachusetts, was among the pioneering sites in the 1990s to engage in the U.S. Department of Housing and Urban Development’s (HUD) program aimed at revitalizing severely distressed public housing across the country. This also served as the catalyst for their collaboration with MIT, which began in the early 2000s.

    “The program gave Camfield the money and energy to tear everything on the site down and build it back up anew, in addition to allowing them to buy the property from the city for $1 and take full ownership of the site,” explains Nolen Scruggs, a master’s student in the MIT Department of Urban Studies and Planning (DUSP) who has worked with Camfield over the past few years as part of ICSR’s housing vertical team. “At the time, MIT graduate students helped start a ‘digital divide’ bridge gap program that later evolved into the tech lab that is still there today, continuing to enable residents to learn computer skills and things they might need to get a hand up.”

    Because of that early collaboration, Camfield Estates reached out to MIT in 2022 to start a new chapter of collaboration with students. Scruggs spent a few months building a team of students from Harvard University, Wentworth Institute of Technology, and MIT to work on a housing design project meant to help the Camfield Tenants Association prepare for their looming redevelopment needs.

    “One of the things that’s been really important to the work of the ICSR housing vertical is historical context,” says Peko Hosoi, a professor of mechanical engineering and mathematics who co-leads the ICSR housing vertical with D’Ignazio. “We didn’t get to the place we are right now with housing in an instant. There’s a lot of things that have happened in the U.S. like redlining, predatory lending, and different ways of investing in infrastructure that add important contexts.”

    “Quantitative methods are a great way to look across macroscale phenomena, but our team recognizes and values qualitative and participatory methods as well, to get a more grounded picture of what community needs really are and what kinds of innovations can bubble up from communities themselves,” D’Ignazio adds. “This is where the partnership with Camfield Estates comes in, which Nolen has been leading.”

    Finding creative solutions

    Before coming to MIT, Scruggs, a proud New Yorker, worked on housing issues while interning for his local congressperson, House Minority Leader Hakeem Jeffries. He called residents to discuss their housing concerns, learning about the affordability issues that were making it hard for lower- and middle-income families to find places to live.

    “Having this behind-the-scenes experience set the stage for my involvement in Camfield,” Scruggs says, recalling his start at Camfield conducting participatory action research, meeting with Camfield seniors to discuss and capture their concerns.

    Scruggs says the biggest issue they have been trying to tackle with Camfield is twofold: creating more space for new residents while also helping current residents achieve their end goal of homeownership.

    “This speaks to some of the larger issues our group at ICSR is working on in terms of housing affordability,” he says.
    “With Camfield it is looking at where can people with Section 8 vouchers move, what limits do they have, and what barriers do they face — whether it’s through big tech systems, or individual preferences coming from landlords.”

    Scruggs adds, “The discrimination those people face while trying to find a house, lock it down, talk to a bank, etc. — it can be very, very difficult and discouraging.”

    Scruggs says one attempt to combat this issue would be through hiring a caseworker to assist people through the process — one of many ideas that came from a Camfield collaboration with the FHLBank Affordable Housing Development Competition.

    As part of the competition, the goal for Scruggs’s team was to help Camfield tenants understand all of their options and their potential trade-offs, so that in the end they can make informed decisions about what they want to do with their space.

    “So often redevelopment schemes don’t ensure people can come back,” Scruggs says. “There are specific design proposals being made to ensure that the structure of people’s lifestyles wouldn’t be disrupted.”

    Scruggs says that tentative recommendations discussed with tenant association president Paulette Ford include replacing the community center with a high-rise development that would increase the number of units available.

    “I think they are thinking really creatively about their options,” Hosoi says. “Paulette Ford, and her mother before her, have always referred to Camfield as a ‘hand up,’ with the idea that people come to Camfield to live until they can afford a home of their own locally.”

    Scruggs’s other partnership with Camfield involves working with MIT undergraduate Amelie Nagle as part of the Undergraduate Research Opportunities Program to create programming that will teach computer design and coding to Camfield community kids — in the very tech lab that goes back to MIT and Camfield’s first collaboration.

    “Nolen has a real commitment to community-led knowledge production,” says D’Ignazio. “It has been a pleasure to work with him and see how he takes all his urban planning skills (GIS, mapping, urban design, photography, and more) to work in respectful ways that foreground community innovation.”

    She adds: “We are hopeful that the process will yield some high-quality architectural and planning ideas, and help Camfield take the next step towards realizing their innovative vision.”


    From steel engineering to ovarian tumor research

    Ashutosh Kumar is a classically trained materials engineer. Having grown up with a passion for making things, he has explored steel design and studied stress fractures in alloys.

    Throughout Kumar’s education, however, he was also drawn to biology and medicine. When he was accepted into an undergraduate metallurgical engineering and materials science program at the Indian Institute of Technology (IIT) Bombay, the native of Jamshedpur was very excited — and “a little dissatisfied, since I couldn’t do biology anymore.”

    Now a PhD candidate and a MathWorks Fellow in MIT’s Department of Materials Science and Engineering, Kumar can merge his wide-ranging interests. He studies the effects of certain bacteria that have been observed to encourage the spread of ovarian cancer and possibly reduce the effectiveness of chemotherapy and immunotherapy.

    “Some microbes have an affinity toward infecting ovarian cancer cells, which can lead to changes in the cellular structure and reprogramming cells to survive in stressful conditions,” Kumar says. “This means that cells can migrate to different sites and may have a mechanism to develop chemoresistance. This opens an avenue to develop therapies to see if we can start to undo some of these changes.”

    Kumar’s research combines microbiology, bioengineering, artificial intelligence, big data, and materials science. Using microbiome sequencing and AI, he aims to define microbiome changes that may correlate with poor patient outcomes. Ultimately, his goal is to engineer bacteriophage viruses to reprogram bacteria to work therapeutically.

    Kumar started inching toward work in the health sciences just months into earning his bachelor’s degree at IIT Bombay.

    “I realized engineering is so flexible that its applications extend to any field,” he says, adding that he started working with biomaterials “to respect both my degree program and my interests.”

    “I loved it so much that I decided to go to graduate school,” he adds.

    Starting his PhD program at MIT, he says, “was a fantastic opportunity to switch gears and work on more interdisciplinary or ‘MIT-type’ work.”

    Kumar says he and Angela Belcher, the James Mason Crafts Professor of biological engineering and materials science, began discussing the impact of the microbiome on ovarian cancer when he first arrived at MIT.

    “I shared my enthusiasm about human health and biology, and we started brainstorming,” he says. “We realized that there’s an unmet need to understand a lot of gynecological cancers. Ovarian cancer is an aggressive cancer, which is usually diagnosed when it’s too late and has already spread.”

    In 2022, Kumar was awarded a MathWorks Fellowship. The fellowships are awarded to School of Engineering graduate students, preferably those who use MATLAB or Simulink — which were developed by the mathematical computer software company MathWorks — in their research. The philanthropic support fueled Kumar’s full transition into health science research.

    “The work we are doing now was initially not funded by traditional sources, and the MathWorks Fellowship gave us the flexibility to pursue this field,” Kumar says. “It provided me with opportunities to learn new skills and ask questions about this topic. MathWorks gave me a chance to explore my interests and helped me navigate from being a steel engineer to a cancer scientist.”
    Kumar’s work on the relationship between bacteria and ovarian cancer started with studying which bacteria are incorporated into tumors in mouse models.

    “We started looking closely at changes in cell structure and how those changes impact cancer progression,” he says, adding that MATLAB image processing helps him and his collaborators track tumor metastasis.

    The research team also uses RNA sequencing and MATLAB algorithms to construct a taxonomy of the bacteria.

    “Once we have identified the microbiome composition,” Kumar says, “we want to see how the microbiome changes as cancer progresses and identify changes in, let’s say, patients who develop chemoresistance.”

    He says recent findings that ovarian cancer may originate in the fallopian tubes are promising because detecting cancer-related biomarkers or lesions before cancer spreads to the ovaries could lead to better prognoses.

    As he pursues his research, Kumar says he is extremely thankful to Belcher “for believing in me to work on this project.

    “She trusted me and my passion for making an impact on human health — even though I come from a materials engineering background — and supported me throughout. It was her passion to take on new challenges that made it possible for me to work on this idea. She has been an amazing mentor and motivated me to continue moving forward.”

    For her part, Belcher is equally enthralled.

    “It has been amazing to work with Ashutosh on this ovarian cancer microbiome project,” she says. “He has been so passionate and dedicated to looking for less-conventional approaches to solve this debilitating disease. His innovations around looking for very early changes in the microenvironment of this disease could be critical in interception and prevention of ovarian cancer. We started this project with very little preliminary data, so his MathWorks fellowship was critical in the initiation of the project.”

    Kumar, who has been very active in student government and community-building activities, believes it is very important for students to feel included and at home at their institutions so they can develop in ways outside of academics. He says that his own involvement helps him take time off from work.

    “Science can never stop, and there will always be something to do,” he says, explaining that he deliberately schedules time off and that social engagement helps him to experience downtime. “Engaging with community members through events on campus or at the dorm helps set a mental boundary with work.”

    Regarding his unusual route through materials science to cancer research, Kumar regards it as something that occurred organically.

    “I have observed that life is very dynamic,” he says. “What we think we might do versus what we end up doing is never consistent. Five years back, I had no idea I would be at MIT working with such excellent scientific mentors around me.”


    Exploring the mysterious alphabet of sperm whales

    The allure of whales has stoked human consciousness for millennia, casting these ocean giants as enigmatic residents of the deep seas. From the biblical Leviathan to Herman Melville’s formidable Moby Dick, whales have been central to mythologies and folklore. And while cetology, or whale science, has improved our knowledge of these marine mammals in the past century in particular, studying whales has remained a formidable challenge.

    Now, thanks to machine learning, we’re a little closer to understanding these gentle giants. Researchers from the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and Project CETI (Cetacean Translation Initiative) recently used algorithms to decode the “sperm whale phonetic alphabet,” revealing sophisticated structures in sperm whale communication akin to human phonetics and communication systems in other animal species. In a new open-access study published in Nature Communications, the research shows that sperm whale codas, or short bursts of clicks that they use to communicate, vary significantly in structure depending on the conversational context, revealing a communication system far more intricate than previously understood.


    The Secret Language of Sperm Whales, Decoded. Video: MIT CSAIL

    Nine thousand codas, collected from Eastern Caribbean sperm whale families observed by the Dominica Sperm Whale Project, proved an instrumental starting point in uncovering the creatures’ complex communication system. Alongside the data gold mine, the team used a mix of algorithms for pattern recognition and classification, as well as on-body recording equipment. It turned out that sperm whale communications were indeed not random or simplistic, but rather structured in a complex, combinatorial manner.

    The researchers identified something of a “sperm whale phonetic alphabet,” where various elements that researchers call “rhythm,” “tempo,” “rubato,” and “ornamentation” interplay to form a vast array of distinguishable codas. For example, the whales would systematically modulate certain aspects of their codas based on the conversational context, such as smoothly varying the duration of the calls — rubato — or adding extra ornamental clicks. But even more remarkably, they found that the basic building blocks of these codas could be combined in a combinatorial fashion, allowing the whales to construct a vast repertoire of distinct vocalizations.
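    As a rough sketch of how such features could be computed from raw click times, the snippet below treats tempo as total coda duration, rhythm as the duration-normalized pattern of inter-click intervals, and ornamentation as an extra appended click. These simplified definitions are assumptions for illustration, not the study's exact measures.

```python
def coda_features(click_times, base_length=5):
    """Map a coda (list of click times in seconds) to simplified named features."""
    icis = [b - a for a, b in zip(click_times, click_times[1:])]  # inter-click intervals
    tempo = click_times[-1] - click_times[0]                      # total duration
    rhythm = tuple(round(i / tempo, 2) for i in icis)             # normalized click pattern
    ornamentation = len(click_times) > base_length                # extra click appended?
    return {"tempo": tempo, "rhythm": rhythm, "ornamentation": ornamentation}

# Two codas with the same rhythm but different tempo; "rubato" would be the
# smooth drift of tempo across consecutive codas in an exchange.
print(coda_features([0.0, 0.2, 0.4, 0.6, 1.0]))
print(coda_features([0.0, 0.3, 0.6, 0.9, 1.5]))
```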
    The experiments were conducted using acoustic bio-logging tags (specifically something called “D-tags”) deployed on whales from the Eastern Caribbean clan. These tags captured the intricate details of the whales’ vocal patterns. By developing new visualization and data analysis techniques, the CSAIL researchers found that individual sperm whales could emit various coda patterns in long exchanges, not just repeats of the same coda. These patterns, they say, are nuanced, and include fine-grained variations that other whales also produce and recognize.

    “We are venturing into the unknown, to decipher the mysteries of sperm whale communication without any pre-existing ground truth data,” says Daniela Rus, CSAIL director and professor of electrical engineering and computer science (EECS) at MIT. “Using machine learning is important for identifying the features of their communications and predicting what they say next. Our findings indicate the presence of structured information content and also challenge the prevailing belief among many linguists that complex communication is unique to humans. This is a step toward showing that other species have levels of communication complexity that have not been identified so far, deeply connected to behavior. Our next steps aim to decipher the meaning behind these communications and explore the societal-level correlations between what is being said and group actions.”

    Whaling around

    Sperm whales have the largest brains among all known animals. This is accompanied by very complex social behaviors between families and cultural groups, necessitating strong communication for coordination, especially in pressurized environments like deep-sea hunting.

    Whales owe much to Roger Payne, former Project CETI advisor, whale biologist, conservationist, and MacArthur Fellow, who was a major figure in elucidating their musical careers. In the noted 1971 Science article “Songs of Humpback Whales,” Payne documented how whales can sing. His work later catalyzed the “Save the Whales” movement, a successful and timely conservation initiative.

    “Roger’s research highlights the impact science can have on society. His finding that whales sing led to the Marine Mammal Protection Act and helped save several whale species from extinction. This interdisciplinary research now brings us one step closer to knowing what sperm whales are saying,” says David Gruber, lead and founder of Project CETI and distinguished professor of biology at the City University of New York.

    Today, CETI’s upcoming research aims to discern whether elements like rhythm, tempo, ornamentation, and rubato carry specific communicative intents, potentially providing insights into the “duality of patterning” — a linguistic phenomenon where simple elements combine to convey complex meanings, previously thought unique to human language.

    Aliens among us

    “One of the intriguing aspects of our research is that it parallels the hypothetical scenario of contacting alien species. It’s about understanding a species with a completely different environment and communication protocols, where their interactions are distinctly different from human norms,” says Pratyusha Sharma, an MIT PhD student in EECS, CSAIL affiliate, and the study’s lead author. “We’re exploring how to interpret the basic units of meaning in their communication. This isn’t just about teaching animals a subset of human language, but decoding a naturally evolved communication system within their unique biological and environmental constraints. Essentially, our work could lay the groundwork for deciphering how an ‘alien civilization’ might communicate, providing insights into creating algorithms or systems to understand entirely unfamiliar forms of communication.”

    “Many animal species have repertoires of several distinct signals, but we are only beginning to uncover the extent to which they combine these signals to create new messages,” says Robert Seyfarth, a University of Pennsylvania professor emeritus of psychology who was not involved in the research. “Scientists are particularly interested in whether signal combinations vary according to the social or ecological context in which they are given, and the extent to which signal combinations follow discernible ‘rules’ that are recognized by listeners. The problem is particularly challenging in the case of marine mammals, because scientists usually cannot see their subjects or identify in complete detail the context of communication. Nonetheless, this paper offers new, tantalizing details of call combinations and the rules that underlie them in sperm whales.”

    Joining Sharma, Rus, and Gruber are two others from MIT, both CSAIL principal investigators and professors in EECS: Jacob Andreas and Antonio Torralba. They join Shane Gero, biology lead at CETI, founder of the Dominica Sperm Whale Project, and scientist-in-residence at Carleton University. The paper was funded by Project CETI via Dalio Philanthropies and OceanX, Sea Grape Foundation, Rosamund Zander/Hansjorg Wyss, and Chris Anderson/Jacqueline Novogratz through The Audacious Project, a collaborative funding initiative housed at TED, with further support from the J.H. and E.V. Wade Fund at MIT.


    Fostering research, careers, and community in materials science

    Gabrielle Wood, a junior at Howard University majoring in chemical engineering, is on a mission to improve the sustainability and life cycles of natural resources and materials. Her work in the Materials Initiative for Comprehensive Research Opportunity (MICRO) program has given her hands-on experience with many different aspects of research, including MATLAB programming, experimental design, data analysis, figure-making, and scientific writing.

    Wood is also one of 10 undergraduates from 10 universities around the United States to participate in the first MICRO Summit earlier this year. The internship program, developed by the MIT Department of Materials Science and Engineering (DMSE), first launched in fall 2021. Now in its third year, the program continues to grow, providing even more opportunities for non-MIT undergraduate students — including the MICRO Summit and the program’s expansion to include Northwestern University.

    “I think one of the most valuable aspects of the MICRO program is the ability to do research long term with an experienced professor in materials science and engineering,” says Wood. “My school has limited opportunities for undergraduate research in sustainable polymers, so the MICRO program allowed me to gain valuable experience in this field, which I would not otherwise have.”

    Like Wood, Griheydi Garcia, a senior chemistry major at Manhattan College, values the exposure to materials science, especially since she is not able to learn as much about it at her home institution.

    “I learned a lot about crystallography and defects in materials through the MICRO curriculum, especially through videos,” says Garcia. “The research itself is very valuable, as well, because we get to apply what we’ve learned through the videos in the research we do remotely.”

    Expanding research opportunities

    From the beginning, the MICRO program was designed as a fully remote, rigorous education and mentoring program targeted toward students from underserved backgrounds interested in pursuing graduate school in materials science or related fields. Interns are matched with faculty to work on their specific research interests.

    Jessica Sandland ’99, PhD ’05, principal lecturer in DMSE and co-founder of MICRO, says that research projects for the interns are designed to be work that they can do remotely, such as developing a machine-learning algorithm or a data analysis approach.

    “It’s important to note that it’s not just about what the program and faculty are bringing to the student interns,” says Sandland, a member of the MIT Digital Learning Lab, a joint program between MIT Open Learning and the Institute’s academic departments. “The students are doing real research and work, and creating things of real value. It’s very much an exchange.”

    Cécile Chazot PhD ’22, now an assistant professor of materials science and engineering at Northwestern University, had helped to establish MICRO at MIT from the very beginning. Once at Northwestern, she quickly realized that expanding MICRO to Northwestern would offer even more research opportunities to interns than relying on MIT alone — leveraging the university’s strong materials science and engineering department, as well as offering resources for biomaterials research through Northwestern’s medical school. The program received funding from 3M and officially launched at Northwestern in fall 2023. Approximately half of the MICRO interns are now in the program with MIT and half are with Northwestern.
    Wood and Garcia both participate in the program via Northwestern.

    “By expanding to another school, we’ve been able to have interns work with a much broader range of research projects,” says Chazot. “It has become easier for us to place students with faculty and research that match their interests.”

    Building community

    The MICRO program received a Higher Education Innovation grant from the Abdul Latif Jameel World Education Lab, part of MIT Open Learning, to develop an in-person summit. In January 2024, interns visited MIT for three days of presentations, workshops, and campus tours — including a tour of the MIT.nano building — as well as various community-building activities.

    “A big part of MICRO is the community,” says Chazot. “A highlight of the summit was just seeing the students come together.”

    The summit also included panel discussions that allowed interns to gain insights and advice from graduate students and professionals. The graduate panel discussion included MIT graduate students Sam Figueroa (mechanical engineering), Isabella Caruso (DMSE), and Eliana Feygin (DMSE). The career panel was led by Chazot and included Jatin Patil PhD ’23, head of product at SiTration; Maureen Reitman ’90, ScD ’93, group vice president and principal engineer at Exponent; Lucas Caretta PhD ’19, assistant professor of engineering at Brown University; Raquel D’Oyen ’90, who holds a PhD from Northwestern University and is a senior engineer at Raytheon; and Ashley Kaiser MS ’19, PhD ’21, senior process engineer at 6K.

    Students also had an opportunity to share their work with each other through research presentations. Their presentations covered a wide range of topics, including: developing a computer program to calculate solubility parameters for polymers used in textile manufacturing; performing a life-cycle analysis of a photonic chip and evaluating its environmental impact in comparison to a standard silicon microchip; and applying machine learning algorithms to scanning transmission electron microscopy images of CrSBr, a two-dimensional magnetic material.

    “The summit was wonderful and the best academic experience I have had as a first-year college student,” says MICRO intern Gabriella La Cour, who is pursuing a major in chemistry and dual degree biomedical engineering at Spelman College and participates in MICRO through MIT. “I got to meet so many students who were all in grades above me … and I learned a little about how to navigate college as an upperclassman.”

    “I actually have an extremely close friendship with one of the students, and we keep in touch regularly,” adds La Cour. “Professor Chazot gave valuable advice about applications and recommendation letters that will be useful when I apply to REUs [Research Experiences for Undergraduates] and graduate schools.”

    Looking to the future, MICRO organizers hope to continue to grow the program’s reach.

    “We would love to see other schools taking on this model,” says Sandland. “There are a lot of opportunities out there. The more departments, research groups, and mentors that get involved with this program, the more impact it can have.”


    An AI dataset carves new paths to tornado detection

    The return of spring in the Northern Hemisphere touches off tornado season. A tornado’s twisting funnel of dust and debris seems an unmistakable sight. But that sight can be obscured to radar, the tool of meteorologists. It’s hard to know exactly when a tornado has formed, or even why.

    A new dataset could hold answers. It contains radar returns from thousands of tornadoes that have hit the United States in the past 10 years. Storms that spawned tornadoes are flanked in the dataset by other severe storms, some with nearly identical conditions, that never did. MIT Lincoln Laboratory researchers who curated the dataset, called TorNet, have now released it open source. They hope to enable breakthroughs in detecting one of nature’s most mysterious and violent phenomena.

    “A lot of progress is driven by easily available, benchmark datasets. We hope TorNet will lay a foundation for machine learning algorithms to both detect and predict tornadoes,” says Mark Veillette, the project’s co-principal investigator with James Kurdzo. Both researchers work in the Air Traffic Control Systems Group. 

    Along with the dataset, the team is releasing models trained on it. The models show promise for machine learning’s ability to spot a twister. Building on this work could open new frontiers for forecasters, helping them provide more accurate warnings that might save lives. 

    Swirling uncertainty

    About 1,200 tornadoes occur in the United States every year, causing millions to billions of dollars in economic damage and claiming 71 lives on average. Last year, one unusually long-lasting tornado killed 17 people and injured at least 165 others along a 59-mile path in Mississippi.  

    Yet tornadoes are notoriously difficult to forecast because scientists don’t have a clear picture of why they form. “We can see two storms that look identical, and one will produce a tornado and one won’t. We don’t fully understand it,” Kurdzo says.

    A tornado’s basic ingredients are thunderstorms with instability caused by rapidly rising warm air and wind shear that causes rotation. Weather radar is the primary tool used to monitor these conditions. But tornadoes lie too low to be detected, even when moderately close to the radar. As the radar beam with a given tilt angle travels further from the antenna, it gets higher above the ground, mostly seeing reflections from rain and hail carried in the “mesocyclone,” the storm’s broad, rotating updraft. A mesocyclone doesn’t always produce a tornado.
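    The geometry behind that blind spot can be made concrete with the standard 4/3-effective-earth-radius approximation for the height of the beam center above the ground; the snippet below is illustrative, using a 0.5-degree tilt as an example.

```python
import math

def beam_height_km(range_km, elevation_deg, antenna_height_km=0.0):
    """Approximate beam-center height using the 4/3 effective earth radius model."""
    re = 6371.0 * 4.0 / 3.0  # effective earth radius, km
    theta = math.radians(elevation_deg)
    return (math.sqrt(range_km**2 + re**2 + 2.0 * range_km * re * math.sin(theta))
            - re + antenna_height_km)

# At the lowest standard tilt (0.5 degrees), the beam center is already about
# 1.9 km above the ground at 120 km from the radar, far above where tornadoes form.
print(round(beam_height_km(120, 0.5), 2))
```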

    With this limited view, forecasters must decide whether or not to issue a tornado warning. They often err on the side of caution. As a result, the rate of false alarms for tornado warnings is more than 70 percent. “That can lead to boy-who-cried-wolf syndrome,” Kurdzo says.  

    In recent years, researchers have turned to machine learning to better detect and predict tornadoes. However, raw datasets and models have not always been accessible to the broader community, stifling progress. TorNet is filling this gap.

    The dataset contains more than 200,000 radar images, 13,587 of which depict tornadoes. The rest of the images are non-tornadic, taken from storms in one of two categories: randomly selected severe storms or false-alarm storms (those that led a forecaster to issue a warning but that didn’t produce a tornado).

    Each sample of a storm or tornado comprises two sets of six radar images. The two sets correspond to different radar sweep angles. The six images portray different radar data products, such as reflectivity (showing precipitation intensity) or radial velocity (indicating if winds are moving toward or away from the radar).
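    Based on that description, a single sample can be pictured as a stack of twelve images: six products at each of two sweep angles. The layout below is one hypothetical in-memory representation; the product names and array shapes are assumptions, not the released TorNet schema.

```python
import numpy as np

# Radar products named in, or consistent with, the article; shapes are illustrative.
PRODUCTS = ["reflectivity", "radial_velocity", "spectrum_width",
            "differential_reflectivity", "correlation_coefficient", "kdp"]

def make_sample(height=120, width=240, tornadic=False):
    """One storm sample: 2 sweep angles x 6 products, plus a binary label."""
    frames = np.stack([
        np.random.rand(height, width).astype(np.float32)  # placeholder pixel data
        for _sweep in range(2) for _product in PRODUCTS
    ])  # shape: (12, height, width)
    label = 1 if tornadic else 0  # 1 if a tornado was confirmed
    return frames, label
```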

    A challenge in curating the dataset was first finding tornadoes. Within the corpus of weather radar data, tornadoes are extremely rare events. The team then had to balance those tornado samples with difficult non-tornado samples. If the dataset were too easy, say by comparing tornadoes to snowstorms, an algorithm trained on the data would likely over-classify storms as tornadic.

    “What’s beautiful about a true benchmark dataset is that we’re all working with the same data, with the same level of difficulty, and can compare results,” Veillette says. “It also makes meteorology more accessible to data scientists, and vice versa. It becomes easier for these two parties to work on a common problem.”

    Both researchers represent the progress that can come from cross-collaboration. Veillette is a mathematician and algorithm developer who has long been fascinated by tornadoes. Kurdzo is a meteorologist by training and a signal processing expert. In grad school, he chased tornadoes with custom-built mobile radars, collecting data to analyze in new ways.

    “This dataset also means that a grad student doesn’t have to spend a year or two building a dataset. They can jump right into their research,” Kurdzo says.

    This project was funded by Lincoln Laboratory’s Climate Change Initiative, which aims to leverage the laboratory’s diverse technical strengths to help address climate problems threatening human health and global security.

    Chasing answers with deep learning

    Using the dataset, the researchers developed baseline artificial intelligence (AI) models. They were particularly eager to apply deep learning, a form of machine learning that excels at processing visual data. On its own, deep learning can extract features (key observations that an algorithm uses to make a decision) from images across a dataset. Other machine learning approaches require humans to first manually label features. 
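    A baseline in this spirit could be as simple as a small convolutional network mapping the stacked radar channels to a tornado probability. The toy model below is a generic stand-in with arbitrary layer sizes, not the laboratory's released architecture.

```python
import torch
import torch.nn as nn

# Toy detector: 12 input channels (2 sweeps x 6 products) -> tornado probability.
model = nn.Sequential(
    nn.Conv2d(12, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 1),  # logit; apply a sigmoid to get a probability
)

radar = torch.randn(8, 12, 120, 240)   # a batch of 8 storm samples
prob = torch.sigmoid(model(radar))     # shape: (8, 1)
```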

    “We wanted to see if deep learning could rediscover what people normally look for in tornadoes and even identify new things that typically aren’t searched for by forecasters,” Veillette says.

    The results are promising. Their deep learning model performed as well as or better than all tornado-detecting algorithms known in the literature. The trained algorithm correctly classified 50 percent of weaker EF-1 tornadoes and over 85 percent of tornadoes rated EF-2 or higher, which make up the most devastating and costly occurrences of these storms.

    They also evaluated two other types of machine-learning models, and one traditional model to compare against. The source code and parameters of all these models are freely available. The models and dataset are also described in a paper submitted to a journal of the American Meteorological Society (AMS). Veillette presented this work at the AMS Annual Meeting in January.

    “The biggest reason for putting our models out there is for the community to improve upon them and do other great things,” Kurdzo says. “The best solution could be a deep learning model, or someone might find that a non-deep learning model is actually better.”

    TorNet could be useful in the weather community for other uses too, such as conducting large-scale case studies on storms. It could also be augmented with other data sources, like satellite imagery or lightning maps. Fusing multiple types of data could improve the accuracy of machine learning models.

    Taking steps toward operations

    On top of detecting tornadoes, Kurdzo hopes that models might help unravel the science of why they form.

    “As scientists, we see all these precursors to tornadoes — an increase in low-level rotation, a hook echo in reflectivity data, specific differential phase (KDP) foot and differential reflectivity (ZDR) arcs. But how do they all go together? And are there physical manifestations we don’t know about?” he asks.

    Teasing out those answers might be possible with explainable AI. Explainable AI refers to methods that allow a model to provide its reasoning, in a format understandable to humans, of why it came to a certain decision. In this case, these explanations might reveal physical processes that happen before tornadoes. This knowledge could help train forecasters, and models, to recognize the signs sooner. 
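    One common form of explainability is a gradient-based saliency map, which highlights the input pixels that most influenced a model's decision. The sketch below shows that generic method with a toy stand-in detector; the article does not specify which explainability technique the team would use.

```python
import torch
import torch.nn as nn

def saliency_map(model, radar_sample):
    """Gradient of the tornado score w.r.t. input pixels: where did the model look?"""
    x = radar_sample.clone().requires_grad_(True)
    score = model(x.unsqueeze(0)).squeeze()  # scalar tornado logit
    score.backward()
    return x.grad.abs().amax(dim=0)          # max over channels -> (height, width) heatmap

# Toy stand-in; any differentiable detector would work here.
model = nn.Sequential(nn.Conv2d(12, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
heatmap = saliency_map(model, torch.randn(12, 120, 240))
```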

    “None of this technology is ever meant to replace a forecaster. But perhaps someday it could guide forecasters’ eyes in complex situations, and give a visual warning to an area predicted to have tornadic activity,” Kurdzo says.

    Such assistance could be especially useful as radar technology improves and future networks potentially grow denser. Data refresh rates in a next-generation radar network are expected to increase from every five minutes to approximately one minute, perhaps faster than forecasters can interpret the new information. Because deep learning can process huge amounts of data quickly, it could be well-suited for monitoring radar returns in real time, alongside humans. Tornadoes can form and disappear in minutes.

    But the path to an operational algorithm is a long road, especially in safety-critical situations, Veillette says. “I think the forecaster community is still, understandably, skeptical of machine learning. One way to establish trust and transparency is to have public benchmark datasets like this one. It’s a first step.”

    The next steps, the team hopes, will be taken by researchers across the world who are inspired by the dataset and energized to build their own algorithms. Those algorithms will in turn go into test beds, where they’ll eventually be shown to forecasters, to start a process of transitioning into operations.

    In the end, the path could circle back to trust.

    “We may never get more than a 10- to 15-minute tornado warning using these tools. But if we could lower the false-alarm rate, we could start to make headway with public perception,” Kurdzo says. “People are going to use those warnings to take the action they need to save their lives.”


    This tiny chip can safeguard user data while enabling efficient computing on a smartphone

    Health-monitoring apps can help people manage chronic diseases or stay on track with fitness goals, using nothing more than a smartphone. However, these apps can be slow and energy-inefficient because the vast machine-learning models that power them must be shuttled between a smartphone and a central memory server.

    Engineers often speed things up using hardware that reduces the need to move so much data back and forth. While these machine-learning accelerators can streamline computation, they are susceptible to attackers who can steal secret information.

    To reduce this vulnerability, researchers from MIT and the MIT-IBM Watson AI Lab created a machine-learning accelerator that is resistant to the two most common types of attacks. Their chip can keep a user’s health records, financial information, or other sensitive data private while still enabling huge AI models to run efficiently on devices.

    The team developed several optimizations that enable strong security while only slightly slowing the device. Moreover, the added security does not impact the accuracy of computations. This machine-learning accelerator could be particularly beneficial for demanding AI applications like augmented and virtual reality or autonomous driving.

    While implementing the chip would make a device slightly more expensive and less energy-efficient, that is sometimes a worthwhile price to pay for security, says lead author Maitreyi Ashok, an electrical engineering and computer science (EECS) graduate student at MIT.

    “It is important to design with security in mind from the ground up. If you are trying to add even a minimal amount of security after a system has been designed, it is prohibitively expensive. We were able to effectively balance a lot of these tradeoffs during the design phase,” says Ashok.

    Her co-authors include Saurav Maji, an EECS graduate student; Xin Zhang and John Cohn of the MIT-IBM Watson AI Lab; and senior author Anantha Chandrakasan, MIT’s chief innovation and strategy officer, dean of the School of Engineering, and the Vannevar Bush Professor of EECS. The research will be presented at the IEEE Custom Integrated Circuits Conference.

    Side-channel susceptibility

    The researchers targeted a type of machine-learning accelerator called digital in-memory compute. A digital IMC chip performs computations inside a device’s memory, where pieces of a machine-learning model are stored after being moved over from a central server.

    The entire model is too big to store on the device, but by breaking it into pieces and reusing those pieces as much as possible, IMC chips reduce the amount of data that must be moved back and forth.
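    The saving comes from reuse: a piece of the model is fetched once and applied to many inputs before the next piece is loaded. The sketch below illustrates that access pattern in plain NumPy, with hypothetical tile and batch sizes; it is a schematic of the idea, not how an IMC circuit is programmed.

```python
import numpy as np

def tiled_matvec(weights, inputs, tile_rows=64):
    """Apply a large weight matrix to many inputs, one on-chip tile at a time.

    Each tile of `weights` is "fetched" once and reused across every input,
    instead of re-reading the full model for each inference.
    """
    out = np.zeros((inputs.shape[0], weights.shape[0]), dtype=np.float32)
    for start in range(0, weights.shape[0], tile_rows):
        tile = weights[start:start + tile_rows]            # one fetch from off-chip memory
        out[:, start:start + tile_rows] = inputs @ tile.T  # reused for all inputs
    return out

weights = np.random.rand(256, 128).astype(np.float32)  # a model piece (hypothetical sizes)
inputs = np.random.rand(1000, 128).astype(np.float32)  # many activations
result = tiled_matvec(weights, inputs)
```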

    But IMC chips can be susceptible to hackers. In a side-channel attack, a hacker monitors the chip’s power consumption and uses statistical techniques to reverse-engineer data as the chip computes. In a bus-probing attack, the hacker can steal bits of the model and dataset by probing the communication between the accelerator and the off-chip memory.

    Digital IMC speeds computation by performing millions of operations at once, but this complexity makes it tough to prevent attacks using traditional security measures, Ashok says.

    She and her collaborators took a three-pronged approach to blocking side-channel and bus-probing attacks.

    First, they employed a security measure where data in the IMC are split into random pieces. For instance, a bit zero might be split into three bits that still equal zero after a logical operation. The IMC never computes with all pieces in the same operation, so a side-channel attack could never reconstruct the real information.
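    This splitting is essentially secret sharing: the true bit is the exclusive-OR of several random shares, so any strict subset of shares reveals nothing about it. The toy function below is a simplification for illustration, not the chip's circuitry; it also shows why the naive scheme needs fresh random bits, the cost addressed next.

```python
import secrets

def split_bit(bit, n_shares=3):
    """Split one bit into n random shares whose XOR equals the original bit."""
    shares = [secrets.randbits(1) for _ in range(n_shares - 1)]  # fresh random bits
    last = bit
    for s in shares:
        last ^= s  # final share forces the XOR of all shares back to `bit`
    return shares + [last]

shares = split_bit(0)
print(shares, "->", shares[0] ^ shares[1] ^ shares[2])  # XOR of shares recovers 0
```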

    But for this technique to work, random bits must be added to split the data. Because digital IMC performs millions of operations at once, generating so many random bits would involve too much computing. For their chip, the researchers found a way to simplify computations, making it easier to effectively split data while eliminating the need for random bits.

    Second, they prevented bus-probing attacks using a lightweight cipher that encrypts the model stored in off-chip memory. This lightweight cipher only requires simple computations. In addition, pieces of the model stored on the chip are decrypted only when necessary.

    Third, to improve security, they generated the key that decrypts the cipher directly on the chip, rather than moving it back and forth with the model. They generated this unique key from random variations in the chip that are introduced during manufacturing, using what is known as a physically unclonable function.

    “Maybe one wire is going to be a little bit thicker than another. We can use these variations to get zeros and ones out of a circuit. For every chip, we can get a random key that should be consistent because these random properties shouldn’t change significantly over time,” Ashok explains.

    They reused the memory cells on the chip, leveraging the imperfections in these cells to generate the key. This requires less computation than generating a key from scratch.
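    Conceptually, a memory-based physically unclonable function reads cells whose manufacturing variations bias them toward powering up as 0 or 1, then stabilizes the noisy readout. The toy model below uses majority voting over repeated reads; real designs typically use error correction, and the biases here are simulated stand-ins for physical variation.

```python
import random

# Each cell's manufacturing bias: the probability it powers up as 1.
# Fixed at fabrication time, so it stands in for one particular chip.
random.seed(7)
cell_bias = [random.random() for _ in range(128)]

def read_cells(bias):
    """One noisy power-up readout: strongly biased cells are stable, others flip."""
    return [int(random.random() < b) for b in bias]

def derive_key(bias, reads=15):
    """Majority-vote repeated readouts to stabilize the key bits."""
    votes = [sum(bits) for bits in zip(*(read_cells(bias) for _ in range(reads)))]
    return [int(v > reads // 2) for v in votes]

print(derive_key(cell_bias) == derive_key(cell_bias))  # usually True for the same "chip"
```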

    “As security has become a critical issue in the design of edge devices, there is a need to develop a complete system stack focusing on secure operation. This work focuses on security for machine-learning workloads and describes a digital processor that uses cross-cutting optimization. It incorporates encrypted data access between memory and processor, approaches to preventing side-channel attacks using randomization, and exploiting variability to generate unique codes. Such designs are going to be critical in future mobile devices,” says Chandrakasan.

    Safety testing

    To test their chip, the researchers took on the role of hackers and tried to steal secret information using side-channel and bus-probing attacks.

    Even after making millions of attempts, they couldn’t reconstruct any real information or extract pieces of the model or dataset. The cipher also remained unbreakable. By contrast, it took only about 5,000 samples to steal information from an unprotected chip.

    The addition of security did reduce the energy efficiency of the accelerator, and it also required a larger chip area, which would make it more expensive to fabricate.

    The team is planning to explore methods that could reduce the energy consumption and size of their chip in the future, which would make it easier to implement at scale.

    “As it becomes too expensive, it becomes harder to convince someone that security is critical. Future work could explore these tradeoffs. Maybe we could make it a little less secure but easier to implement and less expensive,” Ashok says.

    The research is funded, in part, by the MIT-IBM Watson AI Lab, the National Science Foundation, and a MathWorks Engineering Fellowship.