More stories

  •

    Putting AI into the hands of people with problems to solve

    As Media Lab students in 2010, Karthik Dinakar SM ’12, PhD ’17 and Birago Jones SM ’12 teamed up for a class project to build a tool that would help content moderation teams at companies like Twitter (now X) and YouTube. The project generated a huge amount of excitement, and the researchers were invited to give a demonstration at a cyberbullying summit at the White House — they just had to get the thing working.

    The day before the White House event, Dinakar spent hours trying to put together a working demo that could identify concerning posts on Twitter. Around 11 p.m., he called Jones to say he was giving up.

    Then Jones decided to look at the data. It turned out Dinakar’s model was flagging the right types of posts, but the posters were using teenage slang terms and other indirect language that Dinakar didn’t pick up on. The problem wasn’t the model; it was the disconnect between Dinakar and the teens he was trying to help.

    “We realized then, right before we got to the White House, that the people building these models should not be folks who are just machine-learning engineers,” Dinakar says. “They should be people who best understand their data.”

    The insight led the researchers to develop point-and-click tools that allow nonexperts to build machine-learning models. Those tools became the basis for Pienso, which today is helping people build large language models for detecting misinformation, human trafficking, weapons sales, and more, without writing any code.

    “These kinds of applications are important to us because our roots are in cyberbullying and understanding how to use AI for things that really help humanity,” says Jones.

    As for the early version of the system shown at the White House, the founders ended up collaborating with students at nearby schools in Cambridge, Massachusetts, to let them train the models.

    “The models those kids trained were so much better and nuanced than anything I could’ve ever come up with,” Dinakar says. “Birago and I had this big ‘Aha!’ moment where we realized empowering domain experts — which is different from democratizing AI — was the best path forward.”

    A project with purpose

    Jones and Dinakar met as graduate students in the Software Agents research group of the MIT Media Lab. Their work on what became Pienso started in Course 6.864 (Natural Language Processing) and continued until they earned their master’s degrees in 2012.

    It turned out 2010 wasn’t the last time the founders were invited to the White House to demo their project. The work generated a lot of enthusiasm, but the founders worked on Pienso part time until 2016, when Dinakar finished his PhD at MIT and deep learning began to explode in popularity.

    “We’re still connected to many people around campus,” Dinakar says. “The exposure we had at MIT, the melding of human and computer interfaces, widened our understanding. Our philosophy at Pienso couldn’t be possible without the vibrancy of MIT’s campus.”

    The founders also credit MIT’s Industrial Liaison Program (ILP) and Startup Accelerator (STEX) for connecting them to early partners.

    One early partner was SkyUK. The company’s customer success team used Pienso to build models to understand their customers’ most common problems. Today those models are helping to process half a million customer calls a day, and the founders say they have saved the company over £7 million to date by shortening the length of calls into the company’s call center.

    “The difference between democratizing AI and empowering people with AI comes down to who understands the data best — you or a doctor or a journalist or someone who works with customers every day?” Jones says. “Those are the people who should be creating the models. That’s how you get insights out of your data.”

    In 2020, just as Covid-19 outbreaks began in the U.S., government officials contacted the founders to use their tool to better understand the emerging disease. Pienso helped experts in virology and infectious disease set up machine-learning models to mine thousands of research articles about coronaviruses. Dinakar says they later learned the work helped the government identify and strengthen critical supply chains for drugs, including the popular antiviral remdesivir.

    “Those compounds were surfaced by a team that did not know deep learning but was able to use our platform,” Dinakar says.

    Building a better AI future

    Because Pienso can run on internal servers and cloud infrastructure, the founders say it offers an alternative for businesses being forced to donate their data by using services offered by other AI companies.

    “The Pienso interface is a series of web apps stitched together,” Dinakar explains. “You can think of it like an Adobe Photoshop for large language models, but in the web. You can point and import data without writing a line of code. You can refine the data, prepare it for deep learning, analyze it, give it structure if it’s not labeled or annotated, and you can walk away with a fine-tuned large language model in a matter of 25 minutes.”

    Earlier this year, Pienso announced a partnership with Graphcore, which provides a faster, more efficient computing platform for machine learning. The founders say the partnership will further lower barriers to leveraging AI by dramatically reducing latency.

    “If you’re building an interactive AI platform, users aren’t going to have a cup of coffee every time they click a button,” Dinakar says. “It needs to be fast and responsive.”

    The founders believe their solution is enabling a future where more effective AI models are developed for specific use cases by the people who are most familiar with the problems they are trying to solve.

    “No one model can do everything,” Dinakar says. “Everyone’s application is different, their needs are different, their data is different. It’s highly unlikely that one model will do everything for you. It’s about bringing a garden of models together and allowing them to collaborate with each other and orchestrating them in a way that makes sense, and the people doing that orchestration should be the people who understand the data best.”

  •

    Six MIT students selected as spring 2024 MIT-Pillar AI Collective Fellows

    The MIT-Pillar AI Collective has announced six fellows for the spring 2024 semester. With support from the program, the graduate students, who are in their final year of a master’s or PhD program, will conduct research in the areas of AI, machine learning, and data science with the aim of commercializing their innovations.

    Launched by MIT’s School of Engineering and Pillar VC in 2022, the MIT-Pillar AI Collective supports faculty, postdocs, and students conducting research on AI, machine learning, and data science. The program, which is supported by a gift from Pillar VC and administered by the MIT Deshpande Center for Technological Innovation, aims to advance research toward commercialization.

    The spring 2024 MIT-Pillar AI Collective Fellows are:

    Yasmeen AlFaraj

    Yasmeen AlFaraj is a PhD candidate in chemistry whose interest is in the application of data science and machine learning to soft materials design to enable next-generation, sustainable plastics, rubber, and composite materials. More specifically, she is applying machine learning to the design of novel molecular additives to enable the low-cost manufacturing of chemically deconstructable thermosets and composites. AlFaraj’s work has led to the discovery of scalable, translatable new materials that could address thermoset plastic waste. As a Pillar Fellow, she will pursue bringing this technology to market, initially focusing on wind turbine blade manufacturing and conformal coatings. Through the Deshpande Center for Technological Innovation, AlFaraj serves as a lead for a team developing a spinout focused on recyclable versions of existing high-performance thermosets by incorporating small quantities of a degradable co-monomer. In addition, she participated in the National Science Foundation Innovation Corps program and recently graduated from the Clean Tech Open, where she focused on enhancing her business plan, analyzing potential markets, ensuring a complete IP portfolio, and connecting with potential funders. AlFaraj earned a BS in chemistry from University of California at Berkeley.

    Ruben Castro Ornelas

    Ruben Castro Ornelas is a PhD student in mechanical engineering who is passionate about the future of multipurpose robots and designing the hardware to use them with AI control solutions. Combining his expertise in programming, embedded systems, machine design, reinforcement learning, and AI, he designed a dexterous robotic hand capable of carrying out useful everyday tasks without sacrificing size, durability, complexity, or simulatability. Ornelas’s innovative design holds significant commercial potential in domestic, industrial, and health-care applications because it could be adapted to hold everything from kitchenware to delicate objects. As a Pillar Fellow, he will focus on identifying potential commercial markets, determining the optimal approach for business-to-business sales, and identifying critical advisors. Ornelas served as co-director of StartLabs, an undergraduate entrepreneurship club at MIT, where he earned a BS in mechanical engineering.

    Keeley Erhardt

    Keeley Erhardt is a PhD candidate in media arts and sciences whose research interests lie in the transformative potential of AI in network analysis, particularly for entity correlation and hidden link detection within and across domains. She has designed machine learning algorithms to identify and track temporal correlations and hidden signals in large-scale networks, uncovering online influence campaigns originating from multiple countries. She has similarly demonstrated the use of graph neural networks to identify coordinated cryptocurrency accounts by analyzing financial time series data and transaction dynamics. As a Pillar Fellow, Erhardt will pursue the potential commercial applications of her work, such as detecting fraud, propaganda, money laundering, and other covert activity in the finance, energy, and national security sectors. She has had internships at Google, Facebook, and Apple and held software engineering roles at multiple tech unicorns. Erhardt earned an MEng in electrical engineering and computer science and a BS in computer science, both from MIT.

    Vineet Jagadeesan Nair

    Vineet Jagadeesan Nair is a PhD candidate in mechanical engineering whose research focuses on modeling power grids and designing electricity markets to integrate renewables, batteries, and electric vehicles. He is broadly interested in developing computational tools to tackle climate change. As a Pillar Fellow, Nair will explore the application of machine learning and data science to power systems. Specifically, he will experiment with approaches to improve the accuracy of forecasting electricity demand and supply with high spatial-temporal resolution. In collaboration with Project Tapestry @ Google X, he is also working on fusing physics-informed machine learning with conventional numerical methods to increase the speed and accuracy of high-fidelity simulations. Nair’s work could help realize future grids with high penetrations of renewables and other clean, distributed energy resources. Outside academics, Nair is active in entrepreneurship, most recently helping to organize the 2023 MIT Global Startup Workshop in Greece. He earned an MS in computational science and engineering from MIT, an MPhil in energy technologies from Cambridge University as a Gates Scholar, and a BS in mechanical engineering and a BA in economics from University of California at Berkeley.

    Mahdi Ramadan

    Mahdi Ramadan is a PhD candidate in brain and cognitive sciences whose research interests lie at the intersection of cognitive science, computational modeling, and neural technologies. His work uses novel unsupervised methods for learning and generating interpretable representations of neural dynamics, capitalizing on recent advances in AI, specifically contrastive and geometric deep learning techniques capable of uncovering the latent dynamics underlying neural processes with high fidelity. As a Pillar Fellow, he will leverage these methods to gain a better understanding of dynamical models of muscle signals for generative motor control. By supplementing current spinal prosthetics with generative AI motor models that can streamline, speed up, and correct limb muscle activations in real time, as well as potentially using multimodal vision-language models to infer the patients’ high-level intentions, Ramadan aspires to build truly scalable, accessible, and capable commercial neuroprosthetics. Ramadan’s entrepreneurial experience includes being the co-founder of UltraNeuro, a neurotechnology startup, and co-founder of Presizely, a computer vision startup. He earned a BS in neurobiology from University of Washington.

    Rui (Raymond) Zhou

    Rui (Raymond) Zhou is a PhD candidate in mechanical engineering whose research focuses on multimodal AI for engineering design. As a Pillar Fellow, he will advance models that could enable designers to translate information in any modality or combination of modalities into comprehensive 2D and 3D designs, including parametric data, component visuals, assembly graphs, and sketches. These models could also optimize existing human designs to accomplish goals such as improving ergonomics or reducing drag coefficient. Ultimately, Zhou aims to translate his work into a software-as-a-service platform that redefines product design across various sectors, from automotive to consumer electronics. His efforts have the potential to not only accelerate the design process but also reduce costs, opening the door to unprecedented levels of customization, idea generation, and rapid prototyping. Beyond his academic pursuits, Zhou founded UrsaTech, a startup that integrates AI into education and engineering design. He earned a BS in electrical engineering and computer sciences from University of California at Berkeley.

  •

    How symmetry can come to the aid of machine learning

    Behrooz Tahmasebi — an MIT PhD student in the Department of Electrical Engineering and Computer Science (EECS) and an affiliate of the Computer Science and Artificial Intelligence Laboratory (CSAIL) — was taking a mathematics course on differential equations in late 2021 when a glimmer of inspiration struck. In that class, he learned for the first time about Weyl’s law, which had been formulated 110 years earlier by the German mathematician Hermann Weyl. Tahmasebi realized it might have some relevance to the computer science problem he was then wrestling with, even though the connection appeared — on the surface — to be thin, at best. Weyl’s law, he says, provides a formula that measures the complexity of the spectral information, or data, contained within the fundamental frequencies of a drum head or guitar string.

    Tahmasebi was, at the same time, thinking about measuring the complexity of the input data to a neural network, wondering whether that complexity could be reduced by taking into account some of the symmetries inherent to the dataset. Such a reduction, in turn, could facilitate — as well as speed up — machine learning processes.

    Weyl’s law, conceived about a century before the boom in machine learning, had traditionally been applied to very different physical situations — such as those concerning the vibrations of a string or the spectrum of electromagnetic (black-body) radiation given off by a heated object. Nevertheless, Tahmasebi believed that a customized version of that law might help with the machine learning problem he was pursuing. And if the approach panned out, the payoff could be considerable.

    He spoke with his advisor, Stefanie Jegelka — an associate professor in EECS and affiliate of CSAIL and the MIT Institute for Data, Systems, and Society — who believed the idea was definitely worth looking into. As Tahmasebi saw it, Weyl’s law had to do with gauging the complexity of data, and so did this project. But Weyl’s law, in its original form, said nothing about symmetry.

    He and Jegelka have now succeeded in modifying Weyl’s law so that symmetry can be factored into the assessment of a dataset’s complexity. “To the best of my knowledge,” Tahmasebi says, “this is the first time Weyl’s law has been used to determine how machine learning can be enhanced by symmetry.”

    The paper he and Jegelka wrote earned a “Spotlight” designation when it was presented at the December 2023 conference on Neural Information Processing Systems — widely regarded as the world’s top conference on machine learning.

    This work, comments Soledad Villar, an applied mathematician at Johns Hopkins University, “shows that models that satisfy the symmetries of the problem are not only correct but also can produce predictions with smaller errors, using a small amount of training points. [This] is especially important in scientific domains, like computational chemistry, where training data can be scarce.”

    In their paper, Tahmasebi and Jegelka explored the ways in which symmetries, or so-called “invariances,” could benefit machine learning. Suppose, for example, the goal of a particular computer run is to pick out every image that contains the numeral 3. That task can be a lot easier, and go a lot quicker, if the algorithm can identify the 3 regardless of where it is placed in the box (whether it’s exactly in the center or off to the side) and whether it is pointed right-side up, upside down, or oriented at a random angle. An algorithm equipped with the latter capability can take advantage of the symmetries of translation and rotation, meaning that a 3, or any other object, is not changed in itself by altering its position or by rotating it around an arbitrary axis. It is said to be invariant to those shifts. The same logic can be applied to algorithms charged with identifying dogs or cats. A dog is a dog is a dog, one might say, irrespective of how it is embedded within an image.
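
    To make the idea of invariance concrete, here is a minimal sketch on a toy 4x4 binary image. It is illustrative only: the helper names and the crude "ink count" feature are assumptions made for this example, not anything taken from the paper, and the feature is far too simple to recognize a numeral. It only shows what it means for a quantity to stay the same when the image is shifted or rotated.

```python
# Illustrative sketch only: a toy 4x4 binary "image" and a crude feature.
# None of these helpers come from the paper.

def rotate90(img):
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def shift_right(img, k=1):
    """Shift every row k pixels to the right, wrapping around the edge."""
    return [row[-k:] + row[:-k] for row in img]

def ink_count(img):
    """Count the 'on' pixels. The count ignores position, so it is
    invariant to shifts and rotations of the image."""
    return sum(sum(row) for row in img)

glyph = [
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]

moved = shift_right(glyph, 2)
turned = rotate90(glyph)

# The pixel grids differ, but the invariant feature does not.
print(moved != glyph, turned != glyph)
print(ink_count(glyph) == ink_count(moved) == ink_count(turned))
```

    A learner that only ever sees such invariant features never has to be shown the same object in every position and orientation, which is the intuition behind needing less training data.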

    The point of the entire exercise, the authors explain, is to exploit a dataset’s intrinsic symmetries in order to reduce the complexity of machine learning tasks. That, in turn, can lead to a reduction in the amount of data needed for learning. Concretely, the new work answers the question: How many fewer data are needed to train a machine learning model if the data contain symmetries?

    There are two ways of achieving a gain, or benefit, by capitalizing on the symmetries present. The first has to do with the size of the sample to be looked at. Let’s imagine that you are charged, for instance, with analyzing an image that has mirror symmetry — the right side being an exact replica, or mirror image, of the left. In that case, you don’t have to look at every pixel; you can get all the information you need from half of the image — a factor of two improvement. If, on the other hand, the image can be partitioned into 10 identical parts, you can get a factor of 10 improvement. This kind of boosting effect is linear.

    To take another example, imagine you are sifting through a dataset, trying to find sequences of blocks that have seven different colors: black, blue, green, purple, red, white, and yellow. Your job becomes much easier if you don’t care about the order in which the blocks are arranged. If the order mattered, there would be 5,040 different orderings (7 × 6 × 5 × 4 × 3 × 2 × 1) to look for. But if all you care about are sequences of blocks in which all seven colors appear, then you have reduced the number of things you are searching for from 5,040 to just one.
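
    The block-counting arithmetic can be checked directly. The short sketch below is illustrative only (the variable names are made up for this example): it enumerates every arrangement of the seven colors, then collapses arrangements that differ only in order.

```python
from itertools import permutations
from math import factorial

COLORS = ["black", "blue", "green", "purple", "red", "white", "yellow"]

# If order matters, every arrangement of the seven colors is a separate target.
ordered_targets = set(permutations(COLORS))

# If order does not matter, arrangements containing the same colors
# collapse into a single target.
unordered_targets = {frozenset(seq) for seq in ordered_targets}

print(len(ordered_targets))    # 5040, which is 7!
print(len(unordered_targets))  # 1
```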

    Tahmasebi and Jegelka discovered that it is possible to achieve a different kind of gain — one that is exponential — that can be reaped for symmetries that operate over many dimensions. This advantage is related to the notion that the complexity of a learning task grows exponentially with the dimensionality of the data space. Making use of a multidimensional symmetry can therefore yield a disproportionately large return. “This is a new contribution that is basically telling us that symmetries of higher dimension are more important because they can give us an exponential gain,” Tahmasebi says. 

    The NeurIPS 2023 paper that he wrote with Jegelka contains two theorems that were proved mathematically. “The first theorem shows that an improvement in sample complexity is achievable with the general algorithm we provide,” Tahmasebi says. The second theorem complements the first, he adds, “showing that this is the best possible gain you can get; nothing else is achievable.”

    He and Jegelka have provided a formula that predicts the gain one can obtain from a particular symmetry in a given application. A virtue of this formula is its generality, Tahmasebi notes. “It works for any symmetry and any input space.” It works not only for symmetries that are known today, but it could also be applied in the future to symmetries that are yet to be discovered. The latter prospect is not too farfetched to consider, given that the search for new symmetries has long been a major thrust in physics. That suggests that, as more symmetries are found, the methodology introduced by Tahmasebi and Jegelka should only get better over time.

    According to Haggai Maron, a computer scientist at Technion (the Israel Institute of Technology) and NVIDIA who was not involved in the work, the approach presented in the paper “diverges substantially from related previous works, adopting a geometric perspective and employing tools from differential geometry. This theoretical contribution lends mathematical support to the emerging subfield of ‘Geometric Deep Learning,’ which has applications in graph learning, 3D data, and more. The paper helps establish a theoretical basis to guide further developments in this rapidly expanding research area.”

  •

    Generating the policy of tomorrow

    As first-year students in the Social and Engineering Systems (SES) doctoral program within the MIT Institute for Data, Systems, and Society (IDSS), Eric Liu and Ashely Peake share an interest in investigating housing inequality issues.

    They also share a desire to dive head-first into their research.

    “In the first year of your PhD, you’re taking classes and still getting adjusted, but we came in very eager to start doing research,” Liu says.

    Liu, Peake, and many others found an opportunity to do hands-on research on real-world problems at the MIT Policy Hackathon, an initiative organized by students in IDSS, including the Technology and Policy Program (TPP). The weekend-long, interdisciplinary event — now in its sixth year — continues to gather hundreds of participants from around the globe to explore potential solutions to some of society’s greatest challenges.

    This year’s theme, “Hack-GPT: Generating the Policy of Tomorrow,” sought to capitalize on the popularity of generative AI (like the chatbot ChatGPT) and the ways it is changing how we think about technical and policy-based challenges, according to Dansil Green, a second-year TPP master’s student and co-chair of the event.

    “We encouraged our teams to utilize and cite these tools, thinking about the implications that generative AI tools have on their different challenge categories,” Green says.

    After 2022’s hybrid event, this year’s organizers pivoted back to a virtual-only approach, allowing them to increase the overall number of participants in addition to increasing the number of teams per challenge by 20 percent.

    “Virtual allows you to reach more people — we had a high number of international participants this year — and it helps reduce some of the costs,” Green says. “I think going forward we are going to try and switch back and forth between virtual and in-person because there are different benefits to each.”

    “When the magic hits”

    Liu and Peake competed in the housing challenge category, where they could gain research experience in their actual field of study. 

    “While I am doing housing research, I haven’t necessarily had a lot of opportunities to work with actual housing data before,” says Peake, who recently joined the SES doctoral program after completing an undergraduate degree in applied math last year. “It was a really good experience to get involved with an actual data problem, working closer with Eric, who’s also in my lab group, in addition to meeting people from MIT and around the world who are interested in tackling similar questions and seeing how they think about things differently.”

    Joined by Adrian Butterton, a Boston-based paralegal, as well as Hudson Yuen and Ian Chan, two software engineers from Canada, Liu and Peake formed what would end up being the winning team in their category: “Team Ctrl+Alt+Defeat.” They quickly began organizing a plan to address the eviction crisis in the United States.

    “I think we were kind of surprised by the scope of the question,” Peake laughs. “In the end, I think having such a large scope motivated us to think about it in a more realistic kind of way — how could we come up with a solution that was adaptable and therefore could be replicated to tackle different kinds of problems.”

    Watching the challenge on the livestream together on campus, Liu says they immediately went to work, and could not believe how quickly things came together.

    “We got our challenge description in the evening, came out to the purple common area in the IDSS building and literally it took maybe an hour and we drafted up the entire project from start to finish,” Liu says. “Then our software engineer partners had a dashboard built by 1 a.m. — I feel like the hackathon really promotes that really fast dynamic work stream.”

    “People always talk about the grind or applying for funding — but when that magic hits, it just reminds you of the part of research that people don’t talk about, and it was really a great experience to have,” Liu adds.

    A fresh perspective

    “We’ve organized hackathons internally at our company and they are great for fostering innovation and creativity,” says Letizia Bordoli, senior AI product manager at Veridos, a Germany-based identity solutions company that provided this year’s challenge in Data Systems for Human Rights. “It is a great opportunity to connect with talented individuals and explore new ideas and solutions that we might not have thought about.”

    The challenge provided by Veridos was focused on finding innovative solutions to universal birth registration, something Bordoli says only benefited from the fact that the hackathon participants were from all over the world.

    “Many had local and firsthand knowledge about certain realities and challenges [posed by the lack of] birth registration,” Bordoli says. “It brings fresh perspectives to existing challenges, and it gave us an energy boost to try to bring innovative solutions that we may not have considered before.”

    New frontiers

    Alongside the housing and data systems for human rights challenges was a challenge in health, as well as a first-time opportunity to tackle an aerospace challenge in the area of space for environmental justice.

    “Space can be a very hard challenge category to do data-wise since a lot of data is proprietary, so this really developed over the last few months with us having to think about how we could do more with open-source data,” Green explains. “But I am glad we went the environmental route because it opened the challenge up to not only space enthusiasts, but also environment and climate people.”

    One of the participants to tackle this new challenge category was Yassine Elhallaoui, a system test engineer from Norway who specializes in AI solutions and has 16 years of experience working in the oil and gas fields. Elhallaoui was a member of Team EcoEquity, which proposed an increase in policies supporting the use of satellite data to ensure proper evaluation and increase water resiliency for vulnerable communities.

    “The hackathons I have participated in in the past were more technical,” Elhallaoui says. “Starting with [MIT Science and Technology Policy Institute Director Kristen Kulinowski’s] workshop about policy writers and the solutions they came up with, and the analysis they had to do … it really changed my perspective on what a hackathon can do.”

    “A policy hackathon is something that can make real changes in the world,” she adds.

  •

    New hope for early pancreatic cancer intervention via AI-based risk prediction

    The first documented case of pancreatic cancer dates back to the 18th century. Since then, researchers have undertaken a protracted and challenging odyssey to understand the elusive and deadly disease. To date, there is no better cancer treatment than early intervention. Unfortunately, the pancreas, nestled deep within the abdomen, is particularly elusive for early detection. 

    MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) scientists, alongside Limor Appelbaum, a staff scientist in the Department of Radiation Oncology at Beth Israel Deaconess Medical Center (BIDMC), were eager to better identify potential high-risk patients. They set out to develop two machine-learning models for early detection of pancreatic ductal adenocarcinoma (PDAC), the most common form of the cancer. To access a broad and diverse database, the team synced up with a federated network company, using electronic health record data from various institutions across the United States. This vast pool of data helped ensure the models’ reliability and generalizability, making them applicable across a wide range of populations, geographical locations, and demographic groups.

    The two models, the “PRISM” neural network and the logistic regression model (a statistical technique for estimating probability), outperformed current methods. The team’s comparison showed that while standard screening criteria identify about 10 percent of PDAC cases using a five-times higher relative risk threshold, PRISM can detect 35 percent of PDAC cases at this same threshold.

    Using AI to detect cancer risk is not a new phenomenon: algorithms already analyze mammograms and CT scans for lung cancer, and assist in the analysis of Pap smear tests and HPV testing, to name a few applications. “The PRISM models stand out for their development and validation on an extensive database of over 5 million patients, surpassing the scale of most prior research in the field,” says Kai Jia, an MIT PhD student in electrical engineering and computer science (EECS), MIT CSAIL affiliate, and first author on an open-access paper in eBioMedicine outlining the new work. “The model uses routine clinical and lab data to make its predictions, and the diversity of the U.S. population is a significant advancement over other PDAC models, which are usually confined to specific geographic regions, like a few health-care centers in the U.S. Additionally, using a unique regularization technique in the training process enhanced the models’ generalizability and interpretability.”

    “This report outlines a powerful approach to use big data and artificial intelligence algorithms to refine our approach to identifying risk profiles for cancer,” says David Avigan, a Harvard Medical School professor and the cancer center director and chief of hematology and hematologic malignancies at BIDMC, who was not involved in the study. “This approach may lead to novel strategies to identify patients with high risk for malignancy that may benefit from focused screening with the potential for early intervention.” 

    Prismatic perspectives

    The journey toward the development of PRISM began over six years ago, fueled by firsthand experiences with the limitations of current diagnostic practices. “Approximately 80-85 percent of pancreatic cancer patients are diagnosed at advanced stages, where cure is no longer an option,” says senior author Appelbaum, who is also a Harvard Medical School instructor as well as a radiation oncologist. “This clinical frustration sparked the idea to delve into the wealth of data available in electronic health records (EHRs).”

    The CSAIL group’s close collaboration with Appelbaum made it possible to understand the combined medical and machine learning aspects of the problem better, eventually leading to a much more accurate and transparent model. “The hypothesis was that these records contained hidden clues: subtle signs and symptoms that could act as early warning signals of pancreatic cancer,” she adds. “This guided our use of federated EHR networks in developing these models, for a scalable approach for deploying risk prediction tools in health care.”

    Both PrismNN and PrismLR models analyze EHR data, including patient demographics, diagnoses, medications, and lab results, to assess PDAC risk. PrismNN uses artificial neural networks to detect intricate patterns in data features like age, medical history, and lab results, yielding a risk score for PDAC likelihood. PrismLR uses logistic regression for a simpler analysis, generating a probability score of PDAC based on these features. Together, the models offer a thorough evaluation of different approaches in predicting PDAC risk from the same EHR data.
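
    As a rough illustration of how a logistic regression model turns patient features into a probability score, here is a minimal sketch. The feature names, weights, and bias below are invented for illustration only; they are not taken from PrismLR, the paper, or any clinical source.

```python
import math

# Hypothetical weights and bias, chosen only for illustration.
# They do NOT come from PrismLR or from any clinical data.
WEIGHTS = {"age_over_60": 0.8, "diabetes_diagnosis": 0.6, "frequent_visits": 0.4}
BIAS = -3.0


def risk_probability(features):
    """Return a logistic-regression probability for a dict of 0/1 features."""
    # Weighted sum of the features plus a bias term.
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    # The logistic (sigmoid) function squashes the score into (0, 1).
    return 1.0 / (1.0 + math.exp(-score))


patient = {"age_over_60": 1, "diabetes_diagnosis": 1, "frequent_visits": 0}
print(round(risk_probability(patient), 3))
```

    Because each weight multiplies a single named feature, a reader can see directly which inputs raise or lower the score, which is one reason such models are generally considered easier to interpret.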

    One paramount point for gaining the trust of physicians, the team notes, is better understanding how the models work, known in the field as interpretability. The scientists pointed out that while logistic regression models are inherently easier to interpret, recent advancements have made deep neural networks somewhat more transparent. This helped the team to refine the thousands of potentially predictive features derived from the EHR of a single patient to approximately 85 critical indicators. These indicators, which include patient age, diabetes diagnosis, and an increased frequency of visits to physicians, are automatically discovered by the model but match physicians’ understanding of risk factors associated with pancreatic cancer.

    The path forward

    Despite the promise of the PRISM models, as with all research, some parts are still a work in progress. U.S. data alone are the current diet for the models, necessitating testing and adaptation for global use. The path forward, the team notes, includes expanding the model’s applicability to international datasets and integrating additional biomarkers for more refined risk assessment.

    “A subsequent aim for us is to facilitate the models’ implementation in routine health care settings. The vision is to have these models function seamlessly in the background of health care systems, automatically analyzing patient data and alerting physicians to high-risk cases without adding to their workload,” says Jia. “A machine-learning model integrated with the EHR system could empower physicians with early alerts for high-risk patients, potentially enabling interventions well before symptoms manifest. We are eager to deploy our techniques in the real world to help all individuals enjoy longer, healthier lives.” 

    Jia wrote the paper alongside Appelbaum and MIT EECS Professor and CSAIL Principal Investigator Martin Rinard, who are both senior authors of the paper. Researchers on the paper were supported during their time at MIT CSAIL, in part, by the Defense Advanced Research Projects Agency, Boeing, the National Science Foundation, and Aarno Labs. TriNetX provided resources for the project, and the Prevent Cancer Foundation also supported the team.

  • in

    Multiple AI models help robots execute complex plans more transparently

    Your daily to-do list is likely pretty straightforward: wash the dishes, buy groceries, and other minutiae. It’s unlikely you wrote out “pick up the first dirty dish,” or “wash that plate with a sponge,” because each of these miniature steps within the chore feels intuitive. While we can routinely complete each step without much thought, a robot requires a complex plan that involves more detailed outlines.

    MIT’s Improbable AI Lab, a group within the Computer Science and Artificial Intelligence Laboratory (CSAIL), has offered these machines a helping hand with a new multimodal framework: Compositional Foundation Models for Hierarchical Planning (HiP), which develops detailed, feasible plans with the expertise of three different foundation models. Like OpenAI’s GPT-4, the foundation model that ChatGPT and Bing Chat were built upon, these foundation models are trained on massive quantities of data for applications like generating images, translating text, and robotics.

    Unlike RT2 and other multimodal models that are trained on paired vision, language, and action data, HiP uses three different foundation models each trained on different data modalities. Each foundation model captures a different part of the decision-making process and then works together when it’s time to make decisions. HiP removes the need for access to paired vision, language, and action data, which is difficult to obtain. HiP also makes the reasoning process more transparent.

    What’s considered a daily chore for a human can be a robot’s “long-horizon goal” — an overarching objective that involves completing many smaller steps first — requiring sufficient data to plan, understand, and execute objectives. While computer vision researchers have attempted to build monolithic foundation models for this problem, pairing language, visual, and action data is expensive. Instead, HiP represents a different, multimodal recipe: a trio that cheaply incorporates linguistic, physical, and environmental intelligence into a robot.

    “Foundation models do not have to be monolithic,” says NVIDIA AI researcher Jim Fan, who was not involved in the paper. “This work decomposes the complex task of embodied agent planning into three constituent models: a language reasoner, a visual world model, and an action planner. It makes a difficult decision-making problem more tractable and transparent.”

    The team believes that their system could help these machines accomplish household chores, such as putting away a book or placing a bowl in the dishwasher. Additionally, HiP could assist with multistep construction and manufacturing tasks, like stacking and placing different materials in specific sequences.

    Evaluating HiP

    The CSAIL team tested HiP’s acuity on three manipulation tasks, where it outperformed comparable frameworks. The system reasoned by developing intelligent plans that adapt to new information.

    First, the researchers requested that it stack different-colored blocks on each other and then place others nearby. The catch: Some of the correct colors weren’t present, so the robot had to place white blocks in a color bowl to paint them. HiP often adjusted to these changes accurately, especially compared to state-of-the-art task planning systems like Transformer BC and Action Diffuser, by adjusting its plans to stack and place each square as needed.

    Another test: arranging objects such as candy and a hammer in a brown box while ignoring other items. Some of the objects it needed to move were dirty, so HiP adjusted its plans to place them in a cleaning box, and then into the brown container. In a third demonstration, the bot was able to ignore unnecessary objects to complete kitchen sub-goals such as opening a microwave, clearing a kettle out of the way, and turning on a light. Some of the prompted steps had already been completed, so the robot adapted by skipping those directions.

    A three-pronged hierarchy

    HiP’s three-pronged planning process operates as a hierarchy, with the ability to pre-train each of its components on different sets of data, including information outside of robotics. At the bottom of that order is a large language model (LLM), which starts to ideate by capturing all the symbolic information needed and developing an abstract task plan. Applying the common sense knowledge it finds on the internet, the model breaks its objective into sub-goals. For example, “making a cup of tea” turns into “filling a pot with water,” “boiling the pot,” and the subsequent actions required.

    “All we want to do is take existing pre-trained models and have them successfully interface with each other,” says Anurag Ajay, a PhD student in the MIT Department of Electrical Engineering and Computer Science (EECS) and a CSAIL affiliate. “Instead of pushing for one model to do everything, we combine multiple ones that leverage different modalities of internet data. When used in tandem, they help with robotic decision-making and can potentially aid with tasks in homes, factories, and construction sites.”

    These models also need some form of “eyes” to understand the environment they’re operating in and correctly execute each sub-goal. The team used a large video diffusion model to augment the initial planning completed by the LLM, which collects geometric and physical information about the world from footage on the internet. In turn, the video model generates an observation trajectory plan, refining the LLM’s outline to incorporate new physical knowledge.

    This process, known as iterative refinement, allows HiP to reason about its ideas, taking in feedback at each stage to generate a more practical outline. The flow of feedback is similar to writing an article, where an author may send their draft to an editor, and with those revisions incorporated in, the publisher reviews for any last changes and finalizes.
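
    The feedback idea can be sketched in a few lines: one component proposes candidate sub-goal sequences, and another scores them for physical plausibility so the more practical plan is kept. Everything below is a toy stand-in under stated assumptions. The functions `propose_subgoals`, `feasibility`, and `refine` are hypothetical names, and the hard-coded rule replaces what in HiP would be learned foundation models.

```python
from typing import List


def propose_subgoals(goal: str) -> List[List[str]]:
    """Stand-in for the language model: return candidate sub-goal sequences."""
    candidates = {
        "make a cup of tea": [
            ["boil the pot", "fill a pot with water"],
            ["fill a pot with water", "boil the pot"],
        ]
    }
    # Unknown goals fall back to a single one-step plan.
    return candidates.get(goal, [[goal]])


def feasibility(plan: List[str]) -> float:
    """Stand-in for the video model: score how physically plausible a plan is."""
    fill, boil = "fill a pot with water", "boil the pot"
    if fill in plan and boil in plan:
        # Toy physical rule: the pot must be filled before it is boiled.
        return 1.0 if plan.index(fill) < plan.index(boil) else 0.0
    return 0.5  # No opinion about plans the toy rule does not cover.


def refine(goal: str) -> List[str]:
    """Keep the candidate plan that the feasibility scorer rates highest."""
    return max(propose_subgoals(goal), key=feasibility)


print(refine("make a cup of tea"))
```

    In this sketch the scorer only ranks finished candidates; it is meant to show the direction of the feedback, not how the real models exchange information.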

    In this case, the top of the hierarchy is an egocentric action model, or a sequence of first-person images that infer which actions should take place based on its surroundings. During this stage, the observation plan from the video model is mapped over the space visible to the robot, helping the machine decide how to execute each task within the long-horizon goal. If a robot uses HiP to make tea, this means it will have mapped out exactly where the pot, sink, and other key visual elements are, and begin completing each sub-goal.

    Still, the multimodal work is limited by the lack of high-quality video foundation models. Once available, they could interface with HiP’s small-scale video models to further enhance visual sequence prediction and robot action generation. A higher-quality version would also reduce the current data requirements of the video models.

    That being said, the CSAIL team’s approach only used a tiny bit of data overall. Moreover, HiP was cheap to train and demonstrated the potential of using readily available foundation models to complete long-horizon tasks. “What Anurag has demonstrated is proof-of-concept of how we can take models trained on separate tasks and data modalities and combine them into models for robotic planning. In the future, HiP could be augmented with pre-trained models that can process touch and sound to make better plans,” says senior author Pulkit Agrawal, MIT assistant professor in EECS and director of the Improbable AI Lab. The group is also considering applying HiP to solving real-world long-horizon tasks in robotics.

    Ajay and Agrawal are lead authors on a paper describing the work. They are joined by MIT professors and CSAIL principal investigators Tommi Jaakkola, Joshua Tenenbaum, and Leslie Pack Kaelbling; CSAIL research affiliate and MIT-IBM AI Lab research manager Akash Srivastava; graduate students Seungwook Han and Yilun Du ’19; former postdoc Abhishek Gupta, who is now assistant professor at University of Washington; and former graduate student Shuang Li PhD ’23.

    The team’s work was supported, in part, by the National Science Foundation, the U.S. Defense Advanced Research Projects Agency, the U.S. Army Research Office, the U.S. Office of Naval Research Multidisciplinary University Research Initiatives, and the MIT-IBM Watson AI Lab. Their findings were presented at the 2023 Conference on Neural Information Processing Systems (NeurIPS).

  • in

    Technique could efficiently solve partial differential equations for numerous applications

    In fields such as physics and engineering, partial differential equations (PDEs) are used to model complex physical processes to generate insight into how some of the most complicated physical and natural systems in the world function.

    To solve these difficult equations, researchers use high-fidelity numerical solvers, which can be very time-consuming and computationally expensive to run. The current simplified alternatives, data-driven surrogate models, compute the goal property of a solution to PDEs rather than the whole solution. They are trained on a set of data that has been generated by the high-fidelity solver, to predict the output of the PDEs for new inputs. This is data-intensive and expensive because complex physical systems require a large number of simulations to generate enough data.

    In a new paper, “Physics-enhanced deep surrogates for partial differential equations,” published in December in Nature Machine Intelligence, a new method is proposed for developing data-driven surrogate models for complex physical systems in such fields as mechanics, optics, thermal transport, fluid dynamics, physical chemistry, and climate models.

    The paper was authored by MIT’s professor of applied mathematics Steven G. Johnson along with Payel Das and Youssef Mroueh of the MIT-IBM Watson AI Lab and IBM Research; Chris Rackauckas of Julia Lab; and Raphaël Pestourie, a former MIT postdoc who is now at Georgia Tech. The authors call their method “physics-enhanced deep surrogate” (PEDS), which combines a low-fidelity, explainable physics simulator with a neural network generator. The neural network generator is trained end-to-end to match the output of the high-fidelity numerical solver.
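
    To make the idea of end-to-end training through a cheap physics model concrete, here is a toy sketch in plain Python. It is not the authors’ implementation (which the article says was developed in Julia): the “generator” is a two-parameter linear map standing in for a neural network, and both solver functions are invented stand-ins chosen so the example stays small.

```python
# Toy PEDS-style surrogate: a small trainable generator feeds a cheap
# low-fidelity solver, and the pair is trained end-to-end against data
# from a high-fidelity reference. All functions are illustrative stand-ins.


def high_fidelity_solver(x):
    """Pretend-expensive reference solution (assumed for this sketch)."""
    return (1.0 + 0.5 * x) ** 2


def low_fidelity_solver(c):
    """Cheap physics model acting on a coarse parameter c."""
    return c * c


def generator(w, x):
    """Linear stand-in for the neural network: input x -> coarse parameter."""
    return w[0] + w[1] * x


def surrogate(w, x):
    """Generator output passed through the low-fidelity solver."""
    return low_fidelity_solver(generator(w, x))


def mean_squared_error(w, data):
    return sum((surrogate(w, x) - y) ** 2 for x, y in data) / len(data)


def train(data, steps=2000, lr=0.02):
    """Plain gradient descent on the generator weights."""
    w = [0.5, 0.0]
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in data:
            c = generator(w, x)
            # Chain rule through the low-fidelity solver: d(c*c)/dc = 2c.
            common = 2.0 * (low_fidelity_solver(c) - y) * 2.0 * c
            g0 += common
            g1 += common * x
        w[0] -= lr * g0 / len(data)
        w[1] -= lr * g1 / len(data)
    return w


# Training data comes from the (pretend) high-fidelity solver.
data = [(i / 10.0, high_fidelity_solver(i / 10.0)) for i in range(11)]
weights = train(data)
print(mean_squared_error([0.5, 0.0], data), mean_squared_error(weights, data))
```

    The gradient is written out by hand here only to avoid dependencies; in practice an automatic differentiation library would compute it through both the generator and the solver.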

    “My aspiration is to replace the inefficient process of trial and error with systematic, computer-aided simulation and optimization,” says Pestourie. “Recent breakthroughs in AI like the large language model of ChatGPT rely on hundreds of billions of parameters and require vast amounts of resources to train and evaluate. In contrast, PEDS is affordable to all because it is incredibly efficient in computing resources and has a very low barrier in terms of infrastructure needed to use it.”

    In the article, they show that PEDS surrogates can be up to three times more accurate than an ensemble of feedforward neural networks with limited data (approximately 1,000 training points), and reduce the training data needed by at least a factor of 100 to achieve a target error of 5 percent. Developed using the MIT-designed Julia programming language, this scientific machine-learning method is thus efficient in both computing and data.

    The authors also report that PEDS provides a general, data-driven strategy to bridge the gap between a vast array of simplified physical models and the corresponding brute-force numerical solvers modeling complex systems. This technique offers accuracy, speed, data efficiency, and physical insights into the process.

    Says Pestourie, “Since the 2000s, as computing capabilities improved, the trend of scientific models has been to increase the number of parameters to fit the data better, sometimes at the cost of a lower predictive accuracy. PEDS does the opposite by choosing its parameters smartly. It leverages the technology of automatic differentiation to train a neural network that makes a model with few parameters accurate.”

    “The main challenge that prevents surrogate models from being used more widely in engineering is the curse of dimensionality — the fact that the needed data to train a model increases exponentially with the number of model variables,” says Pestourie. “PEDS reduces this curse by incorporating information from the data and from the field knowledge in the form of a low-fidelity model solver.”
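
    A quick way to see the exponential growth Pestourie describes is to count the samples needed to cover every variable with a fixed number of points each. The helper below is a hypothetical illustration, not something from the paper, and a full grid is only one of several ways to sample a design space.

```python
def grid_samples(points_per_variable: int, num_variables: int) -> int:
    """Samples needed for a full grid with the same resolution per variable."""
    return points_per_variable ** num_variables


# With 10 points per variable, each added variable multiplies the cost by 10.
for num_variables in (1, 2, 4, 8):
    print(num_variables, grid_samples(10, num_variables))
```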

    The researchers say that PEDS has the potential to revive a whole body of the pre-2000 literature dedicated to minimal models — intuitive models that PEDS could make more accurate while also being predictive for surrogate model applications.

    “The application of the PEDS framework is beyond what we showed in this study,” says Das. “Complex physical systems governed by PDEs are ubiquitous, from climate modeling to seismic modeling and beyond. Our physics-inspired fast and explainable surrogate models will be of great use in those applications, and play a complementary role to other emerging techniques, like foundation models.”

    The research was supported by the MIT-IBM Watson AI Lab and the U.S. Army Research Office through the Institute for Soldier Nanotechnologies.

  • in

    Leveraging language to understand machines

    Natural language conveys ideas, actions, information, and intent through context and syntax; further, there are volumes of it contained in databases. This makes it an excellent source of data to train machine-learning systems on. Two master’s of engineering students in the 6A MEng Thesis Program at MIT, Irene Terpstra ’23 and Rujul Gandhi ’22, are working with mentors in the MIT-IBM Watson AI Lab to use this power of natural language to build AI systems.

    As computing is becoming more advanced, researchers are looking to improve the hardware that it runs on; this means innovating to create new computer chips. And, since there is literature already available on modifications that can be made to achieve certain parameters and performance, Terpstra and her mentors and advisors Anantha Chandrakasan, MIT School of Engineering dean and the Vannevar Bush Professor of Electrical Engineering and Computer Science, and IBM researcher Xin Zhang, are developing an AI algorithm that assists in chip design.

    “I’m creating a workflow to systematically analyze how these language models can help the circuit design process. What reasoning powers do they have, and how can it be integrated into the chip design process?” says Terpstra. “And then on the other side, if that proves to be useful enough, [we’ll] see if they can automatically design the chips themselves, attaching it to a reinforcement learning algorithm.”

    To do this, Terpstra’s team is creating an AI system that can iterate on different designs. It means experimenting with various pre-trained large language models (like ChatGPT, Llama 2, and Bard), using an open-source circuit simulator language called NGspice, which has the parameters of the chip in code form, and a reinforcement learning algorithm. With text prompts, researchers will be able to query the language model on how the physical chip should be modified to achieve a certain goal and receive guidance for adjustments. This is then transferred into a reinforcement learning algorithm that updates the circuit design and outputs new physical parameters of the chip.

    “The final goal would be to combine the reasoning powers and the knowledge base that is baked into these large language models and combine that with the optimization power of the reinforcement learning algorithms and have that design the chip itself,” says Terpstra.

    Rujul Gandhi works with the raw language itself. As an undergraduate at MIT, Gandhi explored linguistics and computer science, putting them together in her MEng work. “I’ve been interested in communication, both between just humans and between humans and computers,” Gandhi says.

    Robots or other interactive AI systems are one area where communication needs to be understood by both humans and machines. Researchers often write instructions for robots using formal logic. This helps ensure that commands are being followed safely and as intended, but formal logic can be difficult for users to understand, while natural language comes easily. To ensure this smooth communication, Gandhi and her advisors Yang Zhang of IBM and MIT assistant professor Chuchu Fan are building a parser that converts natural language instructions into a machine-friendly form. Leveraging the linguistic structure encoded by the pre-trained encoder-decoder model T5, and a dataset of annotated, basic English commands for performing certain tasks, Gandhi’s system identifies the smallest logical units, or atomic propositions, which are present in a given instruction.

    “Once you’ve given your instruction, the model identifies all the smaller sub-tasks you want it to carry out,” Gandhi says. “Then, using a large language model, each sub-task can be compared against the available actions and objects in the robot’s world, and if any sub-task can’t be carried out because a certain object is not recognized, or an action is not possible, the system can stop right there to ask the user for help.”

    This approach of breaking instructions into sub-tasks also allows her system to understand logical dependencies expressed in English, like, “do task X until event Y happens.” Gandhi uses a dataset of step-by-step instructions across robot task domains like navigation and manipulation, with a focus on household tasks. Using data that are written just the way humans would talk to each other has many advantages, she says, because it means a user can be more flexible about how they phrase their instructions.
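
    To show what splitting an instruction into atomic propositions and a logical dependency can look like, here is a deliberately simple rule-based sketch. It is an assumption-laden stand-in: Gandhi’s system uses the pre-trained T5 model, whereas the hypothetical `parse_instruction` function below only recognizes the words “until” and “then” with regular expressions.

```python
import re


def parse_instruction(text):
    """Split an instruction into atomic propositions and a simple logical form."""
    text = text.strip().rstrip(".").lower()

    # Pattern "do X until Y": two propositions linked by an "until" operator.
    match = re.fullmatch(r"(.+?)\s+until\s+(.+)", text)
    if match:
        task, event = match.group(1), match.group(2)
        return {"propositions": [task, event], "formula": ("until", task, event)}

    # Otherwise treat "then" / "and then" as separators between ordered steps.
    steps = [s.strip() for s in re.split(r",?\s+(?:and\s+)?then\s+", text)]
    return {"propositions": steps, "formula": ("sequence", *steps)}


print(parse_instruction("Stir the soup until the timer rings."))
print(parse_instruction("Pick up the cup, then place it on the shelf."))
```

    A learned parser can handle far more varied phrasing than these two patterns; the sketch only illustrates the shape of the output, with small units that can each be checked against what a robot is able to do.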

    Another of Gandhi’s projects involves developing speech models. In the context of speech recognition, some languages are considered “low resource” since they might not have a lot of transcribed speech available, or might not have a written form at all. “One of the reasons I applied to this internship at the MIT-IBM Watson AI Lab was an interest in language processing for low-resource languages,” she says. “A lot of language models today are very data-driven, and when it’s not that easy to acquire all of that data, that’s when you need to use the limited data efficiently.” 

    Speech is just a stream of sound waves, but humans having a conversation can easily figure out where words and thoughts start and end. In speech processing, both humans and language models use their existing vocabulary to recognize word boundaries and understand the meaning. In low- or no-resource languages, a written vocabulary might not exist at all, so researchers can’t provide one to the model. Instead, the model can make note of what sound sequences occur together more frequently than others, and infer that those might be individual words or concepts. In Gandhi’s research group, these inferred words are then collected into a pseudo-vocabulary that serves as a labeling method for the low-resource language, creating labeled data for further applications.
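
    The idea of inferring word-like units from frequently co-occurring sound sequences can be sketched with simple counting. This is a toy under stated assumptions: the sound units are already given as made-up text symbols, and the hypothetical `pseudo_vocabulary` function uses raw n-gram counts rather than whatever statistical or neural method the research group applies to real audio.

```python
from collections import Counter


def pseudo_vocabulary(utterances, min_len=2, max_len=3, min_count=2):
    """Collect sound-unit sequences that recur across utterances."""
    counts = Counter()
    for units in utterances:
        for n in range(min_len, max_len + 1):
            # Slide a window of length n over the utterance.
            for i in range(len(units) - n + 1):
                counts[tuple(units[i:i + n])] += 1
    # Sequences seen at least min_count times become pseudo-words.
    return {seq for seq, count in counts.items() if count >= min_count}


# Made-up sound units; no real language is represented here.
utterances = [
    ["ba", "na", "ki", "to"],
    ["ki", "to", "ba", "na"],
    ["mu", "ba", "na"],
]
print(sorted(pseudo_vocabulary(utterances)))
```

    The recurring sequences act as provisional labels, which is the role the pseudo-vocabulary plays when no written vocabulary exists.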

    The applications for language technology are “pretty much everywhere,” Gandhi says. “You could imagine people being able to interact with software and devices in their native language, their native dialect. You could imagine improving all the voice assistants that we use. You could imagine it being used for translation or interpretation.”