More stories

  • in

    Novo Nordisk to support MIT postdocs working at the intersection of AI and life sciences

    MIT’s School of Engineering and global health care company Novo Nordisk has announced the launch of a multi-year program to support postdoctoral fellows conducting research at the intersection of artificial intelligence and data science with life sciences. The MIT-Novo Nordisk Artificial Intelligence Postdoctoral Fellows Program will welcome its first cohort of up to 10 postdocs for a two-year term this fall. The program will provide up to $10 million for an annual cohort of up to 10 postdoc for two-year terms.

    “The research being conducted at the intersection of AI and life sciences has the potential to transform health care as we know it,” says Anantha Chandrakasan, dean of the School of Engineering and Vannevar Bush Professor of Electrical Engineering and Computer Science. “I am thrilled that the MIT-Novo Nordisk Program will support early-career researchers who work in this space.”

    The launch of the MIT-Novo Nordisk Program coincides with the 100th anniversary celebration of Novo Nordisk. The company was founded in 1923 and treated its first patients with insulin, which had recently been discovered in March of that year.

    “The use of AI in the health care industry presents a massive opportunity to improve the lives of people living with chronic diseases,” says Thomas Senderovitz, senior vice president for data science at Novo Nordisk. “Novo Nordisk is committed to the development of new, innovative solutions, and MIT hosts some of the most outstanding researchers in the field. We are therefore excited to support postdocs working on the cutting edge of AI and life sciences.”

    The MIT-Novo Nordisk Program will support postdocs advancing the use of AI in life science and health. Postdocs will join an annual cohort that participates in frequent events and gatherings. The cohort will meet regularly to exchange ideas about their work and discuss ways to amplify their impact.

    “We are excited to welcome postdocs working on AI, data science, health, and life sciences — research areas of strategic importance across MIT,” adds Chandrakasan.

    A central focus of the program will be offering postdocs professional development and mentorship opportunities. Fellows will be invited to entrepreneurship-focused workshops that enable them to learn from company founders, venture capitalists, and other entrepreneurial leaders. Fellows will also have the opportunity to receive mentorship from experts in life sciences and data science.

    “MIT is always exploring opportunities to innovate and enhance the postdoctoral experience,” adds MIT Provost Cynthia Barnhart. “The MIT-Novo Nordisk Program has been thoughtfully designed to introduce fellows to a wealth of experiences, skill sets, and perspectives that support their professional growth while prioritizing a sense of community with their cohort.”

    Angela Belcher, head of the Department of Biological Engineering, the James Mason Crafts Professor of Biological Engineering and Materials Science, and member of the Koch Institute for Integrative Cancer Research, and Asu Ozdaglar, deputy dean of academics for the MIT Schwarzman College of Computing and head of the Department of Electrical Engineering and Computer Science, will serve as co-faculty leads for the program.

    The new program complements a separate postdoctoral fellowship program at MIT supported by the Novo Nordisk Foundation that focuses on enabling interdisciplinary research. More

  • in

    Q&A: Are far-reaching fires the new normal?

    Where there’s smoke, there is fire. But with climate change, larger and longer-burning wildfires are sending smoke farther from their source, often to places that are unaccustomed to the exposure. That’s been the case this week, as smoke continues to drift south from massive wildfires in Canada, prompting warnings of hazardous air quality, and poor visibility in states across New England, the mid-Atlantic, and the Midwest.

    As wildfire season is just getting going, many may be wondering: Are the air-polluting effects of wildfires a new normal?

    MIT News spoke with Professor Colette Heald of the Department of Civil and Environmental Engineering and the Department of Earth, Atmospheric and Planetary Sciences, and Professor Noelle Selin of the Institute for Data, Systems and Society and the Department of Earth, Atmospheric and Planetary Sciences. Heald specializes in atmospheric chemistry and has studied the climate and health effects associated with recent wildfires, while Selin works with atmospheric models to track air pollutants around the world, which she uses to inform policy decisions on mitigating  pollution and climate change. The researchers shared some of their insights on the immediate impacts of Canada’s current wildfires and what downwind regions may expect in the coming months, as the wildfire season stretches into summer.  

    Q: What role has climate change and human activity played in the wildfires we’ve seen so far this year?

    Heald: Unusually warm and dry conditions have dramatically increased fire susceptibility in Canada this year. Human-induced climate change makes such dry and warm conditions more likely. Smoke from fires in Alberta and Nova Scotia in May, and Quebec in early June, has led to some of the worst air quality conditions measured locally in Canada. This same smoke has been transported into the United States and degraded air quality here as well. Local officials have determined that ignitions have been associated with lightning strikes, but human activity has also played a role igniting some of the fires in Alberta.

    Q: What can we expect for the coming months in terms of the pattern of wildfires and their associated air pollution across the United States?

    Heald: The Government of Canada is projecting higher-than-normal fire activity throughout the 2023 fire season. Fire susceptibility will continue to respond to changing weather conditions, and whether the U.S. is impacted will depend on the winds and how air is transported across those regions. So far, the fire season in the United States has been below average, but fire risk is expected to increase modestly through the summer, so we may see local smoke influences as well.

    Q: How has air pollution from wildfires affected human health in the U.S. this year so far?

    Selin: The pollutant of most concern in wildfire smoke is fine particulate matter (PM2.5) – fine particles in the atmosphere that can be inhaled deep into the lungs, causing health damages. Exposure to PM2.5 causes respiratory and cardiovascular damage, including heart attacks and premature deaths. It can also cause symptoms like coughing and difficulty breathing. In New England this week, people have been breathing much higher concentrations of PM2.5 than usual. People who are particularly vulnerable to the effects are likely experiencing more severe impacts, such as older people and people with underlying conditions. But PM2.5 affects everyone. While the number and impact of wildfires varies from year to year, the associated air pollution from them generally lead to tens of thousands of premature deaths in the U.S. overall annually. There is also some evidence that PM2.5 from fires could be particularly damaging to health.

    While we in New England usually have relatively lower levels of pollution, it’s important also to note that some cities around the globe experience very high PM2.5 on a regular basis, not only from wildfires, but other sources such as power plants and industry. So, while we’re feeling the effects over the past few days, we should remember the broader importance of reducing PM2.5 levels overall for human health everywhere.

    Q: While firefighters battle fires directly this wildfire season, what can we do to reduce the effects of associated air pollution? And what can we do in the long-term, to prevent or reduce wildfire impacts?

    Selin: In the short term, protecting yourself from the impacts of PM2.5 is important. Limiting time outdoors, avoiding outdoor exercise, and wearing a high-quality mask are some strategies that can minimize exposure. Air filters can help reduce the concentrations of particles in indoor air. Taking measures to avoid exposure is particularly important for vulnerable groups. It’s also important to note that these strategies aren’t equally possible for everyone (for example, people who work outside) — stressing the importance of developing new strategies to address the underlying causes of increasing wildfires.

    Over the long term, mitigating climate change is important — because warm and dry conditions lead to wildfires, warming increases fire risk. Preventing the fires that are ignited by people or human activities can help.  Another way that damages can be mitigated in the longer term is by exploring land management strategies that could help manage fire intensity. More

  • in

    Bringing the social and ethical responsibilities of computing to the forefront

    There has been a remarkable surge in the use of algorithms and artificial intelligence to address a wide range of problems and challenges. While their adoption, particularly with the rise of AI, is reshaping nearly every industry sector, discipline, and area of research, such innovations often expose unexpected consequences that involve new norms, new expectations, and new rules and laws.

    To facilitate deeper understanding, the Social and Ethical Responsibilities of Computing (SERC), a cross-cutting initiative in the MIT Schwarzman College of Computing, recently brought together social scientists and humanists with computer scientists, engineers, and other computing faculty for an exploration of the ways in which the broad applicability of algorithms and AI has presented both opportunities and challenges in many aspects of society.

    “The very nature of our reality is changing. AI has the ability to do things that until recently were solely the realm of human intelligence — things that can challenge our understanding of what it means to be human,” remarked Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing, in his opening address at the inaugural SERC Symposium. “This poses philosophical, conceptual, and practical questions on a scale not experienced since the start of the Enlightenment. In the face of such profound change, we need new conceptual maps for navigating the change.”

    The symposium offered a glimpse into the vision and activities of SERC in both research and education. “We believe our responsibility with SERC is to educate and equip our students and enable our faculty to contribute to responsible technology development and deployment,” said Georgia Perakis, the William F. Pounds Professor of Management in the MIT Sloan School of Management, co-associate dean of SERC, and the lead organizer of the symposium. “We’re drawing from the many strengths and diversity of disciplines across MIT and beyond and bringing them together to gain multiple viewpoints.”

    Through a succession of panels and sessions, the symposium delved into a variety of topics related to the societal and ethical dimensions of computing. In addition, 37 undergraduate and graduate students from a range of majors, including urban studies and planning, political science, mathematics, biology, electrical engineering and computer science, and brain and cognitive sciences, participated in a poster session to exhibit their research in this space, covering such topics as quantum ethics, AI collusion in storage markets, computing waste, and empowering users on social platforms for better content credibility.

    Showcasing a diversity of work

    In three sessions devoted to themes of beneficent and fair computing, equitable and personalized health, and algorithms and humans, the SERC Symposium showcased work by 12 faculty members across these domains.

    One such project from a multidisciplinary team of archaeologists, architects, digital artists, and computational social scientists aimed to preserve endangered heritage sites in Afghanistan with digital twins. The project team produced highly detailed interrogable 3D models of the heritage sites, in addition to extended reality and virtual reality experiences, as learning resources for audiences that cannot access these sites.

    In a project for the United Network for Organ Sharing, researchers showed how they used applied analytics to optimize various facets of an organ allocation system in the United States that is currently undergoing a major overhaul in order to make it more efficient, equitable, and inclusive for different racial, age, and gender groups, among others.

    Another talk discussed an area that has not yet received adequate public attention: the broader implications for equity that biased sensor data holds for the next generation of models in computing and health care.

    A talk on bias in algorithms considered both human bias and algorithmic bias, and the potential for improving results by taking into account differences in the nature of the two kinds of bias.

    Other highlighted research included the interaction between online platforms and human psychology; a study on whether decision-makers make systemic prediction mistakes on the available information; and an illustration of how advanced analytics and computation can be leveraged to inform supply chain management, operations, and regulatory work in the food and pharmaceutical industries.

    Improving the algorithms of tomorrow

    “Algorithms are, without question, impacting every aspect of our lives,” said Asu Ozdaglar, deputy dean of academics for the MIT Schwarzman College of Computing and head of the Department of Electrical Engineering and Computer Science, in kicking off a panel she moderated on the implications of data and algorithms.

    “Whether it’s in the context of social media, online commerce, automated tasks, and now a much wider range of creative interactions with the advent of generative AI tools and large language models, there’s little doubt that much more is to come,” Ozdaglar said. “While the promise is evident to all of us, there’s a lot to be concerned as well. This is very much time for imaginative thinking and careful deliberation to improve the algorithms of tomorrow.”

    Turning to the panel, Ozdaglar asked experts from computing, social science, and data science for insights on how to understand what is to come and shape it to enrich outcomes for the majority of humanity.

    Sarah Williams, associate professor of technology and urban planning at MIT, emphasized the critical importance of comprehending the process of how datasets are assembled, as data are the foundation for all models. She also stressed the need for research to address the potential implication of biases in algorithms that often find their way in through their creators and the data used in their development. “It’s up to us to think about our own ethical solutions to these problems,” she said. “Just as it’s important to progress with the technology, we need to start the field of looking at these questions of what biases are in the algorithms? What biases are in the data, or in that data’s journey?”

    Shifting focus to generative models and whether the development and use of these technologies should be regulated, the panelists — which also included MIT’s Srini Devadas, professor of electrical engineering and computer science, John Horton, professor of information technology, and Simon Johnson, professor of entrepreneurship — all concurred that regulating open-source algorithms, which are publicly accessible, would be difficult given that regulators are still catching up and struggling to even set guardrails for technology that is now 20 years old.

    Returning to the question of how to effectively regulate the use of these technologies, Johnson proposed a progressive corporate tax system as a potential solution. He recommends basing companies’ tax payments on their profits, especially for large corporations whose massive earnings go largely untaxed due to offshore banking. By doing so, Johnson said that this approach can serve as a regulatory mechanism that discourages companies from trying to “own the entire world” by imposing disincentives.

    The role of ethics in computing education

    As computing continues to advance with no signs of slowing down, it is critical to educate students to be intentional in the social impact of the technologies they will be developing and deploying into the world. But can one actually be taught such things? If so, how?

    Caspar Hare, professor of philosophy at MIT and co-associate dean of SERC, posed this looming question to faculty on a panel he moderated on the role of ethics in computing education. All experienced in teaching ethics and thinking about the social implications of computing, each panelist shared their perspective and approach.

    A strong advocate for the importance of learning from history, Eden Medina, associate professor of science, technology, and society at MIT, said that “often the way we frame computing is that everything is new. One of the things that I do in my teaching is look at how people have confronted these issues in the past and try to draw from them as a way to think about possible ways forward.” Medina regularly uses case studies in her classes and referred to a paper written by Yale University science historian Joanna Radin on the Pima Indian Diabetes Dataset that raised ethical issues on the history of that particular collection of data that many don’t consider as an example of how decisions around technology and data can grow out of very specific contexts.

    Milo Phillips-Brown, associate professor of philosophy at Oxford University, talked about the Ethical Computing Protocol that he co-created while he was a SERC postdoc at MIT. The protocol, a four-step approach to building technology responsibly, is designed to train computer science students to think in a better and more accurate way about the social implications of technology by breaking the process down into more manageable steps. “The basic approach that we take very much draws on the fields of value-sensitive design, responsible research and innovation, participatory design as guiding insights, and then is also fundamentally interdisciplinary,” he said.

    Fields such as biomedicine and law have an ethics ecosystem that distributes the function of ethical reasoning in these areas. Oversight and regulation are provided to guide front-line stakeholders and decision-makers when issues arise, as are training programs and access to interdisciplinary expertise that they can draw from. “In this space, we have none of that,” said John Basl, associate professor of philosophy at Northeastern University. “For current generations of computer scientists and other decision-makers, we’re actually making them do the ethical reasoning on their own.” Basl commented further that teaching core ethical reasoning skills across the curriculum, not just in philosophy classes, is essential, and that the goal shouldn’t be for every computer scientist be a professional ethicist, but for them to know enough of the landscape to be able to ask the right questions and seek out the relevant expertise and resources that exists.

    After the final session, interdisciplinary groups of faculty, students, and researchers engaged in animated discussions related to the issues covered throughout the day during a reception that marked the conclusion of the symposium. More

  • in

    MIT researchers make language models scalable self-learners

    Socrates once said: “It is not the size of a thing, but the quality that truly matters. For it is in the nature of substance, not its volume, that true value is found.”

    Does size always matter for large language models (LLMs)? In a technological landscape bedazzled by LLMs taking center stage, a team of MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers think smaller models shouldn’t be overlooked, especially for natural language understanding products widely deployed in the industry.

    To that end, the researchers cooked up an approach to long-standing problems of inefficiency and privacy associated with big, text-based AI models — a logic-aware model that outperforms 500-times-bigger counterparts on some language understanding tasks without human-generated annotations, while preserving privacy and robustness with high performance.

    LLMs, which have shown some promising skills in generating language, art, and code, are computationally expensive, and their data requirements can risk privacy leaks when using application programming interfaces for data upload. Smaller models have been historically less capable, particularly in multitasking and weakly supervised tasks, compared to their larger counterparts.

    So what’s helping these smaller models act so mighty, then? Something called “textual entailment,” a way to help these models understand a variety of language tasks, where if one sentence (the premise) is true, then the other sentence (the hypothesis) is likely to be true as well. For example, if the premise is, “all cats have tails” then the hypothesis “a tabby cat has a tail” would be entailed by the premise. This concept is used to train an “entailment model” that proved to be less biased than other language models, from the team’s previous research. They then created “prompts” that the models can use to figure out if certain information is entailed by a given sentence or phrase according to different tasks. This method improved the model’s ability to adapt to different tasks without any additional training, known as zero-shot adaptation.

    In the realm of “natural language understanding,” there are various applications that hinge on determining the relationship between two pieces of text. For example, in sentiment classification, a statement like “I think the movie is good” can be inferred or entailed from a movie review that says, “I like the story and the acting is great,” indicating a positive sentiment. Another is news classification, where the topic of a news article can be inferred from its content. For example, a statement like “the news article is about sports” can be entailed if the main content of the article reports on an NBA game. The key insight was that many existing natural language understanding tasks could be recast as an entailment (i.e., logical inference in natural language) task. 

    “Our research is about improving the ability of computer programs to understand and process natural language — the way humans speak and write. Our self-trained, 350-million-parameter entailment models, without human-generated labels, outperform supervised language models with 137 to 175 billion parameters,” says MIT CSAIL postdoc Hongyin Luo, lead author on a new paper about the study. “This has potential to reshape the landscape of AI and machine learning, providing a more scalable, trustworthy, and cost-effective solution to language modeling,” says Luo. “By proving that smaller models can perform at the same level as larger ones for language understanding, this work paves the way for more sustainable and privacy-preserving AI technologies.” 

    The team discovered that they could improve the model’s performance even more by using a technique called “self-training,” where the model uses its own predictions to teach itself, effectively learning without human supervision and additional annotated training data.The self-training method significantly improved performance on a bunch of downstream tasks, including sentiment analysis, question-answering, and news classification. It outperformed both Google’s LaMDA and FLAN in zero-shot capabilities, GPT models, and other supervised algorithms. 

    However, one challenge with self-training is that the model can sometimes generate incorrect or noisy labels that harm performance. To overcome this, they developed a new algorithm called ‘SimPLE’ (Simple Pseudo-Label Editing), a process to review and modify the pseudo-labels made in initial rounds of learning. By correcting any mislabeled instances, it improved the overall quality of the self-generated labels. This not only made the models more effective at understanding language, but more robust when faced with adversarial data. 

    As with most research, there are some limitations. The self-training on multi-class classification tasks didn’t perform as well as on binary natural language understanding tasks, indicating the challenge of applying entailment models to multi-choice tasks.“This research presents an efficient and effective way to train large language models (LLMs) by formulating natural language understanding tasks as contextual entailment problems and employing a pseudo-labeling self-training mechanism to incorporate large quantities of unlabelled text data in the training process,” adds CSAIL Senior Research Scientist James Glass, who is also an author on the paper. “While the field of LLMs is undergoing rapid and dramatic changes, this research shows that it is possible to produce relatively compact language models that perform very well on benchmark understanding tasks compared to their peers of roughly the same size, or even much larger language models.”

    “Entailment task is a popular proxy to evaluate “understanding” of a given context by an AI model,” says Leonid Karlinsky, research staff member at the MIT-IBM Watson AI Lab. “It is used in many areas analyzing models with unimodal, like LLMs, and and multi-modal, like VLMs [visual language models] inputs, simplifying the task of question-answering about a given input context to a binary classification problem — does this context entail a certain (e.g., text) conclusion or not? This paper makes two contributions in this space. First, it proposes a way to improve the zero-shot (without additional tuning) NLU performance and robustness to adversarial attacks via tuning with synthesized (specialized) entailment tasks generated for the primal NLU task. Second, it offers a self-supervised SimPLE method including pseudo-labeling and confidence-based filtering to further improve large LLMs’ NLU performance.”

    Luo and Glass wrote the paper with Yoon Kim, a CSAIL member and assistant professor in MIT’s Department of Electrical Engineering and Computer Science, and Jiaxin Ge of Peking University. Their work will be presented at the meeting of the Association for Computational Linguistics in Toronto, Ontario this July. This research was supported by a grant from the Hong Kong Innovation AI program. More

  • in

    Scaling audio-visual learning without labels

    Researchers from MIT, the MIT-IBM Watson AI Lab, IBM Research, and elsewhere have developed a new technique for analyzing unlabeled audio and visual data that could improve the performance of machine-learning models used in applications like speech recognition and object detection. The work, for the first time, combines two architectures of self-supervised learning, contrastive learning and masked data modeling, in an effort to scale machine-learning tasks like event classification in single- and multimodal data without the need for annotation, thereby replicating how humans understand and perceive our world.

    “A larger portion of human knowledge is learned in a self-supervised way, because we don’t always get supervision signals, and we want to enable the machine-learning model to have the same ability,” says Yuan Gong, an MIT postdoc in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    “So, another way to put it is that self-supervised learning often forms the foundation of an initial model, because it can learn on vast amounts of unlabeled data. And then you can use classical, supervised learning or reinforcement learning to fine tune the model to something particular if you want to,” says Jim Glass, an MIT senior research scientist and member of the MIT-IBM Watson AI Lab.

    The technique, called the contrastive audio-visual masked autoencoder (CAV-MAE), is a type of neural network that can learn to extract and map meaningful latent representations into high-dimensional space from acoustic and visual data by training on large YouTube datasets of audio and video 10-second clips. The researchers say the technique is more effective than previous approaches because it explicitly models the relationships between audio and visual data in a way that other methods do not.

    Joining Gong and Glass on the study are graduate students Andrew Rouditchenko and Alexander H. Liu of MIT, David Harwath PhD ’18 of the University of Texas at Austin, and MIT-IBM Watson AI Lab members Leonid Karlinsky and Hilde Kuehne. Kuehne is also affiliated with Goethe University Frankfurt. The method was recently presented at the International Conference on Learning Representations.

    A joint and coordinated approach

    The CAV-MAE works by “learning by prediction” and “learning by comparison,” says Gong. The masked data modeling, or the prediction method, takes a video along with its coordinated audio waveform, converts the audio to a spectrogram, and masks 75 percent of both. The unmasked data is tokenized, then fed into separate audio and visual encoders before entering a joint encoder/decoder, where the model is asked to recover the missing data. The difference (reconstruction loss) between the resulting reconstructed prediction and the original audio-visual combination is then used to train the model for better performance. An example of this would be covering part of a video of a piano and part of a spectrogram of piano music, and then asking the model to try to determine the masked inputs. Unfortunately, this method may not capture the association between the video and audio pair, whereas contrastive learning leverages this, but may discard some modality-unique information, like the background in a video.

    Contrastive learning aims to map representations that are similar close to each other. For example, the model will attempt to place different video and audio data of different parrots close to each other and further away from pairs of video and audio of guitars playing. In a similar fashion to masked autoencoding, audio-visual pairs are passed into separate modality encoders; however, the audio and visual components are kept separately within the joint encoder before the model performs pooling and contrastive loss. In this way, contrastive learning tries to identify the parts of each audio or video that are most relevant to the other. For example, if a video shows someone speaking and the corresponding audio clip contains speech, the autoencoder will learn to associate the mouth movements of the speaker with the words being spoken. It will then adjust the model’s parameters so that those inputs are represented close to each other. Ultimately, the CAV-MAE method combines both techniques with multiple forward data streams with masking as a first step, modality-specific encoders, and layer normalization so that the representation strengths are similar.

    “We [then] wanted to compare the proposed CAV-MAE with a model trained only with a masked autoencoder and a model trained only with contrastive learning, because we want to show that by combining masked autoencoder and contrastive learning, we can get some performance improvement,” says Gong, “and the results support our hypothesis that there’s obvious improvement.”

    The researchers tested CAV-MAE — as well as their method without contrastive loss or a masked autoencoder — against other state-of-the-art methods on audio-visual retrieval and audio-visual event classification tasks using standard AudioSet (20K and 2M) and VGGSound datasets — labeled, realistic short clips, which could include multiple sounds. Audio-visual retrieval means that the model sees either the audio or visual component of a query pair and searches for the missing one; event classification includes identifying actions or sounds within data, like a person singing or a car driving.

    Overall, they found that contrastive learning and masked data modeling are complementary methods. CAV-MAE was able to outperform previous techniques (with fully self-supervised pre-training) by about 2 percent for event classification performance verses models with comparable computation and, more impressively, kept pace with or outperformed models with industry-level computational resources. The team’s model ranked similarly to models trained with only the contrastive loss. And surprisingly, the team says, the incorporation of multi-modal data into CAV-MAE pre-training greatly improves the fine-tuning of single-modality representation via supervised learning (with some labeled data) and performance on audio-only event classification tasks. This demonstrates that, like humans, multi-modal information provides an additional “soft label” boost even for audio or visual only tasks; for instance, it helps the model to understand if it’s looking for an electric or acoustic guitar — a richer supervision signal.

    “I think people like the elegance of this model for combining information in the different audio and visual streams. It has the contrastive and the reconstruction loss, and compared to models that have been evaluated with similar data, it clearly does very well across a range of these tasks,” says Glass.

    Building on this, “one special thing is, our model can do both classification and the retrieval, which is not common,” Gong adds. “Before this work, these methods are used separately, but after this work, I see that most of the audio-visual learning frameworks use contracting loss and the masked autoencoder together, implicitly or explicitly.”

    Bringing self-supervised audio-visual learning into our world

    The researchers see their contribution of the contrastive audio-visual masked autoencoder (CAV-MAE) as an important milestone and a step forward for applications, which are increasingly moving from single modality to multi-modality and which require or leverage audio-visual fusion. They hypothesize that one day it could be used for action recognition in realms like sports, education, entertainment, motor vehicles, and public safety. It could also, one day, extend to other modalities. At this time, the fact that, “this only applies to audio-visual data may be a limitation, but we are targeting multi-modal learning, which is trend of machine learning,” says Gong. “As humans, we have multi-modalities — we have smell, touch — many more things that just audio-visual. So, when we try to build AI, we try to mimic humans somehow, not necessarily from the biological perspective, and this method could [potentially be] generalized to other unexplored modalities.”

    As machine-learning models continue to play an increasingly important role in our lives, techniques like this one will become increasingly valuable.

    This research was supported by the MIT-IBM Watson AI Lab. More

  • in

    Celebrating the impact of IDSS

    The “interdisciplinary approach” is something that has been lauded for decades for its ability to break down silos and create new integrated approaches to research.

    For Munther Dahleh, founding director of the MIT Institute for Data, Systems, and Society (IDSS), showing the community that data science and statistics can transcend individual disciplines and form a new holistic approach to addressing complex societal challenges has been crucial to the institute’s success.

    “From the very beginning, it was critical that we recognized the areas of data science, statistics, AI, and, in a way, computing, as transdisciplinary,” says Dahleh, who is the William A. Coolidge Professor in Electrical Engineering and Computer Science. “We made that point over and over — these are areas that embed in your field. It is not ours; this organization is here for everyone.”

    On April 14-15, researchers from across and beyond MIT joined together to celebrate the accomplishments and impact IDSS has had on research and education since its inception in 2015. Taking the place of IDSS’s annual statistics and data science conference SDSCon, the celebration also doubled as a way to recognize Dahleh for his work creating and executing the vision of IDSS as he prepares to step down from his director position this summer.

    In addition to talks and panels on statistics and computation, smart systems, automation and artificial intelligence, conference participants discussed issues ranging from climate change, health care, and misinformation. Nobel Prize winner and IDSS affiliate Professor Esther Duflo spoke on large scale immunization efforts, former MLK Visiting Professor Craig Watkins joined a panel on equity and justice in AI, and IDSS Associate Director Alberto Abadie discussed synthetic controls for policy evaluation. Other policy questions were explored through lightning talks, including those by students from the Technology and Policy Program (TPP) within IDSS.

    A place to call home

    The list of IDSS accomplishments over the last eight years is long and growing. From creating a home for 21st century statistics at MIT after other unsuccessful attempts, to creating a new PhD preparing the trilingual student who is an expert in data science and social science in the context of a domain, to playing a key role in determining an effective process for Covid testing in the early days of the pandemic, IDSS has left its mark on MIT. More recently, IDSS launched an initiative using big data to help effect structural and normative change toward racial equity, and will continue to explore societal challenges through the lenses of statistics, social science, and science and engineering.

    “I’m very proud of what we’ve done and of all the people who have contributed to this. The leadership team has been phenomenal in their commitment and their creativity,” Dahleh says. “I always say it doesn’t take one person, it takes the village to do what we have done, and I am very proud of that.”

    Prior to the institute’s formation, Dahleh and others at MIT were brought together to answer one key question: How would MIT prepare for the future of systems and data?

    “Data science is a complex area because in some ways it’s everywhere and it belongs to everyone, similar to statistics and AI,” Dahleh says “The most important part of creating an organization to support it was making it clear that it was an organization for everyone.” The response the team came back with was to build an Institute: a department that could cross all other departments and schools.

    While Dahleh and others on the committee were creating this blueprint for the future, the events that would lead early IDSS hires like Caroline Uhler to join the team were also beginning to take shape. Uhler, now an MIT professor of computer science and co-director of the Eric and Wendy Schmidt Center at the Broad Institute, was a panelist at the celebration discussing statistics and human health.

    In 2015, Uhler was a faculty member at the Institute of Science and Technology in Austria looking to move back to the U.S. “I was looking for positions in all different types of departments related to statistics, including electrical engineering and computer science, which were areas not related to my degree,” Uhler says. “What really got me to MIT was Munther’s vision for building a modern type of statistics, and the unique opportunity to be part of building what statistics should be moving forward.”

    The breadth of the Statistics and Data Science Center has given it a unique and a robust character that makes for an attractive collaborative environment at MIT. “A lot of IDSS’s impact has been in giving people like me a home,” Uhler adds. “By building an institute for statistics that is across all schools instead of housed within a single department, it has created a home for everyone who is interested in the field.”

    Filling the gap

    For Ali Jadbabaie, former IDSS associate director and another early IDSS hire, being in the right place at the right time landed him in the center of it all. A control theory expert and network scientist by training, Jadbabaie first came to MIT during a sabbatical from his position as a professor at the University of Pennsylvania.

    “My time at MIT coincided with the early discussions around forming IDSS and given my experience they asked me to stay and help with its creation,” Jadbabaie says. He is now head of the Department of Civil and Environmental Engineering at MIT, and he spoke at the celebration about a new MIT major in climate system science and engineering.

    A critical early accomplishment of IDSS was the creation of a doctoral program in social and engineering systems (SES), which has the goal of educating and fostering the success of a new type of PhD student, says Jadbabaie.

    “We realized we had this opportunity to educate a new type of PhD student who was conversant in the math of information sciences and statistics in addition to an understanding of a domain — infrastructures, climate, political polarization — in which problems arise,” he says. “This program would provide training in statistics and data science, the math of information sciences and a branch of social science that is relevant to their domain.”

    “SES has been filling a gap,” adds Jadbabaie. “We wanted to bring quantitative reasoning to areas in social sciences, particularly as they interact with complex engineering systems.”

    “My first year at MIT really broadened my horizon in terms of what was available and exciting,” says Manxi Wu, a member of the first cohort of students in the SES program after starting out in the Master of Science in Transportation (MST) program. “My advisor introduced me to a number of interesting topics at the intersection of game theory, economics, and engineering systems, and in my second year I realized my interest was really about the societal scale systems, with transportation as my go-to application area when I think about how to make an impact in the real world.”

    Wu, now an assistant professor in the School of Operations Research and Information Engineering at Cornell, was a panelist at the Celebration’s session on smart infrastructure systems. She says that the beauty of the SES program lies in its ability to create a common ground between groups of students and researchers who all have different applications interests but share an eagerness to sharpen their technical skills.

    “While we may be working on very different application areas, the core methodologies, such as mathematical tools for data science and probability optimization, create a common language,” Wu says. “We are all capable of speaking the technical language, and our diversified interests give us even more to talk about.”

    In addition to the PhD program, IDSS has helped bring quality MIT programming to people around the globe with its MicroMasters Program in Statistics and Data Science (SDS), which recently celebrated the certification of over 1,000 learners. The MicroMasters is just one offering in the newly-minted IDSSx, a collection of online learning opportunities for learners at different skill levels and interests.

    “The impact of branding what MIT-IDSS does across the globe has been great,” Dahleh says. “In addition, we’ve created smaller online programs for continued education in data science and machine learning, which I think is also critical in educating the community at large.”

    Hopes for the future

    Through all of its accomplishments, the core mission of IDSS has never changed.

    “The belief was always to create an institute focused on how data science can be used to solve pressing societal problems,” Dahleh says. “The organizational structure of IDSS as an MIT Institute has enabled it to promote data and systems as a transdiciplinary area that embeds in every domain to support its mission. This reverse ownership structure will continue to strengthen the presence of IDSS in MIT and will make it an essential unit within the Schwarzman College of Computing.”

    As Dahleh prepares to step down from his role, and Professor Martin Wainwright gets ready to fill his (very big) shoes as director, Dahleh’s colleagues say the real key to the success of IDSS all started with his passion and vision.

    “Creating a new academic unit within MIT is actually next to impossible,” Jadbabaie says. “It requires structural changes, as well as someone who has a strong understanding of multiple areas, who knows how to get people to work together collectively, and who has a mission.”

    “The most important thing is that he was inclusive,” he adds. “He didn’t try to create a gate around it and say these people are in and these people are not. I don’t think this would have ever happened without Munther at the helm.” More

  • in

    Exploring new methods for increasing safety and reliability of autonomous vehicles

    When we think of getting on the road in our cars, our first thoughts may not be that fellow drivers are particularly safe or careful — but human drivers are more reliable than one may expect. For each fatal car crash in the United States, motor vehicles log a whopping hundred million miles on the road.

    Human reliability also plays a role in how autonomous vehicles are integrated in the traffic system, especially around safety considerations. Human drivers continue to surpass autonomous vehicles in their ability to make quick decisions and perceive complex environments: Autonomous vehicles are known to struggle with seemingly common tasks, such as taking on- or off-ramps, or turning left in the face of oncoming traffic. Despite these enormous challenges, embracing autonomous vehicles in the future could yield great benefits, like clearing congested highways; enhancing freedom and mobility for non-drivers; and boosting driving efficiency, an important piece in fighting climate change.

    MIT engineer Cathy Wu envisions ways that autonomous vehicles could be deployed with their current shortcomings, without experiencing a dip in safety. “I started thinking more about the bottlenecks. It’s very clear that the main barrier to deployment of autonomous vehicles is safety and reliability,” Wu says.

    One path forward may be to introduce a hybrid system, in which autonomous vehicles handle easier scenarios on their own, like cruising on the highway, while transferring more complicated maneuvers to remote human operators. Wu, who is a member of the Laboratory for Information and Decision Systems (LIDS), a Gilbert W. Winslow Assistant Professor of Civil and Environmental Engineering (CEE) and a member of the MIT Institute for Data, Systems, and Society (IDSS), likens this approach to air traffic controllers on the ground directing commercial aircraft.

    In a paper published April 12 in IEEE Transactions on Robotics, Wu and co-authors Cameron Hickert and Sirui Li (both graduate students at LIDS) introduced a framework for how remote human supervision could be scaled to make a hybrid system efficient without compromising passenger safety. They noted that if autonomous vehicles were able to coordinate with each other on the road, they could reduce the number of moments in which humans needed to intervene.

    Humans and cars: finding a balance that’s just right

    For the project, Wu, Hickert, and Li sought to tackle a maneuver that autonomous vehicles often struggle to complete. They decided to focus on merging, specifically when vehicles use an on-ramp to enter a highway. In real life, merging cars must accelerate or slow down in order to avoid crashing into cars already on the road. In this scenario, if an autonomous vehicle was about to merge into traffic, remote human supervisors could momentarily take control of the vehicle to ensure a safe merge. In order to evaluate the efficiency of such a system, particularly while guaranteeing safety, the team specified the maximum amount of time each human supervisor would be expected to spend on a single merge. They were interested in understanding whether a small number of remote human supervisors could successfully manage a larger group of autonomous vehicles, and the extent to which this human-to-car ratio could be improved while still safely covering every merge.

    With more autonomous vehicles in use, one might assume a need for more remote supervisors. But in scenarios where autonomous vehicles coordinated with each other, the team found that cars could significantly reduce the number of times humans needed to step in. For example, a coordinating autonomous vehicle already on a highway could adjust its speed to make room for a merging car, eliminating a risky merging situation altogether.

    The team substantiated the potential to safely scale remote supervision in two theorems. First, using a mathematical framework known as queuing theory, the researchers formulated an expression to capture the probability of a given number of supervisors failing to handle all merges pooled together from multiple cars. This way, the researchers were able to assess how many remote supervisors would be needed in order to cover every potential merge conflict, depending on the number of autonomous vehicles in use. The researchers derived a second theorem to quantify the influence of cooperative autonomous vehicles on surrounding traffic for boosting reliability, to assist cars attempting to merge.

    When the team modeled a scenario in which 30 percent of cars on the road were cooperative autonomous vehicles, they estimated that a ratio of one human supervisor to every 47 autonomous vehicles could cover 99.9999 percent of merging cases. But this level of coverage drops below 99 percent, an unacceptable range, in scenarios where autonomous vehicles did not cooperate with each other.

    “If vehicles were to coordinate and basically prevent the need for supervision, that’s actually the best way to improve reliability,” Wu says.

    Cruising toward the future

    The team decided to focus on merging not only because it’s a challenge for autonomous vehicles, but also because it’s a well-defined task associated with a less-daunting scenario: driving on the highway. About half of the total miles traveled in the United States occur on interstates and other freeways. Since highways allow higher speeds than city roads, Wu says, “If you can fully automate highway driving … you give people back about a third of their driving time.”

    If it became feasible for autonomous vehicles to cruise unsupervised for most highway driving, the challenge of safely navigating complex or unexpected moments would remain. For instance, “you [would] need to be able to handle the start and end of the highway driving,” Wu says. You would also need to be able to manage times when passengers zone out or fall asleep, making them unable to quickly take over controls should it be needed. But if remote human supervisors could guide autonomous vehicles at key moments, passengers may never have to touch the wheel. Besides merging, other challenging situations on the highway include changing lanes and overtaking slower cars on the road.

    Although remote supervision and coordinated autonomous vehicles are hypotheticals for high-speed operations, and not currently in use, Wu hopes that thinking about these topics can encourage growth in the field.

    “This gives us some more confidence that the autonomous driving experience can happen,” Wu says. “I think we need to be more creative about what we mean by ‘autonomous vehicles.’ We want to give people back their time — safely. We want the benefits, we don’t strictly want something that drives autonomously.” More

  • in

    Using data to write songs for progress

    A three-year recipient of MIT’s Emerson Classical Vocal Scholarships, senior Ananya Gurumurthy recalls getting ready to step onto the Carnegie Hall stage to sing a Mozart opera that she once sang with the New York All-State Choir. The choir conductor reminded her to articulate her words and to engage her diaphragm.

    “If you don’t project your voice, how are people going to hear you when you perform?” Gurumurthy recalls her conductor telling her. “This is your moment, your chance to connect with such a tremendous audience.”

    Gurumurthy reflects on the universal truth of those words as she adds her musical talents to her math and computer science studies to campaign for social and economic justice.

    The daughter of immigrants

    Growing up in Edgemont, New York, she was inspired to fight on behalf of others by her South Asian immigrant parents, who came to the United States in the 1980s. Her father is a management consultant and her mother has experience as an investment banker.

    “They came barely 15 years after the passage of the 1965 Immigration and Nationality Act, which removed national origin quotas from the American immigration system,” she says. “I would not be here if it had not been for the Civil Rights Movement, which preceded both me and my parents.”

    Her parents told her about their new home’s anti-immigrant sentiments; for example, her father was a graduate student in Dallas exiting a store when he was pelted with glass bottles and racial slurs.

    “I often consider the amount of bravery that it must have taken them to abandon everything they knew to immigrate to a new, but still imperfect, country in search of something better,” she says. “As a result, I have always felt so grounded in my identity both as a South Asian American and a woman of color. These identities have allowed me to think critically about how I can most effectively reform the institutions surrounding me.”

    Gurumurthy has been singing since she was 11, but in high school, she decided to also build her political voice by working for New York Senator Andrea Stewart-Cousins. At one point, Gurumurthy noted a log was kept for the subjects of constituent calls, such as “affordable housing” and  “infrastructure,” and it was then that she became aware that Stewart-Cousins would address the most pressing of these callers’ issues before the Senate.

    “This experience was my first time witnessing how powerful the mobilization of constituents in vast numbers was for influencing meaningful legislative change,” says Gurumurthy.

    After she began applying her math skills to political campaigns, Gurumurthy was soon tapped to run analytics for the Democratic National Committee’s (DNC) midterm election initiative. As a lead analyst for the New York DNC, she adapted an interactive activation-competition (IAC) model to understand voting patterns in the 2018 and 2020 elections. She collected data from public voting records to predict how constituents would cast their ballots and used an IAC algorithm to strategize alongside grassroots organizations and allocate resources to empower historically disenfranchised groups in municipal, state, and federal elections to encourage them to vote.

    Research and student organizing at MIT

    When she arrived at MIT in 2019 to study mathematics with computer science, along with minors in music and economics, she admits she was saddled with the naïve notion that she would “build digital tools that could single-handedly alleviate all of the collective pressures of systemic injustice in this country.” 

    Since then, she has learned to create what she calls “a more nuanced view.” She picked up data analytics skills to build mobilization platforms for organizations that pursued social and economic justice, including working in Fulton County, Georgia, with Fair Fight Action (through the Kelly-Douglas Fund Scholarship) to analyze patterns of voter suppression, and MIT’s ethics laboratories in the Computer Science and Artificial Intelligence Laboratory to build symbolic artificial intelligence protocols to better understand bias in artificial intelligence algorithms. For her work on the International Monetary Fund (through the MIT Washington Summer Internship Program), Gurumurthy was awarded second place for the 2022 S. Klein Prize in Technical Writing for her paper “The Rapid Rise of Cryptocurrency.”

    “The outcomes of each project gave me more hope to begin the next because I could see the impact of these digital tools,” she says. “I saw people feel empowered to use their voices whether it was voting for the first time, protesting exploitative global monetary policy, or fighting gender discrimination. I’ve been really fortunate to see the power of mathematical analysis firsthand.”

    “I have come to realize that the constructive use of technology could be a powerful voice of resistance against injustice,” she says. “Because numbers matter, and when people bear witness to them, they are pushed to take action in meaningful ways.”

    Hoping to make a difference in her own community, she joined several Institute committees. As co-chair of the Undergraduate Association’s education committee, she propelled MIT’s first-ever digital petition for grade transparency and worked with faculty members on Institute committees to ensure that all students were being provided adequate resources to participate in online education in the wake of the Covid-19 pandemic. The digital petition inspired her to begin a project, called Insite, to develop a more centralized digital means of data collection on student life at MIT to better inform policies made by its governing bodies. As Ring Committee chair, she ensured that the special traditions of the “Brass Rat” were made economically accessible to all class members by helping the committee nearly triple its financial aid budget. For her efforts at MIT, last May she received the William L. Stewart, Jr. Award for “[her] contributions [as] an individual student at MIT to extracurricular activities and student life.”

    Ananya plans on going to law school after graduation, to study constitutional law so that she can use her technical background to build quantitative evidence in cases pertaining to voting rights, social welfare, and ethical technology, and set legal standards ”for the humane use of data,” she says.

    “In building digital tools for a variety of social and economic justice organizations, I hope that we can challenge our existing systems of power and realize the progress we so dearly need to witness. There is strength in numbers, both algorithmically and organizationally. I believe it is our responsibility to simultaneously use these strengths to change the world.”

    Her ambitions, however, began when she began singing lessons when she was 11; without her background as a vocalist, she says she would be voiceless.

    “Operatic performance has given me the ability to truly step into my character and convey powerful emotions in my performance. In the process, I have realized that my voice is most powerful when it reflects my true convictions, whether I am performing or publicly speaking. I truly believe that this honesty has allowed me to become an effective community organizer. I’d like to believe that this voice is what compels those around me to act.”

    Private musical study is available for students through the Emerson/Harris Program, which offers merit-based financial awards to students of outstanding achievement on their instruments or voice in classical, jazz, or world music. The Emerson/Harris Program is funded by the late Cherry L. Emerson Jr. SM ’41, in response to an appeal from Associate Provost Ellen T. Harris (Class of 1949 professor emeritus of music). More