More stories

  • in

    Researchers teach an AI to write better chart captions

    Chart captions that explain complex trends and patterns are important for improving a reader’s ability to comprehend and retain the data being presented. And for people with visual disabilities, the information in a caption often provides their only means of understanding the chart.

    But writing effective, detailed captions is a labor-intensive process. While autocaptioning techniques can alleviate this burden, they often struggle to describe cognitive features that provide additional context.

    To help people author high-quality chart captions, MIT researchers have developed a dataset to improve automatic captioning systems. Using this tool, researchers could teach a machine-learning model to vary the level of complexity and type of content included in a chart caption based on the needs of users.

    The MIT researchers found that machine-learning models trained for autocaptioning with their dataset consistently generated captions that were precise, semantically rich, and described data trends and complex patterns. Quantitative and qualitative analyses revealed that their models captioned charts more effectively than other autocaptioning systems.  

    The team’s goal is to provide the dataset, called VisText, as a tool researchers can use as they work on the thorny problem of chart autocaptioning. These automatic systems could help provide captions for uncaptioned online charts and improve accessibility for people with visual disabilities, says co-lead author Angie Boggust, a graduate student in electrical engineering and computer science at MIT and member of the Visualization Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    “We’ve tried to embed a lot of human values into our dataset so that when we and other researchers are building automatic chart-captioning systems, we don’t end up with models that aren’t what people want or need,” she says.

    Boggust is joined on the paper by co-lead author and fellow graduate student Benny J. Tang and senior author Arvind Satyanarayan, associate professor of computer science at MIT who leads the Visualization Group in CSAIL. The research will be presented at the Annual Meeting of the Association for Computational Linguistics.

    Human-centered analysis

    The researchers were inspired to develop VisText from prior work in the Visualization Group that explored what makes a good chart caption. In that study, researchers found that sighted users and blind or low-vision users had different preferences for the complexity of semantic content in a caption. 

    The group wanted to bring that human-centered analysis into autocaptioning research. To do that, they developed VisText, a dataset of charts and associated captions that could be used to train machine-learning models to generate accurate, semantically rich, customizable captions.

    Developing effective autocaptioning systems is no easy task. Existing machine-learning methods often try to caption charts the way they would an image, but people and models interpret natural images differently from how we read charts. Other techniques skip the visual content entirely and caption a chart using its underlying data table. However, such data tables are often not available after charts are published.

    Given the shortfalls of using images and data tables, VisText also represents charts as scene graphs. Scene graphs, which can be extracted from a chart image, contain all the chart data but also include additional image context.

    “A scene graph is like the best of both worlds — it contains almost all the information present in an image while being easier to extract from images than data tables. As it’s also text, we can leverage advances in modern large language models for captioning,” Tang explains.

    They compiled a dataset that contains more than 12,000 charts — each represented as a data table, image, and scene graph — as well as associated captions. Each chart has two separate captions: a low-level caption that describes the chart’s construction (like its axis ranges) and a higher-level caption that describes statistics, relationships in the data, and complex trends.
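
    One record in a dataset like this can be pictured with the Python sketch below. The class names, field names, and example values are hypothetical, and the scene graph is heavily simplified; this is not the VisText file format. The sketch shows how a tree-shaped scene graph can be flattened into text, which is what lets a language model read it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneGraphNode:
    """One element of a chart's scene graph (hypothetical, simplified schema)."""
    kind: str                     # e.g. "chart", "axis", "mark"
    text: str = ""                # any text carried by the element
    children: list[SceneGraphNode] = field(default_factory=list)


@dataclass
class ChartRecord:
    """One chart with its three representations and its two captions."""
    image_path: str
    data_table: list[dict]
    scene_graph: SceneGraphNode
    low_level_caption: str        # chart construction (axes, ranges)
    high_level_caption: str       # statistics, relationships, trends


def flatten(node: SceneGraphNode, depth: int = 0) -> list[str]:
    """Linearize a scene graph into indented text lines."""
    lines = ["  " * depth + f"{node.kind}: {node.text}".rstrip()]
    for child in node.children:
        lines.extend(flatten(child, depth + 1))
    return lines


# Made-up example values, for illustration only.
graph = SceneGraphNode("chart", "Revenue by year", [
    SceneGraphNode("axis", "x: Year"),
    SceneGraphNode("axis", "y: Revenue (USD)"),
    SceneGraphNode("mark", "bar 2020 = 10"),
])
record = ChartRecord(
    image_path="chart_0001.png",
    data_table=[{"year": 2020, "revenue": 10}],
    scene_graph=graph,
    low_level_caption="A bar chart of revenue by year.",
    high_level_caption="Revenue is shown for a single year, 2020.",
)
print("\n".join(flatten(record.scene_graph)))
```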

    The researchers generated low-level captions using an automated system and crowdsourced higher-level captions from human workers.

    “Our captions were informed by two key pieces of prior research: existing guidelines on accessible descriptions of visual media and a conceptual model from our group for categorizing semantic content. This ensured that our captions featured important low-level chart elements like axes, scales, and units for readers with visual disabilities, while retaining human variability in how captions can be written,” says Tang.

    Translating charts

    Once they had gathered chart images and captions, the researchers used VisText to train five machine-learning models for autocaptioning. They wanted to see how each representation — image, data table, and scene graph — and combinations of the representations affected the quality of the caption.

    “You can think about a chart captioning model like a model for language translation. But instead of saying, translate this German text to English, we are saying translate this ‘chart language’ to English,” Boggust says.

    Their results showed that models trained with scene graphs performed as well as or better than those trained using data tables. Since scene graphs are easier to extract from existing charts, the researchers argue that they might be a more useful representation.

    They also trained models with low-level and high-level captions separately. This technique, known as semantic prefix tuning, enabled them to teach the model to vary the complexity of the caption’s content.
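
    One way to read this idea is sketched below in Python: a short prefix on the model input signals which caption level is wanted, so each chart contributes two training pairs. The prefix strings and function names are hypothetical, and the researchers' wording and training setup may differ.

```python
# Hypothetical prefix strings; the ones used by the researchers may differ.
PREFIXES = {
    "low": "describe chart construction: ",
    "high": "describe chart trends: ",
}


def build_input(chart_text: str, level: str) -> str:
    """Prepend the prefix that tells the model which caption level to write."""
    if level not in PREFIXES:
        raise ValueError(f"unknown caption level: {level!r}")
    return PREFIXES[level] + chart_text


def training_pairs(chart_text: str, low_caption: str, high_caption: str):
    """Turn one chart into two (input, target) pairs, one per caption level."""
    return [
        (build_input(chart_text, "low"), low_caption),
        (build_input(chart_text, "high"), high_caption),
    ]


pairs = training_pairs(
    "chart: Revenue by year",               # made-up chart text
    "A bar chart of revenue by year.",      # made-up low-level caption
    "Revenue rises over the period.",       # made-up high-level caption
)
for model_input, target in pairs:
    print(model_input, "->", target)
```

    At inference time, the same chart text with a different prefix would request a different level of detail.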

    In addition, they conducted a qualitative examination of captions produced by their best-performing method and categorized six types of common errors. For instance, a directional error occurs if a model says a trend is decreasing when it is actually increasing.

    This fine-grained, robust qualitative evaluation was important for understanding how the model was making its errors. For example, using quantitative methods, a directional error might incur the same penalty as a repetition error, where the model repeats the same word or phrase. But a directional error could be more misleading to a user than a repetition error. The qualitative analysis helped them understand these types of subtleties, Boggust says.
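
    A toy example makes the point concrete. The Python sketch below uses token-level edit distance as a stand-in metric (an assumption for illustration, not one of the metrics used in the study) and made-up captions. A caption with the wrong direction and a caption with a repeated word both sit one edit away from the reference, so the metric cannot tell them apart.

```python
def token_edit_distance(reference: str, candidate: str) -> int:
    """Levenshtein distance computed over whitespace-separated tokens."""
    ref, cand = reference.split(), candidate.split()
    prev = list(range(len(cand) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, cand_tok in enumerate(cand, start=1):
            cost = 0 if ref_tok == cand_tok else 1
            curr.append(min(
                prev[j] + 1,         # drop a reference token
                curr[j - 1] + 1,     # insert a candidate token
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Made-up captions, for illustration only.
reference = "sales increased steadily from 2010 to 2020"
directional = "sales decreased steadily from 2010 to 2020"
repetition = "sales increased increased steadily from 2010 to 2020"

print(token_edit_distance(reference, directional))  # one substitution
print(token_edit_distance(reference, repetition))   # one insertion
```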

    These sorts of errors also expose limitations of current models and raise ethical considerations that researchers must consider as they work to develop autocaptioning systems, she adds.

    Generative machine-learning models, such as those that power ChatGPT, have been shown to hallucinate or give incorrect information that can be misleading. While there is a clear benefit to using these models for autocaptioning existing charts, it could lead to the spread of misinformation if charts are captioned incorrectly.

    “Maybe this means that we don’t just caption everything in sight with AI. Instead, perhaps we provide these autocaptioning systems as authorship tools for people to edit. It is important to think about these ethical implications throughout the research process, not just at the end when we have a model to deploy,” she says.

    Boggust, Tang, and their colleagues want to continue optimizing the models to reduce some common errors. They also want to expand the VisText dataset to include more charts, and more complex charts, such as those with stacked bars or multiple lines. And they would also like to gain insights into what these autocaptioning models are actually learning about chart data.

    This research was supported, in part, by a Google Research Scholar Award, the National Science Foundation, the MLA@CSAIL Initiative, and the United States Air Force Research Laboratory.

  • in

    MIT-Pillar AI Collective announces first seed grant recipients

    The MIT-Pillar AI Collective has announced its first six grant recipients. Students, alumni, and postdocs working on a broad range of topics in artificial intelligence, machine learning, and data science will receive funding and support for research projects that could translate into commercially viable products or companies. These grants are intended to help students explore commercial applications for their research, and eventually drive that commercialization through the creation of a startup.

    “These tremendous students and postdocs are working on projects that have the potential to be truly transformative across a diverse range of industries. It’s thrilling to think that the novel research these teams are conducting could lead to the founding of startups that revolutionize everything from drug delivery to video conferencing,” says Anantha Chandrakasan, dean of the School of Engineering and the Vannevar Bush Professor of Electrical Engineering and Computer Science.

    Launched in September 2022, the MIT-Pillar AI Collective is a pilot program funded by a $1 million gift from Pillar VC that aims to cultivate prospective entrepreneurs and drive innovation in areas related to AI. Administered by the MIT Deshpande Center for Technological Innovation, the AI Collective centers on the market discovery process, advancing projects through market research, customer discovery, and prototyping. Graduate students and postdocs supported by the program work toward the development of minimum viable products.

    “In addition to funding, the MIT-Pillar AI Collective provides grant recipients with mentorship and guidance. With the rapid advancement of AI technologies, this type of support is critical to ensure students and postdocs are able to access the resources required to move quickly in this fast-paced environment,” says Jinane Abounadi, managing director of the MIT-Pillar AI Collective.

    The six inaugural recipients will receive support in identifying key milestones and advice from experienced entrepreneurs. The AI Collective assists seed grant recipients in gathering feedback from potential end-users, as well as getting insights from early-stage investors. The program also organizes community events, including a “Founder Talks” speaker series, and other team-building activities.   

    “Each one of these grant recipients exhibits an entrepreneurial spirit. It is exciting to provide support and guidance as they start a journey that could one day see them as founders and leaders of successful companies,” adds Jamie Goldstein ’89, founder of Pillar VC.

    The first cohort of grant recipients includes the following projects:

    Predictive query interface

    Abdullah Alomar SM ’21, a PhD candidate studying electrical engineering and computer science, is building a predictive query interface for time series databases to better forecast demand and financial data. This user-friendly interface can help alleviate some of the bottlenecks and issues related to unwieldy data engineering processes while providing state-of-the-art statistical accuracy. Alomar is advised by Devavrat Shah, the Andrew (1956) and Erna Viterbi Professor at MIT.

    Design of light-activated drugs

    Simon Axelrod, a PhD candidate studying chemical physics at Harvard University, is combining AI with physics simulations to design light-activated drugs that could reduce side effects and improve effectiveness. Patients would receive an inactive form of a drug, which is then activated by light in a specific area of the body containing diseased tissue. This localized use of photoactive drugs would minimize the side effects from drugs targeting healthy cells. Axelrod is developing novel computational models that predict properties of photoactive drugs with high speed and accuracy, allowing researchers to focus on only the highest-quality drug candidates. He is advised by Rafael Gomez-Bombarelli, the Jeffrey Cheah Career Development Chair in Engineering in the MIT Department of Materials Science and Engineering. 

    Low-cost 3D perception

    Arjun Balasingam, a PhD student in electrical engineering and computer science and a member of the Computer Science and Artificial Intelligence Laboratory’s (CSAIL) Networks and Mobile Systems group, is developing a technology, called MobiSee, that enables real-time 3D reconstruction in challenging dynamic environments. MobiSee uses self-supervised AI methods along with video and lidar to provide low-cost, state-of-the-art 3D perception on consumer mobile devices like smartphones. This technology could have far-reaching applications across mixed reality, navigation, safety, and sports streaming, in addition to unlocking opportunities for new real-time and immersive experiences. He is advised by Hari Balakrishnan, the Fujitsu Professor of Computer Science and Artificial Intelligence at MIT and member of CSAIL.

    Sleep therapeutics

    Guillermo Bernal SM ’14, PhD ’23, a recent PhD graduate in media arts and sciences, is developing a sleep therapeutic platform that would enable sleep specialists and researchers to conduct robust sleep studies and develop therapy plans remotely, while the patient is comfortable in their home. Called Fascia, the three-part system consists of a polysomnogram with a sleep mask form factor that collects data, a hub that enables researchers to provide stimulation and feedback via olfactory, auditory, and visual stimuli, and a web portal that enables researchers to read a patient’s signals in real time with machine learning analysis. Bernal was advised by Pattie Maes, professor of media arts and sciences at the MIT Media Lab.

    Autonomous manufacturing assembly with human-like tactile perception

    Michael Foshey, a mechanical engineer and project manager with MIT CSAIL’s Computational Design and Fabrication Group, is developing an AI-enabled tactile perception system that can be used to give robots human-like dexterity. With this new technology platform, Foshey and his team hope to enable industry-changing applications in manufacturing. Currently, assembly tasks in manufacturing are largely done by hand and are typically repetitive and tedious. As a result, these jobs are being largely left unfilled. These labor shortages can cause supply chain shortages and increases in the cost of production. Foshey’s new technology platform aims to address this by automating assembly tasks to reduce reliance on manual labor. Foshey is supervised by Wojciech Matusik, MIT professor of electrical engineering and computer science and member of CSAIL.  

    Generative AI for video conferencing

    Vibhaalakshmi Sivaraman SM ’19, a PhD candidate in electrical engineering and computer science who is a member of CSAIL’s Networking and Mobile Systems Group, is developing a generative technology, Gemino, to facilitate video conferencing in high-latency and low-bandwidth network environments. Gemino is a neural compression system for video conferencing that overcomes the robustness concerns and compute complexity challenges that limit current face-image-synthesis models. This technology could enable sustained video conferencing calls in regions and scenarios that cannot reliably support video calls today. Sivaraman is advised by Mohammad Alizadeh, MIT associate professor of electrical engineering and computer science and member of CSAIL.

  • in

    Bringing the social and ethical responsibilities of computing to the forefront

    There has been a remarkable surge in the use of algorithms and artificial intelligence to address a wide range of problems and challenges. While their adoption, particularly with the rise of AI, is reshaping nearly every industry sector, discipline, and area of research, such innovations often expose unexpected consequences that involve new norms, new expectations, and new rules and laws.

    To facilitate deeper understanding, the Social and Ethical Responsibilities of Computing (SERC), a cross-cutting initiative in the MIT Schwarzman College of Computing, recently brought together social scientists and humanists with computer scientists, engineers, and other computing faculty for an exploration of the ways in which the broad applicability of algorithms and AI has presented both opportunities and challenges in many aspects of society.

    “The very nature of our reality is changing. AI has the ability to do things that until recently were solely the realm of human intelligence — things that can challenge our understanding of what it means to be human,” remarked Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing, in his opening address at the inaugural SERC Symposium. “This poses philosophical, conceptual, and practical questions on a scale not experienced since the start of the Enlightenment. In the face of such profound change, we need new conceptual maps for navigating the change.”

    The symposium offered a glimpse into the vision and activities of SERC in both research and education. “We believe our responsibility with SERC is to educate and equip our students and enable our faculty to contribute to responsible technology development and deployment,” said Georgia Perakis, the William F. Pounds Professor of Management in the MIT Sloan School of Management, co-associate dean of SERC, and the lead organizer of the symposium. “We’re drawing from the many strengths and diversity of disciplines across MIT and beyond and bringing them together to gain multiple viewpoints.”

    Through a succession of panels and sessions, the symposium delved into a variety of topics related to the societal and ethical dimensions of computing. In addition, 37 undergraduate and graduate students from a range of majors, including urban studies and planning, political science, mathematics, biology, electrical engineering and computer science, and brain and cognitive sciences, participated in a poster session to exhibit their research in this space, covering such topics as quantum ethics, AI collusion in storage markets, computing waste, and empowering users on social platforms for better content credibility.

    Showcasing a diversity of work

    In three sessions devoted to themes of beneficent and fair computing, equitable and personalized health, and algorithms and humans, the SERC Symposium showcased work by 12 faculty members across these domains.

    One such project from a multidisciplinary team of archaeologists, architects, digital artists, and computational social scientists aimed to preserve endangered heritage sites in Afghanistan with digital twins. The project team produced highly detailed interrogable 3D models of the heritage sites, in addition to extended reality and virtual reality experiences, as learning resources for audiences that cannot access these sites.

    In a project for the United Network for Organ Sharing, researchers showed how they used applied analytics to optimize various facets of an organ allocation system in the United States that is currently undergoing a major overhaul in order to make it more efficient, equitable, and inclusive for different racial, age, and gender groups, among others.

    Another talk discussed an area that has not yet received adequate public attention: the broader implications for equity that biased sensor data holds for the next generation of models in computing and health care.

    A talk on bias in algorithms considered both human bias and algorithmic bias, and the potential for improving results by taking into account differences in the nature of the two kinds of bias.

    Other highlighted research included the interaction between online platforms and human psychology; a study on whether decision-makers make systemic prediction mistakes on the available information; and an illustration of how advanced analytics and computation can be leveraged to inform supply chain management, operations, and regulatory work in the food and pharmaceutical industries.

    Improving the algorithms of tomorrow

    “Algorithms are, without question, impacting every aspect of our lives,” said Asu Ozdaglar, deputy dean of academics for the MIT Schwarzman College of Computing and head of the Department of Electrical Engineering and Computer Science, in kicking off a panel she moderated on the implications of data and algorithms.

    “Whether it’s in the context of social media, online commerce, automated tasks, and now a much wider range of creative interactions with the advent of generative AI tools and large language models, there’s little doubt that much more is to come,” Ozdaglar said. “While the promise is evident to all of us, there’s a lot to be concerned as well. This is very much time for imaginative thinking and careful deliberation to improve the algorithms of tomorrow.”

    Turning to the panel, Ozdaglar asked experts from computing, social science, and data science for insights on how to understand what is to come and shape it to enrich outcomes for the majority of humanity.

    Sarah Williams, associate professor of technology and urban planning at MIT, emphasized the critical importance of comprehending the process of how datasets are assembled, as data are the foundation for all models. She also stressed the need for research to address the potential implication of biases in algorithms that often find their way in through their creators and the data used in their development. “It’s up to us to think about our own ethical solutions to these problems,” she said. “Just as it’s important to progress with the technology, we need to start the field of looking at these questions of what biases are in the algorithms? What biases are in the data, or in that data’s journey?”

    Shifting focus to generative models and whether the development and use of these technologies should be regulated, the panelists — which also included MIT’s Srini Devadas, professor of electrical engineering and computer science, John Horton, professor of information technology, and Simon Johnson, professor of entrepreneurship — all concurred that regulating open-source algorithms, which are publicly accessible, would be difficult given that regulators are still catching up and struggling to even set guardrails for technology that is now 20 years old.

    Returning to the question of how to effectively regulate the use of these technologies, Johnson proposed a progressive corporate tax system as a potential solution. He recommended basing companies’ tax payments on their profits, especially for large corporations whose massive earnings go largely untaxed due to offshore banking. Such a system, Johnson said, could serve as a regulatory mechanism that discourages companies from trying to “own the entire world” by imposing disincentives.

    The role of ethics in computing education

    As computing continues to advance with no signs of slowing down, it is critical to educate students to be intentional in the social impact of the technologies they will be developing and deploying into the world. But can one actually be taught such things? If so, how?

    Caspar Hare, professor of philosophy at MIT and co-associate dean of SERC, posed this looming question to faculty on a panel he moderated on the role of ethics in computing education. All experienced in teaching ethics and thinking about the social implications of computing, each panelist shared their perspective and approach.

    A strong advocate for the importance of learning from history, Eden Medina, associate professor of science, technology, and society at MIT, said that “often the way we frame computing is that everything is new. One of the things that I do in my teaching is look at how people have confronted these issues in the past and try to draw from them as a way to think about possible ways forward.” Medina regularly uses case studies in her classes. As an example of how decisions around technology and data can grow out of very specific contexts, she referred to a paper by Yale University science historian Joanna Radin on the Pima Indian Diabetes Dataset, which raised ethical issues about the history of that collection of data that many don’t consider.

    Milo Phillips-Brown, associate professor of philosophy at Oxford University, talked about the Ethical Computing Protocol that he co-created while he was a SERC postdoc at MIT. The protocol, a four-step approach to building technology responsibly, is designed to train computer science students to think in a better and more accurate way about the social implications of technology by breaking the process down into more manageable steps. “The basic approach that we take very much draws on the fields of value-sensitive design, responsible research and innovation, participatory design as guiding insights, and then is also fundamentally interdisciplinary,” he said.

    Fields such as biomedicine and law have an ethics ecosystem that distributes the function of ethical reasoning in these areas. Oversight and regulation are provided to guide front-line stakeholders and decision-makers when issues arise, as are training programs and access to interdisciplinary expertise that they can draw from. “In this space, we have none of that,” said John Basl, associate professor of philosophy at Northeastern University. “For current generations of computer scientists and other decision-makers, we’re actually making them do the ethical reasoning on their own.” Basl commented further that teaching core ethical reasoning skills across the curriculum, not just in philosophy classes, is essential, and that the goal shouldn’t be for every computer scientist to be a professional ethicist, but for them to know enough of the landscape to be able to ask the right questions and seek out the relevant expertise and resources that exist.

    After the final session, interdisciplinary groups of faculty, students, and researchers engaged in animated discussions related to the issues covered throughout the day during a reception that marked the conclusion of the symposium.

  • in

    MIT researchers make language models scalable self-learners

    Socrates once said: “It is not the size of a thing, but the quality that truly matters. For it is in the nature of substance, not its volume, that true value is found.”

    Does size always matter for large language models (LLMs)? In a technological landscape bedazzled by LLMs taking center stage, a team of MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers thinks smaller models shouldn’t be overlooked, especially for natural language understanding products widely deployed in industry.

    To that end, the researchers cooked up an approach to long-standing problems of inefficiency and privacy associated with big, text-based AI models — a logic-aware model that outperforms 500-times-bigger counterparts on some language understanding tasks without human-generated annotations, while preserving privacy and robustness with high performance.

    LLMs, which have shown some promising skills in generating language, art, and code, are computationally expensive, and their data requirements can risk privacy leaks when using application programming interfaces for data upload. Smaller models have been historically less capable, particularly in multitasking and weakly supervised tasks, compared to their larger counterparts.

    So what’s helping these smaller models act so mighty, then? Something called “textual entailment,” a way to help these models understand a variety of language tasks, where if one sentence (the premise) is true, then the other sentence (the hypothesis) is likely to be true as well. For example, if the premise is, “all cats have tails” then the hypothesis “a tabby cat has a tail” would be entailed by the premise. This concept is used to train an “entailment model” that proved to be less biased than other language models, from the team’s previous research. They then created “prompts” that the models can use to figure out if certain information is entailed by a given sentence or phrase according to different tasks. This method improved the model’s ability to adapt to different tasks without any additional training, known as zero-shot adaptation.

    In the realm of “natural language understanding,” there are various applications that hinge on determining the relationship between two pieces of text. For example, in sentiment classification, a statement like “I think the movie is good” can be inferred or entailed from a movie review that says, “I like the story and the acting is great,” indicating a positive sentiment. Another is news classification, where the topic of a news article can be inferred from its content. For example, a statement like “the news article is about sports” can be entailed if the main content of the article reports on an NBA game. The key insight was that many existing natural language understanding tasks could be recast as an entailment (i.e., logical inference in natural language) task. 
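
    The recasting described above can be sketched in a few lines of Python: each candidate label becomes a hypothesis sentence, and the label whose hypothesis is best entailed by the input text wins. The function names are hypothetical, and the scorer here is a hand-written keyword lookup standing in for a trained entailment model, so it illustrates only the control flow.

```python
from typing import Callable

# Hand-written cue words; a toy stand-in, NOT a real entailment model.
CUES = {
    "good": {"like", "great", "love"},
    "bad": {"hate", "boring", "awful"},
}


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Count premise words that support any cue-bearing word in the hypothesis."""
    premise_words = set(premise.lower().split())
    score = 0.0
    for word in hypothesis.lower().split():
        score += len(CUES.get(word, set()) & premise_words)
    return score


def classify_by_entailment(
    premise: str,
    hypotheses: dict[str, str],
    entail_score: Callable[[str, str], float],
) -> str:
    """Return the label whose hypothesis the premise entails most strongly."""
    scores = {label: entail_score(premise, hyp) for label, hyp in hypotheses.items()}
    return max(scores, key=scores.get)


review = "I like the story and the acting is great"
hypotheses = {
    "positive": "I think the movie is good",
    "negative": "I think the movie is bad",
}
print(classify_by_entailment(review, hypotheses, toy_entail_score))
```

    Swapping in a different set of hypothesis sentences, such as "the news article is about sports", would reuse the same function for a different task without retraining.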

    “Our research is about improving the ability of computer programs to understand and process natural language — the way humans speak and write. Our self-trained, 350-million-parameter entailment models, without human-generated labels, outperform supervised language models with 137 to 175 billion parameters,” says MIT CSAIL postdoc Hongyin Luo, lead author on a new paper about the study. “This has potential to reshape the landscape of AI and machine learning, providing a more scalable, trustworthy, and cost-effective solution to language modeling,” says Luo. “By proving that smaller models can perform at the same level as larger ones for language understanding, this work paves the way for more sustainable and privacy-preserving AI technologies.” 

    The team discovered that they could improve the model’s performance even more by using a technique called “self-training,” where the model uses its own predictions to teach itself, effectively learning without human supervision or additional annotated training data. The self-training method significantly improved performance on several downstream tasks, including sentiment analysis, question-answering, and news classification. It outperformed Google’s LaMDA and FLAN in zero-shot capabilities, as well as GPT models and other supervised algorithms.

    However, one challenge with self-training is that the model can sometimes generate incorrect or noisy labels that harm performance. To overcome this, they developed a new algorithm called ‘SimPLE’ (Simple Pseudo-Label Editing), a process to review and modify the pseudo-labels made in initial rounds of learning. By correcting any mislabeled instances, it improved the overall quality of the self-generated labels. This not only made the models more effective at understanding language, but more robust when faced with adversarial data. 
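
    A generic self-training loop can be sketched as follows in Python. This is not the SimPLE algorithm: it uses a plain confidence threshold to discard uncertain pseudo-labels, which is a common heuristic assumed here for illustration. All function names, the threshold value, and the stub model are hypothetical.

```python
from typing import Callable

Predictor = Callable[[str], dict[str, float]]  # text -> label probabilities


def pseudo_label(texts, predict_proba: Predictor, threshold: float = 0.9):
    """Label texts with the model's own predictions, keeping confident ones only."""
    kept = []
    for text in texts:
        probs = predict_proba(text)
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            kept.append((text, label))
    return kept


def self_train(texts, predict_proba: Predictor, retrain, rounds: int = 3,
               threshold: float = 0.9) -> Predictor:
    """Alternate between pseudo-labeling and retraining on the kept labels."""
    for _ in range(rounds):
        labeled = pseudo_label(texts, predict_proba, threshold)
        if not labeled:
            break  # nothing confident enough to learn from
        predict_proba = retrain(labeled)  # caller supplies the training step
    return predict_proba


def stub_model(text: str) -> dict[str, float]:
    """Stand-in predictor with made-up probabilities, for illustration only."""
    if "great" in text:
        return {"pos": 0.95, "neg": 0.05}
    return {"pos": 0.55, "neg": 0.45}


unlabeled = ["great film", "it was fine"]
print(pseudo_label(unlabeled, stub_model))
```

    Only the confident prediction survives the filter, so the uncertain example does not feed a possibly wrong label back into training.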

    As with most research, there are some limitations. The self-training on multi-class classification tasks didn’t perform as well as on binary natural language understanding tasks, indicating the challenge of applying entailment models to multi-choice tasks.

    “This research presents an efficient and effective way to train large language models (LLMs) by formulating natural language understanding tasks as contextual entailment problems and employing a pseudo-labeling self-training mechanism to incorporate large quantities of unlabelled text data in the training process,” adds CSAIL Senior Research Scientist James Glass, who is also an author on the paper. “While the field of LLMs is undergoing rapid and dramatic changes, this research shows that it is possible to produce relatively compact language models that perform very well on benchmark understanding tasks compared to their peers of roughly the same size, or even much larger language models.”

    “Entailment task is a popular proxy to evaluate ‘understanding’ of a given context by an AI model,” says Leonid Karlinsky, research staff member at the MIT-IBM Watson AI Lab. “It is used in many areas analyzing models with unimodal, like LLMs, and multi-modal, like VLMs [visual language models] inputs, simplifying the task of question-answering about a given input context to a binary classification problem: does this context entail a certain (e.g., text) conclusion or not? This paper makes two contributions in this space. First, it proposes a way to improve the zero-shot (without additional tuning) NLU performance and robustness to adversarial attacks via tuning with synthesized (specialized) entailment tasks generated for the primal NLU task. Second, it offers a self-supervised SimPLE method including pseudo-labeling and confidence-based filtering to further improve large LLMs’ NLU performance.”
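    The reduction Karlinsky describes, turning a classification task into a series of “does this context entail this conclusion?” questions, can be sketched as follows. The scoring function here is a deliberately crude word-overlap stand-in, not a real entailment model, and the names `classify_via_entailment` and `toy_entail_score` as well as the hypothesis sentences are assumptions for illustration only.

```python
from typing import Callable, Dict, Tuple


def classify_via_entailment(
    context: str,
    hypotheses: Dict[str, str],
    entail_score: Callable[[str, str], float],
) -> Tuple[str, Dict[str, float]]:
    """Score one hypothesis per label and return the best-supported label."""
    scores = {
        label: entail_score(context, hypothesis)
        for label, hypothesis in hypotheses.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in: fraction of hypothesis words found in the premise.

    A real system would use a trained entailment model here instead.
    """
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().split()
    matches = sum(word in premise_words for word in hypothesis_words)
    return matches / len(hypothesis_words)


hypotheses = {
    "sports": "this text is about a football match",
    "finance": "this text is about the stock market",
}
label, scores = classify_via_entailment(
    "the football match ended in a late goal", hypotheses, toy_entail_score
)
print(label, scores)
```

    Each label becomes its own yes-or-no entailment question, which is why one entailment model can, in principle, be reused across tasks it was never tuned on.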

    Luo and Glass wrote the paper with Yoon Kim, a CSAIL member and assistant professor in MIT’s Department of Electrical Engineering and Computer Science, and Jiaxin Ge of Peking University. Their work will be presented at the meeting of the Association for Computational Linguistics in Toronto, Ontario this July. This research was supported by a grant from the Hong Kong Innovation AI program.

  • in

    Scaling audio-visual learning without labels

    Researchers from MIT, the MIT-IBM Watson AI Lab, IBM Research, and elsewhere have developed a new technique for analyzing unlabeled audio and visual data that could improve the performance of machine-learning models used in applications like speech recognition and object detection. The work, for the first time, combines two architectures of self-supervised learning, contrastive learning and masked data modeling, in an effort to scale machine-learning tasks like event classification in single- and multimodal data without the need for annotation, thereby replicating how humans understand and perceive our world.

    “A larger portion of human knowledge is learned in a self-supervised way, because we don’t always get supervision signals, and we want to enable the machine-learning model to have the same ability,” says Yuan Gong, an MIT postdoc in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    “So, another way to put it is that self-supervised learning often forms the foundation of an initial model, because it can learn on vast amounts of unlabeled data. And then you can use classical, supervised learning or reinforcement learning to fine tune the model to something particular if you want to,” says Jim Glass, an MIT senior research scientist and member of the MIT-IBM Watson AI Lab.

    The technique, called the contrastive audio-visual masked autoencoder (CAV-MAE), is a type of neural network that can learn to extract and map meaningful latent representations into high-dimensional space from acoustic and visual data by training on large YouTube datasets of 10-second audio and video clips. The researchers say the technique is more effective than previous approaches because it explicitly models the relationships between audio and visual data in a way that other methods do not.

    Joining Gong and Glass on the study are graduate students Andrew Rouditchenko and Alexander H. Liu of MIT, David Harwath PhD ’18 of the University of Texas at Austin, and MIT-IBM Watson AI Lab members Leonid Karlinsky and Hilde Kuehne. Kuehne is also affiliated with Goethe University Frankfurt. The method was recently presented at the International Conference on Learning Representations.

    A joint and coordinated approach

    The CAV-MAE works by “learning by prediction” and “learning by comparison,” says Gong. The masked data modeling, or the prediction method, takes a video along with its coordinated audio waveform, converts the audio to a spectrogram, and masks 75 percent of both. The unmasked data is tokenized, then fed into separate audio and visual encoders before entering a joint encoder/decoder, where the model is asked to recover the missing data. The difference (reconstruction loss) between the resulting reconstructed prediction and the original audio-visual combination is then used to train the model for better performance. An example of this would be covering part of a video of a piano and part of a spectrogram of piano music, and then asking the model to try to determine the masked inputs. Unfortunately, this method may not capture the association between the video and audio pair, whereas contrastive learning leverages this, but may discard some modality-unique information, like the background in a video.
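    The masking-and-reconstruction step can be made concrete with a small sketch. It is a toy illustration rather than the CAV-MAE implementation: tokens are short lists of floats, the “decoder” is a stand-in that predicts zeros, and measuring the error only on masked tokens is an assumption borrowed from common masked-autoencoder practice. The names `random_mask` and `reconstruction_loss` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def random_mask(
    num_tokens: int, mask_ratio: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Split token indices into (visible, masked) at the given ratio."""
    num_masked = int(round(num_tokens * mask_ratio))
    indices = list(range(num_tokens))
    rng.shuffle(indices)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


def reconstruction_loss(
    original: Sequence[Sequence[float]],
    reconstructed: Sequence[Sequence[float]],
    masked: Sequence[int],
) -> float:
    """Mean squared error, measured only on the masked tokens (an assumption)."""
    total, count = 0.0, 0
    for i in masked:
        for target, prediction in zip(original[i], reconstructed[i]):
            total += (target - prediction) ** 2
            count += 1
    return total / count


rng = random.Random(0)
tokens = [[float(i), float(i) + 0.5] for i in range(16)]  # 16 toy patch tokens
visible, masked = random_mask(len(tokens), mask_ratio=0.75, rng=rng)

# Stand-in "decoder": copies visible tokens and predicts zeros for masked ones.
reconstructed = [
    tok if i in visible else [0.0, 0.0] for i, tok in enumerate(tokens)
]
loss = reconstruction_loss(tokens, reconstructed, masked)
print(len(visible), "visible,", len(masked), "masked, loss =", round(loss, 3))
```

    With 16 tokens and a 75 percent ratio, 12 tokens are hidden and only 4 reach the encoders; training would push the loss down by improving the predictions for the hidden 12.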

    Contrastive learning aims to map similar representations close to each other. For example, the model will attempt to place video and audio data of different parrots close to each other and further away from pairs of video and audio of guitars playing. In a similar fashion to masked autoencoding, audio-visual pairs are passed into separate modality encoders; however, the audio and visual components are kept separate within the joint encoder before the model performs pooling and computes the contrastive loss. In this way, contrastive learning tries to identify the parts of each audio or video that are most relevant to the other. For example, if a video shows someone speaking and the corresponding audio clip contains speech, the autoencoder will learn to associate the mouth movements of the speaker with the words being spoken. It will then adjust the model’s parameters so that those inputs are represented close to each other. Ultimately, the CAV-MAE method combines both techniques with multiple forward data streams with masking as a first step, modality-specific encoders, and layer normalization so that the representation strengths are similar.
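    One common way to express “pull matching pairs together, push mismatched pairs apart” is an InfoNCE-style loss over a batch of paired embeddings. The sketch below uses that formulation as an assumption; the article does not specify the exact loss used in CAV-MAE, and the embeddings, the temperature of 0.1, and the function names are illustrative.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(
    audio: Sequence[Sequence[float]],
    video: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """InfoNCE-style loss: the i-th audio clip should match the i-th video."""
    n = len(audio)
    total = 0.0
    for i in range(n):
        logits = [cosine(audio[i], video[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidate videos.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / n


audio = [[1.0, 0.0], [0.0, 1.0]]          # e.g. parrot sound, guitar sound
video_aligned = [[0.9, 0.1], [0.1, 0.9]]  # matching clips in the same order
video_shuffled = [[0.1, 0.9], [0.9, 0.1]] # clips paired with the wrong audio

aligned = contrastive_loss(audio, video_aligned)
shuffled = contrastive_loss(audio, video_shuffled)
print("aligned:", aligned, "shuffled:", shuffled)
```

    The loss is small when each audio embedding sits closest to its own video and large when the pairs are mismatched, which is the signal that drives matching pairs together during training.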

    “We [then] wanted to compare the proposed CAV-MAE with a model trained only with a masked autoencoder and a model trained only with contrastive learning, because we want to show that by combining masked autoencoder and contrastive learning, we can get some performance improvement,” says Gong, “and the results support our hypothesis that there’s obvious improvement.”

    The researchers tested CAV-MAE — as well as their method without contrastive loss or a masked autoencoder — against other state-of-the-art methods on audio-visual retrieval and audio-visual event classification tasks using standard AudioSet (20K and 2M) and VGGSound datasets — labeled, realistic short clips, which could include multiple sounds. Audio-visual retrieval means that the model sees either the audio or visual component of a query pair and searches for the missing one; event classification includes identifying actions or sounds within data, like a person singing or a car driving.
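    Audio-visual retrieval of this kind is often implemented as a nearest-neighbor search in a shared embedding space. The following sketch assumes that setup, with made-up three-dimensional embeddings and a hypothetical `retrieve` helper; it is not taken from the researchers’ evaluation code.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    top_k: int = 1,
) -> List[int]:
    """Return indices of the top_k candidates most similar to the query."""
    ranked = sorted(
        range(len(candidates)),
        key=lambda j: cosine(query, candidates[j]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical embeddings: one audio query and a bank of three video clips.
audio_query = [0.2, 0.9, 0.1]
video_bank = [
    [1.0, 0.0, 0.0],
    [0.1, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
best = retrieve(audio_query, video_bank, top_k=1)
top_two = retrieve(audio_query, video_bank, top_k=2)
print(best, top_two)
```

    Given only the audio half of a pair, the search returns the video whose embedding points in the most similar direction; the same function works in reverse for video-to-audio queries.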

    Overall, they found that contrastive learning and masked data modeling are complementary methods. CAV-MAE was able to outperform previous techniques (with fully self-supervised pre-training) by about 2 percent for event classification performance versus models with comparable computation and, more impressively, kept pace with or outperformed models with industry-level computational resources. The team’s model ranked similarly to models trained with only the contrastive loss. And surprisingly, the team says, the incorporation of multi-modal data into CAV-MAE pre-training greatly improves the fine-tuning of single-modality representation via supervised learning (with some labeled data) and performance on audio-only event classification tasks. This demonstrates that, as with humans, multi-modal information provides an additional “soft label” boost even for audio-only or visual-only tasks; for instance, it helps the model to understand if it’s looking for an electric or acoustic guitar, a richer supervision signal.

    “I think people like the elegance of this model for combining information in the different audio and visual streams. It has the contrastive and the reconstruction loss, and compared to models that have been evaluated with similar data, it clearly does very well across a range of these tasks,” says Glass.

    Building on this, “one special thing is, our model can do both classification and the retrieval, which is not common,” Gong adds. “Before this work, these methods are used separately, but after this work, I see that most of the audio-visual learning frameworks use contrastive loss and the masked autoencoder together, implicitly or explicitly.”

    Bringing self-supervised audio-visual learning into our world

    The researchers see their contribution of the contrastive audio-visual masked autoencoder (CAV-MAE) as an important milestone and a step forward for applications, which are increasingly moving from single modality to multi-modality and which require or leverage audio-visual fusion. They hypothesize that one day it could be used for action recognition in realms like sports, education, entertainment, motor vehicles, and public safety. It could also, one day, extend to other modalities. At this time, the fact that “this only applies to audio-visual data may be a limitation, but we are targeting multi-modal learning, which is [a] trend of machine learning,” says Gong. “As humans, we have multi-modalities: we have smell, touch, many more things than just audio-visual. So, when we try to build AI, we try to mimic humans somehow, not necessarily from the biological perspective, and this method could [potentially be] generalized to other unexplored modalities.”

    As machine-learning models continue to play an increasingly important role in our lives, techniques like this one will become increasingly valuable.

    This research was supported by the MIT-IBM Watson AI Lab.

  • in

    Celebrating the impact of IDSS

    The “interdisciplinary approach” is something that has been lauded for decades for its ability to break down silos and create new integrated approaches to research.

    For Munther Dahleh, founding director of the MIT Institute for Data, Systems, and Society (IDSS), showing the community that data science and statistics can transcend individual disciplines and form a new holistic approach to addressing complex societal challenges has been crucial to the institute’s success.

    “From the very beginning, it was critical that we recognized the areas of data science, statistics, AI, and, in a way, computing, as transdisciplinary,” says Dahleh, who is the William A. Coolidge Professor in Electrical Engineering and Computer Science. “We made that point over and over — these are areas that embed in your field. It is not ours; this organization is here for everyone.”

    On April 14-15, researchers from across and beyond MIT joined together to celebrate the accomplishments and impact IDSS has had on research and education since its inception in 2015. Taking the place of IDSS’s annual statistics and data science conference SDSCon, the celebration also doubled as a way to recognize Dahleh for his work creating and executing the vision of IDSS as he prepares to step down from his director position this summer.

    In addition to talks and panels on statistics and computation, smart systems, automation and artificial intelligence, conference participants discussed issues ranging from climate change and health care to misinformation. Nobel Prize winner and IDSS affiliate Professor Esther Duflo spoke on large-scale immunization efforts, former MLK Visiting Professor Craig Watkins joined a panel on equity and justice in AI, and IDSS Associate Director Alberto Abadie discussed synthetic controls for policy evaluation. Other policy questions were explored through lightning talks, including those by students from the Technology and Policy Program (TPP) within IDSS.

    A place to call home

    The list of IDSS accomplishments over the last eight years is long and growing. From creating a home for 21st century statistics at MIT after other unsuccessful attempts, to creating a new PhD preparing the trilingual student who is an expert in data science and social science in the context of a domain, to playing a key role in determining an effective process for Covid testing in the early days of the pandemic, IDSS has left its mark on MIT. More recently, IDSS launched an initiative using big data to help effect structural and normative change toward racial equity, and will continue to explore societal challenges through the lenses of statistics, social science, and science and engineering.

    “I’m very proud of what we’ve done and of all the people who have contributed to this. The leadership team has been phenomenal in their commitment and their creativity,” Dahleh says. “I always say it doesn’t take one person, it takes the village to do what we have done, and I am very proud of that.”

    Prior to the institute’s formation, Dahleh and others at MIT were brought together to answer one key question: How would MIT prepare for the future of systems and data?

    “Data science is a complex area because in some ways it’s everywhere and it belongs to everyone, similar to statistics and AI,” Dahleh says. “The most important part of creating an organization to support it was making it clear that it was an organization for everyone.” The response the team came back with was to build an Institute: a department that could cross all other departments and schools.

    While Dahleh and others on the committee were creating this blueprint for the future, the events that would lead early IDSS hires like Caroline Uhler to join the team were also beginning to take shape. Uhler, now an MIT professor of computer science and co-director of the Eric and Wendy Schmidt Center at the Broad Institute, was a panelist at the celebration discussing statistics and human health.

    In 2015, Uhler was a faculty member at the Institute of Science and Technology in Austria looking to move back to the U.S. “I was looking for positions in all different types of departments related to statistics, including electrical engineering and computer science, which were areas not related to my degree,” Uhler says. “What really got me to MIT was Munther’s vision for building a modern type of statistics, and the unique opportunity to be part of building what statistics should be moving forward.”

    The breadth of the Statistics and Data Science Center has given it a unique and a robust character that makes for an attractive collaborative environment at MIT. “A lot of IDSS’s impact has been in giving people like me a home,” Uhler adds. “By building an institute for statistics that is across all schools instead of housed within a single department, it has created a home for everyone who is interested in the field.”

    Filling the gap

    For Ali Jadbabaie, former IDSS associate director and another early IDSS hire, being in the right place at the right time landed him in the center of it all. A control theory expert and network scientist by training, Jadbabaie first came to MIT during a sabbatical from his position as a professor at the University of Pennsylvania.

    “My time at MIT coincided with the early discussions around forming IDSS and given my experience they asked me to stay and help with its creation,” Jadbabaie says. He is now head of the Department of Civil and Environmental Engineering at MIT, and he spoke at the celebration about a new MIT major in climate system science and engineering.

    A critical early accomplishment of IDSS was the creation of a doctoral program in social and engineering systems (SES), which has the goal of educating and fostering the success of a new type of PhD student, says Jadbabaie.

    “We realized we had this opportunity to educate a new type of PhD student who was conversant in the math of information sciences and statistics in addition to an understanding of a domain — infrastructures, climate, political polarization — in which problems arise,” he says. “This program would provide training in statistics and data science, the math of information sciences and a branch of social science that is relevant to their domain.”

    “SES has been filling a gap,” adds Jadbabaie. “We wanted to bring quantitative reasoning to areas in social sciences, particularly as they interact with complex engineering systems.”

    “My first year at MIT really broadened my horizon in terms of what was available and exciting,” says Manxi Wu, a member of the first cohort of students in the SES program after starting out in the Master of Science in Transportation (MST) program. “My advisor introduced me to a number of interesting topics at the intersection of game theory, economics, and engineering systems, and in my second year I realized my interest was really about the societal scale systems, with transportation as my go-to application area when I think about how to make an impact in the real world.”

    Wu, now an assistant professor in the School of Operations Research and Information Engineering at Cornell, was a panelist at the celebration’s session on smart infrastructure systems. She says that the beauty of the SES program lies in its ability to create a common ground between groups of students and researchers who all have different application interests but share an eagerness to sharpen their technical skills.

    “While we may be working on very different application areas, the core methodologies, such as mathematical tools for data science and probability optimization, create a common language,” Wu says. “We are all capable of speaking the technical language, and our diversified interests give us even more to talk about.”

    In addition to the PhD program, IDSS has helped bring quality MIT programming to people around the globe with its MicroMasters Program in Statistics and Data Science (SDS), which recently celebrated the certification of over 1,000 learners. The MicroMasters is just one offering in the newly minted IDSSx, a collection of online learning opportunities for learners with different skill levels and interests.

    “The impact of branding what MIT-IDSS does across the globe has been great,” Dahleh says. “In addition, we’ve created smaller online programs for continued education in data science and machine learning, which I think is also critical in educating the community at large.”

    Hopes for the future

    Through all of its accomplishments, the core mission of IDSS has never changed.

    “The belief was always to create an institute focused on how data science can be used to solve pressing societal problems,” Dahleh says. “The organizational structure of IDSS as an MIT Institute has enabled it to promote data and systems as a transdisciplinary area that embeds in every domain to support its mission. This reverse ownership structure will continue to strengthen the presence of IDSS in MIT and will make it an essential unit within the Schwarzman College of Computing.”

    As Dahleh prepares to step down from his role, and Professor Martin Wainwright gets ready to fill his (very big) shoes as director, Dahleh’s colleagues say the real key to the success of IDSS all started with his passion and vision.

    “Creating a new academic unit within MIT is actually next to impossible,” Jadbabaie says. “It requires structural changes, as well as someone who has a strong understanding of multiple areas, who knows how to get people to work together collectively, and who has a mission.”

    “The most important thing is that he was inclusive,” he adds. “He didn’t try to create a gate around it and say these people are in and these people are not. I don’t think this would have ever happened without Munther at the helm.”

  • in

    J-WAFS announces 2023 seed grant recipients

    Today, the Abdul Latif Jameel Water and Food Systems Lab (J-WAFS) announced its ninth round of seed grants to support innovative research projects at MIT. The grants are designed to fund research efforts that tackle challenges related to water and food for human use, with the ultimate goal of creating meaningful impact as the world population continues to grow and the planet undergoes significant climate and environmental changes.

    Ten new projects led by 15 researchers from seven different departments will be supported this year. The projects address a range of challenges by employing advanced materials, technology innovations, and new approaches to resource management. The new projects aim to remove harmful chemicals from water sources, develop monitoring and other systems to help manage various aquaculture industries, optimize water purification materials, and more.

    “The seed grant program is J-WAFS’ flagship grant initiative,” says J-WAFS executive director Renee J. Robins. “The funding is intended to spur groundbreaking MIT research addressing complex issues that are challenging our water and food systems. The 10 projects selected this year show great promise, and we look forward to the progress and accomplishments these talented researchers will make,” she adds.

    The 2023 J-WAFS seed grant researchers and their projects are:

    Sara Beery, an assistant professor in the Department of Electrical Engineering and Computer Science (EECS), is building the first completely automated system to estimate the size of salmon populations in the Pacific Northwest (PNW).

    Salmon are a keystone species in the PNW, feeding human populations for at least the last 7,500 years. However, overfishing, habitat loss, and climate change threaten salmon populations across the region with extinction. Accurate salmon counts during their seasonal migration to their natal river to spawn are essential for fisheries’ regulation and management but are limited by human capacity. Fish population monitoring is a widespread challenge in the United States and worldwide. Beery and her team are working to build a system that will provide a detailed picture of the state of salmon populations at unprecedented spatial and temporal resolution by combining sonar sensors with computer vision and machine learning (CVML) techniques. The sonar will capture individual fish as they swim upstream, and CVML techniques will be used to train accurate algorithms that interpret the sonar video to detect, track, and count fish automatically while adapting to changing river conditions and fish densities.

    Another aquaculture project is being led by Michael Triantafyllou, the Henry L. and Grace Doherty Professor in Ocean Science and Engineering in the Department of Mechanical Engineering, and Robert Vincent, the assistant director at MIT’s Sea Grant Program. They are working with Otto Cordero, an associate professor in the Department of Civil and Environmental Engineering, to control harmful bacterial blooms in aquaculture algae feed production.

    Aquaculture in the United States represents a $1.5 billion industry annually and helps support 1.7 million jobs, yet many American hatcheries are not able to keep up with demand. One barrier to aquaculture production is the high degree of variability in survival rates, most likely caused by a poorly controlled microbiome that leads to bacterial infections and sub-optimal feed efficiency. Triantafyllou, Vincent, and Cordero plan to monitor the microbiome composition of a shellfish hatchery in order to identify possible causative agents of mortality, as well as beneficial microbes. They hope to pair microbe data with detailed phenotypic information about the animal population to generate rapid diagnostic tests and explore the potential for microbiome therapies to protect larvae and prevent future outbreaks. The researchers plan to transfer their findings and technology to the local and regional aquaculture community to ensure healthy aquaculture production that will support the expansion of the U.S. aquaculture industry.

    David Des Marais is the Cecil and Ida Green Career Development Professor in the Department of Civil and Environmental Engineering. His 2023 J-WAFS project seeks to understand plant growth responses to elevated carbon dioxide (CO2) in the atmosphere, in the hopes of identifying breeding strategies that maximize crop yield under future CO2 scenarios.Today’s crop plants experience higher atmospheric CO2 than 20 or 30 years ago. Crops such as wheat, oat, barley, and rice typically increase their growth rate and biomass when grown at experimentally elevated atmospheric CO2. This is known as the so-called “CO2 fertilization effect.” However, not all plant species respond to rising atmospheric CO2 with increased growth, and for the ones that do, increased growth doesn’t necessarily correspond to increased crop yield. Using specially built plant growth chambers that can control the concentration of CO2, Des Marais will explore how CO2 availability impacts the development of tillers (branches) in the grass species Brachypodium. He will study how gene expression controls tiller development, and whether this is affected by the growing environment. The tillering response refers to how many branches a plant produces, which sets a limit on how much grain it can yield. Therefore, optimizing the tillering response to elevated CO2 could greatly increase yield. Des Marais will also look at the complete genome sequence of Brachypodium, wheat, oat, and barley to help identify genes relevant for branch growth.Darcy McRose, an assistant professor in the Department of Civil and Environmental Engineering, is researching whether a combination of plant metabolites and soil bacteria can be used to make mineral-associated phosphorus more bioavailable.The nutrient phosphorus is essential for agricultural plant growth, but when added as a fertilizer, phosphorus sticks to the surface of soil minerals, decreasing bioavailability, limiting plant growth, and accumulating residual phosphorus. 
Heavily fertilized agricultural soils often harbor large reservoirs of this type of mineral-associated “legacy” phosphorus. Redox transformations are one chemical process that can liberate mineral-associated phosphorus. However, this needs to be carefully controlled, as overly mobile phosphorus can lead to runoff and pollution of natural waters. Ideally, phosphorus would be made bioavailable when plants need it and immobile when they don’t. Many plants make small metabolites called coumarins that might be able to solubilize mineral-adsorbed phosphorus and be activated and inactivated under different conditions. McRose will use laboratory experiments to determine whether a combination of plant metabolites and soil bacteria can be used as a highly efficient and tunable system for phosphorus solubilization. She also aims to develop an imaging platform to investigate exchanges of phosphorus between plants and soil microbes.Many of the 2023 seed grants will support innovative technologies to monitor, quantify, and remediate various kinds of pollutants found in water. Two of the new projects address the problem of per- and polyfluoroalkyl substances (PFAS), human-made chemicals that have recently emerged as a global health threat. Known as “forever chemicals,” PFAS are used in many manufacturing processes. These chemicals are known to cause significant health issues including cancer, and they have become pervasive in soil, dust, air, groundwater, and drinking water. Unfortunately, the physical and chemical properties of PFAS render them difficult to detect and remove.Aristide Gumyusenge, the Merton C. Assistant Professor of Materials Science and Engineering, is using metal-organic frameworks for low-cost sensing and capture of PFAS. Most metal-organic frameworks (MOFs) are synthesized as particles, which complicates their high accuracy sensing performance due to defects such as intergranular boundaries. 
Thin, film-based electronic devices could enable the use of MOFs for many applications, especially chemical sensing. Gumyusenge’s project aims to design test kits based on two-dimensional conductive MOF films for detecting PFAS in drinking water. In early demonstrations, Gumyusenge and his team showed that these MOF films can sense PFAS at low concentrations. They will continue to iterate using a computation-guided approach to tune sensitivity and selectivity of the kits with the goal of deploying them in real-world scenarios.Carlos Portela, the Brit (1961) and Alex (1949) d’Arbeloff Career Development Professor in the Department of Mechanical Engineering, and Ariel Furst, the Cook Career Development Professor in the Department of Chemical Engineering, are building novel architected materials to act as filters for the removal of PFAS from water. Portela and Furst will design and fabricate nanoscale materials that use activated carbon and porous polymers to create a physical adsorption system. They will engineer the materials to have tunable porosities and morphologies that can maximize interactions between contaminated water and functionalized surfaces, while providing a mechanically robust system.Rohit Karnik is a Tata Professor and interim co-department head of the Department of Mechanical Engineering. He is working on another technology, his based on microbead sensors, to rapidly measure and monitor trace contaminants in water.Water pollution from both biological and chemical contaminants contributes to an estimated 1.36 million deaths annually. Chemical contaminants include pesticides and herbicides, heavy metals like lead, and compounds used in manufacturing. These emerging contaminants can be found throughout the environment, including in water supplies. 
The Environmental Protection Agency (EPA) in the United States sets recommended water quality standards, but states are responsible for developing their own monitoring criteria and systems, which must be approved by the EPA every three years. However, the availability of data on regulated chemicals and on candidate pollutants is limited by current testing methods that are either insensitive or expensive and laboratory-based, requiring trained scientists and technicians. Karnik’s project proposes a simple, self-contained, portable system for monitoring trace and emerging pollutants in water, making it suitable for field studies. The concept is based on multiplexed microbead-based sensors that use thermal or gravitational actuation to generate a signal. His proposed sandwich assay, a testing format that is appealing for environmental sensing, will enable both single-use and continuous monitoring. The hope is that the bead-based assays will increase the ease and reach of detecting and quantifying trace contaminants in water for both personal and industrial scale applications.Alexander Radosevich, a professor in the Department of Chemistry, and Timothy Swager, the John D. MacArthur Professor of Chemistry, are teaming up to create rapid, cost-effective, and reliable techniques for on-site arsenic detection in water.Arsenic contamination of groundwater is a problem that affects as many as 500 million people worldwide. Arsenic poisoning can lead to a range of severe health problems from cancer to cardiovascular and neurological impacts. Both the EPA and the World Health Organization have established that 10 parts per billion is a practical threshold for arsenic in drinking water, but measuring arsenic in water at such low levels is challenging, especially in resource-limited environments where access to sensitive laboratory equipment may not be readily accessible. 
Radosevich and Swager plan to develop reaction-based chemical sensors that bind and extract electrons from aqueous arsenic. In this way, they will exploit the inherent reactivity of aqueous arsenic to selectively detect and quantify it. This work will establish the chemical basis for a new method of detecting trace arsenic in drinking water.Rajeev Ram is a professor in the Department of Electrical Engineering and Computer Science. His J-WAFS research will advance a robust technology for monitoring nitrogen-containing pollutants, which threaten over 15,000 bodies of water in the United States alone.Nitrogen in the form of nitrate, nitrite, ammonia, and urea can run off from agricultural fertilizer and lead to harmful algal blooms that jeopardize human health. Unfortunately, monitoring these contaminants in the environment is challenging, as sensors are difficult to maintain and expensive to deploy. Ram and his students will work to establish limits of detection for nitrate, nitrite, ammonia, and urea in environmental, industrial, and agricultural samples using swept-source Raman spectroscopy. Swept-source Raman spectroscopy is a method of detecting the presence of a chemical by using a tunable, single mode laser that illuminates a sample. This method does not require costly, high-power lasers or a spectrometer. Ram will then develop and demonstrate a portable system that is capable of achieving chemical specificity in complex, natural environments. 
Data generated by such a system should help regulate polluters and guide remediation.

Kripa Varanasi, a professor in the Department of Mechanical Engineering, and Angela Belcher, the James Mason Crafts Professor and head of the Department of Biological Engineering, will join forces to develop an affordable water disinfection technology that selectively identifies, adsorbs, and kills “superbugs” in domestic and industrial wastewater.

Recent research predicts that antibiotic-resistant bacteria (superbugs) will result in $100 trillion in health care expenses and 10 million deaths annually by 2050. The prevalence of superbugs in our water systems has increased due to corroded pipes, contamination, and climate change. Current drinking water disinfection technologies are designed to kill all types of bacteria before human consumption. However, for certain domestic and industrial applications there is a need to protect the good bacteria required for ecological processes that contribute to soil and plant health. Varanasi and Belcher will combine material, biological, process, and system engineering principles to design a sponge-based water disinfection technology that can identify and destroy harmful bacteria while leaving the good bacteria unharmed. By modifying the sponge surface with specialized nanomaterials, their approach will be able to kill superbugs faster and more efficiently. The sponge filters can be deployed under very low pressure, making them an affordable technology, especially in resource-constrained communities.

In addition to the 10 seed grant projects, J-WAFS will also fund a research initiative led by Greg Sixt. Sixt is the research manager for climate and food systems at J-WAFS, and the director of the J-WAFS-led Food and Climate Systems Transformation (FACT) Alliance. His project focuses on the Lake Victoria Basin (LVB) of East Africa. 
The second-largest freshwater lake in the world, Lake Victoria straddles three countries (Uganda, Tanzania, and Kenya) and has a catchment area that encompasses two more (Rwanda and Burundi). Sixt will collaborate with Michael Hauser of the University of Natural Resources and Life Sciences, Vienna, and Paul Kariuki of the Lake Victoria Basin Commission.

The group will study how to adapt food systems to climate change in the Lake Victoria Basin. The basin is facing a range of climate threats that could significantly impact livelihoods and food systems in the expansive region. For example, extreme weather events like droughts and floods are negatively affecting agricultural production and freshwater resources. Across the LVB, current approaches to land and water management are unsustainable and threaten future food and water security. The Lake Victoria Basin Commission (LVBC), a specialized institution of the East African Community, wants to play a more vital role in coordinating transboundary land and water management to support transitions toward more resilient, sustainable, and equitable food systems. The primary goal of this research will be to support the LVBC’s transboundary land and water management efforts, specifically as they relate to sustainability and climate change adaptation in food systems. The research team will work with key stakeholders in Kenya, Uganda, and Tanzania to identify specific capacity needs to facilitate land and water management transitions. The two-year project will produce actionable recommendations to the LVBC.

    A better way to study ocean currents

    To study ocean currents, scientists release GPS-tagged buoys in the ocean and record their velocities to reconstruct the currents that transport them. These buoy data are also used to identify “divergences,” which are areas where water rises up from below the surface or sinks beneath it.

    By accurately predicting currents and pinpointing divergences, scientists can more precisely forecast the weather, approximate how oil will spread after a spill, or measure energy transfer in the ocean. A new model that incorporates machine learning makes more accurate predictions than conventional models do, a new study reports.

    A multidisciplinary research team including computer scientists at MIT and oceanographers has found that a standard statistical model typically used on buoy data can struggle to accurately reconstruct currents or identify divergences because it makes unrealistic assumptions about the behavior of water.

    The researchers developed a new model that incorporates knowledge from fluid dynamics to better reflect the physics at work in ocean currents. They show that their method, which only requires a small amount of additional computational expense, is more accurate at predicting currents and identifying divergences than the traditional model.

    This new model could help oceanographers make more accurate estimates from buoy data, which would enable them to more effectively monitor the transportation of biomass (such as Sargassum seaweed), carbon, plastics, oil, and nutrients in the ocean. This information is also important for understanding and tracking climate change.

    “Our method captures the physical assumptions more appropriately and more accurately. In this case, we know a lot of the physics already. We are giving the model a little bit of that information so it can focus on learning the things that are important to us, like what are the currents away from the buoys, or what is this divergence and where is it happening?” says senior author Tamara Broderick, an associate professor in MIT’s Department of Electrical Engineering and Computer Science (EECS) and a member of the Laboratory for Information and Decision Systems and the Institute for Data, Systems, and Society.

    Broderick’s co-authors include lead author Renato Berlinghieri, an electrical engineering and computer science graduate student; Brian L. Trippe, a postdoc at Columbia University; David R. Burt and Ryan Giordano, MIT postdocs; Kaushik Srinivasan, an assistant researcher in atmospheric and ocean sciences at the University of California at Los Angeles; Tamay Özgökmen, professor in the Department of Ocean Sciences at the University of Miami; and Junfei Xia, a graduate student at the University of Miami. The research will be presented at the International Conference on Machine Learning.

    Diving into the data

    Oceanographers use data on buoy velocity to predict ocean currents and identify “divergences” where water rises to the surface or sinks deeper.

    To estimate currents and find divergences, oceanographers have used a machine-learning technique known as a Gaussian process, which can make predictions even when data are sparse. To work well in this case, the Gaussian process must make assumptions about the data to generate a prediction.
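
    To make the idea of predicting from sparse data concrete, here is a minimal sketch of Gaussian process regression in one dimension. It is not the researchers' model: the squared-exponential kernel, the length scale, the noise level, and the five hypothetical "buoy" observations are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise_std=0.1):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train) + noise_std**2 * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train)
    k_ss = rbf_kernel(x_test, x_test)
    # Cholesky factorization gives a stable solve of k_tt @ alpha = y_train.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    var = np.diag(k_ss) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Hypothetical data: one velocity component observed at five buoy positions.
x_obs = np.array([0.0, 1.0, 2.5, 4.0, 5.0])
y_obs = np.sin(x_obs)

# Predict at a buoy location and at a point far from every buoy.
x_new = np.array([2.5, 10.0])
mean, std = gp_posterior(x_obs, y_obs, x_new, noise_std=0.05)
print("posterior mean:", mean)
print("posterior std: ", std)
```

    Near an observation the predicted uncertainty is small, while far from all buoys the prediction falls back to the prior (zero mean, large uncertainty). The kernel is where the model's assumptions about the data enter.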

    A standard way of applying a Gaussian process to ocean data assumes the latitude and longitude components of the current are unrelated. But this assumption isn’t physically accurate. For instance, this existing model implies that a current’s divergence and its vorticity (a whirling motion of fluid) operate on the same magnitude and length scales. Ocean scientists know this is not true, Broderick says. The previous model also assumes the frame of reference matters, which means fluid would behave differently in the latitude versus the longitude direction.

    “We were thinking we could address these problems with a model that incorporates the physics,” she says.

    They built a new model that uses what is known as a Helmholtz decomposition to accurately represent the principles of fluid dynamics. This method models an ocean current by breaking it down into a vorticity component (which captures the whirling motion) and a divergence component (which captures water rising or sinking).
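
    For a two-dimensional velocity field, the decomposition described above can be written as follows. The symbols F, Φ, and Ψ and the sign convention for the rotation operator are assumptions chosen for illustration; the article does not specify the notation the researchers use.

```latex
\begin{align}
\mathbf{F}(x, y) &= \nabla \Phi + \operatorname{rot} \Psi,
&
\nabla \Phi &= \begin{pmatrix} \partial_x \Phi \\ \partial_y \Phi \end{pmatrix},
&
\operatorname{rot} \Psi &= \begin{pmatrix} \partial_y \Psi \\ -\partial_x \Psi \end{pmatrix},
\\
\nabla \cdot \mathbf{F} &= \nabla^2 \Phi,
&
\partial_x F_y - \partial_y F_x &= -\nabla^2 \Psi .
\end{align}
```

    Under this convention the divergence of the current depends only on Φ and the vorticity depends only on Ψ, because the rotational part has zero divergence and the gradient part has zero curl. Modeling Φ and Ψ separately is what allows the two quantities to have different magnitudes and length scales.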

    In this way, they give the model some basic physics knowledge that it uses to make more accurate predictions.

    This new model utilizes the same data as the old model. And while their method can be more computationally intensive, the researchers show that the additional cost is relatively small.

    Buoyant performance

    They evaluated the new model using synthetic and real ocean buoy data. Because the synthetic data were fabricated by the researchers, they could compare the model’s predictions to ground-truth currents and divergences. But simulation involves assumptions that may not reflect real life, so the researchers also tested their model using data captured by real buoys released in the Gulf of Mexico.

    This image shows the trajectories of approximately 300 buoys released during the Grand LAgrangian Deployment (GLAD) in the Gulf of Mexico in the summer of 2013, to learn about ocean surface currents around the Deepwater Horizon oil spill site. The small, regular clockwise rotations are due to Earth’s rotation. Credit: Consortium of Advanced Research for Transport of Hydrocarbons in the Environment

    In each case, their method demonstrated superior performance for both tasks, predicting currents and identifying divergences, when compared to the standard Gaussian process and another machine-learning approach that used a neural network. For example, in one simulation that included a vortex adjacent to an ocean current, the new method correctly predicted no divergence while the previous Gaussian process method and the neural network method both predicted a divergence with very high confidence.
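
    The vortex example can be illustrated with a small numerical check: a purely rotational flow has vorticity but no divergence. The sketch below is not the researchers' simulation. The Gaussian stream function, the grid, and the helper names `divergence` and `vorticity` are hypothetical, and the derivatives are plain finite differences rather than a Gaussian process.

```python
import numpy as np

# Grid over a hypothetical square patch of ocean (arbitrary units).
x = np.linspace(-2.0, 2.0, 81)
y = np.linspace(-2.0, 2.0, 81)
X, Y = np.meshgrid(x, y, indexing="ij")

# Synthetic vortex built from a stream function psi = exp(-(x^2 + y^2)).
# The velocity is (d psi/dy, -d psi/dx), so it is purely rotational.
psi = np.exp(-(X**2 + Y**2))
u = -2.0 * Y * psi  # eastward component, d psi / dy
v = 2.0 * X * psi   # northward component, -d psi / dx

def divergence(u, v, x, y):
    """du/dx + dv/dy by finite differences."""
    return np.gradient(u, x, axis=0) + np.gradient(v, y, axis=1)

def vorticity(u, v, x, y):
    """dv/dx - du/dy by finite differences."""
    return np.gradient(v, x, axis=0) - np.gradient(u, y, axis=1)

div = divergence(u, v, x, y)
vort = vorticity(u, v, x, y)

print("largest |divergence| on the grid:", np.max(np.abs(div)))
print("vorticity at the vortex center:  ", vort[40, 40])
```

    The divergence stays close to zero everywhere (up to discretization error) while the vorticity at the center is close to its analytic value of 4. A method that reports a confident divergence for a flow like this is detecting something that is not there.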

    The technique is also good at identifying vortices from a small set of buoys, Broderick adds.

    Now that they have demonstrated the effectiveness of using a Helmholtz decomposition, the researchers want to incorporate a time element into their model, since currents can vary over time as well as space. In addition, they want to better capture how noise impacts the data, such as winds that sometimes affect buoy velocity. Separating that noise from the data could make their approach more accurate.

    “Our hope is to take this noisily observed field of velocities from the buoys, and then say what is the actual divergence and actual vorticity, and predict away from those buoys, and we think that our new technique will be helpful for this,” she says.

    “The authors cleverly integrate known behaviors from fluid dynamics to model ocean currents in a flexible model,” says Massimiliano Russo, an associate biostatistician at Brigham and Women’s Hospital and instructor at Harvard Medical School, who was not involved with this work. “The resulting approach retains the flexibility to model the nonlinearity in the currents but can also characterize phenomena such as vortices and connected currents that would only be noticed if the fluid dynamic structure is integrated into the model. This is an excellent example of where a flexible model can be substantially improved with a well thought and scientifically sound specification.”

    This research is supported, in part, by the Office of Naval Research, a National Science Foundation (NSF) CAREER Award, and the Rosenstiel School of Marine, Atmospheric, and Earth Science at the University of Miami.