More stories

  • in

    Why big changes early in life can help later on

    Imagine moving from state to state while growing up in the U.S., transferring between high schools, and eventually attending college out of state. The first two events might seem disruptive, and the third involves departing a local community. And yet, these things may be exactly what helps some people thrive later in life.

    That’s one implication of a newly published study about social networks co-authored by an MIT professor, which finds that so-called long ties — connections between people who otherwise lack any mutual contacts — are highly associated with greater economic success in life. Those long ties are fostered partly by turning points such as moving between states, and switching schools.

    The study, based on a large quantity of Facebook data, both illuminates how productive social networks are structured and identifies specific life events that significantly shape people’s networks.

    “People who have more long ties [on Facebook], and who have stronger long ties, have better economic indicators,” says Dean Eckles, an MIT professor and co-author of a new paper detailing the study’s findings.

    “Our hope is that the study provides better evidence of this really strong relationship, at the scale of the entire U.S,” Eckles says. “There hasn’t really been this sort of investigation into those types of disruptive life events.”

    The paper, “Long ties, disruptive life events, and economic prosperity,” appears in open-access form in Proceedings of the National Academy of Sciences. The authors are Eaman Jahani PhD ’21, a postdoc and lecturer at the University of California at Berkeley, who received his doctorate from MIT’s Institute for Data, Systems, and Society, and the Statistics and Data Science Center; Samuel P. Fraiberger, a data scientist at the World Bank; Michael Bailey, an economist and research scientist manager at Meta Platforms (which operates Facebook); and Eckles, an associate professor of marketing at MIT Sloan School of Management. Jahani, who worked at Meta when the study was conducted, performed the initial research, and the aggregate data analysis protected the privacy of individuals in compliance with regulations.

    On the move

    In recent decades, scholars have often analyzed social networks while building on a 1973 study by Stanford University’s Mark Granovetter, “The Strength of Weak Ties,” one of the 10 most-cited social science papers of all time. In it, Granovetter postulated that a network’s “weak ties”— the people you know less well — are vital. Your best friends may have networks quite similar to your own, but your “weak ties” provide additional connections useful for employment, and more. Granovetter also edited this current paper for PNAS.

    To conduct the study, the scholars mapped all reciprocal interactions among U.S.-based Facebook accounts from December 2020 to June 2021, to build a data-rich picture of social networks in action. The researchers maintain a distinction between “long” and “short” ties; in this definition, long ties have no other mutual connections at all, while short ties have some.

    Ultimately the scholars found that, when assessing everyone who has lived in the same state since 2012, those who had previously moved among U.S. states had 13 percent more long ties on Facebook than those who had not. Similarly, people who had switched high schools had 10 percent more long ties than people who had not.

    Facebook does not have income data for its users, so the scholars used a series of proxy measures to evaluate financial success. People with more long ties tend to live in higher-income areas, have more internet-connected devices, use more expensive mobile phones, and make more donations to charitable causes, compared to those who do not.

    Additionally, the research evaluates whether or not moving among states, or switching schools, is itself what causes people to have more long ties. After all, it could be the case that families who move more often have qualities that lead family members to be more proactive about forging ties with people.

    To examine this, the research team analyzed a subgroup of Facebook users who had switched high schools only when their first high school closed — meaning it was not their choice to change. Those people had 6 percent more long ties than those who had attended the same high schools but not been forced to switch; given this common pool of school attendees forced into divergent circumstances, the evidence suggests that making the school change itself “shapes the proclivity to connect with different communities,” as the scholars write in the paper. 

    “It’s a plausibly random nudge,” Eckles says, “and we find the people who were exposed to these high school closures end up with more long ties. I think that is one of the compelling elements pointing toward a causal story here.”

    Three types of events, same trend

    As the scholars acknowledge in the paper, there are some limitations to the study. Because it focuses on Facebook interactions, the research does not account for offline activities that may sustain social networks. It is also likely that economic success itself shapes people’s social networks, and not just that networks help shape success. Some people may have opportunities to maintain long ties, through professional work or travel, that others do not.

    On the other hand, the study does uncover long-term social network ties that had not been evaluated before, and, as the authors write,”having three different types of events — involving different processes by which people are selected into the disruption — pointing to the same conclusions makes for a more robust and notable pattern.”

    Other scholars in the field believe the study is a notable piece of research. In a commentary on the paper also published in PNAS, Michael Macy, a sociology professor at Cornell University, writes that “the authors demonstrate the importance of contributing to cumulative knowledge by confirming hypotheses derived from foundational theory while at the same time elaborating on what was previously known by digging deeper into the underlying causal mechanisms. In short, the paper is must reading not only for area specialists but for social scientists across the disciplines.”

    For his part, Eckles emphasizes that the researchers are releasing anonymized data from the study, so that other scholars can build on it, and develop additional insights about social network structure, while complying with all privacy regulations.

    “We’ve released [that] data and made it public, and we’re really happy to be doing that,” Eckles says. “We want to make as much of this as possible open to others. That’s one of the things that I’m hoping is part of the broader impact of the paper.”

    Jahani worked as a contractor at Meta Platforms, which operates Facebook, while conducting the research. Eckles has received past funding from Meta, as well as conference sponsorship, and previously worked there, before joining MIT.   More

  • in

    Learning the language of molecules to predict their properties

    Discovering new materials and drugs typically involves a manual, trial-and-error process that can take decades and cost millions of dollars. To streamline this process, scientists often use machine learning to predict molecular properties and narrow down the molecules they need to synthesize and test in the lab.

    Researchers from MIT and the MIT-Watson AI Lab have developed a new, unified framework that can simultaneously predict molecular properties and generate new molecules much more efficiently than these popular deep-learning approaches.

    To teach a machine-learning model to predict a molecule’s biological or mechanical properties, researchers must show it millions of labeled molecular structures — a process known as training. Due to the expense of discovering molecules and the challenges of hand-labeling millions of structures, large training datasets are often hard to come by, which limits the effectiveness of machine-learning approaches.

    By contrast, the system created by the MIT researchers can effectively predict molecular properties using only a small amount of data. Their system has an underlying understanding of the rules that dictate how building blocks combine to produce valid molecules. These rules capture the similarities between molecular structures, which helps the system generate new molecules and predict their properties in a data-efficient manner.

    This method outperformed other machine-learning approaches on both small and large datasets, and was able to accurately predict molecular properties and generate viable molecules when given a dataset with fewer than 100 samples.

    “Our goal with this project is to use some data-driven methods to speed up the discovery of new molecules, so you can train a model to do the prediction without all of these cost-heavy experiments,” says lead author Minghao Guo, a computer science and electrical engineering (EECS) graduate student.

    Guo’s co-authors include MIT-IBM Watson AI Lab research staff members Veronika Thost, Payel Das, and Jie Chen; recent MIT graduates Samuel Song ’23 and Adithya Balachandran ’23; and senior author Wojciech Matusik, a professor of electrical engineering and computer science and a member of the MIT-IBM Watson AI Lab, who leads the Computational Design and Fabrication Group within the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference for Machine Learning.

    Learning the language of molecules

    To achieve the best results with machine-learning models, scientists need training datasets with millions of molecules that have similar properties to those they hope to discover. In reality, these domain-specific datasets are usually very small. So, researchers use models that have been pretrained on large datasets of general molecules, which they apply to a much smaller, targeted dataset. However, because these models haven’t acquired much domain-specific knowledge, they tend to perform poorly.

    The MIT team took a different approach. They created a machine-learning system that automatically learns the “language” of molecules — what is known as a molecular grammar — using only a small, domain-specific dataset. It uses this grammar to construct viable molecules and predict their properties.

    In language theory, one generates words, sentences, or paragraphs based on a set of grammar rules. You can think of a molecular grammar the same way. It is a set of production rules that dictate how to generate molecules or polymers by combining atoms and substructures.

    Just like a language grammar, which can generate a plethora of sentences using the same rules, one molecular grammar can represent a vast number of molecules. Molecules with similar structures use the same grammar production rules, and the system learns to understand these similarities.

    Since structurally similar molecules often have similar properties, the system uses its underlying knowledge of molecular similarity to predict properties of new molecules more efficiently. 

    “Once we have this grammar as a representation for all the different molecules, we can use it to boost the process of property prediction,” Guo says.

    The system learns the production rules for a molecular grammar using reinforcement learning — a trial-and-error process where the model is rewarded for behavior that gets it closer to achieving a goal.

    But because there could be billions of ways to combine atoms and substructures, the process to learn grammar production rules would be too computationally expensive for anything but the tiniest dataset.

    The researchers decoupled the molecular grammar into two parts. The first part, called a metagrammar, is a general, widely applicable grammar they design manually and give the system at the outset. Then it only needs to learn a much smaller, molecule-specific grammar from the domain dataset. This hierarchical approach speeds up the learning process.

    Big results, small datasets

    In experiments, the researchers’ new system simultaneously generated viable molecules and polymers, and predicted their properties more accurately than several popular machine-learning approaches, even when the domain-specific datasets had only a few hundred samples. Some other methods also required a costly pretraining step that the new system avoids.

    The technique was especially effective at predicting physical properties of polymers, such as the glass transition temperature, which is the temperature required for a material to transition from solid to liquid. Obtaining this information manually is often extremely costly because the experiments require extremely high temperatures and pressures.

    To push their approach further, the researchers cut one training set down by more than half — to just 94 samples. Their model still achieved results that were on par with methods trained using the entire dataset.

    “This grammar-based representation is very powerful. And because the grammar itself is a very general representation, it can be deployed to different kinds of graph-form data. We are trying to identify other applications beyond chemistry or material science,” Guo says.

    In the future, they also want to extend their current molecular grammar to include the 3D geometry of molecules and polymers, which is key to understanding the interactions between polymer chains. They are also developing an interface that would show a user the learned grammar production rules and solicit feedback to correct rules that may be wrong, boosting the accuracy of the system.

    This work is funded, in part, by the MIT-IBM Watson AI Lab and its member company, Evonik. More

  • in

    Researchers teach an AI to write better chart captions

    Chart captions that explain complex trends and patterns are important for improving a reader’s ability to comprehend and retain the data being presented. And for people with visual disabilities, the information in a caption often provides their only means of understanding the chart.

    But writing effective, detailed captions is a labor-intensive process. While autocaptioning techniques can alleviate this burden, they often struggle to describe cognitive features that provide additional context.

    To help people author high-quality chart captions, MIT researchers have developed a dataset to improve automatic captioning systems. Using this tool, researchers could teach a machine-learning model to vary the level of complexity and type of content included in a chart caption based on the needs of users.

    The MIT researchers found that machine-learning models trained for autocaptioning with their dataset consistently generated captions that were precise, semantically rich, and described data trends and complex patterns. Quantitative and qualitative analyses revealed that their models captioned charts more effectively than other autocaptioning systems.  

    The team’s goal is to provide the dataset, called VisText, as a tool researchers can use as they work on the thorny problem of chart autocaptioning. These automatic systems could help provide captions for uncaptioned online charts and improve accessibility for people with visual disabilities, says co-lead author Angie Boggust, a graduate student in electrical engineering and computer science at MIT and member of the Visualization Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    “We’ve tried to embed a lot of human values into our dataset so that when we and other researchers are building automatic chart-captioning systems, we don’t end up with models that aren’t what people want or need,” she says.

    Boggust is joined on the paper by co-lead author and fellow graduate student Benny J. Tang and senior author Arvind Satyanarayan, associate professor of computer science at MIT who leads the Visualization Group in CSAIL. The research will be presented at the Annual Meeting of the Association for Computational Linguistics.

    Human-centered analysis

    The researchers were inspired to develop VisText from prior work in the Visualization Group that explored what makes a good chart caption. In that study, researchers found that sighted users and blind or low-vision users had different preferences for the complexity of semantic content in a caption. 

    The group wanted to bring that human-centered analysis into autocaptioning research. To do that, they developed VisText, a dataset of charts and associated captions that could be used to train machine-learning models to generate accurate, semantically rich, customizable captions.

    Developing effective autocaptioning systems is no easy task. Existing machine-learning methods often try to caption charts the way they would an image, but people and models interpret natural images differently from how we read charts. Other techniques skip the visual content entirely and caption a chart using its underlying data table. However, such data tables are often not available after charts are published.

    Given the shortfalls of using images and data tables, VisText also represents charts as scene graphs. Scene graphs, which can be extracted from a chart image, contain all the chart data but also include additional image context.

    “A scene graph is like the best of both worlds — it contains almost all the information present in an image while being easier to extract from images than data tables. As it’s also text, we can leverage advances in modern large language models for captioning,” Tang explains.

    They compiled a dataset that contains more than 12,000 charts — each represented as a data table, image, and scene graph — as well as associated captions. Each chart has two separate captions: a low-level caption that describes the chart’s construction (like its axis ranges) and a higher-level caption that describes statistics, relationships in the data, and complex trends.

    The researchers generated low-level captions using an automated system and crowdsourced higher-level captions from human workers.

    “Our captions were informed by two key pieces of prior research: existing guidelines on accessible descriptions of visual media and a conceptual model from our group for categorizing semantic content. This ensured that our captions featured important low-level chart elements like axes, scales, and units for readers with visual disabilities, while retaining human variability in how captions can be written,” says Tang.

    Translating charts

    Once they had gathered chart images and captions, the researchers used VisText to train five machine-learning models for autocaptioning. They wanted to see how each representation — image, data table, and scene graph — and combinations of the representations affected the quality of the caption.

    “You can think about a chart captioning model like a model for language translation. But instead of saying, translate this German text to English, we are saying translate this ‘chart language’ to English,” Boggust says.

    Their results showed that models trained with scene graphs performed as well or better than those trained using data tables. Since scene graphs are easier to extract from existing charts, the researchers argue that they might be a more useful representation.

    They also trained models with low-level and high-level captions separately. This technique, known as semantic prefix tuning, enabled them to teach the model to vary the complexity of the caption’s content.

    In addition, they conducted a qualitative examination of captions produced by their best-performing method and categorized six types of common errors. For instance, a directional error occurs if a model says a trend is decreasing when it is actually increasing.

    This fine-grained, robust qualitative evaluation was important for understanding how the model was making its errors. For example, using quantitative methods, a directional error might incur the same penalty as a repetition error, where the model repeats the same word or phrase. But a directional error could be more misleading to a user than a repetition error. The qualitative analysis helped them understand these types of subtleties, Boggust says.

    These sorts of errors also expose limitations of current models and raise ethical considerations that researchers must consider as they work to develop autocaptioning systems, she adds.

    Generative machine-learning models, such as those that power ChatGPT, have been shown to hallucinate or give incorrect information that can be misleading. While there is a clear benefit to using these models for autocaptioning existing charts, it could lead to the spread of misinformation if charts are captioned incorrectly.

    “Maybe this means that we don’t just caption everything in sight with AI. Instead, perhaps we provide these autocaptioning systems as authorship tools for people to edit. It is important to think about these ethical implications throughout the research process, not just at the end when we have a model to deploy,” she says.

    Boggust, Tang, and their colleagues want to continue optimizing the models to reduce some common errors. They also want to expand the VisText dataset to include more charts, and more complex charts, such as those with stacked bars or multiple lines. And they would also like to gain insights into what these autocaptioning models are actually learning about chart data.

    This research was supported, in part, by a Google Research Scholar Award, the National Science Foundation, the MLA@CSAIL Initiative, and the United States Air Force Research Laboratory. More

  • in

    MIT-Pillar AI Collective announces first seed grant recipients

    The MIT-Pillar AI Collective has announced its first six grant recipients. Students, alumni, and postdocs working on a broad range of topics in artificial intelligence, machine learning, and data science will receive funding and support for research projects that could translate into commercially viable products or companies. These grants are intended to help students explore commercial applications for their research, and eventually drive that commercialization through the creation of a startup.

    “These tremendous students and postdocs are working on projects that have the potential to be truly transformative across a diverse range of industries. It’s thrilling to think that the novel research these teams are conducting could lead to the founding of startups that revolutionize everything from drug delivery to video conferencing,” says Anantha Chandrakasan, dean of the School of Engineering and the Vannevar Bush Professor of Electrical Engineering and Computer Science.

    Launched in September 2022, the MIT-Pillar AI Collective is a pilot program funded by a $1 million gift from Pillar VC that aims to cultivate prospective entrepreneurs and drive innovation in areas related to AI. Administered by the MIT Deshpande Center for Technological Innovation, the AI Collective centers on the market discovery process, advancing projects through market research, customer discovery, and prototyping. Graduate students and postdocs supported by the program work toward the development of minimum viable products.

    “In addition to funding, the MIT-Pillar AI Collective provides grant recipients with mentorship and guidance. With the rapid advancement of AI technologies, this type of support is critical to ensure students and postdocs are able to access the resources required to move quickly in this fast-pace environment,” says Jinane Abounadi, managing director of the MIT-Pillar AI Collective.

    The six inaugural recipients will receive support in identifying key milestones and advice from experienced entrepreneurs. The AI Collective assists seed grant recipients in gathering feedback from potential end-users, as well as getting insights from early-stage investors. The program also organizes community events, including a “Founder Talks” speaker series, and other team-building activities.   

    “Each one of these grant recipients exhibits an entrepreneurial spirit. It is exciting to provide support and guidance as they start a journey that could one day see them as founders and leaders of successful companies,” adds Jamie Goldstein ’89, founder of Pillar VC.

    The first cohort of grant recipients include the following projects:

    Predictive query interface

    Abdullah Alomar SM ’21, a PhD candidate studying electrical engineering and computer science, is building a predictive query interface for time series databases to better forecast demand and financial data. This user-friendly interface can help alleviate some of the bottlenecks and issues related to unwieldy data engineering processes while providing state-of-the-art statistical accuracy. Alomar is advised by Devavrat Shah, the Andrew (1956) and Erna Viterbi Professor at MIT.

    Design of light-activated drugs

    Simon Axelrod, a PhD candidate studying chemical physics at Harvard University, is combining AI with physics simulations to design light-activated drugs that could reduce side effects and improve effectiveness. Patients would receive an inactive form of a drug, which is then activated by light in a specific area of the body containing diseased tissue. This localized use of photoactive drugs would minimize the side effects from drugs targeting healthy cells. Axelrod is developing novel computational models that predict properties of photoactive drugs with high speed and accuracy, allowing researchers to focus on only the highest-quality drug candidates. He is advised by Rafael Gomez-Bombarelli, the Jeffrey Cheah Career Development Chair in Engineering in the MIT Department of Materials Science and Engineering. 

    Low-cost 3D perception

    Arjun Balasingam, a PhD student in electrical engineering and computer science and a member of the Computer Science and Artificial Intelligence Laboratory’s (CSAIL) Networks and Mobile Systems group, is developing a technology, called MobiSee, that enables real-time 3D reconstruction in challenging dynamic environments. MobiSee uses self-supervised AI methods along with video and lidar to provide low-cost, state-of-the-art 3D perception on consumer mobile devices like smartphones. This technology could have far-reaching applications across mixed reality, navigation, safety, and sports streaming, in addition to unlocking opportunities for new real-time and immersive experiences. He is advised by Hari Balakrishnan, the Fujitsu Professor of Computer Science and Artificial Intelligence at MIT and member of CSAIL.

    Sleep therapeutics

    Guillermo Bernal SM ’14, PhD ’23, a recent PhD graduate in media arts and sciences, is developing a sleep therapeutic platform that would enable sleep specialists and researchers to conduct robust sleep studies and develop therapy plans remotely, while the patient is comfortable in their home. Called Fascia, the three-part system consists of a polysomnogram with a sleep mask form factor that collects data, a hub that enables researchers to provide stimulation and feedback via olfactory, auditory, and visual stimuli, and a web portal that enables researchers to read a patient’s signals in real time with machine learning analysis. Bernal was advised by Pattie Maes, professor of media arts and sciences at the MIT Media Lab.

    Autonomous manufacturing assembly with human-like tactile perception

    Michael Foshey, a mechanical engineer and project manager with MIT CSAIL’s Computational Design and Fabrication Group, is developing an AI-enabled tactile perception system that can be used to give robots human-like dexterity. With this new technology platform, Foshey and his team hope to enable industry-changing applications in manufacturing. Currently, assembly tasks in manufacturing are largely done by hand and are typically repetitive and tedious. As a result, these jobs are being largely left unfilled. These labor shortages can cause supply chain shortages and increases in the cost of production. Foshey’s new technology platform aims to address this by automating assembly tasks to reduce reliance on manual labor. Foshey is supervised by Wojciech Matusik, MIT professor of electrical engineering and computer science and member of CSAIL.  

    Generative AI for video conferencing

    Vibhaalakshmi Sivaraman SM ’19, a PhD candidate in electrical engineering and computer science who is a member of CSAIL’s Networking and Mobile Systems Group, is developing a generative technology, Gemino, to facilitate video conferencing in high-latency and low-bandwidth network environments. Gemino is a neural compression system for video conferencing that overcomes the robustness concerns and compute complexity challenges that limit current face-image-synthesis models. This technology could enable sustained video conferencing calls in regions and scenarios that cannot reliably support video calls today. Sivaraman is advised by Mohammad Alizadeh, MIT associate professor of electrical engineering and computer science and member of CSAIL.  More

  • in

    Bringing the social and ethical responsibilities of computing to the forefront

    There has been a remarkable surge in the use of algorithms and artificial intelligence to address a wide range of problems and challenges. While their adoption, particularly with the rise of AI, is reshaping nearly every industry sector, discipline, and area of research, such innovations often expose unexpected consequences that involve new norms, new expectations, and new rules and laws.

    To facilitate deeper understanding, the Social and Ethical Responsibilities of Computing (SERC), a cross-cutting initiative in the MIT Schwarzman College of Computing, recently brought together social scientists and humanists with computer scientists, engineers, and other computing faculty for an exploration of the ways in which the broad applicability of algorithms and AI has presented both opportunities and challenges in many aspects of society.

    “The very nature of our reality is changing. AI has the ability to do things that until recently were solely the realm of human intelligence — things that can challenge our understanding of what it means to be human,” remarked Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing, in his opening address at the inaugural SERC Symposium. “This poses philosophical, conceptual, and practical questions on a scale not experienced since the start of the Enlightenment. In the face of such profound change, we need new conceptual maps for navigating the change.”

    The symposium offered a glimpse into the vision and activities of SERC in both research and education. “We believe our responsibility with SERC is to educate and equip our students and enable our faculty to contribute to responsible technology development and deployment,” said Georgia Perakis, the William F. Pounds Professor of Management in the MIT Sloan School of Management, co-associate dean of SERC, and the lead organizer of the symposium. “We’re drawing from the many strengths and diversity of disciplines across MIT and beyond and bringing them together to gain multiple viewpoints.”

    Through a succession of panels and sessions, the symposium delved into a variety of topics related to the societal and ethical dimensions of computing. In addition, 37 undergraduate and graduate students from a range of majors, including urban studies and planning, political science, mathematics, biology, electrical engineering and computer science, and brain and cognitive sciences, participated in a poster session to exhibit their research in this space, covering such topics as quantum ethics, AI collusion in storage markets, computing waste, and empowering users on social platforms for better content credibility.

    Showcasing a diversity of work

    In three sessions devoted to themes of beneficent and fair computing, equitable and personalized health, and algorithms and humans, the SERC Symposium showcased work by 12 faculty members across these domains.

    One such project from a multidisciplinary team of archaeologists, architects, digital artists, and computational social scientists aimed to preserve endangered heritage sites in Afghanistan with digital twins. The project team produced highly detailed interrogable 3D models of the heritage sites, in addition to extended reality and virtual reality experiences, as learning resources for audiences that cannot access these sites.

    In a project for the United Network for Organ Sharing, researchers showed how they used applied analytics to optimize various facets of an organ allocation system in the United States that is currently undergoing a major overhaul in order to make it more efficient, equitable, and inclusive for different racial, age, and gender groups, among others.

    Another talk discussed an area that has not yet received adequate public attention: the broader implications for equity that biased sensor data holds for the next generation of models in computing and health care.

    A talk on bias in algorithms considered both human bias and algorithmic bias, and the potential for improving results by taking into account differences in the nature of the two kinds of bias.

    Other highlighted research included the interaction between online platforms and human psychology; a study on whether decision-makers make systemic prediction mistakes on the available information; and an illustration of how advanced analytics and computation can be leveraged to inform supply chain management, operations, and regulatory work in the food and pharmaceutical industries.

    Improving the algorithms of tomorrow

    “Algorithms are, without question, impacting every aspect of our lives,” said Asu Ozdaglar, deputy dean of academics for the MIT Schwarzman College of Computing and head of the Department of Electrical Engineering and Computer Science, in kicking off a panel she moderated on the implications of data and algorithms.

    “Whether it’s in the context of social media, online commerce, automated tasks, and now a much wider range of creative interactions with the advent of generative AI tools and large language models, there’s little doubt that much more is to come,” Ozdaglar said. “While the promise is evident to all of us, there’s a lot to be concerned as well. This is very much time for imaginative thinking and careful deliberation to improve the algorithms of tomorrow.”

    Turning to the panel, Ozdaglar asked experts from computing, social science, and data science for insights on how to understand what is to come and shape it to enrich outcomes for the majority of humanity.

    Sarah Williams, associate professor of technology and urban planning at MIT, emphasized the critical importance of comprehending the process of how datasets are assembled, as data are the foundation for all models. She also stressed the need for research to address the potential implication of biases in algorithms that often find their way in through their creators and the data used in their development. “It’s up to us to think about our own ethical solutions to these problems,” she said. “Just as it’s important to progress with the technology, we need to start the field of looking at these questions of what biases are in the algorithms? What biases are in the data, or in that data’s journey?”

    Shifting focus to generative models and whether the development and use of these technologies should be regulated, the panelists — which also included MIT’s Srini Devadas, professor of electrical engineering and computer science, John Horton, professor of information technology, and Simon Johnson, professor of entrepreneurship — all concurred that regulating open-source algorithms, which are publicly accessible, would be difficult given that regulators are still catching up and struggling to even set guardrails for technology that is now 20 years old.

    Returning to the question of how to effectively regulate the use of these technologies, Johnson proposed a progressive corporate tax system as a potential solution. He recommends basing companies’ tax payments on their profits, especially for large corporations whose massive earnings go largely untaxed due to offshore banking. By doing so, Johnson said that this approach can serve as a regulatory mechanism that discourages companies from trying to “own the entire world” by imposing disincentives.

    The role of ethics in computing education

    As computing continues to advance with no signs of slowing down, it is critical to educate students to be intentional in the social impact of the technologies they will be developing and deploying into the world. But can one actually be taught such things? If so, how?

    Caspar Hare, professor of philosophy at MIT and co-associate dean of SERC, posed this looming question to faculty on a panel he moderated on the role of ethics in computing education. All experienced in teaching ethics and thinking about the social implications of computing, each panelist shared their perspective and approach.

    A strong advocate for the importance of learning from history, Eden Medina, associate professor of science, technology, and society at MIT, said that “often the way we frame computing is that everything is new. One of the things that I do in my teaching is look at how people have confronted these issues in the past and try to draw from them as a way to think about possible ways forward.” Medina regularly uses case studies in her classes and referred to a paper written by Yale University science historian Joanna Radin on the Pima Indian Diabetes Dataset that raised ethical issues on the history of that particular collection of data that many don’t consider as an example of how decisions around technology and data can grow out of very specific contexts.

    Milo Phillips-Brown, associate professor of philosophy at Oxford University, talked about the Ethical Computing Protocol that he co-created while he was a SERC postdoc at MIT. The protocol, a four-step approach to building technology responsibly, is designed to train computer science students to think in a better and more accurate way about the social implications of technology by breaking the process down into more manageable steps. “The basic approach that we take very much draws on the fields of value-sensitive design, responsible research and innovation, participatory design as guiding insights, and then is also fundamentally interdisciplinary,” he said.

    Fields such as biomedicine and law have an ethics ecosystem that distributes the function of ethical reasoning in these areas. Oversight and regulation are provided to guide front-line stakeholders and decision-makers when issues arise, as are training programs and access to interdisciplinary expertise that they can draw from. “In this space, we have none of that,” said John Basl, associate professor of philosophy at Northeastern University. “For current generations of computer scientists and other decision-makers, we’re actually making them do the ethical reasoning on their own.” Basl commented further that teaching core ethical reasoning skills across the curriculum, not just in philosophy classes, is essential, and that the goal shouldn’t be for every computer scientist be a professional ethicist, but for them to know enough of the landscape to be able to ask the right questions and seek out the relevant expertise and resources that exists.

    After the final session, interdisciplinary groups of faculty, students, and researchers engaged in animated discussions related to the issues covered throughout the day during a reception that marked the conclusion of the symposium. More

  • in

    MIT researchers make language models scalable self-learners

    Socrates once said: “It is not the size of a thing, but the quality that truly matters. For it is in the nature of substance, not its volume, that true value is found.”

    Does size always matter for large language models (LLMs)? In a technological landscape bedazzled by LLMs taking center stage, a team of MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers think smaller models shouldn’t be overlooked, especially for natural language understanding products widely deployed in the industry.

    To that end, the researchers cooked up an approach to long-standing problems of inefficiency and privacy associated with big, text-based AI models — a logic-aware model that outperforms 500-times-bigger counterparts on some language understanding tasks without human-generated annotations, while preserving privacy and robustness with high performance.

    LLMs, which have shown some promising skills in generating language, art, and code, are computationally expensive, and their data requirements can risk privacy leaks when using application programming interfaces for data upload. Smaller models have been historically less capable, particularly in multitasking and weakly supervised tasks, compared to their larger counterparts.

    So what’s helping these smaller models act so mighty, then? Something called “textual entailment,” a way to help these models understand a variety of language tasks, where if one sentence (the premise) is true, then the other sentence (the hypothesis) is likely to be true as well. For example, if the premise is, “all cats have tails” then the hypothesis “a tabby cat has a tail” would be entailed by the premise. This concept is used to train an “entailment model” that proved to be less biased than other language models, from the team’s previous research. They then created “prompts” that the models can use to figure out if certain information is entailed by a given sentence or phrase according to different tasks. This method improved the model’s ability to adapt to different tasks without any additional training, known as zero-shot adaptation.

    In the realm of “natural language understanding,” there are various applications that hinge on determining the relationship between two pieces of text. For example, in sentiment classification, a statement like “I think the movie is good” can be inferred or entailed from a movie review that says, “I like the story and the acting is great,” indicating a positive sentiment. Another is news classification, where the topic of a news article can be inferred from its content. For example, a statement like “the news article is about sports” can be entailed if the main content of the article reports on an NBA game. The key insight was that many existing natural language understanding tasks could be recast as an entailment (i.e., logical inference in natural language) task. 

    “Our research is about improving the ability of computer programs to understand and process natural language — the way humans speak and write. Our self-trained, 350-million-parameter entailment models, without human-generated labels, outperform supervised language models with 137 to 175 billion parameters,” says MIT CSAIL postdoc Hongyin Luo, lead author on a new paper about the study. “This has potential to reshape the landscape of AI and machine learning, providing a more scalable, trustworthy, and cost-effective solution to language modeling,” says Luo. “By proving that smaller models can perform at the same level as larger ones for language understanding, this work paves the way for more sustainable and privacy-preserving AI technologies.” 

    The team discovered that they could improve the model’s performance even more by using a technique called “self-training,” where the model uses its own predictions to teach itself, effectively learning without human supervision and additional annotated training data.The self-training method significantly improved performance on a bunch of downstream tasks, including sentiment analysis, question-answering, and news classification. It outperformed both Google’s LaMDA and FLAN in zero-shot capabilities, GPT models, and other supervised algorithms. 

    However, one challenge with self-training is that the model can sometimes generate incorrect or noisy labels that harm performance. To overcome this, they developed a new algorithm called ‘SimPLE’ (Simple Pseudo-Label Editing), a process to review and modify the pseudo-labels made in initial rounds of learning. By correcting any mislabeled instances, it improved the overall quality of the self-generated labels. This not only made the models more effective at understanding language, but more robust when faced with adversarial data. 

    As with most research, there are some limitations. The self-training on multi-class classification tasks didn’t perform as well as on binary natural language understanding tasks, indicating the challenge of applying entailment models to multi-choice tasks.“This research presents an efficient and effective way to train large language models (LLMs) by formulating natural language understanding tasks as contextual entailment problems and employing a pseudo-labeling self-training mechanism to incorporate large quantities of unlabelled text data in the training process,” adds CSAIL Senior Research Scientist James Glass, who is also an author on the paper. “While the field of LLMs is undergoing rapid and dramatic changes, this research shows that it is possible to produce relatively compact language models that perform very well on benchmark understanding tasks compared to their peers of roughly the same size, or even much larger language models.”

    “Entailment task is a popular proxy to evaluate “understanding” of a given context by an AI model,” says Leonid Karlinsky, research staff member at the MIT-IBM Watson AI Lab. “It is used in many areas analyzing models with unimodal, like LLMs, and and multi-modal, like VLMs [visual language models] inputs, simplifying the task of question-answering about a given input context to a binary classification problem — does this context entail a certain (e.g., text) conclusion or not? This paper makes two contributions in this space. First, it proposes a way to improve the zero-shot (without additional tuning) NLU performance and robustness to adversarial attacks via tuning with synthesized (specialized) entailment tasks generated for the primal NLU task. Second, it offers a self-supervised SimPLE method including pseudo-labeling and confidence-based filtering to further improve large LLMs’ NLU performance.”

    Luo and Glass wrote the paper with Yoon Kim, a CSAIL member and assistant professor in MIT’s Department of Electrical Engineering and Computer Science, and Jiaxin Ge of Peking University. Their work will be presented at the meeting of the Association for Computational Linguistics in Toronto, Ontario this July. This research was supported by a grant from the Hong Kong Innovation AI program. More

  • in

    Scaling audio-visual learning without labels

    Researchers from MIT, the MIT-IBM Watson AI Lab, IBM Research, and elsewhere have developed a new technique for analyzing unlabeled audio and visual data that could improve the performance of machine-learning models used in applications like speech recognition and object detection. The work, for the first time, combines two architectures of self-supervised learning, contrastive learning and masked data modeling, in an effort to scale machine-learning tasks like event classification in single- and multimodal data without the need for annotation, thereby replicating how humans understand and perceive our world.

    “A larger portion of human knowledge is learned in a self-supervised way, because we don’t always get supervision signals, and we want to enable the machine-learning model to have the same ability,” says Yuan Gong, an MIT postdoc in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    “So, another way to put it is that self-supervised learning often forms the foundation of an initial model, because it can learn on vast amounts of unlabeled data. And then you can use classical, supervised learning or reinforcement learning to fine tune the model to something particular if you want to,” says Jim Glass, an MIT senior research scientist and member of the MIT-IBM Watson AI Lab.

    The technique, called the contrastive audio-visual masked autoencoder (CAV-MAE), is a type of neural network that can learn to extract and map meaningful latent representations into high-dimensional space from acoustic and visual data by training on large YouTube datasets of audio and video 10-second clips. The researchers say the technique is more effective than previous approaches because it explicitly models the relationships between audio and visual data in a way that other methods do not.

    Joining Gong and Glass on the study are graduate students Andrew Rouditchenko and Alexander H. Liu of MIT, David Harwath PhD ’18 of the University of Texas at Austin, and MIT-IBM Watson AI Lab members Leonid Karlinsky and Hilde Kuehne. Kuehne is also affiliated with Goethe University Frankfurt. The method was recently presented at the International Conference on Learning Representations.

    A joint and coordinated approach

    The CAV-MAE works by “learning by prediction” and “learning by comparison,” says Gong. The masked data modeling, or the prediction method, takes a video along with its coordinated audio waveform, converts the audio to a spectrogram, and masks 75 percent of both. The unmasked data is tokenized, then fed into separate audio and visual encoders before entering a joint encoder/decoder, where the model is asked to recover the missing data. The difference (reconstruction loss) between the resulting reconstructed prediction and the original audio-visual combination is then used to train the model for better performance. An example of this would be covering part of a video of a piano and part of a spectrogram of piano music, and then asking the model to try to determine the masked inputs. Unfortunately, this method may not capture the association between the video and audio pair, whereas contrastive learning leverages this, but may discard some modality-unique information, like the background in a video.

    Contrastive learning aims to map representations that are similar close to each other. For example, the model will attempt to place different video and audio data of different parrots close to each other and further away from pairs of video and audio of guitars playing. In a similar fashion to masked autoencoding, audio-visual pairs are passed into separate modality encoders; however, the audio and visual components are kept separately within the joint encoder before the model performs pooling and contrastive loss. In this way, contrastive learning tries to identify the parts of each audio or video that are most relevant to the other. For example, if a video shows someone speaking and the corresponding audio clip contains speech, the autoencoder will learn to associate the mouth movements of the speaker with the words being spoken. It will then adjust the model’s parameters so that those inputs are represented close to each other. Ultimately, the CAV-MAE method combines both techniques with multiple forward data streams with masking as a first step, modality-specific encoders, and layer normalization so that the representation strengths are similar.

    “We [then] wanted to compare the proposed CAV-MAE with a model trained only with a masked autoencoder and a model trained only with contrastive learning, because we want to show that by combining masked autoencoder and contrastive learning, we can get some performance improvement,” says Gong, “and the results support our hypothesis that there’s obvious improvement.”

    The researchers tested CAV-MAE — as well as their method without contrastive loss or a masked autoencoder — against other state-of-the-art methods on audio-visual retrieval and audio-visual event classification tasks using standard AudioSet (20K and 2M) and VGGSound datasets — labeled, realistic short clips, which could include multiple sounds. Audio-visual retrieval means that the model sees either the audio or visual component of a query pair and searches for the missing one; event classification includes identifying actions or sounds within data, like a person singing or a car driving.

    Overall, they found that contrastive learning and masked data modeling are complementary methods. CAV-MAE was able to outperform previous techniques (with fully self-supervised pre-training) by about 2 percent for event classification performance verses models with comparable computation and, more impressively, kept pace with or outperformed models with industry-level computational resources. The team’s model ranked similarly to models trained with only the contrastive loss. And surprisingly, the team says, the incorporation of multi-modal data into CAV-MAE pre-training greatly improves the fine-tuning of single-modality representation via supervised learning (with some labeled data) and performance on audio-only event classification tasks. This demonstrates that, like humans, multi-modal information provides an additional “soft label” boost even for audio or visual only tasks; for instance, it helps the model to understand if it’s looking for an electric or acoustic guitar — a richer supervision signal.

    “I think people like the elegance of this model for combining information in the different audio and visual streams. It has the contrastive and the reconstruction loss, and compared to models that have been evaluated with similar data, it clearly does very well across a range of these tasks,” says Glass.

    Building on this, “one special thing is, our model can do both classification and the retrieval, which is not common,” Gong adds. “Before this work, these methods are used separately, but after this work, I see that most of the audio-visual learning frameworks use contracting loss and the masked autoencoder together, implicitly or explicitly.”

    Bringing self-supervised audio-visual learning into our world

    The researchers see their contribution of the contrastive audio-visual masked autoencoder (CAV-MAE) as an important milestone and a step forward for applications, which are increasingly moving from single modality to multi-modality and which require or leverage audio-visual fusion. They hypothesize that one day it could be used for action recognition in realms like sports, education, entertainment, motor vehicles, and public safety. It could also, one day, extend to other modalities. At this time, the fact that, “this only applies to audio-visual data may be a limitation, but we are targeting multi-modal learning, which is trend of machine learning,” says Gong. “As humans, we have multi-modalities — we have smell, touch — many more things that just audio-visual. So, when we try to build AI, we try to mimic humans somehow, not necessarily from the biological perspective, and this method could [potentially be] generalized to other unexplored modalities.”

    As machine-learning models continue to play an increasingly important role in our lives, techniques like this one will become increasingly valuable.

    This research was supported by the MIT-IBM Watson AI Lab. More

  • in

    Study doubles the number of known repeating fast radio bursts

    Fast radio bursts (FRBs) are repeating flashes of radio waves that remain a source of mystery to astronomers. We do know a few things about them: FRBs originate from far outside the Milky Way, for instance, and they’re probably produced from the cinders of dying stars. While many astronomical radio waves have been observed to have burst only once, some waves have been seen bursting multiple times — a puzzle that has led astronomers to question if these radio waves are similar in nature and origin.

    Now, a large team of astronomers, including several from the MIT Kavli Institute for Astrophysics and Space Research and the MIT Department of Physics, have collaborated on work to decipher the origin and nature of FRBs. Their recent open-access publication in The Astrophysical Journal reports the discovery of 25 new repeating FRB sources, doubling the known number of these phenomena known to scientists to 50. In addition, the team found that many repeating FRBs are inactive, producing less than one burst per week of observing time.

    The Canadian-led Canadian Hydrogen Intensity Mapping Experiment (CHIME) has been instrumental in detecting thousands of FRBs as it scans the entire northern sky. So, astronomers with the CHIME/FRB Collaboration developed a new set of statistics tools to comb through massive sets of data to find every repeating source detected so far. This provided a valuable opportunity for astronomers to observe the same source with different telescopes and study the diversity of emission. “We can now accurately calculate the probability that two or more bursts coming from similar locations are not just a coincidence,” explains Ziggy Pleunis, a Dunlap Postdoctoral Fellow at the Dunlap Institute for Astronomy and Astrophysics and corresponding author of the new work.

    The team also concluded that all FRBs may eventually repeat. They found that radio waves seen to have burst only once differed from those that were seen to have burst multiple times both in terms of duration of bursts and range of frequencies emitted, which solidifies the idea that these radio bursts have indeed different origins.

    MIT postdoc Daniele Michilli and PhD student Kaitlyn Shin, both members of MIT Assistant Professor Kiyoshi Masui’s Synoptic Radio Lab, analyzed signals from CHIME’s 1,024 antennae. The work, Michilli says, “allowed us to unambiguously identify some of the sources as repeaters and to provide other observatories with accurate coordinates for follow-up studies.”

    “Now that we have a much larger sample of repeating FRBs, we’re better equipped to understand why we might observe some FRBs to be repeaters and others to be apparently non-repeating, and what the implications are for better understanding their origins,” says Shin.

    Adds Pleunis, “FRBs are likely produced by the leftovers from explosive stellar deaths. By studying repeating FRB sources in detail, we can study the environments that these explosions occur in and understand better the end stages of a star’s life. We can also learn more about the material that is being expelled before and during the star’s demise, which is then returned to the galaxies that the FRBs live in.”

    In addition to Michilli, Shin, and Masui, MIT contributors to the study include physics graduate students Calvin Leung and Haochen Wang. More