More stories

  • “AI for Impact” lives up to its name

    For entrepreneurial MIT students looking to put their skills to work for a greater good, the Media Arts and Sciences class MAS.664 (AI for Impact) has been a destination point. With the onset of the pandemic, that goal came into even sharper focus. Just weeks before the campus shut down in 2020, a team of students from the class launched a project that would make significant strides toward an open-source platform to identify coronavirus exposures without compromising personal privacy.

    Their work was at the heart of Safe Paths, one of the earliest contact tracing apps in the United States. The students joined with volunteers from other universities, medical centers, and companies to publish their code, alongside a well-received white paper describing the privacy-preserving, decentralized protocol, all while working with organizations wishing to launch the app within their communities. The app and related software eventually got spun out into the nonprofit PathCheck Foundation, which today engages with public health entities and is providing exposure notifications in Guam, Cyprus, Hawaii, Minnesota, Alabama, and Louisiana.

    The formation of Safe Paths demonstrates the special sense among MIT researchers that “we can launch something that can help people around the world,” notes Media Lab Associate Professor Ramesh Raskar, who teaches the class together with Media Lab Professor Alex “Sandy” Pentland and Media Lab Lecturer Joost Bonsen. “To have that kind of passion and ambition — but also the confidence that what you create here can actually be deployed globally — is kind of amazing.”

    AI for Impact, created by Pentland, began meeting two decades ago under the course name Development Ventures, and has nurtured multiple thriving businesses. Examples of class ventures that Pentland incubated or co-founded include Dimagi, Cogito, Ginger, Prosperia, and Sanergy.

    The aim-high challenge posed to each class is to come up with a business plan that touches a billion people, and it can’t all be in one country, Pentland explains. Not every class effort becomes a business, “but 20 percent to 30 percent of students start something, which is great for an entrepreneur class,” says Pentland.

    Opportunities for impact

    The numbers behind Dimagi, for instance, are striking. Its core product CommCare has helped front-line health workers provide care for more than 400 million people in more than 130 countries around the world. When it comes to maternal and child care, Dimagi’s platform has registered one in every 110 pregnancies worldwide. This past year, several governments around the world deployed CommCare applications for Covid-19 response — from Sierra Leone and Somalia to New York and Colorado.

    Spinoffs like Cogito, Prosperia, and Ginger have likewise grown into highly successful companies. Cogito helps a million people a day gain access to the health care they need; Prosperia helps manage social support payments to 80 million people in Latin America; and Ginger handles mental health services for over 1 million people.

    The passion behind these and other class ventures points to a central idea of the class, Pentland notes: MIT students are often looking for ways to build entrepreneurial businesses that enable positive social change.

    During the spring 2021 class, for example, a number of promising student projects included tools to help residents of poor communities transition to owning their homes rather than renting, and to take better control of their community health.

    “It’s clear that the people who are graduating from here want to do something significant with their lives … they want to have an impact on their world,” Pentland says. “This class enables them to meet other people who are interested in doing the same thing, and offers them some help in starting a company to do it.”

    Many of the students who join the class come in with a broad set of interests. Guest lectures, case studies of other social entrepreneurship projects, and an introduction to a broad ecosystem of expertise and funding then help students refine their general ideas into specific, viable projects.

    A path toward confronting a pandemic 

    Raskar began co-teaching the class in 2019, and brought a “Big AI” focus to the Development Ventures class, inspired by an AI for Impact team he had set up at his former employer, Facebook. “What I realized is that companies like Google or Facebook or Amazon actually have enough data about all of us that they can solve major problems in our society — climate, transportation, health, and so on,” he says. “This is something we should think about more seriously: how to use AI and data for positive social impact, while protecting privacy.”

    Early into the spring 2020 class, as students were beginning to consider their own projects, Raskar approached the class about the emerging coronavirus outbreak. Students like Kristen Vilcans recognized the urgency, and the opportunity. She and 10 other students joined forces to work on a project that would focus on Covid-19.

    “Students felt empowered to do something to help tackle the spread of this alarming new virus,” Raskar recalls. “They immediately began to develop data- and AI-based solutions to one of the most critical pieces of addressing a pandemic: halting the chain of infections. They created and launched one of the first digital contact tracing and exposure notification solutions in the U.S., developing an early alert system that engaged the public and protected privacy.” 

    Raskar looks back on the moment when a core group of students coalesced into a team. “It was very rare for a significant part of the class to just come together saying, ‘let’s do this, right away.’ It became as much a movement as a venture.”

    Group discussions soon began to center around an open-source, privacy-first digital set of tools for Covid-19 contact tracing. For the next two weeks, right up to the campus shutdown in March 2020, the team took over two adjacent conference rooms in the Media Lab, and started a Slack messaging channel devoted to the project. As the team members reached out to an ever-wider circle of friends, colleagues, and mentors, the number of participants grew to nearly 1,600 people, coming together virtually from all corners of the world.

    Kaushal Jain, a Harvard Business School student who had cross-registered for the spring 2020 class to get to know the MIT ecosystem, was also an early participant in Safe Paths. He wrote up an initial plan for the venture and began working with external organizations to figure out how to structure it into a nonprofit company. Jain eventually became the project’s lead for funding and partnerships.

    Vilcans, a graduate student in system design and management, served as Safe Paths’ communications lead through July 2020, while still working a part-time job at Draper Laboratory and taking classes.

    “There are these moments when you want to dive in, you want to contribute and you want to work nonstop,” she says, adding that the experience was also a wake-up call on how to manage burnout, and how to balance what you need as a person while contributing to a high-impact team. “That’s important to understand as a leader for the future.”

    MIT recognized Vilcans’s contributions later that year with the 2020 SDM Student Award for Leadership, Innovation, and Systems Thinking.

    Jain, too, says the class gave him more than he could have expected.

    “I made strong friendships with like-minded people from very different backgrounds,” he says. “One key thing that I learned was to be flexible about the kind of work you want to do. Be open and see if there’s an opportunity, either through crisis or through something that you believe could really change a lot of things in the world. And then just go for it.”

  • Exact symbolic artificial intelligence for faster, better assessment of AI fairness

    The justice system, banks, and private companies use algorithms to make decisions that have profound impacts on people’s lives. Unfortunately, those algorithms are sometimes biased — disproportionately impacting people of color as well as individuals in lower income classes when they apply for loans or jobs, or even when courts decide what bail should be set while a person awaits trial.

    MIT researchers have developed a new artificial intelligence programming language that can assess the fairness of algorithms more exactly, and more quickly, than available alternatives.

    Their Sum-Product Probabilistic Language (SPPL) is a probabilistic programming system. Probabilistic programming is an emerging field at the intersection of programming languages and artificial intelligence that aims to make AI systems much easier to develop, with early successes in computer vision, common-sense data cleaning, and automated data modeling. Probabilistic programming languages make it much easier for programmers to define probabilistic models and carry out probabilistic inference — that is, work backward to infer probable explanations for observed data.

    “There are previous systems that can solve various fairness questions. Our system is not the first; but because our system is specialized and optimized for a certain class of models, it can deliver solutions thousands of times faster,” says Feras Saad, a PhD student in electrical engineering and computer science (EECS) and first author on a recent paper describing the work. Saad adds that the speedups are substantial: the system can be up to 3,000 times faster than previous approaches.

    SPPL gives fast, exact solutions to probabilistic inference questions such as “How likely is the model to recommend a loan to someone over age 40?” or “Generate 1,000 synthetic loan applicants, all under age 30, whose loans will be approved.” These inference results are based on SPPL programs that encode probabilistic models of what kinds of applicants are likely, a priori, and also how to classify them. Fairness questions that SPPL can answer include “Is there a difference between the probability of recommending a loan to an immigrant and nonimmigrant applicant with the same socioeconomic status?” or “What’s the probability of a hire, given that the candidate is qualified for the job and from an underrepresented group?”
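
    The flavor of such a query can be illustrated, independently of SPPL’s actual syntax, by brute-force enumeration over a toy discrete model. All probabilities below are invented for illustration; SPPL’s contribution is answering queries like this exactly without exhaustive enumeration, which becomes intractable as models grow.

```python
from itertools import product

# Toy discrete model of loan applicants (illustrative numbers only):
# age bracket and income are independent a priori; approval probability
# depends on both.
P_age = {"under_40": 0.6, "over_40": 0.4}
P_income = {"low": 0.5, "high": 0.5}
P_approve = {
    ("under_40", "low"): 0.2, ("under_40", "high"): 0.7,
    ("over_40", "low"): 0.3, ("over_40", "high"): 0.8,
}

def prob(event):
    """Exact probability of `event` by summing the full joint distribution."""
    total = 0.0
    for age, inc in product(P_age, P_income):
        p_joint = P_age[age] * P_income[inc]
        for approved in (True, False):
            p_ok = P_approve[(age, inc)]
            p = p_joint * (p_ok if approved else 1 - p_ok)
            if event(age, inc, approved):
                total += p
    return total

# "How likely is the model to recommend a loan to someone over age 40?"
p_joint = prob(lambda a, i, ok: a == "over_40" and ok)
p_cond = p_joint / prob(lambda a, i, ok: a == "over_40")
print(round(p_cond, 3))  # exact answer, no sampling error
```

    Because every term of the joint distribution is summed, the answer carries no approximation error, which is the property the fairness queries above rely on.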

    SPPL is different from most probabilistic programming languages, as SPPL only allows users to write probabilistic programs for which it can automatically deliver exact probabilistic inference results. SPPL also makes it possible for users to check how fast inference will be, and therefore avoid writing slow programs. In contrast, other probabilistic programming languages such as Gen and Pyro allow users to write down probabilistic programs where the only known ways to do inference are approximate — that is, the results include errors whose nature and magnitude can be hard to characterize.

    Error from approximate probabilistic inference is tolerable in many AI applications. But it is undesirable to have inference errors corrupting results in socially impactful applications of AI, such as automated decision-making, and especially in fairness analysis.

    Jean-Baptiste Tristan, associate professor at Boston College and former research scientist at Oracle Labs, who was not involved in the new research, says, “I’ve worked on fairness analysis in academia and in real-world, large-scale industry settings. SPPL offers improved flexibility and trustworthiness over other PPLs on this challenging and important class of problems due to the expressiveness of the language, its precise and simple semantics, and the speed and soundness of the exact symbolic inference engine.”

    SPPL avoids errors by restricting to a carefully designed class of models that still includes a broad class of AI algorithms, including the decision tree classifiers that are widely used for algorithmic decision-making. SPPL works by compiling probabilistic programs into a specialized data structure called a “sum-product expression.” SPPL further builds on the emerging theme of using probabilistic circuits as a representation that enables efficient probabilistic inference. This approach extends prior work on sum-product networks to models and queries expressed via a probabilistic programming language. However, Saad notes that this approach comes with limitations: “SPPL is substantially faster for analyzing the fairness of a decision tree, for example, but it can’t analyze models like neural networks. Other systems can analyze both neural networks and decision trees, but they tend to be slower and give inexact answers.”
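
    As a rough sketch of the idea (this is not SPPL’s internal representation), a sum-product expression can be modeled as nested sum nodes (weighted mixtures) and product nodes (independent factors), where marginalizing a variable amounts to evaluating its leaf as 1 — so any marginal or conditional query reduces to cheap circuit evaluations:

```python
# Minimal sum-product expression over two binary variables X and Y.
# Evidence maps a variable name to a bool; omitting a variable
# marginalizes it out (its leaf evaluates to 1).

def leaf(var, p_true):
    def f(evidence):
        if var not in evidence:
            return 1.0  # variable summed out
        return p_true if evidence[var] else 1 - p_true
    return f

def product_node(*children):
    def f(evidence):
        result = 1.0
        for c in children:
            result *= c(evidence)  # children are independent
        return result
    return f

def sum_node(weighted_children):
    def f(evidence):
        return sum(w * c(evidence) for w, c in weighted_children)
    return f

# 0.3 * Bern(X; 0.9) Bern(Y; 0.2)  +  0.7 * Bern(X; 0.1) Bern(Y; 0.6)
model = sum_node([
    (0.3, product_node(leaf("X", 0.9), leaf("Y", 0.2))),
    (0.7, product_node(leaf("X", 0.1), leaf("Y", 0.6))),
])

p_x = model({"X": True})             # marginal P(X), Y summed out
p_xy = model({"X": True, "Y": True})
print(p_x, p_xy / p_x)               # P(X) and exact P(Y | X)
```

    Each query costs one pass over the circuit, which is why restricting programs to this representation buys both exactness and predictable speed.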

    “SPPL shows that exact probabilistic inference is practical, not just theoretically possible, for a broad class of probabilistic programs,” says Vikash Mansinghka, an MIT principal research scientist and senior author on the paper. “In my lab, we’ve seen symbolic inference driving speed and accuracy improvements in other inference tasks that we previously approached via approximate Monte Carlo and deep learning algorithms. We’ve also been applying SPPL to probabilistic programs learned from real-world databases, to quantify the probability of rare events, generate synthetic proxy data given constraints, and automatically screen data for probable anomalies.”

    The new SPPL probabilistic programming language was presented in June at the ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI), in a paper that Saad co-authored with MIT EECS Professor Martin Rinard and Mansinghka. SPPL is implemented in Python and is available open source.

  • Lincoln Laboratory convenes top network scientists for Graph Exploitation Symposium

    As the Covid-19 pandemic has shown, we live in a richly connected world, facilitating not only the efficient spread of a virus but also of information and influence. What can we learn by analyzing these connections? This is a core question of network science, a field of research that models interactions across physical, biological, social, and information systems to solve problems.

    The 2021 Graph Exploitation Symposium (GraphEx), hosted by MIT Lincoln Laboratory, brought together top network science researchers to share the latest advances and applications in the field.

    “We explore and identify how exploitation of graph data can offer key technology enablers to solve the most pressing problems our nation faces today,” says Edward Kao, a symposium organizer and technical staff in Lincoln Laboratory’s AI Software Architectures and Algorithms Group.

    The themes of the virtual event revolved around some of the year’s most relevant issues, such as analyzing disinformation on social media, modeling the pandemic’s spread, and using graph-based machine learning models to speed drug design.

    “The special sessions on influence operations and Covid-19 at GraphEx reflect the relevance of network and graph-based analysis for understanding the phenomenology of these complicated and impactful aspects of modern-day life, and also may suggest paths forward as we learn more and more about graph manipulation,” says William Streilein, who co-chaired the event with Rajmonda Caceres, both of Lincoln Laboratory.

    Social networks

    Several presentations at the symposium focused on the role of network science in analyzing influence operations (IO), or organized attempts by state and/or non-state actors to spread disinformation narratives.  

    Lincoln Laboratory researchers have been developing tools to classify and quantify the influence of social media accounts that are likely IO accounts, such as those willfully spreading false Covid-19 treatments to vulnerable populations.

    “A cluster of IO accounts acts as an echo chamber to amplify the narrative. The vulnerable population is then engaging in these narratives,” says Erika Mackin, a researcher developing the tool, called RIO or Reconnaissance of Influence Operations.

    To classify IO accounts, Mackin and her team trained an algorithm to detect probable IO accounts in Twitter networks based on a specific hashtag or narrative. One example they studied was #MacronLeaks, a disinformation campaign targeting Emmanuel Macron during the 2017 French presidential election. The algorithm is trained to label accounts within this network as being IO on the basis of several factors, such as the number of interactions with foreign news accounts, the number of links tweeted, or the number of languages used. Their model then uses a statistical approach to score an account’s level of influence in spreading the narrative within that network.
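
    A hypothetical sketch of the feature-based scoring step: the real RIO classifier is trained on data, and while the feature names below mirror the factors described above, the weights and threshold here are invented purely for illustration.

```python
import math

# Invented weights for illustration only -- a trained classifier would
# learn these from labeled accounts.
WEIGHTS = {
    "foreign_news_interactions": 0.8,
    "links_tweeted": 0.05,
    "languages_used": 0.6,
}
BIAS = -3.0

def io_score(account):
    """Logistic score in (0, 1): higher means more likely an IO account."""
    z = BIAS + sum(WEIGHTS[f] * account.get(f, 0) for f in WEIGHTS)
    return 1 / (1 + math.exp(-z))

suspicious = {"foreign_news_interactions": 5, "links_tweeted": 40, "languages_used": 3}
typical = {"foreign_news_interactions": 0, "links_tweeted": 3, "languages_used": 1}
print(io_score(suspicious) > io_score(typical))  # True
```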

    The team has found that their classifier outperforms existing detectors of IO accounts, because it can identify both bot accounts and human-operated ones. They’ve also discovered that IO accounts that pushed the 2017 French election disinformation narrative largely overlap with accounts influentially spreading Covid-19 pandemic disinformation today. “This suggests that these accounts will continue to transition to disinformation narratives,” Mackin says.

    Pandemic modeling

    Throughout the Covid-19 pandemic, leaders have been looking to epidemiological models, which predict how disease will spread, to make sound decisions. Alessandro Vespignani, director of the Network Science Institute at Northeastern University, has been leading Covid-19 modeling efforts in the United States, and shared a keynote on this work at the symposium.

    Besides taking into account the biological facts of the disease, such as its incubation period, Vespignani’s model is especially powerful in its inclusion of community behavior. To run realistic simulations of disease spread, he develops “synthetic populations” that are built by using publicly available, highly detailed datasets about U.S. households. “We create a population that is not real, but is statistically real, and generate a map of the interactions of those individuals,” he says. This information feeds back into the model to predict the spread of the disease. 
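
    A heavily simplified sketch of this style of network-based simulation: a toy SIR (susceptible–infectious–recovered) process on a random contact graph, with made-up parameters and none of the demographic realism of the synthetic populations described above.

```python
import random

random.seed(0)

# Toy synthetic population: 500 people, each with a list of 8 contacts
# (directed lists, for simplicity).
N = 500
contacts = {i: random.sample([j for j in range(N) if j != i], 8)
            for i in range(N)}

BETA = 0.05   # per-contact, per-day infection probability (illustrative)
GAMMA = 0.1   # per-day recovery probability (illustrative)

state = {i: "S" for i in range(N)}          # everyone starts susceptible
for i in random.sample(range(N), 5):        # seed 5 infections
    state[i] = "I"

for day in range(100):
    newly_infected, newly_recovered = [], []
    for i, s in state.items():
        if s == "I":
            for j in contacts[i]:
                if state[j] == "S" and random.random() < BETA:
                    newly_infected.append(j)
            if random.random() < GAMMA:
                newly_recovered.append(i)
    for j in newly_infected:
        state[j] = "I"
    for i in newly_recovered:
        state[i] = "R"

print({s: list(state.values()).count(s) for s in "SIR"})
```

    Changing the graph structure — say, denser contacts within households than between them — changes the epidemic curve, which is the basic reason detailed interaction maps make these models more realistic.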

    Today, Vespignani is considering how to integrate genomic analysis of the virus into this kind of population modeling in order to understand how variants are spreading. “It’s still a work in progress that is extremely interesting,” he says, adding that this approach has been useful in modeling the dispersal of the Delta variant of SARS-CoV-2. 

    As researchers model the virus’ spread, Lucas Laird at Lincoln Laboratory is considering how network science can be used to design effective control strategies. He and his team are developing a model for customizing strategies for different geographic regions. The effort was spurred by the differences in Covid-19 spread across U.S. communities, and what the researchers found to be a gap in intervention modeling to address those differences.

    As examples, they applied their planning algorithm to three counties in Florida, Massachusetts, and California. Taking into account the characteristics of a specific geographic center, such as the number of susceptible individuals and number of infections there, their planner institutes different strategies in those communities throughout the outbreak duration.

    “Our approach eradicates disease in 100 days, but it also is able to do it with much more targeted interventions than any of the global interventions. In other words, you don’t have to shut down a full country,” says Laird. He adds that their planner offers a “sandbox environment” for exploring intervention strategies in the future.

    Machine learning with graphs

    Graph-based machine learning is receiving increasing attention for its potential to “learn” the complex relationships within graph-structured data, and thus extract new insights or predictions about these relationships. This interest has given rise to a new class of algorithms called graph neural networks. Today, graph neural networks are being applied in areas such as drug discovery and material design, with promising results.

    “We can now apply deep learning much more broadly, not only to medical images and biological sequences. This creates new opportunities in data-rich biology and medicine,” says Marinka Zitnik, an assistant professor at Harvard University who presented her research at GraphEx.

    Zitnik’s research focuses on the rich networks of interactions between proteins, drugs, disease, and patients, at the scale of billions of interactions. One application of this research is discovering drugs to treat diseases with no or few approved drug treatments, such as for Covid-19. In April, Zitnik’s team published a paper on their research that used graph neural networks to rank 6,340 drugs for their expected efficacy against SARS-CoV-2, identifying four that could be repurposed to treat Covid-19.

    At Lincoln Laboratory, researchers are similarly applying graph neural networks to the challenge of designing advanced materials, such as those that can withstand extreme radiation or capture carbon dioxide. Like the process of designing drugs, the trial-and-error approach to materials design is time-consuming and costly. The laboratory’s team is developing graph neural networks that can learn relationships between a material’s crystalline structure and its properties. This network can then be used to predict a variety of properties from any new crystal structure, greatly speeding up the process of screening materials with desired properties for specific applications.
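
    A rough illustration of what one “message-passing” step of a graph neural network looks like, in plain Python: each node averages its neighbors’ feature vectors, mixes them with its own, and a mean-pool readout yields a graph-level vector from which a property can be predicted. The fixed 0.5/0.5 mixing weights stand in for learned parameters; real systems like those described above stack many trained layers.

```python
def message_pass(features, edges):
    """One message-passing step over an undirected graph.

    features: list of node feature vectors; edges: list of (a, b) pairs.
    """
    n, d = len(features), len(features[0])
    neighbors = {i: [] for i in range(n)}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    updated = []
    for i in range(n):
        # Aggregate: average of neighbor features.
        agg = [0.0] * d
        for j in neighbors[i]:
            for k in range(d):
                agg[k] += features[j][k] / max(len(neighbors[i]), 1)
        # Combine self features with the message (fixed weights standing
        # in for learned ones), then apply a ReLU nonlinearity.
        updated.append([max(0.0, 0.5 * features[i][k] + 0.5 * agg[k])
                        for k in range(d)])
    return updated

def readout(features):
    """Graph-level representation: mean over node features."""
    d = len(features[0])
    return [sum(f[k] for f in features) / len(features) for k in range(d)]

# Toy 4-node structure (a square) with 2-dimensional node features.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(readout(message_pass(feats, edges)))  # -> [0.5, 0.5]
```

    In a trained network, the graph-level vector would feed a final layer that predicts the quantity of interest, such as a material property or a drug’s expected efficacy.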

    “Graph representation learning has emerged as a rich and thriving research area for incorporating inductive bias and structured priors during the machine learning process, with broad applications such as drug design, accelerated scientific discovery, and personalized recommendation systems,” Caceres says. 

    A vibrant community

    Lincoln Laboratory has hosted the GraphEx Symposium annually since 2010, with the exception of last year’s cancellation due to Covid-19. “One key takeaway is that despite the postponement from last year and the need to be virtual, the GraphEx community is as vibrant and active as it’s ever been,” Streilein says. “Network-based analysis continues to expand its reach and is applied to ever-more important areas of science, society, and defense with increasing impact.”

    In addition to those from Lincoln Laboratory, technical committee members and co-chairs of the GraphEx Symposium included researchers from Harvard University, Arizona State University, Stanford University, Smith College, Duke University, the U.S. Department of Defense, and Sandia National Laboratories.