More stories

  • in

    Making data visualizations more accessible

    In the early days of the Covid-19 pandemic, the Centers for Disease Control and Prevention produced a simple chart to illustrate how measures like mask wearing and social distancing could “flatten the curve” and reduce the peak of infections.

    The chart was amplified by news sites and shared on social media platforms, but it often lacked a corresponding text description to make it accessible for blind individuals who use a screen reader to navigate the web, shutting out many of the 253 million people worldwide who have visual disabilities.

    This alternative text is often missing from online charts, and even when it is included, it is frequently uninformative or even incorrect, according to qualitative data gathered by scientists at MIT.

    These researchers conducted a study with blind and sighted readers to determine which text is useful to include in a chart description, which text is not, and why. Ultimately, they found that captions for blind readers should focus on the overall trends and statistics in the chart, not its design elements or higher-level insights.

    They also created a conceptual model that can be used to evaluate a chart description, whether the text was generated automatically by software or manually by a human author. Their work could help journalists, academics, and communicators create descriptions that are more effective for blind individuals and guide researchers as they develop better tools to automatically generate captions.

    “Ninety-nine-point-nine percent of images on Twitter lack any kind of description — and that is not hyperbole, that is the actual statistic,” says Alan Lundgard, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of the paper. “Having people manually author those descriptions seems to be difficult for a variety of reasons. Perhaps semiautonomous tools could help with that. But it is crucial to do this preliminary participatory design work to figure out what is the target for these tools, so we are not generating content that is either not useful to its intended audience or, in the worst case, erroneous.”

    Lundgard wrote the paper with senior author Arvind Satyanarayan, an assistant professor of computer science who leads the Visualization Group in CSAIL. The research will be presented at the Institute of Electrical and Electronics Engineers Visualization Conference in October.

    Evaluating visualizations

    To develop the conceptual model, the researchers planned to begin by studying graphs featured by popular online publications such as FiveThirtyEight and NYTimes.com, but they ran into a problem — those charts mostly lacked any textual descriptions. So instead, they collected descriptions for these charts from graduate students in an MIT data visualization class and through an online survey, then grouped the captions into four categories.

    Level 1 descriptions focus on the elements of the chart, such as its title, legend, and colors. Level 2 descriptions describe statistical content, like the minimum, maximum, or correlations. Level 3 descriptions cover perceptual interpretations of the data, like complex trends or clusters. Level 4 descriptions include subjective interpretations that go beyond the data and draw on the author’s knowledge.
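
    As a rough illustration of how the four levels could be used to assemble or audit a description, the sketch below tags each sentence with a level and keeps only the levels a writer chooses. The class and function names, the sample chart, and its sentences are all made up for this example, and the level labels are assigned by hand; nothing here is the researchers' own tooling.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """The four content levels described in the article."""
    ELEMENTS = 1      # title, legend, colors
    STATISTICS = 2    # minimum, maximum, correlations
    PERCEPTUAL = 3    # complex trends, clusters
    SUBJECTIVE = 4    # interpretation drawing on the author's knowledge


@dataclass
class Sentence:
    text: str
    level: Level


def select_levels(sentences, wanted):
    """Keep only the sentences whose level is in `wanted`, preserving order."""
    return [s.text for s in sentences if s.level in wanted]


# Hypothetical description of a made-up line chart, labeled by hand.
description = [
    Sentence("A line chart titled 'Daily cases' with a blue line.", Level.ELEMENTS),
    Sentence("Cases peak at 120 on day 30.", Level.STATISTICS),
    Sentence("Cases rise steadily, then fall off sharply.", Level.PERCEPTUAL),
    Sentence("The drop probably reflects the new policy.", Level.SUBJECTIVE),
]

# Keep the statistical and perceptual content only.
facts_only = select_levels(description, {Level.STATISTICS, Level.PERCEPTUAL})
print(facts_only)
```

    Choosing levels 2 and 3 here is only one possible setting; the same function could keep any combination of levels.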

    In a study with blind and sighted readers, the researchers presented visualizations with descriptions at different levels and asked participants to rate how useful they were. While both groups agreed that level 1 content on its own was not very helpful, sighted readers gave level 4 content the highest marks while blind readers ranked that content among the least useful.

    Survey results revealed that a majority of blind readers were emphatic that descriptions should not contain an author’s editorialization, but rather stick to straight facts about the data. On the other hand, most sighted readers preferred a description that told a story about the data.

    “For me, a surprising finding about the lack of utility for the highest-level content is that it ties very closely to feelings about agency and control as a disabled person. In our research, blind readers specifically didn’t want the descriptions to tell them what to think about the data. They want the data to be accessible in a way that allows them to interpret it for themselves, and they want to have the agency to do that interpretation,” Lundgard says.

    A more inclusive future

    This work could have implications as data scientists continue to develop and refine machine learning methods for autogenerating captions and alternative text.

    “We are not able to do it yet, but it is not inconceivable to imagine that in the future we would be able to automate the creation of some of this higher-level content and build models that target level 2 or level 3 in our framework. And now we know what the research questions are. If we want to produce these automated captions, what should those captions say? We are able to be a bit more directed in our future research because we have these four levels,” Satyanarayan says.

    In the future, the four-level framework could also help researchers develop machine learning models that can automatically suggest effective visualizations as part of the data analysis process, or models that can extract the most useful information from a chart.

    This research could also inform future work in Satyanarayan’s group that seeks to make interactive visualizations more accessible for blind readers who use a screen reader to access and interpret the information. 

    “The question of how to ensure that charts and graphs are accessible to screen reader users is both a socially important equity issue and a challenge that can advance the state-of-the-art in AI,” says Meredith Ringel Morris, director and principal scientist of the People + AI Research team at Google Research, who was not involved with this study. “By introducing a framework for conceptualizing natural language descriptions of information graphics that is grounded in end-user needs, this work helps ensure that future AI researchers will focus their efforts on problems aligned with end-users’ values.”

    Morris adds: “Rich natural-language descriptions of data graphics will not only expand access to critical information for people who are blind, but will also benefit a much wider audience as eyes-free interactions via smart speakers, chatbots, and other AI-powered agents become increasingly commonplace.”

    This research was supported by the National Science Foundation.

  • in

    “AI for Impact” lives up to its name

    For entrepreneurial MIT students looking to put their skills to work for a greater good, the Media Arts and Sciences class MAS.664 (AI for Impact) has been a destination point. With the onset of the pandemic, that goal came into even sharper focus. Just weeks before the campus shut down in 2020, a team of students from the class launched a project that would make significant strides toward an open-source platform to identify coronavirus exposures without compromising personal privacy.

    Their work was at the heart of Safe Paths, one of the earliest contact tracing apps in the United States. The students joined with volunteers from other universities, medical centers, and companies to publish their code, alongside a well-received white paper describing the privacy-preserving, decentralized protocol, all while working with organizations wishing to launch the app within their communities. The app and related software eventually got spun out into the nonprofit PathCheck Foundation, which today engages with public health entities and is providing exposure notifications in Guam, Cyprus, Hawaii, Minnesota, Alabama, and Louisiana.

    The formation of Safe Paths demonstrates the special sense among MIT researchers that “we can launch something that can help people around the world,” notes Media Lab Associate Professor Ramesh Raskar, who teaches the class together with Media Lab Professor Alex “Sandy” Pentland and Media Lab Lecturer Joost Bonsen. “To have that kind of passion and ambition — but also the confidence that what you create here can actually be deployed globally — is kind of amazing.”

    AI for Impact, created by Pentland, began meeting two decades ago under the course name Development Ventures, and has nurtured multiple thriving businesses. Examples of class ventures that Pentland incubated or co-founded include Dimagi, Cogito, Ginger, Prosperia, and Sanergy.

    The aim-high challenge posed to each class is to come up with a business plan that touches a billion people, and it can’t all be in one country, Pentland explains. Not every class effort becomes a business, “but 20 percent to 30 percent of students start something, which is great for an entrepreneur class,” says Pentland.

    Opportunities for Impact

    The numbers behind Dimagi, for instance, are striking. Its core product CommCare has helped front-line health workers provide care for more than 400 million people in more than 130 countries around the world. When it comes to maternal and child care, Dimagi’s platform has registered one in every 110 pregnancies worldwide. This past year, several governments around the world deployed CommCare applications for Covid-19 response — from Sierra Leone and Somalia to New York and Colorado.

    Spinoffs like Cogito, Prosperia, and Ginger have likewise grown into highly successful companies. Cogito helps a million people a day gain access to the health care they need; Prosperia helps manage social support payments to 80 million people in Latin America; and Ginger handles mental health services for over 1 million people.

    The passion behind these and other class ventures points to a central idea of the class, Pentland notes: MIT students are often looking for ways to build entrepreneurial businesses that enable positive social change.

    In the spring 2021 class, for example, promising student projects included tools to help residents of poor communities transition from renting to owning their homes, and to take better control of their community health.

    “It’s clear that the people who are graduating from here want to do something significant with their lives … they want to have an impact on their world,” Pentland says. “This class enables them to meet other people who are interested in doing the same thing, and offers them some help in starting a company to do it.”

    Many of the students who join the class come in with a broad set of interests. Guest lectures, case studies of other social entrepreneurship projects, and an introduction to a broad ecosystem of expertise and funding then help students refine their general ideas into specific and viable projects.

    A path toward confronting a pandemic 

    Raskar began co-teaching the class in 2019, and brought a “Big AI” focus to the Development Ventures class, inspired by an AI for Impact team he had set up at his former employer, Facebook. “What I realized is that companies like Google or Facebook or Amazon actually have enough data about all of us that they can solve major problems in our society — climate, transportation, health, and so on,” he says. “This is something we should think about more seriously: how to use AI and data for positive social impact, while protecting privacy.”

    Early into the spring 2020 class, as students were beginning to consider their own projects, Raskar approached the class about the emerging coronavirus outbreak. Students like Kristen Vilcans recognized the urgency, and the opportunity. She and 10 other students joined forces to work on a project that would focus on Covid-19.

    “Students felt empowered to do something to help tackle the spread of this alarming new virus,” Raskar recalls. “They immediately began to develop data- and AI-based solutions to one of the most critical pieces of addressing a pandemic: halting the chain of infections. They created and launched one of the first digital contact tracing and exposure notification solutions in the U.S., developing an early alert system that engaged the public and protected privacy.” 

    Raskar looks back on the moment when a core group of students coalesced into a team. “It was very rare for a significant part of the class to just come together saying, ‘let’s do this, right away.’ It became as much a movement as a venture.”

    Group discussions soon began to center around an open-source, privacy-first digital set of tools for Covid-19 contact tracing. For the next two weeks, right up to the campus shutdown in March 2020, the team took over two adjacent conference rooms in the Media Lab, and started a Slack messaging channel devoted to the project. As the team members reached out to an ever-wider circle of friends, colleagues, and mentors, the number of participants grew to nearly 1,600 people, coming together virtually from all corners of the world.

    Kaushal Jain, a Harvard Business School student who had cross-registered for the spring 2020 class to get to know the MIT ecosystem, was also an early participant in Safe Paths. He wrote up an initial plan for the venture and began working with external organizations to figure out how to structure it into a nonprofit company. Jain eventually became the project’s lead for funding and partnerships.

    Vilcans, a graduate student in system design and management, served as Safe Paths’ communications lead through July 2020, while still working a part-time job at Draper Laboratory and taking classes.

    “There are these moments when you want to dive in, you want to contribute and you want to work nonstop,” she says, adding that the experience was also a wake-up call on how to manage burnout, and how to balance what you need as a person while contributing to a high-impact team. “That’s important to understand as a leader for the future.”

    MIT recognized Vilcans’ contributions later that year with the 2020 SDM Student Award for Leadership, Innovation, and Systems Thinking.

    Jain, too, says the class gave him more than he could have expected.

    “I made strong friendships with like-minded people from very different backgrounds,” he says. “One key thing that I learned was to be flexible about the kind of work you want to do. Be open and see if there’s an opportunity, either through crisis or through something that you believe could really change a lot of things in the world. And then just go for it.”

  • in

    Lincoln Laboratory convenes top network scientists for Graph Exploitation Symposium

    As the Covid-19 pandemic has shown, we live in a richly connected world, facilitating not only the efficient spread of a virus but also of information and influence. What can we learn by analyzing these connections? This is a core question of network science, a field of research that models interactions across physical, biological, social, and information systems to solve problems.

    The 2021 Graph Exploitation Symposium (GraphEx), hosted by MIT Lincoln Laboratory, brought together top network science researchers to share the latest advances and applications in the field.

    “We explore and identify how exploitation of graph data can offer key technology enablers to solve the most pressing problems our nation faces today,” says Edward Kao, a symposium organizer and technical staff in Lincoln Laboratory’s AI Software Architectures and Algorithms Group.

    The themes of the virtual event revolved around some of the year’s most relevant issues, such as analyzing disinformation on social media, modeling the pandemic’s spread, and using graph-based machine learning models to speed drug design.

    “The special sessions on influence operations and Covid-19 at GraphEx reflect the relevance of network and graph-based analysis for understanding the phenomenology of these complicated and impactful aspects of modern-day life, and also may suggest paths forward as we learn more and more about graph manipulation,” says William Streilein, who co-chaired the event with Rajmonda Caceres, both of Lincoln Laboratory.

    Social networks

    Several presentations at the symposium focused on the role of network science in analyzing influence operations (IO), or organized attempts by state and/or non-state actors to spread disinformation narratives.  

    Lincoln Laboratory researchers have been developing tools to classify and quantify the influence of social media accounts that are likely IO accounts, such as those willfully spreading false Covid-19 treatments to vulnerable populations.

    “A cluster of IO accounts acts as an echo chamber to amplify the narrative. The vulnerable population is then engaging in these narratives,” says Erika Mackin, a researcher developing the tool, called RIO or Reconnaissance of Influence Operations.

    To classify IO accounts, Mackin and her team trained an algorithm to detect probable IO accounts in Twitter networks based on a specific hashtag or narrative. One example they studied was #MacronLeaks, a disinformation campaign targeting Emmanuel Macron during the 2017 French presidential election. The algorithm is trained to label accounts within this network as being IO on the basis of several factors, such as the number of interactions with foreign news accounts, the number of links tweeted, or number of languages used. Their model then uses a statistical approach to score an account’s level of influence in spreading the narrative within that network.
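
    To make the idea of labeling accounts from a handful of factors concrete, here is a minimal sketch that trains a small classifier on the three factors named above. Logistic regression is chosen purely for illustration; the article does not say which algorithm RIO uses. The training rows, function names, and settings are invented for this example and do not come from the study.

```python
import math

# Toy, made-up training data (NOT from the RIO study).
# Each row: (foreign-news interactions, links tweeted, languages used), label.
# Label 1 = probable IO account, 0 = ordinary account.
TRAIN = [
    ((40, 200, 4), 1),
    ((55, 150, 3), 1),
    ((30, 300, 5), 1),
    ((1, 10, 1), 0),
    ((0, 25, 1), 0),
    ((3, 5, 2), 0),
]

# Scale each feature by its largest training value so gradient descent is stable.
MAXIMA = [max(row[0][j] for row in TRAIN) for j in range(3)]


def scale(features):
    return [value / top for value, top in zip(features, MAXIMA)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, steps=2000, rate=0.5):
    """Fit logistic-regression weights with plain batch gradient descent."""
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0, 0.0], 0.0
        for features, label in rows:
            x = scale(features)
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - label
            grad_w = [g + error * v for g, v in zip(grad_w, x)]
            grad_b += error
        weights = [w - rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / len(rows)
    return weights, bias


def io_probability(features, weights, bias):
    """Estimated probability that an account is an IO account."""
    x = scale(features)
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


weights, bias = train(TRAIN)
print(io_probability((45, 180, 4), weights, bias))  # account resembling the IO rows
print(io_probability((2, 12, 1), weights, bias))    # account resembling the ordinary rows
```

    The separate step of scoring how influential an account is within the network is not sketched here, since the article gives no detail on that statistical approach.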

    The team has found that their classifier outperforms existing detectors of IO accounts, because it can identify both bot accounts and human-operated ones. They’ve also discovered that IO accounts that pushed the 2017 French election disinformation narrative largely overlap with accounts influentially spreading Covid-19 pandemic disinformation today. “This suggests that these accounts will continue to transition to disinformation narratives,” Mackin says.

    Pandemic modeling

    Throughout the Covid-19 pandemic, leaders have been looking to epidemiological models, which predict how disease will spread, to make sound decisions. Alessandro Vespignani, director of the Network Science Institute at Northeastern University, has been leading Covid-19 modeling efforts in the United States, and shared a keynote on this work at the symposium.

    Besides taking into account the biological facts of the disease, such as its incubation period, Vespignani’s model is especially powerful in its inclusion of community behavior. To run realistic simulations of disease spread, he develops “synthetic populations” that are built by using publicly available, highly detailed datasets about U.S. households. “We create a population that is not real, but is statistically real, and generate a map of the interactions of those individuals,” he says. This information feeds back into the model to predict the spread of the disease. 
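
    The general idea of a synthetic population feeding a spread simulation can be sketched in a few lines. The example below is a generic toy, not Vespignani’s model: household sizes are drawn at random rather than from real datasets, the contact rules and probabilities are invented, and the disease follows a simple susceptible-infected-recovered scheme.

```python
import random


def build_population(n_households, rng):
    """Return a contact map {person: set(contacts)} for a made-up population."""
    contacts = {}
    person = 0
    for _ in range(n_households):
        size = rng.randint(1, 5)                  # hypothetical household sizes
        members = list(range(person, person + size))
        person += size
        for m in members:                         # household members all interact
            contacts[m] = set(members) - {m}
    people = list(contacts)
    for p in people:                              # add a few community contacts
        for q in rng.sample(people, 2):
            if q != p:
                contacts[p].add(q)
                contacts[q].add(p)
    return contacts


def simulate(contacts, days, p_transmit, p_recover, rng):
    """Discrete-time SIR on the contact map; returns daily (S, I, R) counts."""
    state = {p: "S" for p in contacts}
    state[0] = "I"                                # seed a single infection
    history = []
    for _ in range(days):
        updates = {}
        for p, s in state.items():
            if s != "I":
                continue
            for q in contacts[p]:                 # try to infect susceptible contacts
                if state[q] == "S" and rng.random() < p_transmit:
                    updates[q] = "I"
            if rng.random() < p_recover:
                updates[p] = "R"
        state.update(updates)                     # apply all changes at end of day
        values = list(state.values())
        history.append((values.count("S"), values.count("I"), values.count("R")))
    return history


rng = random.Random(0)                            # fixed seed for repeatable runs
population = build_population(200, rng)
history = simulate(population, days=60, p_transmit=0.05, p_recover=0.1, rng=rng)
print(history[-1])
```

    Every person in the map is statistical rather than real, and the map of who interacts with whom is what drives the simulated spread.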

    Today, Vespignani is considering how to integrate genomic analysis of the virus into this kind of population modeling in order to understand how variants are spreading. “It’s still a work in progress that is extremely interesting,” he says, adding that this approach has been useful in modeling the dispersal of the Delta variant of SARS-CoV-2. 

    As researchers model the virus’ spread, Lucas Laird at Lincoln Laboratory is considering how network science can be used to design effective control strategies. He and his team are developing a model for customizing strategies for different geographic regions. The effort was spurred by the differences in Covid-19 spread across U.S. communities, and what the researchers found to be a gap in intervention modeling to address those differences.

    As examples, they applied their planning algorithm to three counties in Florida, Massachusetts, and California. Taking into account the characteristics of a specific geographic center, such as the number of susceptible individuals and number of infections there, their planner institutes different strategies in those communities throughout the outbreak duration.

    “Our approach eradicates disease in 100 days, but it also is able to do it with much more targeted interventions than any of the global interventions. In other words, you don’t have to shut down a full country,” Laird says. He adds that their planner offers a “sandbox environment” for exploring intervention strategies in the future.

    Machine learning with graphs

    Graph-based machine learning is receiving increasing attention for its potential to “learn” the complex relationships within graph data, and thus extract new insights or predictions about these relationships. This interest has given rise to a new class of algorithms called graph neural networks. Today, graph neural networks are being applied in areas such as drug discovery and material design, with promising results.

    “We can now apply deep learning much more broadly, not only to medical images and biological sequences. This creates new opportunities in data-rich biology and medicine,” says Marinka Zitnik, an assistant professor at Harvard University who presented her research at GraphEx.

    Zitnik’s research focuses on the rich networks of interactions between proteins, drugs, disease, and patients, at the scale of billions of interactions. One application of this research is discovering drugs to treat diseases with no or few approved drug treatments, such as Covid-19. In April, Zitnik’s team published a paper on their research that used graph neural networks to rank 6,340 drugs for their expected efficacy against SARS-CoV-2, identifying four that could be repurposed to treat Covid-19.

    At Lincoln Laboratory, researchers are similarly applying graph neural networks to the challenge of designing advanced materials, such as those that can withstand extreme radiation or capture carbon dioxide. Like the process of designing drugs, the trial-and-error approach to materials design is time-consuming and costly. The laboratory’s team is developing graph neural networks that can learn relationships between a material’s crystalline structure and its properties. This network can then be used to predict a variety of properties from any new crystal structure, greatly speeding up the process of screening materials with desired properties for specific applications.
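
    For readers curious how a graph neural network turns a structure into a prediction, the sketch below shows the core mechanism in stripped-down form: atoms pass information to their bonded neighbors, and the results are pooled into a single number. It is not the laboratory’s model. The atom features, the bonds, and the weights are hypothetical and untrained, and a real network would learn its weights and include nonlinear layers.

```python
def message_pass(features, edges):
    """One round: each node's new feature is the mean of itself and its neighbors."""
    neighbors = {node: [node] for node in features}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return {
        node: [sum(features[n][k] for n in group) / len(group)
               for k in range(len(features[node]))]
        for node, group in neighbors.items()
    }


def predict(features, edges, weights, rounds=2):
    """Readout: sum node features over the graph, then apply a linear layer."""
    for _ in range(rounds):
        features = message_pass(features, edges)
    pooled = [sum(vec[k] for vec in features.values()) for k in range(len(weights))]
    return sum(w * x for w, x in zip(weights, pooled))


# Hypothetical three-atom structure: one atom of type A bonded to two of type B.
atoms = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}
bonds = [(0, 1), (0, 2)]
weights = [0.5, -0.25]          # made-up, untrained readout weights

# The same structure with the atoms numbered differently.
atoms_relabeled = {2: [1.0, 0.0], 0: [0.0, 1.0], 1: [0.0, 1.0]}
bonds_relabeled = [(2, 0), (2, 1)]

print(predict(atoms, bonds, weights))
print(predict(atoms_relabeled, bonds_relabeled, weights))
```

    Because the prediction depends only on which atoms are connected, renumbering the atoms should leave it unchanged up to floating-point rounding, which is one reason graphs are a natural representation for crystal structures.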

    “Graph representation learning has emerged as a rich and thriving research area for incorporating inductive bias and structured priors during the machine learning process, with broad applications such as drug design, accelerated scientific discovery, and personalized recommendation systems,” Caceres says. 

    A vibrant community

    Lincoln Laboratory has hosted the GraphEx Symposium annually since 2010, with the exception of last year’s cancellation due to Covid-19. “One key takeaway is that despite the postponement from last year and the need to be virtual, the GraphEx community is as vibrant and active as it’s ever been,” Streilein says. “Network-based analysis continues to expand its reach and is applied to ever-more important areas of science, society, and defense with increasing impact.”

    In addition to those from Lincoln Laboratory, technical committee members and co-chairs of the GraphEx Symposium included researchers from Harvard University, Arizona State University, Stanford University, Smith College, Duke University, the U.S. Department of Defense, and Sandia National Laboratories.