More stories

  • 3 Questions: Catherine D’Ignazio on data science and a quest for justice

    As long as we apply data science to society, we should remember that our data may have flaws, biases, and absences. That is one motif of MIT Associate Professor Catherine D’Ignazio’s new book, “Counting Feminicide,” published this spring by the MIT Press. In it, D’Ignazio explores the world of Latin American activists who began using media accounts and other sources to tabulate how many women had been killed in their countries as the result of gender-based violence — and found that their own numbers differed greatly from official statistics.

    Some of these activists have become prominent public figures, and others less so, but all of them have produced work providing lessons about collecting data, sharing it, and applying data to projects supporting human liberty and dignity. Now, their stories are reaching a new audience thanks to D’Ignazio, an associate professor of urban science and planning in MIT’s Department of Urban Studies and Planning, and director of MIT’s Data and Feminism Lab. She is also hosting an ongoing, transnational book club about the work. MIT News spoke with D’Ignazio about the new book and how activists are expanding the traditional practice of data science.

    Q: What is your book about?

    A: Three things. It’s a book that documents the rise of data activism as a really interesting form of citizen data science. Increasingly, because of the availability of data and tools, gathering and doing your own data analysis is a growing form of social activism. We characterize it in the book as a citizenship practice. People are using data to make knowledge claims and put political demands out there for their institutions to respond to.

    Another takeaway is that from observing data activists, there are ways they approach data science that are very different from how it’s usually taught. Among other things, when undertaking work about inequality and violence, there’s a connection with the rows of data. It’s about memorializing people who have been lost. Mainstream data scientists can learn a lot from this.

    The third thing is about feminicide itself and missing information. The main reason people start collecting data about feminicide is that their institutions aren’t doing it. This includes our institutions here in the United States. We’re talking about violence against women that the state is neglecting to count, classify, or take action on. So, activists step into these gaps and do this to the best of their ability, and they have been quite effective. The media will go to the activists, who end up becoming authorities on feminicide.

    Q: Can you elaborate on the differences between the practices of these data activists and more standard data science?

    A: One difference is what I’ll call the intimacy and proximity to the rows of the dataset. In conventional data science, when you’re analyzing data, typically you’re not also the data collector. However, these activists and groups are involved across the entire pipeline. As a result, there’s a connection and humanization to each line of the dataset. For example, there is a school nurse in Texas who runs the site Women Count USA, and she will spend many hours trying to find photographs of victims of feminicide, which represents unusual care paid to each row of a dataset.

    Another point is the sophistication that the data activists have around what their data represent and what the biases are in the data. In mainstream AI and data science, we’re still having conversations where people seem surprised that there is bias in datasets. But I was impressed with the critical sophistication with which the activists approached their data. They gather information from the media and are familiar with the biases media have, and are aware their data is not comprehensive but is still useful. We can hold those two things together. It’s often more comprehensive data than what the institutions themselves have or will release to the public.

    Q: You did not just chronicle the work of activists; you also engaged with them, and report on that in the book. What did you work on with them?

    A: One big component in the book is the participatory technology development that we engaged in with the activists, and one chapter is a case study of our work with activists to co-design machine learning and AI technology that supports their work. Our team was brainstorming about a system for the activists that would automatically find cases, verify them, and put them right in the database. Interestingly, the activists pushed back on that. They did not want full automation. They felt being, in effect, witnesses is an important part of the work. The emotional burden is an important part of the work and very central to it, too. That’s not something I would always expect to hear from data scientists.

    Keeping the human in the loop also means the human makes the final decision over whether a specific item constitutes feminicide or not. Handling it like that aligns with the fact that there are multiple definitions of feminicide, which is a complicated thing from a computational perspective. The proliferation of definitions about what counts as feminicide is a reflection of the fact that this is an ongoing global, transnational conversation. Feminicide has been codified in many laws, especially in Latin American countries, but none of those single laws is definitive. And no single activist definition is definitive. People are creating this together, through dialogue and struggle, so any computational system has to be designed with that understanding of the democratic process in mind.

  • Arvind, longtime MIT professor and prolific computer scientist, dies at 77

    Arvind Mithal, the Charles W. and Jennifer C. Johnson Professor in Computer Science and Engineering at MIT, head of the faculty of computer science in the Department of Electrical Engineering and Computer Science (EECS), and a pillar of the MIT community, died on June 17. Arvind, who went by the mononym, was 77 years old.

    A prolific researcher who led the Computation Structures Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL), Arvind served on the MIT faculty for nearly five decades.

    “He was beloved by countless people across the MIT community and around the world who were inspired by his intellectual brilliance and zest for life,” President Sally Kornbluth wrote in a letter to the MIT community today.

    As a scientist, Arvind was well known for important contributions to dataflow computing, which seeks to optimize the flow of data to take advantage of parallelism, achieving faster and more efficient computation.

    In the last 25 years, his research interests broadened to include developing techniques and tools for formal modeling, high-level synthesis, and formal verification of complex digital devices like microprocessors and hardware accelerators, as well as memory models and cache coherence protocols for parallel computing architectures and programming languages.

    Those who knew Arvind describe him as a rare individual whose interests and expertise ranged from high-level, theoretical formal systems all the way down through languages and compilers to the gates and structures of silicon hardware.

    The applications of Arvind’s work are far-reaching, from reducing the amount of energy and space required by data centers to streamlining the design of more efficient multicore computer chips.

    “Arvind was both a tremendous scholar in the fields of computer architecture and programming languages and a dedicated teacher, who brought systems-level thinking to our students. He was also an exceptional academic leader, often leading changes in curriculum and contributing to the Engineering Council in meaningful and impactful ways. I will greatly miss his sage advice and wisdom,” says Anantha Chandrakasan, chief innovation and strategy officer, dean of engineering, and the Vannevar Bush Professor of Electrical Engineering and Computer Science.

    “Arvind’s positive energy, together with his hearty laugh, brightened so many people’s lives. He was an enduring source of wise counsel for colleagues and for generations of students. With his deep commitment to academic excellence, he not only transformed research in computer architecture and parallel computing but also brought that commitment to his role as head of the computer science faculty in the EECS department. He left a lasting impact on all of us who had the privilege of working with him,” says Dan Huttenlocher, dean of the MIT Schwarzman College of Computing and the Henry Ellis Warren Professor of Electrical Engineering and Computer Science.

    Arvind developed an interest in parallel computing while he was a student at the Indian Institute of Technology in Kanpur, from which he received his bachelor’s degree in 1969. He earned a master’s degree and PhD in computer science in 1972 and 1973, respectively, from the University of Minnesota, where he studied operating systems and mathematical models of program behavior. He taught at the University of California at Irvine from 1974 to 1978 before joining the faculty at MIT.

    At MIT, Arvind’s group studied parallel computing and declarative programming languages, and he led the development of two parallel computing languages, Id and pH. He continued his work on these programming languages through the 1990s, publishing the book “Implicit Parallel Programming in pH” with co-author R.S. Nikhil in 2001, the culmination of more than 20 years of research.

    In addition to his research, Arvind was an important academic leader in EECS. He served as head of computer science faculty in the department and played a critical role in helping with the reorganization of EECS after the establishment of the MIT Schwarzman College of Computing.

    “Arvind was a force of nature, larger than life in every sense. His relentless positivity, unwavering optimism, boundless generosity, and exceptional strength as a researcher were truly inspiring and left a profound mark on all who had the privilege of knowing him. I feel enormous gratitude for the light he brought into our lives and his fundamental impact on our community,” says Daniela Rus, the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science and the director of CSAIL.

    His work on dataflow and parallel computing led to the Monsoon project in the late 1980s and early 1990s. Arvind’s group, in collaboration with Motorola, built 16 dataflow computing machines and developed their associated software. One Monsoon dataflow machine is now in the Computer History Museum in Mountain View, California.

    Arvind’s focus shifted in the 1990s when, as he explained in a 2012 interview for the Institute of Electrical and Electronics Engineers (IEEE), funding for research into parallel computing began to dry up.

    “Microprocessors were getting so much faster that people thought they didn’t need it,” he recalled.

    Instead, he began applying techniques his team had learned and developed for parallel programming to the principled design of digital hardware.

    In addition to mentoring students and junior colleagues at MIT, Arvind also advised universities and governments in many countries on research in parallel programming and semiconductor design.

    Based on his work on digital hardware design, Arvind founded Sandburst in 2000, a fabless manufacturing company for semiconductor chips. He served as the company’s president for two years before returning to the MIT faculty, while continuing as an advisor. Sandburst was later acquired by Broadcom.

    Arvind and his students also developed Bluespec, a programming language designed to automate the design of chips. Building off this work, he co-founded the startup Bluespec, Inc., in 2003, to develop practical tools that help engineers streamline device design.

    Over the past decade, he was dedicated to advancing undergraduate education at MIT by bringing modern design tools to courses 6.004 (Computation Structures) and 6.191 (Introduction to Deep Learning), and incorporating Minispec, a programming language that is closely related to Bluespec.

    Arvind was honored for these and other contributions to data flow and multithread computing, and the development of tools for the high-level synthesis of hardware, with membership in the National Academy of Engineering in 2008 and the American Academy of Arts and Sciences in 2012. He was also named a distinguished alumnus of IIT Kanpur, his undergraduate alma mater.

    “Arvind was more than a pillar of the EECS community and a titan of computer science; he was a beloved colleague and a treasured friend. Those of us with the remarkable good fortune to work and collaborate with Arvind are devastated by his sudden loss. His kindness and joviality were unwavering; his mentorship was thoughtful and well-considered; his guidance was priceless. We will miss Arvind deeply,” says Asu Ozdaglar, deputy dean of the MIT Schwarzman College of Computing and head of EECS.

    Among numerous other awards, including membership in the Indian National Academy of Sciences and fellowship in the Association for Computing Machinery and IEEE, he received the Harry H. Goode Memorial Award from IEEE in 2012, which honors significant contributions to theory or practice in the information processing field.

    A humble scientist, Arvind was the first to point out that these achievements were only possible because of his outstanding and brilliant collaborators. Chief among those collaborators were the undergraduate and graduate students he felt fortunate to work with at MIT. He maintained excellent relationships with them both professionally and personally, and valued these relationships more than the work they did together, according to family members.

    In summing up the key to his scientific success, Arvind put it this way in the 2012 IEEE interview: “Really, one has to do what one believes in. I think the level at which most of us work, it is not sustainable if you don’t enjoy it on a day-to-day basis. You can’t work on it just because of the results. You have to work on it because you say, ‘I have to know the answer to this.’”

    He is survived by his wife, Gita Singh Mithal, their two sons Divakar ’01 and Prabhakar ’04, their wives Leena and Nisha, and two grandchildren, Maya and Vikram.

  • MIT-Takeda Program wraps up with 16 publications, a patent, and nearly two dozen projects completed

    When the Takeda Pharmaceutical Co. and the MIT School of Engineering launched their collaboration focused on artificial intelligence in health care and drug development in February 2020, society was on the cusp of a globe-altering pandemic and AI was far from the buzzword it is today.

    As the program concludes, the world looks very different. AI has become a transformative technology across industries including health care and pharmaceuticals, while the pandemic has altered the way many businesses approach health care and changed how they develop and sell medicines.

    For both MIT and Takeda, the program has been a game-changer.

    When it launched, the collaborators hoped the program would help solve tangible, real-world problems. By its end, the program has yielded a catalog of new research papers, discoveries, and lessons learned, including a patent for a system that could improve the manufacturing of small-molecule medicines.

    Ultimately, the program allowed both entities to create a foundation for a world where AI and machine learning play a pivotal role in medicine, leveraging Takeda’s expertise in biopharmaceuticals and the MIT researchers’ deep understanding of AI and machine learning.

    “The MIT-Takeda Program has been tremendously impactful and is a shining example of what can be accomplished when experts in industry and academia work together to develop solutions,” says Anantha Chandrakasan, MIT’s chief innovation and strategy officer, dean of the School of Engineering, and the Vannevar Bush Professor of Electrical Engineering and Computer Science. “In addition to resulting in research that has advanced how we use AI and machine learning in health care, the program has opened up new opportunities for MIT faculty and students through fellowships, funding, and networking.”

    What made the program unique was that it was centered around several concrete challenges spanning drug development that Takeda needed help addressing. MIT faculty had the opportunity to select the projects based on their area of expertise and general interest, allowing them to explore new areas within health care and drug development.

    “It was focused on Takeda’s toughest business problems,” says Anne Heatherington, Takeda’s research and development chief data and technology officer and head of its Data Sciences Institute.

    “They were problems that colleagues were really struggling with on the ground,” adds Simon Davies, the executive director of the MIT-Takeda Program and Takeda’s global head of statistical and quantitative sciences. Takeda saw an opportunity to collaborate with MIT’s world-class researchers, who were working only a few blocks away. Takeda, a pharmaceutical company headquartered in Japan, has its global business units and R&D center just down the street from the Institute.

    As part of the program, MIT faculty were able to select what issues they were interested in working on from a group of potential Takeda projects. Then, collaborative teams including MIT researchers and Takeda employees approached research questions in two rounds. Over the course of the program, collaborators worked on 22 projects focused on topics including drug discovery and research, clinical drug development, and pharmaceutical manufacturing. Over 80 MIT students and faculty joined more than 125 Takeda researchers and staff on teams addressing these research questions.

    The projects centered around not only hard problems, but also the potential for solutions to scale within Takeda or within the biopharmaceutical industry more broadly.

    Some of the program’s findings have already resulted in wider studies. One group’s results, for instance, showed that using artificial intelligence to analyze speech may allow for earlier detection of frontotemporal dementia, while making that diagnosis more quickly and inexpensively. Similar algorithmic analyses of speech in patients diagnosed with ALS may also help clinicians understand the progression of that disease. Takeda is continuing to test both AI applications.

    Other discoveries and AI models that resulted from the program’s research have already had an impact. Using a physical model and AI learning algorithms can help detect particle size, mix, and consistency for powdered, small-molecule medicines, for instance, speeding up production timelines. Based on their research under the program, collaborators have filed for a patent for that technology.

    For injectable medicines like vaccines, AI-enabled inspections can also reduce process time and false rejection rates. Replacing human visual inspections with AI processes has already shown measurable impact for the pharmaceutical company.

    Heatherington adds, “Our lessons learned are really setting the stage for what we’re doing next, really embedding AI and gen-AI [generative AI] into everything that we do moving forward.”

    Over the course of the program, more than 150 Takeda researchers and staff also participated in educational programming organized by the Abdul Latif Jameel Clinic for Machine Learning in Health. In addition to providing research opportunities, the program funded 10 students through SuperUROP, the Advanced Undergraduate Research Opportunities Program, as well as two cohorts from the DHIVE health-care innovation program, part of the MIT Sandbox Innovation Fund Program.

    Though the formal program has ended, certain aspects of the collaboration will continue, such as the MIT-Takeda Fellows, which supports graduate students as they pursue groundbreaking research related to health and AI. During its run, the program supported 44 MIT-Takeda Fellows and will continue to support MIT students through an endowment fund. Organic collaboration between MIT and Takeda researchers will also carry forward. And the program’s collaborators are working to create a model for similar academic and industry partnerships to widen the impact of this first-of-its-kind collaboration.

  • MIT Faculty Founder Initiative announces three winners of entrepreneurship awards

    Patients with intractable cancers, chronic pain sufferers, and people who depend on battery-powered medical implants may all benefit from the ideas presented at the 2023-24 MIT-Royalty Pharma Prize Competition’s recent awards. This year’s top prizes went to researchers and biotech entrepreneurs Anne Carpenter, Frederike Petzschner, and Betar Gallant ’08, SM ’10, PhD ’13.

    MIT Faculty Founder Initiative Executive Director Kit Hickey MBA ’13 describes the time and hard work the three awardees and other finalists devoted to the initiative and its mission of cultivating female faculty in biotech to cross the chasm between laboratory research and its clinical application.

    “They have taken the first brave step of getting off the bench when they already work seven days a week. They have carved out time from their facilities, from their labs, from their lives in order to put themselves out there and leap into entrepreneurship,” Hickey says. “They’ve done it because they each want to see their innovations out in the world improving patients’ lives.”

    Carpenter, senior director of the Imaging Platform at the Broad Institute of MIT and Harvard, where she is also an institute scientist, won the competition’s $250,000 2023-24 MIT-Royalty Pharma Faculty Founder Prize Competition Grand Prize. Carpenter specializes in using microscopy imaging of cells and computational methods such as machine learning to accelerate the identification of chemical compounds with therapeutic potential to, for instance, shrink tumors. The identified compounds are then tested in biological assays that model the tumor ecosystem to see how the compounds would perform on actual tumors.

    Carpenter’s startup, SyzOnc, launched in April, a feat Carpenter associates with the assistance provided by the MIT Faculty Founder Initiative. Participants in the program receive mentorship, stipends, and advice from industry experts, as well as help with incorporating, assembling a management team, fundraising, and intellectual property strategy.

    “The program offered key insights and input at major decision points that gave us the momentum to open our doors,” Carpenter says, adding that participating “offered validation of our scientific ideas and business plan. That kind of credibility is really helpful to raising funding, particularly for those starting their first company.”

    Carpenter says she and her team will employ “the best biological and computational advancements to develop new therapies to fight tumors such as sarcoma, pancreatic cancer, and glioblastoma, which currently have dismal survival rates.”

    The MIT Faculty Founder Initiative was begun in 2020 by the School of Engineering and the Martin Trust Center for MIT Entrepreneurship, based on research findings by Sangeeta Bhatia, the Wilson Professor of Health Sciences and Technology, professor of electrical engineering and computer science, and faculty director of the MIT Faculty Founder Initiative; Susan Hockfield, MIT Corporation life member, MIT president emerita, and professor of neuroscience; and Nancy Hopkins, professor emerita of biology. An investigation they conducted showed that only about 9 percent of MIT’s 250 biotech startups were started by women, whereas women made up 22 percent of the faculty, as was presented in a 2021 MIT Faculty Newsletter.

    That data showed that “technologies from female labs were not getting out in the world, resulting in lost potential,” Hickey says.

    “The MIT Faculty Founder Initiative plays a pivotal role in MIT’s entrepreneurship ecosystem. It elevates visionary faculty working on solutions in biotech by providing them with critical mentorship and resources, ensuring these solutions can be rapidly scaled to market,” says Anantha Chandrakasan, MIT’s chief innovation and strategy officer, dean of engineering, and Vannevar Bush Professor of Electrical Engineering and Computer Science.

    The MIT Faculty Founder Initiative Prize Competition was launched in 2021. At this year’s competition, the judges represented academia, health care, biotech, and financial investment. In addition to awarding a grand prize, the competition also distributed two $100,000 prizes, one to a researcher from Brown University, the first university to collaborate with MIT in the entrepreneurship program.

    This year’s winner of the $100,000 2023-24 MIT-Royalty Pharma Faculty Founder Prize Competition Runner-Up Prize was Frederike Petzschner, assistant professor at the Carney Institute for Brain Science at Brown, for her SOMA startup’s digital pain management system, which helps sufferers to manage and relieve chronic pain.

    “We leverage cutting-edge technology to provide precision care, focusing specifically on personalized cognitive interventions tailored to each patient’s unique needs,” she says.

    With her startup on the verge of incorporating, Petzschner says, “without the Faculty Founder Initiative, our startup would still be pursuing commercialization, but undoubtedly at a much earlier and perhaps less structured stage.”

    “The constant support from the program organizers and our mentors was truly transformative,” she says.

    Gallant, associate professor of mechanical engineering at MIT and winner of the $100,000 2023-24 MIT-Royalty Pharma Faculty Founder Prize Competition Breakthrough Prize, is leading the startup Halogen. Gallant, an expert on advanced battery technologies, and her team have developed high-density battery storage to improve the lifetime and performance of such medical devices as pacemakers.

    “If you can extend lifetime, you’re talking about longer times between invasive replacement surgeries, which really affects patient quality of life,” Gallant told MIT News in a 2022 interview.

    Jim Reddoch, executive vice president and chief scientific officer of sponsor Royalty Pharma, emphasized his company’s support for both the competition and the MIT Faculty Founder Initiative program.

    “Royalty Pharma is thrilled to support the 2023-2024 MIT-Royalty Pharma Prize Competition and accelerate life sciences innovation at leading research institutions such as MIT and Brown,” Reddoch says. “By supporting the amazing female entrepreneurs in this program, we hope to catalyze more ideas from the lab to biotech companies and eventually into the hands of patients.”

    Bhatia has referred to the MIT Faculty Founder Initiative as a “playbook” on how to direct female faculty’s high-impact technologies that are not being commercialized into the world of health care.

    “To me, changing the game means that when you have an invention in your lab, you’re connected enough to the ecosystem to know when it should be a company, and to know who to call and how to get your first investors and how to quickly catalyze your team — and you’re off to the races,” Bhatia says. “Every one of those inventions can be a medicine as quickly as possible. That’s the future I imagine.”

    Co-founder Hockfield referred to MIT’s role in promoting entrepreneurship in remarks at the award ceremony, alluding to Brown University’s having joined the effort.

    “MIT has always been a leader in entrepreneurship,” Hockfield says. “Part of leading is sharing with the world. The collaboration with Brown University for this cohort shows that MIT can share our approach with the world, allowing other universities to follow our model of supporting academic entrepreneurship.”

    Hickey says that when she and Bhatia asked 30 female faculty members three years ago why they were not commercializing their technologies, many said they had no access to the appropriate networks of mentors, investors, role models, and business partners necessary to begin the journey.

    “We encourage you to become this network that has been missing,” Hickey told the awards event audience, which included an array of leaders in the biotech world. “Get to know our amazing faculty members and continue to support them. Become a part of this movement.”

  • Q&A: Exploring ethnic dynamics and climate change in Africa

    Evan Lieberman is the Total Professor of Political Science and Contemporary Africa at MIT, and is also director of the Center for International Studies. He is currently on a semester-long sabbatical, based at the African Climate and Development Initiative at the University of Cape Town.

    In this Q&A, Lieberman discusses several climate-related research projects he’s pursuing in South Africa and surrounding countries. This is part of an ongoing series exploring how the School of Humanities, Arts, and Social Sciences is addressing the climate crisis.

    Q: South Africa is a nation whose political and economic development you have long studied and written about. Do you see this visit as an extension of the kind of research you have been pursuing, or a departure from it?

    A: Much of my previous work has been animated by the question of understanding the causes and consequences of group-based disparities, whether due to AIDS or Covid. These are problems that know no geographic boundaries, and where ethnic and racial minorities are often hardest hit. Climate change is an analogous problem, with these minority populations living in places where they are most vulnerable, in heat islands in cities, and in coastal areas where they are not protected. The reality is they might get hit much harder by longer-term trends and immediate shocks.

    In one line of research, I seek to understand how people in different African countries, in different ethnic groups, perceive the problems of climate change and their governments’ response to it. There are ethnic divisions of labor in terms of what people do — whether they are farmers or pastoralists, or live in cities. So some ethnic groups are simply more affected by drought or extreme weather than others, and this can be a basis for conflict, especially when competing for often limited government resources.

    In this area, just like in my previous research, learning what shapes ordinary citizen perspectives is really important, because these views affect people’s everyday practices, and the extent to which they support certain kinds of policies and investments their government makes in response to climate-related challenges. But I will also try to learn more about the perspectives of policymakers and various development partners who seek to balance climate-related challenges against a host of other problems and priorities.

    Q: You recently published “Until We Have Won Our Liberty,” which examines the difficult transition of South Africa from apartheid to a democratic government, scrutinizing in particular whether the quality of life for citizens has improved in terms of housing, employment, discrimination, and ethnic conflicts. How do climate change-linked issues fit into your scholarship?

    A: I never saw myself as a climate researcher, but a number of years ago, heavily influenced by what I was learning at MIT, I began to recognize more and more how important the issue of climate change is. And I realized there were lots of ways in which the climate problem resonated with other kinds of problems I had tackled in earlier parts of my work.

    There was once a time when climate and the environment was the purview primarily of white progressives: the “tree huggers.” And that’s really changed in recent decades as it has become evident that the people who’ve been most affected by the climate emergency are ethnic and racial minorities. We saw with Hurricane Katrina and other places [that] if you are Black, you’re more likely to live in a vulnerable area and to just generally experience more environmental harms, from pollution and emissions, leaving these communities much less resilient than white communities. Government has largely not addressed this inequity. When you look at American survey data in terms of who’s concerned about climate change, Black Americans, Hispanic Americans, and Asian Americans are more unified in their worries than are white Americans.

    There are analogous problems in Africa, my career research focus. Governments there have long responded in different ways to different ethnic groups. The research I am starting looks at the extent to which there are disparities in how governments try to solve climate-related challenges.

    Q: It’s difficult enough in the United States taking the measure of different groups’ perceptions of the impact of climate change and government’s effectiveness in contending with it. How do you go about this in Africa?

    A: Surprisingly, there’s only been a little bit of work done so far on how ordinary African citizens, who are ostensibly being hit the hardest in the world by the climate emergency, are thinking about this problem. Climate change has not been politicized there in a very big way. In fact, only 50 percent of Africans in one poll had heard of the term.

    In one of my new projects, with political science faculty colleague Devin Caughey and political science doctoral student Preston Johnston, we are analyzing social and climate survey data [generated by the Afrobarometer research network] from over 30 African countries to understand within and across countries the ways in which ethnic identities structure people’s perception of the climate crisis, and their beliefs in what government ought to be doing. In largely agricultural African societies, people routinely experience drought, extreme rain, and heat. They also lack the infrastructure that can shield them from the intense variability of weather patterns. But we’re adding a lens, which is looking at sources of inequality, especially ethnic differences.

    I will also be investigating specific sectors. Africa is a continent where in most places people cannot take for granted universal, piped access to clean water. In Cape Town, several years ago, the combination of failure to replace infrastructure and lack of rain caused such extreme conditions that one of the world’s most important cities almost ran out of water.

    While these studies are in progress, it is clear that in many countries, there are substantively large differences in perceptions of the severity of climate change, and attitudes about who should be doing what, and who’s capable of doing what. In several countries, both perceptions and policy preferences are differentiated along ethnic lines, more so than with respect to generational or class differences within societies.

    This is interesting as a phenomenon, but substantively, I think it’s important in that it may provide the basis for how politicians and government actors decide to move on allocating resources and implementing climate-protection policies. We see this kind of political calculation in the U.S. and we shouldn’t be surprised that it happens in Africa as well.

    That’s ultimately one of the challenges from the perch of MIT, where we’re really interested in understanding climate change, and creating technological tools and policies for mitigating the problem or adapting to it. The reality is frustrating. The political world — those who make decisions about whether to acknowledge the problem and whether to implement resources in the best technical way — is playing a whole other game. That game is about rewarding key supporters and being reelected.

    Q: So how do you go from measuring perceptions and beliefs among citizens about climate change and government responsiveness to those problems, to policies and actions that might actually reduce disparities in the way climate-vulnerable African groups receive support?

    A: Some of the work I have been doing involves understanding what local and national governments across Africa are actually doing to address these problems. We will have to drill down into government budgets to determine the actual resources devoted to addressing a challenge, what sorts of practices the government follows, and the political ramifications for governments that act aggressively versus those that don’t. With the Cape Town water crisis, for example, the government dramatically changed residents’ water usage through naming and shaming, and transformed institutional practices of water collection. They made it through a major drought by using much less water, and doing it with greater energy efficiency. Through the government’s strong policy and implementation, and citizens’ active responses, an entire city, with all its disparate groups, gained resilience. Maybe we can highlight creative solutions to major climate-related problems and use them as prods to push more effective policies and solutions in other places.

    In the MIT Global Diversity Lab, along with political science faculty colleague Volha Charnysh, political science doctoral student Jared Kalow, and Institute for Data, Systems, and Society doctoral student Erin Walk, we are exploring American perspectives on climate-related foreign aid, asking survey respondents whether the U.S. should be giving more to people in the global South who didn’t cause the problems of climate change but have to suffer the externalities.
We are particularly interested in whether people’s desire to help vulnerable communities rests on the racial or national identity of those communities.From my new seat as director of the Center for International Studies (CIS), I hope to do more and more to connect social science findings to relevant policymakers, whether in the U.S. or in other places. CIS is making climate one of our thematic priority areas, directing hundreds of thousands of dollars for MIT faculty to spark climate collaborations with researchers worldwide through the Global Seed Fund program. COP 28 (the U.N. Climate Change Conference), which I attended in December in Dubai, really drove home the importance of people coming together from around the world to exchange ideas and form networks. It was unbelievably large, with 85,000 people. But so many of us shared the belief that we are not doing enough. We need enforceable global solutions and innovation. We need ways of financing. We need to provide opportunities for journalists to broadcast the importance of this problem. And we need to understand the incentives that different actors have and what sorts of messages and strategies will resonate with them, and inspire those who have resources to be more generous. More

  • in

    Using generative AI to improve software testing

    Generative AI is getting plenty of attention for its ability to create text and images. But those media represent only a fraction of the data that proliferate in our society today. Data are generated every time a patient goes through a medical system, a storm impacts a flight, or a person interacts with a software application.

    Using generative AI to create realistic synthetic data around those scenarios can help organizations more effectively treat patients, reroute planes, or improve software platforms — especially in scenarios where real-world data are limited or sensitive.

    For the last three years, the MIT spinout DataCebo has offered a generative software system called the Synthetic Data Vault to help organizations create synthetic data to do things like test software applications and train machine learning models.

    The Synthetic Data Vault, or SDV, has been downloaded more than 1 million times, with more than 10,000 data scientists using the open-source library for generating synthetic tabular data. The founders — Principal Research Scientist Kalyan Veeramachaneni and alumna Neha Patki ’15, SM ’16 — believe the company’s success is due to SDV’s ability to revolutionize software testing.

    SDV goes viral

    In 2016, Veeramachaneni’s group in the Data to AI Lab unveiled a suite of open-source generative AI tools to help organizations create synthetic data that matched the statistical properties of real data.

    Companies can use synthetic data instead of sensitive information in programs while still preserving the statistical relationships between datapoints. Companies can also use synthetic data to run new software through simulations to see how it performs before releasing it to the public.

    Veeramachaneni’s group came across the problem because it was working with companies that wanted to share their data for research.

    “MIT helps you see all these different use cases,” Patki explains. “You work with finance companies and health care companies, and all those projects are useful to formulate solutions across industries.”

    In 2020, the researchers founded DataCebo to build more SDV features for larger organizations. Since then, the use cases have been as impressive as they’ve been varied.

    With DataCebo’s new flight simulator, for instance, airlines can plan for rare weather events in a way that would be impossible using only historic data. In another application, SDV users synthesized medical records to predict health outcomes for patients with cystic fibrosis. A team from Norway recently used SDV to create synthetic student data to evaluate whether various admissions policies were meritocratic and free from bias.

    In 2021, the data science platform Kaggle hosted a competition for data scientists that used SDV to create synthetic data sets to avoid using proprietary data. Roughly 30,000 data scientists participated, building solutions and predicting outcomes based on the realistic synthetic data.

    And as DataCebo has grown, it’s stayed true to its MIT roots: All of the company’s current employees are MIT alumni.

    Supercharging software testing

    Although its open-source tools are used for a variety of purposes, the company is focused on growing its traction in software testing.

    “You need data to test these software applications,” Veeramachaneni says. “Traditionally, developers manually write scripts to create synthetic data. With generative models, created using SDV, you can learn from a sample of data collected and then sample a large volume of synthetic data (which has the same properties as real data), or create specific scenarios and edge cases, and use the data to test your application.”

    For example, if a bank wanted to test a program designed to reject transfers from accounts with no money in them, it would have to simulate many accounts simultaneously transacting. Doing that with data created manually would take a lot of time. With DataCebo’s generative models, customers can create any edge case they want to test.
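
    The underlying idea can be sketched as a toy stand-in for SDV (this is not the real SDV API, and all numbers are invented): fit simple statistics to a small real sample, draw a large volume of synthetic rows with the same properties, then append a forced edge case such as zero-balance accounts.

```python
import random
import statistics

# Toy stand-in for an SDV-style workflow (invented numbers, not the real
# SDV API): learn per-column statistics from a small real sample, then
# generate a much larger synthetic volume with the same properties.
real_balances = [120.0, 310.5, 95.2, 480.0, 20.75, 150.0]

mu = statistics.mean(real_balances)
sigma = statistics.stdev(real_balances)

random.seed(0)  # reproducible sampling

def sample_balances(n):
    """Draw n synthetic balances from the fitted Gaussian, floored at 0."""
    return [max(0.0, random.gauss(mu, sigma)) for _ in range(n)]

synthetic = sample_balances(10_000)

# Force an edge case the real sample may not contain: empty accounts,
# e.g. to exercise a rule that rejects transfers from zero-balance accounts.
edge_cases = [0.0] * 100
test_data = synthetic + edge_cases
```

    Real SDV models capture correlations across many columns rather than a single Gaussian, but the testing workflow is the same shape: fit on real data, sample at volume, inject the edge cases you need.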

    “It’s common for industries to have data that is sensitive in some capacity,” Patki says. “Often when you’re in a domain with sensitive data you’re dealing with regulations, and even if there aren’t legal regulations, it’s in companies’ best interest to be diligent about who gets access to what at which time. So, synthetic data is always better from a privacy perspective.”

    Scaling synthetic data

    Veeramachaneni believes DataCebo is advancing the field of what it calls synthetic enterprise data, or data generated from user behavior on large companies’ software applications.

    “Enterprise data of this kind is complex, and there is no universal availability of it, unlike language data,” Veeramachaneni says. “When folks use our publicly available software and report back if it works on a certain pattern, we learn a lot of these unique patterns, and it allows us to improve our algorithms. From one perspective, we are building a corpus of these complex patterns, which for language and images is readily available.”

    DataCebo also recently released features to improve SDV’s usefulness, including SDMetrics, a library for assessing the “realism” of generated data, and SDGym, a tool for comparing models’ performance.

    “It’s about ensuring organizations trust this new data,” Veeramachaneni says. “[Our tools offer] programmable synthetic data, which means we allow enterprises to insert their specific insight and intuition to build more transparent models.”

    As companies in every industry rush to adopt AI and other data science tools, DataCebo is ultimately helping them do so in a way that is more transparent and responsible.

    “In the next few years, synthetic data from generative models will transform all data work,” Veeramachaneni says. “We believe 90 percent of enterprise operations can be done with synthetic data.”

  • in

    Dealing with the limitations of our noisy world

    Tamara Broderick first set foot on MIT’s campus when she was a high school student, as a participant in the inaugural Women’s Technology Program. The monthlong summer academic experience gives young women a hands-on introduction to engineering and computer science.

    What is the probability that she would return to MIT years later, this time as a faculty member?

    That’s a question Broderick could probably answer quantitatively using Bayesian inference, a statistical approach to probability that tries to quantify uncertainty by continuously updating one’s assumptions as new data are obtained.
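
    In its simplest conjugate form, that updating has a closed-form answer. A minimal coin-flip sketch (with made-up observations) shows the mechanics:

```python
from fractions import Fraction

# Beta-Binomial updating: a Beta(a, b) prior over an unknown probability
# is conjugate to coin-flip data, so each observation updates it exactly:
# heads -> a + 1, tails -> b + 1.
a, b = Fraction(1), Fraction(1)  # uniform prior: every bias equally likely

observations = [1, 1, 0, 1]  # three heads, one tail (hypothetical data)
for flip in observations:
    if flip:
        a += 1
    else:
        b += 1

posterior_mean = a / (a + b)  # E[p | data] = a / (a + b)
print(posterior_mean)  # 2/3
```

    The posterior concentrates as observations accumulate, which is how Bayesian methods quantify not just what we know from data, but how well we know it.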

    In her lab at MIT, the newly tenured associate professor in the Department of Electrical Engineering and Computer Science (EECS) uses Bayesian inference to quantify uncertainty and measure the robustness of data analysis techniques.

    “I’ve always been really interested in understanding not just ‘What do we know from data analysis,’ but ‘How well do we know it?’” says Broderick, who is also a member of the Laboratory for Information and Decision Systems and the Institute for Data, Systems, and Society. “The reality is that we live in a noisy world, and we can’t always get exactly the data that we want. How do we learn from data but at the same time recognize that there are limitations and deal appropriately with them?”

    Broadly, her focus is on helping people understand the confines of the statistical tools available to them and, sometimes, working with them to craft better tools for a particular situation.

    For instance, her group recently collaborated with oceanographers to develop a machine-learning model that can make more accurate predictions about ocean currents. In another project, she and others worked with degenerative disease specialists on a tool that helps severely motor-impaired individuals utilize a computer’s graphical user interface by manipulating a single switch.

    A common thread woven through her work is an emphasis on collaboration.

    “Working in data analysis, you get to hang out in everybody’s backyard, so to speak. You really can’t get bored because you can always be learning about some other field and thinking about how we can apply machine learning there,” she says.

    Hanging out in many academic “backyards” is especially appealing to Broderick, who struggled even from a young age to narrow down her interests.

    A math mindset

    Growing up in a suburb of Cleveland, Ohio, Broderick had an interest in math for as long as she can remember. She recalls being fascinated by the idea of what would happen if you kept adding a number to itself, starting with 1+1=2 and then 2+2=4.

    “I was maybe 5 years old, so I didn’t know what ‘powers of two’ were or anything like that. I was just really into math,” she says.

    Her father recognized her interest in the subject and enrolled her in a Johns Hopkins program called the Center for Talented Youth, which gave Broderick the opportunity to take three-week summer classes on a range of subjects, from astronomy to number theory to computer science.

    Later, in high school, she conducted astrophysics research with a postdoc at Case Western Reserve University. In the summer of 2002, she spent four weeks at MIT as a member of the first class of the Women’s Technology Program.

    She especially enjoyed the freedom offered by the program, and its focus on using intuition and ingenuity to achieve high-level goals. For instance, the cohort was tasked with building a device with LEGOs that they could use to biopsy a grape suspended in Jell-O.

    The program showed her how much creativity is involved in engineering and computer science, and piqued her interest in pursuing an academic career.

    “But when I got into college at Princeton, I could not decide — math, physics, computer science — they all seemed super-cool. I wanted to do all of it,” she says.

    She settled on pursuing an undergraduate math degree but took all the physics and computer science courses she could cram into her schedule.

    Digging into data analysis

    After receiving a Marshall Scholarship, Broderick spent two years at Cambridge University in the United Kingdom, earning a master of advanced study in mathematics and a master of philosophy in physics.

    In the UK, she took a number of statistics and data analysis classes, including her first class on Bayesian data analysis in the field of machine learning.

    It was a transformative experience, she recalls.

    “During my time in the U.K., I realized that I really like solving real-world problems that matter to people, and Bayesian inference was being used in some of the most important problems out there,” she says.

    Back in the U.S., Broderick headed to the University of California at Berkeley, where she joined the lab of Professor Michael I. Jordan as a grad student. She earned a PhD in statistics with a focus on Bayesian data analysis. 

    She decided to pursue a career in academia and was drawn to MIT by the collaborative nature of the EECS department and by how passionate and friendly her would-be colleagues were.

    Her first impressions panned out, and Broderick says she has found a community at MIT that helps her be creative and explore hard, impactful problems with wide-ranging applications.

    “I’ve been lucky to work with a really amazing set of students and postdocs in my lab — brilliant and hard-working people whose hearts are in the right place,” she says.

    One of her team’s recent projects involves a collaboration with an economist who studies the use of microcredit, or the lending of small amounts of money at very low interest rates, in impoverished areas.

    The goal of microcredit programs is to raise people out of poverty. Economists run randomized controlled trials of villages in a region that receive or don’t receive microcredit. They want to generalize the study results, predicting the expected outcome if one applies microcredit to other villages outside of their study.

    But Broderick and her collaborators have found that results of some microcredit studies can be very brittle. Removing one or a few data points from the dataset can completely change the results. One issue is that researchers often use empirical averages, where a few very high or low data points can skew the results.

    Using machine learning, she and her collaborators developed a method that can determine how many data points must be dropped to change the substantive conclusion of the study. With their tool, a scientist can see how brittle the results are.

    “Sometimes dropping a very small fraction of data can change the major results of a data analysis, and then we might worry how far those conclusions generalize to new scenarios. Are there ways we can flag that for people? That is what we are getting at with this work,” she explains.
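
    A brute-force toy version of that check (invented numbers, not the lab's actual method, which approximates this efficiently) greedily removes the most extreme remaining point until the empirical average changes sign:

```python
# Toy robustness check: how few points must be dropped to flip the sign
# of an empirical average? Greedily remove the point that pulls hardest
# in the mean's current direction. (Hypothetical effect sizes below.)
effects = [0.5, 1.2, -0.3, 0.8, 14.0, -0.6, 0.2]  # one large outlier

def drops_to_flip(data):
    """Return how many greedy removals flip the sign of the mean, or None."""
    data = list(data)
    positive = sum(data) / len(data) > 0
    for k in range(1, len(data)):
        # Remove the most extreme point on the dominant side.
        data.remove(max(data) if positive else min(data))
        if (sum(data) / len(data) > 0) != positive:
            return k
    return None  # the sign never flips

print(drops_to_flip(effects))  # 3
```

    Here dropping just three of seven points reverses the conclusion, which is exactly the kind of brittleness the tool is designed to flag.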

    At the same time, she is continuing to collaborate with researchers in a range of fields, such as genetics, to understand the pros and cons of different machine-learning techniques and other data analysis tools.

    Happy trails

    Exploration is what drives Broderick as a researcher, and it also fuels one of her passions outside the lab. She and her husband enjoy collecting patches they earn by hiking all the trails in a park or trail system.

    “I think my hobby really combines my interests of being outdoors and spreadsheets,” she says. “With these hiking patches, you have to explore everything and then you see areas you wouldn’t normally see. It is adventurous, in that way.”

    They’ve discovered some amazing hikes they would never have known about, but also embarked on more than a few “total disaster hikes,” she says. But each hike, whether a hidden gem or an overgrown mess, offers its own rewards.

    And just like in her research, curiosity, open-mindedness, and a passion for problem-solving have never led her astray.

  • in

    Generating opportunities with generative AI

    Talking with retail executives back in 2010, Rama Ramakrishnan came to two realizations. First, although retail systems that offered customers personalized recommendations were getting a great deal of attention, these systems often provided little payoff for retailers. Second, for many of the firms, most customers shopped only once or twice a year, so companies didn’t really know much about them.

    “But by being very diligent about noting down the interactions a customer has with a retailer or an e-commerce site, we can create a very nice and detailed composite picture of what that person does and what they care about,” says Ramakrishnan, professor of the practice at the MIT Sloan School of Management. “Once you have that, then you can apply proven algorithms from machine learning.”

    These realizations led Ramakrishnan to found CQuotient, a startup whose software has now become the foundation for Salesforce’s widely adopted AI e-commerce platform. “On Black Friday alone, CQuotient technology probably sees and interacts with over a billion shoppers on a single day,” he says.

    After a highly successful entrepreneurial career, in 2019 Ramakrishnan returned to MIT Sloan, where he had earned master’s and PhD degrees in operations research in the 1990s. He teaches students “not just how these amazing technologies work, but also how do you take these technologies and actually put them to use pragmatically in the real world,” he says.

    Additionally, Ramakrishnan enjoys participating in MIT executive education. “This is a great opportunity for me to convey the things that I have learned, but also as importantly, to learn what’s on the minds of these senior executives, and to guide them and nudge them in the right direction,” he says.

    For example, executives are understandably concerned about the need for massive amounts of data to train machine learning systems. He can now guide them to a wealth of models that are pre-trained for specific tasks. “The ability to use these pre-trained AI models, and very quickly adapt them to your particular business problem, is an incredible advance,” says Ramakrishnan.

    Rama Ramakrishnan – Utilizing AI in Real World Applications for Intelligent Work (Video: MIT Industrial Liaison Program)

    Understanding AI categories

    “AI is the quest to imbue computers with the ability to do cognitive tasks that typically only humans can do,” he says. Understanding the history of this complex, supercharged landscape aids in exploiting the technologies.

    The traditional approach to AI, which basically solved problems by applying if/then rules learned from humans, proved useful for relatively few tasks. “One reason is that we can do lots of things effortlessly, but if asked to explain how we do them, we can’t actually articulate how we do them,” Ramakrishnan comments. Also, those systems may be baffled by new situations that don’t match up to the rules enshrined in the software.

    Machine learning takes a dramatically different approach, with the software fundamentally learning by example. “You give it lots of examples of inputs and outputs, questions and answers, tasks and responses, and get the computer to automatically learn how to go from the input to the output,” he says. Credit scoring, loan decision-making, disease prediction, and demand forecasting are among the many tasks conquered by machine learning.

    But machine learning only worked well when the input data was structured, for instance in a spreadsheet. “If the input data was unstructured, such as images, video, audio, ECGs, or X-rays, it wasn’t very good at going from that to a predicted output,” Ramakrishnan says. That means humans had to manually structure the unstructured data to train the system.

    Around 2010, deep learning began to overcome that limitation, delivering the ability to work directly with unstructured input data, he says. Based on a longstanding AI strategy known as neural networks, deep learning became practical due to the global flood tide of data, the availability of extraordinarily powerful parallel processing hardware called graphics processing units (originally invented for video games), and advances in algorithms and math.

    Finally, within deep learning, the generative AI software packages appearing last year can create unstructured outputs, such as human-sounding text, images of dogs, and three-dimensional models. Large language models (LLMs) such as OpenAI’s ChatGPT go from text inputs to text outputs, while text-to-image models such as OpenAI’s DALL-E can churn out realistic-appearing images.

    Rama Ramakrishnan – Making Note of Little Data to Improve Customer Service (Video: MIT Industrial Liaison Program)

    What generative AI can (and can’t) do

    An LLM is trained on the unimaginably vast text resources of the internet, and its “fundamental capability is to predict the next most likely, most plausible word,” Ramakrishnan says. “Then it attaches the word to the original sentence, predicts the next word again, and keeps on doing it.”
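
    That loop can be sketched with a lookup table standing in for a real model's learned next-word distribution (the words and probabilities here are invented):

```python
# Greedy autoregressive generation: predict the most plausible next word,
# append it, and repeat. A real LLM computes these probabilities with a
# neural network; this toy uses a hard-coded bigram table.
next_word = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"down": 1.0},
}

def generate(prompt, max_words=10):
    words = prompt.split()
    while len(words) < max_words:
        choices = next_word.get(words[-1])
        if not choices:  # no known continuation: stop
            break
        # Greedy decoding: always take the single most probable word.
        words.append(max(choices, key=choices.get))
    return " ".join(words)

print(generate("the"))  # the cat sat down
```

    Real models typically sample from the distribution rather than always taking the top word, which is why the same prompt can yield different completions.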

    “To the surprise of many, including a lot of researchers, an LLM can do some very complicated things,” he says. “It can compose beautifully coherent poetry, write Seinfeld episodes, and solve some kinds of reasoning problems. It’s really quite remarkable how next-word prediction can lead to these amazing capabilities.”

    “But you have to always keep in mind that what it is doing is not so much finding the correct answer to your question as finding a plausible answer to your question,” Ramakrishnan emphasizes. Its content may be factually inaccurate, irrelevant, toxic, biased, or offensive.

    That puts the burden on users to make sure that the output is correct, relevant, and useful for the task at hand. “You have to make sure there is some way for you to check its output for errors and fix them before it goes out,” he says.

    Intense research is underway to find techniques to address these shortcomings, adds Ramakrishnan, who expects many innovative tools to do so.

    Finding the right corporate roles for LLMs

    Given the astonishing progress in LLMs, how should industry think about applying the software to tasks such as generating content?

    First, Ramakrishnan advises, consider costs: “Is it a much less expensive effort to have a draft that you correct, versus you creating the whole thing?” Second, if the LLM makes a mistake that slips by, and the mistaken content is released to the outside world, can you live with the consequences?

    “If you have an application which satisfies both considerations, then it’s good to do a pilot project to see whether these technologies can actually help you with that particular task,” says Ramakrishnan. He stresses the need to treat the pilot as an experiment rather than as a normal IT project.

    Right now, software development is the most mature corporate LLM application. “ChatGPT and other LLMs are text-in, text-out, and a software program is just text-out,” he says. “Programmers can go from English text-in to Python text-out, just as you can go from English to English or English to German. There are lots of tools which help you write code using these technologies.”

    Of course, programmers must make sure the result does the job properly. Fortunately, software development already offers infrastructure for testing and verifying code. “This is a beautiful sweet spot,” he says, “where it’s much cheaper to have the technology write code for you, because you can very quickly check and verify it.”

    Another major LLM use is content generation, such as writing marketing copy or e-commerce product descriptions. “Again, it may be much cheaper to fix ChatGPT’s draft than for you to write the whole thing,” Ramakrishnan says. “However, companies must be very careful to make sure there is a human in the loop.”

    LLMs also are spreading quickly as in-house tools to search enterprise documents. Unlike conventional search algorithms, an LLM chatbot can offer a conversational search experience, because it remembers each question you ask. “But again, it will occasionally make things up,” he says. “In terms of chatbots for external customers, these are very early days, because of the risk of saying something wrong to the customer.”

    Overall, Ramakrishnan notes, we’re living in a remarkable time to grapple with AI’s rapidly evolving potentials and pitfalls. “I help companies figure out how to take these very transformative technologies and put them to work, to make products and services much more intelligent, employees much more productive, and processes much more efficient,” he says.