More stories

  • How an archeological approach can help leverage biased data in AI to improve medicine

    The classic computer science adage “garbage in, garbage out” lacks nuance when it comes to understanding biased medical data, argue computer science and bioethics professors from MIT, Johns Hopkins University, and the Alan Turing Institute in a new opinion piece published in a recent edition of the New England Journal of Medicine (NEJM). The rising popularity of artificial intelligence has brought increased scrutiny to the matter of biased AI models resulting in algorithmic discrimination, which the White House Office of Science and Technology Policy identified as a key issue in its recent Blueprint for an AI Bill of Rights.

    When encountering biased data, particularly for AI models used in medical settings, the typical response is to either collect more data from underrepresented groups or generate synthetic data to make up for the missing parts, so that the model performs equally well across an array of patient populations. But the authors argue that this technical approach should be augmented with a sociotechnical perspective that takes both historical and current social factors into account. By doing so, researchers can be more effective in addressing bias in public health.

    “The three of us had been discussing the ways in which we often treat issues with data from a machine learning perspective as irritations that need to be managed with a technical solution,” recalls co-author Marzyeh Ghassemi, an assistant professor in electrical engineering and computer science and an affiliate of the Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic), the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Institute of Medical Engineering and Science (IMES). “We had used analogies of data as an artifact that gives a partial view of past practices, or a cracked mirror holding up a reflection. In both cases the information is perhaps not entirely accurate or favorable: Maybe we think that we behave in certain ways as a society — but when you actually look at the data, it tells a different story. We might not like what that story is, but once you unearth an understanding of the past you can move forward and take steps to address poor practices.” 

    Data as artifact 

    In the paper, titled “Considering Biased Data as Informative Artifacts in AI-Assisted Health Care,” Ghassemi, Kadija Ferryman, and Maxine Mackintosh make the case for viewing biased clinical data as “artifacts” in the same way anthropologists or archeologists would view physical objects: pieces of civilization-revealing practices, belief systems, and cultural values — in the case of the paper, specifically those that have led to existing inequities in the health care system. 

    For example, a 2019 study showed that an algorithm widely considered to be an industry standard used health-care expenditures as an indicator of need, leading to the erroneous conclusion that sicker Black patients require the same level of care as healthier white patients. What researchers found was algorithmic discrimination failing to account for unequal access to care.  
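
    To see how that kind of proxy bias arises, consider a small synthetic sketch. The numbers, group names, and flagging threshold below are invented for illustration and are not drawn from the 2019 study; the point is only that when spending stands in for need, a group with reduced access to care is systematically under-flagged even at identical illness severity.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # Hypothetical setup: two groups with the same distribution of illness severity,
    # but group B faces barriers to care and therefore spends less when equally sick.
    illness = rng.uniform(0, 10, size=n)                      # true need (not observed by the model)
    group = rng.integers(0, 2, size=n)                        # 0 = group A, 1 = group B
    access = np.where(group == 0, 1.0, 0.6)                   # reduced access for group B
    spending = access * illness + rng.normal(0, 0.5, size=n)  # the proxy label the algorithm sees

    # An "algorithm" that treats spending as need: flag the top 20% of spenders
    # for extra care management.
    threshold = np.quantile(spending, 0.80)
    flagged = spending >= threshold

    # Among the genuinely sickest patients, who actually gets flagged?
    sickest = illness > 8
    for g, name in [(0, "group A"), (1, "group B")]:
        mask = sickest & (group == g)
        print(f"{name}: {flagged[mask].mean():.0%} of its sickest patients flagged")
    ```

    Both groups are equally sick by construction; the gap in who gets flagged comes entirely from using spending as the measure of need.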

    In this instance, rather than viewing biased datasets or lack of data as problems that only require disposal or fixing, Ghassemi and her colleagues recommend the “artifacts” approach as a way to raise awareness around social and historical elements influencing how data are collected and alternative approaches to clinical AI development. 

    “If the goal of your model is deployment in a clinical setting, you should engage a bioethicist or a clinician with appropriate training reasonably early on in problem formulation,” says Ghassemi. “As computer scientists, we often don’t have a complete picture of the different social and historical factors that have gone into creating data that we’ll be using. We need expertise in discerning when models generalized from existing data may not work well for specific subgroups.” 

    When more data can actually harm performance 

    The authors acknowledge that one of the more challenging aspects of implementing an artifact-based approach is being able to assess whether data have been racially corrected: i.e., using white, male bodies as the conventional standard that other bodies are measured against. The opinion piece cites an example from the Chronic Kidney Disease Epidemiology Collaboration in 2021, which developed a new equation to measure kidney function because the old equation had previously been “corrected” under the blanket assumption that Black people have higher muscle mass. Ghassemi says that researchers should be prepared to investigate race-based correction as part of the research process.

    In another recent paper, accepted to this year’s International Conference on Machine Learning and co-authored by Ghassemi’s PhD student Vinith Suriyakumar and University of California at San Diego Assistant Professor Berk Ustun, the researchers found that assuming the inclusion of personalized attributes like self-reported race improves the performance of ML models can actually lead to worse risk scores, models, and metrics for minority and minoritized populations.

    “There’s no single right solution for whether or not to include self-reported race in a clinical risk score. Self-reported race is a social construct that is both a proxy for other information, and deeply proxied itself in other medical data. The solution needs to fit the evidence,” explains Ghassemi. 

    How to move forward 

    This is not to say that biased datasets should be enshrined, or biased algorithms don’t require fixing — quality training data is still key to developing safe, high-performance clinical AI models, and the NEJM piece highlights the role of the National Institutes of Health (NIH) in driving ethical practices.  

    “Generating high-quality, ethically sourced datasets is crucial for enabling the use of next-generation AI technologies that transform how we do research,” NIH acting director Lawrence Tabak stated in a press release when the NIH announced its $130 million Bridge2AI Program last year. Ghassemi agrees, pointing out that the NIH has “prioritized data collection in ethical ways that cover information we have not previously emphasized the value of in human health — such as environmental factors and social determinants. I’m very excited about their prioritization of, and strong investments towards, achieving meaningful health outcomes.” 

    Elaine Nsoesie, an associate professor at the Boston University School of Public Health, believes there are many potential benefits to treating biased datasets as artifacts rather than garbage, starting with the focus on context. “Biases present in a dataset collected for lung cancer patients in a hospital in Uganda might be different from a dataset collected in the U.S. for the same patient population,” she explains. “In considering local context, we can train algorithms to better serve specific populations.” Nsoesie says that understanding the historical and contemporary factors shaping a dataset can make it easier to identify discriminatory practices that might be coded in algorithms or systems in ways that are not immediately obvious. She also notes that an artifact-based approach could lead to the development of new policies and structures ensuring that the root causes of bias in a particular dataset are eliminated.

    “People often tell me that they are very afraid of AI, especially in health. They’ll say, ‘I’m really scared of an AI misdiagnosing me,’ or ‘I’m concerned it will treat me poorly,’” Ghassemi says. “I tell them, you shouldn’t be scared of some hypothetical AI in health tomorrow, you should be scared of what health is right now. If we take a narrow technical view of the data we extract from systems, we could naively replicate poor practices. That’s not the only option — realizing there is a problem is our first step towards a larger opportunity.”

  • Helping computer vision and language models understand what they see

    Powerful machine-learning algorithms known as vision and language models, which learn to match text with images, have shown remarkable results when asked to generate captions or summarize videos.

    While these models excel at identifying objects, they often struggle to understand concepts, like object attributes or the arrangement of items in a scene. For instance, a vision and language model might recognize the cup and table in an image, but fail to grasp that the cup is sitting on the table.

    Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere have demonstrated a new technique that utilizes computer-generated data to help vision and language models overcome this shortcoming.

    The researchers created a synthetic dataset of images that depict a wide range of scenarios, object arrangements, and human actions, coupled with detailed text descriptions. They used this annotated dataset to “fix” vision and language models so they can learn concepts more effectively. Their technique ensures these models can still make accurate predictions when they see real images.

    When they tested models on concept understanding, the researchers found that their technique boosted accuracy by up to 10 percent. This could improve systems that automatically caption videos or enhance models that provide natural language answers to questions about images, with applications in fields like e-commerce or health care.

    “With this work, we are going beyond nouns in the sense that we are going beyond just the names of objects to more of the semantic concept of an object and everything around it. Our idea was that, when a machine-learning model sees objects in many different arrangements, it will have a better idea of how arrangement matters in a scene,” says Khaled Shehada, a graduate student in the Department of Electrical Engineering and Computer Science and co-author of a paper on this technique.

    Shehada wrote the paper with lead author Paola Cascante-Bonilla, a computer science graduate student at Rice University; Aude Oliva, director of strategic industry engagement at the MIT Schwarzman College of Computing, MIT director of the MIT-IBM Watson AI Lab, and a senior research scientist in the Computer Science and Artificial Intelligence Laboratory (CSAIL); senior author Leonid Karlinsky, a research staff member in the MIT-IBM Watson AI Lab; and others at MIT, the MIT-IBM Watson AI Lab, Georgia Tech, Rice University, École des Ponts, Weizmann Institute of Science, and IBM Research. The paper will be presented at the International Conference on Computer Vision.

    Focusing on objects

    Vision and language models typically learn to identify objects in a scene, and can end up ignoring object attributes, such as color and size, or positional relationships, such as which object is on top of another object.

    This is due to the method with which these models are often trained, known as contrastive learning. This training method involves forcing a model to predict the correspondence between images and text. When comparing natural images, the objects in each scene tend to cause the most striking differences. (Perhaps one image shows a horse in a field while the second shows a sailboat on the water.)

    “Every image could be uniquely defined by the objects in the image. So, when you do contrastive learning, just focusing on the nouns and objects would solve the problem. Why would the model do anything differently?” says Karlinsky.
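
    A minimal sketch of the contrastive objective behind such models helps make Karlinsky’s point concrete. This is generic CLIP-style training code, not the authors’ implementation; the embeddings below are random stand-ins for encoder outputs. Because each image only has to be matched against the other captions in the batch, features that distinguish captions — usually the object nouns — are enough to drive the loss down.

    ```python
    import torch
    import torch.nn.functional as F

    def contrastive_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
        """Symmetric InfoNCE loss over a batch of matched image/text embeddings."""
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        logits = image_emb @ text_emb.t() / temperature      # (B, B) similarity matrix
        targets = torch.arange(logits.size(0), device=logits.device)
        loss_i2t = F.cross_entropy(logits, targets)          # match each image to its own caption
        loss_t2i = F.cross_entropy(logits.t(), targets)      # and each caption to its own image
        return (loss_i2t + loss_t2i) / 2

    # Toy usage: random tensors standing in for the outputs of the two encoders.
    loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
    print(loss.item())
    ```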

    The researchers sought to mitigate this problem by using synthetic data to fine-tune a vision and language model. The fine-tuning process involves tweaking a model that has already been trained to improve its performance on a specific task.

    They used a computer to automatically create synthetic videos with diverse 3D environments and objects, such as furniture and luggage, and added human avatars that interacted with the objects.

    Using individual frames of these videos, they generated nearly 800,000 photorealistic images, and then paired each with a detailed caption. The researchers developed a methodology for annotating every aspect of the image to capture object attributes, positional relationships, and human-object interactions clearly and consistently in dense captions.

    Because the researchers created the images, they could control the appearance and position of objects, as well as the gender, clothing, poses, and actions of the human avatars.

    “Synthetic data allows a lot of diversity. With real images, you might not have a lot of elephants in a room, but with synthetic data, you could actually have a pink elephant in a room with a human, if you want,” Cascante-Bonilla says.

    Synthetic data have other advantages, too. They are cheaper to generate than real data, yet the images are highly photorealistic. They also preserve privacy because no real humans are shown in the images. And, because data are produced automatically by a computer, they can be generated quickly in massive quantities.

    By using different camera viewpoints, or slightly changing the positions or attributes of objects, the researchers created a dataset with a far wider variety of scenarios than one would find in a natural dataset.

    Fine-tune, but don’t forget

    However, when one fine-tunes a model with synthetic data, there is a risk that the model might “forget” what it learned when it was originally trained with real data.

    The researchers employed a few techniques to prevent this problem, such as adjusting the synthetic data so colors, lighting, and shadows more closely match those found in natural images. They also made adjustments to the model’s inner workings after fine-tuning to further reduce any forgetfulness.
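
    The paper’s specific adjustments are not detailed here, but one widely used way to curb forgetting after fine-tuning is to interpolate between the original and fine-tuned checkpoints in weight space. The sketch below illustrates that general idea under assumed settings (the mixing coefficient and the toy models are placeholders), not the authors’ procedure.

    ```python
    import copy
    import torch

    def interpolate_weights(pretrained: torch.nn.Module,
                            finetuned: torch.nn.Module,
                            alpha: float = 0.5) -> torch.nn.Module:
        """Blend two checkpoints of the same architecture: alpha = 0 keeps the
        original (real-image) behavior, alpha = 1 keeps the synthetic-data
        fine-tune, and intermediate values often retain much of both."""
        blended = copy.deepcopy(finetuned)
        ft_state = finetuned.state_dict()
        merged = {}
        for name, w_pre in pretrained.state_dict().items():
            w_ft = ft_state[name]
            if torch.is_floating_point(w_pre):
                merged[name] = (1 - alpha) * w_pre + alpha * w_ft
            else:
                merged[name] = w_ft  # leave integer buffers (e.g., batch counters) untouched
        blended.load_state_dict(merged)
        return blended

    # Toy usage; in practice these would be the vision and language model before
    # and after fine-tuning on the synthetic dataset.
    pre = torch.nn.Linear(4, 2)
    ft = copy.deepcopy(pre)
    with torch.no_grad():
        ft.weight += 0.1
    blended = interpolate_weights(pre, ft, alpha=0.5)
    ```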

    Their synthetic dataset and fine-tuning strategy improved the ability of popular vision and language models to accurately recognize concepts by up to 10 percent. At the same time, the models did not forget what they had already learned.

    Now that they have shown how synthetic data can be used to solve this problem, the researchers want to identify ways to improve the visual quality and diversity of these data, as well as the underlying physics that makes synthetic scenes look realistic. In addition, they plan to test the limits of scalability, and investigate whether model improvement starts to plateau with larger and more diverse synthetic datasets.

    This research is funded, in part, by the U.S. Defense Advanced Research Projects Agency, the National Science Foundation, and the MIT-IBM Watson AI Lab.

  • Fast-tracking fusion energy’s arrival with AI and accessibility

    As the impacts of climate change continue to grow, so does interest in fusion’s potential as a clean energy source. While fusion reactions have been studied in laboratories since the 1930s, there are still many critical questions scientists must answer to make fusion power a reality, and time is of the essence. As part of its strategy to accelerate fusion energy’s arrival and reach carbon neutrality by 2050, the U.S. Department of Energy (DoE) has announced new funding for a project led by researchers at MIT’s Plasma Science and Fusion Center (PSFC) and four collaborating institutions.

    Cristina Rea, a research scientist and group leader at the PSFC, will serve as the principal investigator for the newly funded three-year collaboration to pilot the integration of fusion data into a system that can be read by AI-powered tools. The PSFC, together with scientists from William & Mary, the University of Wisconsin at Madison, Auburn University, and the nonprofit HDF Group, plans to create a holistic fusion data platform, the elements of which could offer unprecedented access for researchers, especially underrepresented students. The project aims to encourage diverse participation in fusion and data science, both in academia and the workforce, through outreach programs led by the group’s co-investigators, of whom four out of five are women. 

    The DoE’s award, part of a $29 million funding package for seven projects across 19 institutions, will support the group’s efforts to distribute data produced by fusion devices like the PSFC’s Alcator C-Mod, a donut-shaped “tokamak” that utilized powerful magnets to control and confine fusion reactions. Alcator C-Mod operated from 1991 to 2016 and its data are still being studied, thanks in part to the PSFC’s commitment to the free exchange of knowledge.

    Currently, there are nearly 50 public experimental magnetic confinement-type fusion devices; however, both historical and current data from these devices can be difficult to access. Some fusion databases require signing user agreements, and not all data are catalogued and organized the same way. Moreover, it can be difficult to leverage machine learning, a class of AI tools, for data analysis and to enable scientific discovery without time-consuming data reorganization. The result is fewer scientists working on fusion, greater barriers to discovery, and a bottleneck in harnessing AI to accelerate progress.

    The project’s proposed data platform addresses technical barriers by being FAIR — Findable, Accessible, Interoperable, Reusable — and by adhering to UNESCO’s Open Science (OS) recommendations to improve the transparency and inclusivity of science; all of the researchers’ deliverables will adhere to FAIR and OS principles, as required by the DoE. The platform’s databases will be built using MDSplusML, an upgraded version of the MDSplus open-source software developed by PSFC researchers in the 1980s to catalogue the results of Alcator C-Mod’s experiments. Today, nearly 40 fusion research institutes use MDSplus to store and provide external access to their fusion data. The release of MDSplusML aims to continue that legacy of open collaboration.
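
    For readers unfamiliar with MDSplus, programmatic access to an experiment’s archived signals typically looks something like the sketch below. This is a generic illustration assuming the MDSplus Python bindings; the server address, tree name, shot number, and signal path are hypothetical placeholders rather than real PSFC endpoints.

    ```python
    # Assumes the MDSplus Python bindings are installed; all names below are placeholders.
    from MDSplus import Connection

    conn = Connection("mds-server.example.edu")          # remote MDSplus data server
    conn.openTree("example_tree", 1160930033)            # experiment tree and shot number
    signal = conn.get(r"\example_signal")                # expression naming a stored signal
    values = signal.data()                               # samples as a NumPy array
    times = conn.get(r"dim_of(\example_signal)").data()  # the signal's time base
    conn.closeTree("example_tree", 1160930033)
    ```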

    The researchers intend to address barriers to participation for women and disadvantaged groups not only by improving general access to fusion data, but also through a subsidized summer school that will focus on topics at the intersection of fusion and machine learning, which will be held at William & Mary for the next three years.

    On the importance of the research, Rea says, “This project is about responding to the fusion community’s needs and setting ourselves up for success. Scientific advancements in fusion are enabled via multidisciplinary collaboration and cross-pollination, so accessibility is absolutely essential. I think we all understand now that diverse communities have more diverse ideas, and they allow faster problem-solving.”

    The collaboration’s work also aligns with vital areas of research identified in the International Atomic Energy Agency’s “AI for Fusion” Coordinated Research Project (CRP). Rea was selected as the technical coordinator for the IAEA’s CRP emphasizing community engagement and knowledge access to accelerate fusion research and development. In a letter of support written for the group’s proposed project, the IAEA stated that, “the work [the researchers] will carry out […] will be beneficial not only to our CRP but also to the international fusion community at large.”

    PSFC Director and Hitachi America Professor of Engineering Dennis Whyte adds, “I am thrilled to see PSFC and our collaborators be at the forefront of applying new AI tools while simultaneously encouraging and enabling extraction of critical data from our experiments.”

    “Having the opportunity to lead such an important project is extremely meaningful, and I feel a responsibility to show that women are leaders in STEM,” says Rea. “We have an incredible team, strongly motivated to improve our fusion ecosystem and to contribute to making fusion energy a reality.”

  • New clean air and water labs to bring together researchers, policymakers to find climate solutions

    MIT’s Abdul Latif Jameel Poverty Action Lab (J-PAL) is launching the Clean Air and Water Labs, with support from Community Jameel, to generate evidence-based solutions aimed at increasing access to clean air and water.

    Led by J-PAL’s Africa, Middle East and North Africa (MENA), and South Asia regional offices, the labs will partner with government agencies to bring together researchers and policymakers in areas where impactful clean air and water solutions are most urgently needed.

    Together, the labs aim to improve clean air and water access by informing the scaling of evidence-based policies and decisions of city, state, and national governments that serve nearly 260 million people combined.

    The Clean Air and Water Labs expand the work of J-PAL’s King Climate Action Initiative, building on the foundational support of King Philanthropies, which significantly expanded J-PAL’s work at the nexus of climate change and poverty alleviation worldwide. 

    Air pollution, water scarcity and the need for evidence 

    Africa, MENA, and South Asia are on the front lines of global air and water crises. 

    “There is no time to waste investing in solutions that do not achieve their desired effects,” says Iqbal Dhaliwal, global executive director of J-PAL. “By co-generating rigorous real-world evidence with researchers, policymakers can have the information they need to dedicate resources to scaling up solutions that have been shown to be effective.”

    In India, about 75 percent of households did not have drinking water on premises in 2018. In MENA, nearly 90 percent of children live in areas facing high or extreme water stress. Across Africa, almost 400 million people lack access to safe drinking water. 

    Simultaneously, air pollution is one of the greatest threats to human health globally. In India, extraordinary levels of air pollution are shortening the average life expectancy by five years. In Africa, rising indoor and ambient air pollution contributed to 1.1 million premature deaths in 2019. 

    There is increasing urgency to find high-impact and cost-effective solutions to the worsening threats to human health and resources caused by climate change. However, data and evidence on potential solutions are limited.

    Fostering collaboration to generate policy-relevant evidence 

    The Clean Air and Water Labs will foster deep collaboration between government stakeholders, J-PAL regional offices, and researchers in the J-PAL network. 

    Through the labs, J-PAL will work with policymakers to:

    co-diagnose the most pressing air and water challenges and opportunities for policy innovation;
    expand policymakers’ access to and use of high-quality air and water data;
    co-design potential solutions informed by existing evidence;
    co-generate evidence on promising solutions through rigorous evaluation, leveraging existing and new data sources; and
    support scaling of air and water policies and programs that are found to be effective through evaluation. 

    A research and scaling fund for each lab will prioritize resources for co-generated pilot studies, randomized evaluations, and scaling projects. 

    The labs will also collaborate with C40 Cities, a global network of mayors of the world’s leading cities that are united in action to confront the climate crisis, to share policy-relevant evidence and identify potential new connections and research opportunities within India and across Africa.

    This model aims to strengthen the use of evidence in decision-making to ensure solutions are highly effective and to guide research to answer policymakers’ most urgent questions. J-PAL Africa, MENA, and South Asia’s strong on-the-ground presence will further bridge research and policy work by anchoring activities within local contexts. 

    “Communities across the world continue to face challenges in accessing clean air and water, a threat to human safety that has only been exacerbated by the climate crisis, along with rising temperatures and other hazards,” says George Richards, director of Community Jameel. “Through our collaboration with J-PAL and C40 in creating climate policy labs embedded in city, state, and national governments in Africa and South Asia, we are committed to innovative and science-based approaches that can help hundreds of millions of people enjoy healthier lives.”

    J-PAL Africa, MENA, and South Asia will formally launch Clean Air and Water Labs with government partners over the coming months. J-PAL is housed in the MIT Department of Economics, within the School of Humanities, Arts, and Social Sciences.

  • Supporting sustainability, digital health, and the future of work

    The MIT and Accenture Convergence Initiative for Industry and Technology has selected three new research projects to receive support. The projects aim to accelerate progress in meeting complex societal needs through new insights at the convergence of business, technology, and innovation.

    Established in MIT’s School of Engineering and now in its third year, the MIT and Accenture Convergence Initiative is furthering its mission to bring together technological experts from across business and academia to share insights and learn from one another. Recently, Thomas W. Malone, the Patrick J. McGovern (1959) Professor of Management, joined the initiative as its first-ever faculty lead. The research projects relate to three of the initiative’s key focus areas: sustainability, digital health, and the future of work.

    “The solutions these research teams are developing have the potential to have tremendous impact,” says Anantha Chandrakasan, dean of the School of Engineering and the Vannevar Bush Professor of Electrical Engineering and Computer Science. “They embody the initiative’s focus on advancing data-driven research that addresses technology and industry convergence.”

    “The convergence of science and technology driven by advancements in generative AI, digital twins, quantum computing, and other technologies makes this an especially exciting time for Accenture and MIT to be undertaking this joint research,” says Kenneth Munie, senior managing director at Accenture Strategy, Life Sciences. “Our three new research projects focusing on sustainability, digital health, and the future of work have the potential to help guide and shape future innovations that will benefit the way we work and live.”

    The MIT and Accenture Convergence Initiative charter project researchers are described below.

    Accelerating the journey to net zero with industrial clusters

    Jessika Trancik is a professor at the Institute for Data, Systems, and Society (IDSS). Trancik’s research examines the dynamic costs, performance, and environmental impacts of energy systems to inform climate policy and accelerate beneficial and equitable technology innovation. Trancik’s project aims to identify how industrial clusters can enable companies to derive greater value from decarbonization, potentially making companies more willing to invest in the clean energy transition.

    To meet the ambitious climate goals that have been set by countries around the world, rising greenhouse gas emissions trends must be rapidly reversed. Industrial clusters — geographically co-located or otherwise-aligned groups of companies representing one or more industries — account for a significant portion of greenhouse gas emissions globally. With major energy consumers “clustered” in proximity, industrial clusters provide a potential platform to scale low-carbon solutions by enabling the aggregation of demand and the coordinated investment in physical energy supply infrastructure.

    In addition to Trancik, the research team working on this project will include Aliza Khurram, a postdoc in IDSS; Micah Ziegler, an IDSS research scientist; Melissa Stark, global energy transition services lead at Accenture; Laura Sanderfer, strategy consulting manager at Accenture; and Maria De Miguel, strategy senior analyst at Accenture.

    Eliminating childhood obesity

    Anette “Peko” Hosoi is the Neil and Jane Pappalardo Professor of Mechanical Engineering. A common theme in her work is the fundamental study of shape, kinematic, and rheological optimization of biological systems with applications to the emergent field of soft robotics. Her project will use both data from existing studies and synthetic data to create a return-on-investment (ROI) calculator for childhood obesity interventions so that companies can identify earlier returns on their investment beyond reduced health-care costs.

    Childhood obesity is too prevalent to be solved by a single company, industry, drug, application, or program. In addition to the physical and emotional impact on children, society bears a cost through excess health care spending, lost workforce productivity, poor school performance, and increased family trauma. Meaningful solutions require multiple organizations, representing different parts of society, working together with a common understanding of the problem, the economic benefits, and the return on investment. ROI is particularly difficult to defend for any single organization because investment and return can be separated by many years and involve asymmetric investments, returns, and allocation of risk. Hosoi’s project will consider the incentives for a particular entity to invest in programs in order to reduce childhood obesity.
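
    As a flavor of what such a calculator must handle, the sketch below discounts each year’s costs and benefits so that returns arriving years after the investment can be compared on equal footing. The cash flows and discount rate are invented for illustration and are not outputs of Hosoi’s model.

    ```python
    def roi(cash_flows, discount_rate=0.03):
        """Net-present-value ROI for a stream of (cost, benefit) pairs, one per year.

        A value of 0.25 means 25 cents of discounted benefit returned per
        discounted dollar invested."""
        pv_cost = sum(c / (1 + discount_rate) ** t for t, (c, _) in enumerate(cash_flows))
        pv_benefit = sum(b / (1 + discount_rate) ** t for t, (_, b) in enumerate(cash_flows))
        return (pv_benefit - pv_cost) / pv_cost

    # Hypothetical intervention: costs land up front, while benefits (avoided health
    # spending, better school performance, workforce productivity) arrive years later.
    flows = [(100_000, 0), (50_000, 10_000), (0, 40_000), (0, 60_000), (0, 80_000)]
    print(f"ROI: {roi(flows):.1%}")
    ```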

    Hosoi will be joined by graduate students Pragya Neupane and Rachael Kha, both of IDSS, as well as a team from Accenture that includes Kenneth Munie, senior managing director at Accenture Strategy, Life Sciences; Kaveh Safavi, senior managing director in Accenture Health Industry; and Elizabeth Naik, global health and public service research lead.

    Generating innovative organizational configurations and algorithms for dealing with the problem of post-pandemic employment

    Thomas Malone is the Patrick J. McGovern (1959) Professor of Management at the MIT Sloan School of Management and the founding director of the MIT Center for Collective Intelligence. His research focuses on how new organizations can be designed to take advantage of the possibilities provided by information technology. Malone will be joined in this project by John Horton, the Richard S. Leghorn (1939) Career Development Professor at the MIT Sloan School of Management, whose research focuses on the intersection of labor economics, market design, and information systems. Malone and Horton’s project will look to reshape the future of work with the help of lessons learned in the wake of the pandemic.

    The Covid-19 pandemic has been a major disrupter of work and employment, and it is not at all obvious how governments, businesses, and other organizations should manage the transition to a desirable state of employment as the pandemic recedes. Using natural language processing algorithms such as GPT-4, this project will look to identify new ways that companies can use AI to better match applicants to necessary jobs, create new types of jobs, assess skill training needed, and identify interventions to help include women and other groups whose employment was disproportionately affected by the pandemic.

    In addition to Malone and Horton, the research team will include Rob Laubacher, associate director and research scientist at the MIT Center for Collective Intelligence, and Kathleen Kennedy, executive director at the MIT Center for Collective Intelligence and senior director at MIT Horizon. The team will also include Nitu Nivedita, managing director of artificial intelligence at Accenture, and Thomas Hancock, data science senior manager at Accenture.

  • Artificial intelligence for augmentation and productivity

    The MIT Stephen A. Schwarzman College of Computing has awarded seed grants to seven projects that are exploring how artificial intelligence and human-computer interaction can be leveraged to enhance modern work spaces to achieve better management and higher productivity.

    Funded by Andrew W. Houston ’05 and Dropbox Inc., the projects are intended to be interdisciplinary and bring together researchers from computing, social sciences, and management.

    The seed grants can enable the project teams to conduct research that leads to bigger endeavors in this rapidly evolving area, as well as build community around questions related to AI-augmented management.

    The seven selected projects and research leads include:

    “LLMex: Implementing Vannevar Bush’s Vision of the Memex Using Large Language Models,” led by Pattie Maes of the Media Lab and David Karger of the Department of Electrical Engineering and Computer Science (EECS) and the Computer Science and Artificial Intelligence Laboratory (CSAIL). Inspired by Vannevar Bush’s Memex, this project proposes to design, implement, and test the concept of memory prosthetics using large language models (LLMs). The AI-based system will intelligently help an individual keep track of vast amounts of information, accelerate productivity, and reduce errors by automatically recording their work actions and meetings, supporting retrieval based on metadata and vague descriptions, and suggesting relevant, personalized information proactively based on the user’s current focus and context.

    “Using AI Agents to Simulate Social Scenarios,” led by John Horton of the MIT Sloan School of Management and Jacob Andreas of EECS and CSAIL. This project imagines the ability to easily simulate policies, organizational arrangements, and communication tools with AI agents before implementation. Tapping into the capabilities of modern LLMs to serve as a computational model of humans makes this vision of social simulation more realistic, and potentially more predictive.

    “Human Expertise in the Age of AI: Can We Have Our Cake and Eat it Too?” led by Manish Raghavan of MIT Sloan and EECS, and Devavrat Shah of EECS and the Laboratory for Information and Decision Systems. Progress in machine learning, AI, and in algorithmic decision aids has raised the prospect that algorithms may complement human decision-making in a wide variety of settings. Rather than replacing human professionals, this project sees a future where AI and algorithmic decision aids play a role that is complementary to human expertise.

    “Implementing Generative AI in U.S. Hospitals,” led by Julie Shah of the Department of Aeronautics and Astronautics and CSAIL, Retsef Levi of MIT Sloan and the Operations Research Center, Kate Kellogg of MIT Sloan, and Ben Armstrong of the Industrial Performance Center. In recent years, studies have linked a rise in burnout among doctors and nurses in the United States with increased administrative burdens associated with electronic health records and other technologies. This project aims to develop a holistic framework to study how generative AI technologies can both increase productivity for organizations and improve job quality for workers in health care settings.

    “Generative AI Augmented Software Tools to Democratize Programming,” led by Harold Abelson of EECS and CSAIL, Cynthia Breazeal of the Media Lab, and Eric Klopfer of Comparative Media Studies/Writing. Progress in generative AI over the past year is fomenting an upheaval in assumptions about future careers in software and deprecating the role of coding. This project will stimulate a similar transformation in computing education for those who have no prior technical training by creating a software tool that could eliminate much of the need for learners to deal with code when creating applications.

    “Acquiring Expertise and Societal Productivity in a World of Artificial Intelligence,” led by David Atkin and Martin Beraja of the Department of Economics, and Danielle Li of MIT Sloan. Generative AI is thought to augment the capabilities of workers performing cognitive tasks. This project seeks to better understand how the arrival of AI technologies may impact skill acquisition and productivity, and to explore complementary policy interventions that will allow society to maximize the gains from such technologies.

    “AI Augmented Onboarding and Support,” led by Tim Kraska of EECS and CSAIL, and Christoph Paus of the Department of Physics. While LLMs have made enormous leaps forward in recent years and are poised to fundamentally change the way students and professionals learn about new tools and systems, there is often a steep learning curve that people must climb to make full use of these resources. To help mitigate this issue, this project proposes the development of new LLM-powered onboarding and support systems that will positively impact the way support teams operate and improve the user experience.

  • To improve solar and other clean energy tech, look beyond hardware

    To continue reducing the costs of solar energy and other clean energy technologies, scientists and engineers will likely need to focus, at least in part, on improving technology features that are not based on hardware, according to MIT researchers. They describe this finding and the mechanisms behind it today in Nature Energy.

    While the cost of installing a solar energy system has dropped by more than 99 percent since 1980, this new analysis shows that “soft technology” features, such as the codified permitting practices, supply chain management techniques, and system design processes that go into deploying a solar energy plant, contributed only 10 to 15 percent of total cost declines. Improvements to hardware features were responsible for the lion’s share.

    But because soft technology is increasingly dominating the total costs of installing solar energy systems, this trend threatens to slow future cost savings and hamper the global transition to clean energy, says the study’s senior author, Jessika Trancik, a professor in MIT’s Institute for Data, Systems, and Society (IDSS).

    Trancik’s co-authors include lead author Magdalena M. Klemun, a former IDSS graduate student and postdoc who is now an assistant professor at the Hong Kong University of Science and Technology; Goksin Kavlak, a former IDSS graduate student and postdoc who is now an associate at the Brattle Group; and James McNerney, a former IDSS postdoc and now senior research fellow at the Harvard Kennedy School.

    The team created a quantitative model to analyze the cost evolution of solar energy systems, which captures the contributions of both hardware technology features and soft technology features.

    The framework shows that soft technology hasn’t improved much over time — and that soft technology features contributed even less to overall cost declines than previously estimated.

    Their findings indicate that to reverse this trend and accelerate cost declines, engineers could look at making solar energy systems less reliant on soft technology to begin with, or they could tackle the problem directly by improving inefficient deployment processes.  

    “Really understanding where the efficiencies and inefficiencies are, and how to address those inefficiencies, is critical in supporting the clean energy transition. We are making huge investments of public dollars into this, and soft technology is going to be absolutely essential to making those funds count,” says Trancik.

    “However,” Klemun adds, “we haven’t been thinking about soft technology design as systematically as we have for hardware. That needs to change.”

    The hard truth about soft costs

    Researchers have observed that the so-called “soft costs” of building a solar power plant — the costs of designing and installing the plant — are becoming a much larger share of total costs. In fact, the share of soft costs now typically ranges from 35 to 64 percent.

    “We wanted to take a closer look at where these soft costs were coming from and why they weren’t coming down over time as quickly as the hardware costs,” Trancik says.

    In the past, scientists have modeled the change in solar energy costs by dividing total costs into additive components — hardware components and nonhardware components — and then tracking how these components changed over time.

    “But if you really want to understand where those rates of change are coming from, you need to go one level deeper to look at the technology features. Then things split out differently,” Trancik says.

    The researchers developed a quantitative approach that models the change in solar energy costs over time by assigning contributions to the individual technology features, including both hardware features and soft technology features.

    For instance, their framework would capture how much of the decline in system installation costs — a soft cost — is due to standardized practices of certified installers — a soft technology feature. It would also capture how that same soft cost is affected by increased photovoltaic module efficiency — a hardware technology feature.

    With this approach, the researchers saw that improvements in hardware had the greatest impacts on driving down soft costs in solar energy systems. For example, the efficiency of photovoltaic modules doubled between 1980 and 2017, reducing overall system costs by 17 percent. But about 40 percent of that overall decline could be attributed to reductions in soft costs tied to improved module efficiency.
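
    A stylized per-watt calculation illustrates how that kind of attribution works; the numbers below are invented for illustration and are not the paper’s estimates. Doubling module efficiency halves the number of modules, and therefore the installation labor, needed per watt — so a soft cost falls because of a hardware feature.

    ```python
    def installation_cost_per_watt(module_efficiency, labor_hours_per_module=2.0,
                                   wage_per_hour=30.0, module_area_m2=1.6,
                                   irradiance_w_per_m2=1000.0):
        """A soft cost (installation labor per watt) written as a function of a
        hardware feature (module efficiency): better modules mean fewer modules,
        and less on-site labor, per watt of capacity."""
        watts_per_module = module_efficiency * module_area_m2 * irradiance_w_per_m2
        return (labor_hours_per_module * wage_per_hour) / watts_per_module

    old = installation_cost_per_watt(module_efficiency=0.09)   # hypothetical early-era module
    new = installation_cost_per_watt(module_efficiency=0.18)   # efficiency roughly doubled
    print(f"installation labor: ${old:.3f}/W -> ${new:.3f}/W "
          f"({(old - new) / old:.0%} decline traced to a hardware feature)")
    ```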

    The framework shows that, while hardware technology features tend to improve many cost components, soft technology features affect only a few.

    “You can see this structural difference even before you collect data on how the technologies have changed over time. That’s why mapping out a technology’s network of cost dependencies is a useful first step to identify levers of change, for solar PV and for other technologies as well,” Klemun notes.  

    Static soft technology

    The researchers used their model to study several countries, since soft costs can vary widely around the world. For instance, solar energy soft costs in Germany are about 50 percent less than those in the U.S.

    The fact that hardware technology improvements are often shared globally led to dramatic declines in costs over the past few decades across locations, the analysis showed. Soft technology innovations typically aren’t shared across borders. Moreover, the team found that countries with better soft technology performance 20 years ago still have better performance today, while those with worse performance didn’t see much improvement.

    This country-by-country difference could be driven by regulation and permitting processes, cultural factors, or by market dynamics such as how firms interact with each other, Trancik says.

    “But not all soft technology variables are ones that you would want to change in a cost-reducing direction, like lower wages. So, there are other considerations, beyond just bringing the cost of the technology down, that we need to think about when interpreting these results,” she says.

    Their analysis points to two strategies for reducing soft costs. For one, scientists could focus on developing hardware improvements that make soft costs more dependent on hardware technology variables and less on soft technology variables, such as by creating simpler, more standardized equipment that could reduce on-site installation time.

    Or researchers could directly target soft technology features without changing hardware, perhaps by creating more efficient workflows for system installation or automated permitting platforms.

    “In practice, engineers will often pursue both approaches, but separating the two in a formal model makes it easier to target innovation efforts by leveraging specific relationships between technology characteristics and costs,” Klemun says.

    “Often, when we think about information processing, we are leaving out processes that still happen in a very low-tech way through people communicating with one another. But it is just as important to think about that as a technology as it is to design fancy software,” Trancik notes.

    In the future, she and her collaborators want to apply their quantitative model to study the soft costs related to other technologies, such as electric vehicle charging and nuclear fission. They are also interested in better understanding the limits of soft technology improvement, and how one could design better soft technology from the outset.

    This research is funded by the U.S. Department of Energy Solar Energy Technologies Office.

  • How machine learning models can amplify inequities in medical diagnosis and treatment

    Prior to receiving a PhD in computer science from MIT in 2017, Marzyeh Ghassemi had already begun to wonder whether the use of AI techniques might enhance the biases that already existed in health care. She was one of the early researchers to take up this issue, and she’s been exploring it ever since. In a new paper, Ghassemi, now an assistant professor in MIT’s Department of Electrical Engineering and Computer Science (EECS), and three collaborators based at the Computer Science and Artificial Intelligence Laboratory, have probed the roots of the disparities that can arise in machine learning, often causing models that perform well overall to falter when it comes to subgroups for which relatively few data have been collected and utilized in the training process. The paper — written by two MIT PhD students, Yuzhe Yang and Haoran Zhang, EECS computer scientist Dina Katabi (the Thuan and Nicole Pham Professor), and Ghassemi — was presented last month at the 40th International Conference on Machine Learning in Honolulu, Hawaii.

    In their analysis, the researchers focused on “subpopulation shifts” — differences in the way machine learning models perform for one subgroup as compared to another. “We want the models to be fair and work equally well for all groups, but instead we consistently observe the presence of shifts among different groups that can lead to inferior medical diagnosis and treatment,” says Yang, who, along with Zhang, is a lead author of the paper. The main point of their inquiry is to determine the kinds of subpopulation shifts that can occur and to uncover the mechanisms behind them so that, ultimately, more equitable models can be developed.

    The new paper “significantly advances our understanding” of the subpopulation shift phenomenon, claims Stanford University computer scientist Sanmi Koyejo. “This research contributes valuable insights for future advancements in machine learning models’ performance on underrepresented subgroups.”

    Camels and cattle

    The MIT group has identified four principal types of shifts — spurious correlations, attribute imbalance, class imbalance, and attribute generalization — which, according to Yang, “have never been put together into a coherent and unified framework. We’ve come up with a single equation that shows you where biases can come from.”

    Biases can, in fact, stem from what the researchers call the class, or from the attribute, or both. To pick a simple example, suppose the task assigned to the machine learning model is to sort images of objects — animals in this case — into two classes: cows and camels. Attributes are descriptors that don’t specifically relate to the class itself. It might turn out, for instance, that all the images used in the analysis show cows standing on grass and camels on sand — grass and sand serving as the attributes here. Given the data available to it, the machine could reach an erroneous conclusion — namely that cows can only be found on grass, not on sand, with the opposite being true for camels. Such a finding would be incorrect, however, giving rise to a spurious correlation, which, Yang explains, is a “special case” among subpopulation shifts — “one in which you have a bias in both the class and the attribute.”

    In a medical setting, one could rely on machine learning models to determine whether a person has pneumonia or not based on an examination of X-ray images. There would be two classes in this situation, one consisting of people who have the lung ailment, another for those who are infection-free. A relatively straightforward case would involve just two attributes: the people getting X-rayed are either female or male. If, in this particular dataset, there were 100 males diagnosed with pneumonia for every one female diagnosed with pneumonia, that could lead to an attribute imbalance, and the model would likely do a better job of correctly detecting pneumonia for a man than for a woman. Similarly, having 1,000 times more healthy (pneumonia-free) subjects than sick ones would lead to a class imbalance, with the model biased toward healthy cases. Attribute generalization is the last shift highlighted in the new study. If your sample contained 100 male patients with pneumonia and zero female subjects with the same illness, you still would like the model to be able to generalize and make predictions about female subjects even though there are no samples in the training data for females with pneumonia.
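
    In code, the four shift types show up as different patterns in the table of (class, attribute) counts. The toy tally below uses invented numbers that mirror the pneumonia example above; it is meant only to make the distinctions concrete.

    ```python
    from collections import Counter

    # Invented counts: class = pneumonia vs. healthy, attribute = male vs. female.
    counts = Counter({
        ("pneumonia", "male"): 100,
        ("pneumonia", "female"): 1,
        ("healthy", "male"): 50_000,
        ("healthy", "female"): 50_000,
    })

    class_totals = Counter()
    attrs_within_class = {}
    for (cls, attr), n in counts.items():
        class_totals[cls] += n
        attrs_within_class.setdefault(cls, Counter())[attr] += n

    # Class imbalance: healthy examples vastly outnumber pneumonia examples.
    print("class totals:", dict(class_totals))

    # Attribute imbalance: within the pneumonia class, males outnumber females 100:1.
    print("within pneumonia:", dict(attrs_within_class["pneumonia"]))

    # Attribute generalization: any (class, attribute) cell with zero examples forces
    # the model to extrapolate; if ("pneumonia", "female") were 0, it would appear here.
    unseen = [(c, a) for c in class_totals for a in ("male", "female") if counts[(c, a)] == 0]
    print("unseen (class, attribute) groups:", unseen)

    # Spurious correlation is the case where class and attribute are skewed together,
    # so the attribute alone (sex here, sand vs. grass above) predicts the class label.
    ```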

    The team then took 20 advanced algorithms, designed to carry out classification tasks, and tested them on a dozen datasets to see how they performed across different population groups. They reached some unexpected conclusions: By improving the “classifier,” which is the last layer of the neural network, they were able to reduce the occurrence of spurious correlations and class imbalance, but the other shifts were unaffected. Improvements to the “encoder,” the part of the neural network that transforms raw inputs into feature representations, could reduce the problem of attribute imbalance. “However, no matter what we did to the encoder or classifier, we did not see any improvements in terms of attribute generalization,” Yang says, “and we don’t yet know how to address that.”
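
    Retraining only the classifier is cheap because the encoder’s features stay frozen. The sketch below illustrates the general idea by re-fitting the last linear layer on a group-balanced resample of the training features; it is a generic illustration with random stand-in data, not the procedure evaluated in the paper.

    ```python
    import torch
    import torch.nn.functional as F

    def retrain_classifier(features, labels, groups, num_classes, epochs=200, lr=0.1):
        """Re-fit only the final linear layer on a group-balanced resample.

        features: (N, D) frozen encoder outputs; labels: (N,) class ids;
        groups: (N,) subgroup ids (e.g., class-attribute combinations).
        Balancing the groups counters class imbalance and spurious correlations
        absorbed during the original end-to-end training."""
        group_ids = groups.unique()
        per_group = min(int((groups == g).sum()) for g in group_ids)
        keep = []
        for g in group_ids:
            members = torch.nonzero(groups == g).flatten()
            keep.append(members[torch.randperm(members.numel())[:per_group]])
        idx = torch.cat(keep)
        x, y = features[idx], labels[idx]

        head = torch.nn.Linear(features.shape[1], num_classes)
        opt = torch.optim.SGD(head.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            F.cross_entropy(head(x), y).backward()
            opt.step()
        return head

    # Toy usage with random features standing in for a frozen encoder's outputs.
    feats = torch.randn(1000, 32)
    labels = torch.randint(0, 2, (1000,))
    groups = torch.randint(0, 4, (1000,))   # e.g., (class, attribute) combinations
    new_head = retrain_classifier(feats, labels, groups, num_classes=2)
    ```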

    Precisely accurate

    There is also the question of assessing how well your model actually works in terms of evenhandedness among different population groups. The metric normally used, called worst-group accuracy or WGA, is based on the assumption that if you can improve the accuracy — of, say, medical diagnosis — for the group that has the worst model performance, you would have improved the model as a whole. “The WGA is considered the gold standard in subpopulation evaluation,” the authors contend, but they made a surprising discovery: boosting worst-group accuracy results in a decrease in what they call “worst-case precision.” In medical decision-making of all sorts, one needs both accuracy — which speaks to the validity of the findings — and precision, which relates to the reliability of the methodology. “Precision and accuracy are both very important metrics in classification tasks, and that is especially true in medical diagnostics,” Yang explains. “You should never trade precision for accuracy. You always need to balance the two.”
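
    The trade-off is easier to see with the metrics written out. The sketch below computes per-group accuracy and per-group precision from the same predictions and reports the worst group for each, as a stand-in for what the paper calls worst-case precision; it is generic evaluation code with random placeholder labels, not the paper’s implementation.

    ```python
    import numpy as np

    def worst_group_accuracy(y_true, y_pred, groups):
        """Lowest per-group accuracy: how badly does the model do on its worst group?"""
        return min(np.mean(y_pred[groups == g] == y_true[groups == g])
                   for g in np.unique(groups))

    def worst_group_precision(y_true, y_pred, groups, positive=1):
        """Lowest per-group precision: within each group, how trustworthy is a
        positive prediction (e.g., a pneumonia diagnosis)?"""
        precisions = []
        for g in np.unique(groups):
            mask = (groups == g) & (y_pred == positive)
            if mask.any():
                precisions.append(np.mean(y_true[mask] == positive))
        return min(precisions) if precisions else float("nan")

    # Toy usage with random labels; in practice these come from a trained model.
    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, 500)
    y_pred = rng.integers(0, 2, 500)
    groups = rng.integers(0, 4, 500)
    print(worst_group_accuracy(y_true, y_pred, groups),
          worst_group_precision(y_true, y_pred, groups))
    ```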

    The MIT scientists are putting their theories into practice. In a study they’re conducting with a medical center, they’re looking at public datasets for tens of thousands of patients and hundreds of thousands of chest X-rays, trying to see whether it’s possible for machine learning models to work in an unbiased manner for all populations. That’s still far from the case, even though more awareness has been drawn to this problem, Yang says. “We are finding many disparities across different ages, gender, ethnicity, and intersectional groups.”

    He and his colleagues agree on the eventual goal, which is to achieve fairness in health care among all populations. But before we can reach that point, they maintain, we still need a better understanding of the sources of unfairness and how they permeate our current system. Reforming the system as a whole will not be easy, they acknowledge. In fact, the title of the paper they introduced at the Honolulu conference, “Change is Hard,” gives some indication of the challenges that they and like-minded researchers face.