    MIT Faculty Founder Initiative announces three winners of entrepreneurship awards

    Patients with intractable cancers, chronic pain sufferers, and people who depend on battery-powered medical implants may all benefit from the ideas presented at the 2023-24 MIT-Royalty Pharma Prize Competition’s recent awards. This year’s top prizes went to researchers and biotech entrepreneurs Anne Carpenter, Frederike Petzschner, and Betar Gallant ’08, SM ’10, PhD ’13.

    MIT Faculty Founder Initiative Executive Director Kit Hickey MBA ’13 describes the time and hard work the three awardees and other finalists devoted to the initiative and its mission of cultivating female faculty in biotech to cross the chasm between laboratory research and its clinical application.

    “They have taken the first brave step of getting off the bench when they already work seven days a week. They have carved out time from their facilities, from their labs, from their lives in order to put themselves out there and leap into entrepreneurship,” Hickey says. “They’ve done it because they each want to see their innovations out in the world improving patients’ lives.”

    Carpenter, senior director of the Imaging Platform at the Broad Institute of MIT and Harvard, where she is also an institute scientist, won the competition’s $250,000 2023-24 MIT-Royalty Pharma Faculty Founder Prize Competition Grand Prize. Carpenter specializes in using microscopy imaging of cells and computational methods such as machine learning to accelerate the identification of chemical compounds with therapeutic potential to, for instance, shrink tumors. The identified compounds are then tested in biological assays that model the tumor ecosystem to see how the compounds would perform on actual tumors.

    Carpenter’s startup, SyzOnc, launched in April, a feat Carpenter associates with the assistance provided by the MIT Faculty Founder Initiative.
Participants in the program receive mentorship, stipends, and advice from industry experts, as well as help with incorporating, assembling a management team, fundraising, and intellectual property strategy.

    “The program offered key insights and input at major decision points that gave us the momentum to open our doors,” Carpenter says, adding that participating “offered validation of our scientific ideas and business plan. That kind of credibility is really helpful to raising funding, particularly for those starting their first company.”

    Carpenter says she and her team will employ “the best biological and computational advancements to develop new therapies to fight tumors such as sarcoma, pancreatic cancer, and glioblastoma, which currently have dismal survival rates.”

    The MIT Faculty Founder Initiative was begun in 2020 by the School of Engineering and the Martin Trust Center for MIT Entrepreneurship, based on research findings by Sangeeta Bhatia, the Wilson Professor of Health Sciences and Technology, professor of electrical engineering and computer science, and faculty director of the MIT Faculty Founder Initiative; Susan Hockfield, MIT Corporation life member, MIT president emerita, and professor of neuroscience; and Nancy Hopkins, professor emerita of biology. An investigation they conducted showed that only about 9 percent of MIT’s 250 biotech startups were started by women, whereas women made up 22 percent of the faculty, as was presented in a 2021 MIT Faculty Newsletter.

    That data showed that “technologies from female labs were not getting out in the world, resulting in lost potential,” Hickey says.

    “The MIT Faculty Founder Initiative plays a pivotal role in MIT’s entrepreneurship ecosystem.
It elevates visionary faculty working on solutions in biotech by providing them with critical mentorship and resources, ensuring these solutions can be rapidly scaled to market,” says Anantha Chandrakasan, MIT’s chief innovation and strategy officer, dean of engineering, and Vannevar Bush Professor of Electrical Engineering and Computer Science.

    The MIT Faculty Founder Initiative Prize Competition was launched in 2021. At this year’s competition, the judges represented academia, health care, biotech, and financial investment. In addition to awarding a grand prize, the competition also distributed two $100,000 prizes, one to a researcher from Brown University, the first university to collaborate with MIT in the entrepreneurship program.

    This year’s winner of the $100,000 2023-24 MIT-Royalty Pharma Faculty Founder Prize Competition Runner-Up Prize was Frederike Petzschner, assistant professor at the Carney Institute for Brain Science at Brown, for her SOMA startup’s digital pain management system, which helps sufferers to manage and relieve chronic pain.

    “We leverage cutting-edge technology to provide precision care, focusing specifically on personalized cognitive interventions tailored to each patient’s unique needs,” she says.

    With her startup on the verge of incorporating, Petzschner says, “without the Faculty Founder Initiative, our startup would still be pursuing commercialization, but undoubtedly at a much earlier and perhaps less structured stage.”

    “The constant support from the program organizers and our mentors was truly transformative,” she says.

    Gallant, associate professor of mechanical engineering at MIT and winner of the $100,000 2023-24 MIT-Royalty Pharma Faculty Founder Prize Competition Breakthrough Prize, is leading the startup Halogen.
An expert on advanced battery technologies, Gallant and her team have developed high-density battery storage to improve the lifetime and performance of such medical devices as pacemakers.

    “If you can extend lifetime, you’re talking about longer times between invasive replacement surgeries, which really affects patient quality of life,” Gallant told MIT News in a 2022 interview.

    Jim Reddoch, executive vice president and chief scientific officer of sponsor Royalty Pharma, emphasized his company’s support for both the competition and the MIT Faculty Founder Initiative program.

    “Royalty Pharma is thrilled to support the 2023-2024 MIT-Royalty Pharma Prize Competition and accelerate life sciences innovation at leading research institutions such as MIT and Brown,” Reddoch says. “By supporting the amazing female entrepreneurs in this program, we hope to catalyze more ideas from the lab to biotech companies and eventually into the hands of patients.”

    Bhatia has referred to the MIT Faculty Founder Initiative as a “playbook” on how to direct female faculty’s high-impact technologies that are not being commercialized into the world of health care.

    “To me, changing the game means that when you have an invention in your lab, you’re connected enough to the ecosystem to know when it should be a company, and to know who to call and how to get your first investors and how to quickly catalyze your team — and you’re off to the races,” Bhatia says. “Every one of those inventions can be a medicine as quickly as possible. That’s the future I imagine.”

    Co-founder Hockfield referred to MIT’s role in promoting entrepreneurship in remarks at the award ceremony, alluding to Brown University’s having joined the effort.

    “MIT has always been a leader in entrepreneurship,” Hockfield says. “Part of leading is sharing with the world.
The collaboration with Brown University for this cohort shows that MIT can share our approach with the world, allowing other universities to follow our model of supporting academic entrepreneurship.”

    Hickey says that when she and Bhatia asked 30 female faculty members three years ago why they were not commercializing their technologies, many said they had no access to the appropriate networks of mentors, investors, role models, and business partners necessary to begin the journey.

    “We encourage you to become this network that has been missing,” Hickey told the awards event audience, which included an array of leaders in the biotech world. “Get to know our amazing faculty members and continue to support them. Become a part of this movement.”

    A data-driven approach to making better choices

    Imagine a world in which some important decision — a judge’s sentencing recommendation, a child’s treatment protocol, which person or business should receive a loan — was made more reliable because a well-designed algorithm helped a key decision-maker arrive at a better choice. A new MIT economics course is investigating these interesting possibilities.

    Class 14.163 (Algorithms and Behavioral Science) is a new cross-disciplinary course focused on behavioral economics, which studies the cognitive capacities and limitations of human beings. The course was co-taught this past spring by assistant professor of economics Ashesh Rambachan and visiting lecturer Sendhil Mullainathan.

    Rambachan studies the economic applications of machine learning, focusing on algorithmic tools that drive decision-making in the criminal justice system and consumer lending markets. He also develops methods for determining causation using cross-sectional and dynamic data.

    Mullainathan will soon join the MIT departments of Electrical Engineering and Computer Science and Economics as a professor. His research uses machine learning to understand complex problems in human behavior, social policy, and medicine. Mullainathan co-founded the Abdul Latif Jameel Poverty Action Lab (J-PAL) in 2003.

    The new course’s goals are both scientific (to understand people) and policy-driven (to improve society by improving decisions).
Rambachan believes that machine-learning algorithms provide new tools for both the scientific and applied goals of behavioral economics.

    “The course investigates the deployment of computer science, artificial intelligence (AI), economics, and machine learning in service of improved outcomes and reduced instances of bias in decision-making,” Rambachan says.

    There are opportunities, Rambachan believes, for constantly evolving digital tools like AI, machine learning, and large language models (LLMs) to help reshape everything from discriminatory practices in criminal sentencing to health-care outcomes among underserved populations.

    Students learn how to use machine learning tools with three main objectives: to understand what they do and how they do it, to formalize behavioral economics insights so they compose well within machine learning tools, and to understand areas and topics where the integration of behavioral economics and algorithmic tools might be most fruitful.

    Students also produce ideas, develop associated research, and see the bigger picture. They’re led to understand where an insight fits and see where the broader research agenda is leading. Participants learn to think critically about what supervised LLMs can (and cannot) do, to understand how to integrate those capacities with the models and insights of behavioral economics, and to recognize the most fruitful areas for the application of what investigations uncover.

    The dangers of subjectivity and bias

    According to Rambachan, behavioral economics acknowledges that biases and mistakes exist throughout our choices, even absent algorithms. “The data used by our algorithms exist outside computer science and machine learning, and instead are often produced by people,” he continues. “Understanding behavioral economics is therefore essential to understanding the effects of algorithms and how to better build them.”

    Rambachan sought to make the course accessible regardless of attendees’ academic backgrounds.
The class included advanced degree students from a variety of disciplines.

    By offering students a cross-disciplinary, data-driven approach to investigating and discovering ways in which algorithms might improve problem-solving and decision-making, Rambachan hopes to build a foundation on which to redesign existing systems of jurisprudence, health care, consumer lending, and industry, to name a few areas.

    “Understanding how data are generated can help us understand bias,” Rambachan says. “We can ask questions about producing a better outcome than what currently exists.”

    Useful tools for re-imagining social operations

    Economics doctoral student Jimmy Lin was skeptical about the claims Rambachan and Mullainathan made when the class began, but changed his mind as the course continued.

    “Ashesh and Sendhil started with two provocative claims: The future of behavioral science research will not exist without AI, and the future of AI research will not exist without behavioral science,” Lin says. “Over the course of the semester, they deepened my understanding of both fields and walked us through numerous examples of how economics informed AI research and vice versa.”

    Lin, who’d previously done research in computational biology, praised the instructors’ emphasis on the importance of a “producer mindset,” thinking about the next decade of research rather than the previous decade. “That’s especially important in an area as interdisciplinary and fast-moving as the intersection of AI and economics — there isn’t an old established literature, so you’re forced to ask new questions, invent new methods, and create new bridges,” he says.

    The speed of change to which Lin alludes is a draw for him, too. “We’re seeing black-box AI methods facilitate breakthroughs in math, biology, physics, and other scientific disciplines,” Lin says.
“AI can change the way we approach intellectual discovery as researchers.”

    An interdisciplinary future for economics and social systems

    Studying traditional economic tools and enhancing their value with AI may yield game-changing shifts in how institutions and organizations teach and empower leaders to make choices.

    “We’re learning to track shifts, to adjust frameworks and better understand how to deploy tools in service of a common language,” Rambachan says. “We must continually interrogate the intersection of human judgment, algorithms, AI, machine learning, and LLMs.”

    Lin enthusiastically recommended the course regardless of students’ backgrounds. “Anyone broadly interested in algorithms in society, applications of AI across academic disciplines, or AI as a paradigm for scientific discovery should take this class,” he says. “Every lecture felt like a goldmine of perspectives on research, novel application areas, and inspiration on how to produce new, exciting ideas.”

    The course, Rambachan says, argues that better-built algorithms can improve decision-making across disciplines. “By building connections between economics, computer science, and machine learning, perhaps we can automate the best of human choices to improve outcomes while minimizing or eliminating the worst,” he says.

    Lin remains excited about the course’s as-yet unexplored possibilities. “It’s a class that makes you excited about the future of research and your own role in it,” he says.

    From steel engineering to ovarian tumor research

    Ashutosh Kumar is a classically trained materials engineer. Having grown up with a passion for making things, he has explored steel design and studied stress fractures in alloys.

    Throughout Kumar’s education, however, he was also drawn to biology and medicine. When he was accepted into an undergraduate metallurgical engineering and materials science program at Indian Institute of Technology (IIT) Bombay, the native of Jamshedpur was very excited — and “a little dissatisfied, since I couldn’t do biology anymore.”

    Now a PhD candidate and a MathWorks Fellow in MIT’s Department of Materials Science and Engineering, Kumar can merge his wide-ranging interests. He studies the effect of certain bacteria that have been observed encouraging the spread of ovarian cancer and possibly reducing the effectiveness of chemotherapy and immunotherapy.

    “Some microbes have an affinity toward infecting ovarian cancer cells, which can lead to changes in the cellular structure and reprogramming cells to survive in stressful conditions,” Kumar says. “This means that cells can migrate to different sites and may have a mechanism to develop chemoresistance. This opens an avenue to develop therapies to see if we can start to undo some of these changes.”

    Kumar’s research combines microbiology, bioengineering, artificial intelligence, big data, and materials science. Using microbiome sequencing and AI, he aims to define microbiome changes that may correlate with poor patient outcomes.
Ultimately, his goal is to engineer bacteriophage viruses to reprogram bacteria to work therapeutically.

    Kumar started inching toward work in the health sciences just months into earning his bachelor’s degree at IIT Bombay.

    “I realized engineering is so flexible that its applications extend to any field,” he says, adding that he started working with biomaterials “to respect both my degree program and my interests.”

    “I loved it so much that I decided to go to graduate school,” he adds.

    Starting his PhD program at MIT, he says, “was a fantastic opportunity to switch gears and work on more interdisciplinary or ‘MIT-type’ work.”

    Kumar says he and Angela Belcher, the James Mason Crafts Professor of biological engineering and materials science, began discussing the impact of the microbiome on ovarian cancer when he first arrived at MIT.

    “I shared my enthusiasm about human health and biology, and we started brainstorming,” he says. “We realized that there’s an unmet need to understand a lot of gynecological cancers. Ovarian cancer is an aggressive cancer, which is usually diagnosed when it’s too late and has already spread.”

    In 2022, Kumar was awarded a MathWorks Fellowship. The fellowships are awarded to School of Engineering graduate students, preferably those who use MATLAB or Simulink — which were developed by the mathematical computer software company MathWorks — in their research. The philanthropic support fueled Kumar’s full transition into health science research.

    “The work we are doing now was initially not funded by traditional sources, and the MathWorks Fellowship gave us the flexibility to pursue this field,” Kumar says. “It provided me with opportunities to learn new skills and ask questions about this topic.
MathWorks gave me a chance to explore my interests and helped me navigate from being a steel engineer to a cancer scientist.”

    Kumar’s work on the relationship between bacteria and ovarian cancer started with studying which bacteria are incorporated into tumors in mouse models.

    “We started looking closely at changes in cell structure and how those changes impact cancer progression,” he says, adding that MATLAB image processing helps him and his collaborators track tumor metastasis.

    The research team also uses RNA sequencing and MATLAB algorithms to construct a taxonomy of the bacteria.

    “Once we have identified the microbiome composition,” Kumar says, “we want to see how the microbiome changes as cancer progresses and identify changes in, let’s say, patients who develop chemoresistance.”

    He says recent findings that ovarian cancer may originate in the fallopian tubes are promising because detecting cancer-related biomarkers or lesions before cancer spreads to the ovaries could lead to better prognoses.

    As he pursues his research, Kumar says he is extremely thankful to Belcher “for believing in me to work on this project.

    “She trusted me and my passion for making an impact on human health — even though I come from a materials engineering background — and supported me throughout. It was her passion to take on new challenges that made it possible for me to work on this idea. She has been an amazing mentor and motivated me to continue moving forward.”

    For her part, Belcher is equally enthralled.

    “It has been amazing to work with Ashutosh on this ovarian cancer microbiome project,” she says. “He has been so passionate and dedicated to looking for less-conventional approaches to solve this debilitating disease. His innovations around looking for very early changes in the microenvironment of this disease could be critical in interception and prevention of ovarian cancer.
We started this project with very little preliminary data, so his MathWorks fellowship was critical in the initiation of the project.”

    Kumar, who has been very active in student government and community-building activities, believes it is very important for students to feel included and at home at their institutions so they can develop in ways outside of academics. He says that his own involvement helps him take time off from work.

    “Science can never stop, and there will always be something to do,” he says, explaining that he deliberately schedules time off and that social engagement helps him to experience downtime. “Engaging with community members through events on campus or at the dorm helps set a mental boundary with work.”

    Kumar regards his unusual route through materials science to cancer research as something that occurred organically.

    “I have observed that life is very dynamic,” he says. “What we think we might do versus what we end up doing is never consistent. Five years back, I had no idea I would be at MIT working with such excellent scientific mentors around me.”

    Growing our donated organ supply

    For those in need of one, an organ transplant is a matter of life and death. 

    Every year, the medical procedure gives thousands of people with advanced or end-stage diseases extended life. This “second chance” is heavily dependent on the availability, compatibility, and proximity of a precious resource that can’t be simply bought, grown, or manufactured — at least not yet.

    Instead, organs must be given — cut from one body and implanted into another. And because living organ donation is only viable in certain cases, many organs are only available for donation after the donor’s death.

    Unsurprisingly, the logistical and ethical complexity of distributing a limited number of transplant organs to a growing wait list of patients has received much attention. There’s an important part of the process that has received less focus, however, and which may hold significant untapped potential: organ procurement itself.

    “If you have a donated organ, who should you give it to? This question has been extensively studied in operations research, economics, and even applied computer science,” says Hammaad Adam, a graduate student in the Social and Engineering Systems (SES) doctoral program at the MIT Institute for Data, Systems, and Society (IDSS). “But there’s been a lot less research on where that organ comes from in the first place.”

    In the United States, nonprofits called organ procurement organizations, or OPOs, are responsible for finding and evaluating potential donors, interacting with grieving families and hospital administrations, and recovering and delivering organs — all while following the federal laws that serve as both their mandate and guardrails. Recent studies estimate that obstacles and inefficiencies lead to thousands of organs going uncollected every year, even as the demand for transplants continues to grow.

    “There’s been little transparent data on organ procurement,” argues Adam. Working with MIT computer science professors Marzyeh Ghassemi and Ashia Wilson, and in collaboration with stakeholders in organ procurement, Adam led a project to create a dataset called ORCHID: Organ Retrieval and Collection of Health Information for Donation. ORCHID contains a decade of clinical, financial, and administrative data from six OPOs.

    “Our goal is for the ORCHID database to have an impact in how organ procurement is understood, internally and externally,” says Ghassemi.

    Efficiency and equity 

    It was looking to make an impact that drew Adam to SES and MIT. Given his background in applied math and experience in strategy consulting, solving problems with technical components sits right in his wheelhouse.

    “I really missed challenging technical problems from a statistics and machine learning standpoint,” he says of his time in consulting. “So I went back and got a master’s in data science, and over the course of my master’s got involved in a bunch of academic research projects in a few different fields, including biology, management science, and public policy. What I enjoyed most were some of the more social science-focused projects that had immediate impact.”

    As a grad student in SES, Adam focuses his research on using statistical tools to uncover health-care inequities and on developing machine learning approaches to address them. “Part of my dissertation research focuses on building tools that can improve equity in clinical trials and other randomized experiments,” he explains.

    One recent example of Adam’s work: developing a novel method to stop clinical trials early if the treatment has an unintended harmful effect for a minority group of participants. “I’ve also been thinking about ways to increase minority representation in clinical trials through improved patient recruitment,” he adds.

    Racial inequities in health care extend into organ transplantation, where a majority of wait-listed patients are not white — far in excess of their demographic groups’ proportion to the overall population. There are fewer organ donations from many of these communities, due to various obstacles in need of better understanding if they are to be overcome. 

    “My work in organ transplantation began on the allocation side,” explains Adam. “In work under review, we examined the role of race in the acceptance of heart, liver, and lung transplant offers by physicians on behalf of their patients. We found that Black race of the patient was associated with significantly lower odds of organ offer acceptance — in other words, transplant doctors seemed more likely to turn down organs offered to Black patients. This trend may have multiple explanations, but it is nevertheless concerning.”

    Adam’s research has also found that donor-candidate race match was associated with significantly higher odds of offer acceptance, an association that Adam says “highlights the importance of organ donation from racial minority communities, and has motivated our work on equitable organ procurement.”

    Working with Ghassemi through the IDSS Initiative on Combatting Systemic Racism, Adam was introduced to OPO stakeholders looking to collaborate. “It’s this opportunity to impact not only health-care efficiency, but also health-care equity, that really got me interested in this research,” says Adam.

    Video: “MIT Initiative on Combatting Systemic Racism – Healthcare” (IDSS)

    Making an impact

    Creating a database like ORCHID means solving problems in multiple domains, from the technical to the political. Some efforts never overcome the first step: getting data in the first place. Thankfully, several OPOs were already seeking collaborations and looking to improve their performance.

    “We have been lucky to have a strong partnership with the OPOs, and we hope to work together to find important insights to improve efficiency and equity,” says Ghassemi.

    The value of a database like ORCHID is in its potential for generating new insights, especially through quantitative analysis with statistics and computing tools like machine learning. The potential value in ORCHID was recognized with an MIT Prize for Open Data, an MIT Libraries award highlighting the importance and impact of research data that is openly shared.

    “It’s nice that the work got some recognition,” says Adam of the prize. “And it was cool to see some of the other great open data work that’s happening at MIT. I think there’s real impact in releasing publicly available data in an important and understudied domain.”

    All the same, Adam knows that building the database is only the first step.

    “I’m very interested in understanding the bottlenecks in the organ procurement process,” he explains. “As part of my thesis research, I’m exploring this by modeling OPO decision-making using causal inference and structural econometrics.”

    Using insights from this research, Adam also aims to evaluate policy changes that can improve both equity and efficiency in organ procurement. “And we’re hoping to recruit more OPOs, and increase the amount of data we’re releasing,” he says. “The dream state is every OPO joins our collaboration and provides updated data every year.”

    Adam is excited to see how other researchers might use the data to address inefficiencies in organ procurement. “Every organ donor saves between three and four lives,” he says. “So every research project that comes out of this dataset could make a real impact.”

    New hope for early pancreatic cancer intervention via AI-based risk prediction

    The first documented case of pancreatic cancer dates back to the 18th century. Since then, researchers have undertaken a protracted and challenging odyssey to understand the elusive and deadly disease. To date, there is no better cancer treatment than early intervention. Unfortunately, the pancreas, nestled deep within the abdomen, is particularly elusive for early detection. 

    MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) scientists, alongside Limor Appelbaum, a staff scientist in the Department of Radiation Oncology at Beth Israel Deaconess Medical Center (BIDMC), were eager to better identify potential high-risk patients. They set out to develop two machine-learning models for early detection of pancreatic ductal adenocarcinoma (PDAC), the most common form of the cancer. To access a broad and diverse database, the team synced up with a federated network company, using electronic health record data from various institutions across the United States. This vast pool of data helped ensure the models’ reliability and generalizability, making them applicable across a wide range of populations, geographical locations, and demographic groups.

    The two models, the “PRISM” neural network and the logistic regression model (a statistical technique for probability), outperformed current methods. The team’s comparison showed that while standard screening criteria identify about 10 percent of PDAC cases using a five-times higher relative risk threshold, PRISM can detect 35 percent of PDAC cases at this same threshold.
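
    To make the comparison concrete, here is a minimal sketch of how the share of cases caught at a relative-risk threshold could be computed. It assumes one plausible reading of the threshold (a patient is flagged when the predicted risk is at least five times a baseline rate); the function name, the baseline, and the toy numbers are illustrative assumptions, not values or code from the study.

```python
def sensitivity_at_relative_risk(risks, labels, baseline, multiple=5.0):
    """Return the fraction of true cases flagged as high risk.

    A patient is flagged when the predicted risk is at least
    `multiple` times the `baseline` rate (an assumed definition).
    """
    cutoff = multiple * baseline
    # Keep only the predicted risks of patients who actually had the disease.
    case_risks = [r for r, y in zip(risks, labels) if y == 1]
    if not case_risks:
        raise ValueError("no positive cases to evaluate")
    flagged = sum(1 for r in case_risks if r >= cutoff)
    return flagged / len(case_risks)


# Toy data, invented for illustration only.
toy_risks = [0.001, 0.02, 0.06, 0.004, 0.08, 0.001]
toy_labels = [0, 1, 1, 1, 1, 0]
print(sensitivity_at_relative_risk(toy_risks, toy_labels, baseline=0.01))
```

    Under this reading, the reported figures would correspond to a result of roughly 0.10 for standard screening criteria and 0.35 for the neural network on the same patient population.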

    Using AI to detect cancer risk is not a new phenomenon: algorithms analyze mammograms and CT scans for lung cancer, and assist in the analysis of Pap smear tests and HPV testing, to name a few applications. “The PRISM models stand out for their development and validation on an extensive database of over 5 million patients, surpassing the scale of most prior research in the field,” says Kai Jia, an MIT PhD student in electrical engineering and computer science (EECS), MIT CSAIL affiliate, and first author on an open-access paper in eBioMedicine outlining the new work. “The model uses routine clinical and lab data to make its predictions, and the diversity of the U.S. population is a significant advancement over other PDAC models, which are usually confined to specific geographic regions, like a few health-care centers in the U.S. Additionally, using a unique regularization technique in the training process enhanced the models’ generalizability and interpretability.”

    “This report outlines a powerful approach to use big data and artificial intelligence algorithms to refine our approach to identifying risk profiles for cancer,” says David Avigan, a Harvard Medical School professor and the cancer center director and chief of hematology and hematologic malignancies at BIDMC, who was not involved in the study. “This approach may lead to novel strategies to identify patients with high risk for malignancy that may benefit from focused screening with the potential for early intervention.” 

    Prismatic perspectives

    The journey toward the development of PRISM began over six years ago, fueled by firsthand experiences with the limitations of current diagnostic practices. “Approximately 80-85 percent of pancreatic cancer patients are diagnosed at advanced stages, where cure is no longer an option,” says senior author Appelbaum, who is also a Harvard Medical School instructor and a radiation oncologist. “This clinical frustration sparked the idea to delve into the wealth of data available in electronic health records (EHRs).”

    The CSAIL group’s close collaboration with Appelbaum made it possible to understand the combined medical and machine learning aspects of the problem better, eventually leading to a much more accurate and transparent model. “The hypothesis was that these records contained hidden clues — subtle signs and symptoms that could act as early warning signals of pancreatic cancer,” she adds. “This guided our use of federated EHR networks in developing these models, for a scalable approach for deploying risk prediction tools in health care.”

    Both PrismNN and PrismLR models analyze EHR data, including patient demographics, diagnoses, medications, and lab results, to assess PDAC risk. PrismNN uses artificial neural networks to detect intricate patterns in data features like age, medical history, and lab results, yielding a risk score for PDAC likelihood. PrismLR uses logistic regression for a simpler analysis, generating a probability score of PDAC based on these features. Together, the models offer a thorough evaluation of different approaches in predicting PDAC risk from the same EHR data.
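
    As a rough illustration of how a logistic regression turns patient features into a probability score, the sketch below applies the standard logistic function to a weighted sum of features. The feature names, weights, and bias are hypothetical placeholders chosen for readability; they are not the PrismLR coefficients or its actual feature set.

```python
import math


def logistic_risk(features, weights, bias):
    """Map a weighted sum of features to a probability between 0 and 1."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and bias, for illustration only (not fitted values).
example_weights = {"age_over_60": 1.2, "new_diabetes": 0.9, "visits_per_year": 0.1}
example_bias = -6.0

patient = {"age_over_60": 1.0, "new_diabetes": 1.0, "visits_per_year": 8.0}
print(logistic_risk(patient, example_weights, example_bias))
```

    Because each feature contributes a single weight, a reader can see directly which inputs push the score up or down, which is one reason logistic regression is often regarded as easier to interpret than a neural network.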

    One paramount point for gaining the trust of physicians, the team notes, is better understanding how the models work, known in the field as interpretability. The scientists pointed out that while logistic regression models are inherently easier to interpret, recent advancements have made deep neural networks somewhat more transparent. This helped the team to refine the thousands of potentially predictive features derived from a single patient’s EHR to approximately 85 critical indicators. These indicators, which include patient age, diabetes diagnosis, and an increased frequency of visits to physicians, are automatically discovered by the model but match physicians’ understanding of risk factors associated with pancreatic cancer.
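
    To make the logistic-regression idea concrete, here is a minimal sketch in Python of how a model of that family turns a few EHR-derived indicators into a probability, and why it is considered easy to interpret: each feature's contribution to the score can be read off directly. The feature names, weights, and bias below are invented for illustration. They are not PrismLR's coefficients, and the real models work from far more indicators.

```python
import math

# Hypothetical weights for three illustrative indicators.
# These values are assumptions for this sketch, not PrismLR coefficients.
WEIGHTS = {
    "age_over_60": 0.8,
    "diabetes_diagnosis": 1.1,
    "frequent_visits": 0.5,
}
BIAS = -4.0  # hypothetical intercept; keeps the baseline risk low


def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def contributions(features):
    """Per-feature contribution to the linear score (weight * value)."""
    return {name: WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS}


def risk_score(features):
    """Logistic-regression probability: sigmoid(bias + sum of contributions)."""
    z = BIAS + sum(contributions(features).values())
    return sigmoid(z)


if __name__ == "__main__":
    patient = {"age_over_60": 1, "diabetes_diagnosis": 1, "frequent_visits": 0}
    print("risk:", risk_score(patient))
    print("contributions:", contributions(patient))
```

    Because the score is a plain weighted sum passed through a sigmoid, a reviewer can see which indicators pushed a given patient's risk up. A neural network offers no such direct readout, which is why it needs additional interpretability techniques.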

    The path forward

    Despite the promise of the PRISM models, as with all research, some parts are still a work in progress. The models are currently trained on U.S. data alone, so they will need testing and adaptation before global use. The path forward, the team notes, includes expanding the model’s applicability to international datasets and integrating additional biomarkers for more refined risk assessment.

    “A subsequent aim for us is to facilitate the models’ implementation in routine health care settings. The vision is to have these models function seamlessly in the background of health care systems, automatically analyzing patient data and alerting physicians to high-risk cases without adding to their workload,” says Jia. “A machine-learning model integrated with the EHR system could empower physicians with early alerts for high-risk patients, potentially enabling interventions well before symptoms manifest. We are eager to deploy our techniques in the real world to help all individuals enjoy longer, healthier lives.” 

    Jia wrote the paper alongside Appelbaum and MIT EECS Professor and CSAIL Principal Investigator Martin Rinard, who are both senior authors of the paper. Researchers on the paper were supported during their time at MIT CSAIL, in part, by the Defense Advanced Research Projects Agency, Boeing, the National Science Foundation, and Aarno Labs. TriNetX provided resources for the project, and the Prevent Cancer Foundation also supported the team.

  • in

    2023-24 Takeda Fellows: Advancing research at the intersection of AI and health

    The School of Engineering has selected 13 new Takeda Fellows for the 2023-24 academic year. With support from Takeda, the graduate students will conduct pathbreaking research ranging from remote health monitoring for virtual clinical trials to ingestible devices for at-home, long-term diagnostics.

    Now in its fourth year, the MIT-Takeda Program, a collaboration between MIT’s School of Engineering and Takeda, fuels the development and application of artificial intelligence capabilities to benefit human health and drug development. Part of the Abdul Latif Jameel Clinic for Machine Learning in Health, the program coalesces disparate disciplines, merges theory and practical implementation, combines algorithm and hardware innovations, and creates multidimensional collaborations between academia and industry.

    The 2023-24 Takeda Fellows are:

    Adam Gierlach

    Adam Gierlach is a PhD candidate in the Department of Electrical Engineering and Computer Science. Gierlach’s work combines innovative biotechnology with machine learning to create ingestible devices for advanced diagnostics and delivery of therapeutics. In his previous work, Gierlach developed a non-invasive, ingestible device for long-term gastric recordings in free-moving patients. With the support of a Takeda Fellowship, he will build on this pathbreaking work by developing smart, energy-efficient, ingestible devices powered by application-specific integrated circuits for at-home, long-term diagnostics. These revolutionary devices — capable of identifying, characterizing, and even correcting gastrointestinal diseases — represent the leading edge of biotechnology. Gierlach’s innovative contributions will help to advance fundamental research on the enteric nervous system and help develop a better understanding of gut-brain axis dysfunctions in Parkinson’s disease, autism spectrum disorder, and other prevalent disorders and conditions.

    Vivek Gopalakrishnan

    Vivek Gopalakrishnan is a PhD candidate in the Harvard-MIT Program in Health Sciences and Technology. Gopalakrishnan’s goal is to develop biomedical machine-learning methods to improve the study and treatment of human disease. Specifically, he employs computational modeling to advance new approaches for minimally invasive, image-guided neurosurgery, offering a safe alternative to open brain and spinal procedures. With the support of a Takeda Fellowship, Gopalakrishnan will develop real-time computer vision algorithms that deliver high-quality, 3D intraoperative image guidance by extracting and fusing information from multimodal neuroimaging data. These algorithms could allow surgeons to reconstruct 3D neurovasculature from X-ray angiography, thereby enhancing the precision of device deployment and enabling more accurate localization of healthy versus pathologic anatomy.

    Hao He

    Hao He is a PhD candidate in the Department of Electrical Engineering and Computer Science. His research interests lie at the intersection of generative AI, machine learning, and their applications in medicine and human health, with a particular emphasis on passive, continuous, remote health monitoring to support virtual clinical trials and health-care management. More specifically, He aims to develop trustworthy AI models that promote equitable access and deliver fair performance independent of race, gender, and age. In his past work, He has developed monitoring systems applied in clinical studies of Parkinson’s disease, Alzheimer’s disease, and epilepsy. Supported by a Takeda Fellowship, He will develop a novel technology for the passive monitoring of sleep stages (using radio signaling) that seeks to address existing gaps in performance across different demographic groups. His project will tackle the problem of imbalance in available datasets and account for intrinsic differences across subpopulations, using generative AI and multi-modality/multi-domain learning, with the goal of learning robust features that are invariant to different subpopulations. He’s work holds great promise for delivering advanced, equitable health-care services to all people and could significantly impact health care and AI.

    Chengyi Long

    Chengyi Long is a PhD candidate in the Department of Civil and Environmental Engineering. Long’s interdisciplinary research integrates the methodology of physics, mathematics, and computer science to investigate questions in ecology. Specifically, Long is developing a series of potentially groundbreaking techniques to explain and predict the temporal dynamics of ecological systems, including human microbiota, which are essential subjects in health and medical research. His current work, supported by a Takeda Fellowship, is focused on developing a conceptual, mathematical, and practical framework to understand the interplay between external perturbations and internal community dynamics in microbial systems, which may serve as a key step toward finding bio solutions to health management. A broader perspective of his research is to develop AI-assisted platforms to anticipate the changing behavior of microbial systems, which may help to differentiate between healthy and unhealthy hosts and design probiotics for the prevention and mitigation of pathogen infections. By creating novel methods to address these issues, Long’s research has the potential to offer powerful contributions to medicine and global health.

    Omar Mohd

    Omar Mohd is a PhD candidate in the Department of Electrical Engineering and Computer Science. Mohd’s research is focused on developing new technologies for the spatial profiling of microRNAs, with potentially important applications in cancer research. Through innovative combinations of micro-technologies and AI-enabled image analysis to measure the spatial variations of microRNAs within tissue samples, Mohd hopes to gain new insights into drug resistance in cancer. This work, supported by a Takeda Fellowship, falls within the emerging field of spatial transcriptomics, which seeks to understand cancer and other diseases by examining the relative locations of cells and their contents within tissues. The ultimate goal of Mohd’s current project is to find multidimensional patterns in tissues that may have prognostic value for cancer patients. One valuable component of his work is an open-source AI program developed with collaborators at Beth Israel Deaconess Medical Center and Harvard Medical School to auto-detect cancer epithelial cells from other cell types in a tissue sample and to correlate their abundance with the spatial variations of microRNAs. Through his research, Mohd is making innovative contributions at the interface of microsystem technology, AI-based image analysis, and cancer treatment, which could significantly impact medicine and human health.

    Sanghyun Park

    Sanghyun Park is a PhD candidate in the Department of Mechanical Engineering. Park specializes in the integration of AI and biomedical engineering to address complex challenges in human health. Drawing on his expertise in polymer physics, drug delivery, and rheology, his research focuses on the pioneering field of in-situ forming implants (ISFIs) for drug delivery. Supported by a Takeda Fellowship, Park is currently developing an injectable formulation designed for long-term drug delivery. The primary goal of his research is to unravel the compaction mechanism of drug particles in ISFI formulations through comprehensive modeling and in-vitro characterization studies utilizing advanced AI tools. He aims to gain a thorough understanding of this unique compaction mechanism and apply it to drug microcrystals to achieve properties optimal for long-term drug delivery. Beyond these fundamental studies, Park’s research also focuses on translating this knowledge into practical applications in a clinical setting through animal studies specifically aimed at extending drug release duration and improving mechanical properties. The innovative use of AI in developing advanced drug delivery systems, coupled with Park’s valuable insights into the compaction mechanism, could contribute to improving long-term drug delivery. This work has the potential to pave the way for effective management of chronic diseases, benefiting patients, clinicians, and the pharmaceutical industry.

    Huaiyao Peng

    Huaiyao Peng is a PhD candidate in the Department of Biological Engineering. Peng’s research interests are focused on engineered tissue, microfabrication platforms, cancer metastasis, and the tumor microenvironment. Specifically, she is advancing novel AI techniques for the development of pre-cancer organoid models of high-grade serous ovarian cancer (HGSOC), an especially lethal and difficult-to-treat cancer, with the goal of gaining new insights into progression and effective treatments. Peng’s project, supported by a Takeda Fellowship, will be one of the first to use cells from serous tubal intraepithelial carcinoma lesions found in the fallopian tubes of many HGSOC patients. By examining the cellular and molecular changes that occur in response to treatment with small molecule inhibitors, she hopes to identify potential biomarkers and promising therapeutic targets for HGSOC, including personalized treatment options for HGSOC patients, ultimately improving their clinical outcomes. Peng’s work has the potential to bring about important advances in cancer treatment and spur innovative new applications of AI in health care. 

    Priyanka Raghavan

    Priyanka Raghavan is a PhD candidate in the Department of Chemical Engineering. Raghavan’s research interests lie at the frontier of predictive chemistry, integrating computational and experimental approaches to build powerful new predictive tools for societally important applications, including drug discovery. Specifically, Raghavan is developing novel models to predict small-molecule substrate reactivity and compatibility in regimes where little data is available (the most realistic regimes). A Takeda Fellowship will enable Raghavan to push the boundaries of her research, making innovative use of low-data and multi-task machine learning approaches, synthetic chemistry, and robotic laboratory automation, with the goal of creating an autonomous, closed-loop system for the discovery of high-yielding organic small molecules in the context of underexplored reactions. Raghavan’s work aims to identify new, versatile reactions to broaden a chemist’s synthetic toolbox with novel scaffolds and substrates that could form the basis of essential drugs. Her work has the potential for far-reaching impacts in early-stage, small-molecule discovery and could help make the lengthy drug-discovery process significantly faster and cheaper.

    Zhiye Song

    Zhiye “Zoey” Song is a PhD candidate in the Department of Electrical Engineering and Computer Science. Song’s research integrates cutting-edge approaches in machine learning (ML) and hardware optimization to create next-generation, wearable medical devices. Specifically, Song is developing novel approaches for the energy-efficient implementation of ML computation in low-power medical devices, including a wearable ultrasound “patch” that captures and processes images for real-time decision-making capabilities. Her recent work, conducted in collaboration with clinicians, has centered on bladder volume monitoring; other potential applications include blood pressure monitoring, muscle diagnosis, and neuromodulation. With the support of a Takeda Fellowship, Song will build on that promising work and pursue key improvements to existing wearable device technologies, including developing low-compute and low-memory ML algorithms and low-power chips to enable ML on smart wearable devices. The technologies emerging from Song’s research could offer exciting new capabilities in health care, enabling powerful and cost-effective point-of-care diagnostics and expanding individual access to autonomous and continuous medical monitoring.

    Peiqi Wang

    Peiqi Wang is a PhD candidate in the Department of Electrical Engineering and Computer Science. Wang’s research aims to develop machine learning methods for learning and interpretation from medical images and associated clinical data to support clinical decision-making. He is developing a multimodal representation learning approach that aligns knowledge captured in large amounts of medical image and text data to transfer this knowledge to new tasks and applications. Supported by a Takeda Fellowship, Wang will advance this promising line of work to build robust tools that interpret images, learn from sparse human feedback, and reason like doctors, with potentially major benefits to important stakeholders in health care.

    Oscar Wu

    Haoyang “Oscar” Wu is a PhD candidate in the Department of Chemical Engineering. Wu’s research integrates quantum chemistry and deep learning methods to accelerate the process of small-molecule screening in the development of new drugs. By identifying and automating reliable methods for finding transition state geometries and calculating barrier heights for new reactions, Wu’s work could make it possible to conduct the high-throughput ab initio calculations of reaction rates needed to screen the reactivity of large numbers of active pharmaceutical ingredients (APIs). A Takeda Fellowship will support his current project to: (1) develop open-source software for high-throughput quantum chemistry calculations, focusing on the reactivity of drug-like molecules, and (2) develop deep learning models that can quantitatively predict the oxidative stability of APIs. The tools and insights resulting from Wu’s research could help to transform and accelerate the drug-discovery process, offering significant benefits to the pharmaceutical and medical fields and to patients.

    Soojung Yang

    Soojung Yang is a PhD candidate in the Department of Materials Science and Engineering. Yang’s research applies cutting-edge methods in geometric deep learning and generative modeling, along with atomistic simulations, to better understand and model protein dynamics. Specifically, Yang is developing novel tools in generative AI to explore protein conformational landscapes that offer greater speed and detail than physics-based simulations at a substantially lower cost. With the support of a Takeda Fellowship, she will build upon her successful work on the reverse transformation of coarse-grained proteins to the all-atom resolution, aiming to build machine-learning models that bridge multiple size scales of protein conformation diversity (all-atom, residue-level, and domain-level). Yang’s research holds the potential to provide a powerful and widely applicable new tool for researchers who seek to understand the complex protein functions at work in human diseases and to design drugs to treat and cure those diseases.

    Yuzhe Yang

    Yuzhe Yang is a PhD candidate in the Department of Electrical Engineering and Computer Science. Yang’s research interests lie at the intersection of machine learning and health care. In his past and current work, Yang has developed and applied innovative machine-learning models that address key challenges in disease diagnosis and tracking. His many notable achievements include the creation of one of the first machine learning-based solutions using nocturnal breathing signals to detect Parkinson’s disease (PD), estimate disease severity, and track PD progression. With the support of a Takeda Fellowship, Yang will expand this promising work to develop an AI-based diagnosis model for Alzheimer’s disease (AD) using sleep-breathing data that is significantly more reliable, flexible, and economical than current diagnostic tools. This passive, in-home, contactless monitoring system, which resembles a simple home Wi-Fi router, will also enable remote disease assessment and continuous progression tracking. Yang’s groundbreaking work has the potential to advance the diagnosis and treatment of prevalent diseases like PD and AD, and it offers exciting possibilities for addressing many health challenges with reliable, affordable machine-learning tools.

  • in

    Making genetic prediction models more inclusive

    While any two human genomes are about 99.9 percent identical, genetic variation in the remaining 0.1 percent plays an important role in shaping human diversity, including a person’s risk for developing certain diseases.

    Measuring the cumulative effect of these small genetic differences can provide an estimate of an individual’s genetic risk for a particular disease or their likelihood of having a particular trait. However, the majority of models used to generate these “polygenic scores” are based on studies done in people of European descent, and do not accurately gauge the risk for people of non-European ancestry or people whose genomes contain a mixture of chromosome regions inherited from previously isolated populations, also known as admixed ancestry.

    In an effort to make these genetic scores more inclusive, MIT researchers have created a new model that takes into account genetic information from people from a wider diversity of genetic ancestries across the world. Using this model, they showed that they could increase the accuracy of genetics-based predictions for a variety of traits, especially for people from populations that have been traditionally underrepresented in genetic studies.

    “For people of African ancestry, our model proved to be about 60 percent more accurate on average,” says Manolis Kellis, a professor of computer science in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and a member of the Broad Institute of MIT and Harvard. “For people of admixed genetic backgrounds more broadly, who have been excluded from most previous models, the accuracy of our model increased by an average of about 18 percent.”

    The researchers hope their more inclusive modeling approach could help improve health outcomes for a wider range of people and promote health equity by spreading the benefits of genomic sequencing more widely across the globe.

    “What we have done is created a method that allows you to be much more accurate for admixed and ancestry-diverse individuals, and ensure the results and the benefits of human genetics research are equally shared by everyone,” says MIT postdoc Yosuke Tanigawa, the lead and co-corresponding author of the paper, which appears today in open-access form in the American Journal of Human Genetics. The researchers have made all of their data publicly available for the broader scientific community to use.

    More inclusive models

    The work builds on the Human Genome Project, which mapped all of the genes found in the human genome, and on subsequent large-scale, cohort-based studies of how genetic variants in the human genome are linked to disease risk and other differences between individuals.

    These studies showed that the effect of any individual genetic variant on its own is typically very small. Together, these small effects add up and influence the risk of developing heart disease or diabetes, having a stroke, or being diagnosed with psychiatric disorders such as schizophrenia.

    “We have hundreds of thousands of genetic variants that are associated with complex traits, each of which is individually playing a weak effect, but together they are beginning to be predictive for disease predispositions,” Kellis says.
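
    The basic arithmetic behind a polygenic score can be sketched in a few lines of Python: count how many copies of each variant a person carries (0, 1, or 2) and add them up, weighted by each variant's estimated effect. This is the generic textbook form, not the MIT team's model, and the effect sizes in the usage example are made up for illustration.

```python
def polygenic_score(dosages, effect_sizes):
    """Weighted sum of allele dosages.

    dosages      -- copies of the effect allele per variant (0 to 2)
    effect_sizes -- estimated per-variant effects (hypothetical in this sketch)
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("need exactly one effect size per variant")
    for d in dosages:
        if not 0 <= d <= 2:
            raise ValueError("a dosage must lie between 0 and 2")
    # Each variant contributes only a little; the sum is what carries signal.
    return sum(d * beta for d, beta in zip(dosages, effect_sizes))


if __name__ == "__main__":
    # Four variants with invented effect sizes.
    person = [0, 1, 2, 1]
    betas = [0.02, -0.01, 0.03, 0.005]
    print(polygenic_score(person, betas))
```

    In practice the sum runs over hundreds of thousands of variants, and the effect sizes are estimated from a training cohort. The ancestry makeup of that cohort therefore shapes how well the score transfers to other people.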

    However, most of these genome-wide association studies included few people of non-European descent, so polygenic risk models based on them translate poorly to non-European populations. People from different geographic areas can have different patterns of genetic variation, shaped by stochastic drift, population history, and environmental factors — for example, in people of African descent, genetic variants that protect against malaria are more common than in other populations. Those variants also affect other traits involving the immune system, such as counts of neutrophils, a type of immune cell. That variation would not be well-captured in a model based on genetic analysis of people of European ancestry alone.

    “If you are an individual of African descent, of Latin American descent, of Asian descent, then you are currently being left out by the system,” Kellis says. “This inequity in the utilization of genetic information for predicting risk of patients can cause unnecessary burden, unnecessary deaths, and unnecessary lack of prevention, and that’s where our work comes in.”

    Some researchers have begun trying to address these disparities by creating distinct models for people of European descent, of African descent, or of Asian descent. These emerging approaches assign individuals to distinct genetic ancestry groups, aggregate the data to create an association summary, and make genetic prediction models. However, these approaches still don’t represent people of admixed genetic backgrounds well.

    “Our approach builds on the previous work without requiring researchers to assign individuals or local genomic segments of individuals to predefined distinct genetic ancestry groups,” Tanigawa says. “Instead, we develop a single model for everybody by directly working on individuals across the continuum of their genetic ancestries.”

    In creating their new model, the MIT team used computational and statistical techniques that enabled them to study each individual’s unique genetic profile instead of grouping individuals by population. This methodological advancement allowed the researchers to include people of admixed ancestry, who made up nearly 10 percent of the UK Biobank dataset used for this study and currently account for about one in seven newborns in the United States.

    “Because we work at the individual level, there is no need for computing summary-level data for different populations,” Kellis says. “Thus, we did not need to exclude individuals of admixed ancestry, increasing our power by including more individuals and representing contributions from all populations in our combined model.”

    Better predictions

    To create their new model, the researchers used genetic data from more than 280,000 people, which was collected by UK Biobank, a large-scale biomedical database and research resource containing de-identified genetic, lifestyle, and health information from half a million U.K. participants. Using another set of about 81,000 held-out individuals from the UK Biobank, the researchers evaluated their model across 60 traits, which included traits related to body size and shape, such as height and body mass index, as well as blood traits such as white blood cell count and red blood cell count, which also have a genetic basis.

    The researchers found that, compared to models trained only on European-ancestry individuals, their model’s predictions are more accurate for all genetic ancestry groups. The most notable gain was for people of African ancestry, who showed 61 percent average improvements, even though they only made up about 1.5 percent of samples in UK Biobank. The researchers also saw improvements of 11 percent for people of South Asian descent and 5 percent for white British people. Predictions for people of admixed ancestry improved by about 18 percent.

    “When you bring all the individuals together in the training set, everybody contributes to the training of the polygenic score modeling on equal footing,” Tanigawa says. “Combined with increasingly more inclusive data collection efforts, our method can help leverage these efforts to improve predictive accuracy for all.”

    The MIT team hopes its approach can eventually be incorporated into tests of an individual’s risk of a variety of diseases. Such tests could be combined with conventional risk factors and used to help doctors diagnose disease or to help people manage their risk for certain diseases before they develop.

    “Our work highlights the power of diversity, equity, and inclusion efforts in the context of genomics research,” Tanigawa says.

    The researchers now hope to add even more data to their model, including data from the United States, and to apply it to additional traits that they didn’t analyze in this study.

    “This is just the start,” Kellis says. “We can’t wait to see more people join our effort to propel inclusive human genetics research.”

    The research was funded by the National Institutes of Health.

  • in

    How an archeological approach can help leverage biased data in AI to improve medicine

    The classic computer science adage “garbage in, garbage out” lacks nuance when it comes to understanding biased medical data, argue computer science and bioethics professors from MIT, Johns Hopkins University, and the Alan Turing Institute in a new opinion piece published in a recent edition of the New England Journal of Medicine (NEJM). The rising popularity of artificial intelligence has brought increased scrutiny to the matter of biased AI models resulting in algorithmic discrimination, which the White House Office of Science and Technology Policy identified as a key issue in its recent Blueprint for an AI Bill of Rights.

    When encountering biased data, particularly for AI models used in medical settings, the typical response is to either collect more data from underrepresented groups or generate synthetic data to fill the gaps, so that the model performs equally well across an array of patient populations. But the authors argue that this technical approach should be augmented with a sociotechnical perspective that takes both historical and current social factors into account. By doing so, researchers can be more effective in addressing bias in public health.

    “The three of us had been discussing the ways in which we often treat issues with data from a machine learning perspective as irritations that need to be managed with a technical solution,” recalls co-author Marzyeh Ghassemi, an assistant professor in electrical engineering and computer science and an affiliate of the Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic), the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Institute of Medical Engineering and Science (IMES). “We had used analogies of data as an artifact that gives a partial view of past practices, or a cracked mirror holding up a reflection. In both cases the information is perhaps not entirely accurate or favorable: Maybe we think that we behave in certain ways as a society — but when you actually look at the data, it tells a different story. We might not like what that story is, but once you unearth an understanding of the past you can move forward and take steps to address poor practices.” 

    Data as artifact 

    In the paper, titled “Considering Biased Data as Informative Artifacts in AI-Assisted Health Care,” Ghassemi, Kadija Ferryman, and Maxine Mackintosh make the case for viewing biased clinical data as “artifacts” in the same way anthropologists or archeologists would view physical objects: pieces of civilization that reveal practices, belief systems, and cultural values. In the case of the paper, these are specifically the ones that have led to existing inequities in the health care system.

    For example, a 2019 study showed that an algorithm widely considered to be an industry standard used health-care expenditures as an indicator of need, leading to the erroneous conclusion that sicker Black patients require the same level of care as healthier white patients. What researchers found was algorithmic discrimination failing to account for unequal access to care.  
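
    The mechanism is easy to reproduce with a toy example. The Python sketch below is not the algorithm examined in the 2019 study. It is a hypothetical illustration, with invented patients and costs, of how ranking people by what was spent on them can invert a ranking by how sick they are when access to care is unequal.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    name: str
    chronic_conditions: int  # stand-in for true health need
    access_to_care: float    # fraction of needed care actually received (0 to 1)


COST_PER_CONDITION = 1000.0  # hypothetical cost of fully treating one condition


def observed_cost(patient):
    """Spending recorded in the data: need scaled down by limited access."""
    return patient.chronic_conditions * COST_PER_CONDITION * patient.access_to_care


def true_need(patient):
    """What the score is supposed to capture."""
    return patient.chronic_conditions


def rank_by(patients, key):
    """Names ordered from highest to lowest priority under the given key."""
    return [p.name for p in sorted(patients, key=key, reverse=True)]


# Invented patients: A is sicker but receives only half the care A needs.
patients = [
    Patient("A", chronic_conditions=5, access_to_care=0.5),
    Patient("B", chronic_conditions=3, access_to_care=1.0),
]

if __name__ == "__main__":
    print("ranked by cost:", rank_by(patients, observed_cost))
    print("ranked by need:", rank_by(patients, true_need))
```

    Here the cost-based ranking puts the healthier patient first, because the sicker patient's limited access to care shows up in the data as lower spending. Read as an artifact, the spending record documents that unequal access.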

    In this instance, rather than viewing biased datasets or lack of data as problems that only require disposal or fixing, Ghassemi and her colleagues recommend the “artifacts” approach as a way to raise awareness of the social and historical elements that influence how data are collected, and of alternative approaches to clinical AI development.

    “If the goal of your model is deployment in a clinical setting, you should engage a bioethicist or a clinician with appropriate training reasonably early on in problem formulation,” says Ghassemi. “As computer scientists, we often don’t have a complete picture of the different social and historical factors that have gone into creating data that we’ll be using. We need expertise in discerning when models generalized from existing data may not work well for specific subgroups.” 

    When more data can actually harm performance 

    The authors acknowledge that one of the more challenging aspects of implementing an artifact-based approach is being able to assess whether data have been racially corrected: i.e., using white, male bodies as the conventional standard that other bodies are measured against. The opinion piece cites an example from the Chronic Kidney Disease Collaboration in 2021, which developed a new equation to measure kidney function because the old equation had previously been “corrected” under the blanket assumption that Black people have higher muscle mass. Ghassemi says that researchers should be prepared to investigate race-based correction as part of the research process. 

    In another recent paper, accepted to this year’s International Conference on Machine Learning and co-authored by Ghassemi’s PhD student Vinith Suriyakumar and University of California at San Diego Assistant Professor Berk Ustun, the researchers found that assuming the inclusion of personalized attributes like self-reported race improves the performance of ML models can actually lead to worse risk scores, models, and metrics for minority and minoritized populations.

    “There’s no single right solution for whether or not to include self-reported race in a clinical risk score. Self-reported race is a social construct that is both a proxy for other information, and deeply proxied itself in other medical data. The solution needs to fit the evidence,” explains Ghassemi. 

    How to move forward 

    This is not to say that biased datasets should be enshrined, or that biased algorithms don’t require fixing. Quality training data is still key to developing safe, high-performance clinical AI models, and the NEJM piece highlights the role of the National Institutes of Health (NIH) in driving ethical practices.

    “Generating high-quality, ethically sourced datasets is crucial for enabling the use of next-generation AI technologies that transform how we do research,” NIH acting director Lawrence Tabak stated in a press release when the NIH announced its $130 million Bridge2AI Program last year. Ghassemi agrees, pointing out that the NIH has “prioritized data collection in ethical ways that cover information we have not previously emphasized the value of in human health — such as environmental factors and social determinants. I’m very excited about their prioritization of, and strong investments towards, achieving meaningful health outcomes.” 

    Elaine Nsoesie, an associate professor at the Boston University School of Public Health, believes there are many potential benefits to treating biased datasets as artifacts rather than garbage, starting with the focus on context. “Biases present in a dataset collected for lung cancer patients in a hospital in Uganda might be different from a dataset collected in the U.S. for the same patient population,” she explains. “In considering local context, we can train algorithms to better serve specific populations.” Nsoesie says that understanding the historical and contemporary factors shaping a dataset can make it easier to identify discriminatory practices that might be coded in algorithms or systems in ways that are not immediately obvious. She also notes that an artifact-based approach could lead to the development of new policies and structures ensuring that the root causes of bias in a particular dataset are eliminated.

    “People often tell me that they are very afraid of AI, especially in health. They’ll say, ‘I’m really scared of an AI misdiagnosing me,’ or ‘I’m concerned it will treat me poorly,’” Ghassemi says. “I tell them, you shouldn’t be scared of some hypothetical AI in health tomorrow, you should be scared of what health is right now. If we take a narrow technical view of the data we extract from systems, we could naively replicate poor practices. That’s not the only option. Realizing there is a problem is our first step towards a larger opportunity.”