More stories

  • in

    In-home wireless device tracks disease progression in Parkinson’s patients

    Parkinson’s disease is the fastest-growing neurological disease, now affecting more than 10 million people worldwide, yet clinicians still face huge challenges in tracking its severity and progression.

    Clinicians typically evaluate patients by testing their motor skills and cognitive functions during clinic visits. These semisubjective measurements are often skewed by outside factors — perhaps a patient is tired after a long drive to the hospital. More than 40 percent of individuals with Parkinson’s are never treated by a neurologist or Parkinson’s specialist, often because they live too far from an urban center or have difficulty traveling.

    In an effort to address these problems, researchers from MIT and elsewhere demonstrated an in-home device that can monitor a patient’s movement and gait speed, which can be used to evaluate Parkinson’s severity, the progression of the disease, and the patient’s response to medication.

    The device, which is about the size of a Wi-Fi router, gathers data passively using radio signals that reflect off the patient’s body as they move around their home. The patient does not need to wear a gadget or change their behavior. (A recent study, for example, showed that this type of device could be used to detect Parkinson’s from a person’s breathing patterns while sleeping.)

    The researchers used these devices to conduct a one-year at-home study with 50 participants. They showed that, by using machine-learning algorithms to analyze the troves of data they passively gathered (more than 200,000 gait speed measurements), a clinician could track Parkinson’s progression and medication response more effectively than they would with periodic, in-clinic evaluations.

    “By being able to have a device in the home that can monitor a patient and tell the doctor remotely about the progression of the disease, and the patient’s medication response so they can attend to the patient even if the patient can’t come to the clinic — now they have real, reliable information — that actually goes a long way toward improving equity and access,” says senior author Dina Katabi, the Thuan and Nicole Pham Professor in the Department of Electrical Engineering and Computer Science (EECS), and a principle investigator in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT Jameel Clinic.

    The co-lead authors are EECS graduate students Yingcheng Liu and Guo Zhang. The research is published today in Science Translational Medicine.

    A human radar

    This work utilizes a wireless device previously developed in the Katabi lab that analyzes radio signals that bounce off people’s bodies. It transmits signals that use a tiny fraction of the power of a Wi-Fi router — these super-low-power signals don’t interfere with other wireless devices in the home. While radio signals pass through walls and other solid objects, they are reflected off humans due to the water in our bodies.  

    This creates a “human radar” that can track the movement of a person in a room. Radio waves always travel at the same speed, so the length of time it takes the signals to reflect back to the device indicates how the person is moving.

    The device incorporates a machine-learning classifier that can pick out the precise radio signals reflected off the patient even when there are other people moving around the room. Advanced algorithms use these movement data to compute gait speed — how fast the person is walking.

    Because the device operates in the background and runs all day, every day, it can collect a massive amount of data. The researchers wanted to see if they could apply machine learning to these datasets to gain insights about the disease over time.

    They gathered 50 participants, 34 of whom had Parkinson’s, and conducted a one-year study of in-home gait measurements Through the study, the researchers collected more than 200,000 individual measurements that they averaged to smooth out variability due to the conditions irrelevant to the disease. (For example, a patient may hurry up to answer an alarm or walk slower when talking on the phone.)

    They used statistical methods to analyze the data and found that in-home gait speed can be used to effectively track Parkinson’s progression and severity. For instance, they showed that gait speed declined almost twice as fast for individuals with Parkinson’s, compared to those without. 

    “Monitoring the patient continuously as they move around the room enabled us to get really good measurements of their gait speed. And with so much data, we were able to perform aggregation that allowed us to see very small differences,” Zhang says.

    Better, faster results

    Drilling down on these variabilities offered some key insights. For instance, the researchers showed that daily fluctuations in a patient’s walking speed correspond with how they are responding to their medication — walking speed may improve after a dose and then begin to decline after a few hours, as the medication impact wears off.

    “This enables us to objectively measure how your mobility responds to your medication. Previously, this was very cumbersome to do because this medication effect could only be measured by having the patient keep a journal,” Liu says.

    A clinician could use these data to adjust medication dosage more effectively and accurately. This is especially important since drugs used to treat disease symptoms can cause serious side effects if the patient receives too much.

    The researchers were able to demonstrate statistically significant results regarding Parkinson’s progression after studying 50 people for just one year. By contrast, an often-cited study by the Michael J. Fox Foundation involved more than 500 individuals and monitored them for more than five years, Katabi says.

    “For a pharmaceutical company or a biotech company trying to develop medicines for this disease, this could greatly reduce the burden and cost and speed up the development of new therapies,” she adds.

    Katabi credits much of the study’s success to the dedicated team of scientists and clinicians who worked together to tackle the many difficulties that arose along the way. For one, they began the study before the Covid-19 pandemic, so team members initially visited people’s homes to set up the devices. When that was no longer possible, they developed a user-friendly phone app to remotely help participants as they deployed the device at home.

    Through the course of the study, they learned to automate processes and reduce effort, especially for the participants and clinical team.

    This knowledge will prove useful as they look to deploy devices in at-home studies of other neurological disorders, such as Alzheimer’s, ALS, and Huntington’s. They also want to explore how these methods could be used, in conjunction with other work from the Katabi lab showing that Parkinson’s can be diagnosed by monitoring breathing, to collect a holistic set of markers that could diagnose the disease early and then be used to track and treat it.

    “This radio-wave sensor can enable more care (and research) to migrate from hospitals to the home where it is most desired and needed,” says Ray Dorsey, a professor of neurology at the University of Rochester Medical Center, co-author of Ending Parkinson’s, and a co-author of this research paper. “Its potential is just beginning to be seen. We are moving toward a day where we can diagnose and predict disease at home. In the future, we may even be able to predict and ideally prevent events like falls and heart attacks.”

    This work is supported, in part, by the National Institutes of Health and the Michael J. Fox Foundation. More

  • in

    Emma Gibson: Optimizing health care logistics in Africa

    Growing up in South Africa at the turn of the century, Emma Gibson saw the rise of the HIV/AIDS epidemic and its devastating impact on her home country, where many people lacked life-saving health care. At the time, Gibson was too young to understand what a sexually transmitted infection was, but she knew that HIV was infecting millions of South Africans and AIDS was taking hundreds of thousands of lives. “As a child, I was terrified by this monster that was HIV and felt so powerless to do anything about it,” she says.

    Now, as an adult, her childhood fear of the HIV epidemic has evolved into a desire to fight it. Gibson seeks to improve health care for HIV and other diseases in regions with limited resources, including South Africa. She wants to help health care facilities in these areas to use their resources more effectively so that patients can more easily obtain care.

    To help reach her goal, Gibson sought mathematics and logistics training through higher education in South Africa. She first earned her bachelor’s degree in mathematical sciences at the University of the Witwatersrand, and then her master’s degree in operations research at Stellenbosch University. There, she learned to tackle complex decision-making problems using math, statistics, and computer simulations.

    During her master’s, Gibson studied the operational challenges faced in rural South African health care facilities by working with staff at Zithulele Hospital in the Eastern Cape, one of the country’s poorest provinces. Her research focused on ways to reduce hours-long wait times for patients seeking same-day care. In the end, she developed a software tool to model patient congestion throughout the day and optimize staff schedules accordingly, enabling the hospital to care for its patients more efficiently.

    After completing her master’s, Gibson wanted to further her education outside of South Africa and left to pursue a PhD in operations research at MIT. Upon arrival, she branched out in her research and worked on a project to improve breast cancer treatment in U.S. health care, a very different environment from what she was used to.

    Two years later, Gibson had the opportunity to return to researching health care in resource-limited settings and began working with Jónas Jónasson, an associate professor at the MIT Sloan School of Management, on a new project to improve diagnostic services in sub-Saharan Africa. For the past four years, she has been working diligently on this project in collaboration with researchers at the Indian School of Business and Northwestern University. “My love language is time,” she says. “If I’m investing a lot of time in something, I really value it.”

    Scheduling sample transport

    Diagnostic testing is an essential tool that allows medical professionals to identify new diagnoses in patients and monitor patients’ conditions as they undergo treatment. For example, people living with HIV require regular blood tests to ensure that their prescribed treatments are working effectively and provide an early warning of potential treatment failures.

    For Gibson’s current project, she’s trying to improve diagnostic services in Malawi, a landlocked country in southeast Africa. “We have the tools” to diagnose and treat diseases like HIV, she says. “But in resource-limited settings, we often lack the money, the staff, and the infrastructure to reach every patient that needs them.”

    When diagnostic testing is needed, clinicians collect samples from patients and send the samples to be tested at a laboratory, which then returns the results to the facility where the patient is treated. To move these items between facilities and laboratories, Malawi has developed a national sample transportation network. The transportation system plays an important role in linking remote, rural facilities to laboratory services and ensuring that patients in these areas can access diagnostic testing through community clinics. Samples collected at these clinics are first transported to nearby district hubs, and then forwarded to laboratories located in urban areas. Since most facilities do not have computers or communications infrastructure, laboratories print copies of test results and send them back to facilities through the same transportation process.

    The sample transportation cycle is onerous, but it’s a practical solution to a difficult problem. “During the Covid pandemic, we saw how hard it was to scale up diagnostic infrastructure,” Gibson says. Diagnostic services in sub-Saharan Africa face “similar challenges, but in a much poorer setting.”

    In Malawi, sample transportation is managed by a  nongovernment organization called Riders 4 Health. The organization has around 80 couriers on motorcycles who transport samples and test results between facilities. “When we started working with [Riders], the couriers operated on fixed weekly schedules, visiting each site once or twice a week,” Gibson says. But that led to “a lot of unnecessary trips and delays.”

    To make sample transportation more efficient, Gibson developed a dynamic scheduling system that adapts to the current demand for diagnostic testing. The system consists of two main parts: an information sharing platform that aggregates sample transportation data, and an algorithm that uses the data to generate optimized routes and schedules for sample transport couriers.

    In 2019, Gibson ran a four-month-long pilot test for this system in three out of the 27 districts in Malawi. During the pilot study, six couriers transported over 20,000 samples and results across 51 health care facilities, and 150 health care workers participated in data sharing.

    The pilot was a success. Gibson’s dynamic scheduling system eliminated about half the unnecessary trips and reduced transportation delays by 25 percent — a delay that used to be four days was reduced to three. Now, Riders 4 Health is developing their own version of Gibson’s system to operate nationally in Malawi. Throughout this project, “we focused on making sure this was something that could grow with the organization,” she says. “It’s gratifying to see that actually happening.”

    Leveraging patient data

    Gibson is completing her MIT degree this September but will continue working to improve health care in Africa. After graduation, she will join the technology and analytics health care practice of an established company in South Africa. Her initial focus will be on public health care institutions, including Chris Hani Baragwanath Academic Hospital in Johannesburg, the third-largest hospital in the world.

    In this role, Gibson will work to fill in gaps in African patient data for medical operational research and develop ways to use this data more effectively to improve health care in resource-limited areas. For example, better data systems can help to monitor the prevalence and impact of different diseases, guiding where health care workers and researchers put their efforts to help the most people. “You can’t make good decisions if you don’t have all the information,” Gibson says.

    To best leverage patient data for improving health care, Gibson plans to reevaluate how data systems are structured and used in the hospital. For ideas on upgrading the current system, she’ll look to existing data systems in other countries to see what works and what doesn’t, while also drawing upon her past research experience in U.S. health care. Ultimately, she’ll tailor the new hospital data system to South African needs to accurately inform future directions in health care.

    Gibson’s new job — her “dream job” — will be based in the United Kingdom, but she anticipates spending a significant amount of time in Johannesburg. “I have so many opportunities in the wider world, but the ones that appeal to me are always back in the place I came from,” she says. More

  • in

    Teaching AI to ask clinical questions

    Physicians often query a patient’s electronic health record for information that helps them make treatment decisions, but the cumbersome nature of these records hampers the process. Research has shown that even when a doctor has been trained to use an electronic health record (EHR), finding an answer to just one question can take, on average, more than eight minutes.

    The more time physicians must spend navigating an oftentimes clunky EHR interface, the less time they have to interact with patients and provide treatment.

    Researchers have begun developing machine-learning models that can streamline the process by automatically finding information physicians need in an EHR. However, training effective models requires huge datasets of relevant medical questions, which are often hard to come by due to privacy restrictions. Existing models struggle to generate authentic questions — those that would be asked by a human doctor — and are often unable to successfully find correct answers.

    To overcome this data shortage, researchers at MIT partnered with medical experts to study the questions physicians ask when reviewing EHRs. Then, they built a publicly available dataset of more than 2,000 clinically relevant questions written by these medical experts.

    When they used their dataset to train a machine-learning model to generate clinical questions, they found that the model asked high-quality and authentic questions, as compared to real questions from medical experts, more than 60 percent of the time.

    With this dataset, they plan to generate vast numbers of authentic medical questions and then use those questions to train a machine-learning model which would help doctors find sought-after information in a patient’s record more efficiently.

    “Two thousand questions may sound like a lot, but when you look at machine-learning models being trained nowadays, they have so much data, maybe billions of data points. When you train machine-learning models to work in health care settings, you have to be really creative because there is such a lack of data,” says lead author Eric Lehman, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    The senior author is Peter Szolovits, a professor in the Department of Electrical Engineering and Computer Science (EECS) who heads the Clinical Decision-Making Group in CSAIL and is also a member of the MIT-IBM Watson AI Lab. The research paper, a collaboration between co-authors at MIT, the MIT-IBM Watson AI Lab, IBM Research, and the doctors and medical experts who helped create questions and participated in the study, will be presented at the annual conference of the North American Chapter of the Association for Computational Linguistics.

    “Realistic data is critical for training models that are relevant to the task yet difficult to find or create,” Szolovits says. “The value of this work is in carefully collecting questions asked by clinicians about patient cases, from which we are able to develop methods that use these data and general language models to ask further plausible questions.”

    Data deficiency

    The few large datasets of clinical questions the researchers were able to find had a host of issues, Lehman explains. Some were composed of medical questions asked by patients on web forums, which are a far cry from physician questions. Other datasets contained questions produced from templates, so they are mostly identical in structure, making many questions unrealistic.

    “Collecting high-quality data is really important for doing machine-learning tasks, especially in a health care context, and we’ve shown that it can be done,” Lehman says.

    To build their dataset, the MIT researchers worked with practicing physicians and medical students in their last year of training. They gave these medical experts more than 100 EHR discharge summaries and told them to read through a summary and ask any questions they might have. The researchers didn’t put any restrictions on question types or structures in an effort to gather natural questions. They also asked the medical experts to identify the “trigger text” in the EHR that led them to ask each question.

    For instance, a medical expert might read a note in the EHR that says a patient’s past medical history is significant for prostate cancer and hypothyroidism. The trigger text “prostate cancer” could lead the expert to ask questions like “date of diagnosis?” or “any interventions done?”

    They found that most questions focused on symptoms, treatments, or the patient’s test results. While these findings weren’t unexpected, quantifying the number of questions about each broad topic will help them build an effective dataset for use in a real, clinical setting, says Lehman.

    Once they had compiled their dataset of questions and accompanying trigger text, they used it to train machine-learning models to ask new questions based on the trigger text.

    Then the medical experts determined whether those questions were “good” using four metrics: understandability (Does the question make sense to a human physician?), triviality (Is the question too easily answerable from the trigger text?), medical relevance (Does it makes sense to ask this question based on the context?), and relevancy to the trigger (Is the trigger related to the question?).

    Cause for concern

    The researchers found that when a model was given trigger text, it was able to generate a good question 63 percent of the time, whereas a human physician would ask a good question 80 percent of the time.

    They also trained models to recover answers to clinical questions using the publicly available datasets they had found at the outset of this project. Then they tested these trained models to see if they could find answers to “good” questions asked by human medical experts.

    The models were only able to recover about 25 percent of answers to physician-generated questions.

    “That result is really concerning. What people thought were good-performing models were, in practice, just awful because the evaluation questions they were testing on were not good to begin with,” Lehman says.

    The team is now applying this work toward their initial goal: building a model that can automatically answer physicians’ questions in an EHR. For the next step, they will use their dataset to train a machine-learning model that can automatically generate thousands or millions of good clinical questions, which can then be used to train a new model for automatic question answering.

    While there is still much work to do before that model could be a reality, Lehman is encouraged by the strong initial results the team demonstrated with this dataset.

    This research was supported, in part, by the MIT-IBM Watson AI Lab. Additional co-authors include Leo Anthony Celi of the MIT Institute for Medical Engineering and Science; Preethi Raghavan and Jennifer J. Liang of the MIT-IBM Watson AI Lab; Dana Moukheiber of the University of Buffalo; Vladislav Lialin and Anna Rumshisky of the University of Massachusetts at Lowell; Katelyn Legaspi, Nicole Rose I. Alberto, Richard Raymund R. Ragasa, Corinna Victoria M. Puyat, Isabelle Rose I. Alberto, and Pia Gabrielle I. Alfonso of the University of the Philippines; Anne Janelle R. Sy and Patricia Therese S. Pile of the University of the East Ramon Magsaysay Memorial Medical Center; Marianne Taliño of the Ateneo de Manila University School of Medicine and Public Health; and Byron C. Wallace of Northeastern University. More

  • in

    Artificial intelligence predicts patients’ race from their medical images

    The miseducation of algorithms is a critical problem; when artificial intelligence mirrors unconscious thoughts, racism, and biases of the humans who generated these algorithms, it can lead to serious harm. Computer programs, for example, have wrongly flagged Black defendants as twice as likely to reoffend as someone who’s white. When an AI used cost as a proxy for health needs, it falsely named Black patients as healthier than equally sick white ones, as less money was spent on them. Even AI used to write a play relied on using harmful stereotypes for casting. 

    Removing sensitive features from the data seems like a viable tweak. But what happens when it’s not enough? 

    Examples of bias in natural language processing are boundless — but MIT scientists have investigated another important, largely underexplored modality: medical images. Using both private and public datasets, the team found that AI can accurately predict self-reported race of patients from medical images alone. Using imaging data of chest X-rays, limb X-rays, chest CT scans, and mammograms, the team trained a deep learning model to identify race as white, Black, or Asian — even though the images themselves contained no explicit mention of the patient’s race. This is a feat even the most seasoned physicians cannot do, and it’s not clear how the model was able to do this. 

    In an attempt to tease out and make sense of the enigmatic “how” of it all, the researchers ran a slew of experiments. To investigate possible mechanisms of race detection, they looked at variables like differences in anatomy, bone density, resolution of images — and many more, and the models still prevailed with high ability to detect race from chest X-rays. “These results were initially confusing, because the members of our research team could not come anywhere close to identifying a good proxy for this task,” says paper co-author Marzyeh Ghassemi, an assistant professor in the MIT Department of Electrical Engineering and Computer Science and the Institute for Medical Engineering and Science (IMES), who is an affiliate of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and of the MIT Jameel Clinic. “Even when you filter medical images past where the images are recognizable as medical images at all, deep models maintain a very high performance. That is concerning because superhuman capacities are generally much more difficult to control, regulate, and prevent from harming people.”

    In a clinical setting, algorithms can help tell us whether a patient is a candidate for chemotherapy, dictate the triage of patients, or decide if a movement to the ICU is necessary. “We think that the algorithms are only looking at vital signs or laboratory tests, but it’s possible they’re also looking at your race, ethnicity, sex, whether you’re incarcerated or not — even if all of that information is hidden,” says paper co-author Leo Anthony Celi, principal research scientist in IMES at MIT and associate professor of medicine at Harvard Medical School. “Just because you have representation of different groups in your algorithms, that doesn’t guarantee it won’t perpetuate or magnify existing disparities and inequities. Feeding the algorithms with more data with representation is not a panacea. This paper should make us pause and truly reconsider whether we are ready to bring AI to the bedside.” 

    The study, “AI recognition of patient race in medical imaging: a modeling study,” was published in Lancet Digital Health on May 11. Celi and Ghassemi wrote the paper alongside 20 other authors in four countries.

    To set up the tests, the scientists first showed that the models were able to predict race across multiple imaging modalities, various datasets, and diverse clinical tasks, as well as across a range of academic centers and patient populations in the United States. They used three large chest X-ray datasets, and tested the model on an unseen subset of the dataset used to train the model and a completely different one. Next, they trained the racial identity detection models for non-chest X-ray images from multiple body locations, including digital radiography, mammography, lateral cervical spine radiographs, and chest CTs to see whether the model’s performance was limited to chest X-rays. 

    The team covered many bases in an attempt to explain the model’s behavior: differences in physical characteristics between different racial groups (body habitus, breast density), disease distribution (previous studies have shown that Black patients have a higher incidence for health issues like cardiac disease), location-specific or tissue specific differences, effects of societal bias and environmental stress, the ability of deep learning systems to detect race when multiple demographic and patient factors were combined, and if specific image regions contributed to recognizing race. 

    What emerged was truly staggering: The ability of the models to predict race from diagnostic labels alone was much lower than the chest X-ray image-based models. 

    For example, the bone density test used images where the thicker part of the bone appeared white, and the thinner part appeared more gray or translucent. Scientists assumed that since Black people generally have higher bone mineral density, the color differences helped the AI models to detect race. To cut that off, they clipped the images with a filter, so the model couldn’t color differences. It turned out that cutting off the color supply didn’t faze the model — it still could accurately predict races. (The “Area Under the Curve” value, meaning the measure of the accuracy of a quantitative diagnostic test, was 0.94–0.96). As such, the learned features of the model appeared to rely on all regions of the image, meaning that controlling this type of algorithmic behavior presents a messy, challenging problem. 

    The scientists acknowledge limited availability of racial identity labels, which caused them to focus on Asian, Black, and white populations, and that their ground truth was a self-reported detail. Other forthcoming work will include potentially looking at isolating different signals before image reconstruction, because, as with bone density experiments, they couldn’t account for residual bone tissue that was on the images. 

    Notably, other work by Ghassemi and Celi led by MIT student Hammaad Adam has found that models can also identify patient self-reported race from clinical notes even when those notes are stripped of explicit indicators of race. Just as in this work, human experts are not able to accurately predict patient race from the same redacted clinical notes.

    “We need to bring social scientists into the picture. Domain experts, which are usually the clinicians, public health practitioners, computer scientists, and engineers are not enough. Health care is a social-cultural problem just as much as it’s a medical problem. We need another group of experts to weigh in and to provide input and feedback on how we design, develop, deploy, and evaluate these algorithms,” says Celi. “We need to also ask the data scientists, before any exploration of the data, are there disparities? Which patient groups are marginalized? What are the drivers of those disparities? Is it access to care? Is it from the subjectivity of the care providers? If we don’t understand that, we won’t have a chance of being able to identify the unintended consequences of the algorithms, and there’s no way we’ll be able to safeguard the algorithms from perpetuating biases.”

    “The fact that algorithms ‘see’ race, as the authors convincingly document, can be dangerous. But an important and related fact is that, when used carefully, algorithms can also work to counter bias,” says Ziad Obermeyer, associate professor at the University of California at Berkeley, whose research focuses on AI applied to health. “In our own work, led by computer scientist Emma Pierson at Cornell, we show that algorithms that learn from patients’ pain experiences can find new sources of knee pain in X-rays that disproportionately affect Black patients — and are disproportionately missed by radiologists. So just like any tool, algorithms can be a force for evil or a force for good — which one depends on us, and the choices we make when we build algorithms.”

    The work is supported, in part, by the National Institutes of Health. More

  • in

    Estimating the informativeness of data

    Not all data are created equal. But how much information is any piece of data likely to contain? This question is central to medical testing, designing scientific experiments, and even to everyday human learning and thinking. MIT researchers have developed a new way to solve this problem, opening up new applications in medicine, scientific discovery, cognitive science, and artificial intelligence.

    In theory, the 1948 paper, “A Mathematical Theory of Communication,” by the late MIT Professor Emeritus Claude Shannon answered this question definitively. One of Shannon’s breakthrough results is the idea of entropy, which lets us quantify the amount of information inherent in any random object, including random variables that model observed data. Shannon’s results created the foundations of information theory and modern telecommunications. The concept of entropy has also proven central to computer science and machine learning.

    The challenge of estimating entropy

    Unfortunately, the use of Shannon’s formula can quickly become computationally intractable. It requires precisely calculating the probability of the data, which in turn requires calculating every possible way the data could have arisen under a probabilistic model. If the data-generating process is very simple — for example, a single toss of a coin or roll of a loaded die — then calculating entropies is straightforward. But consider the problem of medical testing, where a positive test result is the result of hundreds of interacting variables, all unknown. With just 10 unknowns, there are already 1,000 possible explanations for the data. With a few hundred, there are more possible explanations than atoms in the known universe, which makes calculating the entropy exactly an unmanageable problem.

    MIT researchers have developed a new method to estimate good approximations to many information quantities such as Shannon entropy by using probabilistic inference. The work appears in a paper presented at AISTATS 2022 by authors Feras Saad ’16, MEng ’16, a PhD candidate in electrical engineering and computer science; Marco-Cusumano Towner PhD ’21; and Vikash Mansinghka ’05, MEng ’09, PhD ’09, a principal research scientist in the Department of Brain and Cognitive Sciences. The key insight is, rather than enumerate all explanations, to instead use probabilistic inference algorithms to first infer which explanations are probable and then use these probable explanations to construct high-quality entropy estimates. The paper shows that this inference-based approach can be much faster and more accurate than previous approaches.

    Estimating entropy and information in a probabilistic model is fundamentally hard because it often requires solving a high-dimensional integration problem. Many previous works have developed estimators of these quantities for certain special cases, but the new estimators of entropy via inference (EEVI) offer the first approach that can deliver sharp upper and lower bounds on a broad set of information-theoretic quantities. An upper and lower bound means that although we don’t know the true entropy, we can get a number that is smaller than it and a number that is higher than it.

    “The upper and lower bounds on entropy delivered by our method are particularly useful for three reasons,” says Saad. “First, the difference between the upper and lower bounds gives a quantitative sense of how confident we should be about the estimates. Second, by using more computational effort we can drive the difference between the two bounds to zero, which ‘squeezes’ the true value with a high degree of accuracy. Third, we can compose these bounds to form estimates of many other quantities that tell us how informative different variables in a model are of one another.”

    Solving fundamental problems with data-driven expert systems

    Saad says he is most excited about the possibility that this method gives for querying probabilistic models in areas like machine-assisted medical diagnoses. He says one goal of the EEVI method is to be able to solve new queries using rich generative models for things like liver disease and diabetes that have already been developed by experts in the medical domain. For example, suppose we have a patient with a set of observed attributes (height, weight, age, etc.) and observed symptoms (nausea, blood pressure, etc.). Given these attributes and symptoms, EEVI can be used to help determine which medical tests for symptoms the physician should conduct to maximize information about the absence or presence of a given liver disease (like cirrhosis or primary biliary cholangitis).

    For insulin diagnosis, the authors showed how to use the method for computing optimal times to take blood glucose measurements that maximize information about a patient’s insulin sensitivity, given an expert-built probabilistic model of insulin metabolism and the patient’s personalized meal and medication schedule. As routine medical tracking like glucose monitoring moves away from doctor’s offices and toward wearable devices, there are even more opportunities to improve data acquisition, if the value of the data can be estimated accurately in advance.

    Vikash Mansinghka, senior author on the paper, adds, “We’ve shown that probabilistic inference algorithms can be used to estimate rigorous bounds on information measures that AI engineers often think of as intractable to calculate. This opens up many new applications. It also shows that inference may be more computationally fundamental than we thought. It also helps to explain how human minds might be able to estimate the value of information so pervasively, as a central building block of everyday cognition, and help us engineer AI expert systems that have these capabilities.”

    The paper, “Estimators of Entropy and Information via Inference in Probabilistic Models,” was presented at AISTATS 2022. More

  • in

    Emery Brown earns American Institute for Medical and Biological Engineering Pierre Galletti Award

    The American Institute for Medical and Biological Engineering has awarded its highest honor this year to Emery N. Brown, the Edward Hood Taplin Professor of Computational Neuroscience and Health Sciences and Technology in The Picower Institute for Learning and Memory and the Institute for Medical Engineering and Science at MIT.

    Brown, who is also an anesthesiologist at Massachusetts General Hospital and the Warren M. Zapol Professor at Harvard Medical School, received the 2022 Pierre M. Galletti Award during the national organization’s Annual Event held on March 25.

    For decades, Brown’s lab has uniquely unified three fields: neuroscience, statistics, and anesthesiology. He is renowned for the development of statistical methods and signal-processing algorithms to enable and improve analysis of neural activity measurements. The work has had numerous applications including studies of learning and memory, brain-computer interfaces, and systems neuroscience. He has also pioneered investigations of how general anesthetic drugs work in the brain to induce and maintain simultaneous but reversible states of unconsciousness, amnesia, immobility, and analgesia. Building on these improvements in fundamental understanding, his lab engineers systems to improve monitoring of patient state and anesthetic dosing during surgery. Optimizing doses of general anesthetic drugs can improve patient care in many ways, including by minimizing side effects such as post-operative delirium and by improving post-operative pain management.

    AIMBE said Brown earned the award in recognition of his “significant contributions to neuroscience data analysis and for characterizing the neurophysiology of anesthesia-induced unconsciousness and demonstrating how it can be reliably monitored in real time using electroencephalogram recordings.”

    Brown, who is also a faculty member in MIT’s Department of Brain and Cognitive Sciences, is now working to develop a research center at MIT dedicated to taking neuroscience-based approaches to advance anesthesiology.

    “I am extremely honored and grateful to the AIMBE for choosing me to receive the 2022 Galletti Award in recognition of my research deciphering the neuroscience of how anesthetics work,” he says. “I would like to express my gratitude to my collaborators, post-doctoral fellows, students, research assistants, and clinical coordinators who have made this possible.” More

  • in

    Generating new molecules with graph grammar

    Chemical engineers and materials scientists are constantly looking for the next revolutionary material, chemical, and drug. The rise of machine-learning approaches is expediting the discovery process, which could otherwise take years. “Ideally, the goal is to train a machine-learning model on a few existing chemical samples and then allow it to produce as many manufacturable molecules of the same class as possible, with predictable physical properties,” says Wojciech Matusik, professor of electrical engineering and computer science at MIT. “If you have all these components, you can build new molecules with optimal properties, and you also know how to synthesize them. That’s the overall vision that people in that space want to achieve”

    However, current techniques, mainly deep learning, require extensive datasets for training models, and many class-specific chemical datasets contain a handful of example compounds, limiting their ability to generalize and generate physical molecules that could be created in the real world.

    Now, a new paper from researchers at MIT and IBM tackles this problem using a generative graph model to build new synthesizable molecules within the same chemical class as their training data. To do this, they treat the formation of atoms and chemical bonds as a graph and develop a graph grammar — a linguistics analogy of systems and structures for word ordering — that contains a sequence of rules for building molecules, such as monomers and polymers. Using the grammar and production rules that were inferred from the training set, the model can not only reverse engineer its examples, but can create new compounds in a systematic and data-efficient way. “We basically built a language for creating molecules,” says Matusik “This grammar essentially is the generative model.”

    Matusik’s co-authors include MIT graduate students Minghao Guo, who is the lead author, and Beichen Li as well as Veronika Thost, Payal Das, and Jie Chen, research staff members with IBM Research. Matusik, Thost, and Chen are affiliated with the MIT-IBM Watson AI Lab. Their method, which they’ve called data-efficient graph grammar (DEG), will be presented at the International Conference on Learning Representations.

    “We want to use this grammar representation for monomer and polymer generation, because this grammar is explainable and expressive,” says Guo. “With only a few number of the production rules, we can generate many kinds of structures.”

    A molecular structure can be thought of as a symbolic representation in a graph — a string of atoms (nodes) joined together by chemical bonds (edges). In this method, the researchers allow the model to take the chemical structure and collapse a substructure of the molecule down to one node; this may be two atoms connected by a bond, a short sequence of bonded atoms, or a ring of atoms. This is done repeatedly, creating the production rules as it goes, until a single node remains. The rules and grammar then could be applied in the reverse order to recreate the training set from scratch or combined in different combinations to produce new molecules of the same chemical class.

    “Existing graph generation methods would produce one node or one edge sequentially at a time, but we are looking at higher-level structures and, specifically, exploiting chemistry knowledge, so that we don’t treat the individual atoms and bonds as the unit. This simplifies the generation process and also makes it more data-efficient to learn,” says Chen.

    Further, the researchers optimized the technique so that the bottom-up grammar was relatively simple and straightforward, such that it fabricated molecules that could be made.

    “If we switch the order of applying these production rules, we would get another molecule; what’s more, we can enumerate all the possibilities and generate tons of them,” says Chen. “Some of these molecules are valid and some of them not, so the learning of the grammar itself is actually to figure out a minimal collection of production rules, such that the percentage of molecules that can actually be synthesized is maximized.” While the researchers concentrated on three training sets of less than 33 samples each — acrylates, chain extenders, and isocyanates — they note that the process could be applied to any chemical class.

    To see how their method performed, the researchers tested DEG against other state-of-the-art models and techniques, looking at percentages of chemically valid and unique molecules, diversity of those created, success rate of retrosynthesis, and percentage of molecules belonging to the training data’s monomer class.

    “We clearly show that, for the synthesizability and membership, our algorithm outperforms all the existing methods by a very large margin, while it’s comparable for some other widely-used metrics,” says Guo. Further, “what is amazing about our algorithm is that we only need about 0.15 percent of the original dataset to achieve very similar results compared to state-of-the-art approaches that train on tens of thousands of samples. Our algorithm can specifically handle the problem of data sparsity.”

    In the immediate future, the team plans to address scaling up this grammar learning process to be able to generate large graphs, as well as produce and identify chemicals with desired properties.

    Down the road, the researchers see many applications for the DEG method, as it’s adaptable beyond generating new chemical structures, the team points out. A graph is a very flexible representation, and many entities can be symbolized in this form — robots, vehicles, buildings, and electronic circuits, for example. “Essentially, our goal is to build up our grammar, so that our graphic representation can be widely used across many different domains,” says Guo, as “DEG can automate the design of novel entities and structures,” says Chen.

    This research was supported, in part, by the MIT-IBM Watson AI Lab and Evonik. More

  • in

    Deep-learning technique predicts clinical treatment outcomes

    When it comes to treatment strategies for critically ill patients, clinicians want to be able to consider all their options and timing of administration, and make the optimal decision for their patients. While clinician experience and study has helped them to be successful in this effort, not all patients are the same, and treatment decisions at this crucial time could mean the difference between patient improvement and quick deterioration. Therefore, it would be helpful for doctors to be able to take a patient’s previous known health status and received treatments and use that to predict that patient’s health outcome under different treatment scenarios, in order to pick the best path.

    Now, a deep-learning technique, called G-Net, from researchers at MIT and IBM provides a window into causal counterfactual prediction, affording physicians the opportunity to explore how a patient might fare under different treatment plans. The foundation of G-Net is the g-computation algorithm, a causal inference method that estimates the effect of dynamic exposures in the presence of measured confounding variables — ones that may influence both treatments and outcomes. Unlike previous implementations of the g-computation framework, which have used linear modeling approaches, G-Net uses recurrent neural networks (RNN), which have node connections that allow them to better model temporal sequences with complex and nonlinear dynamics, like those found in the physiological and clinical time series data. In this way, physicians can develop alternative plans based on patient history and test them before making a decision.

    “Our ultimate goal is to develop a machine learning technique that would allow doctors to explore various ‘What if’ scenarios and treatment options,” says Li-wei Lehman, MIT research scientist in the MIT Institute for Medical Engineering and Science and an MIT-IBM Watson AI Lab project lead. “A lot of work has been done in terms of deep learning for counterfactual prediction but [it’s] been focusing on a point exposure setting,” or a static, time-varying treatment strategy, which doesn’t allow for adjustment of treatments as patient history changes. However, her team’s new prediction approach provides for treatment plan flexibility and chances for treatment alteration over time as patient covariate history and past treatments change. “G-Net is the first deep-learning approach based on g-computation that can predict both the population-level and individual-level treatment effects under dynamic and time varying treatment strategies.”

    The research, which was recently published in the Proceedings of Machine Learning Research, was co-authored by Rui Li MEng ’20, Stephanie Hu MEng ’21, former MIT postdoc Mingyu Lu MD, graduate student Yuria Utsumi, IBM research staff member Prithwish Chakraborty, IBM Research director of Hybrid Cloud Services Daby Sow, IBM data scientist Piyush Madan, IBM research scientist Mohamed Ghalwash, and IBM research scientist Zach Shahn.

    Tracking disease progression

    To build, validate, and test G-Net’s predictive abilities, the researchers considered the circulatory system in septic patients in the ICU. During critical care, doctors need to make trade-offs and judgement calls, such as ensuring the organs are receiving adequate blood supply without overworking the heart. For this, they could give intravenous fluids to patients to increase blood pressure; however, too much can cause edema. Alternatively, physicians can administer vasopressors, which act to contract blood vessels and raise blood pressure.

    In order to mimic this and demonstrate G-Net’s proof-of-concept, the team used CVSim, a mechanistic model of a human cardiovascular system that’s governed by 28 input variables characterizing the system’s current state, such as arterial pressure, central venous pressure, total blood volume, and total peripheral resistance, and modified it to simulate various disease processes (e.g., sepsis or blood loss) and effects of interventions (e.g., fluids and vasopressors). The researchers used CVSim to generate observational patient data for training and for “ground truth” comparison against counterfactual prediction. In their G-Net architecture, the researchers ran two RNNs to handle and predict variables that are continuous, meaning they can take on a range of values, like blood pressure, and categorical variables, which have discrete values, like the presence or absence of pulmonary edema. The researchers simulated the health trajectories of thousands of “patients” exhibiting symptoms under one treatment regime, let’s say A, for 66 timesteps, and used them to train and validate their model.

    Testing G-Net’s prediction capability, the team generated two counterfactual datasets. Each contained roughly 1,000 known patient health trajectories, which were created from CVSim using the same “patient” condition as the starting point under treatment A. Then at timestep 33, treatment changed to plan B or C, depending on the dataset. The team then performed 100 prediction trajectories for each of these 1,000 patients, whose treatment and medical history was known up until timestep 33 when a new treatment was administered. In these cases, the prediction agreed well with the “ground-truth” observations for individual patients and averaged population-level trajectories.

    A cut above the rest

    Since the g-computation framework is flexible, the researchers wanted to examine G-Net’s prediction using different nonlinear models — in this case, long short-term memory (LSTM) models, which are a type of RNN that can learn from previous data patterns or sequences — against the more classical linear models and a multilayer perception model (MLP), a type of neural network that can make predictions using a nonlinear approach. Following a similar setup as before, the team found that the error between the known and predicted cases was smallest in the LSTM models compared to the others. Since G-Net is able to model the temporal patterns of the patient’s ICU history and past treatment, whereas a linear model and MLP cannot, it was better able to predict the patient’s outcome.

    The team also compared G-Net’s prediction in a static, time-varying treatment setting against two state-of-the-art deep-learning based counterfactual prediction approaches, a recurrent marginal structural network (rMSN) and a counterfactual recurrent neural network (CRN), as well as a linear model and an MLP. For this, they investigated a model for tumor growth under no treatment, radiation, chemotherapy, and both radiation and chemotherapy scenarios. “Imagine a scenario where there’s a patient with cancer, and an example of a static regime would be if you only give a fixed dosage of chemotherapy, radiation, or any kind of drug, and wait until the end of your trajectory,” comments Lu. For these investigations, the researchers generated simulated observational data using tumor volume as the primary influence dictating treatment plans and demonstrated that G-Net outperformed the other models. One potential reason could be because g-computation is known to be more statistically efficient than rMSN and CRN, when models are correctly specified.

    While G-Net has done well with simulated data, more needs to be done before it can be applied to real patients. Since neural networks can be thought of as “black boxes” for prediction results, the researchers are beginning to investigate the uncertainty in the model to help ensure safety. In contrast to these approaches that recommend an “optimal” treatment plan without any clinician involvement, “as a decision support tool, I believe that G-Net would be more interpretable, since the clinicians would input treatment strategies themselves,” says Lehman, and “G-Net will allow them to be able to explore different hypotheses.” Further, the team has moved on to using real data from ICU patients with sepsis, bringing it one step closer to implementation in hospitals.

    “I think it is pretty important and exciting for real-world applications,” says Hu. “It’d be helpful to have some way to predict whether or not a treatment might work or what the effects might be — a quicker iteration process for developing these hypotheses for what to try, before actually trying to implement them in in a years-long, potentially very involved and very invasive type of clinical trial.”

    This research was funded by the MIT-IBM Watson AI Lab. More