More stories

  • in

    Advancing social studies at MIT Sloan

    Around 2010, Facebook was a relatively small company with about 2,000 employees. So, when a PhD student named Dean Eckles showed up to serve an intership at the firm, he landed in a position with some real duties.

    Eckles essentially became the primary data scientist for the product manager who was overseeing the platform’s news feeds. That manager would pepper Eckles with questions. How exactly do people influence each other online? If Facebook tweaked its content-ranking algorithms, what would happen? What occurs when you show people more photos?

    As a doctoral candidate already studying social influence, Eckles was well-equipped to think about such questions, and being at Facebook gave him a lot of data to study them. 

    “If you show people more photos, they post more photos themselves,” Eckles says. “In turn, that affects the experience of all their friends. Plus they’re getting more likes and more comments. It affects everybody’s experience. But can you account for all of these compounding effects across the network?”

    Eckles, now an associate professor in the MIT Sloan School of Management and an affiliate faculty member of the Institute for Data, Systems, and Society, has made a career out of thinking carefully about that last question. Studying social networks allows Eckles to tackle significant questions involving, for example, the economic and political effects of social networks, the spread of misinformation, vaccine uptake during the Covid-19 crisis, and other aspects of the formation and shape of social networks. For instance, one study he co-authored this summer shows that people who either move between U.S. states, change high schools, or attend college out of state, wind up with more robust social networks, which are strongly associated with greater economic success.

    Eckles maintains another research channel focused on what scholars call “causal inference,” the methods and techniques that allow researchers to identify cause-and-effect connections in the world.

    “Learning about cause-and-effect relationships is core to so much science,” Eckles says. “In behavioral, social, economic, or biomedical science, it’s going to be hard. When you start thinking about humans, causality gets difficult. People do things strategically, and they’re electing into situations based on their own goals, so that complicates a lot of cause-and-effect relationships.”

    Eckles has now published dozens of papers in each of his different areas of work; for his research and teaching, Eckles received tenure from MIT last year.

    Five degrees and a job

    Eckles grew up in California, mostly near the Lake Tahoe area. He attended Stanford University as an undergraduate, arriving on campus in fall 2002 — and didn’t really leave for about a decade. Eckles has five degrees from Stanford. As an undergrad, he received a BA in philosophy and a BS in symbolic systems, an interdisciplinary major combining computer science, philosophy, psychology, and more. Eckles was set to attend Oxford University for graduate work in philosophy but changed his mind and stayed at Stanford for an MS in symbolic systems too. 

    “[Oxford] might have been a great experience, but I decided to focus more on the tech side of things,” he says.

    After receiving his first master’s degree, Eckles did take a year off from school and worked for Nokia, although the firm’s offices were adjacent to the Stanford campus and Eckles would sometimes stop and talk to faculty during the workday. Soon he was enrolled at Stanford again, this time earning his PhD in communication, in 2012, while receiving an MA in statistics the year before. His doctoral dissertation wound up being about peer influence in networks. PhD in hand, Eckles promptly headed back to Facebook, this time for three years as a full-time researcher.

     “They were really supportive of the work I was doing,” Eckles says.

    Still, Eckles remained interested in moving into academia, and joined the MIT faculty in 2017 with a position in MIT Sloan’s Marketing Group. The group consists of a set of scholars with far-ranging interests, from cognitive science to advertising to social network dynamics.

    “Our group reflects something deeper about the Sloan school and about MIT as well, an openness to doing things differently and not having to fit into narrowly defined tracks,” Eckles says.

    For that matter, MIT has many faculty in different domains who work on causal inference, and whose work Eckles quickly cites — including economists Victor Chernozhukov and Alberto Abadie, and Joshua Angrist, whose book “Mostly Harmless Econometrics” Eckles name-checks as an influence.

    “I’ve been fortunate in my career that causal inference turned out to be a hot area,” Eckles says. “But I think it’s hot for good reasons. People started to realize that, yes, causal inference is really important. There are economists, computer scientists, statisticians, and epidemiologists who are going to the same conferences and citing each other’s papers. There’s a lot happening.”

    How do networks form?

    These days, Eckles is interested in expanding the questions he works on. In the past, he has often studied existing social networks and looked at their effects. For instance: One study Eckles co-authored, examining the 2012 U.S. elections, found that get-out-the-vote messages work very well, especially when relayed via friends.

    That kind of study takes the existence of the network as a given, though. Another kind of research question is, as Eckles puts it, “How do social networks form and evolve? And what are the consequences of these network structures?” His recent study about social networks expanding as people move around and change schools is one example of research that digs into the core life experiences underlying social networks.

    “I’m excited about doing more on how these networks arise and what factors, including everything from personality to public transit, affect their formation,” Eckles says.

    Understanding more about how social networks form gets at key questions about social life and civic structure. Suppose research shows how some people develop and maintain beneficial connections in life; it’s possible that those insights could be applied to programs helping people in more disadvantaged situations realize some of the same opportunities.

    “We want to act on things,” Eckles says. “Sometimes people say, ‘We care about prediction.’ I would say, ‘We care about prediction under intervention.’ We want to predict what’s going to happen if we try different things.”

    Ultimately, Eckles reflects, “Trying to reason about the origins and maintenance of social networks, and the effects of networks, is interesting substantively and methodologically. Networks are super-high-dimensional objects, even just a single person’s network and all its connections. You have to summarize it, so for instance we talk about weak ties or strong ties, but do we have the correct description? There are fascinating questions that require development, and I’m eager to keep working on them.”   More

  • in

    M’Care and MIT students join forces to improve child health in Nigeria

    Through a collaboration between M’Care, a 2021 Health Security and Pandemics Solver team, and students from MIT, the landscape of child health care in Nigeria could undergo a transformative change, wherein the power of data is harnessed to improve child health outcomes in economically disadvantaged communities. 

    M’Care is a mobile application of Promane and Promade Limited, developed by Opeoluwa Ashimi, which gives community health workers in Nigeria real-time diagnostic and treatment support. The application also creates a dashboard that is available to government health officials to help identify disease trends and deploy timely interventions. As part of its work, M’Care is working to mitigate malnutrition by providing micronutrient powder, vitamin A, and zinc to children below the age of 5. To help deepen its impact, Ashimi decided to work with students in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) course 6.S897 (Machine Learning for Healthcare) — instructed by professors Peter Szolovits and Manolis Kellis — to leverage data in order to improve nutrient delivery to children across Nigeria. The collaboration also enabled students to see real-world applications for data analysis in the health care space.

    A meeting of minds: M’Care, MIT, and national health authorities

    “Our primary goal for collaborating with the ML for Health team was to spot the missing link in the continuum of care. With over 1 million cumulative consultations that qualify for a continuum of care evaluation, it was important to spot why patients could be lost to followup, prevent this, and ensure completion of care to successfully address the health needs of our patients,” says Ashimi, founder and CEO of M’Care.

    In May 2023, Ashimi attended a meeting that brought together key national stakeholders, including the representatives of the National Ministry of Health in Nigeria. This gathering served as a platform to discuss the profound impact of M’Care’s and ML for Health team’s collaboration — bolstered by data analysis provided on dosage regimens and a child’s age to enhance continuum of care with its attendant impact on children’s health, particularly in relation to brain development with regards to the use of essential micronutrients. The data analyzed by the students using ML methods that were shared during the meeting provided strong supporting evidence to individualize dosage regimens for children based on their age in months for the ANRIN project — a national nutrition project supported by the World Bank — as well as policy decisions to extend months of coverage for children, redefining health care practices in Nigeria.

    MIT students drive change by harnessing the power of data

    At the heart of this collaboration lies the contribution of MIT students. Armed with their dedication and skill in data analysis and machine learning, they played a pivotal role in helping M’Care analyze their data and prepare for their meeting with the Ministry of Health. Their most significant findings included ways to identify patients at risk of not completing their full course of micronutrient powder and/or vitamin A, and identifying gaps in M’Care’s data, such as postdated delivery dates and community demographics. These findings are already helping M’Care better plan its resources and adjust the scope of its program to ensure more children complete the intervention.

    Darcy Kim, an undergraduate at Wellesley College studying math and computer science, who is cross-registered for the MIT machine learning course, expresses enthusiasm about the practical applications found within the project: “To me, data and math is storytelling, and the story is why I love studying it. … I learned that data exploration involves asking questions about how the data is collected, and that surprising patterns that arise often have a qualitative explanation. Impactful research requires radical collaboration with the people the research intends to help. Otherwise, these qualitative explanations get lost in the numbers.”

    Joyce Luo, a first-year operations research PhD student at the Operations Research Center at MIT, shares similar thoughts about the project: “I learned the importance of understanding the context behind data to figure out what kind of analysis might be most impactful. This involves being in frequent contact with the company or organization who provides the data to learn as much as you can about how the data was collected and the people the analysis could help. Stepping back and looking at the bigger picture, rather than just focusing on accuracy or metrics, is extremely important.”

    Insights to implementation: A new era for micronutrient dosing

    As a direct result of M’Care’s collaboration with MIT, policymakers revamped the dosing scheme for essential micronutrient administration for children in Nigeria to prevent malnutrition. M’Care and MIT’s data analysis unearthed critical insights into the limited frequency of medical visits caused by late-age enrollment. 

    “One big takeaway for me was that the data analysis portion of the project — doing a deep dive into the data; understanding, analyzing, visualizing, and summarizing the data — can be just as important as building the machine learning models. M’Care shared our data analysis with the National Ministry of Health, and the insights from it drove them to change their dosing scheme and schedule for delivering micronutrient powder to young children. This really showed us the value of understanding and knowing your data before modeling,” shares Angela Lin, a second-year PhD student at the Operations Research Center.

    Armed with this knowledge, policymakers are eager to develop an optimized dosing scheme that caters to the unique needs of children in disadvantaged communities, ensuring maximum impact on their brain development and overall well-being.

    Siddharth Srivastava, M’Care’s corporate technology liaison, shares his gratitude for the MIT student’s input. “Collaborating with enthusiastic and driven students was both empowering and inspiring. Each of them brought unique perspectives and technical skills to the table. Their passion for applying machine learning to health care was evident in their unwavering dedication and proactive approach to problem-solving.”

    Forging a path to impact

    The collaboration between M’Care and MIT exemplifies the remarkable achievements that arise when academia, innovative problem-solvers, and policy authorities unite. By merging academic rigor with real-world expertise, this partnership has the potential to revolutionize child health care not only in Nigeria but also in similar contexts worldwide.

    “I believe applying innovative methods of machine learning, data gathering, instrumentation, and planning to real problems in the developing world can be highly effective for those countries and highly motivating for our students. I was happy to have such a project in our class portfolio this year and look forward to future opportunities,” says Peter Szolovits, professor of computer science and engineering at MIT.

    By harnessing the power of data, innovation, and collective expertise, this collaboration between M’Care and MIT has the potential to improve equitable child health care in Nigeria. “It has been so fulfilling to see how our team’s work has been able to create even the smallest positive impact in such a short period of time, and it has been amazing to work with a company like Promane and Promade Limited that is so knowledgeable and caring for the communities that they serve,” shares Elizabeth Whittier, a second-year PhD electrical engineering student at MIT. More

  • in

    Artificial intelligence for augmentation and productivity

    The MIT Stephen A. Schwarzman College of Computing has awarded seed grants to seven projects that are exploring how artificial intelligence and human-computer interaction can be leveraged to enhance modern work spaces to achieve better management and higher productivity.

    Funded by Andrew W. Houston ’05 and Dropbox Inc., the projects are intended to be interdisciplinary and bring together researchers from computing, social sciences, and management.

    The seed grants can enable the project teams to conduct research that leads to bigger endeavors in this rapidly evolving area, as well as build community around questions related to AI-augmented management.

    The seven selected projects and research leads include:

    “LLMex: Implementing Vannevar Bush’s Vision of the Memex Using Large Language Models,” led by Patti Maes of the Media Lab and David Karger of the Department of Electrical Engineering and Computer Science (EECS) and the Computer Science and Artificial Intelligence Laboratory (CSAIL). Inspired by Vannevar Bush’s Memex, this project proposes to design, implement, and test the concept of memory prosthetics using large language models (LLMs). The AI-based system will intelligently help an individual keep track of vast amounts of information, accelerate productivity, and reduce errors by automatically recording their work actions and meetings, supporting retrieval based on metadata and vague descriptions, and suggesting relevant, personalized information proactively based on the user’s current focus and context.

    “Using AI Agents to Simulate Social Scenarios,” led by John Horton of the MIT Sloan School of Management and Jacob Andreas of EECS and CSAIL. This project imagines the ability to easily simulate policies, organizational arrangements, and communication tools with AI agents before implementation. Tapping into the capabilities of modern LLMs to serve as a computational model of humans makes this vision of social simulation more realistic, and potentially more predictive.

    “Human Expertise in the Age of AI: Can We Have Our Cake and Eat it Too?” led by Manish Raghavan of MIT Sloan and EECS, and Devavrat Shah of EECS and the Laboratory for Information and Decision Systems. Progress in machine learning, AI, and in algorithmic decision aids has raised the prospect that algorithms may complement human decision-making in a wide variety of settings. Rather than replacing human professionals, this project sees a future where AI and algorithmic decision aids play a role that is complementary to human expertise.

    “Implementing Generative AI in U.S. Hospitals,” led by Julie Shah of the Department of Aeronautics and Astronautics and CSAIL, Retsef Levi of MIT Sloan and the Operations Research Center, Kate Kellog of MIT Sloan, and Ben Armstrong of the Industrial Performance Center. In recent years, studies have linked a rise in burnout from doctors and nurses in the United States with increased administrative burdens associated with electronic health records and other technologies. This project aims to develop a holistic framework to study how generative AI technologies can both increase productivity for organizations and improve job quality for workers in health care settings.

    “Generative AI Augmented Software Tools to Democratize Programming,” led by Harold Abelson of EECS and CSAIL, Cynthia Breazeal of the Media Lab, and Eric Klopfer of the Comparative Media Studies/Writing. Progress in generative AI over the past year is fomenting an upheaval in assumptions about future careers in software and deprecating the role of coding. This project will stimulate a similar transformation in computing education for those who have no prior technical training by creating a software tool that could eliminate much of the need for learners to deal with code when creating applications.

    “Acquiring Expertise and Societal Productivity in a World of Artificial Intelligence,” led by David Atkin and Martin Beraja of the Department of Economics, and Danielle Li of MIT Sloan. Generative AI is thought to augment the capabilities of workers performing cognitive tasks. This project seeks to better understand how the arrival of AI technologies may impact skill acquisition and productivity, and to explore complementary policy interventions that will allow society to maximize the gains from such technologies.

    “AI Augmented Onboarding and Support,” led by Tim Kraska of EECS and CSAIL, and Christoph Paus of the Department of Physics. While LLMs have made enormous leaps forward in recent years and are poised to fundamentally change the way students and professionals learn about new tools and systems, there is often a steep learning curve which people have to climb in order to make full use of the resource. To help mitigate the issue, this project proposes the development of new LLM-powered onboarding and support systems that will positively impact the way support teams operate and improve the user experience. More

  • in

    To improve solar and other clean energy tech, look beyond hardware

    To continue reducing the costs of solar energy and other clean energy technologies, scientists and engineers will likely need to focus, at least in part, on improving technology features that are not based on hardware, according to MIT researchers. They describe this finding and the mechanisms behind it today in Nature Energy.

    While the cost of installing a solar energy system has dropped by more than 99 percent since 1980, this new analysis shows that “soft technology” features, such as the codified permitting practices, supply chain management techniques, and system design processes that go into deploying a solar energy plant, contributed only 10 to 15 percent of total cost declines. Improvements to hardware features were responsible for the lion’s share.

    But because soft technology is increasingly dominating the total costs of installing solar energy systems, this trend threatens to slow future cost savings and hamper the global transition to clean energy, says the study’s senior author, Jessika Trancik, a professor in MIT’s Institute for Data, Systems, and Society (IDSS).

    Trancik’s co-authors include lead author Magdalena M. Klemun, a former IDSS graduate student and postdoc who is now an assistant professor at the Hong Kong University of Science and Technology; Goksin Kavlak, a former IDSS graduate student and postdoc who is now an associate at the Brattle Group; and James McNerney, a former IDSS postdoc and now senior research fellow at the Harvard Kennedy School.

    The team created a quantitative model to analyze the cost evolution of solar energy systems, which captures the contributions of both hardware technology features and soft technology features.

    The framework shows that soft technology hasn’t improved much over time — and that soft technology features contributed even less to overall cost declines than previously estimated.

    Their findings indicate that to reverse this trend and accelerate cost declines, engineers could look at making solar energy systems less reliant on soft technology to begin with, or they could tackle the problem directly by improving inefficient deployment processes.  

    “Really understanding where the efficiencies and inefficiencies are, and how to address those inefficiencies, is critical in supporting the clean energy transition. We are making huge investments of public dollars into this, and soft technology is going to be absolutely essential to making those funds count,” says Trancik.

    “However,” Klemun adds, “we haven’t been thinking about soft technology design as systematically as we have for hardware. That needs to change.”

    The hard truth about soft costs

    Researchers have observed that the so-called “soft costs” of building a solar power plant — the costs of designing and installing the plant — are becoming a much larger share of total costs. In fact, the share of soft costs now typically ranges from 35 to 64 percent.

    “We wanted to take a closer look at where these soft costs were coming from and why they weren’t coming down over time as quickly as the hardware costs,” Trancik says.

    In the past, scientists have modeled the change in solar energy costs by dividing total costs into additive components — hardware components and nonhardware components — and then tracking how these components changed over time.

    “But if you really want to understand where those rates of change are coming from, you need to go one level deeper to look at the technology features. Then things split out differently,” Trancik says.

    The researchers developed a quantitative approach that models the change in solar energy costs over time by assigning contributions to the individual technology features, including both hardware features and soft technology features.

    For instance, their framework would capture how much of the decline in system installation costs — a soft cost — is due to standardized practices of certified installers — a soft technology feature. It would also capture how that same soft cost is affected by increased photovoltaic module efficiency — a hardware technology feature.

    With this approach, the researchers saw that improvements in hardware had the greatest impacts on driving down soft costs in solar energy systems. For example, the efficiency of photovoltaic modules doubled between 1980 and 2017, reducing overall system costs by 17 percent. But about 40 percent of that overall decline could be attributed to reductions in soft costs tied to improved module efficiency.

    The framework shows that, while hardware technology features tend to improve many cost components, soft technology features affect only a few.

    “You can see this structural difference even before you collect data on how the technologies have changed over time. That’s why mapping out a technology’s network of cost dependencies is a useful first step to identify levers of change, for solar PV and for other technologies as well,” Klemun notes.  

    Static soft technology

    The researchers used their model to study several countries, since soft costs can vary widely around the world. For instance, solar energy soft costs in Germany are about 50 percent less than those in the U.S.

    The fact that hardware technology improvements are often shared globally led to dramatic declines in costs over the past few decades across locations, the analysis showed. Soft technology innovations typically aren’t shared across borders. Moreover, the team found that countries with better soft technology performance 20 years ago still have better performance today, while those with worse performance didn’t see much improvement.

    This country-by-country difference could be driven by regulation and permitting processes, cultural factors, or by market dynamics such as how firms interact with each other, Trancik says.

    “But not all soft technology variables are ones that you would want to change in a cost-reducing direction, like lower wages. So, there are other considerations, beyond just bringing the cost of the technology down, that we need to think about when interpreting these results,” she says.

    Their analysis points to two strategies for reducing soft costs. For one, scientists could focus on developing hardware improvements that make soft costs more dependent on hardware technology variables and less on soft technology variables, such as by creating simpler, more standardized equipment that could reduce on-site installation time.

    Or researchers could directly target soft technology features without changing hardware, perhaps by creating more efficient workflows for system installation or automated permitting platforms.

    “In practice, engineers will often pursue both approaches, but separating the two in a formal model makes it easier to target innovation efforts by leveraging specific relationships between technology characteristics and costs,” Klemun says.

    “Often, when we think about information processing, we are leaving out processes that still happen in a very low-tech way through people communicating with one another. But it is just as important to think about that as a technology as it is to design fancy software,” Trancik notes.

    In the future, she and her collaborators want to apply their quantitative model to study the soft costs related to other technologies, such as electrical vehicle charging and nuclear fission. They are also interested in better understanding the limits of soft technology improvement, and how one could design better soft technology from the outset.

    This research is funded by the U.S. Department of Energy Solar Energy Technologies Office. More

  • in

    How machine learning models can amplify inequities in medical diagnosis and treatment

    Prior to receiving a PhD in computer science from MIT in 2017, Marzyeh Ghassemi had already begun to wonder whether the use of AI techniques might enhance the biases that already existed in health care. She was one of the early researchers to take up this issue, and she’s been exploring it ever since. In a new paper, Ghassemi, now an assistant professor in MIT’s Department of Electrical Science and Engineering (EECS), and three collaborators based at the Computer Science and Artificial Intelligence Laboratory, have probed the roots of the disparities that can arise in machine learning, often causing models that perform well overall to falter when it comes to subgroups for which relatively few data have been collected and utilized in the training process. The paper — written by two MIT PhD students, Yuzhe Yang and Haoran Zhang, EECS computer scientist Dina Katabi (the Thuan and Nicole Pham Professor), and Ghassemi — was presented last month at the 40th International Conference on Machine Learning in Honolulu, Hawaii.

    In their analysis, the researchers focused on “subpopulation shifts” — differences in the way machine learning models perform for one subgroup as compared to another. “We want the models to be fair and work equally well for all groups, but instead we consistently observe the presence of shifts among different groups that can lead to inferior medical diagnosis and treatment,” says Yang, who along with Zhang are the two lead authors on the paper. The main point of their inquiry is to determine the kinds of subpopulation shifts that can occur and to uncover the mechanisms behind them so that, ultimately, more equitable models can be developed.

    The new paper “significantly advances our understanding” of the subpopulation shift phenomenon, claims Stanford University computer scientist Sanmi Koyejo. “This research contributes valuable insights for future advancements in machine learning models’ performance on underrepresented subgroups.”

    Camels and cattle

    The MIT group has identified four principal types of shifts — spurious correlations, attribute imbalance, class imbalance, and attribute generalization — which, according to Yang, “have never been put together into a coherent and unified framework. We’ve come up with a single equation that shows you where biases can come from.”

    Biases can, in fact, stem from what the researchers call the class, or from the attribute, or both. To pick a simple example, suppose the task assigned to the machine learning model is to sort images of objects — animals in this case — into two classes: cows and camels. Attributes are descriptors that don’t specifically relate to the class itself. It might turn out, for instance, that all the images used in the analysis show cows standing on grass and camels on sand — grass and sand serving as the attributes here. Given the data available to it, the machine could reach an erroneous conclusion — namely that cows can only be found on grass, not on sand, with the opposite being true for camels. Such a finding would be incorrect, however, giving rise to a spurious correlation, which, Yang explains, is a “special case” among subpopulation shifts — “one in which you have a bias in both the class and the attribute.”

    In a medical setting, one could rely on machine learning models to determine whether a person has pneumonia or not based on an examination of X-ray images. There would be two classes in this situation, one consisting of people who have the lung ailment, another for those who are infection-free. A relatively straightforward case would involve just two attributes: the people getting X-rayed are either female or male. If, in this particular dataset, there were 100 males diagnosed with pneumonia for every one female diagnosed with pneumonia, that could lead to an attribute imbalance, and the model would likely do a better job of correctly detecting pneumonia for a man than for a woman. Similarly, having 1,000 times more healthy (pneumonia-free) subjects than sick ones would lead to a class imbalance, with the model biased toward healthy cases. Attribute generalization is the last shift highlighted in the new study. If your sample contained 100 male patients with pneumonia and zero female subjects with the same illness, you still would like the model to be able to generalize and make predictions about female subjects even though there are no samples in the training data for females with pneumonia.

    The team then took 20 advanced algorithms, designed to carry out classification tasks, and tested them on a dozen datasets to see how they performed across different population groups. They reached some unexpected conclusions: By improving the “classifier,” which is the last layer of the neural network, they were able to reduce the occurrence of spurious correlations and class imbalance, but the other shifts were unaffected. Improvements to the “encoder,” one of the uppermost layers in the neural network, could reduce the problem of attribute imbalance. “However, no matter what we did to the encoder or classifier, we did not see any improvements in terms of attribute generalization,” Yang says, “and we don’t yet know how to address that.”

    Precisely accurate

    There is also the question of assessing how well your model actually works in terms of evenhandedness among different population groups. The metric normally used, called worst-group accuracy or WGA, is based on the assumption that if you can improve the accuracy — of, say, medical diagnosis — for the group that has the worst model performance, you would have improved the model as a whole. “The WGA is considered the gold standard in subpopulation evaluation,” the authors contend, but they made a surprising discovery: boosting worst-group accuracy results in a decrease in what they call “worst-case precision.” In medical decision-making of all sorts, one needs both accuracy — which speaks to the validity of the findings — and precision, which relates to the reliability of the methodology. “Precision and accuracy are both very important metrics in classification tasks, and that is especially true in medical diagnostics,” Yang explains. “You should never trade precision for accuracy. You always need to balance the two.”

    The MIT scientists are putting their theories into practice. In a study they’re conducting with a medical center, they’re looking at public datasets for tens of thousands of patients and hundreds of thousands of chest X-rays, trying to see whether it’s possible for machine learning models to work in an unbiased manner for all populations. That’s still far from the case, even though more awareness has been drawn to this problem, Yang says. “We are finding many disparities across different ages, gender, ethnicity, and intersectional groups.”

    He and his colleagues agree on the eventual goal, which is to achieve fairness in health care among all populations. But before we can reach that point, they maintain, we still need a better understanding of the sources of unfairness and how they permeate our current system. Reforming the system as a whole will not be easy, they acknowledge. In fact, the title of the paper they introduced at the Honolulu conference, “Change is Hard,” gives some indications as to the challenges that they and like-minded researchers face. More

  • in

    The tenured engineers of 2023

    In 2023, MIT granted tenure to nine faculty members across the School of Engineering. This year’s tenured engineers hold appointments in the departments of Biological Engineering, Civil and Environmental Engineering, Electrical Engineering and Computer Science (which reports jointly to the School of Engineering and MIT Schwarzman College of Computing), Materials Science and Engineering, and Mechanical Engineering, as well as the Institute for Medical Engineering and Science (IMES).

    “I am truly inspired by this remarkable group of talented faculty members,” says Anantha Chandrakasan, dean of the School of Engineering and the Vannevar Bush Professor of Electrical Engineering and Computer Science. “The work they are doing, both in the lab and in the classroom, has made a tremendous impact at MIT and in the wider world. Their important research has applications in a diverse range of fields and industries. I am thrilled to congratulate them on the milestone of receiving tenure.”

    This year’s newly tenured engineering faculty include:

    Michael Birnbaum, Class of 1956 Career Development Professor, associate professor of biological engineering, and faculty member at the Koch Institute for Integrative Cancer Research at MIT, works on understanding and manipulating immune recognition in cancer and infections. By using a variety of techniques to study the antigen recognition of T cells, he and his team aim to develop the next generation of immunotherapies.  
    Tamara Broderick, associate professor of electrical engineering and computer science and member of the MIT Laboratory for Information and Decision Systems (LIDS) and the MIT Institute for Data, Systems, and Society (IDSS), works to provide fast and reliable quantification of uncertainty and robustness in modern data analysis procedures. Broderick and her research group develop data analysis tools with applications in fields, including genetics, economics, and assistive technology. 
    Tal Cohen, associate professor of civil and environmental engineering and mechanical engineering, uses nonlinear solid mechanics to understand how materials behave under extreme conditions. By studying material instabilities, extreme dynamic loading conditions, growth, and chemical coupling, Cohen and her team combine theoretical models and experiments to shape our understanding of the observed phenomena and apply those insights in the design and characterization of material systems. 
    Betar Gallant, Class of 1922 Career Development Professor and associate professor of mechanical engineering, develops advanced materials and chemistries for next-generation lithium-ion and lithium primary batteries and electrochemical carbon dioxide mitigation technologies. Her group’s work could lead to higher-energy and more sustainable batteries for electric vehicles, longer-lasting implantable medical devices, and new methods of carbon capture and conversion. 
    Rafael Jaramillo, Thomas Lord Career Development Professor and associate professor of materials science and engineering, studies the synthesis, properties, and applications of electronic materials, particularly chalcogenide compound semiconductors. His work has applications in microelectronics, integrated photonics, telecommunications, and photovoltaics. 
    Benedetto Marelli, associate professor of civil and environmental engineering, conducts research on the synthesis, assembly, and nanomanufacturing of structural biopolymers. He and his research team develop biomaterials for applications in agriculture, food security, and food safety. 
    Ellen Roche, Latham Family Career Development Professor, an associate professor of mechanical engineering, and a core faculty of IMES, designs and develops implantable, biomimetic therapeutic devices and soft robotics that mechanically assist and repair tissue, deliver therapies, and enable enhanced preclinical testing. Her devices have a wide range of applications in human health, including cardiovascular and respiratory disease. 
    Serguei Saavedra, associate professor of civil and environmental engineering, uses systems thinking, synthesis, and mathematical modeling to study the persistence of ecological systems under changing environments. His theoretical research is used to develop hypotheses and corroborate predictions of how ecological systems respond to climate change. 
    Justin Solomon, associate professor of electrical engineering and computer science and member of the MIT Computer Science and Artificial Intelligence Laboratory and MIT Center for Computational Science and Engineering, works at the intersection of geometry, large-scale optimization, computer graphics, and machine learning. His research has diverse applications in machine learning, computer graphics, and geometric data processing.  More

  • in

    A simpler method for learning to control a robot

    Researchers from MIT and Stanford University have devised a new machine-learning approach that could be used to control a robot, such as a drone or autonomous vehicle, more effectively and efficiently in dynamic environments where conditions can change rapidly.

    This technique could help an autonomous vehicle learn to compensate for slippery road conditions to avoid going into a skid, allow a robotic free-flyer to tow different objects in space, or enable a drone to closely follow a downhill skier despite being buffeted by strong winds.

    The researchers’ approach incorporates certain structure from control theory into the process for learning a model in such a way that leads to an effective method of controlling complex dynamics, such as those caused by impacts of wind on the trajectory of a flying vehicle. One way to think about this structure is as a hint that can help guide how to control a system.

    “The focus of our work is to learn intrinsic structure in the dynamics of the system that can be leveraged to design more effective, stabilizing controllers,” says Navid Azizan, the Esther and Harold E. Edgerton Assistant Professor in the MIT Department of Mechanical Engineering and the Institute for Data, Systems, and Society (IDSS), and a member of the Laboratory for Information and Decision Systems (LIDS). “By jointly learning the system’s dynamics and these unique control-oriented structures from data, we’re able to naturally create controllers that function much more effectively in the real world.”

    Using this structure in a learned model, the researchers’ technique immediately extracts an effective controller from the model, as opposed to other machine-learning methods that require a controller to be derived or learned separately with additional steps. With this structure, their approach is also able to learn an effective controller using fewer data than other approaches. This could help their learning-based control system achieve better performance faster in rapidly changing environments.

    “This work tries to strike a balance between identifying structure in your system and just learning a model from data,” says lead author Spencer M. Richards, a graduate student at Stanford University. “Our approach is inspired by how roboticists use physics to derive simpler models for robots. Physical analysis of these models often yields a useful structure for the purposes of control — one that you might miss if you just tried to naively fit a model to data. Instead, we try to identify similarly useful structure from data that indicates how to implement your control logic.”

    Additional authors of the paper are Jean-Jacques Slotine, professor of mechanical engineering and of brain and cognitive sciences at MIT, and Marco Pavone, associate professor of aeronautics and astronautics at Stanford. The research will be presented at the International Conference on Machine Learning (ICML).

    Learning a controller

    Determining the best way to control a robot to accomplish a given task can be a difficult problem, even when researchers know how to model everything about the system.

    A controller is the logic that enables a drone to follow a desired trajectory, for example. This controller would tell the drone how to adjust its rotor forces to compensate for the effect of winds that can knock it off a stable path to reach its goal.

    This drone is a dynamical system — a physical system that evolves over time. In this case, its position and velocity change as it flies through the environment. If such a system is simple enough, engineers can derive a controller by hand. 

    Modeling a system by hand intrinsically captures a certain structure based on the physics of the system. For instance, if a robot were modeled manually using differential equations, these would capture the relationship between velocity, acceleration, and force. Acceleration is the rate of change in velocity over time, which is determined by the mass of and forces applied to the robot.

    But often the system is too complex to be exactly modeled by hand. Aerodynamic effects, like the way swirling wind pushes a flying vehicle, are notoriously difficult to derive manually, Richards explains. Researchers would instead take measurements of the drone’s position, velocity, and rotor speeds over time, and use machine learning to fit a model of this dynamical system to the data. But these approaches typically don’t learn a control-based structure. This structure is useful in determining how to best set the rotor speeds to direct the motion of the drone over time.

    Once they have modeled the dynamical system, many existing approaches also use data to learn a separate controller for the system.

    “Other approaches that try to learn dynamics and a controller from data as separate entities are a bit detached philosophically from the way we normally do it for simpler systems. Our approach is more reminiscent of deriving models by hand from physics and linking that to control,” Richards says.

    Identifying structure

    The team from MIT and Stanford developed a technique that uses machine learning to learn the dynamics model, but in such a way that the model has some prescribed structure that is useful for controlling the system.

    With this structure, they can extract a controller directly from the dynamics model, rather than using data to learn an entirely separate model for the controller.

    “We found that beyond learning the dynamics, it’s also essential to learn the control-oriented structure that supports effective controller design. Our approach of learning state-dependent coefficient factorizations of the dynamics has outperformed the baselines in terms of data efficiency and tracking capability, proving to be successful in efficiently and effectively controlling the system’s trajectory,” Azizan says. 

    When they tested this approach, their controller closely followed desired trajectories, outpacing all the baseline methods. The controller extracted from their learned model nearly matched the performance of a ground-truth controller, which is built using the exact dynamics of the system.

    “By making simpler assumptions, we got something that actually worked better than other complicated baseline approaches,” Richards adds.

    The researchers also found that their method was data-efficient, which means it achieved high performance even with few data. For instance, it could effectively model a highly dynamic rotor-driven vehicle using only 100 data points. Methods that used multiple learned components saw their performance drop much faster with smaller datasets.

    This efficiency could make their technique especially useful in situations where a drone or robot needs to learn quickly in rapidly changing conditions.

    Plus, their approach is general and could be applied to many types of dynamical systems, from robotic arms to free-flying spacecraft operating in low-gravity environments.

    In the future, the researchers are interested in developing models that are more physically interpretable, and that would be able to identify very specific information about a dynamical system, Richards says. This could lead to better-performing controllers.

    “Despite its ubiquity and importance, nonlinear feedback control remains an art, making it especially suitable for data-driven and learning-based methods. This paper makes a significant contribution to this area by proposing a method that jointly learns system dynamics, a controller, and control-oriented structure,” says Nikolai Matni, an assistant professor in the Department of Electrical and Systems Engineering at the University of Pennsylvania, who was not involved with this work. “What I found particularly exciting and compelling was the integration of these components into a joint learning algorithm, such that control-oriented structure acts as an inductive bias in the learning process. The result is a data-efficient learning process that outputs dynamic models that enjoy intrinsic structure that enables effective, stable, and robust control. While the technical contributions of the paper are excellent themselves, it is this conceptual contribution that I view as most exciting and significant.”

    This research is supported, in part, by the NASA University Leadership Initiative and the Natural Sciences and Engineering Research Council of Canada. More

  • in

    A faster way to teach a robot

    Imagine purchasing a robot to perform household tasks. This robot was built and trained in a factory on a certain set of tasks and has never seen the items in your home. When you ask it to pick up a mug from your kitchen table, it might not recognize your mug (perhaps because this mug is painted with an unusual image, say, of MIT’s mascot, Tim the Beaver). So, the robot fails.

    “Right now, the way we train these robots, when they fail, we don’t really know why. So you would just throw up your hands and say, ‘OK, I guess we have to start over.’ A critical component that is missing from this system is enabling the robot to demonstrate why it is failing so the user can give it feedback,” says Andi Peng, an electrical engineering and computer science (EECS) graduate student at MIT.

    Peng and her collaborators at MIT, New York University, and the University of California at Berkeley created a framework that enables humans to quickly teach a robot what they want it to do, with a minimal amount of effort.

    When a robot fails, the system uses an algorithm to generate counterfactual explanations that describe what needed to change for the robot to succeed. For instance, maybe the robot would have been able to pick up the mug if the mug were a certain color. It shows these counterfactuals to the human and asks for feedback on why the robot failed. Then the system utilizes this feedback and the counterfactual explanations to generate new data it uses to fine-tune the robot.

    Fine-tuning involves tweaking a machine-learning model that has already been trained to perform one task, so it can perform a second, similar task.

    The researchers tested this technique in simulations and found that it could teach a robot more efficiently than other methods. The robots trained with this framework performed better, while the training process consumed less of a human’s time.

    This framework could help robots learn faster in new environments without requiring a user to have technical knowledge. In the long run, this could be a step toward enabling general-purpose robots to efficiently perform daily tasks for the elderly or individuals with disabilities in a variety of settings.

    Peng, the lead author, is joined by co-authors Aviv Netanyahu, an EECS graduate student; Mark Ho, an assistant professor at the Stevens Institute of Technology; Tianmin Shu, an MIT postdoc; Andreea Bobu, a graduate student at UC Berkeley; and senior authors Julie Shah, an MIT professor of aeronautics and astronautics and the director of the Interactive Robotics Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Pulkit Agrawal, a professor in CSAIL. The research will be presented at the International Conference on Machine Learning.

    On-the-job training

    Robots often fail due to distribution shift — the robot is presented with objects and spaces it did not see during training, and it doesn’t understand what to do in this new environment.

    One way to retrain a robot for a specific task is imitation learning. The user could demonstrate the correct task to teach the robot what to do. If a user tries to teach a robot to pick up a mug, but demonstrates with a white mug, the robot could learn that all mugs are white. It may then fail to pick up a red, blue, or “Tim-the-Beaver-brown” mug.

    Training a robot to recognize that a mug is a mug, regardless of its color, could take thousands of demonstrations.

    “I don’t want to have to demonstrate with 30,000 mugs. I want to demonstrate with just one mug. But then I need to teach the robot so it recognizes that it can pick up a mug of any color,” Peng says.

    To accomplish this, the researchers’ system determines what specific object the user cares about (a mug) and what elements aren’t important for the task (perhaps the color of the mug doesn’t matter). It uses this information to generate new, synthetic data by changing these “unimportant” visual concepts. This process is known as data augmentation.

    The framework has three steps. First, it shows the task that caused the robot to fail. Then it collects a demonstration from the user of the desired actions and generates counterfactuals by searching over all features in the space that show what needed to change for the robot to succeed.

    The system shows these counterfactuals to the user and asks for feedback to determine which visual concepts do not impact the desired action. Then it uses this human feedback to generate many new augmented demonstrations.

    In this way, the user could demonstrate picking up one mug, but the system would produce demonstrations showing the desired action with thousands of different mugs by altering the color. It uses these data to fine-tune the robot.

    Creating counterfactual explanations and soliciting feedback from the user are critical for the technique to succeed, Peng says.

    From human reasoning to robot reasoning

    Because their work seeks to put the human in the training loop, the researchers tested their technique with human users. They first conducted a study in which they asked people if counterfactual explanations helped them identify elements that could be changed without affecting the task.

    “It was so clear right off the bat. Humans are so good at this type of counterfactual reasoning. And this counterfactual step is what allows human reasoning to be translated into robot reasoning in a way that makes sense,” she says.

    Then they applied their framework to three simulations where robots were tasked with: navigating to a goal object, picking up a key and unlocking a door, and picking up a desired object then placing it on a tabletop. In each instance, their method enabled the robot to learn faster than with other techniques, while requiring fewer demonstrations from users.

    Moving forward, the researchers hope to test this framework on real robots. They also want to focus on reducing the time it takes the system to create new data using generative machine-learning models.

    “We want robots to do what humans do, and we want them to do it in a semantically meaningful way. Humans tend to operate in this abstract space, where they don’t think about every single property in an image. At the end of the day, this is really about enabling a robot to learn a good, human-like representation at an abstract level,” Peng says.

    This research is supported, in part, by a National Science Foundation Graduate Research Fellowship, Open Philanthropy, an Apple AI/ML Fellowship, Hyundai Motor Corporation, the MIT-IBM Watson AI Lab, and the National Science Foundation Institute for Artificial Intelligence and Fundamental Interactions. More