More stories

  • in

    AI copilot enhances human precision for safer aviation

    Imagine you’re in an airplane with two pilots, one human and one computer. Both have their “hands” on the controllers, but they’re always looking out for different things. If they’re both paying attention to the same thing, the human gets to steer. But if the human gets distracted or misses something, the computer quickly takes over.

    Meet the Air-Guardian, a system developed by researchers at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). As modern pilots grapple with an onslaught of information from multiple monitors, especially during critical moments, Air-Guardian acts as a proactive copilot; a partnership between human and machine, rooted in understanding attention.

    But how does it determine attention, exactly? For humans, it uses eye-tracking, and for the neural system, it relies on something called “saliency maps,” which pinpoint where attention is directed. The maps serve as visual guides highlighting key regions within an image, aiding in grasping and deciphering the behavior of intricate algorithms. Air-Guardian identifies early signs of potential risks through these attention markers, instead of only intervening during safety breaches like traditional autopilot systems. 

    The broader implications of this system reach beyond aviation. Similar cooperative control mechanisms could one day be used in cars, drones, and a wider spectrum of robotics.

    “An exciting feature of our method is its differentiability,” says MIT CSAIL postdoc Lianhao Yin, a lead author on a new paper about Air-Guardian. “Our cooperative layer and the entire end-to-end process can be trained. We specifically chose the causal continuous-depth neural network model because of its dynamic features in mapping attention. Another unique aspect is adaptability. The Air-Guardian system isn’t rigid; it can be adjusted based on the situation’s demands, ensuring a balanced partnership between human and machine.”

    In field tests, both the pilot and the system made decisions based on the same raw images when navigating to the target waypoint. Air-Guardian’s success was gauged based on the cumulative rewards earned during flight and shorter path to the waypoint. The guardian reduced the risk level of flights and increased the success rate of navigating to target points. 

    “This system represents the innovative approach of human-centric AI-enabled aviation,” adds Ramin Hasani, MIT CSAIL research affiliate and inventor of liquid neural networks. “Our use of liquid neural networks provides a dynamic, adaptive approach, ensuring that the AI doesn’t merely replace human judgment but complements it, leading to enhanced safety and collaboration in the skies.”

    The true strength of Air-Guardian is its foundational technology. Using an optimization-based cooperative layer using visual attention from humans and machine, and liquid closed-form continuous-time neural networks (CfC) known for its prowess in deciphering cause-and-effect relationships, it analyzes incoming images for vital information. Complementing this is the VisualBackProp algorithm, which identifies the system’s focal points within an image, ensuring clear understanding of its attention maps. 

    For future mass adoption, there’s a need to refine the human-machine interface. Feedback suggests an indicator, like a bar, might be more intuitive to signify when the guardian system takes control.

    Air-Guardian heralds a new age of safer skies, offering a reliable safety net for those moments when human attention wavers.

    “The Air-Guardian system highlights the synergy between human expertise and machine learning, furthering the objective of using machine learning to augment pilots in challenging scenarios and reduce operational errors,” says Daniela Rus, the Andrew (1956) and Erna Viterbi Professor of Electrical Engineering and Computer Science at MIT, director of CSAIL, and senior author on the paper.”One of the most interesting outcomes of using a visual attention metric in this work is the potential for allowing earlier interventions and greater interpretability by human pilots,” says Stephanie Gil, assistant professor of computer science at Harvard University, who was not involved in the work. “This showcases a great example of how AI can be used to work with a human, lowering the barrier for achieving trust by using natural communication mechanisms between the human and the AI system.”

    This research was partially funded by the U.S. Air Force (USAF) Research Laboratory, the USAF Artificial Intelligence Accelerator, the Boeing Co., and the Office of Naval Research. The findings don’t necessarily reflect the views of the U.S. government or the USAF. More

  • in

    Meet the 2023-24 Accenture Fellows

    The MIT and Accenture Convergence Initiative for Industry and Technology has selected five new research fellows for 2023-24. Now in its third year, the initiative underscores the ways in which industry and research can collaborate to spur technological innovation.

    Through its partnership with the School of Engineering, Accenture provides five annual fellowships awarded to graduate students with the aim of generating powerful new insights on the convergence of business and technology with the potential to transform society. The 2023-24 fellows will conduct research in areas including artificial intelligence, sustainability, and robotics.

    The 2023-24 Accenture Fellows are:

    Yiyue Luo

    Yiyue Luo is a PhD candidate who is developing innovative integrations of tactile sensing and haptics, interactive sensing and AI, digital fabrication, and smart wearables. Her work takes advantage of recent advances in digital manufacturing and AI, and the convergence in advanced sensing and actuation mechanisms, scalable digital manufacturing, and emerging computational techniques, with the goal of creating novel sensing and actuation devices that revolutionize interactions between people and their environments. In past projects, Luo has developed tactile sensing apparel including socks, gloves, and vests, as well as a workflow for computationally designing and digitally fabricating soft textiles-based pneumatic actuators. With the support of an Accenture Fellowship, she will advance her work of combining sensing and actuating devices and explore the development of haptic devices that simulate tactile cues captured by tactile sensors. Her ultimate aim is to build a scalable, textile-based, closed-loop human-machine interface. Luo’s research holds exciting potential to advance ground-breaking applications for smart textiles, health care, artificial and virtual reality, human-machine interactions, and robotics.

    Zanele Munyikwa is a PhD candidate whose research explores foundation models, a class of models that forms the basis of transformative general-purpose technologies (GPTs) such as GPT4. An Accenture Fellowship will enable Munyikwa to conduct research aimed at illuminating the current and potential impact of foundation models (including large language models) on work and tasks common to “high-skilled” knowledge workers in industries such as marketing, legal services, and medicine, in which foundation models are expected to have significant economic and social impacts. A primary goal of her project is to observe the impact of AI augmentation on tasks like copywriting and long-form writing. A second aim is to explore two primary ways that foundation models are driving the convergence of creative and technological industries, namely: reducing the cost of content generation and enabling the development of tools and platforms for education and training. Munyikwa’s work has important implications for the use of foundation models in many fields, from health care and education to legal services, business, and technology.

    Michelle Vaccaro is a PhD candidate in social engineering systems whose research explores human-AI collaboration with the goals of developing a deeper understanding of AI-based technologies (including ChatGPT and DALL-E), evaluating their performance and evolution, and steering their development toward societally beneficial applications, like climate change mitigation. An Accenture Fellowship will support Vaccaro’s current work toward two key objectives: identifying synergies between humans and AI-based software to help design human-AI systems that address persistent problems better than existing approaches; and investigating applications of human-AI collaboration for forecasting technological change, specifically for renewable energy technologies. By integrating the historically distinct domains of AI, systems engineering, and cognitive science with a wide range of industries, technical fields, and social applications, Vaccaro’s work has the potential to advance individual and collective productivity and creativity in all these areas.

    Chonghuan Wang is a PhD candidate in computational science and engineering whose research employs statistical learning, econometrics theory, and experimental design to create efficient, reliable, and sustainable field experiments in various domains. In his current work, Wang is applying statistical learning techniques such as online learning and bandit theory to test the effectiveness of new treatments, vaccinations, and health care interventions. With the support of an Accenture Fellowship, he will design experiments with the specific aim of understanding the trade-off between the loss of a patient’s welfare and the accuracy of estimating the treatment effect. The results of this research could help to save lives and contain disease outbreaks during pandemics like Covid-19. The benefits of enhanced experiment design and the collection of high-quality data extend well beyond health care; for example, these tools could help businesses optimize user engagement, test pricing impacts, and increase the usage of platforms and services. Wang’s research holds exciting potential to harness statistical learning, econometrics theory, and experimental design in support of strong businesses and the greater social good.

    Aaron Michael West Jr. is a PhD candidate whose research seeks to enhance our knowledge of human motor control and robotics. His work aims to advance rehabilitation technologies and prosthetic devices, as well as improve robot dexterity. His previous work has yielded valuable insights into the human ability to extract information solely from visual displays. Specifically, he demonstrated humans’ ability to estimate stiffness based solely on the visual observation of motion. These insights could advance the development of software applications with the same capability (e.g., using machine learning methods applied to video data) and may enable roboticists to develop enhanced motion control such that a robot’s intention is perceivable by humans. An Accenture Fellowship will enable West to continue this work, as well as new investigations into the functionality of the human hand to aid in the design of a prosthetic hand that better replicates human dexterity. By advancing understandings of human bio- and neuro-mechanics, West’s work has the potential to support major advances in robotics and rehabilitation technologies, with profound impacts on human health and well-being. More

  • in

    Artificial intelligence for augmentation and productivity

    The MIT Stephen A. Schwarzman College of Computing has awarded seed grants to seven projects that are exploring how artificial intelligence and human-computer interaction can be leveraged to enhance modern work spaces to achieve better management and higher productivity.

    Funded by Andrew W. Houston ’05 and Dropbox Inc., the projects are intended to be interdisciplinary and bring together researchers from computing, social sciences, and management.

    The seed grants can enable the project teams to conduct research that leads to bigger endeavors in this rapidly evolving area, as well as build community around questions related to AI-augmented management.

    The seven selected projects and research leads include:

    “LLMex: Implementing Vannevar Bush’s Vision of the Memex Using Large Language Models,” led by Patti Maes of the Media Lab and David Karger of the Department of Electrical Engineering and Computer Science (EECS) and the Computer Science and Artificial Intelligence Laboratory (CSAIL). Inspired by Vannevar Bush’s Memex, this project proposes to design, implement, and test the concept of memory prosthetics using large language models (LLMs). The AI-based system will intelligently help an individual keep track of vast amounts of information, accelerate productivity, and reduce errors by automatically recording their work actions and meetings, supporting retrieval based on metadata and vague descriptions, and suggesting relevant, personalized information proactively based on the user’s current focus and context.

    “Using AI Agents to Simulate Social Scenarios,” led by John Horton of the MIT Sloan School of Management and Jacob Andreas of EECS and CSAIL. This project imagines the ability to easily simulate policies, organizational arrangements, and communication tools with AI agents before implementation. Tapping into the capabilities of modern LLMs to serve as a computational model of humans makes this vision of social simulation more realistic, and potentially more predictive.

    “Human Expertise in the Age of AI: Can We Have Our Cake and Eat it Too?” led by Manish Raghavan of MIT Sloan and EECS, and Devavrat Shah of EECS and the Laboratory for Information and Decision Systems. Progress in machine learning, AI, and in algorithmic decision aids has raised the prospect that algorithms may complement human decision-making in a wide variety of settings. Rather than replacing human professionals, this project sees a future where AI and algorithmic decision aids play a role that is complementary to human expertise.

    “Implementing Generative AI in U.S. Hospitals,” led by Julie Shah of the Department of Aeronautics and Astronautics and CSAIL, Retsef Levi of MIT Sloan and the Operations Research Center, Kate Kellog of MIT Sloan, and Ben Armstrong of the Industrial Performance Center. In recent years, studies have linked a rise in burnout from doctors and nurses in the United States with increased administrative burdens associated with electronic health records and other technologies. This project aims to develop a holistic framework to study how generative AI technologies can both increase productivity for organizations and improve job quality for workers in health care settings.

    “Generative AI Augmented Software Tools to Democratize Programming,” led by Harold Abelson of EECS and CSAIL, Cynthia Breazeal of the Media Lab, and Eric Klopfer of the Comparative Media Studies/Writing. Progress in generative AI over the past year is fomenting an upheaval in assumptions about future careers in software and deprecating the role of coding. This project will stimulate a similar transformation in computing education for those who have no prior technical training by creating a software tool that could eliminate much of the need for learners to deal with code when creating applications.

    “Acquiring Expertise and Societal Productivity in a World of Artificial Intelligence,” led by David Atkin and Martin Beraja of the Department of Economics, and Danielle Li of MIT Sloan. Generative AI is thought to augment the capabilities of workers performing cognitive tasks. This project seeks to better understand how the arrival of AI technologies may impact skill acquisition and productivity, and to explore complementary policy interventions that will allow society to maximize the gains from such technologies.

    “AI Augmented Onboarding and Support,” led by Tim Kraska of EECS and CSAIL, and Christoph Paus of the Department of Physics. While LLMs have made enormous leaps forward in recent years and are poised to fundamentally change the way students and professionals learn about new tools and systems, there is often a steep learning curve which people have to climb in order to make full use of the resource. To help mitigate the issue, this project proposes the development of new LLM-powered onboarding and support systems that will positively impact the way support teams operate and improve the user experience. More

  • in

    How machine learning models can amplify inequities in medical diagnosis and treatment

    Prior to receiving a PhD in computer science from MIT in 2017, Marzyeh Ghassemi had already begun to wonder whether the use of AI techniques might enhance the biases that already existed in health care. She was one of the early researchers to take up this issue, and she’s been exploring it ever since. In a new paper, Ghassemi, now an assistant professor in MIT’s Department of Electrical Science and Engineering (EECS), and three collaborators based at the Computer Science and Artificial Intelligence Laboratory, have probed the roots of the disparities that can arise in machine learning, often causing models that perform well overall to falter when it comes to subgroups for which relatively few data have been collected and utilized in the training process. The paper — written by two MIT PhD students, Yuzhe Yang and Haoran Zhang, EECS computer scientist Dina Katabi (the Thuan and Nicole Pham Professor), and Ghassemi — was presented last month at the 40th International Conference on Machine Learning in Honolulu, Hawaii.

    In their analysis, the researchers focused on “subpopulation shifts” — differences in the way machine learning models perform for one subgroup as compared to another. “We want the models to be fair and work equally well for all groups, but instead we consistently observe the presence of shifts among different groups that can lead to inferior medical diagnosis and treatment,” says Yang, who along with Zhang are the two lead authors on the paper. The main point of their inquiry is to determine the kinds of subpopulation shifts that can occur and to uncover the mechanisms behind them so that, ultimately, more equitable models can be developed.

    The new paper “significantly advances our understanding” of the subpopulation shift phenomenon, claims Stanford University computer scientist Sanmi Koyejo. “This research contributes valuable insights for future advancements in machine learning models’ performance on underrepresented subgroups.”

    Camels and cattle

    The MIT group has identified four principal types of shifts — spurious correlations, attribute imbalance, class imbalance, and attribute generalization — which, according to Yang, “have never been put together into a coherent and unified framework. We’ve come up with a single equation that shows you where biases can come from.”

    Biases can, in fact, stem from what the researchers call the class, or from the attribute, or both. To pick a simple example, suppose the task assigned to the machine learning model is to sort images of objects — animals in this case — into two classes: cows and camels. Attributes are descriptors that don’t specifically relate to the class itself. It might turn out, for instance, that all the images used in the analysis show cows standing on grass and camels on sand — grass and sand serving as the attributes here. Given the data available to it, the machine could reach an erroneous conclusion — namely that cows can only be found on grass, not on sand, with the opposite being true for camels. Such a finding would be incorrect, however, giving rise to a spurious correlation, which, Yang explains, is a “special case” among subpopulation shifts — “one in which you have a bias in both the class and the attribute.”

    In a medical setting, one could rely on machine learning models to determine whether a person has pneumonia or not based on an examination of X-ray images. There would be two classes in this situation, one consisting of people who have the lung ailment, another for those who are infection-free. A relatively straightforward case would involve just two attributes: the people getting X-rayed are either female or male. If, in this particular dataset, there were 100 males diagnosed with pneumonia for every one female diagnosed with pneumonia, that could lead to an attribute imbalance, and the model would likely do a better job of correctly detecting pneumonia for a man than for a woman. Similarly, having 1,000 times more healthy (pneumonia-free) subjects than sick ones would lead to a class imbalance, with the model biased toward healthy cases. Attribute generalization is the last shift highlighted in the new study. If your sample contained 100 male patients with pneumonia and zero female subjects with the same illness, you still would like the model to be able to generalize and make predictions about female subjects even though there are no samples in the training data for females with pneumonia.

    The team then took 20 advanced algorithms, designed to carry out classification tasks, and tested them on a dozen datasets to see how they performed across different population groups. They reached some unexpected conclusions: By improving the “classifier,” which is the last layer of the neural network, they were able to reduce the occurrence of spurious correlations and class imbalance, but the other shifts were unaffected. Improvements to the “encoder,” one of the uppermost layers in the neural network, could reduce the problem of attribute imbalance. “However, no matter what we did to the encoder or classifier, we did not see any improvements in terms of attribute generalization,” Yang says, “and we don’t yet know how to address that.”

    Precisely accurate

    There is also the question of assessing how well your model actually works in terms of evenhandedness among different population groups. The metric normally used, called worst-group accuracy or WGA, is based on the assumption that if you can improve the accuracy — of, say, medical diagnosis — for the group that has the worst model performance, you would have improved the model as a whole. “The WGA is considered the gold standard in subpopulation evaluation,” the authors contend, but they made a surprising discovery: boosting worst-group accuracy results in a decrease in what they call “worst-case precision.” In medical decision-making of all sorts, one needs both accuracy — which speaks to the validity of the findings — and precision, which relates to the reliability of the methodology. “Precision and accuracy are both very important metrics in classification tasks, and that is especially true in medical diagnostics,” Yang explains. “You should never trade precision for accuracy. You always need to balance the two.”

    The MIT scientists are putting their theories into practice. In a study they’re conducting with a medical center, they’re looking at public datasets for tens of thousands of patients and hundreds of thousands of chest X-rays, trying to see whether it’s possible for machine learning models to work in an unbiased manner for all populations. That’s still far from the case, even though more awareness has been drawn to this problem, Yang says. “We are finding many disparities across different ages, gender, ethnicity, and intersectional groups.”

    He and his colleagues agree on the eventual goal, which is to achieve fairness in health care among all populations. But before we can reach that point, they maintain, we still need a better understanding of the sources of unfairness and how they permeate our current system. Reforming the system as a whole will not be easy, they acknowledge. In fact, the title of the paper they introduced at the Honolulu conference, “Change is Hard,” gives some indications as to the challenges that they and like-minded researchers face. More

  • in

    A faster way to teach a robot

    Imagine purchasing a robot to perform household tasks. This robot was built and trained in a factory on a certain set of tasks and has never seen the items in your home. When you ask it to pick up a mug from your kitchen table, it might not recognize your mug (perhaps because this mug is painted with an unusual image, say, of MIT’s mascot, Tim the Beaver). So, the robot fails.

    “Right now, the way we train these robots, when they fail, we don’t really know why. So you would just throw up your hands and say, ‘OK, I guess we have to start over.’ A critical component that is missing from this system is enabling the robot to demonstrate why it is failing so the user can give it feedback,” says Andi Peng, an electrical engineering and computer science (EECS) graduate student at MIT.

    Peng and her collaborators at MIT, New York University, and the University of California at Berkeley created a framework that enables humans to quickly teach a robot what they want it to do, with a minimal amount of effort.

    When a robot fails, the system uses an algorithm to generate counterfactual explanations that describe what needed to change for the robot to succeed. For instance, maybe the robot would have been able to pick up the mug if the mug were a certain color. It shows these counterfactuals to the human and asks for feedback on why the robot failed. Then the system utilizes this feedback and the counterfactual explanations to generate new data it uses to fine-tune the robot.

    Fine-tuning involves tweaking a machine-learning model that has already been trained to perform one task, so it can perform a second, similar task.

    The researchers tested this technique in simulations and found that it could teach a robot more efficiently than other methods. The robots trained with this framework performed better, while the training process consumed less of a human’s time.

    This framework could help robots learn faster in new environments without requiring a user to have technical knowledge. In the long run, this could be a step toward enabling general-purpose robots to efficiently perform daily tasks for the elderly or individuals with disabilities in a variety of settings.

    Peng, the lead author, is joined by co-authors Aviv Netanyahu, an EECS graduate student; Mark Ho, an assistant professor at the Stevens Institute of Technology; Tianmin Shu, an MIT postdoc; Andreea Bobu, a graduate student at UC Berkeley; and senior authors Julie Shah, an MIT professor of aeronautics and astronautics and the director of the Interactive Robotics Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Pulkit Agrawal, a professor in CSAIL. The research will be presented at the International Conference on Machine Learning.

    On-the-job training

    Robots often fail due to distribution shift — the robot is presented with objects and spaces it did not see during training, and it doesn’t understand what to do in this new environment.

    One way to retrain a robot for a specific task is imitation learning. The user could demonstrate the correct task to teach the robot what to do. If a user tries to teach a robot to pick up a mug, but demonstrates with a white mug, the robot could learn that all mugs are white. It may then fail to pick up a red, blue, or “Tim-the-Beaver-brown” mug.

    Training a robot to recognize that a mug is a mug, regardless of its color, could take thousands of demonstrations.

    “I don’t want to have to demonstrate with 30,000 mugs. I want to demonstrate with just one mug. But then I need to teach the robot so it recognizes that it can pick up a mug of any color,” Peng says.

    To accomplish this, the researchers’ system determines what specific object the user cares about (a mug) and what elements aren’t important for the task (perhaps the color of the mug doesn’t matter). It uses this information to generate new, synthetic data by changing these “unimportant” visual concepts. This process is known as data augmentation.

    The framework has three steps. First, it shows the task that caused the robot to fail. Then it collects a demonstration from the user of the desired actions and generates counterfactuals by searching over all features in the space that show what needed to change for the robot to succeed.

    The system shows these counterfactuals to the user and asks for feedback to determine which visual concepts do not impact the desired action. Then it uses this human feedback to generate many new augmented demonstrations.

    In this way, the user could demonstrate picking up one mug, but the system would produce demonstrations showing the desired action with thousands of different mugs by altering the color. It uses these data to fine-tune the robot.

    Creating counterfactual explanations and soliciting feedback from the user are critical for the technique to succeed, Peng says.

    From human reasoning to robot reasoning

    Because their work seeks to put the human in the training loop, the researchers tested their technique with human users. They first conducted a study in which they asked people if counterfactual explanations helped them identify elements that could be changed without affecting the task.

    “It was so clear right off the bat. Humans are so good at this type of counterfactual reasoning. And this counterfactual step is what allows human reasoning to be translated into robot reasoning in a way that makes sense,” she says.

    Then they applied their framework to three simulations where robots were tasked with: navigating to a goal object, picking up a key and unlocking a door, and picking up a desired object then placing it on a tabletop. In each instance, their method enabled the robot to learn faster than with other techniques, while requiring fewer demonstrations from users.

    Moving forward, the researchers hope to test this framework on real robots. They also want to focus on reducing the time it takes the system to create new data using generative machine-learning models.

    “We want robots to do what humans do, and we want them to do it in a semantically meaningful way. Humans tend to operate in this abstract space, where they don’t think about every single property in an image. At the end of the day, this is really about enabling a robot to learn a good, human-like representation at an abstract level,” Peng says.

    This research is supported, in part, by a National Science Foundation Graduate Research Fellowship, Open Philanthropy, an Apple AI/ML Fellowship, Hyundai Motor Corporation, the MIT-IBM Watson AI Lab, and the National Science Foundation Institute for Artificial Intelligence and Fundamental Interactions. More

  • in

    Day of AI curriculum meets the moment

    MIT Responsible AI for Social Empowerment and Education (RAISE) recently celebrated the second annual Day of AI with two flagship local events. The Edward M. Kennedy Institute for the U.S. Senate in Boston hosted a human rights and data policy-focused event that was streamed worldwide. Dearborn STEM Academy in Roxbury, Massachusetts, hosted a student workshop in collaboration with Amazon Future Engineer. With over 8,000 registrations across all 50 U.S. states and 108 countries in 2023, participation in Day of AI has more than doubled since its inaugural year.

    Day of AI is a free curriculum of lessons and hands-on activities designed to teach kids of all ages and backgrounds the basics and responsible use of artificial intelligence, designed by researchers at MIT RAISE. This year, resources were available for educators to run at any time and in any increments they chose. The curriculum included five new modules to address timely topics like ChatGPT in School, Teachable Machines, AI and Social Media, Data Science and Me, and more. A collaboration with the International Society for Technology in Education also introduced modules for early elementary students. Educators across the world shared photos, videos, and stories of their students’ engagement, expressing excitement and even relief over the accessible lessons.

    Professor Cynthia Breazeal, director of RAISE, dean for digital learning at MIT, and head of the MIT Media Lab’s Personal Robots research group, said, “It’s been a year of extraordinary advancements in AI, and with that comes necessary conversations and concerns about who and what this technology is for. With our Day of AI events, we want to celebrate the teachers and students who are putting in the work to make sure that AI is for everyone.”

    Reflecting community values and protecting digital citizens

    Play video

    On May 18, 2023, MIT RAISE hosted a global Day of AI celebration featuring a flagship local event focused on human rights and data policy at the Edward M. Kennedy Institute for the U.S. Senate. Students from the Warren Prescott Middle School and New Mission High School heard from speakers the City of Boston, Liberty Mutual, and MIT to discuss the many benefits and challenges of artificial intelligence education. Video: MIT Open Learning

    MIT President Sally Kornbluth welcomed students from Warren Prescott Middle School and New Mission High School to the Day of AI program at the Edward M. Kennedy Institute. Kornbluth reflected on the exciting potential of AI, along with the ethical considerations society needs to be responsible for.

    “AI has the potential to do all kinds of fantastic things, including driving a car, helping us with the climate crisis, improving health care, and designing apps that we can’t even imagine yet. But what we have to make sure it doesn’t do is cause harm to individuals, to communities, to us — society as a whole,” she said.

    This theme resonated with each of the event speakers, whose jobs spanned the sectors of education, government, and business. Yo Deshpande, technologist for the public realm, and Michael Lawrence Evans, program director of new urban mechanics from the Boston Mayor’s Office, shared how Boston thinks about using AI to improve city life in ways that are “equitable, accessible, and delightful.” Deshpande said, “We have the opportunity to explore not only how AI works, but how using AI can line up with our values, the way we want to be in the world, and the way we want to be in our community.”

    Adam L’Italien, chief innovation officer at Liberty Mutual Insurance (one of Day of AI’s founding sponsors), compared our present moment with AI technologies to the early days of personal computers and internet connection. “Exposure to emerging technologies can accelerate progress in the world and in your own lives,” L’Italien said, while recognizing that the AI development process needs to be inclusive and mitigate biases.

    Human policies for artificial intelligence

    So how does society address these human rights concerns about AI? Marc Aidinoff ’21, former White House Office of Science and Technology Policy chief of staff, led a discussion on how government policy can influence the parameters of how technology is developed and used, like the Blueprint for an AI Bill of Rights. Aidinoff said, “The work of building the world you want to see is far harder than building the technical AI system … How do you work with other people and create a collective vision for what we want to do?” Warren Prescott Middle School students described how AI could be used to solve problems that humans couldn’t. But they also shared their concerns that AI could affect data privacy, learning deficits, social media addiction, job displacement, and propaganda.

    In a mock U.S. Senate trial activity designed by Daniella DiPaola, PhD student at the MIT Media Lab, the middle schoolers investigated what rights might be undermined by AI in schools, hospitals, law enforcement, and corporations. Meanwhile, New Mission High School students workshopped the ideas behind bill S.2314, the Social Media Addiction Reduction Technology (SMART) Act, in an activity designed by Raechel Walker, graduate research assistant in the Personal Robots Group, and Matt Taylor, research assistant at the Media Lab. They discussed what level of control could or should be introduced at the parental, educational, and governmental levels to reduce the risks of internet addiction.

    “Alexa, how do I program AI?”

    Play video

    The 2023 Day of AI celebration featured a flagship local event at the Dearborn STEM Academy in Roxbury in collaboration with Amazon Future Engineer. Students participated in a hands-on activity using MIT App Inventor as part of Day of AI’s Alexa lesson. Video: MIT Open Learning

    At Dearborn STEM Academy, Amazon Future Engineer helped students work through the Intro to Voice AI curriculum module in real-time. Students used MIT App Inventor to code basic commands for Alexa. In an interview with WCVB, Principal Darlene Marcano said, “It’s important that we expose our students to as many different experiences as possible. The students that are participating are on track to be future computer scientists and engineers.”

    Breazeal told Dearborn students, “We want you to have an informed voice about how you want AI to be used in society. We want you to feel empowered that you can shape the world. You can make things with AI to help make a better world and a better community.”

    Rohit Prasad ’08, senior vice president and head scientist for Alexa at Amazon, and Victor Reinoso ’97, global director of philanthropic education initiatives at Amazon, also joined the event. “Amazon and MIT share a commitment to helping students discover a world of possibilities through STEM and AI education,” said Reinoso. “There’s a lot of current excitement around the technological revolution with generative AI and large language models, so we’re excited to help students explore careers of the future and navigate the pathways available to them.” To highlight their continued investment in the local community and the school program, Amazon donated a $25,000 Innovation and Early College Pathways Program Grant to the Boston Public School system.

    Day of AI down under

    Not only was the Day of AI program widely adopted across the globe, Australian educators were inspired to adapt their own regionally specific curriculum. An estimated 161,000 AI professionals will be needed in Australia by 2030, according to the National Artificial Intelligence Center in the Commonwealth Scientific and Industrial Research Organization (CSIRO), an Australian government agency and Day of AI Australia project partner. CSIRO worked with the University of New South Wales to develop supplementary educational resources on AI ethics and machine learning. Day of AI Australia reached 85,000 students at 400-plus secondary schools this year, sparking curiosity in the next generation of AI experts.

    The interest in AI is accelerating as fast as the technology is being developed. Day of AI offers a unique opportunity for K-12 students to shape our world’s digital future and their own.

    “I hope that some of you will decide to be part of this bigger effort to help us figure out the best possible answers to questions that are raised by AI,” Kornbluth told students at the Edward M. Kennedy Institute. “We’re counting on you, the next generation, to learn how AI works and help make sure it’s for everyone.” More

  • in

    Exploring new methods for increasing safety and reliability of autonomous vehicles

    When we think of getting on the road in our cars, our first thoughts may not be that fellow drivers are particularly safe or careful — but human drivers are more reliable than one may expect. For each fatal car crash in the United States, motor vehicles log a whopping hundred million miles on the road.

    Human reliability also plays a role in how autonomous vehicles are integrated in the traffic system, especially around safety considerations. Human drivers continue to surpass autonomous vehicles in their ability to make quick decisions and perceive complex environments: Autonomous vehicles are known to struggle with seemingly common tasks, such as taking on- or off-ramps, or turning left in the face of oncoming traffic. Despite these enormous challenges, embracing autonomous vehicles in the future could yield great benefits, like clearing congested highways; enhancing freedom and mobility for non-drivers; and boosting driving efficiency, an important piece in fighting climate change.

    MIT engineer Cathy Wu envisions ways that autonomous vehicles could be deployed with their current shortcomings, without experiencing a dip in safety. “I started thinking more about the bottlenecks. It’s very clear that the main barrier to deployment of autonomous vehicles is safety and reliability,” Wu says.

    One path forward may be to introduce a hybrid system, in which autonomous vehicles handle easier scenarios on their own, like cruising on the highway, while transferring more complicated maneuvers to remote human operators. Wu, who is a member of the Laboratory for Information and Decision Systems (LIDS), a Gilbert W. Winslow Assistant Professor of Civil and Environmental Engineering (CEE) and a member of the MIT Institute for Data, Systems, and Society (IDSS), likens this approach to air traffic controllers on the ground directing commercial aircraft.

    In a paper published April 12 in IEEE Transactions on Robotics, Wu and co-authors Cameron Hickert and Sirui Li (both graduate students at LIDS) introduced a framework for how remote human supervision could be scaled to make a hybrid system efficient without compromising passenger safety. They noted that if autonomous vehicles were able to coordinate with each other on the road, they could reduce the number of moments in which humans needed to intervene.

    Humans and cars: finding a balance that’s just right

    For the project, Wu, Hickert, and Li sought to tackle a maneuver that autonomous vehicles often struggle to complete. They decided to focus on merging, specifically when vehicles use an on-ramp to enter a highway. In real life, merging cars must accelerate or slow down in order to avoid crashing into cars already on the road. In this scenario, if an autonomous vehicle was about to merge into traffic, remote human supervisors could momentarily take control of the vehicle to ensure a safe merge. In order to evaluate the efficiency of such a system, particularly while guaranteeing safety, the team specified the maximum amount of time each human supervisor would be expected to spend on a single merge. They were interested in understanding whether a small number of remote human supervisors could successfully manage a larger group of autonomous vehicles, and the extent to which this human-to-car ratio could be improved while still safely covering every merge.

    With more autonomous vehicles in use, one might assume a need for more remote supervisors. But in scenarios where autonomous vehicles coordinated with each other, the team found that cars could significantly reduce the number of times humans needed to step in. For example, a coordinating autonomous vehicle already on a highway could adjust its speed to make room for a merging car, eliminating a risky merging situation altogether.

    The team substantiated the potential to safely scale remote supervision in two theorems. First, using a mathematical framework known as queuing theory, the researchers formulated an expression to capture the probability of a given number of supervisors failing to handle all merges pooled together from multiple cars. This way, the researchers were able to assess how many remote supervisors would be needed in order to cover every potential merge conflict, depending on the number of autonomous vehicles in use. The researchers derived a second theorem to quantify the influence of cooperative autonomous vehicles on surrounding traffic for boosting reliability, to assist cars attempting to merge.

    When the team modeled a scenario in which 30 percent of cars on the road were cooperative autonomous vehicles, they estimated that a ratio of one human supervisor to every 47 autonomous vehicles could cover 99.9999 percent of merging cases. But this level of coverage drops below 99 percent, an unacceptable range, in scenarios where autonomous vehicles did not cooperate with each other.

    “If vehicles were to coordinate and basically prevent the need for supervision, that’s actually the best way to improve reliability,” Wu says.

    Cruising toward the future

    The team decided to focus on merging not only because it’s a challenge for autonomous vehicles, but also because it’s a well-defined task associated with a less-daunting scenario: driving on the highway. About half of the total miles traveled in the United States occur on interstates and other freeways. Since highways allow higher speeds than city roads, Wu says, “If you can fully automate highway driving … you give people back about a third of their driving time.”

    If it became feasible for autonomous vehicles to cruise unsupervised for most highway driving, the challenge of safely navigating complex or unexpected moments would remain. For instance, “you [would] need to be able to handle the start and end of the highway driving,” Wu says. You would also need to be able to manage times when passengers zone out or fall asleep, making them unable to quickly take over controls should it be needed. But if remote human supervisors could guide autonomous vehicles at key moments, passengers may never have to touch the wheel. Besides merging, other challenging situations on the highway include changing lanes and overtaking slower cars on the road.

    Although remote supervision and coordinated autonomous vehicles are hypotheticals for high-speed operations, and not currently in use, Wu hopes that thinking about these topics can encourage growth in the field.

    “This gives us some more confidence that the autonomous driving experience can happen,” Wu says. “I think we need to be more creative about what we mean by ‘autonomous vehicles.’ We want to give people back their time — safely. We want the benefits, we don’t strictly want something that drives autonomously.” More

  • in

    Study: AI models fail to reproduce human judgements about rule violations

    In an effort to improve fairness or reduce backlogs, machine-learning models are sometimes designed to mimic human decision making, such as deciding whether social media posts violate toxic content policies.

    But researchers from MIT and elsewhere have found that these models often do not replicate human decisions about rule violations. If models are not trained with the right data, they are likely to make different, often harsher judgements than humans would.

    In this case, the “right” data are those that have been labeled by humans who were explicitly asked whether items defy a certain rule. Training involves showing a machine-learning model millions of examples of this “normative data” so it can learn a task.

    But data used to train machine-learning models are typically labeled descriptively — meaning humans are asked to identify factual features, such as, say, the presence of fried food in a photo. If “descriptive data” are used to train models that judge rule violations, such as whether a meal violates a school policy that prohibits fried food, the models tend to over-predict rule violations.

    This drop in accuracy could have serious implications in the real world. For instance, if a descriptive model is used to make decisions about whether an individual is likely to reoffend, the researchers’ findings suggest it may cast stricter judgements than a human would, which could lead to higher bail amounts or longer criminal sentences.

    “I think most artificial intelligence/machine-learning researchers assume that the human judgements in data and labels are biased, but this result is saying something worse. These models are not even reproducing already-biased human judgments because the data they’re being trained on has a flaw: Humans would label the features of images and text differently if they knew those features would be used for a judgment. This has huge ramifications for machine learning systems in human processes,” says Marzyeh Ghassemi, an assistant professor and head of the Healthy ML Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    Ghassemi is senior author of a new paper detailing these findings, which was published today in Science Advances. Joining her on the paper are lead author Aparna Balagopalan, an electrical engineering and computer science graduate student; David Madras, a graduate student at the University of Toronto; David H. Yang, a former graduate student who is now co-founder of ML Estimation; Dylan Hadfield-Menell, an MIT assistant professor; and Gillian K. Hadfield, Schwartz Reisman Chair in Technology and Society and professor of law at the University of Toronto.

    Labeling discrepancy

    This study grew out of a different project that explored how a machine-learning model can justify its predictions. As they gathered data for that study, the researchers noticed that humans sometimes give different answers if they are asked to provide descriptive or normative labels about the same data.

    To gather descriptive labels, researchers ask labelers to identify factual features — does this text contain obscene language? To gather normative labels, researchers give labelers a rule and ask if the data violates that rule — does this text violate the platform’s explicit language policy?

    Surprised by this finding, the researchers launched a user study to dig deeper. They gathered four datasets to mimic different policies, such as a dataset of dog images that could be in violation of an apartment’s rule against aggressive breeds. Then they asked groups of participants to provide descriptive or normative labels.

    In each case, the descriptive labelers were asked to indicate whether three factual features were present in the image or text, such as whether the dog appears aggressive. Their responses were then used to craft judgements. (If a user said a photo contained an aggressive dog, then the policy was violated.) The labelers did not know the pet policy. On the other hand, normative labelers were given the policy prohibiting aggressive dogs, and then asked whether it had been violated by each image, and why.

    The researchers found that humans were significantly more likely to label an object as a violation in the descriptive setting. The disparity, which they computed using the absolute difference in labels on average, ranged from 8 percent on a dataset of images used to judge dress code violations to 20 percent for the dog images.

    “While we didn’t explicitly test why this happens, one hypothesis is that maybe how people think about rule violations is different from how they think about descriptive data. Generally, normative decisions are more lenient,” Balagopalan says.

    Yet data are usually gathered with descriptive labels to train a model for a particular machine-learning task. These data are often repurposed later to train different models that perform normative judgements, like rule violations.

    Training troubles

    To study the potential impacts of repurposing descriptive data, the researchers trained two models to judge rule violations using one of their four data settings. They trained one model using descriptive data and the other using normative data, and then compared their performance.

    They found that if descriptive data are used to train a model, it will underperform a model trained to perform the same judgements using normative data. Specifically, the descriptive model is more likely to misclassify inputs by falsely predicting a rule violation. And the descriptive model’s accuracy was even lower when classifying objects that human labelers disagreed about.

    “This shows that the data do really matter. It is important to match the training context to the deployment context if you are training models to detect if a rule has been violated,” Balagopalan says.

    It can be very difficult for users to determine how data have been gathered; this information can be buried in the appendix of a research paper or not revealed by a private company, Ghassemi says.

    Improving dataset transparency is one way this problem could be mitigated. If researchers know how data were gathered, then they know how those data should be used. Another possible strategy is to fine-tune a descriptively trained model on a small amount of normative data. This idea, known as transfer learning, is something the researchers want to explore in future work.

    They also want to conduct a similar study with expert labelers, like doctors or lawyers, to see if it leads to the same label disparity.

    “The way to fix this is to transparently acknowledge that if we want to reproduce human judgment, we must only use data that were collected in that setting. Otherwise, we are going to end up with systems that are going to have extremely harsh moderations, much harsher than what humans would do. Humans would see nuance or make another distinction, whereas these models don’t,” Ghassemi says.

    This research was funded, in part, by the Schwartz Reisman Institute for Technology and Society, Microsoft Research, the Vector Institute, and a Canada Research Council Chain. More