More stories

  • in

    A faster way to teach a robot

    Imagine purchasing a robot to perform household tasks. This robot was built and trained in a factory on a certain set of tasks and has never seen the items in your home. When you ask it to pick up a mug from your kitchen table, it might not recognize your mug (perhaps because this mug is painted with an unusual image, say, of MIT’s mascot, Tim the Beaver). So, the robot fails.

    “Right now, the way we train these robots, when they fail, we don’t really know why. So you would just throw up your hands and say, ‘OK, I guess we have to start over.’ A critical component that is missing from this system is enabling the robot to demonstrate why it is failing so the user can give it feedback,” says Andi Peng, an electrical engineering and computer science (EECS) graduate student at MIT.

    Peng and her collaborators at MIT, New York University, and the University of California at Berkeley created a framework that enables humans to quickly teach a robot what they want it to do, with a minimal amount of effort.

    When a robot fails, the system uses an algorithm to generate counterfactual explanations that describe what needed to change for the robot to succeed. For instance, maybe the robot would have been able to pick up the mug if the mug were a certain color. It shows these counterfactuals to the human and asks for feedback on why the robot failed. Then the system utilizes this feedback and the counterfactual explanations to generate new data it uses to fine-tune the robot.

    Fine-tuning involves tweaking a machine-learning model that has already been trained to perform one task, so it can perform a second, similar task.

    The researchers tested this technique in simulations and found that it could teach a robot more efficiently than other methods. The robots trained with this framework performed better, while the training process consumed less of a human’s time.

    This framework could help robots learn faster in new environments without requiring a user to have technical knowledge. In the long run, this could be a step toward enabling general-purpose robots to efficiently perform daily tasks for the elderly or individuals with disabilities in a variety of settings.

    Peng, the lead author, is joined by co-authors Aviv Netanyahu, an EECS graduate student; Mark Ho, an assistant professor at the Stevens Institute of Technology; Tianmin Shu, an MIT postdoc; Andreea Bobu, a graduate student at UC Berkeley; and senior authors Julie Shah, an MIT professor of aeronautics and astronautics and the director of the Interactive Robotics Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Pulkit Agrawal, a professor in CSAIL. The research will be presented at the International Conference on Machine Learning.

    On-the-job training

    Robots often fail due to distribution shift — the robot is presented with objects and spaces it did not see during training, and it doesn’t understand what to do in this new environment.

    One way to retrain a robot for a specific task is imitation learning. The user could demonstrate the correct task to teach the robot what to do. If a user tries to teach a robot to pick up a mug, but demonstrates with a white mug, the robot could learn that all mugs are white. It may then fail to pick up a red, blue, or “Tim-the-Beaver-brown” mug.

    Training a robot to recognize that a mug is a mug, regardless of its color, could take thousands of demonstrations.

    “I don’t want to have to demonstrate with 30,000 mugs. I want to demonstrate with just one mug. But then I need to teach the robot so it recognizes that it can pick up a mug of any color,” Peng says.

    To accomplish this, the researchers’ system determines what specific object the user cares about (a mug) and what elements aren’t important for the task (perhaps the color of the mug doesn’t matter). It uses this information to generate new, synthetic data by changing these “unimportant” visual concepts. This process is known as data augmentation.

    The framework has three steps. First, it shows the task that caused the robot to fail. Then it collects a demonstration from the user of the desired actions and generates counterfactuals by searching over all features in the space that show what needed to change for the robot to succeed.

    The system shows these counterfactuals to the user and asks for feedback to determine which visual concepts do not impact the desired action. Then it uses this human feedback to generate many new augmented demonstrations.

    In this way, the user could demonstrate picking up one mug, but the system would produce demonstrations showing the desired action with thousands of different mugs by altering the color. It uses these data to fine-tune the robot.

    Creating counterfactual explanations and soliciting feedback from the user are critical for the technique to succeed, Peng says.

    From human reasoning to robot reasoning

    Because their work seeks to put the human in the training loop, the researchers tested their technique with human users. They first conducted a study in which they asked people if counterfactual explanations helped them identify elements that could be changed without affecting the task.

    “It was so clear right off the bat. Humans are so good at this type of counterfactual reasoning. And this counterfactual step is what allows human reasoning to be translated into robot reasoning in a way that makes sense,” she says.

    Then they applied their framework to three simulations where robots were tasked with: navigating to a goal object, picking up a key and unlocking a door, and picking up a desired object then placing it on a tabletop. In each instance, their method enabled the robot to learn faster than with other techniques, while requiring fewer demonstrations from users.

    Moving forward, the researchers hope to test this framework on real robots. They also want to focus on reducing the time it takes the system to create new data using generative machine-learning models.

    “We want robots to do what humans do, and we want them to do it in a semantically meaningful way. Humans tend to operate in this abstract space, where they don’t think about every single property in an image. At the end of the day, this is really about enabling a robot to learn a good, human-like representation at an abstract level,” Peng says.

    This research is supported, in part, by a National Science Foundation Graduate Research Fellowship, Open Philanthropy, an Apple AI/ML Fellowship, Hyundai Motor Corporation, the MIT-IBM Watson AI Lab, and the National Science Foundation Institute for Artificial Intelligence and Fundamental Interactions. More

  • in

    A new way to look at data privacy

    Imagine that a team of scientists has developed a machine-learning model that can predict whether a patient has cancer from lung scan images. They want to share this model with hospitals around the world so clinicians can start using it in diagnosis.

    But there’s a problem. To teach their model how to predict cancer, they showed it millions of real lung scan images, a process called training. Those sensitive data, which are now encoded into the inner workings of the model, could potentially be extracted by a malicious agent. The scientists can prevent this by adding noise, or more generic randomness, to the model that makes it harder for an adversary to guess the original data. However, perturbation reduces a model’s accuracy, so the less noise one can add, the better.

    MIT researchers have developed a technique that enables the user to potentially add the smallest amount of noise possible, while still ensuring the sensitive data are protected.

    The researchers created a new privacy metric, which they call Probably Approximately Correct (PAC) Privacy, and built a framework based on this metric that can automatically determine the minimal amount of noise that needs to be added. Moreover, this framework does not need knowledge of the inner workings of a model or its training process, which makes it easier to use for different types of models and applications.

    In several cases, the researchers show that the amount of noise required to protect sensitive data from adversaries is far less with PAC Privacy than with other approaches. This could help engineers create machine-learning models that provably hide training data, while maintaining accuracy in real-world settings.

    “PAC Privacy exploits the uncertainty or entropy of the sensitive data in a meaningful way,  and this allows us to add, in many cases, an order of magnitude less noise. This framework allows us to understand the characteristics of arbitrary data processing and privatize it automatically without artificial modifications. While we are in the early days and we are doing simple examples, we are excited about the promise of this technique,” says Srini Devadas, the Edwin Sibley Webster Professor of Electrical Engineering and co-author of a new paper on PAC Privacy.

    Devadas wrote the paper with lead author Hanshen Xiao, an electrical engineering and computer science graduate student. The research will be presented at the International Cryptography Conference (Crypto 2023).

    Defining privacy

    A fundamental question in data privacy is: How much sensitive data could an adversary recover from a machine-learning model with noise added to it?

    Differential Privacy, one popular privacy definition, says privacy is achieved if an adversary who observes the released model cannot infer whether an arbitrary individual’s data is used for the training processing. But provably preventing an adversary from distinguishing data usage often requires large amounts of noise to obscure it. This noise reduces the model’s accuracy.

    PAC Privacy looks at the problem a bit differently. It characterizes how hard it would be for an adversary to reconstruct any part of randomly sampled or generated sensitive data after noise has been added, rather than only focusing on the distinguishability problem.

    For instance, if the sensitive data are images of human faces, differential privacy would focus on whether the adversary can tell if someone’s face was in the dataset. PAC Privacy, on the other hand, could look at whether an adversary could extract a silhouette — an approximation — that someone could recognize as a particular individual’s face.

    Once they established the definition of PAC Privacy, the researchers created an algorithm that automatically tells the user how much noise to add to a model to prevent an adversary from confidently reconstructing a close approximation of the sensitive data. This algorithm guarantees privacy even if the adversary has infinite computing power, Xiao says.

    To find the optimal amount of noise, the PAC Privacy algorithm relies on the uncertainty, or entropy, in the original data from the viewpoint of the adversary.

    This automatic technique takes samples randomly from a data distribution or a large data pool and runs the user’s machine-learning training algorithm on that subsampled data to produce an output learned model. It does this many times on different subsamplings and compares the variance across all outputs. This variance determines how much noise one must add — a smaller variance means less noise is needed.

    Algorithm advantages

    Different from other privacy approaches, the PAC Privacy algorithm does not need knowledge of the inner workings of a model, or the training process.

    When implementing PAC Privacy, a user can specify their desired level of confidence at the outset. For instance, perhaps the user wants a guarantee that an adversary will not be more than 1 percent confident that they have successfully reconstructed the sensitive data to within 5 percent of its actual value. The PAC Privacy algorithm automatically tells the user the optimal amount of noise that needs to be added to the output model before it is shared publicly, in order to achieve those goals.

    “The noise is optimal, in the sense that if you add less than we tell you, all bets could be off. But the effect of adding noise to neural network parameters is complicated, and we are making no promises on the utility drop the model may experience with the added noise,” Xiao says.

    This points to one limitation of PAC Privacy — the technique does not tell the user how much accuracy the model will lose once the noise is added. PAC Privacy also involves repeatedly training a machine-learning model on many subsamplings of data, so it can be computationally expensive.  

    To improve PAC Privacy, one approach is to modify a user’s machine-learning training process so it is more stable, meaning that the output model it produces does not change very much when the input data is subsampled from a data pool.  This stability would create smaller variances between subsample outputs, so not only would the PAC Privacy algorithm need to be run fewer times to identify the optimal amount of noise, but it would also need to add less noise.

    An added benefit of stabler models is that they often have less generalization error, which means they can make more accurate predictions on previously unseen data, a win-win situation between machine learning and privacy, Devadas adds.

    “In the next few years, we would love to look a little deeper into this relationship between stability and privacy, and the relationship between privacy and generalization error. We are knocking on a door here, but it is not clear yet where the door leads,” he says.

    “Obfuscating the usage of an individual’s data in a model is paramount to protecting their privacy. However, to do so can come at the cost of the datas’ and therefore model’s utility,” says Jeremy Goodsitt, senior machine learning engineer at Capital One, who was not involved with this research. “PAC provides an empirical, black-box solution, which can reduce the added noise compared to current practices while maintaining equivalent privacy guarantees. In addition, its empirical approach broadens its reach to more data consuming applications.”

    This research is funded, in part, by DSTA Singapore, Cisco Systems, Capital One, and a MathWorks Fellowship. More

  • in

    Learning the language of molecules to predict their properties

    Discovering new materials and drugs typically involves a manual, trial-and-error process that can take decades and cost millions of dollars. To streamline this process, scientists often use machine learning to predict molecular properties and narrow down the molecules they need to synthesize and test in the lab.

    Researchers from MIT and the MIT-Watson AI Lab have developed a new, unified framework that can simultaneously predict molecular properties and generate new molecules much more efficiently than these popular deep-learning approaches.

    To teach a machine-learning model to predict a molecule’s biological or mechanical properties, researchers must show it millions of labeled molecular structures — a process known as training. Due to the expense of discovering molecules and the challenges of hand-labeling millions of structures, large training datasets are often hard to come by, which limits the effectiveness of machine-learning approaches.

    By contrast, the system created by the MIT researchers can effectively predict molecular properties using only a small amount of data. Their system has an underlying understanding of the rules that dictate how building blocks combine to produce valid molecules. These rules capture the similarities between molecular structures, which helps the system generate new molecules and predict their properties in a data-efficient manner.

    This method outperformed other machine-learning approaches on both small and large datasets, and was able to accurately predict molecular properties and generate viable molecules when given a dataset with fewer than 100 samples.

    “Our goal with this project is to use some data-driven methods to speed up the discovery of new molecules, so you can train a model to do the prediction without all of these cost-heavy experiments,” says lead author Minghao Guo, a computer science and electrical engineering (EECS) graduate student.

    Guo’s co-authors include MIT-IBM Watson AI Lab research staff members Veronika Thost, Payel Das, and Jie Chen; recent MIT graduates Samuel Song ’23 and Adithya Balachandran ’23; and senior author Wojciech Matusik, a professor of electrical engineering and computer science and a member of the MIT-IBM Watson AI Lab, who leads the Computational Design and Fabrication Group within the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference for Machine Learning.

    Learning the language of molecules

    To achieve the best results with machine-learning models, scientists need training datasets with millions of molecules that have similar properties to those they hope to discover. In reality, these domain-specific datasets are usually very small. So, researchers use models that have been pretrained on large datasets of general molecules, which they apply to a much smaller, targeted dataset. However, because these models haven’t acquired much domain-specific knowledge, they tend to perform poorly.

    The MIT team took a different approach. They created a machine-learning system that automatically learns the “language” of molecules — what is known as a molecular grammar — using only a small, domain-specific dataset. It uses this grammar to construct viable molecules and predict their properties.

    In language theory, one generates words, sentences, or paragraphs based on a set of grammar rules. You can think of a molecular grammar the same way. It is a set of production rules that dictate how to generate molecules or polymers by combining atoms and substructures.

    Just like a language grammar, which can generate a plethora of sentences using the same rules, one molecular grammar can represent a vast number of molecules. Molecules with similar structures use the same grammar production rules, and the system learns to understand these similarities.

    Since structurally similar molecules often have similar properties, the system uses its underlying knowledge of molecular similarity to predict properties of new molecules more efficiently. 

    “Once we have this grammar as a representation for all the different molecules, we can use it to boost the process of property prediction,” Guo says.

    The system learns the production rules for a molecular grammar using reinforcement learning — a trial-and-error process where the model is rewarded for behavior that gets it closer to achieving a goal.

    But because there could be billions of ways to combine atoms and substructures, the process to learn grammar production rules would be too computationally expensive for anything but the tiniest dataset.

    The researchers decoupled the molecular grammar into two parts. The first part, called a metagrammar, is a general, widely applicable grammar they design manually and give the system at the outset. Then it only needs to learn a much smaller, molecule-specific grammar from the domain dataset. This hierarchical approach speeds up the learning process.

    Big results, small datasets

    In experiments, the researchers’ new system simultaneously generated viable molecules and polymers, and predicted their properties more accurately than several popular machine-learning approaches, even when the domain-specific datasets had only a few hundred samples. Some other methods also required a costly pretraining step that the new system avoids.

    The technique was especially effective at predicting physical properties of polymers, such as the glass transition temperature, which is the temperature required for a material to transition from solid to liquid. Obtaining this information manually is often extremely costly because the experiments require extremely high temperatures and pressures.

    To push their approach further, the researchers cut one training set down by more than half — to just 94 samples. Their model still achieved results that were on par with methods trained using the entire dataset.

    “This grammar-based representation is very powerful. And because the grammar itself is a very general representation, it can be deployed to different kinds of graph-form data. We are trying to identify other applications beyond chemistry or material science,” Guo says.

    In the future, they also want to extend their current molecular grammar to include the 3D geometry of molecules and polymers, which is key to understanding the interactions between polymer chains. They are also developing an interface that would show a user the learned grammar production rules and solicit feedback to correct rules that may be wrong, boosting the accuracy of the system.

    This work is funded, in part, by the MIT-IBM Watson AI Lab and its member company, Evonik. More

  • in

    Researchers teach an AI to write better chart captions

    Chart captions that explain complex trends and patterns are important for improving a reader’s ability to comprehend and retain the data being presented. And for people with visual disabilities, the information in a caption often provides their only means of understanding the chart.

    But writing effective, detailed captions is a labor-intensive process. While autocaptioning techniques can alleviate this burden, they often struggle to describe cognitive features that provide additional context.

    To help people author high-quality chart captions, MIT researchers have developed a dataset to improve automatic captioning systems. Using this tool, researchers could teach a machine-learning model to vary the level of complexity and type of content included in a chart caption based on the needs of users.

    The MIT researchers found that machine-learning models trained for autocaptioning with their dataset consistently generated captions that were precise, semantically rich, and described data trends and complex patterns. Quantitative and qualitative analyses revealed that their models captioned charts more effectively than other autocaptioning systems.  

    The team’s goal is to provide the dataset, called VisText, as a tool researchers can use as they work on the thorny problem of chart autocaptioning. These automatic systems could help provide captions for uncaptioned online charts and improve accessibility for people with visual disabilities, says co-lead author Angie Boggust, a graduate student in electrical engineering and computer science at MIT and member of the Visualization Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    “We’ve tried to embed a lot of human values into our dataset so that when we and other researchers are building automatic chart-captioning systems, we don’t end up with models that aren’t what people want or need,” she says.

    Boggust is joined on the paper by co-lead author and fellow graduate student Benny J. Tang and senior author Arvind Satyanarayan, associate professor of computer science at MIT who leads the Visualization Group in CSAIL. The research will be presented at the Annual Meeting of the Association for Computational Linguistics.

    Human-centered analysis

    The researchers were inspired to develop VisText from prior work in the Visualization Group that explored what makes a good chart caption. In that study, researchers found that sighted users and blind or low-vision users had different preferences for the complexity of semantic content in a caption. 

    The group wanted to bring that human-centered analysis into autocaptioning research. To do that, they developed VisText, a dataset of charts and associated captions that could be used to train machine-learning models to generate accurate, semantically rich, customizable captions.

    Developing effective autocaptioning systems is no easy task. Existing machine-learning methods often try to caption charts the way they would an image, but people and models interpret natural images differently from how we read charts. Other techniques skip the visual content entirely and caption a chart using its underlying data table. However, such data tables are often not available after charts are published.

    Given the shortfalls of using images and data tables, VisText also represents charts as scene graphs. Scene graphs, which can be extracted from a chart image, contain all the chart data but also include additional image context.

    “A scene graph is like the best of both worlds — it contains almost all the information present in an image while being easier to extract from images than data tables. As it’s also text, we can leverage advances in modern large language models for captioning,” Tang explains.

    They compiled a dataset that contains more than 12,000 charts — each represented as a data table, image, and scene graph — as well as associated captions. Each chart has two separate captions: a low-level caption that describes the chart’s construction (like its axis ranges) and a higher-level caption that describes statistics, relationships in the data, and complex trends.

    The researchers generated low-level captions using an automated system and crowdsourced higher-level captions from human workers.

    “Our captions were informed by two key pieces of prior research: existing guidelines on accessible descriptions of visual media and a conceptual model from our group for categorizing semantic content. This ensured that our captions featured important low-level chart elements like axes, scales, and units for readers with visual disabilities, while retaining human variability in how captions can be written,” says Tang.

    Translating charts

    Once they had gathered chart images and captions, the researchers used VisText to train five machine-learning models for autocaptioning. They wanted to see how each representation — image, data table, and scene graph — and combinations of the representations affected the quality of the caption.

    “You can think about a chart captioning model like a model for language translation. But instead of saying, translate this German text to English, we are saying translate this ‘chart language’ to English,” Boggust says.

    Their results showed that models trained with scene graphs performed as well or better than those trained using data tables. Since scene graphs are easier to extract from existing charts, the researchers argue that they might be a more useful representation.

    They also trained models with low-level and high-level captions separately. This technique, known as semantic prefix tuning, enabled them to teach the model to vary the complexity of the caption’s content.

    In addition, they conducted a qualitative examination of captions produced by their best-performing method and categorized six types of common errors. For instance, a directional error occurs if a model says a trend is decreasing when it is actually increasing.

    This fine-grained, robust qualitative evaluation was important for understanding how the model was making its errors. For example, using quantitative methods, a directional error might incur the same penalty as a repetition error, where the model repeats the same word or phrase. But a directional error could be more misleading to a user than a repetition error. The qualitative analysis helped them understand these types of subtleties, Boggust says.

    These sorts of errors also expose limitations of current models and raise ethical considerations that researchers must consider as they work to develop autocaptioning systems, she adds.

    Generative machine-learning models, such as those that power ChatGPT, have been shown to hallucinate or give incorrect information that can be misleading. While there is a clear benefit to using these models for autocaptioning existing charts, it could lead to the spread of misinformation if charts are captioned incorrectly.

    “Maybe this means that we don’t just caption everything in sight with AI. Instead, perhaps we provide these autocaptioning systems as authorship tools for people to edit. It is important to think about these ethical implications throughout the research process, not just at the end when we have a model to deploy,” she says.

    Boggust, Tang, and their colleagues want to continue optimizing the models to reduce some common errors. They also want to expand the VisText dataset to include more charts, and more complex charts, such as those with stacked bars or multiple lines. And they would also like to gain insights into what these autocaptioning models are actually learning about chart data.

    This research was supported, in part, by a Google Research Scholar Award, the National Science Foundation, the MLA@CSAIL Initiative, and the United States Air Force Research Laboratory. More

  • in

    Day of AI curriculum meets the moment

    MIT Responsible AI for Social Empowerment and Education (RAISE) recently celebrated the second annual Day of AI with two flagship local events. The Edward M. Kennedy Institute for the U.S. Senate in Boston hosted a human rights and data policy-focused event that was streamed worldwide. Dearborn STEM Academy in Roxbury, Massachusetts, hosted a student workshop in collaboration with Amazon Future Engineer. With over 8,000 registrations across all 50 U.S. states and 108 countries in 2023, participation in Day of AI has more than doubled since its inaugural year.

    Day of AI is a free curriculum of lessons and hands-on activities designed to teach kids of all ages and backgrounds the basics and responsible use of artificial intelligence, designed by researchers at MIT RAISE. This year, resources were available for educators to run at any time and in any increments they chose. The curriculum included five new modules to address timely topics like ChatGPT in School, Teachable Machines, AI and Social Media, Data Science and Me, and more. A collaboration with the International Society for Technology in Education also introduced modules for early elementary students. Educators across the world shared photos, videos, and stories of their students’ engagement, expressing excitement and even relief over the accessible lessons.

    Professor Cynthia Breazeal, director of RAISE, dean for digital learning at MIT, and head of the MIT Media Lab’s Personal Robots research group, said, “It’s been a year of extraordinary advancements in AI, and with that comes necessary conversations and concerns about who and what this technology is for. With our Day of AI events, we want to celebrate the teachers and students who are putting in the work to make sure that AI is for everyone.”

    Reflecting community values and protecting digital citizens

    Play video

    On May 18, 2023, MIT RAISE hosted a global Day of AI celebration featuring a flagship local event focused on human rights and data policy at the Edward M. Kennedy Institute for the U.S. Senate. Students from the Warren Prescott Middle School and New Mission High School heard from speakers the City of Boston, Liberty Mutual, and MIT to discuss the many benefits and challenges of artificial intelligence education. Video: MIT Open Learning

    MIT President Sally Kornbluth welcomed students from Warren Prescott Middle School and New Mission High School to the Day of AI program at the Edward M. Kennedy Institute. Kornbluth reflected on the exciting potential of AI, along with the ethical considerations society needs to be responsible for.

    “AI has the potential to do all kinds of fantastic things, including driving a car, helping us with the climate crisis, improving health care, and designing apps that we can’t even imagine yet. But what we have to make sure it doesn’t do is cause harm to individuals, to communities, to us — society as a whole,” she said.

    This theme resonated with each of the event speakers, whose jobs spanned the sectors of education, government, and business. Yo Deshpande, technologist for the public realm, and Michael Lawrence Evans, program director of new urban mechanics from the Boston Mayor’s Office, shared how Boston thinks about using AI to improve city life in ways that are “equitable, accessible, and delightful.” Deshpande said, “We have the opportunity to explore not only how AI works, but how using AI can line up with our values, the way we want to be in the world, and the way we want to be in our community.”

    Adam L’Italien, chief innovation officer at Liberty Mutual Insurance (one of Day of AI’s founding sponsors), compared our present moment with AI technologies to the early days of personal computers and internet connection. “Exposure to emerging technologies can accelerate progress in the world and in your own lives,” L’Italien said, while recognizing that the AI development process needs to be inclusive and mitigate biases.

    Human policies for artificial intelligence

    So how does society address these human rights concerns about AI? Marc Aidinoff ’21, former White House Office of Science and Technology Policy chief of staff, led a discussion on how government policy can influence the parameters of how technology is developed and used, like the Blueprint for an AI Bill of Rights. Aidinoff said, “The work of building the world you want to see is far harder than building the technical AI system … How do you work with other people and create a collective vision for what we want to do?” Warren Prescott Middle School students described how AI could be used to solve problems that humans couldn’t. But they also shared their concerns that AI could affect data privacy, learning deficits, social media addiction, job displacement, and propaganda.

    In a mock U.S. Senate trial activity designed by Daniella DiPaola, PhD student at the MIT Media Lab, the middle schoolers investigated what rights might be undermined by AI in schools, hospitals, law enforcement, and corporations. Meanwhile, New Mission High School students workshopped the ideas behind bill S.2314, the Social Media Addiction Reduction Technology (SMART) Act, in an activity designed by Raechel Walker, graduate research assistant in the Personal Robots Group, and Matt Taylor, research assistant at the Media Lab. They discussed what level of control could or should be introduced at the parental, educational, and governmental levels to reduce the risks of internet addiction.

    “Alexa, how do I program AI?”

    Play video

    The 2023 Day of AI celebration featured a flagship local event at the Dearborn STEM Academy in Roxbury in collaboration with Amazon Future Engineer. Students participated in a hands-on activity using MIT App Inventor as part of Day of AI’s Alexa lesson. Video: MIT Open Learning

    At Dearborn STEM Academy, Amazon Future Engineer helped students work through the Intro to Voice AI curriculum module in real-time. Students used MIT App Inventor to code basic commands for Alexa. In an interview with WCVB, Principal Darlene Marcano said, “It’s important that we expose our students to as many different experiences as possible. The students that are participating are on track to be future computer scientists and engineers.”

    Breazeal told Dearborn students, “We want you to have an informed voice about how you want AI to be used in society. We want you to feel empowered that you can shape the world. You can make things with AI to help make a better world and a better community.”

    Rohit Prasad ’08, senior vice president and head scientist for Alexa at Amazon, and Victor Reinoso ’97, global director of philanthropic education initiatives at Amazon, also joined the event. “Amazon and MIT share a commitment to helping students discover a world of possibilities through STEM and AI education,” said Reinoso. “There’s a lot of current excitement around the technological revolution with generative AI and large language models, so we’re excited to help students explore careers of the future and navigate the pathways available to them.” To highlight their continued investment in the local community and the school program, Amazon donated a $25,000 Innovation and Early College Pathways Program Grant to the Boston Public School system.

    Day of AI down under

    Not only was the Day of AI program widely adopted across the globe, Australian educators were inspired to adapt their own regionally specific curriculum. An estimated 161,000 AI professionals will be needed in Australia by 2030, according to the National Artificial Intelligence Center in the Commonwealth Scientific and Industrial Research Organization (CSIRO), an Australian government agency and Day of AI Australia project partner. CSIRO worked with the University of New South Wales to develop supplementary educational resources on AI ethics and machine learning. Day of AI Australia reached 85,000 students at 400-plus secondary schools this year, sparking curiosity in the next generation of AI experts.

    The interest in AI is accelerating as fast as the technology is being developed. Day of AI offers a unique opportunity for K-12 students to shape our world’s digital future and their own.

    “I hope that some of you will decide to be part of this bigger effort to help us figure out the best possible answers to questions that are raised by AI,” Kornbluth told students at the Edward M. Kennedy Institute. “We’re counting on you, the next generation, to learn how AI works and help make sure it’s for everyone.” More

  • in

    MIT-Pillar AI Collective announces first seed grant recipients

    The MIT-Pillar AI Collective has announced its first six grant recipients. Students, alumni, and postdocs working on a broad range of topics in artificial intelligence, machine learning, and data science will receive funding and support for research projects that could translate into commercially viable products or companies. These grants are intended to help students explore commercial applications for their research, and eventually drive that commercialization through the creation of a startup.

    “These tremendous students and postdocs are working on projects that have the potential to be truly transformative across a diverse range of industries. It’s thrilling to think that the novel research these teams are conducting could lead to the founding of startups that revolutionize everything from drug delivery to video conferencing,” says Anantha Chandrakasan, dean of the School of Engineering and the Vannevar Bush Professor of Electrical Engineering and Computer Science.

    Launched in September 2022, the MIT-Pillar AI Collective is a pilot program funded by a $1 million gift from Pillar VC that aims to cultivate prospective entrepreneurs and drive innovation in areas related to AI. Administered by the MIT Deshpande Center for Technological Innovation, the AI Collective centers on the market discovery process, advancing projects through market research, customer discovery, and prototyping. Graduate students and postdocs supported by the program work toward the development of minimum viable products.

    “In addition to funding, the MIT-Pillar AI Collective provides grant recipients with mentorship and guidance. With the rapid advancement of AI technologies, this type of support is critical to ensure students and postdocs are able to access the resources required to move quickly in this fast-pace environment,” says Jinane Abounadi, managing director of the MIT-Pillar AI Collective.

    The six inaugural recipients will receive support in identifying key milestones and advice from experienced entrepreneurs. The AI Collective assists seed grant recipients in gathering feedback from potential end-users, as well as getting insights from early-stage investors. The program also organizes community events, including a “Founder Talks” speaker series, and other team-building activities.   

    “Each one of these grant recipients exhibits an entrepreneurial spirit. It is exciting to provide support and guidance as they start a journey that could one day see them as founders and leaders of successful companies,” adds Jamie Goldstein ’89, founder of Pillar VC.

    The first cohort of grant recipients include the following projects:

    Predictive query interface

    Abdullah Alomar SM ’21, a PhD candidate studying electrical engineering and computer science, is building a predictive query interface for time series databases to better forecast demand and financial data. This user-friendly interface can help alleviate some of the bottlenecks and issues related to unwieldy data engineering processes while providing state-of-the-art statistical accuracy. Alomar is advised by Devavrat Shah, the Andrew (1956) and Erna Viterbi Professor at MIT.

    Design of light-activated drugs

    Simon Axelrod, a PhD candidate studying chemical physics at Harvard University, is combining AI with physics simulations to design light-activated drugs that could reduce side effects and improve effectiveness. Patients would receive an inactive form of a drug, which is then activated by light in a specific area of the body containing diseased tissue. This localized use of photoactive drugs would minimize the side effects from drugs targeting healthy cells. Axelrod is developing novel computational models that predict properties of photoactive drugs with high speed and accuracy, allowing researchers to focus on only the highest-quality drug candidates. He is advised by Rafael Gomez-Bombarelli, the Jeffrey Cheah Career Development Chair in Engineering in the MIT Department of Materials Science and Engineering. 

    Low-cost 3D perception

    Arjun Balasingam, a PhD student in electrical engineering and computer science and a member of the Computer Science and Artificial Intelligence Laboratory’s (CSAIL) Networks and Mobile Systems group, is developing a technology, called MobiSee, that enables real-time 3D reconstruction in challenging dynamic environments. MobiSee uses self-supervised AI methods along with video and lidar to provide low-cost, state-of-the-art 3D perception on consumer mobile devices like smartphones. This technology could have far-reaching applications across mixed reality, navigation, safety, and sports streaming, in addition to unlocking opportunities for new real-time and immersive experiences. He is advised by Hari Balakrishnan, the Fujitsu Professor of Computer Science and Artificial Intelligence at MIT and member of CSAIL.

    Sleep therapeutics

    Guillermo Bernal SM ’14, PhD ’23, a recent PhD graduate in media arts and sciences, is developing a sleep therapeutic platform that would enable sleep specialists and researchers to conduct robust sleep studies and develop therapy plans remotely, while the patient is comfortable in their home. Called Fascia, the three-part system consists of a polysomnogram with a sleep mask form factor that collects data, a hub that enables researchers to provide stimulation and feedback via olfactory, auditory, and visual stimuli, and a web portal that enables researchers to read a patient’s signals in real time with machine learning analysis. Bernal was advised by Pattie Maes, professor of media arts and sciences at the MIT Media Lab.

    Autonomous manufacturing assembly with human-like tactile perception

    Michael Foshey, a mechanical engineer and project manager with MIT CSAIL’s Computational Design and Fabrication Group, is developing an AI-enabled tactile perception system that can be used to give robots human-like dexterity. With this new technology platform, Foshey and his team hope to enable industry-changing applications in manufacturing. Currently, assembly tasks in manufacturing are largely done by hand and are typically repetitive and tedious. As a result, these jobs are being largely left unfilled. These labor shortages can cause supply chain shortages and increases in the cost of production. Foshey’s new technology platform aims to address this by automating assembly tasks to reduce reliance on manual labor. Foshey is supervised by Wojciech Matusik, MIT professor of electrical engineering and computer science and member of CSAIL.  

    Generative AI for video conferencing

    Vibhaalakshmi Sivaraman SM ’19, a PhD candidate in electrical engineering and computer science who is a member of CSAIL’s Networking and Mobile Systems Group, is developing a generative technology, Gemino, to facilitate video conferencing in high-latency and low-bandwidth network environments. Gemino is a neural compression system for video conferencing that overcomes the robustness concerns and compute complexity challenges that limit current face-image-synthesis models. This technology could enable sustained video conferencing calls in regions and scenarios that cannot reliably support video calls today. Sivaraman is advised by Mohammad Alizadeh, MIT associate professor of electrical engineering and computer science and member of CSAIL.  More

  • in

    Bringing the social and ethical responsibilities of computing to the forefront

    There has been a remarkable surge in the use of algorithms and artificial intelligence to address a wide range of problems and challenges. While their adoption, particularly with the rise of AI, is reshaping nearly every industry sector, discipline, and area of research, such innovations often expose unexpected consequences that involve new norms, new expectations, and new rules and laws.

    To facilitate deeper understanding, the Social and Ethical Responsibilities of Computing (SERC), a cross-cutting initiative in the MIT Schwarzman College of Computing, recently brought together social scientists and humanists with computer scientists, engineers, and other computing faculty for an exploration of the ways in which the broad applicability of algorithms and AI has presented both opportunities and challenges in many aspects of society.

    “The very nature of our reality is changing. AI has the ability to do things that until recently were solely the realm of human intelligence — things that can challenge our understanding of what it means to be human,” remarked Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing, in his opening address at the inaugural SERC Symposium. “This poses philosophical, conceptual, and practical questions on a scale not experienced since the start of the Enlightenment. In the face of such profound change, we need new conceptual maps for navigating the change.”

    The symposium offered a glimpse into the vision and activities of SERC in both research and education. “We believe our responsibility with SERC is to educate and equip our students and enable our faculty to contribute to responsible technology development and deployment,” said Georgia Perakis, the William F. Pounds Professor of Management in the MIT Sloan School of Management, co-associate dean of SERC, and the lead organizer of the symposium. “We’re drawing from the many strengths and diversity of disciplines across MIT and beyond and bringing them together to gain multiple viewpoints.”

    Through a succession of panels and sessions, the symposium delved into a variety of topics related to the societal and ethical dimensions of computing. In addition, 37 undergraduate and graduate students from a range of majors, including urban studies and planning, political science, mathematics, biology, electrical engineering and computer science, and brain and cognitive sciences, participated in a poster session to exhibit their research in this space, covering such topics as quantum ethics, AI collusion in storage markets, computing waste, and empowering users on social platforms for better content credibility.

    Showcasing a diversity of work

    In three sessions devoted to themes of beneficent and fair computing, equitable and personalized health, and algorithms and humans, the SERC Symposium showcased work by 12 faculty members across these domains.

    One such project from a multidisciplinary team of archaeologists, architects, digital artists, and computational social scientists aimed to preserve endangered heritage sites in Afghanistan with digital twins. The project team produced highly detailed interrogable 3D models of the heritage sites, in addition to extended reality and virtual reality experiences, as learning resources for audiences that cannot access these sites.

    In a project for the United Network for Organ Sharing, researchers showed how they used applied analytics to optimize various facets of an organ allocation system in the United States that is currently undergoing a major overhaul in order to make it more efficient, equitable, and inclusive for different racial, age, and gender groups, among others.

    Another talk discussed an area that has not yet received adequate public attention: the broader implications for equity that biased sensor data holds for the next generation of models in computing and health care.

    A talk on bias in algorithms considered both human bias and algorithmic bias, and the potential for improving results by taking into account differences in the nature of the two kinds of bias.

    Other highlighted research included the interaction between online platforms and human psychology; a study on whether decision-makers make systemic prediction mistakes on the available information; and an illustration of how advanced analytics and computation can be leveraged to inform supply chain management, operations, and regulatory work in the food and pharmaceutical industries.

    Improving the algorithms of tomorrow

    “Algorithms are, without question, impacting every aspect of our lives,” said Asu Ozdaglar, deputy dean of academics for the MIT Schwarzman College of Computing and head of the Department of Electrical Engineering and Computer Science, in kicking off a panel she moderated on the implications of data and algorithms.

    “Whether it’s in the context of social media, online commerce, automated tasks, and now a much wider range of creative interactions with the advent of generative AI tools and large language models, there’s little doubt that much more is to come,” Ozdaglar said. “While the promise is evident to all of us, there’s a lot to be concerned as well. This is very much time for imaginative thinking and careful deliberation to improve the algorithms of tomorrow.”

    Turning to the panel, Ozdaglar asked experts from computing, social science, and data science for insights on how to understand what is to come and shape it to enrich outcomes for the majority of humanity.

    Sarah Williams, associate professor of technology and urban planning at MIT, emphasized the critical importance of comprehending the process of how datasets are assembled, as data are the foundation for all models. She also stressed the need for research to address the potential implication of biases in algorithms that often find their way in through their creators and the data used in their development. “It’s up to us to think about our own ethical solutions to these problems,” she said. “Just as it’s important to progress with the technology, we need to start the field of looking at these questions of what biases are in the algorithms? What biases are in the data, or in that data’s journey?”

    Shifting focus to generative models and whether the development and use of these technologies should be regulated, the panelists — which also included MIT’s Srini Devadas, professor of electrical engineering and computer science, John Horton, professor of information technology, and Simon Johnson, professor of entrepreneurship — all concurred that regulating open-source algorithms, which are publicly accessible, would be difficult given that regulators are still catching up and struggling to even set guardrails for technology that is now 20 years old.

    Returning to the question of how to effectively regulate the use of these technologies, Johnson proposed a progressive corporate tax system as a potential solution. He recommends basing companies’ tax payments on their profits, especially for large corporations whose massive earnings go largely untaxed due to offshore banking. By doing so, Johnson said that this approach can serve as a regulatory mechanism that discourages companies from trying to “own the entire world” by imposing disincentives.

    The role of ethics in computing education

    As computing continues to advance with no signs of slowing down, it is critical to educate students to be intentional in the social impact of the technologies they will be developing and deploying into the world. But can one actually be taught such things? If so, how?

    Caspar Hare, professor of philosophy at MIT and co-associate dean of SERC, posed this looming question to faculty on a panel he moderated on the role of ethics in computing education. All experienced in teaching ethics and thinking about the social implications of computing, each panelist shared their perspective and approach.

    A strong advocate for the importance of learning from history, Eden Medina, associate professor of science, technology, and society at MIT, said that “often the way we frame computing is that everything is new. One of the things that I do in my teaching is look at how people have confronted these issues in the past and try to draw from them as a way to think about possible ways forward.” Medina regularly uses case studies in her classes and referred to a paper written by Yale University science historian Joanna Radin on the Pima Indian Diabetes Dataset that raised ethical issues on the history of that particular collection of data that many don’t consider as an example of how decisions around technology and data can grow out of very specific contexts.

    Milo Phillips-Brown, associate professor of philosophy at Oxford University, talked about the Ethical Computing Protocol that he co-created while he was a SERC postdoc at MIT. The protocol, a four-step approach to building technology responsibly, is designed to train computer science students to think in a better and more accurate way about the social implications of technology by breaking the process down into more manageable steps. “The basic approach that we take very much draws on the fields of value-sensitive design, responsible research and innovation, participatory design as guiding insights, and then is also fundamentally interdisciplinary,” he said.

    Fields such as biomedicine and law have an ethics ecosystem that distributes the function of ethical reasoning in these areas. Oversight and regulation are provided to guide front-line stakeholders and decision-makers when issues arise, as are training programs and access to interdisciplinary expertise that they can draw from. “In this space, we have none of that,” said John Basl, associate professor of philosophy at Northeastern University. “For current generations of computer scientists and other decision-makers, we’re actually making them do the ethical reasoning on their own.” Basl commented further that teaching core ethical reasoning skills across the curriculum, not just in philosophy classes, is essential, and that the goal shouldn’t be for every computer scientist be a professional ethicist, but for them to know enough of the landscape to be able to ask the right questions and seek out the relevant expertise and resources that exists.

    After the final session, interdisciplinary groups of faculty, students, and researchers engaged in animated discussions related to the issues covered throughout the day during a reception that marked the conclusion of the symposium. More

  • in

    MIT researchers make language models scalable self-learners

    Socrates once said: “It is not the size of a thing, but the quality that truly matters. For it is in the nature of substance, not its volume, that true value is found.”

    Does size always matter for large language models (LLMs)? In a technological landscape bedazzled by LLMs taking center stage, a team of MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers think smaller models shouldn’t be overlooked, especially for natural language understanding products widely deployed in the industry.

    To that end, the researchers cooked up an approach to long-standing problems of inefficiency and privacy associated with big, text-based AI models — a logic-aware model that outperforms 500-times-bigger counterparts on some language understanding tasks without human-generated annotations, while preserving privacy and robustness with high performance.

    LLMs, which have shown some promising skills in generating language, art, and code, are computationally expensive, and their data requirements can risk privacy leaks when using application programming interfaces for data upload. Smaller models have been historically less capable, particularly in multitasking and weakly supervised tasks, compared to their larger counterparts.

    So what’s helping these smaller models act so mighty, then? Something called “textual entailment,” a way to help these models understand a variety of language tasks, where if one sentence (the premise) is true, then the other sentence (the hypothesis) is likely to be true as well. For example, if the premise is, “all cats have tails” then the hypothesis “a tabby cat has a tail” would be entailed by the premise. This concept is used to train an “entailment model” that proved to be less biased than other language models, from the team’s previous research. They then created “prompts” that the models can use to figure out if certain information is entailed by a given sentence or phrase according to different tasks. This method improved the model’s ability to adapt to different tasks without any additional training, known as zero-shot adaptation.

    In the realm of “natural language understanding,” there are various applications that hinge on determining the relationship between two pieces of text. For example, in sentiment classification, a statement like “I think the movie is good” can be inferred or entailed from a movie review that says, “I like the story and the acting is great,” indicating a positive sentiment. Another is news classification, where the topic of a news article can be inferred from its content. For example, a statement like “the news article is about sports” can be entailed if the main content of the article reports on an NBA game. The key insight was that many existing natural language understanding tasks could be recast as an entailment (i.e., logical inference in natural language) task. 

    “Our research is about improving the ability of computer programs to understand and process natural language — the way humans speak and write. Our self-trained, 350-million-parameter entailment models, without human-generated labels, outperform supervised language models with 137 to 175 billion parameters,” says MIT CSAIL postdoc Hongyin Luo, lead author on a new paper about the study. “This has potential to reshape the landscape of AI and machine learning, providing a more scalable, trustworthy, and cost-effective solution to language modeling,” says Luo. “By proving that smaller models can perform at the same level as larger ones for language understanding, this work paves the way for more sustainable and privacy-preserving AI technologies.” 

    The team discovered that they could improve the model’s performance even more by using a technique called “self-training,” where the model uses its own predictions to teach itself, effectively learning without human supervision and additional annotated training data.The self-training method significantly improved performance on a bunch of downstream tasks, including sentiment analysis, question-answering, and news classification. It outperformed both Google’s LaMDA and FLAN in zero-shot capabilities, GPT models, and other supervised algorithms. 

    However, one challenge with self-training is that the model can sometimes generate incorrect or noisy labels that harm performance. To overcome this, they developed a new algorithm called ‘SimPLE’ (Simple Pseudo-Label Editing), a process to review and modify the pseudo-labels made in initial rounds of learning. By correcting any mislabeled instances, it improved the overall quality of the self-generated labels. This not only made the models more effective at understanding language, but more robust when faced with adversarial data. 

    As with most research, there are some limitations. The self-training on multi-class classification tasks didn’t perform as well as on binary natural language understanding tasks, indicating the challenge of applying entailment models to multi-choice tasks.“This research presents an efficient and effective way to train large language models (LLMs) by formulating natural language understanding tasks as contextual entailment problems and employing a pseudo-labeling self-training mechanism to incorporate large quantities of unlabelled text data in the training process,” adds CSAIL Senior Research Scientist James Glass, who is also an author on the paper. “While the field of LLMs is undergoing rapid and dramatic changes, this research shows that it is possible to produce relatively compact language models that perform very well on benchmark understanding tasks compared to their peers of roughly the same size, or even much larger language models.”

    “Entailment task is a popular proxy to evaluate “understanding” of a given context by an AI model,” says Leonid Karlinsky, research staff member at the MIT-IBM Watson AI Lab. “It is used in many areas analyzing models with unimodal, like LLMs, and and multi-modal, like VLMs [visual language models] inputs, simplifying the task of question-answering about a given input context to a binary classification problem — does this context entail a certain (e.g., text) conclusion or not? This paper makes two contributions in this space. First, it proposes a way to improve the zero-shot (without additional tuning) NLU performance and robustness to adversarial attacks via tuning with synthesized (specialized) entailment tasks generated for the primal NLU task. Second, it offers a self-supervised SimPLE method including pseudo-labeling and confidence-based filtering to further improve large LLMs’ NLU performance.”

    Luo and Glass wrote the paper with Yoon Kim, a CSAIL member and assistant professor in MIT’s Department of Electrical Engineering and Computer Science, and Jiaxin Ge of Peking University. Their work will be presented at the meeting of the Association for Computational Linguistics in Toronto, Ontario this July. This research was supported by a grant from the Hong Kong Innovation AI program. More