More stories

  • Researchers develop novel AI-based estimator for manufacturing medicine

    When medical companies manufacture the pills and tablets that treat any number of illnesses, aches, and pains, they need to isolate the active pharmaceutical ingredient from a suspension and dry it. The process requires a human operator to monitor an industrial dryer, agitate the material, and watch for the compound to take on the right qualities for compressing into medicine. The job depends heavily on the operator’s observations.   

    Methods for making that process less subjective and a lot more efficient are the subject of a recent Nature Communications paper authored by researchers at MIT and Takeda. The paper’s authors devise a way to use physics and machine learning to categorize the rough surfaces that characterize particles in a mixture. The technique, which uses a physics-enhanced autocorrelation-based estimator (PEACE), could change pharmaceutical manufacturing processes for pills and powders, increasing efficiency and accuracy and resulting in fewer failed batches of pharmaceutical products.  

    “Failed batches or failed steps in the pharmaceutical process are very serious,” says Allan Myerson, a professor of practice in the MIT Department of Chemical Engineering and one of the study’s authors. “Anything that improves the reliability of the pharmaceutical manufacturing, reduces time, and improves compliance is a big deal.”

    The team’s work is part of an ongoing collaboration between Takeda and MIT, launched in 2020. The MIT-Takeda Program aims to leverage the experience of both MIT and Takeda to solve problems at the intersection of medicine, artificial intelligence, and health care.

    In pharmaceutical manufacturing, determining whether a compound is adequately mixed and dried ordinarily requires stopping an industrial-sized dryer and taking samples off the manufacturing line for testing. Researchers at Takeda thought artificial intelligence could improve the task and reduce stoppages that slow down production. Originally the research team planned to use videos to train a computer model to replace a human operator. But determining which videos to use to train the model still proved too subjective. Instead, the MIT-Takeda team decided to illuminate particles with a laser during filtration and drying, and measure particle size distribution using physics and machine learning. 

    “We just shine a laser beam on top of this drying surface and observe,” says Qihang Zhang, a doctoral student in MIT’s Department of Electrical Engineering and Computer Science and the study’s first author. 

    A physics-derived equation describes the interaction between the laser and the mixture, while machine learning characterizes the particle sizes. The technique doesn’t require stopping and restarting the drying process, which makes the entire job safer and more efficient than standard operating procedure, according to George Barbastathis, professor of mechanical engineering at MIT and corresponding author of the study.

    The machine learning algorithm also does not require many datasets to learn its job, because the physics allows for speedy training of the neural network.

    “We utilize the physics to compensate for the lack of training data, so that we can train the neural network in an efficient way,” says Zhang. “Only a tiny amount of experimental data is enough to get a good result.”
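    The Nature Communications paper specifies PEACE in full; purely as a loose, hypothetical illustration of the physics-plus-learning idea (not the authors’ implementation), the sketch below computes a physics-derived speckle-autocorrelation feature and fits a one-parameter model on top:

    ```python
    # Illustrative sketch only: a physics-derived feature (speckle autocorrelation)
    # plus a small learned mapping, loosely in the spirit of a physics-enhanced
    # autocorrelation-based estimator. All names and parameters are hypothetical.
    import numpy as np

    def autocorrelation(image):
        """2D autocorrelation via the Wiener-Khinchin theorem."""
        f = np.fft.fft2(image - image.mean())
        ac = np.fft.ifft2(np.abs(f) ** 2).real
        ac = np.fft.fftshift(ac)
        return ac / ac.max()

    def correlation_length(image):
        """Physics-based feature: radius at which autocorrelation decays to 1/e."""
        ac = autocorrelation(image)
        center = np.array(ac.shape) // 2
        profile = ac[center[0], center[1]:]          # radial slice from the peak
        below = np.nonzero(profile < 1 / np.e)[0]
        return float(below[0]) if below.size else float(len(profile))

    # Tiny training set: synthetic speckle-like images whose grain size grows
    # with the (known) mean particle size.
    rng = np.random.default_rng(0)
    sizes, feats = [], []
    for true_size in [2, 4, 6, 8]:
        noise = rng.normal(size=(128, 128))
        k = np.ones((true_size, true_size)) / true_size**2
        img = np.real(np.fft.ifft2(np.fft.fft2(noise) * np.fft.fft2(k, (128, 128))))
        sizes.append(true_size)
        feats.append(correlation_length(img))

    slope, intercept = np.polyfit(feats, sizes, 1)   # the "learned" part
    print("estimated size at feature=5:", slope * 5 + intercept)
    ```

    Because the feature already encodes the relevant optics, the learned mapping here collapses to a line fit, a toy analogue of why only a tiny amount of experimental data can suffice.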

    Today, the only inline processes used for particle measurements in the pharmaceutical industry are for slurry products, where crystals float in a liquid. There is no method for measuring particles within a powder during mixing. Powders can be made from slurries, but when a liquid is filtered and dried its composition changes, requiring new measurements. In addition to making the process quicker and more efficient, using the PEACE mechanism makes the job safer because it requires less handling of potentially highly potent materials, the authors say. 

    The ramifications for pharmaceutical manufacturing could be significant, allowing drug production to be more efficient, sustainable, and cost-effective, by reducing the number of experiments companies need to conduct when making products. Monitoring the characteristics of a drying mixture is an issue the industry has long struggled with, according to Charles Papageorgiou, the director of Takeda’s Process Chemistry Development group and one of the study’s authors. 

    “It is a problem that a lot of people are trying to solve, and there isn’t a good sensor out there,” says Papageorgiou. “This is a pretty big step change, I think, with respect to being able to monitor, in real time, particle size distribution.”

    Papageorgiou said that the mechanism could have applications in other industrial pharmaceutical operations. At some point, the laser technology may be able to train video imaging, allowing manufacturers to use a camera for analysis rather than laser measurements. The company is now working to assess the tool on different compounds in its lab. 

    The results come directly from collaboration between Takeda and three MIT departments: Mechanical Engineering, Chemical Engineering, and Electrical Engineering and Computer Science. Over the last three years, researchers at MIT and Takeda have worked together on 19 projects focused on applying machine learning and artificial intelligence to problems in the health-care and medical industry as part of the MIT-Takeda Program. 

    Often, it can take years for academic research to translate to industrial processes. But researchers are hopeful that this direct collaboration could shorten that timeline. Takeda’s offices are within walking distance of MIT’s campus, which allowed researchers to set up tests in the company’s lab, and real-time feedback from Takeda helped MIT researchers structure their research based on the company’s equipment and operations.

    Combining the expertise and mission of both entities helps researchers ensure their experimental results will have real-world implications. The team has already filed for two patents and has plans to file for a third.

  • Driving toward data justice

    As a person with a mixed-race background who has lived in four different cities, Amelia Dogan describes her early life as “growing up in a lot of in-betweens.” Now an MIT senior, she continues to link different perspectives together, working at the intersection of urban planning, computer science, and social justice.

    Dogan was born in Canada but spent her high school years in Philadelphia, where she developed a strong affinity for the city.  

    “I love Philadelphia to death,” says Dogan. “It’s my favorite place in the world. The energy in the city is amazing — I’m so sad I wasn’t there for the Super Bowl this year — but it is a city with really big disparities. That drives me to do the research that I do and shapes the things that I care about.”

    Dogan is double-majoring in urban science and planning with computer science and in American studies. She decided on the former after participating in the pre-orientation program offered by the Department of Urban Studies and Planning, which provides an introduction to both the department and the city of Boston. She followed that up with a UROP research project with the West Philadelphia Landscape Project, putting together historical census data on housing and race to find patterns for use in community advocacy.

    After taking WGS.231 (Writing About Race), a course offered by the Program in Women’s and Gender Studies, during her first year at MIT, Dogan realized there was a lot of crosstalk between urban planning, computer science, and the social sciences.

    “There’s a lot of critical social theory that I want to have background in to make me a better planner or a better computer scientist,” says Dogan. “There’s also a lot of issues around fairness and participation in computer science, and a lot of computer scientists are trying to reinvent the wheel when there’s already really good, critical social science research and theory behind this.”

    Data science and feminism

    Dogan’s first year at MIT was interrupted by the onset of the Covid-19 pandemic, but there was a silver lining. An influx of funding to keep students engaged while attending school virtually enabled her to join the Data + Feminism Lab to work on a case study examining three places in Philadelphia with historical names that were renamed after activist efforts.

    In her first year at MIT, Dogan worked on several UROPs to hone her skills and find the best research fit. Besides the West Philadelphia Landscape Project, she worked on two projects within the MIT Sloan School of Management. The first involved searching for connections between entrepreneurship and immigration among Fortune 500 founders. The second involved interviewing warehouse workers and writing a report on their quality of life.

    Dogan has now spent three years in the Data + Feminism Lab under Associate Professor Catherine D’Ignazio, where she is particularly interested in how technology can be used by marginalized communities to invert historical power imbalances. A key concept in the lab’s work is that of counterdata, which are produced by civil society groups or individuals in order to counter missing data or to challenge existing official data.

    Most recently, she completed a SuperUROP project investigating how femicide data activist organizations use social media, analyzing 600 social media posts by organizations across the U.S. and Canada. The work built on the lab’s broader body of work with these groups, to which Dogan has contributed by annotating news articles for machine-learning models.

    “Catherine works a lot at the intersection of data issues and feminism. It just seemed like the right fit for me,” says Dogan. “She’s my academic advisor, she’s my research advisor, and is also a really good mentor.”

    Advocating for the student experience

    Outside of the classroom, Dogan is a strong advocate for improving the student experience, particularly when it intersects with identity. An executive board member of the Asian American Initiative (AAI), she also sits on the student advisory council for the Office of Minority Education.

    “Doing that institutional advocacy has been important to me, because it’s for things that I expected coming into college and had not come in prepared to fight for,” says Dogan. As a high schooler, she participated in programs run by the University of Pennsylvania’s Pan-Asian American Community House and was surprised to find that MIT did not have an equivalent organization.

    “Building community based upon identity is something that I’ve been really passionate about,” says Dogan. “For the past two years, I’ve been working with AAI on a list of recommendations for MIT. I’ve talked to alums from the ’90s who were a part of an Asian American caucus who were asking for the same things.”

    She also holds a leadership role with MIXED @ MIT, a student group focused on creating space for mixed-heritage students to explore and discuss their identities.

    Following graduation, Dogan plans to pursue a PhD in information science at the University of Washington. Her breadth of skills has given her a range of programs to choose from. No matter where she goes next, Dogan wants to pursue a career where she can continue to make a tangible impact.

    “I would love to be doing community-engaged research around data justice, using citizen science and counterdata for policy and social change,” she says.

  • Martin Wainwright named director of the Institute for Data, Systems, and Society

    Martin Wainwright, the Cecil H. Green Professor in MIT’s departments of Electrical Engineering and Computer Science (EECS) and Mathematics, has been named the new director of the Institute for Data, Systems, and Society (IDSS), effective July 1.

    “Martin is a widely recognized leader in statistics and machine learning — both in research and in education. In taking on this leadership role in the college, Martin will work to build up the human and institutional behavior component of IDSS, while strengthening initiatives in both policy and statistics, and collaborations within the institute, across MIT, and beyond,” says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing and the Henry Ellis Warren Professor of Electrical Engineering and Computer Science. “I look forward to working with him and supporting his efforts in this next chapter for IDSS.”

    “Martin holds a strong belief in the value of theoretical, experimental, and computational approaches to research and in facilitating connections between them. He also places much importance in having practical, as well as academic, impact,” says Asu Ozdaglar, deputy dean of academics for the MIT Schwarzman College of Computing, department head of EECS, and the MathWorks Professor of Electrical Engineering and Computer Science. “As the new director of IDSS, he will undoubtedly bring these tenets to the role in advancing the mission of IDSS and helping to shape its future.”

    A principal investigator in the Laboratory for Information and Decision Systems and the Statistics and Data Science Center, Wainwright joined the MIT faculty in July 2022 from the University of California at Berkeley, where he held the Howard Friesen Chair with a joint appointment between the departments of Electrical Engineering and Computer Science and Statistics.

    Wainwright received his bachelor’s degree in mathematics from the University of Waterloo, Canada, and his doctoral degree in electrical engineering and computer science from MIT. He has received a number of awards and recognitions, including an Alfred P. Sloan Foundation Fellowship, and best paper awards from the IEEE Signal Processing Society, IEEE Communications Society, and IEEE Information Theory and Communications societies. He has also been honored with the Medallion Lectureship and Award from the Institute of Mathematical Statistics, and the COPSS Presidents’ Award from the Committee of Presidents of Statistical Societies. He was a section lecturer at the International Congress of Mathematicians in 2014 and received the Blackwell Award from the Institute of Mathematical Statistics in 2017.

    He is the author of “High-dimensional Statistics: A Non-Asymptotic Viewpoint” (Cambridge University Press, 2019), and is coauthor on several books, including on graphical models and on sparse statistical modeling.

    Wainwright succeeds Munther Dahleh, the William A. Coolidge Professor in EECS, who has helmed IDSS since its founding in 2015.

    “I am grateful to Munther and thank him for his leadership of IDSS. As the founding director, he has led the creation of a remarkable new part of MIT,” says Huttenlocher.

  • Learner in Afghanistan reaches beyond barriers to pursue career in data science

    Tahmina S. was a junior studying computer engineering at a top university in Afghanistan when a new government policy banned women from pursuing education. In August 2021, the Taliban prohibited girls from attending school beyond the sixth grade. While women were initially allowed to continue to attend universities, by October 2021, an order from the Ministry of Higher Education declared that all women in Afghanistan were suspended from attending public and private centers of higher education.

    Determined to continue her studies and pursue her ambitions, Tahmina found the MIT Refugee Action Hub (ReACT) and was accepted to its Certificate in Computer and Data Science program in 2022.

    “ReACT helped me realize that I can do big things and be a part of big things,” she says.

    MIT ReACT provides education and professional opportunities to learners from refugee and forcibly displaced communities worldwide. ReACT’s core pillars include academic development, human skills development, employment pathways, and network building. Since 2017, ReACT has offered its Certificate in Computer and Data Science (CDS) program free of cost to learners wherever they live. In 2022, ReACT welcomed its largest and most diverse cohort to date — 136 learners from 29 countries — including 25 learners from Afghanistan, more than half of whom are women.

    Tahmina was able to select her classes in the program, and especially valued learning Python — which has led to her studying other programming languages and gaining more skills in data science. She’s continuing to take online courses in hopes of completing her undergraduate degree and, someday, pursuing a master’s degree in computer science and becoming a data scientist.

    “It’s an important and fun career. I really love data,” she says. “If this is my only time for this experience, I will bring to the table what I have, and do my best.”

    In addition to the education ban, Tahmina also faced the challenge of accessing an internet connection, which is expensive where she lives. But she regularly studies between 12 and 14 hours a day to achieve her dreams.

    The ReACT program offers a blend of asynchronous and synchronous learning. Learners complete a curated series of online, rigorous MIT coursework through MITx with the support of teaching assistants and collaborators, and also participate in a series of interactive online workshops in interpersonal skills that are critical to success in education and careers.

    ReACT learners engage with MIT’s global network of experts including MIT staff, faculty, and alumni — as well as collaborators across technology, humanitarian, and government sectors.

    “I loved that experience a lot; it was a huge achievement. I’m grateful ReACT gave me a chance to be a part of that team of amazing people. I’m amazed I completed that program, because it was really challenging,” she says.

    Theory into practice

    Tahmina was one of 10 students from the ReACT cohort accepted to the highly competitive MIT Innovation Leadership Bootcamp. She worked on a team of five people who developed a business proposal and took the project through each phase of the development process. Her team’s project was a finance-management app for users aged 23-51, complete with all the graphic elements and a final presentation. One valuable aspect of the boot camp, Tahmina says, was presenting the project to real investors, who then provided business insights and actionable feedback.

    As part of this ReACT cohort, Tahmina also participated in the Global Apprenticeship Program (GAP) pilot, an initiative led by Talanta with MIT Open Learning participating as curriculum provider. The GAP initiative focuses on improving job preparedness among diverse emerging talent and on exploring how companies can successfully recruit, onboard, and retain this talent through remote, paid internships. Through the GAP pilot, Tahmina received training in professional skills and in resume and interview preparation, and was matched with a financial sector firm for a four-month remote internship in data science.

    To prepare Tahmina and other learners for these professional experiences, ReACT trains its cohorts to work with people who have diverse backgrounds, experiences, and challenges. The nonprofit Na’amal offered workshops covering areas such as problem-solving, innovation and ideation, goal-setting, communication, teamwork, and infrastructure and information security. Tahmina was able to access English classes and learn valuable career skills, such as writing a resume.

    “This was an amazing part for me. There’s a huge difference going from theoretical to practical,” she says. “Not only do you have to have the theoretical experience, you have to have soft skills. You have to communicate everything you learn to other people, because other people in the business might not have that knowledge, so you have to tell the story in a way that they can understand.”

    ReACT wanted the women in the program to be mentored by women who were not only leaders in the tech field, but working in the same geographic region as learners. At the start of the internship, Na’amal connected Tahmina with a mentor, Maha Gad, who is head of talent development at Talabat and lives in Dubai. Tahmina met with Gad at the beginning and end of each month, giving her the opportunity to ask expansive questions. Tahmina says Gad encouraged her to research and plan first, and then worked with her to explore new tools, like Trello.

    Wanting to put her skills to use locally, Tahmina volunteered at the nonprofit Rumie, a community for Afghan women and girls, working as a learning designer, translator, team leader, and social media manager. She currently volunteers at Correspondents of the World as a story ambassador, helping Afghan people share stories, community, and culture — especially telling the stories of Afghan women and the changes they’ve made in the world.

    “It’s been the most beautiful journey of my life that I will never forget,” says Tahmina. “I found ReACT at a time when I had nothing, and I found the most valuable thing.”

  • Drones navigate unseen environments with liquid neural networks

    In the vast, expansive skies where birds once ruled supreme, a new crop of aviators is taking flight. These pioneers of the air are not living creatures, but rather a product of deliberate innovation: drones. But these aren’t your typical flying bots, humming around like mechanical bees. Rather, they’re avian-inspired marvels that soar through the sky, guided by liquid neural networks to navigate ever-changing and unseen environments with precision and ease.

    Inspired by the adaptable nature of organic brains, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have introduced a method for robust flight navigation agents to master vision-based fly-to-target tasks in intricate, unfamiliar environments. The liquid neural networks, which can continuously adapt to new data inputs, showed prowess in making reliable decisions in unknown domains like forests, urban landscapes, and environments with added noise, rotation, and occlusion. These adaptable models, which outperformed many state-of-the-art counterparts in navigation tasks, could enable potential real-world drone applications like search and rescue, delivery, and wildlife monitoring.

    The researchers’ recent study, published today in Science Robotics, details how this new breed of agents can adapt to significant distribution shifts, a long-standing challenge in the field. The team’s new class of machine-learning algorithms captures the causal structure of tasks from high-dimensional, unstructured data, such as pixel inputs from a drone-mounted camera. These networks can then extract crucial aspects of a task (i.e., understand the task at hand) and ignore irrelevant features, allowing acquired navigation skills to transfer seamlessly to new environments.
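    For readers unfamiliar with the term: in a liquid time-constant network, the hidden state evolves under an ordinary differential equation whose decay rate is gated by the current input, so the dynamics keep adapting to incoming data. The sketch below is a minimal from-scratch illustration of that update rule with made-up weights and dimensions, not the authors’ released code:

    ```python
    # Minimal sketch of a liquid time-constant (LTC) style update, the mechanism
    # behind "liquid" networks whose effective dynamics depend on the input.
    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hidden = 3, 8
    W_in = rng.normal(scale=0.5, size=(n_hidden, n_in))
    W_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
    bias = np.zeros(n_hidden)
    tau = np.ones(n_hidden)          # base time constants
    A = rng.normal(size=n_hidden)    # per-neuron equilibrium targets

    def ltc_step(x, u, dt=0.05):
        """One explicit-Euler step of dx/dt = -(1/tau + f) * x + f * A,
        where the gate f depends on both the input u and the state x --
        so the effective time constant changes with the data."""
        f = np.tanh(W_rec @ x + W_in @ u + bias)
        f = np.maximum(f, 0.0)                     # keep the gate nonnegative
        dxdt = -(1.0 / tau + f) * x + f * A
        return x + dt * dxdt

    x = np.zeros(n_hidden)
    for t in range(100):
        u = np.array([np.sin(0.1 * t), np.cos(0.1 * t), 1.0])
        x = ltc_step(x, u)
    print("hidden state after 100 steps:", np.round(x, 3))
    ```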

    “We are thrilled by the immense potential of our learning-based control approach for robots, as it lays the groundwork for solving problems that arise when training in one environment and deploying in a completely distinct environment without additional training,” says Daniela Rus, CSAIL director and the Andrew (1956) and Erna Viterbi Professor of Electrical Engineering and Computer Science at MIT. “Our experiments demonstrate that we can effectively teach a drone to locate an object in a forest during summer, and then deploy the model in winter, with vastly different surroundings, or even in urban settings, with varied tasks such as seeking and following. This adaptability is made possible by the causal underpinnings of our solutions. These flexible algorithms could one day aid in decision-making based on data streams that change over time, such as medical diagnosis and autonomous driving applications.”

    A daunting challenge was at the forefront: Do machine-learning systems understand the task they are given from data when flying drones to an unlabeled object? And, would they be able to transfer their learned skill and task to new environments with drastic changes in scenery, such as flying from a forest to an urban landscape? What’s more, unlike the remarkable abilities of our biological brains, deep learning systems struggle with capturing causality, frequently over-fitting their training data and failing to adapt to new environments or changing conditions. This is especially troubling for resource-limited embedded systems, like aerial drones, that need to traverse varied environments and respond to obstacles instantaneously. 

    The liquid networks, in contrast, offer promising preliminary indications of their capacity to address this crucial weakness in deep learning systems. The team’s system was first trained on data collected by a human pilot, to see how the networks transferred learned navigation skills to new environments under drastic changes in scenery and conditions. Unlike traditional neural networks, which only learn during the training phase, the liquid neural net’s parameters can change over time, making them not only interpretable, but more resilient to unexpected or noisy data.

    In a series of quadrotor closed-loop control experiments, the drones underwent range tests, stress tests, target rotation and occlusion, hiking with adversaries, triangular loops between objects, and dynamic target tracking. They tracked moving targets, and executed multi-step loops between objects in never-before-seen environments, surpassing performance of other cutting-edge counterparts. 

    The team believes that the ability to learn from limited expert data and understand a given task while generalizing to new environments could make autonomous drone deployment more efficient, cost-effective, and reliable. Liquid neural networks, they noted, could enable autonomous air mobility drones to be used for environmental monitoring, package delivery, autonomous vehicles, and robotic assistants. 

    “The experimental setup presented in our work tests the reasoning capabilities of various deep learning systems in controlled and straightforward scenarios,” says MIT CSAIL Research Affiliate Ramin Hasani. “There is still so much room left for future research and development on more complex reasoning challenges for AI systems in autonomous navigation applications, which has to be tested before we can safely deploy them in our society.”

    “Robust learning and performance in out-of-distribution tasks and scenarios are some of the key problems that machine learning and autonomous robotic systems have to conquer to make further inroads in society-critical applications,” says Alessio Lomuscio, professor of AI safety in the Department of Computing at Imperial College London. “In this context, the performance of liquid neural networks, a novel brain-inspired paradigm developed by the authors at MIT, reported in this study is remarkable. If these results are confirmed in other experiments, the paradigm here developed will contribute to making AI and robotic systems more reliable, robust, and efficient.”

    Clearly, the sky is no longer the limit, but rather a vast playground for the boundless possibilities of these airborne marvels. 

    Hasani and PhD student Makram Chahine; Patrick Kao ’22, MEng ’22; and PhD student Aaron Ray SM ’21 wrote the paper with Ryan Shubert ’20, MEng ’22; MIT postdocs Mathias Lechner and Alexander Amini; and Rus.

    This research was supported, in part, by Schmidt Futures, the U.S. Air Force Research Laboratory, the U.S. Air Force Artificial Intelligence Accelerator, and the Boeing Co.

  • A method for designing neural networks optimally suited for certain tasks

    Neural networks, a type of machine-learning model, are being used to help humans complete a wide variety of tasks, from predicting whether someone’s credit score is high enough to qualify for a loan to diagnosing whether a patient has a certain disease. But researchers still have only a limited understanding of how these models work. Whether a given model is optimal for a certain task remains an open question.

    MIT researchers have found some answers. They conducted an analysis of neural networks and proved that they can be designed so they are “optimal,” meaning they minimize the probability of misclassifying borrowers or patients into the wrong category when the networks are given a lot of labeled training data. To achieve optimality, these networks must be built with a specific architecture.

    The researchers discovered that, in certain situations, the building blocks that enable a neural network to be optimal are not the ones developers use in practice. These optimal building blocks, derived through the new analysis, are unconventional and haven’t been considered before, the researchers say.

    In a paper published this week in the Proceedings of the National Academy of Sciences, they describe these optimal building blocks, called activation functions, and show how they can be used to design neural networks that achieve better performance on any dataset. The results hold even as the neural networks grow very large. This work could help developers select the correct activation function, enabling them to build neural networks that classify data more accurately in a wide range of application areas, explains senior author Caroline Uhler, a professor in the Department of Electrical Engineering and Computer Science (EECS).

    “While these are new activation functions that have never been used before, they are simple functions that someone could actually implement for a particular problem. This work really shows the importance of having theoretical proofs. If you go after a principled understanding of these models, that can actually lead you to new activation functions that you would otherwise never have thought of,” says Uhler, who is also co-director of the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, and a researcher at MIT’s Laboratory for Information and Decision Systems (LIDS) and its Institute for Data, Systems, and Society (IDSS).

    Joining Uhler on the paper are lead author Adityanarayanan Radhakrishnan, an EECS graduate student and an Eric and Wendy Schmidt Center Fellow, and Mikhail Belkin, a professor in the Halicioğlu Data Science Institute at the University of California at San Diego.

    Activation investigation

    A neural network is a type of machine-learning model that is loosely based on the human brain. Many layers of interconnected nodes, or neurons, process data. Researchers train a network to complete a task by showing it millions of examples from a dataset.

    For instance, a network that has been trained to classify images into categories, say dogs and cats, is given an image that has been encoded as numbers. The network performs a series of complex multiplication operations, layer by layer, until the result is just one number. If that number is positive, the network classifies the image as a dog; if it is negative, as a cat.

    Activation functions help the network learn complex patterns in the input data. They do this by applying a transformation to the output of one layer before data are sent to the next layer. When researchers build a neural network, they select one activation function to use. They also choose the width of the network (how many neurons are in each layer) and the depth (how many layers are in the network).
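    As a toy rendering of those three design choices (a hypothetical sketch, not the paper’s networks), the snippet below makes the activation function, the width, and the depth explicit and produces the single positive-or-negative score described above:

    ```python
    # Toy forward pass showing the three design choices the article mentions:
    # the activation function, the width (neurons per layer), and the depth
    # (number of layers). Hypothetical illustration only.
    import numpy as np

    def forward(x, weights, activation):
        """Apply each layer's linear map, then the activation, layer by layer."""
        for W, b in weights[:-1]:
            x = activation(W @ x + b)
        W, b = weights[-1]
        return W @ x + b               # final layer: one raw score

    def build(widths, rng):
        """widths = [input_dim, hidden..., 1]; depth = len(widths) - 1."""
        return [(rng.normal(scale=1 / np.sqrt(m), size=(n, m)), np.zeros(n))
                for m, n in zip(widths[:-1], widths[1:])]

    rng = np.random.default_rng(0)
    net = build([4, 64, 64, 1], rng)   # width 64, depth 3
    score = forward(rng.normal(size=4), net, activation=np.tanh)
    # Positive score -> one class (e.g., "dog"), negative -> the other.
    print("class:", "dog" if score.item() > 0 else "cat")
    ```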

    “It turns out that, if you take the standard activation functions that people use in practice, and keep increasing the depth of the network, it gives you really terrible performance. We show that if you design with different activation functions, as you get more data, your network will get better and better,” says Radhakrishnan.

    He and his collaborators studied a situation in which a neural network is infinitely deep and wide — which means the network is built by continually adding more layers and more nodes — and is trained to perform classification tasks. In classification, the network learns to place data inputs into separate categories.

    “A clean picture”

    After conducting a detailed analysis, the researchers determined that there are only three ways this kind of network can learn to classify inputs. One method classifies an input based on the majority of inputs in the training data; if there are more dogs than cats, it will decide every new input is a dog. Another method classifies by choosing the label (dog or cat) of the training data point that most resembles the new input.

    The third method classifies a new input based on a weighted average of all the training data points that are similar to it. Their analysis shows that this is the only method of the three that leads to optimal performance. They identified a set of activation functions that always use this optimal classification method.
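    In spirit, this third method is a similarity-weighted (kernel) classifier. The sketch below illustrates the idea with made-up Gaussian weights; the paper’s contribution is identifying which activation functions make a very deep, wide network behave this way, not this particular formula:

    ```python
    # Sketch of the "weighted average of similar training points" idea,
    # essentially a kernel (Nadaraya-Watson) classifier. Hypothetical weights.
    import numpy as np

    def kernel_classify(x_new, X_train, y_train, bandwidth=1.0):
        """Label = sign of a similarity-weighted average of training labels
        (+1 = dog, -1 = cat). Closer points get exponentially more weight."""
        dists = np.linalg.norm(X_train - x_new, axis=1)
        w = np.exp(-(dists ** 2) / (2 * bandwidth ** 2))
        return np.sign(w @ y_train / w.sum())

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=-2, size=(20, 2)),    # "cats" cluster
                   rng.normal(loc=+2, size=(20, 2))])   # "dogs" cluster
    y = np.array([-1] * 20 + [1] * 20)
    print(kernel_classify(np.array([1.5, 1.8]), X, y))  # -> 1.0 ("dog")
    ```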

    “That was one of the most surprising things — no matter what you choose for an activation function, it is just going to be one of these three classifiers. We have formulas that will tell you explicitly which of these three it is going to be. It is a very clean picture,” he says.

    They tested this theory on several classification benchmarking tasks and found that it led to improved performance in many cases. Neural network builders could use their formulas to select an activation function that yields improved classification performance, Radhakrishnan says.

    In the future, the researchers want to use what they’ve learned to analyze situations where they have a limited amount of data and for networks that are not infinitely wide or deep. They also want to apply this analysis to situations where data do not have labels.

    “In deep learning, we want to build theoretically grounded models so we can reliably deploy them in some mission-critical setting. This is a promising approach at getting toward something like that — building architectures in a theoretically grounded way that translates into better results in practice,” he says.

    This work was supported, in part, by the National Science Foundation, Office of Naval Research, the MIT-IBM Watson AI Lab, the Eric and Wendy Schmidt Center at the Broad Institute, and a Simons Investigator Award.

  • Strengthening trust in machine-learning models

    Probabilistic machine learning methods are becoming increasingly powerful tools in data analysis, informing a range of critical decisions across disciplines and applications, from forecasting election results to predicting the impact of microloans on addressing poverty.

    This class of methods uses sophisticated concepts from probability theory to handle uncertainty in decision-making. But the math is only one piece of the puzzle in determining their accuracy and effectiveness. In a typical data analysis, researchers make many subjective choices, or potentially introduce human error, that must also be assessed in order to cultivate users’ trust in the quality of decisions based on these methods.

    To address this issue, MIT computer scientist Tamara Broderick, associate professor in the Department of Electrical Engineering and Computer Science (EECS) and a member of the Laboratory for Information and Decision Systems (LIDS), and a team of researchers have developed a classification system — a “taxonomy of trust” — that defines where trust might break down in a data analysis and identifies strategies to strengthen trust at each step. The other researchers on the project are Professor Anna Smith at the University of Kentucky, professors Tian Zheng and Andrew Gelman at Columbia University, and Professor Rachael Meager at the London School of Economics. The team’s hope is to highlight concerns that are already well-studied and those that need more attention.

    In their paper, published in February in Science Advances, the researchers begin by detailing the steps in the data analysis process where trust might break down: Analysts make choices about what data to collect and which models, or mathematical representations, most closely mirror the real-life problem or question they are aiming to answer. They select algorithms to fit the model and use code to run those algorithms. Each of these steps poses unique challenges around building trust. Some components can be checked for accuracy in measurable ways. “Does my code have bugs?”, for example, is a question that can be tested against objective criteria. Other times, problems are more subjective, with no clear-cut answers; analysts are confronted with numerous strategies to gather data and decide whether a model reflects the real world.

    “What I think is nice about making this taxonomy, is that it really highlights where people are focusing. I think a lot of research naturally focuses on this level of ‘are my algorithms solving a particular mathematical problem?’ in part because it’s very objective, even if it’s a hard problem,” Broderick says.

    “I think it’s really hard to answer ‘is it reasonable to mathematize an important applied problem in a certain way?’ because it’s somehow getting into a harder space, it’s not just a mathematical problem anymore.”

    Capturing real life in a model

    The researchers’ work in categorizing where trust breaks down, though it may seem abstract, is rooted in real-world application.

    Meager, a co-author on the paper, analyzed whether microfinance can have a positive effect in a community. The project became a case study for where trust could break down, and for ways to reduce this risk.

    At first look, measuring the impact of microfinancing might seem like a straightforward endeavor. But like any analysis, researchers meet challenges at each step in the process that can affect trust in the outcome. Microfinancing — in which individuals or small businesses receive small loans and other financial services in lieu of conventional banking — can offer different services, depending on the program. For the analysis, Meager gathered datasets from microfinance programs in countries across the globe, including in Mexico, Mongolia, Bosnia, and the Philippines.

    When combining conspicuously distinct datasets, in this case from multiple countries and across different cultures and geographies, researchers must evaluate whether specific case studies can reflect broader trends. It is also important to contextualize the data on hand. For example, in rural Mexico, owning goats may be counted as an investment.

    “It’s hard to measure the quality of life of an individual. People measure things like, ‘What’s the business profit of the small business?’ Or ‘What’s the consumption level of a household?’ There’s this potential for mismatch between what you ultimately really care about, and what you’re measuring,” Broderick says. “Before we get to the mathematical level, what data and what assumptions are we leaning on?”

    With data on hand, analysts must define the real-world questions they seek to answer. In the case of evaluating the benefits of microfinancing, analysts must define what they consider a positive outcome. It is standard in economics, for example, to measure the average financial gain per business in communities where a microfinance program is introduced. But reporting an average might suggest a net positive effect even if only a few (or even one) person benefited, instead of the community as a whole.
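    A toy calculation with invented numbers makes the concern concrete: the average can look healthy even when almost no one benefits.

    ```python
    # Tiny numeric illustration of the proxy problem: the average gain is
    # positive even though only one person benefited. Invented numbers.
    import numpy as np

    gains = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 500])  # one big winner
    print("mean gain:   ", gains.mean())                 # 50.0 -- looks good
    print("share helped:", (gains > 0).mean())           # 0.1 -- 10% of people
    ```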

    “What you really wanted was that a lot of people are benefiting,” Broderick says. “It sounds simple. Why didn’t we measure the thing that we cared about? But I think it’s really common that practitioners use standard machine learning tools, for a lot of reasons. And these tools might report a proxy that doesn’t always agree with the quantity of interest.”

    Analysts may consciously or subconsciously favor models they are familiar with, especially after investing a great deal of time learning their ins and outs. “Someone might be hesitant to try a nonstandard method because they might be less certain they will use it correctly. Or peer review might favor certain familiar methods, even if a researcher might like to use nonstandard methods,” Broderick says. “There are a lot of reasons, sociologically. But this can be a concern for trust.”

    Final step, checking the code 

    While distilling a real-life problem into a model can be a big-picture, amorphous problem, checking the code that runs an algorithm can feel “prosaic,” Broderick says. But it is another potentially overlooked area where trust can be strengthened.

    In some cases, checking a coding pipeline that executes an algorithm might be considered outside the purview of an analyst’s job, especially when there is the option to use standard software packages.

    One way to catch bugs is to test whether code is reproducible. Depending on the field, however, sharing code alongside published work is not always a requirement or the norm. As models increase in complexity over time, it becomes harder to recreate code from scratch. Reproducing a model becomes difficult or even impossible.
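    A minimal version of such a check, assuming a Python pipeline in which the analysis function stands in for a real model fit, is to pin the random seed and assert that two runs agree:

    ```python
    # A minimal reproducibility check of the kind described above:
    # pin the random seed and assert that two runs of an analysis agree.
    import numpy as np

    def analysis(seed):
        rng = np.random.default_rng(seed)
        data = rng.normal(loc=0.3, size=1000)   # placeholder for loading real data
        return data.mean()                      # placeholder for a full model fit

    assert analysis(seed=42) == analysis(seed=42), "analysis is not reproducible"
    print("reproducible result:", analysis(seed=42))
    ```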

    “Let’s just start with every journal requiring you to release your code. Maybe it doesn’t get totally double-checked, and everything isn’t absolutely perfect, but let’s start there,” Broderick says, as one step toward building trust.

    Paper co-author Gelman worked on an analysis that forecast the 2020 U.S. presidential election using state and national polls in real time. The team published daily updates in The Economist magazine, while also publishing their code online for anyone to download and run. Throughout the season, outsiders pointed out both bugs and conceptual problems in the model, ultimately contributing to a stronger analysis.

    The researchers acknowledge that while there is no single solution to create a perfect model, analysts and scientists have the opportunity to reinforce trust at nearly every turn.

    “I don’t think we expect any of these things to be perfect,” Broderick says, “but I think we can expect them to be better or to be as good as possible.”

  • Festival of Learning 2023 underscores importance of well-designed learning environments

    During its first in-person gathering since 2020, MIT’s Festival of Learning 2023 explored how the learning sciences can inform the Institute’s efforts to best support students. Co-sponsored by MIT Open Learning and the Office of the Vice Chancellor (OVC), this annual event celebrates teaching and learning innovations with MIT instructors, students, and staff.

    Bror Saxberg SM ’85, PhD ’89, founder of LearningForge LLC and former chief learning officer at Kaplan, Inc., was invited as keynote speaker, with opening remarks by MIT Chancellor Melissa Nobles and Vice President for Open Learning Eric Grimson, and discussion moderated by Senior Associate Dean of Open Learning Christopher Capozzola. This year’s festival focused on how creating well-designed learning environments using learning engineering can increase learning success.

    Video: 2023 Festival of Learning: Highlights

    Well-designed learning environments are key

    In his keynote speech “Learning Engineering: What We Know, What We Can Do,” Saxberg defined “learning engineering” as the practical application of learning sciences to real-world problems at scale. He said, “High levels can be reached by all learners, given access to well-designed instruction and motivation for enough practice opportunities.”

    Informed by decades of empirical evidence from the field of learning science, Saxberg’s own research, and insights from Kaplan, Inc., Saxberg finds that a hands-on strategy he calls “prepare, practice, perform” delivers better learning outcomes than a traditional “read, write, discuss” approach. Saxberg recommends educators devote at least 60 percent of learning time to hands-on approaches, such as producing, creating, and engaging. Only 20-30 percent of learning time should be spent in the more passive “knowledge acquisition” modes of listening and reading.

    “Here at MIT, a place that relies on data to make informed decisions, learning engineering can provide a framework for us to center in on the learner to identify the challenges associated with learning, and to apply the learning sciences in data-driven ways to improve instructional approaches,” said Nobles. During their opening remarks, Nobles and Grimson both emphasized how learning engineering at MIT is informed by the Institute’s commitment to educating the whole student, which encompasses student well-being and belonging in addition to academic rigor. “What lessons can we take away to change the way we think about education moving forward? This is a chance to iterate,” said Grimson.

    Well-designed learning environments are informed by understanding motivation, considering the connection between long-term and working memory, identifying the range of learners’ prior experience, grounding practice in authentic contexts (i.e., work environments), and using data-driven instructional approaches to iterate and improve.

    Video: 2023 Festival of Learning: Keynote by Bror Saxberg

    Understand learner motivation

    Saxberg asserted that before developing course structures and teaching approaches known to encourage learning, educators must first examine learner motivation. Motivation doesn’t require enjoyment of the subject or task to spur engagement. Similar to how a well-designed physical training program can change your muscle cells, if a learner starts, persists, and exerts mental effort in a well-designed learning environment, they can change their neurons — they learn. Saxberg described four main barriers to learner motivation, and solutions for each:

    The learner doesn’t see the value of the lesson. Ways to address this include helping the learners find value; leveraging the learner’s expertise in another area to better understand the topic at hand; and making the activity itself enjoyable. “Finding value” could be as simple as explaining the practical applications of this knowledge in their future work in the field, or how this lesson prepares learners for their advanced level courses. 
    The learner lacks self-efficacy and doesn’t think they’re capable. Educators can point to parallel experiences with similar goals that students may have already achieved in another context. Alternatively, educators can share stories of professionals who have successfully transitioned from one area of expertise to another.
    “Something” in the learner’s way, such as not having the time, space, or correct materials. This is an opportunity to demonstrate how a learner can use problem-solving skills to find a solution to their perceived problem. As with the barrier of self-efficacy, educators can assure learners that they are in control of the situation by sharing similar stories of those who’ve encountered the same problem and the solution they devised.
    The learner’s emotional state. This is no small barrier to motivation. If a learner is angry, depressed, scared, or grieving, it will be challenging for them to switch their mindset into learning mode. A wide array of emotions requires a wide array of possible solutions, from structured conversation techniques to recommending professional help.

    Consider the cognitive load

    Saxberg has found that learning occurs when we use working memory to problem-solve, but our working memory can only process three to five verbal or conscious thoughts at a time. Long-term memory stores knowledge that can be accessed non-verbally and non-consciously, which is why experts appear to remember information effortlessly. Until a learner develops that expertise, extraneous information in a lesson will occupy space in their working memory, running the risk of distracting the learner from the desired learning outcome.

    To accommodate learners’ finite cognitive load, Saxberg suggested the solution of reevaluating which material is essential, then simplifying the exercise or removing unnecessary material accordingly. “That notion of, ‘what do we really need students to be able to do?’ helps you focus,” said Saxberg.

    Another solution is to leverage the knowledge, skills, and interests learners already bring to the course — these long-term memories can scaffold the new material. “What do you have in your head already, what do you love, what’s easy to draw from long-term memory? That would be the starting point for challenging new skills. It’s not the ending point because you want to use your new skills to then find out new things,” Saxberg said.

    Finally, consider how your course engages with the syllabus. Do you explain the reasoning behind the course structure? Do you show how the exercises or material will be applied to future courses or the field? Do you share best practices for engaging working memory and learning? By acknowledging and empathizing with the practical challenges that learners face, you can remove a barrier from their cognitive load.

    Ground practice in authentic contexts

    Saxberg stated that few experts read textbooks to learn new information — they discover what they need to know while working in the field, using those relevant facts in context. As such, students will have an easier time remembering facts if they’re practicing in relevant or similar environments to their future work.

    If students can practice classifying problems in real work contexts rather than theoretical practice problems, they can build a framework to classify what’s important. That helps students recognize the type of problem they’re trying to solve before trying to solve the problem itself. With enough hands-on practice and examples of how experts use processes and identify which principles are relevant, learners can holistically learn entire procedures. And that learning continues once learners graduate to the workforce: professionals often meet to exchange knowledge at conferences, charrettes, and other gatherings.

    Enhancing teaching at MIT

    The Festival of Learning furthers the Office of the Vice Chancellor’s mission to advance academic innovation that will foster the growth of MIT students. The festival also aligns with MIT Open Learning’s Residential Education team’s goal of making MIT education more effective and efficient. Throughout the year, the team offers continuous support to MIT faculty and instructors using digital technologies to augment and transform how they teach.

    “We are doubling down on our commitment to continuous growth in how we teach,” said Nobles.