More stories

  • in

    Learning on the edge

    Microcontrollers, miniature computers that can run simple commands, are the basis for billions of connected devices, from internet-of-things (IoT) devices to sensors in automobiles. But cheap, low-power microcontrollers have extremely limited memory and no operating system, making it challenging to train artificial intelligence models on “edge devices” that work independently from central computing resources.

    Training a machine-learning model on an intelligent edge device allows it to adapt to new data and make better predictions. For instance, training a model on a smart keyboard could enable the keyboard to continually learn from the user’s writing. However, the training process requires so much memory that it is typically done using powerful computers at a data center, before the model is deployed on a device. This is more costly and raises privacy issues since user data must be sent to a central server.

    To address this problem, researchers at MIT and the MIT-IBM Watson AI Lab developed a new technique that enables on-device training using less than a quarter of a megabyte of memory. Other training solutions designed for connected devices can use more than 500 megabytes of memory, greatly exceeding the 256-kilobyte capacity of most microcontrollers (there are 1,024 kilobytes in one megabyte).
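
    To put these figures side by side, the arithmetic can be sketched as follows. The variable names are illustrative only; the values are the ones quoted above.

```python
# Memory figures quoted in the article, converted to kilobytes.
KB_PER_MB = 1024  # 1,024 kilobytes in one megabyte

mcu_capacity_kb = 256                # capacity of most microcontrollers
other_training_kb = 500 * KB_PER_MB  # "more than 500 megabytes"

# How many times larger the other solutions' footprint is than the device memory.
overshoot = other_training_kb / mcu_capacity_kb
print(f"500 MB = {other_training_kb:,} KB, about {overshoot:,.0f}x a 256 KB microcontroller")
```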

    The intelligent algorithms and framework the researchers developed reduce the amount of computation required to train a model, which makes the process faster and more memory efficient. Their technique can be used to train a machine-learning model on a microcontroller in a matter of minutes.

    This technique also preserves privacy by keeping data on the device, which could be especially beneficial when data are sensitive, such as in medical applications. It also could enable customization of a model based on the needs of users. Moreover, the framework preserves or improves the accuracy of the model when compared to other training approaches.

    “Our study enables IoT devices to not only perform inference but also continuously update the AI models to newly collected data, paving the way for lifelong on-device learning. The low resource utilization makes deep learning more accessible and can have a broader reach, especially for low-power edge devices,” says Song Han, an associate professor in the Department of Electrical Engineering and Computer Science (EECS), a member of the MIT-IBM Watson AI Lab, and senior author of the paper describing this innovation.

    Joining Han on the paper are co-lead authors and EECS PhD students Ji Lin and Ligeng Zhu, as well as MIT postdocs Wei-Ming Chen and Wei-Chen Wang, and Chuang Gan, a principal research staff member at the MIT-IBM Watson AI Lab. The research will be presented at the Conference on Neural Information Processing Systems.

    Han and his team previously addressed the memory and computational bottlenecks that exist when trying to run machine-learning models on tiny edge devices, as part of their TinyML initiative.

    Lightweight training

    A common type of machine-learning model is known as a neural network. Loosely based on the human brain, these models contain layers of interconnected nodes, or neurons, that process data to complete a task, such as recognizing people in photos. The model must be trained first, which involves showing it millions of examples so it can learn the task. As it learns, the model increases or decreases the strength of the connections between neurons, which are known as weights.

    The model may undergo hundreds of updates as it learns, and the intermediate activations must be stored during each round. In a neural network, activations are the intermediate results produced by the middle layers. Because there may be millions of weights and activations, training a model requires much more memory than running a pre-trained model, Han explains.

    Han and his collaborators employed two algorithmic solutions to make the training process more efficient and less memory-intensive. The first, known as sparse update, uses an algorithm that identifies the most important weights to update at each round of training. The algorithm starts freezing the weights one at a time until it sees the accuracy dip to a set threshold, then it stops. The remaining weights are updated, while the activations corresponding to the frozen weights don’t need to be stored in memory.
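
    The freezing loop described above can be sketched roughly as follows. This is a minimal illustration and not the researchers' implementation: the function `choose_frozen`, the group names, the freezing order, and the toy accuracy function are all assumptions made for the example.

```python
from typing import Callable, FrozenSet, List, Tuple

def choose_frozen(
    names: List[str],
    accuracy_with_frozen: Callable[[FrozenSet[str]], float],
    threshold: float,
) -> Tuple[List[str], List[str]]:
    """Freeze weight groups one at a time until accuracy would fall below threshold."""
    frozen: List[str] = []
    for name in names:
        trial = frozenset(frozen + [name])
        if accuracy_with_frozen(trial) < threshold:
            break  # accuracy dipped past the limit: stop freezing
        frozen.append(name)
    trainable = [n for n in names if n not in frozen]
    return frozen, trainable

# Toy stand-in for a real evaluation run: each frozen group costs some accuracy.
# These numbers are invented for illustration.
COST = {"conv1": 0.002, "conv2": 0.003, "conv3": 0.02, "fc": 0.05}

def toy_accuracy(frozen: FrozenSet[str]) -> float:
    return 0.90 - sum(COST[n] for n in frozen)

frozen, trainable = choose_frozen(list(COST), toy_accuracy, threshold=0.88)
print("frozen:", frozen)
print("updated:", trainable)
```

    In this toy run the first two groups are frozen and the last two keep training, so activations would only need to be stored for the groups that are still being updated.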

    “Updating the whole model is very expensive because there are a lot of activations, so people tend to update only the last layer, but as you can imagine, this hurts the accuracy. For our method, we selectively update those important weights and make sure the accuracy is fully preserved,” Han says.

    Their second solution involves quantized training and simplifying the weights, which are typically 32 bits. An algorithm rounds the weights so they are only eight bits, through a process known as quantization, which cuts the amount of memory for both training and inference. Inference is the process of applying a model to a dataset and generating a prediction. Then the algorithm applies a technique called quantization-aware scaling (QAS), which acts like a multiplier to adjust the ratio between weight and gradient, to avoid any drop in accuracy that may come from quantized training.
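
    As a rough illustration of the rounding step only, the sketch below uses a simple symmetric per-tensor scheme, which is an assumption and not necessarily the scheme the researchers used. The QAS multiplier is not reproduced here, and the function names are hypothetical.

```python
from typing import List, Tuple

def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: each w is approximated by q * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: List[int], scale: float) -> List[float]:
    """Map the 8-bit integers back to approximate floating-point weights."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.90]  # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print("int8 values:", q)
print("bytes as 32-bit floats:", 4 * len(weights), "-> bytes as int8:", len(q))
```

    Storing one byte per weight instead of four is where the memory saving comes from; the price is a rounding error of at most half a quantization step per weight.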

    The researchers developed a system, called a tiny training engine, that can run these algorithmic innovations on a simple microcontroller that lacks an operating system. This system changes the order of steps in the training process so more work is completed in the compilation stage, before the model is deployed on the edge device.

    “We push a lot of the computation, such as auto-differentiation and graph optimization, to compile time. We also aggressively prune the redundant operators to support sparse updates. Once at runtime, we have much less workload to do on the device,” Han explains.

    A successful speedup

    Their optimization only required 157 kilobytes of memory to train a machine-learning model on a microcontroller, whereas other techniques designed for lightweight training would still need between 300 and 600 megabytes.

    They tested their framework by training a computer vision model to detect people in images. After only 10 minutes of training, it learned to complete the task successfully. Their method was able to train a model more than 20 times faster than other approaches.

    Now that they have demonstrated the success of these techniques for computer vision models, the researchers want to apply them to language models and different types of data, such as time-series data. At the same time, they want to use what they’ve learned to shrink the size of larger models without sacrificing accuracy, which could help reduce the carbon footprint of training large-scale machine-learning models.

    “AI model adaptation/training on a device, especially on embedded controllers, is an open challenge. This research from MIT has not only successfully demonstrated the capabilities, but also opened up new possibilities for privacy-preserving device personalization in real-time,” says Nilesh Jain, a principal engineer at Intel who was not involved with this work. “Innovations in the publication have broader applicability and will ignite new systems-algorithm co-design research.”

    “On-device learning is the next major advance we are working toward for the connected intelligent edge. Professor Song Han’s group has shown great progress in demonstrating the effectiveness of edge devices for training,” adds Jilei Hou, vice president and head of AI research at Qualcomm. “Qualcomm has awarded his team an Innovation Fellowship for further innovation and advancement in this area.”

    This work is funded by the National Science Foundation, the MIT-IBM Watson AI Lab, the MIT AI Hardware Program, Amazon, Intel, Qualcomm, Ford Motor Company, and Google.

  • in

    Neurodegenerative disease can progress in newly identified patterns

    Neurodegenerative diseases — like amyotrophic lateral sclerosis (ALS, or Lou Gehrig’s disease), Alzheimer’s, and Parkinson’s — are complicated, chronic ailments that can present with a variety of symptoms, worsen at different rates, and have many underlying genetic and environmental causes, some of which are unknown. ALS, in particular, affects voluntary muscle movement and is always fatal, but while most people survive for only a few years after diagnosis, others live with the disease for decades. Manifestations of ALS can also vary significantly; often slower disease development correlates with onset in the limbs and affecting fine motor skills, while the more serious, bulbar ALS impacts swallowing, speaking, breathing, and mobility. Therefore, understanding the progression of diseases like ALS is critical to enrollment in clinical trials, analysis of potential interventions, and discovery of root causes.

    However, assessing disease evolution is far from straightforward. Current clinical studies typically assume that health declines on a downward linear trajectory on a symptom rating scale, and use these linear models to evaluate whether drugs are slowing disease progression. Yet data indicate that ALS often follows nonlinear trajectories, with periods where symptoms are stable alternating with periods when they are rapidly changing. Since data can be sparse, and health assessments often rely on subjective rating metrics measured at uneven time intervals, comparisons across patient populations are difficult. These heterogeneous data and progression patterns, in turn, complicate analyses of intervention effectiveness and potentially mask disease origin.

    Now, a new machine-learning method developed by researchers from MIT, IBM Research, and elsewhere aims to better characterize ALS disease progression patterns to inform clinical trial design.

    “There are groups of individuals that share progression patterns. For example, some seem to have really fast-progressing ALS and others that have slow-progressing ALS that varies over time,” says Divya Ramamoorthy PhD ’22, a research specialist at MIT and lead author of a new paper on the work that was published this month in Nature Computational Science. “The question we were asking is: can we use machine learning to identify if, and to what extent, those types of consistent patterns across individuals exist?”

    Their technique, indeed, identified discrete and robust clinical patterns in ALS progression, many of which are non-linear. Further, these disease progression subtypes were consistent across patient populations and disease metrics. The team additionally found that their method can be applied to Alzheimer’s and Parkinson’s diseases as well.

    Joining Ramamoorthy on the paper are MIT-IBM Watson AI Lab members Ernest Fraenkel, a professor in the MIT Department of Biological Engineering; Research Scientist Soumya Ghosh of IBM Research; and Principal Research Scientist Kenney Ng, also of IBM Research. Additional authors include Kristen Severson PhD ’18, a senior researcher at Microsoft Research and former member of the Watson Lab and of IBM Research; Karen Sachs PhD ’06 of Next Generation Analytics; a team of researchers with Answer ALS; Jonathan D. Glass and Christina N. Fournier of the Emory University School of Medicine; the Pooled Resource Open-Access ALS Clinical Trials Consortium; ALS/MND Natural History Consortium; Todd M. Herrington of Massachusetts General Hospital (MGH) and Harvard Medical School; and James D. Berry of MGH.

    MIT Professor Ernest Fraenkel describes early stages of his research looking at root causes of amyotrophic lateral sclerosis (ALS).

    Reshaping health decline

    After consulting with clinicians, the team of machine learning researchers and neurologists let the data speak for itself. They designed an unsupervised machine-learning model that employed two methods: Gaussian process regression and Dirichlet process clustering. These inferred the health trajectories directly from patient data and automatically grouped similar trajectories together without prescribing the number of clusters or the shape of the curves, forming ALS progression “subtypes.” Their method incorporated prior clinical knowledge in the way of a bias for negative trajectories — consistent with expectations for neurodegenerative disease progressions — but did not assume any linearity. “We know that linearity is not reflective of what’s actually observed,” says Ng. “The methods and models that we use here were more flexible, in the sense that, they capture what was seen in the data,” without the need for expensive labeled data and prescription of parameters.
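
    To give a feel for how a Dirichlet process avoids fixing the number of clusters in advance, the sketch below samples cluster labels from a Chinese restaurant process, a standard way of drawing partitions from a Dirichlet process prior. It shows the prior only: the researchers' model also fits Gaussian process trajectories to patient data, which is not attempted here. The function name and parameter values are assumptions.

```python
import random
from typing import List

def crp_assignments(n_items: int, alpha: float, seed: int = 0) -> List[int]:
    """Sample cluster labels from a Chinese restaurant process with concentration alpha."""
    rng = random.Random(seed)
    counts: List[int] = []  # counts[k] = number of items already in cluster k
    labels: List[int] = []
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels

labels = crp_assignments(n_items=200, alpha=1.0)
print("clusters discovered:", len(set(labels)))
```

    The number of clusters is an outcome of the sampling rather than an input; a larger `alpha` tends to produce more clusters.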

    Primarily, they applied the model to five longitudinal datasets from ALS clinical trials and observational studies. These used the gold standard to measure symptom development: the ALS functional rating scale revised (ALSFRS-R), which captures a global picture of patient neurological impairment but can be a bit of a “messy metric.” Additionally, performance on survivability probabilities, forced vital capacity (a measurement of respiratory function), and subscores of ALSFRS-R, which look at individual bodily functions, was incorporated.

    New regimes of progression and utility

    When their population-level model was trained and tested on these metrics, four dominant patterns of disease popped out of the many trajectories — sigmoidal fast progression, stable slow progression, unstable slow progression, and unstable moderate progression — many with strong nonlinear characteristics. Notably, it captured trajectories where patients experienced a sudden loss of ability, called a functional cliff, which would significantly impact treatments, enrollment in clinical trials, and quality of life.

    The researchers compared their method against other commonly used linear and nonlinear approaches in the field to separate the contribution of clustering and linearity to the model’s accuracy. The new work outperformed them, even patient-specific models, and found that subtype patterns were consistent across measures. Impressively, when data were withheld, the model was able to interpolate missing values, and, critically, could forecast future health measures. The model could also be trained on one ALSFRS-R dataset and predict cluster membership in others, making it robust, generalizable, and accurate with scarce data. So long as 6-12 months of data were available, health trajectories could be inferred with higher confidence than conventional methods.

    The researchers’ approach also provided insights into Alzheimer’s and Parkinson’s diseases, both of which can have a range of symptom presentations and progression. For Alzheimer’s, the new technique could identify distinct disease patterns, in particular variations in the rates of conversion of mild to severe disease. The Parkinson’s analysis demonstrated a relationship between progression trajectories for off-medication scores and disease phenotypes, such as the tremor-dominant or postural instability/gait difficulty forms of Parkinson’s disease.

    The work makes significant strides to find the signal amongst the noise in the time-series of complex neurodegenerative disease. “The patterns that we see are reproducible across studies, which I don’t believe had been shown before, and that may have implications for how we subtype the [ALS] disease,” says Fraenkel. As the FDA has been considering the impact of non-linearity in clinical trial designs, the team notes that their work is particularly pertinent.

    As new ways to understand disease mechanisms come online, this model provides another tool to pick apart illnesses like ALS, Alzheimer’s, and Parkinson’s from a systems biology perspective.

    “We have a lot of molecular data from the same patients, and so our long-term goal is to see whether there are subtypes of the disease,” says Fraenkel, whose lab looks at cellular changes to understand the etiology of diseases and possible targets for cures. “One approach is to start with the symptoms … and see if people with different patterns of disease progression are also different at the molecular level. That might lead you to a therapy. Then there’s the bottom-up approach, where you start with the molecules” and try to reconstruct biological pathways that might be affected. “We’re going [to be tackling this] from both ends … and finding if something meets in the middle.”

    This research was supported, in part, by the MIT-IBM Watson AI Lab, the Muscular Dystrophy Association, Department of Veterans Affairs of Research and Development, the Department of Defense, NSF Graduate Research Fellowship Program, Siebel Scholars Fellowship, Answer ALS, the United States Army Medical Research Acquisition Activity, National Institutes of Health, and the NIH/NINDS.

  • in

    Q&A: Global challenges surrounding the deployment of AI

    The AI Policy Forum (AIPF) is an initiative of the MIT Schwarzman College of Computing to move the global conversation about the impact of artificial intelligence from principles to practical policy implementation. Formed in late 2020, AIPF brings together leaders in government, business, and academia to develop approaches to address the societal challenges posed by the rapid advances and increasing applicability of AI.

    The co-chairs of the AI Policy Forum are Aleksander Madry, the Cadence Design Systems Professor; Asu Ozdaglar, deputy dean of academics for the MIT Schwarzman College of Computing and head of the Department of Electrical Engineering and Computer Science; and Luis Videgaray, senior lecturer at MIT Sloan School of Management and director of MIT AI Policy for the World Project. Here, they discuss some of the key issues facing the AI policy landscape today and the challenges surrounding the deployment of AI. The three are co-organizers of the upcoming AI Policy Forum Summit on Sept. 28, which will further explore the issues discussed here.

    Q: Can you talk about the ongoing work of the AI Policy Forum and the AI policy landscape generally?

    Ozdaglar: There is no shortage of discussion about AI at different venues, but conversations are often high-level, focused on questions of ethics and principles, or on policy problems alone. The approach the AIPF takes to its work is to target specific questions with actionable policy solutions and engage with the stakeholders working directly in these areas. We work “behind the scenes” with smaller focus groups to tackle these challenges and aim to bring visibility to some potential solutions alongside the players working directly on them through larger gatherings.

    Q: AI impacts many sectors, which makes us naturally worry about its trustworthiness. Are there any emerging best practices for development and deployment of trustworthy AI?

    Madry: The most important thing to understand regarding deploying trustworthy AI is that AI technology isn’t some natural, preordained phenomenon. It is something built by people. People who are making certain design decisions.

    We thus need to advance research that can guide these decisions as well as provide more desirable solutions. But we also need to be deliberate and think carefully about the incentives that drive these decisions. 

    Now, these incentives stem largely from the business considerations, but not exclusively so. That is, we should also recognize that proper laws and regulations, as well as establishing thoughtful industry standards have a big role to play here too.

    Indeed, governments can put in place rules that prioritize the value of deploying AI while being keenly aware of the corresponding downsides, pitfalls, and impossibilities. The design of such rules will be an ongoing and evolving process as the technology continues to improve and change, and we need to adapt to socio-political realities as well.

    Q: Perhaps one of the most rapidly evolving domains in AI deployment is in the financial sector. From a policy perspective, how should governments, regulators, and lawmakers make AI work best for consumers in finance?

    Videgaray: The financial sector is seeing a number of trends that present policy challenges at the intersection of AI systems. For one, there is the issue of explainability. By law (in the U.S. and in many other countries), lenders need to provide explanations to customers when they take actions deleterious in whatever way, like denial of a loan, to a customer’s interest. However, as financial services increasingly rely on automated systems and machine learning models, the capacity of banks to unpack the “black box” of machine learning to provide that level of mandated explanation becomes tenuous. So how should the finance industry and its regulators adapt to this advance in technology? Perhaps we need new standards and expectations, as well as tools to meet these legal requirements.

    Meanwhile, economies of scale and data network effects are leading to a proliferation of AI outsourcing, and more broadly, AI-as-a-service is becoming increasingly common in the finance industry. In particular, we are seeing fintech companies provide the tools for underwriting to other financial institutions — be it large banks or small, local credit unions. What does this segmentation of the supply chain mean for the industry? Who is accountable for the potential problems in AI systems deployed through several layers of outsourcing? How can regulators adapt to guarantee their mandates of financial stability, fairness, and other societal standards?

    Q: Social media is one of the most controversial sectors of the economy, resulting in many societal shifts and disruptions around the world. What policies or reforms might be needed to best ensure social media is a force for public good and not public harm?

    Ozdaglar: The role of social media in society is of growing concern to many, but the nature of these concerns can vary quite a bit — with some seeing social media as not doing enough to prevent, for example, misinformation and extremism, and others seeing it as unduly silencing certain viewpoints. This lack of unified view on what the problem is impacts the capacity to enact any change. All of that is additionally coupled with the complexities of the legal framework in the U.S. spanning the First Amendment, Section 230 of the Communications Decency Act, and trade laws.

    However, these difficulties in regulating social media do not mean that there is nothing to be done. Indeed, regulators have begun to tighten their control over social media companies, both in the United States and abroad, be it through antitrust procedures or other means. In particular, Ofcom in the U.K. and the European Union are already introducing new layers of oversight to platforms. Additionally, some have proposed taxes on online advertising to address the negative externalities caused by the current social media business model. So, the policy tools are there, if the political will and proper guidance exist to implement them.

  • in

    In-home wireless device tracks disease progression in Parkinson’s patients

    Parkinson’s disease is the fastest-growing neurological disease, now affecting more than 10 million people worldwide, yet clinicians still face huge challenges in tracking its severity and progression.

    Clinicians typically evaluate patients by testing their motor skills and cognitive functions during clinic visits. These semisubjective measurements are often skewed by outside factors — perhaps a patient is tired after a long drive to the hospital. More than 40 percent of individuals with Parkinson’s are never treated by a neurologist or Parkinson’s specialist, often because they live too far from an urban center or have difficulty traveling.

    In an effort to address these problems, researchers from MIT and elsewhere demonstrated an in-home device that can monitor a patient’s movement and gait speed, which can be used to evaluate Parkinson’s severity, the progression of the disease, and the patient’s response to medication.

    The device, which is about the size of a Wi-Fi router, gathers data passively using radio signals that reflect off the patient’s body as they move around their home. The patient does not need to wear a gadget or change their behavior. (A recent study, for example, showed that this type of device could be used to detect Parkinson’s from a person’s breathing patterns while sleeping.)

    The researchers used these devices to conduct a one-year at-home study with 50 participants. They showed that, by using machine-learning algorithms to analyze the troves of data they passively gathered (more than 200,000 gait speed measurements), a clinician could track Parkinson’s progression and medication response more effectively than they would with periodic, in-clinic evaluations.

    “By being able to have a device in the home that can monitor a patient and tell the doctor remotely about the progression of the disease, and the patient’s medication response so they can attend to the patient even if the patient can’t come to the clinic — now they have real, reliable information — that actually goes a long way toward improving equity and access,” says senior author Dina Katabi, the Thuan and Nicole Pham Professor in the Department of Electrical Engineering and Computer Science (EECS), and a principal investigator in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT Jameel Clinic.

    The co-lead authors are EECS graduate students Yingcheng Liu and Guo Zhang. The research is published today in Science Translational Medicine.

    A human radar

    This work utilizes a wireless device previously developed in the Katabi lab that analyzes radio signals that bounce off people’s bodies. It transmits signals that use a tiny fraction of the power of a Wi-Fi router — these super-low-power signals don’t interfere with other wireless devices in the home. While radio signals pass through walls and other solid objects, they are reflected off humans due to the water in our bodies.  

    This creates a “human radar” that can track the movement of a person in a room. Radio waves always travel at the same speed, so the length of time it takes the signals to reflect back to the device indicates how the person is moving.
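
    The timing idea can be illustrated with a short calculation. This is a simplified sketch, not the device's actual signal processing: it assumes a single clean echo per measurement, and the function names and example timings are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # radio waves in vacuum; air is very close

def distance_from_round_trip(round_trip_s: float) -> float:
    """Distance to the reflector: the signal travels out and back, so halve the path."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2

def average_speed(round_trip_a_s: float, round_trip_b_s: float, elapsed_s: float) -> float:
    """Average speed toward or away from the device between two echoes."""
    d_a = distance_from_round_trip(round_trip_a_s)
    d_b = distance_from_round_trip(round_trip_b_s)
    return abs(d_b - d_a) / elapsed_s

# Hypothetical echoes: 20 ns and then 26 ns round trip, taken one second apart.
print(round(distance_from_round_trip(20e-9), 3), "m")
print(round(average_speed(20e-9, 26e-9, 1.0), 3), "m/s")
```

    A change of a few nanoseconds in round-trip time corresponds to roughly a meter of movement, which is why precise timing matters for this kind of sensing.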

    The device incorporates a machine-learning classifier that can pick out the precise radio signals reflected off the patient even when there are other people moving around the room. Advanced algorithms use these movement data to compute gait speed — how fast the person is walking.

    Because the device operates in the background and runs all day, every day, it can collect a massive amount of data. The researchers wanted to see if they could apply machine learning to these datasets to gain insights about the disease over time.

    They gathered 50 participants, 34 of whom had Parkinson’s, and conducted a one-year study of in-home gait measurements. Through the study, the researchers collected more than 200,000 individual measurements that they averaged to smooth out variability due to conditions irrelevant to the disease. (For example, a patient may hurry up to answer an alarm or walk slower when talking on the phone.)
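
    A minimal sketch of this kind of averaging is shown below. Grouping by day is an assumption, since the article does not say how the measurements were grouped, and the sample values are made up.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

def daily_mean_speed(samples: List[Tuple[str, float]]) -> Dict[str, float]:
    """Group (day, gait speed in m/s) samples by day and average each group."""
    by_day: Dict[str, List[float]] = defaultdict(list)
    for day, speed in samples:
        by_day[day].append(speed)
    return {day: mean(speeds) for day, speeds in by_day.items()}

# Made-up samples; the 1.6 m/s reading mimics hurrying to answer an alarm.
samples = [
    ("day 1", 1.0), ("day 1", 1.6), ("day 1", 1.0), ("day 1", 1.2),
    ("day 2", 0.9), ("day 2", 1.1),
]
averages = daily_mean_speed(samples)
print(averages)
```

    With only four samples the outlier still pulls the day's average up noticeably; with thousands of measurements per participant, a single hurried walk has far less influence.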

    They used statistical methods to analyze the data and found that in-home gait speed can be used to effectively track Parkinson’s progression and severity. For instance, they showed that gait speed declined almost twice as fast for individuals with Parkinson’s, compared to those without. 
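
    The article does not name the statistical methods used. As one plausible illustration, the rate of decline for a group can be estimated as the slope of a least-squares line through gait speed over time. The data below are invented for the example and are not from the study.

```python
from typing import List

def least_squares_slope(xs: List[float], ys: List[float]) -> float:
    """Slope of the ordinary least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

months = [0, 3, 6, 9, 12]
# Invented gait speeds in m/s for two hypothetical groups.
group_a = [1.00, 0.99, 0.98, 0.97, 0.96]
group_b = [1.00, 0.98, 0.96, 0.94, 0.92]

slope_a = least_squares_slope(months, group_a)
slope_b = least_squares_slope(months, group_b)
print(f"decline ratio: {slope_b / slope_a:.2f}")
```

    Comparing the two slopes gives a single number for how much faster one group's gait speed is declining than the other's.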

    “Monitoring the patient continuously as they move around the room enabled us to get really good measurements of their gait speed. And with so much data, we were able to perform aggregation that allowed us to see very small differences,” Zhang says.

    Better, faster results

    Drilling down on these variabilities offered some key insights. For instance, the researchers showed that daily fluctuations in a patient’s walking speed correspond with how they are responding to their medication — walking speed may improve after a dose and then begin to decline after a few hours, as the medication impact wears off.

    “This enables us to objectively measure how your mobility responds to your medication. Previously, this was very cumbersome to do because this medication effect could only be measured by having the patient keep a journal,” Liu says.

    A clinician could use these data to adjust medication dosage more effectively and accurately. This is especially important since drugs used to treat disease symptoms can cause serious side effects if the patient receives too much.

    The researchers were able to demonstrate statistically significant results regarding Parkinson’s progression after studying 50 people for just one year. By contrast, an often-cited study by the Michael J. Fox Foundation involved more than 500 individuals and monitored them for more than five years, Katabi says.

    “For a pharmaceutical company or a biotech company trying to develop medicines for this disease, this could greatly reduce the burden and cost and speed up the development of new therapies,” she adds.

    Katabi credits much of the study’s success to the dedicated team of scientists and clinicians who worked together to tackle the many difficulties that arose along the way. For one, they began the study before the Covid-19 pandemic, so team members initially visited people’s homes to set up the devices. When that was no longer possible, they developed a user-friendly phone app to remotely help participants as they deployed the device at home.

    Through the course of the study, they learned to automate processes and reduce effort, especially for the participants and clinical team.

    This knowledge will prove useful as they look to deploy devices in at-home studies of other neurological disorders, such as Alzheimer’s, ALS, and Huntington’s. They also want to explore how these methods could be used, in conjunction with other work from the Katabi lab showing that Parkinson’s can be diagnosed by monitoring breathing, to collect a holistic set of markers that could diagnose the disease early and then be used to track and treat it.

    “This radio-wave sensor can enable more care (and research) to migrate from hospitals to the home where it is most desired and needed,” says Ray Dorsey, a professor of neurology at the University of Rochester Medical Center, co-author of Ending Parkinson’s, and a co-author of this research paper. “Its potential is just beginning to be seen. We are moving toward a day where we can diagnose and predict disease at home. In the future, we may even be able to predict and ideally prevent events like falls and heart attacks.”

    This work is supported, in part, by the National Institutes of Health and the Michael J. Fox Foundation.

  • in

    Empowering Cambridge youth through data activism

    For over 40 years, the Mayor’s Summer Youth Employment Program (MSYEP, or the Mayor’s Program) in Cambridge, Massachusetts, has been providing teenagers with their first work experience, but 2022 brought a new offering. Collaborating with MIT’s Personal Robots research group (PRG) and Responsible AI for Social Empowerment and Education (RAISE) this summer, MSYEP created a STEAM-focused learning site at the Institute. Eleven students joined the program to learn coding and programming skills through the lens of “Data Activism.”

    MSYEP’s partnership with MIT provides an opportunity for Cambridge high schoolers to gain exposure to more pathways for their future careers and education. The Mayor’s Program aims to respect students’ time and show the value of their work, so participants are compensated with an hourly wage as they learn workforce skills at MSYEP worksites. In conjunction with two ongoing research studies at MIT, PRG and RAISE developed the six-week Data Activism curriculum to equip students with critical-thinking skills so they feel prepared to utilize data science to challenge social injustice and empower their community.

    Rohan Kundargi, K-12 Community Outreach Administrator for MIT Office of Government and Community Relations (OGCR), says, “I see this as a model for a new type of partnership between MIT and Cambridge MSYEP. Specifically, an MIT research project that involves students from Cambridge getting paid to learn, research, and develop their own skills!”

    Cross-Cambridge collaboration

    Cambridge’s Office of Workforce Development initially contacted MIT OGCR about hosting a potential MSYEP worksite that taught Cambridge teens how to code. When Kundargi reached out to MIT pK-12 collaborators, MIT PRG’s graduate research assistant Raechel Walker proposed the Data Activism curriculum. Walker defines “data activism” as utilizing data, computing, and art to analyze how power operates in the world, challenge power, and empathize with people who are oppressed.

    Walker says, “I wanted students to feel empowered to incorporate their own expertise, talents, and interests into every activity. In order for students to fully embrace their academic abilities, they must remain comfortable with bringing their full selves into data activism.”

    As Kundargi and Walker recruited students for the Data Activism learning site, they wanted to make sure the cohort of students — the majority of whom are individuals of color — felt represented at MIT and felt they had the agency for their voice to be heard. “The pioneers in this field are people who look like them,” Walker says, speaking of well-known data activists Timnit Gebru, Rediet Abebe, and Joy Buolamwini.

    When the program began this summer, some of the students were not aware of the ways data science and artificial intelligence exacerbate systemic oppression in society, or some of the tools currently being used to mitigate those societal harms. As a result, Walker says, the students wanted to learn more about discriminatory design in every aspect of life. They were also interested in creating responsible machine learning algorithms and AI fairness metrics.

    A different side of STEAM

    The development and execution of the Data Activism curriculum contributed to Walker’s and postdoc Xiaoxue Du’s respective research at PRG. Walker is studying AI education, specifically creating and teaching data activism curricula for minoritized communities. Du’s research explores processes, assessments, and curriculum design that prepare educators to use, adapt, and integrate AI literacy curricula. Additionally, her research targets how to leverage more opportunities for students with diverse learning needs.

    The Data Activism curriculum utilizes a “liberatory computing” framework, a term Walker coined in her position paper with Professor Cynthia Breazeal, director of MIT RAISE, dean for digital learning, and head of PRG, and Eman Sherif, a then-undergraduate researcher from the University of California at San Diego, titled “Liberatory Computing for African American Students.” This framework ensures that students, especially minoritized students, acquire a sound racial identity, critical consciousness, collective obligation, liberation-centered academic/achievement identity, as well as the activism skills to use computing to transform a multi-layered system of barriers in which racism persists. Walker says, “We encouraged students to demonstrate competency in every pillar because all of the pillars are interconnected and build upon each other.”

    Walker developed a series of interactive coding and project-based activities that focused on understanding systemic racism, utilizing data science to analyze systemic oppression, data drawing, responsible machine learning, how racism can be embedded into AI, and different AI fairness metrics.

    This was the students’ first time learning how to create data visualizations using the programming language Python and the data analysis tool Pandas. In one project meant to examine how different systems of oppression can affect different aspects of students’ own identities, students created datasets with data from their respective intersectional identities. Another activity highlighted African American achievements, where students analyzed two datasets about African American scientists, activists, artists, scholars, and athletes. Using the data visualizations, students then created zines about the African Americans who inspired them.

    RAISE hired Olivia Dias, Sophia Brady, Lina Henriquez, and Zeynep Yalcin through the MIT Undergraduate Research Opportunity Program (UROP) and PRG hired freelancer Matt Taylor to work with Walker on developing the curriculum and designing interdisciplinary experience projects. Walker and the four undergraduate researchers constructed an intersectional data analysis activity about different examples of systemic oppression. PRG also hired three high school students to test activities and offer insights about making the curriculum engaging for program participants. Throughout the program, the Data Activism team taught students in small groups, continually asked students how to improve each activity, and structured each lesson based on the students’ interests. Walker says Dias, Brady, Henriquez, and Yalcin were invaluable to cultivating a supportive classroom environment and helping students complete their projects.

    Cambridge Rindge and Latin School senior Nina works on her rubber block stamp that depicts the importance of representation in media and greater representation in the tech industry.

    Photo: Katherine Ouellette

    Student Nina says, “It’s opened my eyes to a different side of STEM. I didn’t know what ‘data’ meant before this program, or how intersectionality can affect AI and data.” Before MSYEP, Nina took Intro to Computer Science and AP Computer Science, but she has been coding since Girls Who Code first sparked her interest in middle school. “The community was really nice. I could talk with other girls. I saw there needs to be more women in STEM, especially in coding.” Now she’s interested in applying to colleges with strong computer science programs so she can pursue a coding-related career.

    From MSYEP to the mayor’s office

    Mayor Sumbul Siddiqui visited the Data Activism learning site on Aug. 9, accompanied by Breazeal. A graduate of MSYEP herself, Siddiqui says, “Through hands-on learning through computer programming, Cambridge high school students have the unique opportunity to see themselves as data scientists. Students were able to learn ways to combat discrimination that occurs through artificial intelligence.” In an Instagram post, Siddiqui also said, “I had a blast visiting the students and learning about their projects.”

    Students worked on an activity that asked them to envision how data science might be used to support marginalized communities. They transformed their answers into block-printed T-shirt designs, carving pictures of their hopes into rubber block stamps. Some students focused on the importance of data privacy, like Jacob T., who drew a birdcage to represent data stored and locked away by third party apps. He says, “I want to open that cage and restore my data to myself and see what can be done with it.”

    The subject of Cambridge Community Charter School student Jacob T.’s project was the importance of data privacy. For his T-shirt design, he drew a birdcage to represent data stored and locked away by third party apps. (From right to left:) Breazeal, Jacob T., Kiki, Raechel Walker, and Zeynep Yalcin.

    Photo: Katherine Ouellette

    Many students wanted to see more representation in both the media they consume and across various professional fields. Nina talked about the importance of representation in media and how that could contribute to greater representation in the tech industry, while Kiki talked about encouraging more women to pursue STEM fields. Jesmin said, “I wanted to show that data science is accessible to everyone, no matter their origin or language you speak. I wrote ‘hello’ in Bangla, Arabic, and English, because I speak all three languages and they all resonate with me.”

    Student Jesmin (left) explains the concept of her T-shirt design to Mayor Siddiqui. She wants data science to be accessible to everyone, no matter their origin or language, so she drew a globe and wrote ‘hello’ in the three languages she speaks: Bangla, Arabic, and English.

    Photo: Katherine Ouellette

    “Overall, I hope the students continue to use their data activism skills to re-envision a society that supports marginalized groups,” says Walker. “Moreover, I hope they are empowered to become data scientists and understand how their race can be a positive part of their identity.”

  • in

    Caspar Hare, Georgia Perakis named associate deans of Social and Ethical Responsibilities of Computing

    Caspar Hare and Georgia Perakis have been appointed the new associate deans of the Social and Ethical Responsibilities of Computing (SERC), a cross-cutting initiative in the MIT Stephen A. Schwarzman College of Computing. Their new roles will take effect on Sept. 1.

    “Infusing social and ethical aspects of computing in academic research and education is a critical component of the college mission,” says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing and the Henry Ellis Warren Professor of Electrical Engineering and Computer Science. “I look forward to working with Caspar and Georgia on continuing to develop and advance SERC and its reach across MIT. Their complementary backgrounds and their broad connections across MIT will be invaluable to this next chapter of SERC.”

    Caspar Hare

    Hare is a professor of philosophy in the Department of Linguistics and Philosophy. A member of the MIT faculty since 2003, his main interests are in ethics, metaphysics, and epistemology. The general theme of his recent work has been to bring ideas about practical rationality and metaphysics to bear on issues in normative ethics and epistemology. He is the author of two books: “On Myself, and Other, Less Important Subjects” (Princeton University Press 2009), about the metaphysics of perspective, and “The Limits of Kindness” (Oxford University Press 2013), about normative ethics.

    Georgia Perakis

    Perakis is the William F. Pounds Professor of Management and professor of operations research, statistics, and operations management at the MIT Sloan School of Management, where she has been a faculty member since 1998. She investigates the theory and practice of analytics and its role in operations problems and is particularly interested in how to solve complex and practical problems in pricing, revenue management, supply chains, health care, transportation, and energy applications, among other areas. Since 2019, she has been the co-director of the Operations Research Center, an interdepartmental PhD program that jointly reports to MIT Sloan and the MIT Schwarzman College of Computing, a role in which she will remain. Perakis will also assume an associate dean role at MIT Sloan in recognition of her leadership.

    Hare and Perakis succeed David Kaiser, the Germeshausen Professor of the History of Science and professor of physics, and Julie Shah, the H.N. Slater Professor of Aeronautics and Astronautics, who will be stepping down from their roles at the conclusion of their three-year term on Aug. 31.

    “My deepest thanks to Dave and Julie for their tremendous leadership of SERC and contributions to the college as associate deans,” says Huttenlocher.

    SERC impact

    As the inaugural associate deans of SERC, Kaiser and Shah have been responsible for advancing a mission to incorporate humanist, social science, social responsibility, and civic perspectives into MIT’s teaching, research, and implementation of computing. In doing so, they have engaged dozens of faculty members and thousands of students from across MIT during these first three years of the initiative.

    They have brought together people from a broad array of disciplines to collaborate on crafting original materials such as active learning projects, homework assignments, and in-class demonstrations. A collection of these materials was recently published and is now freely available to the world via MIT OpenCourseWare.

    In February 2021, they launched the MIT Case Studies in Social and Ethical Responsibilities of Computing for undergraduate instruction across a range of classes and fields of study. The specially commissioned and peer-reviewed cases are based on original research and are brief by design. Three issues have been published to date and a fourth will be released later this summer. Kaiser will continue to oversee the successful new series as editor.

    Last year, 60 undergraduates, graduate students, and postdocs joined a community of SERC Scholars to help advance SERC efforts in the college. The scholars participate in unique opportunities throughout, such as the summer Experiential Ethics program. A multidisciplinary team of graduate students last winter worked with the instructors and teaching assistants of class 6.036 (Introduction to Machine Learning), MIT’s largest machine learning course, to infuse weekly labs with material covering ethical computing, data and model bias, and fairness in machine learning through SERC.

    Through efforts such as these, SERC has had a substantial impact at MIT and beyond. Over the course of their tenure, Kaiser and Shah have engaged about 80 faculty members, and more than 2,100 students took courses that included new SERC content in the last year alone. SERC’s reach extended well beyond engineering students, with about 500 exposed to SERC content through courses offered in the School of Humanities, Arts, and Social Sciences, the MIT Sloan School of Management, and the School of Architecture and Planning.

  • in

    Researchers discover major roadblock in alleviating network congestion

    When users want to send data over the internet faster than the network can handle, congestion can occur — the same way traffic congestion snarls the morning commute into a big city.

    Computers and devices that transmit data over the internet break the data down into smaller packets and use a special algorithm to decide how fast to send those packets. These congestion control algorithms seek to fully discover and utilize available network capacity while sharing it fairly with other users who may be sharing the same network. These algorithms try to minimize delay caused by data waiting in queues in the network.

    Over the past decade, researchers in industry and academia have developed several algorithms that attempt to achieve high rates while controlling delays. Some of these, such as the BBR algorithm developed by Google, are now widely used by many websites and applications.

    But a team of MIT researchers has discovered that these algorithms can be deeply unfair. In a new study, they show there will always be a network scenario where at least one sender receives almost no bandwidth compared to other senders; that is, a problem known as “starvation” cannot be avoided.

    “What is really surprising about this paper and the results is that when you take into account the real-world complexity of network paths and all the things they can do to data packets, it is basically impossible for delay-controlling congestion control algorithms to avoid starvation using current methods,” says Mohammad Alizadeh, associate professor of electrical engineering and computer science (EECS).

    While Alizadeh and his co-authors weren’t able to find a traditional congestion control algorithm that could avoid starvation, there may be algorithms in a different class that could prevent this problem. Their analysis also suggests that changing how these algorithms work, so that they allow for larger variations in delay, could help prevent starvation in some network situations.

    Alizadeh wrote the paper with first author and EECS graduate student Venkat Arun and senior author Hari Balakrishnan, the Fujitsu Professor of Computer Science and Artificial Intelligence. The research will be presented at the ACM Special Interest Group on Data Communications (SIGCOMM) conference.

    Controlling congestion

    Congestion control is a fundamental problem in networking that researchers have been trying to tackle since the 1980s.

    A user’s computer does not know how fast to send data packets over the network because it lacks information, such as the quality of the network connection or how many other senders are using the network. Sending packets too slowly makes poor use of the available bandwidth. But sending them too quickly can overwhelm the network, and in doing so, packets will start to get dropped. These packets must be resent, which leads to longer delays. Delays can also be caused by packets waiting in queues for a long time.

    Congestion control algorithms use packet losses and delays as signals to infer congestion and decide how fast to send data. But the internet is complicated, and packets can be delayed and lost for reasons unrelated to network congestion. For instance, data could be held up in a queue along the way and then released with a burst of other packets, or the receiver’s acknowledgement might be delayed. The authors call delays that are not caused by congestion “jitter.”

    Even if a congestion control algorithm measures delay perfectly, it can’t tell the difference between delay caused by congestion and delay caused by jitter. Delay caused by jitter is unpredictable and confuses the sender. Because of this ambiguity, users start estimating delay differently, which causes them to send packets at unequal rates. Eventually, this leads to a situation where starvation occurs and someone gets shut out completely, Arun explains.
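
    The ambiguity described above can be illustrated with a minimal sketch. It assumes a hypothetical delay-based rule (not BBR or any other deployed algorithm): the sender treats any round-trip time above the smallest one it has seen as queueing delay, speeds up while that estimate is below a target, and backs off otherwise. The function names, the 5 ms target, and the step sizes are illustrative assumptions, not values from the paper.

```python
def queueing_delay_estimate(rtt_ms: float, base_rtt_ms: float) -> float:
    """Delay the sender attributes to congestion: measured RTT minus the base RTT."""
    return max(0.0, rtt_ms - base_rtt_ms)


def next_rate(rate_mbps: float, rtt_ms: float, base_rtt_ms: float,
              target_delay_ms: float = 5.0, step_mbps: float = 1.0,
              backoff: float = 0.9) -> float:
    """Hypothetical rule: speed up below the delay target, back off at or above it."""
    if queueing_delay_estimate(rtt_ms, base_rtt_ms) < target_delay_ms:
        return rate_mbps + step_mbps
    return rate_mbps * backoff


BASE_RTT_MS = 50.0    # assumed propagation delay, identical for both senders
TRUE_QUEUE_MS = 3.0   # assumed real congestion delay, identical for both senders
JITTER_MS = 4.0       # extra non-congestive delay seen only by sender B

# Sender A measures only the true queueing delay; sender B measures it plus jitter.
rate_a = next_rate(10.0, BASE_RTT_MS + TRUE_QUEUE_MS, BASE_RTT_MS)
rate_b = next_rate(10.0, BASE_RTT_MS + TRUE_QUEUE_MS + JITTER_MS, BASE_RTT_MS)

# A speeds up while B backs off, although both face the same congestion.
print(rate_a, rate_b)
```

    Both senders share the same bottleneck, yet sender B cannot separate the jitter from real queueing, so the two rates move in opposite directions. Repeated over many rounds, this kind of divergence is what can leave one sender with far less bandwidth than the other.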

    “We started the project because we lacked a theoretical understanding of congestion control behavior in the presence of jitter. To place it on a firmer theoretical footing, we built a mathematical model that was simple enough to think about, yet able to capture some of the complexities of the internet. It has been very rewarding to have math tell us things we didn’t know and that have practical relevance,” he says.

    Studying starvation

    The researchers fed their mathematical model to a computer, gave it a series of commonly used congestion control algorithms, and asked the computer to find an algorithm that could avoid starvation, using their model.

    “We couldn’t do it. We tried every algorithm that we are aware of, and some new ones we made up. Nothing worked. The computer always found a situation where some people get all the bandwidth and at least one person gets basically nothing,” Arun says.

    The researchers were surprised by this result, especially since these algorithms are widely believed to be reasonably fair. They started suspecting that it may not be possible to avoid starvation, an extreme form of unfairness. This motivated them to define a class of algorithms they call “delay-convergent algorithms” that they proved will always suffer from starvation under their network model. All existing congestion control algorithms that control delay (that the researchers are aware of) are delay-convergent.
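
    To make “an extreme form of unfairness” concrete, the sketch below scores how evenly a set of senders shares a link. It uses Jain's fairness index, a standard networking measure that the article itself does not mention, together with a simple max-to-min throughput ratio. The sample throughputs are invented for illustration, and the paper's formal definition of starvation is not reproduced here.

```python
def jain_fairness_index(throughputs):
    """Jain's index: 1.0 for an equal split, approaching 1/n when one sender gets everything."""
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if squares == 0:
        raise ValueError("at least one throughput must be positive")
    return total * total / (len(throughputs) * squares)


def throughput_ratio(throughputs):
    """Ratio of the fastest sender's throughput to the slowest sender's."""
    slowest = min(throughputs)
    return float("inf") if slowest == 0 else max(throughputs) / slowest


fair_share = [50.0, 50.0]   # Mbit/s, hypothetical equal split
starved = [99.9, 0.1]       # Mbit/s, hypothetical starvation scenario

print(jain_fairness_index(fair_share), throughput_ratio(fair_share))
print(jain_fairness_index(starved), throughput_ratio(starved))
```

    In the starved case the ratio grows very large while the index falls toward 1/2, the minimum for two senders, which matches the informal picture of one sender getting “basically nothing.”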

    The fact that such simple failure modes of these widely used algorithms remained unknown for so long illustrates how difficult it is to understand algorithms through empirical testing alone, Arun adds. It underscores the importance of a solid theoretical foundation.

    But all hope is not lost. While all the algorithms they tested failed, there may be other algorithms which are not delay-convergent that might be able to avoid starvation. This suggests that one way to fix the problem might be to design congestion control algorithms that vary the delay range more widely, so the range is larger than any delay that might occur due to jitter in the network.

    “To control delays, algorithms have tried to also bound the variations in delay about a desired equilibrium, but there is nothing wrong in potentially creating greater delay variation to get better measurements of congestive delays. It is just a new design philosophy you would have to adopt,” Balakrishnan adds.

    Now, the researchers want to keep pushing to see if they can find or build an algorithm that will eliminate starvation. They also want to apply this approach of mathematical modeling and computational proofs to other thorny, unsolved problems in networked systems.

    “We are increasingly reliant on computer systems for very critical things, and we need to put their reliability on a firmer conceptual footing. We’ve shown the surprising things you can discover when you put in the time to come up with these formal specifications of what the problem actually is,” says Alizadeh.

    The NASA University Leadership Initiative (grant #80NSSC20M0163) provided funds to assist the authors with their research, but the research paper solely reflects the opinions and conclusions of its authors and not any NASA entity. This work was also partially funded by the National Science Foundation, award number 1751009.

  • in

    A technique to improve both fairness and accuracy in artificial intelligence

    For workers who use machine-learning models to help them make decisions, knowing when to trust a model’s predictions is not always an easy task, especially since these models are often so complex that their inner workings remain a mystery.

    Users sometimes employ a technique, known as selective regression, in which the model estimates its confidence level for each prediction and will reject predictions when its confidence is too low. Then a human can examine those cases, gather additional information, and make a decision about each one manually.

    But while selective regression has been shown to improve the overall performance of a model, researchers at MIT and the MIT-IBM Watson AI Lab have discovered that the technique can have the opposite effect for underrepresented groups of people in a dataset. As the model’s confidence increases with selective regression, its chance of making the right prediction also increases, but this does not always happen for all subgroups.

    For instance, a model suggesting loan approvals might make fewer errors on average, but it may actually make more wrong predictions for Black or female applicants. One reason this can occur is that the model’s confidence measure is trained using overrepresented groups and may not be accurate for these underrepresented groups.

    Once they had identified this problem, the MIT researchers developed two algorithms that can remedy the issue. Using real-world datasets, they show that the algorithms reduce performance disparities that had affected marginalized subgroups.

    “Ultimately, this is about being more intelligent about which samples you hand off to a human to deal with. Rather than just minimizing some broad error rate for the model, we want to make sure the error rate across groups is taken into account in a smart way,” says senior MIT author Greg Wornell, the Sumitomo Professor in Engineering in the Department of Electrical Engineering and Computer Science (EECS) who leads the Signals, Information, and Algorithms Laboratory in the Research Laboratory of Electronics (RLE) and is a member of the MIT-IBM Watson AI Lab.

    Joining Wornell on the paper are co-lead authors Abhin Shah, an EECS graduate student, and Yuheng Bu, a postdoc in RLE; as well as Joshua Ka-Wing Lee SM ’17, ScD ’21 and Subhro Das, Rameswar Panda, and Prasanna Sattigeri, research staff members at the MIT-IBM Watson AI Lab. The paper will be presented this month at the International Conference on Machine Learning.

    To predict or not to predict

    Regression is a technique that estimates the relationship between a dependent variable and independent variables. In machine learning, regression analysis is commonly used for prediction tasks, such as predicting the price of a home given its features (number of bedrooms, square footage, etc.). With selective regression, the machine-learning model can make one of two choices for each input: it can make a prediction or abstain from a prediction if it doesn’t have enough confidence in its decision.

    When the model abstains, it reduces the fraction of samples it is making predictions on, which is known as coverage. By only making predictions on inputs that it is highly confident about, the overall performance of the model should improve. But this can also amplify biases that exist in a dataset, which occur when the model does not have sufficient data from certain subgroups. This can lead to errors or bad predictions for underrepresented individuals.
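
    The predict-or-abstain mechanism and the notion of coverage can be sketched in a few lines. This is a minimal illustration rather than the researchers' method: the uncertainty scores, the 0.5 threshold, and the home-price figures (in thousands of dollars) are all made-up assumptions.

```python
def selective_predict(predictions, uncertainties, max_uncertainty):
    """Return each prediction, or None where the model abstains."""
    return [p if u <= max_uncertainty else None
            for p, u in zip(predictions, uncertainties)]


def coverage(decisions):
    """Fraction of inputs on which the model actually made a prediction."""
    return sum(d is not None for d in decisions) / len(decisions)


def selective_mse(decisions, targets):
    """Mean squared error over the accepted predictions only."""
    errors = [(d - t) ** 2 for d, t in zip(decisions, targets) if d is not None]
    return sum(errors) / len(errors) if errors else float("nan")


# Hypothetical home-price predictions with per-input uncertainty scores.
predictions = [200.0, 310.0, 150.0, 420.0]
uncertainties = [0.1, 0.9, 0.2, 0.8]
targets = [205.0, 250.0, 148.0, 500.0]

decisions = selective_predict(predictions, uncertainties, max_uncertainty=0.5)
full_mse = selective_mse(predictions, targets)

print(decisions, coverage(decisions), selective_mse(decisions, targets), full_mse)
```

    Here the model abstains on the two inputs it is least sure about, so coverage falls to one half and the error on the remaining predictions is far lower than the error over all four. The abstained cases would be handed to a human.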

    The MIT researchers aimed to ensure that, as the overall error rate for the model improves with selective regression, the performance for every subgroup also improves. They call this monotonic selective risk.

    “It was challenging to come up with the right notion of fairness for this particular problem. But by enforcing this criteria, monotonic selective risk, we can make sure the model performance is actually getting better across all subgroups when you reduce the coverage,” says Shah.
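
    One simplified way to read this criterion is as a check that can be run on held-out data: lower the coverage step by step and confirm that no subgroup's error goes up. The sketch below is a diagnostic under that reading, not the researchers' formal definition or either of their algorithms, and the data are invented so that the minority group "b" has a miscalibrated confidence score.

```python
def subgroup_risks(sq_errors, uncertainties, groups, coverage):
    """Keep the most confident `coverage` fraction of samples, then average error per group."""
    n_keep = max(1, round(coverage * len(sq_errors)))
    order = sorted(range(len(sq_errors)), key=lambda i: uncertainties[i])
    kept = order[:n_keep]
    risks = {}
    for group in set(groups):
        errs = [sq_errors[i] for i in kept if groups[i] == group]
        if errs:  # a group with no accepted samples has no defined risk
            risks[group] = sum(errs) / len(errs)
    return risks


def groups_violating_monotonic_risk(sq_errors, uncertainties, groups, coverages):
    """Return the groups whose risk rises at any step as coverage is reduced."""
    levels = sorted(coverages, reverse=True)
    violators = set()
    previous = subgroup_risks(sq_errors, uncertainties, groups, levels[0])
    for level in levels[1:]:
        current = subgroup_risks(sq_errors, uncertainties, groups, level)
        for group, risk in current.items():
            if group in previous and risk > previous[group]:
                violators.add(group)
        previous = current
    return violators


# Hypothetical squared errors, uncertainty scores, and group labels.
sq_errors = [1.0, 2.0, 4.0, 9.0, 1.0, 8.0]
uncertainties = [0.1, 0.2, 0.6, 0.7, 0.5, 0.3]
groups = ["a", "a", "a", "a", "b", "b"]

violators = groups_violating_monotonic_risk(sq_errors, uncertainties, groups, [1.0, 0.5])
print(violators)
```

    In this toy data the overall error falls when coverage is halved, yet the error for group "b" rises because the model is most confident about that group's worst prediction. The check flags exactly the situation the criterion is meant to rule out.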

    Focus on fairness

    The team developed two neural network algorithms that impose this fairness criterion to solve the problem.

    One algorithm guarantees that the features the model uses to make predictions contain all information about the sensitive attributes in the dataset, such as race and sex, that is relevant to the target variable of interest. Sensitive attributes are features that may not be used for decisions, often due to laws or organizational policies. The second algorithm employs a calibration technique to ensure the model makes the same prediction for an input, regardless of whether any sensitive attributes are added to that input.

    The researchers tested these algorithms by applying them to real-world datasets that could be used in high-stakes decision making. One, an insurance dataset, is used to predict total annual medical expenses charged to patients using demographic statistics; another, a crime dataset, is used to predict the number of violent crimes in communities using socioeconomic information. Both datasets contain sensitive attributes for individuals.

    When they implemented their algorithms on top of a standard machine-learning method for selective regression, they were able to reduce disparities by achieving lower error rates for the minority subgroups in each dataset. Moreover, this was accomplished without significantly impacting the overall error rate.

    “We see that if we don’t impose certain constraints, in cases where the model is really confident, it could actually be making more errors, which could be very costly in some applications, like health care. So if we reverse the trend and make it more intuitive, we will catch a lot of these errors. A major goal of this work is to avoid errors going silently undetected,” Sattigeri says.

    The researchers plan to apply their solutions to other applications, such as predicting house prices, student GPA, or loan interest rate, to see if the algorithms need to be calibrated for those tasks, says Shah. They also want to explore techniques that use less sensitive information during the model training process to avoid privacy issues.

    And they hope to improve the confidence estimates in selective regression to prevent situations where the model’s confidence is low, but its prediction is correct. This could reduce the workload on humans and further streamline the decision-making process, Sattigeri says.

    This research was funded, in part, by the MIT-IBM Watson AI Lab and its member companies Boston Scientific, Samsung, and Wells Fargo, and by the National Science Foundation.