More stories

  • in

    System helps severely motor-impaired individuals type more quickly and accurately

    In 1995, French fashion magazine editor Jean-Dominique Bauby suffered a seizure while driving a car, which left him with a condition known as locked-in syndrome, a neurological disease in which the patient is completely paralyzed and can only move muscles that control the eyes.

    Bauby, who had signed a book contract shortly before his accident, wrote the memoir “The Diving Bell and the Butterfly” using a dictation system in which his speech therapist recited the alphabet and he would blink when she said the correct letter. They wrote the 130-page book one blink at a time.

    Technology has come a long way since Bauby’s accident. Many individuals with severe motor impairments caused by locked-in syndrome, cerebral palsy, amyotrophic lateral sclerosis, or other conditions can communicate using computer interfaces where they select letters or words in an onscreen grid by activating a single switch, often by pressing a button, releasing a puff of air, or blinking.

    But these row-column scanning systems are very rigid, and, similar to the technique used by Bauby’s speech therapist, they highlight each option one at a time, making them frustratingly slow for some users. And they are not suitable for tasks where options can’t be arranged in a grid, like drawing, browsing the web, or gaming.

    A more flexible system being developed by researchers at MIT places individual selection indicators next to each option on a computer screen. The indicators can be placed anywhere — next to anything someone might click with a mouse — so a user does not need to cycle through a grid of choices to make selections. The system, called Nomon, incorporates probabilistic reasoning to learn how users make selections, and then adjusts the interface to improve their speed and accuracy.

    Participants in a user study were able to type faster using Nomon than with a row-column scanning system. The users also performed better on a picture selection task, demonstrating how Nomon could be used for more than typing.

    “It is so cool and exciting to be able to develop software that has the potential to really help people. Being able to find those signals and turn them into communication as we are used to it is a really interesting problem,” says senior author Tamara Broderick, an associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS) and a member of the Laboratory for Information and Decision Systems and the Institute for Data, Systems, and Society.

    Joining Broderick on the paper are lead author Nicholas Bonaker, an EECS graduate student; Emli-Mari Nel, head of innovation and machine learning at Averly and a visiting lecturer at the University of Witwatersrand in South Africa; and Keith Vertanen, an associate professor at Michigan Tech. The research is being presented at the ACM Conference on Human Factors in Computing Systems.

    On the clock

    In the Nomon interface, a small analog clock is placed next to every option the user can select. (A gnomon is the part of a sundial that casts a shadow.) The user looks at one option and then clicks their switch when that clock’s hand passes a red “noon” line. After each click, the system changes the phases of the clocks to separate the most probable next targets. The user clicks repeatedly until their target is selected.

    When used as a keyboard, Nomon’s machine-learning algorithms try to guess the next word based on previous words and each new letter as the user makes selections.

    Broderick developed a simplified version of Nomon several years ago but decided to revisit it to make the system easier for motor-impaired individuals to use. She enlisted the help of then-undergraduate Bonaker to redesign the interface.

    They first consulted nonprofit organizations that work with motor-impaired individuals, as well as a motor-impaired switch user, to gather feedback on the Nomon design.

    Then they designed a user study that would better represent the abilities of motor-impaired individuals. They wanted to make sure to thoroughly vet the system before using much of the valuable time of motor-impaired users, so they first tested on non-switch users, Broderick explains.

    Switching up the switch

    To gather more representative data, Bonaker devised a webcam-based switch that was harder to use than simply clicking a key. The non-switch users had to lean their bodies to one side of the screen and then back to the other side to register a click.

    “And they have to do this at precisely the right time, so it really slows them down. We did some empirical studies which showed that they were much closer to the response times of motor-impaired individuals,” Broderick says.

    They ran a 10-session user study with 13 non-switch participants and one single-switch user with an advanced form of spinal muscular dystrophy. In the first nine sessions, participants used Nomon and a row-column scanning interface for 20 minutes each to perform text entry, and in the 10th session they used the two systems for a picture selection task.

    Non-switch users typed 15 percent faster using Nomon, while the motor-impaired user typed even faster than the non-switch users. When typing unfamiliar words, the users were 20 percent faster overall and made half as many errors. In their final session, they were able to complete the picture selection task 36 percent faster using Nomon.

    “Nomon is much more forgiving than row-column scanning. With row-column scanning, even if you are just slightly off, now you’ve chosen B instead of A and that’s an error,” Broderick says.

    Adapting to noisy clicks

    With its probabilistic reasoning, Nomon incorporates everything it knows about where a user is likely to click to make the process faster, easier, and less error-prone. For instance, if the user selects “Q,” Nomon will make it as easy as possible for the user to select “U” next.

    Nomon also learns how a user clicks. So, if the user always clicks a little after the clock’s hand strikes noon, the system adapts to that in real time. It also adapts to noisiness. If a user’s click is often off the mark, the system requires extra clicks to ensure accuracy.

    This probabilistic reasoning makes Nomon powerful but also requires a higher click-load than row-column scanning systems. Clicking multiple times can be a trying task for severely motor-impaired users.

    Broderick hopes to reduce the click-load by incorporating gaze tracking into Nomon, which would give the system more robust information about what a user might choose next based on which part of the screen they are looking at. The researchers also want to find a better way to automatically adjust the clock speeds to help users be more accurate and efficient.

    They are working on a new series of studies in which they plan to partner with more motor-impaired users.

    “So far, the feedback from motor-impaired users has been invaluable to us; we’re very grateful to the motor-impaired user who commented on our initial interface and the separate motor-impaired user who participated in our study. We’re currently extending our study to work with a bigger and more diverse group of our target population. With their help, we’re already making further improvements to our interface and working to better understand the performance of Nomon,” she says.

    “Nonspeaking individuals with motor disabilities are currently not provided with efficient communication solutions for interacting with either speaking partners or computer systems. This ‘communication gap’ is a known unresolved problem in human-computer interaction, and so far there are no good solutions. This paper demonstrates that a highly creative approach underpinned by a statistical model can provide tangible performance gains to the users who need it the most: nonspeaking individuals reliant on a single switch to communicate,” says Per Ola Kristensson, professor of interactive systems engineering at Cambridge University, who was not involved with this research. “The paper also demonstrates the value of complementing insights from computational experiments with the involvement of end-users and other stakeholders in the design process. I find this a highly creative and important paper in an area where it is notoriously difficult to make significant progress.”

    This research was supported, in part, by the Seth Teller Memorial Fund to Advanced Technology for People with Disabilities, a Peter J. Eloranta Summer Undergraduate Research Fellowship, the MIT Quest for Intelligence, and the National Science Foundation. More

  • in

    A tool for predicting the future

    Whether someone is trying to predict tomorrow’s weather, forecast future stock prices, identify missed opportunities for sales in retail, or estimate a patient’s risk of developing a disease, they will likely need to interpret time-series data, which are a collection of observations recorded over time.

    Making predictions using time-series data typically requires several data-processing steps and the use of complex machine-learning algorithms, which have such a steep learning curve they aren’t readily accessible to nonexperts.

    To make these powerful tools more user-friendly, MIT researchers developed a system that directly integrates prediction functionality on top of an existing time-series database. Their simplified interface, which they call tspDB (time series predict database), does all the complex modeling behind the scenes so a nonexpert can easily generate a prediction in only a few seconds.

    The new system is more accurate and more efficient than state-of-the-art deep learning methods when performing two tasks: predicting future values and filling in missing data points.

    One reason tspDB is so successful is that it incorporates a novel time-series-prediction algorithm, explains electrical engineering and computer science (EECS) graduate student Abdullah Alomar, an author of a recent research paper in which he and his co-authors describe the algorithm. This algorithm is especially effective at making predictions on multivariate time-series data, which are data that have more than one time-dependent variable. In a weather database, for instance, temperature, dew point, and cloud cover each depend on their past values.

    The algorithm also estimates the volatility of a multivariate time series to provide the user with a confidence level for its predictions.

    “Even as the time-series data becomes more and more complex, this algorithm can effectively capture any time-series structure out there. It feels like we have found the right lens to look at the model complexity of time-series data,” says senior author Devavrat Shah, the Andrew and Erna Viterbi Professor in EECS and a member of the Institute for Data, Systems, and Society and of the Laboratory for Information and Decision Systems.

    Joining Alomar and Shah on the paper is lead author Anish Agrawal, a former EECS graduate student who is currently a postdoc at the Simons Institute at the University of California at Berkeley. The research will be presented at the ACM SIGMETRICS conference.

    Adapting a new algorithm

    Shah and his collaborators have been working on the problem of interpreting time-series data for years, adapting different algorithms and integrating them into tspDB as they built the interface.

    About four years ago, they learned about a particularly powerful classical algorithm, called singular spectrum analysis (SSA), that imputes and forecasts single time series. Imputation is the process of replacing missing values or correcting past values. While this algorithm required manual parameter selection, the researchers suspected it could enable their interface to make effective predictions using time series data. In earlier work, they removed this need to manually intervene for algorithmic implementation.  

    The algorithm for single time series transformed it into a matrix and utilized matrix estimation procedures. The key intellectual challenge was how to adapt it to utilize multiple time series.  After a few years of struggle, they realized the answer was something very simple: “Stack” the matrices for each individual time series, treat it as a one big matrix, and then apply the single time-series algorithm on it.

    This utilizes information across multiple time series naturally — both across the time series and across time, which they describe in their new paper.

    This recent publication also discusses interesting alternatives, where instead of transforming the multivariate time series into a big matrix, it is viewed as a three-dimensional tensor. A tensor is a multi-dimensional array, or grid, of numbers. This established a promising connection between the classical field of time series analysis and the growing field of tensor estimation, Alomar says.

    “The variant of mSSA that we introduced actually captures all of that beautifully. So, not only does it provide the most likely estimation, but a time-varying confidence interval, as well,” Shah says.

    The simpler, the better

    They tested the adapted mSSA against other state-of-the-art algorithms, including deep-learning methods, on real-world time-series datasets with inputs drawn from the electricity grid, traffic patterns, and financial markets.

    Their algorithm outperformed all the others on imputation and it outperformed all but one of the other algorithms when it came to forecasting future values. The researchers also demonstrated that their tweaked version of mSSA can be applied to any kind of time-series data.

    “One reason I think this works so well is that the model captures a lot of time series dynamics, but at the end of the day, it is still a simple model. When you are working with something simple like this, instead of a neural network that can easily overfit the data, you can actually perform better,” Alomar says.

    The impressive performance of mSSA is what makes tspDB so effective, Shah explains. Now, their goal is to make this algorithm accessible to everyone.

    One a user installs tspDB on top of an existing database, they can run a prediction query with just a few keystrokes in about 0.9 milliseconds, as compared to 0.5 milliseconds for a standard search query. The confidence intervals are also designed to help nonexperts to make a more informed decision by incorporating the degree of uncertainty of the predictions into their decision making.

    For instance, the system could enable a nonexpert to predict future stock prices with high accuracy in just a few minutes, even if the time-series dataset contains missing values.

    Now that the researchers have shown why mSSA works so well, they are targeting new algorithms that can be incorporated into tspDB. One of these algorithms utilizes the same model to automatically enable change point detection, so if the user believes their time series will change its behavior at some point, the system will automatically detect that change and incorporate that into its predictions.

    They also want to continue gathering feedback from current tspDB users to see how they can improve the system’s functionality and user-friendliness, Shah says.

    “Our interest at the highest level is to make tspDB a success in the form of a broadly utilizable, open-source system. Time-series data are very important, and this is a beautiful concept of actually building prediction functionalities directly into the database. It has never been done before, and so we want to make sure the world uses it,” he says.

    “This work is very interesting for a number of reasons. It provides a practical variant of mSSA which requires no hand tuning, they provide the first known analysis of mSSA, and the authors demonstrate the real-world value of their algorithm by being competitive with or out-performing several known algorithms for imputations and predictions in (multivariate) time series for several real-world data sets,” says Vishal Misra, a professor of computer science at Columbia University who was not involved with this research. “At the heart of it all is the beautiful modeling work where they cleverly exploit correlations across time (within a time series) and space (across time series) to create a low-rank spatiotemporal factor representation of a multivariate time series. Importantly this model connects the field of time series analysis to that of the rapidly evolving topic of tensor completion, and I expect a lot of follow-on research spurred by this paper.” More

  • in

    Systems scientists find clues to why false news snowballs on social media

    The spread of misinformation on social media is a pressing societal problem that tech companies and policymakers continue to grapple with, yet those who study this issue still don’t have a deep understanding of why and how false news spreads.

    To shed some light on this murky topic, researchers at MIT developed a theoretical model of a Twitter-like social network to study how news is shared and explore situations where a non-credible news item will spread more widely than the truth. Agents in the model are driven by a desire to persuade others to take on their point of view: The key assumption in the model is that people bother to share something with their followers if they think it is persuasive and likely to move others closer to their mindset. Otherwise they won’t share.

    The researchers found that in such a setting, when a network is highly connected or the views of its members are sharply polarized, news that is likely to be false will spread more widely and travel deeper into the network than news with higher credibility.

    This theoretical work could inform empirical studies of the relationship between news credibility and the size of its spread, which might help social media companies adapt networks to limit the spread of false information.

    “We show that, even if people are rational in how they decide to share the news, this could still lead to the amplification of information with low credibility. With this persuasion motive, no matter how extreme my beliefs are — given that the more extreme they are the more I gain by moving others’ opinions — there is always someone who would amplify [the information],” says senior author Ali Jadbabaie, professor and head of the Department of Civil and Environmental Engineering and a core faculty member of the Institute for Data, Systems, and Society (IDSS) and a principal investigator in the Laboratory for Information and Decision Systems (LIDS).

    Joining Jadbabaie on the paper are first author Chin-Chia Hsu, a graduate student in the Social and Engineering Systems program in IDSS, and Amir Ajorlou, a LIDS research scientist. The research will be presented this week at the IEEE Conference on Decision and Control.

    Pondering persuasion

    This research draws on a 2018 study by Sinan Aral, the David Austin Professor of Management at the MIT Sloan School of Management; Deb Roy, an associate professor of media arts and sciences at the Media Lab; and former postdoc Soroush Vosoughi (now an assistant professor of computer science at Dartmouth University). Their empirical study of data from Twitter found that false news spreads wider, faster, and deeper than real news.

    Jadbabaie and his collaborators wanted to drill down on why this occurs.

    They hypothesized that persuasion might be a strong motive for sharing news — perhaps agents in the network want to persuade others to take on their point of view — and decided to build a theoretical model that would let them explore this possibility.

    In their model, agents have some prior belief about a policy, and their goal is to persuade followers to move their beliefs closer to the agent’s side of the spectrum.

    A news item is initially released to a small, random subgroup of agents, which must decide whether to share this news with their followers. An agent weighs the newsworthiness of the item and its credibility, and updates its belief based on how surprising or convincing the news is. 

    “They will make a cost-benefit analysis to see if, on average, this piece of news will move people closer to what they think or move them away. And we include a nominal cost for sharing. For instance, taking some action, if you are scrolling on social media, you have to stop to do that. Think of that as a cost. Or a reputation cost might come if I share something that is embarrassing. Everyone has this cost, so the more extreme and the more interesting the news is, the more you want to share it,” Jadbabaie says.

    If the news affirms the agent’s perspective and has persuasive power that outweighs the nominal cost, the agent will always share the news. But if an agent thinks the news item is something others may have already seen, the agent is disincentivized to share it.

    Since an agent’s willingness to share news is a product of its perspective and how persuasive the news is, the more extreme an agent’s perspective or the more surprising the news, the more likely the agent will share it.

    The researchers used this model to study how information spreads during a news cascade, which is an unbroken sharing chain that rapidly permeates the network.

    Connectivity and polarization

    The team found that when a network has high connectivity and the news is surprising, the credibility threshold for starting a news cascade is lower. High connectivity means that there are multiple connections between many users in the network.

    Likewise, when the network is largely polarized, there are plenty of agents with extreme views who want to share the news item, starting a news cascade. In both these instances, news with low credibility creates the largest cascades.

    “For any piece of news, there is a natural network speed limit, a range of connectivity, that facilitates good transmission of information where the size of the cascade is maximized by true news. But if you exceed that speed limit, you will get into situations where inaccurate news or news with low credibility has a larger cascade size,” Jadbabaie says.

    If the views of users in the network become more diverse, it is less likely that a poorly credible piece of news will spread more widely than the truth.

    Jadbabaie and his colleagues designed the agents in the network to behave rationally, so the model would better capture actions real humans might take if they want to persuade others.

    “Someone might say that is not why people share, and that is valid. Why people do certain things is a subject of intense debate in cognitive science, social psychology, neuroscience, economics, and political science,” he says. “Depending on your assumptions, you end up getting different results. But I feel like this assumption of persuasion being the motive is a natural assumption.”

    Their model also shows how costs can be manipulated to reduce the spread of false information. Agents make a cost-benefit analysis and won’t share news if the cost to do so outweighs the benefit of sharing.

    “We don’t make any policy prescriptions, but one thing this work suggests is that, perhaps, having some cost associated with sharing news is not a bad idea. The reason you get lots of these cascades is because the cost of sharing the news is actually very low,” he says.

    This work was supported by an Army Research Office Multidisciplinary University Research Initiative grant and a Vannevar Bush Fellowship from the Office of the Secretary of Defense. More

  • in

    Machine learning speeds up vehicle routing

    Waiting for a holiday package to be delivered? There’s a tricky math problem that needs to be solved before the delivery truck pulls up to your door, and MIT researchers have a strategy that could speed up the solution.

    The approach applies to vehicle routing problems such as last-mile delivery, where the goal is to deliver goods from a central depot to multiple cities while keeping travel costs down. While there are algorithms designed to solve this problem for a few hundred cities, these solutions become too slow when applied to a larger set of cities.

    To remedy this, Cathy Wu, the Gilbert W. Winslow Career Development Assistant Professor in Civil and Environmental Engineering and the Institute for Data, Systems, and Society, and her students have come up with a machine-learning strategy that accelerates some of the strongest algorithmic solvers by 10 to 100 times.

    The solver algorithms work by breaking up the problem of delivery into smaller subproblems to solve — say, 200 subproblems for routing vehicles between 2,000 cities. Wu and her colleagues augment this process with a new machine-learning algorithm that identifies the most useful subproblems to solve, instead of solving all the subproblems, to increase the quality of the solution while using orders of magnitude less compute.

    Their approach, which they call “learning-to-delegate,” can be used across a variety of solvers and a variety of similar problems, including scheduling and pathfinding for warehouse robots, the researchers say.

    The work pushes the boundaries on rapidly solving large-scale vehicle routing problems, says Marc Kuo, founder and CEO of Routific, a smart logistics platform for optimizing delivery routes. Some of Routific’s recent algorithmic advances were inspired by Wu’s work, he notes.

    “Most of the academic body of research tends to focus on specialized algorithms for small problems, trying to find better solutions at the cost of processing times. But in the real-world, businesses don’t care about finding better solutions, especially if they take too long for compute,” Kuo explains. “In the world of last-mile logistics, time is money, and you cannot have your entire warehouse operations wait for a slow algorithm to return the routes. An algorithm needs to be hyper-fast for it to be practical.”

    Wu, social and engineering systems doctoral student Sirui Li, and electrical engineering and computer science doctoral student Zhongxia Yan presented their research this week at the 2021 NeurIPS conference.

    Selecting good problems

    Vehicle routing problems are a class of combinatorial problems, which involve using heuristic algorithms to find “good-enough solutions” to the problem. It’s typically not possible to come up with the one “best” answer to these problems, because the number of possible solutions is far too huge.

    “The name of the game for these types of problems is to design efficient algorithms … that are optimal within some factor,” Wu explains. “But the goal is not to find optimal solutions. That’s too hard. Rather, we want to find as good of solutions as possible. Even a 0.5% improvement in solutions can translate to a huge revenue increase for a company.”

    Over the past several decades, researchers have developed a variety of heuristics to yield quick solutions to combinatorial problems. They usually do this by starting with a poor but valid initial solution and then gradually improving the solution — by trying small tweaks to improve the routing between nearby cities, for example. For a large problem like a 2,000-plus city routing challenge, however, this approach just takes too much time.

    More recently, machine-learning methods have been developed to solve the problem, but while faster, they tend to be more inaccurate, even at the scale of a few dozen cities. Wu and her colleagues decided to see if there was a beneficial way to combine the two methods to find speedy but high-quality solutions.

    “For us, this is where machine learning comes in,” Wu says. “Can we predict which of these subproblems, that if we were to solve them, would lead to more improvement in the solution, saving computing time and expense?”

    Traditionally, a large-scale vehicle routing problem heuristic might choose the subproblems to solve in which order either randomly or by applying yet another carefully devised heuristic. In this case, the MIT researchers ran sets of subproblems through a neural network they created to automatically find the subproblems that, when solved, would lead to the greatest gain in quality of the solutions. This process sped up subproblem selection process by 1.5 to 2 times, Wu and colleagues found.

    “We don’t know why these subproblems are better than other subproblems,” Wu notes. “It’s actually an interesting line of future work. If we did have some insights here, these could lead to designing even better algorithms.”

    Surprising speed-up

    Wu and colleagues were surprised by how well the approach worked. In machine learning, the idea of garbage-in, garbage-out applies — that is, the quality of a machine-learning approach relies heavily on the quality of the data. A combinatorial problem is so difficult that even its subproblems can’t be optimally solved. A neural network trained on the “medium-quality” subproblem solutions available as the input data “would typically give medium-quality results,” says Wu. In this case, however, the researchers were able to leverage the medium-quality solutions to achieve high-quality results, significantly faster than state-of-the-art methods.

    For vehicle routing and similar problems, users often must design very specialized algorithms to solve their specific problem. Some of these heuristics have been in development for decades.

    The learning-to-delegate method offers an automatic way to accelerate these heuristics for large problems, no matter what the heuristic or — potentially — what the problem.

    Since the method can work with a variety of solvers, it may be useful for a variety of resource allocation problems, says Wu. “We may unlock new applications that now will be possible because the cost of solving the problem is 10 to 100 times less.”

    The research was supported by MIT Indonesia Seed Fund, U.S. Department of Transportation Dwight David Eisenhower Transportation Fellowship Program, and the MIT-IBM Watson AI Lab. More

  • in

    Avoiding shortcut solutions in artificial intelligence

    If your Uber driver takes a shortcut, you might get to your destination faster. But if a machine learning model takes a shortcut, it might fail in unexpected ways.

    In machine learning, a shortcut solution occurs when the model relies on a simple characteristic of a dataset to make a decision, rather than learning the true essence of the data, which can lead to inaccurate predictions. For example, a model might learn to identify images of cows by focusing on the green grass that appears in the photos, rather than the more complex shapes and patterns of the cows.  

    A new study by researchers at MIT explores the problem of shortcuts in a popular machine-learning method and proposes a solution that can prevent shortcuts by forcing the model to use more data in its decision-making.

    By removing the simpler characteristics the model is focusing on, the researchers force it to focus on more complex features of the data that it hadn’t been considering. Then, by asking the model to solve the same task two ways — once using those simpler features, and then also using the complex features it has now learned to identify — they reduce the tendency for shortcut solutions and boost the performance of the model.

    One potential application of this work is to enhance the effectiveness of machine learning models that are used to identify disease in medical images. Shortcut solutions in this context could lead to false diagnoses and have dangerous implications for patients.

    “It is still difficult to tell why deep networks make the decisions that they do, and in particular, which parts of the data these networks choose to focus upon when making a decision. If we can understand how shortcuts work in further detail, we can go even farther to answer some of the fundamental but very practical questions that are really important to people who are trying to deploy these networks,” says Joshua Robinson, a PhD student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of the paper.

    Robinson wrote the paper with his advisors, senior author Suvrit Sra, the Esther and Harold E. Edgerton Career Development Associate Professor in the Department of Electrical Engineering and Computer Science (EECS) and a core member of the Institute for Data, Systems, and Society (IDSS) and the Laboratory for Information and Decision Systems; and Stefanie Jegelka, the X-Consortium Career Development Associate Professor in EECS and a member of CSAIL and IDSS; as well as University of Pittsburgh assistant professor Kayhan Batmanghelich and PhD students Li Sun and Ke Yu. The research will be presented at the Conference on Neural Information Processing Systems in December. 

    The long road to understanding shortcuts

    The researchers focused their study on contrastive learning, which is a powerful form of self-supervised machine learning. In self-supervised machine learning, a model is trained using raw data that do not have label descriptions from humans. It can therefore be used successfully for a larger variety of data.

    A self-supervised learning model learns useful representations of data, which are used as inputs for different tasks, like image classification. But if the model takes shortcuts and fails to capture important information, these tasks won’t be able to use that information either.

    For example, if a self-supervised learning model is trained to classify pneumonia in X-rays from a number of hospitals, but it learns to make predictions based on a tag that identifies the hospital the scan came from (because some hospitals have more pneumonia cases than others), the model won’t perform well when it is given data from a new hospital.     

    For contrastive learning models, an encoder algorithm is trained to discriminate between pairs of similar inputs and pairs of dissimilar inputs. This process encodes rich and complex data, like images, in a way that the contrastive learning model can interpret.

    The researchers tested contrastive learning encoders with a series of images and found that, during this training procedure, they also fall prey to shortcut solutions. The encoders tend to focus on the simplest features of an image to decide which pairs of inputs are similar and which are dissimilar. Ideally, the encoder should focus on all the useful characteristics of the data when making a decision, Jegelka says.

    So, the team made it harder to tell the difference between the similar and dissimilar pairs, and found that this changes which features the encoder will look at to make a decision.

    “If you make the task of discriminating between similar and dissimilar items harder and harder, then your system is forced to learn more meaningful information in the data, because without learning that it cannot solve the task,” she says.

    But increasing this difficulty resulted in a tradeoff — the encoder got better at focusing on some features of the data but became worse at focusing on others. It almost seemed to forget the simpler features, Robinson says.

    To avoid this tradeoff, the researchers asked the encoder to discriminate between the pairs the same way it had originally, using the simpler features, and also after the researchers removed the information it had already learned. Solving the task both ways simultaneously caused the encoder to improve across all features.

    Their method, called implicit feature modification, adaptively modifies samples to remove the simpler features the encoder is using to discriminate between the pairs. The technique does not rely on human input, which is important because real-world data sets can have hundreds of different features that could combine in complex ways, Sra explains.

    From cars to COPD

    The researchers ran one test of this method using images of vehicles. They used implicit feature modification to adjust the color, orientation, and vehicle type to make it harder for the encoder to discriminate between similar and dissimilar pairs of images. The encoder improved its accuracy across all three features — texture, shape, and color — simultaneously.

    To see if the method would stand up to more complex data, the researchers also tested it with samples from a medical image database of chronic obstructive pulmonary disease (COPD). Again, the method led to simultaneous improvements across all features they evaluated.

    While this work takes some important steps forward in understanding the causes of shortcut solutions and working to solve them, the researchers say that continuing to refine these methods and applying them to other types of self-supervised learning will be key to future advancements.

    “This ties into some of the biggest questions about deep learning systems, like ‘Why do they fail?’ and ‘Can we know in advance the situations where your model will fail?’ There is still a lot farther to go if you want to understand shortcut learning in its full generality,” Robinson says.

    This research is supported by the National Science Foundation, National Institutes of Health, and the Pennsylvania Department of Health’s SAP SE Commonwealth Universal Research Enhancement (CURE) program. More

  • in

    Making machine learning more useful to high-stakes decision makers

    The U.S. Centers for Disease Control and Prevention estimates that one in seven children in the United States experienced abuse or neglect in the past year. Child protective services agencies around the nation receive a high number of reports each year (about 4.4 million in 2019) of alleged neglect or abuse. With so many cases, some agencies are implementing machine learning models to help child welfare specialists screen cases and determine which to recommend for further investigation.

    But these models don’t do any good if the humans they are intended to help don’t understand or trust their outputs.

    Researchers at MIT and elsewhere launched a research project to identify and tackle machine learning usability challenges in child welfare screening. In collaboration with a child welfare department in Colorado, the researchers studied how call screeners assess cases, with and without the help of machine learning predictions. Based on feedback from the call screeners, they designed a visual analytics tool that uses bar graphs to show how specific factors of a case contribute to the predicted risk that a child will be removed from their home within two years.

    The researchers found that screeners are more interested in seeing how each factor, like the child’s age, influences a prediction, rather than understanding the computational basis of how the model works. Their results also show that even a simple model can cause confusion if its features are not described with straightforward language.

    These findings could be applied to other high-risk fields where humans use machine learning models to help them make decisions, but lack data science experience, says senior author Kalyan Veeramachaneni, principal research scientist in the Laboratory for Information and Decision Systems (LIDS) and senior author of the paper.

    “Researchers who study explainable AI, they often try to dig deeper into the model itself to explain what the model did. But a big takeaway from this project is that these domain experts don’t necessarily want to learn what machine learning actually does. They are more interested in understanding why the model is making a different prediction than what their intuition is saying, or what factors it is using to make this prediction. They want information that helps them reconcile their agreements or disagreements with the model, or confirms their intuition,” he says.

    Co-authors include electrical engineering and computer science PhD student Alexandra Zytek, who is the lead author; postdoc Dongyu Liu; and Rhema Vaithianathan, professor of economics and director of the Center for Social Data Analytics at the Auckland University of Technology and professor of social data analytics at the University of Queensland. The research will be presented later this month at the IEEE Visualization Conference.

    Real-world research

    The researchers began the study more than two years ago by identifying seven factors that make a machine learning model less usable, including lack of trust in where predictions come from and disagreements between user opinions and the model’s output.

    With these factors in mind, Zytek and Liu flew to Colorado in the winter of 2019 to learn firsthand from call screeners in a child welfare department. This department is implementing a machine learning system developed by Vaithianathan that generates a risk score for each report, predicting the likelihood the child will be removed from their home. That risk score is based on more than 100 demographic and historic factors, such as the parents’ ages and past court involvements.

    “As you can imagine, just getting a number between one and 20 and being told to integrate this into your workflow can be a bit challenging,” Zytek says.

    They observed how teams of screeners process cases in about 10 minutes and spend most of that time discussing the risk factors associated with the case. That inspired the researchers to develop a case-specific details interface, which shows how each factor influenced the overall risk score using color-coded, horizontal bar graphs that indicate the magnitude of the contribution in a positive or negative direction.

    Based on observations and detailed interviews, the researchers built four additional interfaces that provide explanations of the model, including one that compares a current case to past cases with similar risk scores. Then they ran a series of user studies.

    The studies revealed that more than 90 percent of the screeners found the case-specific details interface to be useful, and it generally increased their trust in the model’s predictions. On the other hand, the screeners did not like the case comparison interface. While the researchers thought this interface would increase trust in the model, screeners were concerned it could lead to decisions based on past cases rather than the current report.   

    “The most interesting result to me was that, the features we showed them — the information that the model uses — had to be really interpretable to start. The model uses more than 100 different features in order to make its prediction, and a lot of those were a bit confusing,” Zytek says.

    Keeping the screeners in the loop throughout the iterative process helped the researchers make decisions about what elements to include in the machine learning explanation tool, called Sibyl.

    As they refined the Sibyl interfaces, the researchers were careful to consider how providing explanations could contribute to some cognitive biases, and even undermine screeners’ trust in the model.

    For instance, since explanations are based on averages in a database of child abuse and neglect cases, having three past abuse referrals may actually decrease the risk score of a child, since averages in this database may be far higher. A screener may see that explanation and decide not to trust the model, even though it is working correctly, Zytek explains. And because humans tend to put more emphasis on recent information, the order in which the factors are listed could also influence decisions.

    Improving interpretability

    Based on feedback from call screeners, the researchers are working to tweak the explanation model so the features that it uses are easier to explain.

    Moving forward, they plan to enhance the interfaces they’ve created based on additional feedback and then run a quantitative user study to track the effects on decision making with real cases. Once those evaluations are complete, they can prepare to deploy Sibyl, Zytek says.

    “It was especially valuable to be able to work so actively with these screeners. We got to really understand the problems they faced. While we saw some reservations on their part, what we saw more of was excitement about how useful these explanations were in certain cases. That was really rewarding,” she says.

    This work is supported, in part, by the National Science Foundation. More

  • in

    3 Questions: Kalyan Veeramachaneni on hurdles preventing fully automated machine learning

    The proliferation of big data across domains, from banking to health care to environmental monitoring, has spurred increasing demand for machine learning tools that help organizations make decisions based on the data they gather.

    That growing industry demand has driven researchers to explore the possibilities of automated machine learning (AutoML), which seeks to automate the development of machine learning solutions in order to make them accessible for nonexperts, improve their efficiency, and accelerate machine learning research. For example, an AutoML system might enable doctors to use their expertise interpreting electroencephalography (EEG) results to build a model that can predict which patients are at higher risk for epilepsy — without requiring the doctors to have a background in data science.

    Yet, despite more than a decade of work, researchers have been unable to fully automate all steps in the machine learning development process. Even the most efficient commercial AutoML systems still require a prolonged back-and-forth between a domain expert, like a marketing manager or mechanical engineer, and a data scientist, making the process inefficient.

    Kalyan Veeramachaneni, a principal research scientist in the MIT Laboratory for Information and Decision Systems who has been studying AutoML since 2010, has co-authored a paper in the journal ACM Computing Surveys that details a seven-tiered schematic to evaluate AutoML tools based on their level of autonomy.

    A system at level zero has no automation and requires a data scientist to start from scratch and build models by hand, while a tool at level six is completely automated and can be easily and effectively used by a nonexpert. Most commercial systems fall somewhere in the middle.

    Veeramachaneni spoke with MIT News about the current state of AutoML, the hurdles that prevent truly automatic machine learning systems, and the road ahead for AutoML researchers.

    Q: How has automatic machine learning evolved over the past decade, and what is the current state of AutoML systems?

    A: In 2010, we started to see a shift, with enterprises wanting to invest in getting value out of their data beyond just business intelligence. So then came the question, maybe there are certain things in the development of machine learning-based solutions that we can automate? The first iteration of AutoML was to make our own jobs as data scientists more efficient. Can we take away the grunt work that we do on a day-to-day basis and automate that by using a software system? That area of research ran its course until about 2015, when we realized we still weren’t able to speed up this development process.

    Then another thread emerged. There are a lot of problems that could be solved with data, and they come from experts who know those problems, who live with them on a daily basis. These individuals have very little to do with machine learning or software engineering. How do we bring them into the fold? That is really the next frontier.

    There are three areas where these domain experts have strong input in a machine learning system. The first is defining the problem itself and then helping to formulate it as a prediction task to be solved by a machine learning model. Second, they know how the data have been collected, so they also know intuitively how to process that data. And then third, at the end, machine learning models only give you a very tiny part of a solution — they just give you a prediction. The output of a machine learning model is just one input to help a domain expert get to a decision or action.

    Q: What steps of the machine learning pipeline are the most difficult to automate, and why has automating them been so challenging?

    A: The problem-formulation part is extremely difficult to automate. For example, if I am a researcher who wants to get more government funding, and I have a lot of data about the content of the research proposals that I write and whether or not I receive funding, can machine learning help there? We don’t know yet. In problem formulation, I use my domain expertise to translate the problem into something that is more tangible to predict, and that requires somebody who knows the domain very well. And he or she also knows how to use that information post-prediction. That problem is refusing to be automated.

    There is one part of problem-formulation that could be automated. It turns out that we can look at the data and mathematically express several possible prediction tasks automatically. Then we can share those prediction tasks with the domain expert to see if any of them would help in the larger problem they are trying to tackle. Then once you pick the prediction task, there are a lot of intermediate steps you do, including feature engineering, modeling, etc., that are very mechanical steps and easy to automate.

    But defining the prediction tasks has typically been a collaborative effort between data scientists and domain experts because, unless you know the domain, you can’t translate the domain problem into a prediction task. And then sometimes domain experts don’t know what is meant by “prediction.” That leads to the major, significant back and forth in the process. If you automate that step, then machine learning penetration and the use of data to create meaningful predictions will increase tremendously.

    Then what happens after the machine learning model gives a prediction? We can automate the software and technology part of it, but at the end of the day, it is root cause analysis and human intuition and decision making. We can augment them with a lot of tools, but we can’t fully automate that.

    Q: What do you hope to achieve with the seven-tiered framework for evaluating AutoML systems that you outlined in your paper?

    A: My hope is that people start to recognize that some levels of automation have already been achieved and some still need to be tackled. In the research community, we tend to focus on what we are comfortable with. We have gotten used to automating certain steps, and then we just stick to it. Automating these other parts of the machine learning solution development is very important, and that is where the biggest bottlenecks remain.

    My second hope is that researchers will very clearly understand what domain expertise means. A lot of this AutoML work is still being conducted by academics, and the problem is that we often don’t do applied work. There is not a crystal-clear definition of what a domain expert is and in itself, “domain expert,” is a very nebulous phrase. What we mean by domain expert is the expert in the problem you are trying to solve with machine learning. And I am hoping that everyone unifies around that because that would make things so much clearer.

    I still believe that we are not able to build that many models for that many problems, but even for the ones that we are building, the majority of them are not getting deployed and used in day-to-day life. The output of machine learning is just going to be another data point, an augmented data point, in someone’s decision making. How they make those decisions, based on that input, how that will change their behavior, and how they will adapt their style of working, that is still a big, open question. Once we automate everything, that is what’s next.

    We have to determine what has to fundamentally change in the day-to-day workflow of someone giving loans at a bank, or an educator trying to decide whether he or she should change the assignments in an online class. How are they going to use machine learning’s outputs? We need to focus on the fundamental things we have to build out to make machine learning more usable. More