More stories

  • Learning on the edge

    Microcontrollers, miniature computers that can run simple commands, are the basis for billions of connected devices, from internet-of-things (IoT) devices to sensors in automobiles. But cheap, low-power microcontrollers have extremely limited memory and no operating system, making it challenging to train artificial intelligence models on “edge devices” that work independently from central computing resources.

    Training a machine-learning model on an intelligent edge device allows it to adapt to new data and make better predictions. For instance, training a model on a smart keyboard could enable the keyboard to continually learn from the user’s writing. However, the training process requires so much memory that it is typically done using powerful computers at a data center, before the model is deployed on a device. This is more costly and raises privacy issues since user data must be sent to a central server.

    To address this problem, researchers at MIT and the MIT-IBM Watson AI Lab developed a new technique that enables on-device training using less than a quarter of a megabyte of memory. Other training solutions designed for connected devices can use more than 500 megabytes of memory, greatly exceeding the 256-kilobyte capacity of most microcontrollers (there are 1,024 kilobytes in one megabyte).

    The intelligent algorithms and framework the researchers developed reduce the amount of computation required to train a model, which makes the process faster and more memory efficient. Their technique can be used to train a machine-learning model on a microcontroller in a matter of minutes.

    This technique also preserves privacy by keeping data on the device, which could be especially beneficial when data are sensitive, such as in medical applications. It also could enable customization of a model based on the needs of users. Moreover, the framework preserves or improves the accuracy of the model when compared to other training approaches.

    “Our study enables IoT devices to not only perform inference but also continuously update the AI models to newly collected data, paving the way for lifelong on-device learning. The low resource utilization makes deep learning more accessible and can have a broader reach, especially for low-power edge devices,” says Song Han, an associate professor in the Department of Electrical Engineering and Computer Science (EECS), a member of the MIT-IBM Watson AI Lab, and senior author of the paper describing this innovation.

    Joining Han on the paper are co-lead authors and EECS PhD students Ji Lin and Ligeng Zhu, as well as MIT postdocs Wei-Ming Chen and Wei-Chen Wang, and Chuang Gan, a principal research staff member at the MIT-IBM Watson AI Lab. The research will be presented at the Conference on Neural Information Processing Systems.

    Han and his team previously addressed the memory and computational bottlenecks that exist when trying to run machine-learning models on tiny edge devices, as part of their TinyML initiative.

    Lightweight training

    A common type of machine-learning model is known as a neural network. Loosely based on the human brain, these models contain layers of interconnected nodes, or neurons, that process data to complete a task, such as recognizing people in photos. The model must be trained first, which involves showing it millions of examples so it can learn the task. As it learns, the model increases or decreases the strength of the connections between neurons, which are known as weights.

    The model may undergo hundreds of updates as it learns, and the intermediate activations must be stored during each round. In a neural network, activations are the intermediate results each layer produces as data pass through the model. Because there may be millions of weights and activations, training a model requires much more memory than running a pre-trained model, Han explains.

    Han and his collaborators employed two algorithmic solutions to make the training process more efficient and less memory-intensive. The first, known as sparse update, uses an algorithm that identifies the most important weights to update at each round of training. The algorithm freezes the weights one at a time until it sees the accuracy dip to a set threshold, and then it stops. The remaining weights are updated, while the activations corresponding to the frozen weights don’t need to be stored in memory.

    “Updating the whole model is very expensive because there are a lot of activations, so people tend to update only the last layer, but as you can imagine, this hurts the accuracy. For our method, we selectively update those important weights and make sure the accuracy is fully preserved,” Han says.
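
    In pseudocode, that selection step might look something like the sketch below: a minimal illustration of the greedy freeze-and-check loop described above. The helper names (finetune_briefly, evaluate, parameter_names) are hypothetical stand-ins for the actual training and measurement code, not the authors' released implementation.

        def select_sparse_update(model, finetune_briefly, evaluate, threshold):
            # Accuracy when every weight tensor is allowed to update.
            full_accuracy = evaluate(finetune_briefly(model, frozen=[]))
            frozen = []
            for name in model.parameter_names():        # e.g., one tensor per layer
                candidate = frozen + [name]
                accuracy = evaluate(finetune_briefly(model, frozen=candidate))
                if full_accuracy - accuracy > threshold:
                    break                               # accuracy dipped too far; stop freezing
                frozen.append(name)                     # safe to freeze: activations for this
                                                        # tensor no longer need to be stored
            # Whatever is left unfrozen is the "important" set of weights to update on-device.
            return [n for n in model.parameter_names() if n not in frozen]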

    Their second solution involves quantized training and simplifying the weights, which are typically 32 bits. An algorithm rounds the weights so they are only eight bits, through a process known as quantization, which cuts the amount of memory for both training and inference. Inference is the process of applying a model to a dataset and generating a prediction. Then the algorithm applies a technique called quantization-aware scaling (QAS), which acts like a multiplier to adjust the ratio between weight and gradient, to avoid any drop in accuracy that may come from quantized training.
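
    As a rough, self-contained illustration of the first half of that idea, the snippet below quantizes a tensor of 32-bit weights to 8 bits and measures the rounding error; the closing comment gestures at where quantization-aware scaling fits in. The symmetric rounding scheme and the numbers are illustrative, not taken from the paper.

        import numpy as np

        rng = np.random.default_rng(0)
        weights = rng.normal(0, 0.05, size=1000).astype(np.float32)   # 32-bit weights

        # Symmetric 8-bit quantization: map the float range onto integers in [-127, 127].
        scale = np.abs(weights).max() / 127.0
        q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

        # Dequantize to check how much information the rounding discarded.
        restored = q.astype(np.float32) * scale
        print("memory per weight: 4 bytes -> 1 byte")
        print("max rounding error:", float(np.abs(weights - restored).max()))

        # Quantization-aware scaling, conceptually: quantized weights and their
        # gradients end up on mismatched scales during training, so a per-tensor
        # multiplier (derived from quantities like `scale` above) is applied to
        # restore their ratio and avoid an accuracy drop.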

    The researchers developed a system, called a tiny training engine, that can run these algorithmic innovations on a simple microcontroller that lacks an operating system. This system changes the order of steps in the training process so more work is completed in the compilation stage, before the model is deployed on the edge device.

    “We push a lot of the computation, such as auto-differentiation and graph optimization, to compile time. We also aggressively prune the redundant operators to support sparse updates. Once at runtime, we have much less workload to do on the device,” Han explains.

    A successful speedup

    Their optimization only required 157 kilobytes of memory to train a machine-learning model on a microcontroller, whereas other techniques designed for lightweight training would still need between 300 and 600 megabytes.

    They tested their framework by training a computer vision model to detect people in images. After only 10 minutes of training, it learned to complete the task successfully. Their method was able to train a model more than 20 times faster than other approaches.

    Now that they have demonstrated the success of these techniques for computer vision models, the researchers want to apply them to language models and different types of data, such as time-series data. At the same time, they want to use what they’ve learned to shrink the size of larger models without sacrificing accuracy, which could help reduce the carbon footprint of training large-scale machine-learning models.

    “AI model adaptation/training on a device, especially on embedded controllers, is an open challenge. This research from MIT has not only successfully demonstrated the capabilities, but also opened up new possibilities for privacy-preserving device personalization in real-time,” says Nilesh Jain, a principal engineer at Intel who was not involved with this work. “Innovations in the publication have broader applicability and will ignite new systems-algorithm co-design research.”

    “On-device learning is the next major advance we are working toward for the connected intelligent edge. Professor Song Han’s group has shown great progress in demonstrating the effectiveness of edge devices for training,” adds Jilei Hou, vice president and head of AI research at Qualcomm. “Qualcomm has awarded his team an Innovation Fellowship for further innovation and advancement in this area.”

    This work is funded by the National Science Foundation, the MIT-IBM Watson AI Lab, the MIT AI Hardware Program, Amazon, Intel, Qualcomm, Ford Motor Company, and Google.

  • Neurodegenerative disease can progress in newly identified patterns

    Neurodegenerative diseases — like amyotrophic lateral sclerosis (ALS, or Lou Gehrig’s disease), Alzheimer’s, and Parkinson’s — are complicated, chronic ailments that can present with a variety of symptoms, worsen at different rates, and have many underlying genetic and environmental causes, some of which are unknown. ALS, in particular, affects voluntary muscle movement and is always fatal, but while most people survive for only a few years after diagnosis, others live with the disease for decades. Manifestations of ALS can also vary significantly; slower disease development often correlates with onset in the limbs, affecting fine motor skills, while the more serious bulbar form of ALS impacts swallowing, speaking, breathing, and mobility. Therefore, understanding the progression of diseases like ALS is critical to enrollment in clinical trials, analysis of potential interventions, and discovery of root causes.

    However, assessing disease evolution is far from straightforward. Current clinical studies typically assume that health declines on a downward linear trajectory on a symptom rating scale, and use these linear models to evaluate whether drugs are slowing disease progression. However, data indicate that ALS often follows nonlinear trajectories, with periods where symptoms are stable alternating with periods when they are rapidly changing. Since data can be sparse, and health assessments often rely on subjective rating metrics measured at uneven time intervals, comparisons across patient populations are difficult. These heterogeneous data and progression patterns, in turn, complicate analyses of intervention effectiveness and potentially mask disease origins.

    Now, a new machine-learning method developed by researchers from MIT, IBM Research, and elsewhere aims to better characterize ALS disease progression patterns to inform clinical trial design.

    “There are groups of individuals that share progression patterns. For example, some seem to have really fast-progressing ALS and others that have slow-progressing ALS that varies over time,” says Divya Ramamoorthy PhD ’22, a research specialist at MIT and lead author of a new paper on the work that was published this month in Nature Computational Science. “The question we were asking is: can we use machine learning to identify if, and to what extent, those types of consistent patterns across individuals exist?”

    Their technique, indeed, identified discrete and robust clinical patterns in ALS progression, many of which are non-linear. Further, these disease progression subtypes were consistent across patient populations and disease metrics. The team additionally found that their method can be applied to Alzheimer’s and Parkinson’s diseases as well.

    Joining Ramamoorthy on the paper are MIT-IBM Watson AI Lab members Ernest Fraenkel, a professor in the MIT Department of Biological Engineering; Research Scientist Soumya Ghosh of IBM Research; and Principal Research Scientist Kenney Ng, also of IBM Research. Additional authors include Kristen Severson PhD ’18, a senior researcher at Microsoft Research and former member of the Watson Lab and of IBM Research; Karen Sachs PhD ’06 of Next Generation Analytics; a team of researchers with Answer ALS; Jonathan D. Glass and Christina N. Fournier of the Emory University School of Medicine; the Pooled Resource Open-Access ALS Clinical Trials Consortium; ALS/MND Natural History Consortium; Todd M. Herrington of Massachusetts General Hospital (MGH) and Harvard Medical School; and James D. Berry of MGH.

    Video: MIT Professor Ernest Fraenkel describes early stages of his research looking at root causes of amyotrophic lateral sclerosis (ALS).

    Reshaping health decline

    After consulting with clinicians, the team of machine learning researchers and neurologists let the data speak for itself. They designed an unsupervised machine-learning model that employed two methods: Gaussian process regression and Dirichlet process clustering. These inferred the health trajectories directly from patient data and automatically grouped similar trajectories together without prescribing the number of clusters or the shape of the curves, forming ALS progression “subtypes.” Their method incorporated prior clinical knowledge in the form of a bias for negative trajectories — consistent with expectations for neurodegenerative disease progressions — but did not assume any linearity. “We know that linearity is not reflective of what’s actually observed,” says Ng. “The methods and models that we use here were more flexible, in the sense that they capture what was seen in the data,” without the need for expensive labeled data and prescription of parameters.
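
    Those two ingredients can be sketched with off-the-shelf tools. The self-contained example below smooths synthetic, unevenly sampled ALSFRS-R-like trajectories with Gaussian process regression and then groups them with a Dirichlet process mixture (via scikit-learn's truncated approximation). It is only an illustration of the building blocks; the researchers' model infers trajectories and clusters jointly from real patient data.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel
        from sklearn.mixture import BayesianGaussianMixture

        rng = np.random.default_rng(0)
        grid = np.linspace(0, 24, 25)                        # months from baseline

        def synthetic_patient(fast):
            t = np.sort(rng.uniform(0, 24, 8))               # uneven visit times
            rate = 1.5 if fast else 0.4                      # fast vs. slow progression
            y = 48 - rate * t + rng.normal(0, 1.5, t.size)   # ALSFRS-R-like score
            return t, y

        def smooth(t, y):
            gp = GaussianProcessRegressor(RBF(10.0) + WhiteKernel(1.0), normalize_y=True)
            gp.fit(t[:, None], y)
            return gp.predict(grid[:, None])                 # dense, denoised trajectory

        trajectories = np.array([smooth(*synthetic_patient(fast=(i % 2 == 0)))
                                 for i in range(40)])

        # A Dirichlet process mixture groups similar trajectories without fixing
        # the number of clusters in advance (n_components is only an upper bound).
        dp = BayesianGaussianMixture(n_components=10, covariance_type="diag",
                                     weight_concentration_prior_type="dirichlet_process",
                                     random_state=0)
        labels = dp.fit_predict(trajectories)
        print("inferred subtypes:", np.unique(labels))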

    Primarily, they applied the model to five longitudinal datasets from ALS clinical trials and observational studies. These used the gold standard to measure symptom development: the ALS functional rating scale revised (ALSFRS-R), which captures a global picture of patient neurological impairment but can be a bit of a “messy metric.” Additionally, survival probabilities, forced vital capacity (a measurement of respiratory function), and subscores of the ALSFRS-R, which track individual bodily functions, were incorporated.

    New regimes of progression and utility

    When their population-level model was trained and tested on these metrics, four dominant patterns of disease popped out of the many trajectories — sigmoidal fast progression, stable slow progression, unstable slow progression, and unstable moderate progression — many with strong nonlinear characteristics. Notably, it captured trajectories where patients experienced a sudden loss of ability, called a functional cliff, which would significantly impact treatments, enrollment in clinical trials, and quality of life.

    The researchers compared their method against other commonly used linear and nonlinear approaches in the field to separate the contribution of clustering and linearity to the model’s accuracy. The new work outperformed them, even patient-specific models, and found that subtype patterns were consistent across measures. Impressively, when data were withheld, the model was able to interpolate missing values, and, critically, could forecast future health measures. The model could also be trained on one ALSFRS-R dataset and predict cluster membership in others, making it robust, generalizable, and accurate with scarce data. So long as 6-12 months of data were available, health trajectories could be inferred with higher confidence than conventional methods.

    The researchers’ approach also provided insights into Alzheimer’s and Parkinson’s diseases, both of which can have a range of symptom presentations and progression. For Alzheimer’s, the new technique could identify distinct disease patterns, in particular variations in the rates of conversion of mild to severe disease. The Parkinson’s analysis demonstrated a relationship between progression trajectories for off-medication scores and disease phenotypes, such as the tremor-dominant or postural instability/gait difficulty forms of Parkinson’s disease.

    The work makes significant strides to find the signal amongst the noise in the time-series of complex neurodegenerative disease. “The patterns that we see are reproducible across studies, which I don’t believe had been shown before, and that may have implications for how we subtype the [ALS] disease,” says Fraenkel. As the FDA has been considering the impact of non-linearity in clinical trial designs, the team notes that their work is particularly pertinent.

    As new ways to understand disease mechanisms come online, this model provides another tool to pick apart illnesses like ALS, Alzheimer’s, and Parkinson’s from a systems biology perspective.

    “We have a lot of molecular data from the same patients, and so our long-term goal is to see whether there are subtypes of the disease,” says Fraenkel, whose lab looks at cellular changes to understand the etiology of diseases and possible targets for cures. “One approach is to start with the symptoms … and see if people with different patterns of disease progression are also different at the molecular level. That might lead you to a therapy. Then there’s the bottom-up approach, where you start with the molecules” and try to reconstruct biological pathways that might be affected. “We’re going [to be tackling this] from both ends … and finding if something meets in the middle.”

    This research was supported, in part, by the MIT-IBM Watson AI Lab, the Muscular Dystrophy Association, the Department of Veterans Affairs Office of Research and Development, the Department of Defense, the NSF Graduate Research Fellowship Program, the Siebel Scholars Fellowship, Answer ALS, the United States Army Medical Research Acquisition Activity, the National Institutes of Health, and the NIH/NINDS.

  • AI that can learn the patterns of human language

    Human languages are notoriously complex, and linguists have long thought it would be impossible to teach a machine how to analyze speech sounds and word structures in the way human investigators do.

    But researchers at MIT, Cornell University, and McGill University have taken a step in this direction. They have demonstrated an artificial intelligence system that can learn the rules and patterns of human languages on its own.

    When given words and examples of how those words change to express different grammatical functions (like tense, case, or gender) in one language, this machine-learning model comes up with rules that explain why the forms of those words change. For instance, it might learn that the letter “a” must be added to the end of a word to make the masculine form feminine in Serbo-Croatian.

    This model can also automatically learn higher-level language patterns that can apply to many languages, enabling it to achieve better results.

    The researchers trained and tested the model using problems from linguistics textbooks that featured 58 different languages. Each problem had a set of words and corresponding word-form changes. The model was able to come up with a correct set of rules to describe those word-form changes for 60 percent of the problems.

    This system could be used to study language hypotheses and investigate subtle similarities in the way diverse languages transform words. It is especially notable because the system discovers models that humans can readily understand, and it acquires these models from small amounts of data, such as a few dozen words. And instead of using one massive dataset for a single task, the system utilizes many small datasets, which is closer to how scientists propose hypotheses — they look at multiple related datasets and come up with models to explain phenomena across those datasets.

    “One of the motivations of this work was our desire to study systems that learn models of datasets that are represented in a way that humans can understand. Instead of learning weights, can the model learn expressions or rules? And we wanted to see if we could build this system so it would learn on a whole battery of interrelated datasets, to make the system learn a little bit about how to better model each one,” says Kevin Ellis ’14, PhD ’20, an assistant professor of computer science at Cornell University and lead author of the paper.

    Joining Ellis on the paper are MIT faculty members Adam Albright, a professor of linguistics; Armando Solar-Lezama, a professor and associate director of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Joshua B. Tenenbaum, the Paul E. Newton Career Development Professor of Cognitive Science and Computation in the Department of Brain and Cognitive Sciences and a member of CSAIL; as well as senior author Timothy J. O’Donnell, assistant professor in the Department of Linguistics at McGill University and Canada CIFAR AI Chair at the Mila – Quebec Artificial Intelligence Institute.

    The research is published today in Nature Communications.

    Looking at language 

    In their quest to develop an AI system that could automatically learn a model from multiple related datasets, the researchers chose to explore the interaction of phonology (the study of sound patterns) and morphology (the study of word structure).

    Data from linguistics textbooks offered an ideal testbed because many languages share core features, and textbook problems showcase specific linguistic phenomena. Textbook problems can also be solved by college students in a fairly straightforward way, but those students typically have prior knowledge about phonology from past lessons, which they use to reason about new problems.

    Ellis, who earned his PhD at MIT and was jointly advised by Tenenbaum and Solar-Lezama, first learned about morphology and phonology in an MIT class co-taught by O’Donnell, who was a postdoc at the time, and Albright.

    “Linguists have thought that in order to really understand the rules of a human language, to empathize with what it is that makes the system tick, you have to be human. We wanted to see if we can emulate the kinds of knowledge and reasoning that humans (linguists) bring to the task,” says Albright.

    To build a model that could learn a set of rules for assembling words, which is called a grammar, the researchers used a machine-learning technique known as Bayesian Program Learning. With this technique, the model solves a problem by writing a computer program.

    In this case, the program is the grammar the model thinks is the most likely explanation of the words and meanings in a linguistics problem. They built the model using Sketch, a popular program synthesizer which was developed at MIT by Solar-Lezama.

    But Sketch can take a lot of time to reason about the most likely program. To get around this, the researchers had the model work one piece at a time, writing a small program to explain some data, then writing a larger program that modifies that small program to cover more data, and so on.
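
    The flavor of that incremental strategy can be conveyed with a toy example. Below, a “grammar” is just an ordered list of suffix rules, grown one rule at a time until it explains all of the (invented, Serbo-Croatian-flavored) word pairs; the real system searches over far richer phonological rules using Bayesian Program Learning and the Sketch synthesizer.

        # Each pair is (base form, inflected form); the data are illustrative only.
        pairs = [("jelen", "jelena"), ("prozor", "prozora"),    # pairs explained by adding -a
                 ("grad", "gradovi"), ("most", "mostovi")]      # pairs explained by adding -ovi

        def covered(rules, data):
            # Pairs whose inflected form is produced by some rule in the grammar.
            return {(w, f) for w, f in data if any(w + s == f for s in rules)}

        def grow_grammar(data):
            rules = []
            while covered(rules, data) != set(data):
                done = covered(rules, data)
                remaining = [(w, f) for w, f in data if (w, f) not in done]
                # Propose candidate suffixes from the still-unexplained pairs ...
                candidates = {f[len(w):] for w, f in remaining if f.startswith(w)}
                # ... and keep the one that lets the grown grammar cover the most data.
                best = max(candidates, key=lambda s: len(covered(rules + [s], data)))
                rules.append(best)
            return rules

        print(grow_grammar(pairs))    # e.g., ['a', 'ovi']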

    They also designed the model so it learns what “good” programs tend to look like. For instance, it might learn some general rules on simple Russian problems that it would apply to a more complex problem in Polish because the languages are similar. This makes it easier for the model to solve the Polish problem.

    Tackling textbook problems

    When they tested the model using 70 textbook problems, it was able to find a grammar that matched the entire set of words in the problem in 60 percent of cases, and correctly matched most of the word-form changes in 79 percent of problems.

    The researchers also tried pre-programming the model with some knowledge it “should” have learned if it was taking a linguistics course, and showed that it could solve all problems better.

    “One challenge of this work was figuring out whether what the model was doing was reasonable. This isn’t a situation where there is one number that is the single right answer. There is a range of possible solutions which you might accept as right, close to right, etc.,” Albright says.

    The model often came up with unexpected solutions. In one instance, it discovered the expected answer to a Polish language problem, but also another correct answer that exploited a mistake in the textbook. This shows that the model could “debug” linguistics analyses, Ellis says.

    The researchers also conducted tests that showed the model was able to learn some general templates of phonological rules that could be applied across all problems.

    “One of the things that was most surprising is that we could learn across languages, but it didn’t seem to make a huge difference,” says Ellis. “That suggests two things. Maybe we need better methods for learning across problems. And maybe, if we can’t come up with those methods, this work can help us probe different ideas we have about what knowledge to share across problems.”

    In the future, the researchers want to use their model to find unexpected solutions to problems in other domains. They could also apply the technique to more situations where higher-level knowledge can be applied across interrelated datasets. For instance, perhaps they could develop a system to infer differential equations from datasets on the motion of different objects, says Ellis.

    “This work shows that we have some methods which can, to some extent, learn inductive biases. But I don’t think we’ve quite figured out, even for these textbook problems, the inductive bias that lets a linguist accept the plausible grammars and reject the ridiculous ones,” he adds.

    “This work opens up many exciting venues for future research. I am particularly intrigued by the possibility that the approach explored by Ellis and colleagues (Bayesian Program Learning, BPL) might speak to how infants acquire language,” says T. Florian Jaeger, a professor of brain and cognitive sciences and computer science at the University of Rochester, who was not an author of this paper. “Future work might ask, for example, under what additional induction biases (assumptions about universal grammar) the BPL approach can successfully achieve human-like learning behavior on the type of data infants observe during language acquisition. I think it would be fascinating to see whether inductive biases that are even more abstract than those considered by Ellis and his team — such as biases originating in the limits of human information processing (e.g., memory constraints on dependency length or capacity limits in the amount of information that can be processed per time) — would be sufficient to induce some of the patterns observed in human languages.”

    This work was funded, in part, by the Air Force Office of Scientific Research, the Center for Brains, Minds, and Machines, the MIT-IBM Watson AI Lab, the Natural Sciences and Engineering Research Council of Canada, the Fonds de Recherche du Québec – Société et Culture, the Canada CIFAR AI Chairs Program, the National Science Foundation (NSF), and an NSF graduate fellowship.

  • A technique to improve both fairness and accuracy in artificial intelligence

    For workers who use machine-learning models to help them make decisions, knowing when to trust a model’s predictions is not always an easy task, especially since these models are often so complex that their inner workings remain a mystery.

    Users sometimes employ a technique, known as selective regression, in which the model estimates its confidence level for each prediction and will reject predictions when its confidence is too low. Then a human can examine those cases, gather additional information, and make a decision about each one manually.

    But while selective regression has been shown to improve the overall performance of a model, researchers at MIT and the MIT-IBM Watson AI Lab have discovered that the technique can have the opposite effect for underrepresented groups of people in a dataset. As the model’s confidence increases with selective regression, its chance of making the right prediction also increases, but this does not always happen for all subgroups.

    For instance, a model suggesting loan approvals might make fewer errors on average, but it may actually make more wrong predictions for Black or female applicants. One reason this can occur is that the model’s confidence measure is trained using overrepresented groups and may not be accurate for underrepresented groups.

    Once they had identified this problem, the MIT researchers developed two algorithms that can remedy the issue. Using real-world datasets, they show that the algorithms reduce performance disparities that had affected marginalized subgroups.

    “Ultimately, this is about being more intelligent about which samples you hand off to a human to deal with. Rather than just minimizing some broad error rate for the model, we want to make sure the error rate across groups is taken into account in a smart way,” says senior MIT author Greg Wornell, the Sumitomo Professor in Engineering in the Department of Electrical Engineering and Computer Science (EECS) who leads the Signals, Information, and Algorithms Laboratory in the Research Laboratory of Electronics (RLE) and is a member of the MIT-IBM Watson AI Lab.

    Joining Wornell on the paper are co-lead authors Abhin Shah, an EECS graduate student, and Yuheng Bu, a postdoc in RLE; as well as Joshua Ka-Wing Lee SM ’17, ScD ’21 and Subhro Das, Rameswar Panda, and Prasanna Sattigeri, research staff members at the MIT-IBM Watson AI Lab. The paper will be presented this month at the International Conference on Machine Learning.

    To predict or not to predict

    Regression is a technique that estimates the relationship between a dependent variable and independent variables. In machine learning, regression analysis is commonly used for prediction tasks, such as predicting the price of a home given its features (number of bedrooms, square footage, etc.). With selective regression, the machine-learning model can make one of two choices for each input — it can make a prediction or abstain from a prediction if it doesn’t have enough confidence in its decision.

    When the model abstains, it reduces the fraction of samples it is making predictions on, which is known as coverage. By only making predictions on inputs that it is highly confident about, the overall performance of the model should improve. But this can also amplify biases that exist in a dataset, which occur when the model does not have sufficient data from certain subgroups. This can lead to errors or bad predictions for underrepresented individuals.
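
    The mechanics, and the failure mode the researchers identified, can be seen in a small synthetic example: a stand-in regressor abstains on its least-confident 30 percent of inputs, and error and coverage are reported overall and per subgroup. The data, the confidence score, and the 30 percent threshold are invented purely for illustration.

        import numpy as np

        rng = np.random.default_rng(1)
        n = 2000
        group = rng.choice([0, 1], size=n, p=[0.9, 0.1])   # group 1 is underrepresented
        x = rng.normal(size=n)
        noise = np.where(group == 0, 0.2, 1.0)             # group 1 is harder to predict
        y = 2 * x + rng.normal(0, noise)

        pred = 2 * x                                        # stand-in for a trained regressor
        conf = -np.abs(rng.normal(0, noise))                # stand-in confidence score that does
                                                            # not track group 1's true error
        keep = conf > np.quantile(conf, 0.3)                # abstain on the least-confident 30%

        def report(sel, label):
            cov = keep[sel].mean()                          # fraction of this group predicted on
            mse = np.mean((pred[sel & keep] - y[sel & keep]) ** 2)
            print(f"{label}: coverage={cov:.2f}, MSE={mse:.2f}")

        report(np.ones(n, dtype=bool), "overall")
        report(group == 0, "majority subgroup")
        report(group == 1, "minority subgroup")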

    The MIT researchers aimed to ensure that, as the overall error rate for the model improves with selective regression, the performance for every subgroup also improves. They call this monotonic selective risk.
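
    Written out, a plausible formalization of that criterion (the notation here is ours, not necessarily the paper's) is

        R_g(c_1) \le R_g(c_2) \quad \text{for every subgroup } g \text{ and coverage levels } c_1 \le c_2,

    where R_g(c) is the risk (expected error) on subgroup g when the model predicts only on the fraction c of inputs about which it is most confident, so reducing coverage never makes any subgroup worse off.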

    “It was challenging to come up with the right notion of fairness for this particular problem. But by enforcing this criterion, monotonic selective risk, we can make sure the model performance is actually getting better across all subgroups when you reduce the coverage,” says Shah.

    Focus on fairness

    The team developed two neural network algorithms that impose this fairness criterion to solve the problem.

    One algorithm guarantees that the features the model uses to make predictions contain all information about the sensitive attributes in the dataset, such as race and sex, that is relevant to the target variable of interest. Sensitive attributes are features that may not be used for decisions, often due to laws or organizational policies. The second algorithm employs a calibration technique to ensure the model makes the same prediction for an input, regardless of whether any sensitive attributes are added to that input.

    The researchers tested these algorithms by applying them to real-world datasets that could be used in high-stakes decision making. One, an insurance dataset, is used to predict total annual medical expenses charged to patients using demographic statistics; another, a crime dataset, is used to predict the number of violent crimes in communities using socioeconomic information. Both datasets contain sensitive attributes for individuals.

    When they implemented their algorithms on top of a standard machine-learning method for selective regression, they were able to reduce disparities by achieving lower error rates for the minority subgroups in each dataset. Moreover, this was accomplished without significantly impacting the overall error rate.

    “We see that if we don’t impose certain constraints, in cases where the model is really confident, it could actually be making more errors, which could be very costly in some applications, like health care. So if we reverse the trend and make it more intuitive, we will catch a lot of these errors. A major goal of this work is to avoid errors going silently undetected,” Sattigeri says.

    The researchers plan to apply their solutions to other applications, such as predicting house prices, student GPA, or loan interest rate, to see if the algorithms need to be calibrated for those tasks, says Shah. They also want to explore techniques that use less sensitive information during the model training process to avoid privacy issues.

    And they hope to improve the confidence estimates in selective regression to prevent situations where the model’s confidence is low, but its prediction is correct. This could reduce the workload on humans and further streamline the decision-making process, Sattigeri says.

    This research was funded, in part, by the MIT-IBM Watson AI Lab and its member companies Boston Scientific, Samsung, and Wells Fargo, and by the National Science Foundation.

  • Hallucinating to better text translation

    As babies, we babble and imitate our way to learning languages. We don’t start off reading raw text, which requires fundamental knowledge and understanding about the world, as well as the advanced ability to interpret and infer descriptions and relationships. Rather, humans begin our language journey slowly, by pointing and interacting with our environment, basing our words and perceiving their meaning through the context of the physical and social world. Eventually, we can craft full sentences to communicate complex ideas.

    Similarly, when humans begin learning and translating into another language, the incorporation of other sensory information, like multimedia, paired with the new and unfamiliar words, like flashcards with images, improves language acquisition and retention. Then, with enough practice, humans can accurately translate new, unseen sentences in context without the accompanying media; however, imagining a picture based on the original text helps.

    This is the basis of a new machine learning model, called VALHALLA, by researchers from MIT, IBM, and the University of California at San Diego, in which a trained neural network sees a source sentence in one language, hallucinates an image of what it looks like, and then uses both to translate into a target language. The team found that their method demonstrates improved accuracy of machine translation over text-only translation. Further, it provided an additional boost for cases with long sentences, under-resourced languages, and instances where part of the source sentence is inaccessible to the machine translator.

    As a core task within the AI field of natural language processing (NLP), machine translation is an “eminently practical technology that’s being used by millions of people every day,” says study co-author Yoon Kim, assistant professor in MIT’s Department of Electrical Engineering and Computer Science with affiliations in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT-IBM Watson AI Lab. With recent, significant advances in deep learning, “there’s been an interesting development in how one might use non-text information — for example, images, audio, or other grounding information — to tackle practical tasks involving language,” says Kim, because “when humans are performing language processing tasks, we’re doing so within a grounded, situated world.” The pairing of hallucinated images and text during inference, the team postulated, imitates that process, providing context for improved performance over current state-of-the-art techniques, which utilize text-only data.

    This research will be presented at the IEEE / CVF Computer Vision and Pattern Recognition Conference this month. Kim’s co-authors are UC San Diego graduate student Yi Li and Professor Nuno Vasconcelos, along with research staff members Rameswar Panda, Chun-fu “Richard” Chen, Rogerio Feris, and IBM Director David Cox of IBM Research and the MIT-IBM Watson AI Lab.

    Learning to hallucinate from images

    When we learn new languages and to translate, we’re often provided with examples and practice before venturing out on our own. The same is true for machine-translation systems; however, if images are used during training, these AI methods also require visual aids for testing, limiting their applicability, says Panda.

    “In real-world scenarios, you might not have an image with respect to the source sentence. So, our motivation was basically: Instead of using an external image during inference as input, can we use visual hallucination — the ability to imagine visual scenes — to improve machine translation systems?” says Panda.

    To do this, the team used an encoder-decoder architecture with two transformers, a type of neural network model that’s suited for sequence-dependent data, like language, that can pay attention to key words and semantics of a sentence. One transformer generates a visual hallucination, and the other performs multimodal translation using outputs from the first transformer.

    During training, there are two streams of translation: a source sentence and a ground-truth image that is paired with it, and the same source sentence that is visually hallucinated to make a text-image pair. First the ground-truth image and sentence are tokenized into representations that can be handled by transformers; for the case of the sentence, each word is a token. The source sentence is tokenized again, but this time passed through the visual hallucination transformer, outputting a hallucination, a discrete image representation of the sentence. The researchers incorporated an autoregression that compares the ground-truth and hallucinated representations for congruency — e.g., homonyms: a reference to an animal “bat” isn’t hallucinated as a baseball bat. The hallucination transformer then uses the difference between them to optimize its predictions and visual output, making sure the context is consistent.

    The two sets of tokens are then simultaneously passed through the multimodal translation transformer, each containing the sentence representation and either the hallucinated or ground-truth image. The tokenized text translation outputs are compared with the goal of being similar to each other and to the target sentence in another language. Any differences are then relayed back to the translation transformer for further optimization.
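
    In schematic form, one training step that combines those two streams might look like the sketch below, in which hallucinate and translate stand for the two transformers and the loss terms mirror the comparisons described above. The function names, argument structure, and equal weighting are ours for illustration, not the released VALHALLA code.

        def training_step(src, tgt, true_img_tokens, hallucinate, translate,
                          token_loss, consistency_loss):
            hall_tokens = hallucinate(src)                 # imagined (hallucinated) image tokens
            # 1) Make the hallucination agree with the ground-truth image tokens.
            l_hall = token_loss(hall_tokens, true_img_tokens)
            # 2) Both streams should translate the source sentence correctly.
            out_true = translate(src, true_img_tokens)
            out_hall = translate(src, hall_tokens)
            l_trans = token_loss(out_true, tgt) + token_loss(out_hall, tgt)
            # 3) The two translation outputs should also agree with each other, so the
            #    hallucinated stream can stand alone at test time when no image exists.
            l_cons = consistency_loss(out_true, out_hall)
            return l_hall + l_trans + l_cons               # loss weights omitted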

    For testing, the ground-truth image stream drops off, since images likely wouldn’t be available in everyday scenarios.

    “To the best of our knowledge, we haven’t seen any work which actually uses a hallucination transformer jointly with a multimodal translation system to improve machine translation performance,” says Panda.

    Visualizing the target text

    To test their method, the team put VALHALLA up against other state-of-the-art multimodal and text-only translation methods. They used public benchmark datasets containing ground-truth images with source sentences, and a dataset for translating text-only news articles. The researchers measured its performance over 13 tasks, ranging from translation on well-resourced languages (like English, German, and French) to under-resourced languages (like English to Romanian) to non-English pairs (like Spanish to French). The group also tested varying transformer model sizes, how accuracy changes with the sentence length, and translation under limited textual context, where portions of the text were hidden from the machine translators.

    The team observed significant improvements over text-only translation methods, improving data efficiency, and that smaller models performed better than the larger base model. As sentences became longer, VALHALLA’s performance over other methods grew, which the researchers attributed to the addition of more ambiguous words. In cases where part of the sentence was masked, VALHALLA could recover and translate the original text, which the team found surprising.

    Further unexpected findings arose: “Where there weren’t as many training [image and] text pairs, [like for under-resourced languages], improvements were more significant, which indicates that grounding in images helps in low-data regimes,” says Kim. “Another thing that was quite surprising to me was this improved performance, even on types of text that aren’t necessarily easily connectable to images. For example, maybe it’s not so surprising if this helps in translating visually salient sentences, like the ‘there is a red car in front of the house.’ [However], even in text-only [news article] domains, the approach was able to improve upon text-only systems.”

    While VALHALLA performs well, the researchers note that it does have limitations, requiring pairs of sentences to be annotated with an image, which could make training data more expensive to obtain. It also performs better in its grounded training domains than on text-only news articles. Moreover, Kim and Panda note, a technique like VALHALLA is still a black box, with the assumption that hallucinated images are providing helpful information, and the team plans to investigate what and how the model is learning in order to validate their methods.

    In the future, the team plans to explore other means of improving translation. “Here, we only focus on images, but there are other types of multimodal information — for example, speech, video or touch, or other sensory modalities,” says Panda. “We believe such multimodal grounding can lead to even more efficient machine translation models, potentially benefiting translation across many low-resource languages spoken in the world.”

    This research was supported, in part, by the MIT-IBM Watson AI Lab and the National Science Foundation.

  • Making data visualization more accessible for blind and low-vision individuals

    Data visualizations on the web are largely inaccessible for blind and low-vision individuals who use screen readers, an assistive technology that reads on-screen elements as text-to-speech. This excludes millions of people from the opportunity to probe and interpret insights that are often presented through charts, such as election results, health statistics, and economic indicators. 

    When a designer attempts to make a visualization accessible, best practices call for including a few sentences of text that describe the chart and a link to the underlying data table — a far cry from the rich reading experience available to sighted users.

    An interdisciplinary team of researchers from MIT and elsewhere is striving to create screen-reader-friendly data visualizations that offer a similarly rich experience. They prototyped several visualization structures that provide text descriptions at varying levels of detail, enabling a screen-reader user to drill down from high-level data to more detailed information using just a few keystrokes.

    The MIT team embarked on an iterative co-design process with collaborator Daniel Hajas, a researcher at University College London who works with the Global Disability Innovation Hub and lost his sight at age 16. They collaborated to develop prototypes and ran a detailed user study with blind and low-vision individuals to gather feedback.

    “Researchers might see some connections between problems and be aware of potential solutions, but very often they miss it by a little bit. Insights from people who have the lived experience of a certain specific, measurable problem are really important for a lot of disability-related solutions. I think we found a really nice fit,” says Hajas.

    They created a framework to help designers think systematically about how to develop accessible visualizations. In the future, they plan to use their prototypes and design framework to build a user-friendly tool that could convert visualizations into accessible formats.

    MIT collaborators include co-lead authors and Computer Science and Artificial Intelligence Laboratory (CSAIL) graduate students Jonathan Zong, Crystal Lee, and Alan Lundgard, as well as JiWoong Jang, an undergraduate at Carnegie Mellon University who worked on this project during MIT’s Summer Research Program (MSRP), and senior author Arvind Satyanarayan, assistant professor of computer science who leads the Visualization Group in CSAIL. The research paper, which will be presented at the Eurographics Conference on Visualization, won a best paper honorable mention award.

    “Push what is possible”

    The researchers defined three design dimensions as key to making accessible visualizations: structure, navigation, and description. Structure involves arranging the information into a hierarchy. Navigation refers to how the user moves through different levels of detail. Description is how the information is spoken, including how much information is conveyed.

    Using these design dimensions, they developed several visualization prototypes that emphasized ease-of-navigation for screen-reader users. One prototype, known as multiview, enabled individuals to use the up and down arrows to navigate between different levels of information (like the chart title as the top level, the legend as the second level, etc.), and the right and left arrow keys to cycle through information on the same level (such as adjacent scatterplots). Another prototype, known as target, included the same arrow key navigation but also a drop-down menu of key chart locations so the user could quickly jump to an area of interest.
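
    To make the navigation idea concrete, here is a toy, self-contained sketch of multiview-style keyboard navigation over a chart arranged as levels. The chart contents and key handling are invented for illustration and are far simpler than the actual prototypes, which also pair each element with a spoken description.

        # Levels of detail, from high-level to detailed (contents are invented).
        levels = [
            ["Chart: average monthly temperature"],                   # level 0: title
            ["Legend: city A", "Legend: city B"],                     # level 1: legend
            ["Jan: A 2, B 5", "Feb: A 3, B 6", "Mar: A 7, B 9"],      # level 2: data
        ]

        def navigate(keys):
            level, pos = 0, 0
            spoken = []                                # what a screen reader would announce
            for key in keys:
                if key == "down":                      # drill into more detail
                    level, pos = min(level + 1, len(levels) - 1), 0
                elif key == "up":                      # back out to a higher level
                    level, pos = max(level - 1, 0), 0
                elif key == "right":                   # next item on the same level
                    pos = min(pos + 1, len(levels[level]) - 1)
                elif key == "left":                    # previous item on the same level
                    pos = max(pos - 1, 0)
                spoken.append(levels[level][pos])
            return spoken

        print(navigate(["down", "right", "down", "right", "right"]))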

    “Our goal is not just to work within existing standards to make them serviceable. We really set out to do grounded speculation and imagine where we can push what is possible with these existing standards. We didn’t want to limit ourselves to refitting tools that were designed for images,” says Zong.

    They tested these prototypes and an accessible data table, the existing best practice for accessible visualizations, with 13 blind and visually impaired screen-reader users. They asked users to rate each tool on several criteria, including how easy it was to learn and how easy it was to locate data or answer questions.

    “One thing I thought was really interesting was how much people were constantly testing their own hypotheses or trying to make specific patterns as they moved through the visualization. The implication for navigation is that you want to be able to orient yourself within the visualization so you know where the limits are,” says Lee. “Can you accurately and easily know where the walls are in the room you are exploring?”

    Improved insights

    Users said both prototypes enabled them to more rapidly identify patterns in the data. Scrolling from a high level to deeper levels of information helped them gain insights more easily than when browsing the data table, they said. They also enjoyed faster navigation using the menu in the target prototype.

    But the data table got top marks for ease of use.

    “I expected people to be disappointed with the everyday tools when compared to the new prototypes, but they still clung to the data table a bit, likely because of their familiarity with it. That shows that principles like familiarity, learnability, and usability still matter. No matter how ‘good’ our new invention is, if it is not easy enough to learn, people might stick with an older version,” Hajas says.

    Drawing on these insights, the researchers are refining the prototypes and using them to build a software package that can be used with existing design tools to give visualizations an accessible, navigable structure.

    They also want to explore multimodal solutions. Some study participants used different devices together, like screen readers and braille displays, or data sonification tools that convey information using non-speech audio. How these tools can complement each other when applied to a visualization is still an open question, Zong says.

    In the long run, they hope their work might lead to careful rethinking of web accessibility standards.

    “There is no one-size-fits-all solution for accessibility. While existing standards don’t presume that, they only offer simple approaches, like data tables and alt text. One of the key benefits of our research contribution is that we are proposing a framework — different preferences and data representations are situated at different points in this design space,” says Lundgard.

    “We have been working hard toward reducing the inequities that screen-reader users face when extracting information from online data visualizations for the past few years. So, we are really appreciative of this work and the knowledge that it adds to the existing literature,” says Ather Sharif, a graduate student who researches accessibility and visualization in the labs of professors Jacob Wobbrock and Katharina Reinecke at the Paul G. Allen School of Computer Science and Engineering of the University of Washington at Seattle, and who was not involved with this work.

    “I like to think of it as a movement where we’re all finally coming together and improving the experiences of a demographic that has been largely ignored, especially when presenting data through visualizations. Kudos to Jonathan, Arvind, and their team for this insightful and timely work! I am looking forward to what’s next,” adds Sharif, who is lead author of several recent papers related to accessible data visualizations.

    Amy Bower, a senior scientist in the Department of Physical Oceanography at the Woods Hole Oceanographic Institution who suffers from a degenerative retinal disease and uses a screen reader extensively in her work as a researcher and also for basic living tasks, found the researchers’ explanations of the importance of co-design to be powerful and compelling.  

    “As a blind scientist, I’m constantly searching for effective tools that will allow me to access the information conveyed in data visualizations. The layered approach taken by these researchers, which provides the option to get the ‘big picture’ from the data as well as drill down into the data points themselves, allows the user to choose how they want to explore the data,” says Bower, who also was not involved with this work. “I think the ability to freely explore the data is necessary not just to learn the ‘story’ that the data are telling, but to allow a blind researcher such as myself to formulate the next questions that need to be tackled to advance understanding in any field of study.”

    This work was supported, in part, by the National Science Foundation.

  • Is it topological? A new materials database has the answer

    What will it take to make our electronics smarter, faster, and more resilient? One idea is to build them from materials that are topological.

    Topology stems from a branch of mathematics that studies shapes that can be manipulated or deformed without losing certain core properties. A donut is a common example: If it were made of rubber, a donut could be twisted and squeezed into a completely new shape, such as a coffee mug, while retaining a key trait — namely, its center hole, which takes the form of the cup’s handle. The hole, in this case, is a topological trait, robust against certain deformations.

    In recent years, scientists have applied concepts of topology to the discovery of materials with similarly robust electronic properties. In 2007, researchers predicted the first electronic topological insulators — materials in which electrons behave in ways that are “topologically protected,” or persistent in the face of certain disruptions.

    Since then, scientists have searched for more topological materials with the aim of building better, more robust electronic devices. Until recently, only a handful of such materials had been identified, and topological materials were therefore assumed to be a rarity.

    Now researchers at MIT and elsewhere have discovered that, in fact, topological materials are everywhere, if you know how to look for them.

    In a paper published today in Science, the team, led by Nicolas Regnault of Princeton University and the École Normale Supérieure Paris, reports harnessing the power of multiple supercomputers to map the electronic structure of more than 96,000 natural and synthetic crystalline materials. They applied sophisticated filters to determine whether and what kind of topological traits exist in each structure.

    Overall, they found that 90 percent of all known crystalline structures contain at least one topological property, and more than 50 percent of all naturally occurring materials exhibit some sort of topological behavior.

    “We found there’s a ubiquity — topology is everywhere,” says Benjamin Wieder, the study’s co-lead author and a postdoc in MIT’s Department of Physics.

    The team has compiled the newly identified materials into a new, freely accessible Topological Materials Database resembling a periodic table of topology. With this new library, scientists can quickly search materials of interest for any topological properties they might hold, and harness them to build ultra-low-power transistors, new magnetic memory storage, and other devices with robust electronic properties.

    The paper’s co-authors include co-lead author Maia Vergniory of the Donostia International Physics Center, Luis Elcoro of the University of the Basque Country, Stuart Parkin and Claudia Felser of the Max Planck Institute, and Andrei Bernevig of Princeton University.

    Beyond intuition

    The new study was motivated by a desire to speed up the traditional search for topological materials.

    “The way the original materials were found was through chemical intuition,” Wieder says. “That approach had a lot of early successes. But as we theoretically predicted more kinds of topological phases, it seemed intuition wasn’t getting us very far.”

    Wieder and his colleagues instead utilized an efficient and systematic method to root out signs of topology, or robust electronic behavior, in all known crystalline structures, also known as inorganic solid-state materials.

    For their study, the researchers looked to the Inorganic Crystal Structure Database, or ICSD, a repository into which researchers enter the atomic and chemical structures of crystalline materials that they have studied. The database includes materials found in nature, as well as those that have been synthesized and manipulated in the lab. The ICSD is currently the largest materials database in the world, containing over 193,000 crystals whose structures have been mapped and characterized.

    The team downloaded the entire ICSD, and after performing some data cleaning to weed out structures with corrupted files or incomplete data, the researchers were left with just over 96,000 processable structures. For each of these structures, they performed a set of calculations based on fundamental knowledge of the relation between chemical constituents, to produce a map of the material’s electronic structure, also known as the electron band structure.

    The team was able to efficiently carry out the complicated calculations for each structure using multiple supercomputers, which they then employed to perform a second set of operations, this time to screen for various known topological phases, or persistent electrical behavior in each crystal material.
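
    At a high level, that screening stage amounts to a loop like the one sketched below, with hypothetical function names standing in for the first-principles band-structure calculations and the symmetry-based topological filters that actually ran on the supercomputers.

        def screen_database(structures, compute_band_structure, diagnose_topology):
            """structures: the ~96,000 cleaned crystal entries from the ICSD."""
            catalog = {}
            for material in structures:
                bands = compute_band_structure(material)   # heavy, supercomputer-scale step
                phases = diagnose_topology(bands)          # e.g., topological quantum chemistry filters
                if phases:                                 # at least one topological property found
                    catalog[material] = phases
            return catalog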

    “We’re looking for signatures in the electronic structure in which certain robust phenomena should occur in this material,” explains Wieder, whose previous work involved refining and expanding the screening technique, known as topological quantum chemistry.

    From their high-throughput analysis, the team quickly discovered a surprisingly large number of materials that are naturally topological, without any experimental manipulation, as well as materials that can be manipulated, for instance with light or chemical doping, to exhibit some sort of robust electronic behavior. They also discovered a handful of materials that contained more than one topological state when exposed to certain conditions.

    “Topological phases of matter in 3D solid-state materials have been proposed as venues for observing and manipulating exotic effects, including the interconversion of electrical current and electron spin, the tabletop simulation of exotic theories from high-energy physics, and even, under the right conditions, the storage and manipulation of quantum information,” Wieder notes. 

    For experimentalists who are studying such effects, Wieder says the team’s new database now reveals a menagerie of new materials to explore.

    This research was funded, in part, by the U.S. Department of Energy, the National Science Foundation, and the Office of Naval Research.

  • System helps severely motor-impaired individuals type more quickly and accurately

    In 1995, French fashion magazine editor Jean-Dominique Bauby suffered a massive stroke while driving a car, which left him with locked-in syndrome, a neurological condition in which the patient is almost completely paralyzed and can move only the muscles that control the eyes.

    Bauby, who had signed a book contract shortly before his accident, wrote the memoir “The Diving Bell and the Butterfly” using a dictation system in which his speech therapist recited the alphabet and he would blink when she said the correct letter. They wrote the 130-page book one blink at a time.

    Technology has come a long way since Bauby’s accident. Many individuals with severe motor impairments caused by locked-in syndrome, cerebral palsy, amyotrophic lateral sclerosis, or other conditions can communicate using computer interfaces where they select letters or words in an onscreen grid by activating a single switch, often by pressing a button, releasing a puff of air, or blinking.

    But these row-column scanning systems are very rigid, and, similar to the technique used by Bauby’s speech therapist, they highlight each option one at a time, making them frustratingly slow for some users. And they are not suitable for tasks where options can’t be arranged in a grid, like drawing, browsing the web, or gaming.

    A more flexible system being developed by researchers at MIT places individual selection indicators next to each option on a computer screen. The indicators can be placed anywhere — next to anything someone might click with a mouse — so a user does not need to cycle through a grid of choices to make selections. The system, called Nomon, incorporates probabilistic reasoning to learn how users make selections, and then adjusts the interface to improve their speed and accuracy.

    Participants in a user study were able to type faster using Nomon than with a row-column scanning system. The users also performed better on a picture selection task, demonstrating how Nomon could be used for more than typing.

    “It is so cool and exciting to be able to develop software that has the potential to really help people. Being able to find those signals and turn them into communication as we are used to it is a really interesting problem,” says senior author Tamara Broderick, an associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS) and a member of the Laboratory for Information and Decision Systems and the Institute for Data, Systems, and Society.

    Joining Broderick on the paper are lead author Nicholas Bonaker, an EECS graduate student; Emli-Mari Nel, head of innovation and machine learning at Averly and a visiting lecturer at the University of the Witwatersrand in South Africa; and Keith Vertanen, an associate professor at Michigan Tech. The research is being presented at the ACM Conference on Human Factors in Computing Systems.

    On the clock

    In the Nomon interface, a small analog clock is placed next to every option the user can select. (A gnomon is the part of a sundial that casts a shadow.) The user looks at one option and then clicks their switch when that clock’s hand passes a red “noon” line. After each click, the system changes the phases of the clocks to separate the most probable next targets. The user clicks repeatedly until their target is selected.
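
    A rough sketch of that mechanism is below. The rotation period, the sharpness of the timing model, and the golden-ratio phase spacing are illustrative choices, not parameters taken from Nomon itself.

    ```python
    # Illustrative sketch of the clock mechanism (not the released Nomon code).
    import math

    PERIOD = 2.0  # seconds per rotation; an assumed value

    def clock_angle(phase, t):
        """Angle of a clock's hand relative to the red noon line, in radians."""
        return ((t / PERIOD + phase) % 1.0) * 2 * math.pi

    def click_likelihood(phase, click_time, kappa=8.0):
        """Largest when the click lands as this clock's hand passes noon."""
        return math.exp(kappa * math.cos(clock_angle(phase, click_time)))

    def update_posterior(probs, phases, click_time):
        """Bayes update of the option probabilities after one click."""
        post = {o: probs[o] * click_likelihood(phases[o], click_time) for o in probs}
        z = sum(post.values())
        return {o: p / z for o, p in post.items()}

    def respace_phases(probs):
        """Reassign phases so likely options reach noon at well-separated times."""
        order = sorted(probs, key=probs.get, reverse=True)
        return {o: (i * 0.618) % 1.0 for i, o in enumerate(order)}
    ```

    In this toy version, each click multiplies every option's probability by how well the click time matches that option's clock, and the clocks are then re-phased so the leading candidates no longer strike noon together.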

    When used as a keyboard, Nomon’s machine-learning algorithms try to guess the next word based on previous words and each new letter as the user makes selections.
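
    As a toy illustration of how a language model can feed that guessing, the snippet below builds a prior over the next letter from word frequencies consistent with what has been typed so far. The tiny vocabulary is invented for the example and is not Nomon's actual language model.

    ```python
    # Toy prefix-based letter prior; the vocabulary and counts are made up.
    from collections import Counter

    VOCAB = Counter({"the": 500, "there": 120, "quick": 40, "queen": 25, "question": 30})

    def letter_prior(prefix):
        """P(next letter) from vocabulary words consistent with the typed prefix."""
        counts = Counter()
        for word, freq in VOCAB.items():
            if word.startswith(prefix) and len(word) > len(prefix):
                counts[word[len(prefix)]] += freq
        total = sum(counts.values())
        return {ch: c / total for ch, c in counts.items()} if total else {}

    # letter_prior("qu") puts most of its weight on "e" and "i".
    ```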

    Broderick developed a simplified version of Nomon several years ago but decided to revisit it to make the system easier for motor-impaired individuals to use. She enlisted the help of then-undergraduate Bonaker to redesign the interface.

    They first consulted nonprofit organizations that work with motor-impaired individuals, as well as a motor-impaired switch user, to gather feedback on the Nomon design.

    Then they designed a user study that would better represent the abilities of motor-impaired individuals. They wanted to thoroughly vet the system before using much of the valuable time of motor-impaired users, so they first tested it on non-switch users, Broderick explains.

    Switching up the switch

    To gather more representative data, Bonaker devised a webcam-based switch that was harder to use than simply clicking a key. The non-switch users had to lean their bodies to one side of the screen and then back to the other side to register a click.

    “And they have to do this at precisely the right time, so it really slows them down. We did some empirical studies which showed that they were much closer to the response times of motor-impaired individuals,” Broderick says.

    They ran a 10-session user study with 13 non-switch participants and one single-switch user with an advanced form of spinal muscular dystrophy. In the first nine sessions, participants used Nomon and a row-column scanning interface for 20 minutes each to perform text entry, and in the 10th session they used the two systems for a picture selection task.

    Non-switch users typed 15 percent faster using Nomon, while the motor-impaired user typed even faster than the non-switch users. When typing unfamiliar words, the users were 20 percent faster overall and made half as many errors. In their final session, they were able to complete the picture selection task 36 percent faster using Nomon.

    “Nomon is much more forgiving than row-column scanning. With row-column scanning, even if you are just slightly off, now you’ve chosen B instead of A and that’s an error,” Broderick says.

    Adapting to noisy clicks

    With its probabilistic reasoning, Nomon incorporates everything it knows about where a user is likely to click to make the process faster, easier, and less error-prone. For instance, if the user selects “Q,” Nomon will make it as easy as possible for the user to select “U” next.

    Nomon also learns how a user clicks. So, if the user always clicks a little after the clock’s hand strikes noon, the system adapts to that in real time. It also adapts to noisiness. If a user’s click is often off the mark, the system requires extra clicks to ensure accuracy.
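
    One way to picture this adaptation: keep a running model of how far from noon the user's clicks tend to land, and only commit to a selection once the evidence is strong enough, so a noisier clicker naturally needs more clicks. The Gaussian timing model and the confidence threshold below are assumptions made for illustration, not details from the paper.

    ```python
    # Hedged sketch of click-timing adaptation; parameters are illustrative.
    import math

    class ClickModel:
        def __init__(self):
            self.offsets = []  # observed click offsets from noon, in seconds

        def update(self, offset):
            """Record whether this user tends to click early or late."""
            self.offsets.append(offset)

        def mean_and_std(self):
            n = len(self.offsets)
            if n < 2:
                return 0.0, 0.2  # assumed prior until enough clicks are observed
            mean = sum(self.offsets) / n
            var = sum((o - mean) ** 2 for o in self.offsets) / (n - 1)
            return mean, max(math.sqrt(var), 1e-3)

        def likelihood(self, offset):
            """How plausible a click offset is under the learned timing model."""
            mean, std = self.mean_and_std()
            return math.exp(-0.5 * ((offset - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

    def ready_to_commit(posterior, threshold=0.95):
        # A noisier timing model spreads the posterior, so more clicks are
        # needed before any single option crosses the threshold.
        return max(posterior.values()) >= threshold
    ```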

    This probabilistic reasoning makes Nomon powerful but also requires a higher click-load than row-column scanning systems. Clicking multiple times can be a trying task for severely motor-impaired users.

    Broderick hopes to reduce the click-load by incorporating gaze tracking into Nomon, which would give the system more robust information about what a user might choose next based on which part of the screen they are looking at. The researchers also want to find a better way to automatically adjust the clock speeds to help users be more accurate and efficient.

    They are working on a new series of studies in which they plan to partner with more motor-impaired users.

    “So far, the feedback from motor-impaired users has been invaluable to us; we’re very grateful to the motor-impaired user who commented on our initial interface and the separate motor-impaired user who participated in our study. We’re currently extending our study to work with a bigger and more diverse group of our target population. With their help, we’re already making further improvements to our interface and working to better understand the performance of Nomon,” she says.

    “Nonspeaking individuals with motor disabilities are currently not provided with efficient communication solutions for interacting with either speaking partners or computer systems. This ‘communication gap’ is a known unresolved problem in human-computer interaction, and so far there are no good solutions. This paper demonstrates that a highly creative approach underpinned by a statistical model can provide tangible performance gains to the users who need it the most: nonspeaking individuals reliant on a single switch to communicate,” says Per Ola Kristensson, professor of interactive systems engineering at Cambridge University, who was not involved with this research. “The paper also demonstrates the value of complementing insights from computational experiments with the involvement of end-users and other stakeholders in the design process. I find this a highly creative and important paper in an area where it is notoriously difficult to make significant progress.”

    This research was supported, in part, by the Seth Teller Memorial Fund to Advanced Technology for People with Disabilities, a Peter J. Eloranta Summer Undergraduate Research Fellowship, the MIT Quest for Intelligence, and the National Science Foundation.