More stories

  • Robots play with play dough

    The inner child in many of us feels an overwhelming sense of joy when stumbling across a pile of the fluorescent, rubbery mixture of water, salt, and flour that put goo on the map: play dough. (Even if this happens rarely in adulthood.)

    While manipulating play dough is fun and easy for 2-year-olds, the shapeless sludge is hard for robots to handle. Machines have become increasingly reliable with rigid objects, but manipulating soft, deformable objects comes with a laundry list of technical challenges. Most importantly, as with most flexible structures, moving one part is likely to affect everything else.

    Scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Stanford University recently let robots try their hand at playing with the modeling compound, but not for nostalgia’s sake. Their new system learns directly from visual inputs to let a robot with a two-fingered gripper see, simulate, and shape doughy objects. “RoboCraft” could reliably plan a robot’s behavior to pinch and release play dough to make various letters, including ones it had never seen. With just 10 minutes of data, the two-finger gripper rivaled human counterparts that teleoperated the machine, performing on par, and at times even better, on the tested tasks.

    “Modeling and manipulating objects with high degrees of freedom are essential capabilities for robots to learn how to enable complex industrial and household interaction tasks, like stuffing dumplings, rolling sushi, and making pottery,” says Yunzhu Li, CSAIL PhD student and author on a new paper about RoboCraft. “While there’s been recent advances in manipulating clothes and ropes, we found that objects with high plasticity, like dough or plasticine — despite ubiquity in those household and industrial settings — was a largely underexplored territory. With RoboCraft, we learn the dynamics models directly from high-dimensional sensory data, which offers a promising data-driven avenue for us to perform effective planning.” 

    With undefined, smooth material, the whole structure needs to be accounted for before you can do any type of efficient and effective modeling and planning. RoboCraft turns the images into graphs of little particles and, coupled with algorithms, uses a graph neural network as the dynamics model, which lets it make more accurate predictions about how the material changes shape.

    Typically, researchers have used complex physics simulators to model and understand force and dynamics being applied to objects, but RoboCraft simply uses visual data. The inner workings of the system rely on three parts to shape soft material into, say, an “R.”

    The first part — perception — is all about learning to “see.” It uses cameras to collect raw, visual sensor data from the environment, which are then turned into little clouds of particles to represent the shapes. A graph-based neural network then uses said particle data to learn to “simulate” the object’s dynamics, or how it moves. Then, algorithms help plan the robot’s behavior so it learns to “shape” a blob of dough, armed with the training data from the many pinches. While the letters are a bit loose, they’re indubitably representative. 
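
    The step from particle cloud to graph can be sketched as follows. This is a minimal illustration, not RoboCraft's code: the function name `build_particle_graph`, the fixed-radius neighbor rule, and the sample coordinates are all assumptions made for the example. The article says only that images become particles and that a graph-based neural network learns from them.

```python
import math
from itertools import combinations


def build_particle_graph(particles, radius):
    """Return undirected edges (i, j) between particles closer than `radius`.

    `particles` is a list of (x, y, z) tuples. The radius rule is an
    assumption for illustration; a real system may pick neighbors differently.
    """
    edges = []
    for i, j in combinations(range(len(particles)), 2):
        if math.dist(particles[i], particles[j]) <= radius:
            edges.append((i, j))
    return edges


# Hypothetical particle cloud: three nearby particles and one far away.
particles = [
    (0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0),
    (0.0, 0.1, 0.0),
    (1.0, 1.0, 1.0),
]
edges = build_particle_graph(particles, radius=0.15)
print(edges)
```

    A graph neural network would then pass messages along these edges to predict where each particle moves after a pinch. That learned part is omitted here.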

    Besides cutesy shapes, the team is (actually) working on making dumplings from dough and a prepared filling. Right now, with just a two-finger gripper, it’s a big ask. RoboCraft would need additional tools (a baker needs multiple tools to cook; so do robots): a rolling pin, a stamp, and a mold.

    Further in the future, the scientists envision using RoboCraft to assist with household tasks and chores, which could be of particular help to the elderly or those with limited mobility. To accomplish this, given the many obstructions that could arise, a much more adaptive representation of the dough or item would be needed, as well as exploration into what class of models might be suitable to capture the underlying structural systems.

    “RoboCraft essentially demonstrates that this predictive model can be learned in very data-efficient ways to plan motion. In the long run, we are thinking about using various tools to manipulate materials,” says Li. “If you think about dumpling or dough making, just one gripper wouldn’t be able to solve it. Helping the model understand and accomplish longer-horizon planning tasks, such as, how the dough will deform given the current tool, movements and actions, is a next step for future work.” 

    Li wrote the paper alongside Haochen Shi, Stanford master’s student; Huazhe Xu, Stanford postdoc; Zhiao Huang, PhD student at the University of California at San Diego; and Jiajun Wu, assistant professor at Stanford. They will present the research at the Robotics: Science and Systems conference in New York City. The work is in part supported by the Stanford Institute for Human-Centered AI (HAI), the Samsung Global Research Outreach (GRO) Program, the Toyota Research Institute (TRI), and Amazon, Autodesk, Salesforce, and Bosch.

  • Researchers release open-source photorealistic simulator for autonomous driving

    Hyper-realistic virtual worlds have been heralded as the best driving schools for autonomous vehicles (AVs), since they’ve proven fruitful test beds for safely trying out dangerous driving scenarios. Tesla, Waymo, and other self-driving companies all rely heavily on data to enable expensive and proprietary photorealistic simulators, since nuanced I-almost-crashed data usually isn’t easy or desirable to recreate.

    To that end, scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) created “VISTA 2.0,” a data-driven simulation engine where vehicles can learn to drive in the real world and recover from near-crash scenarios. What’s more, all of the code is being open-sourced to the public. 

    “Today, only companies have software like the type of simulation environments and capabilities of VISTA 2.0, and this software is proprietary. With this release, the research community will have access to a powerful new tool for accelerating the research and development of adaptive robust control for autonomous driving,” says MIT Professor and CSAIL Director Daniela Rus, senior author on a paper about the research. 

    VISTA is a data-driven, photorealistic simulator for autonomous driving. It can simulate not just live video but LiDAR data and event cameras, and also incorporate other simulated vehicles to model complex driving situations. VISTA is open source and the code can be found below.

    VISTA 2.0 builds off of the team’s previous model, VISTA, and it’s fundamentally different from existing AV simulators since it’s data-driven (meaning it was built and photorealistically rendered from real-world data), thereby enabling direct transfer to reality. While the initial iteration supported only single-car lane following with one camera sensor, achieving high-fidelity data-driven simulation required rethinking the foundations of how different sensors and behavioral interactions can be synthesized.

    Enter VISTA 2.0: a data-driven system that can simulate complex sensor types and massively interactive scenarios and intersections at scale. With much less data than previous models, the team was able to train autonomous vehicles that could be substantially more robust than those trained on large amounts of real-world data. 

    “This is a massive jump in capabilities of data-driven simulation for autonomous vehicles, as well as the increase of scale and ability to handle greater driving complexity,” says Alexander Amini, CSAIL PhD student and co-lead author on two new papers, together with fellow PhD student Tsun-Hsuan Wang. “VISTA 2.0 demonstrates the ability to simulate sensor data far beyond 2D RGB cameras, but also extremely high dimensional 3D lidars with millions of points, irregularly timed event-based cameras, and even interactive and dynamic scenarios with other vehicles as well.” 

    The team was able to scale the complexity of the interactive driving tasks for things like overtaking, following, and negotiating, including multiagent scenarios in highly photorealistic environments. 

    Training AI models for autonomous vehicles involves hard-to-secure fodder of different varieties of edge cases and strange, dangerous scenarios, because most of our data (thankfully) is just run-of-the-mill, day-to-day driving. Logically, we can’t crash into other cars just to teach a neural network how to not crash into other cars.

    Recently, there’s been a shift away from more classic, human-designed simulation environments to those built up from real-world data. The latter have immense photorealism, but the former can easily model virtual cameras and lidars. With this paradigm shift, a key question has emerged: Can the richness and complexity of all of the sensors that autonomous vehicles need, such as lidar and event-based cameras that are more sparse, accurately be synthesized? 

    Lidar sensor data is much harder to interpret in a data-driven world — you’re effectively trying to generate brand-new 3D point clouds with millions of points, only from sparse views of the world. To synthesize 3D lidar point clouds, the team used the data that the car collected, projected it into a 3D space coming from the lidar data, and then let a new virtual vehicle drive around locally from where that original vehicle was. Finally, they projected all of that sensory information back into the frame of view of this new virtual vehicle, with the help of neural networks. 
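
    The geometric core of that reprojection, re-expressing recorded 3D points in the frame of a virtual vehicle at a nearby pose, can be sketched as below. This is an assumption-laden illustration rather than VISTA 2.0's code: the function name `to_virtual_frame`, the flat-ground pose (a 2D position plus yaw), and the sample numbers are hypothetical, and the neural-network step the article mentions is left out.

```python
import math


def to_virtual_frame(points, vehicle_xy, vehicle_yaw):
    """Express world-frame points (x, y, z) in a virtual vehicle's frame.

    The vehicle sits at `vehicle_xy` with heading `vehicle_yaw` in radians.
    In the returned frame, +x points ahead of the vehicle and +y to its left.
    Assumes flat ground, so z is passed through unchanged.
    """
    cos_y, sin_y = math.cos(vehicle_yaw), math.sin(vehicle_yaw)
    transformed = []
    for x, y, z in points:
        dx, dy = x - vehicle_xy[0], y - vehicle_xy[1]
        # Rotate the offset by -yaw to undo the vehicle's heading.
        transformed.append((cos_y * dx + sin_y * dy, -sin_y * dx + cos_y * dy, z))
    return transformed


# Hypothetical example: a virtual vehicle at (1, 0) facing along world +y.
world_points = [(1.0, 2.0, 0.5)]
local_points = to_virtual_frame(world_points, vehicle_xy=(1.0, 0.0), vehicle_yaw=math.pi / 2)
print(local_points)
```

    In this example the point ends up about two meters straight ahead of the virtual vehicle, which is where a lidar mounted on that vehicle would see it.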

    Together with the simulation of event-based cameras, which operate at speeds greater than thousands of events per second, the simulator was capable of not only simulating this multimodal information, but also doing so all in real time — making it possible to train neural nets offline, but also test online on the car in augmented reality setups for safe evaluations. “The question of if multisensor simulation at this scale of complexity and photorealism was possible in the realm of data-driven simulation was very much an open question,” says Amini. 

    With that, the driving school becomes a party. In the simulation, you can move around, have different types of controllers, simulate different types of events, create interactive scenarios, and just drop in brand new vehicles that weren’t even in the original data. They tested for lane following, lane turning, car following, and more dicey scenarios like static and dynamic overtaking (seeing obstacles and moving around so you don’t collide). With multi-agent support, both real and simulated agents interact, and new agents can be dropped into the scene and controlled any which way.

    Taking their full-scale car out into the “wild” (a.k.a. Devens, Massachusetts), the team saw immediate transferability of results, with both failures and successes. They were also able to demonstrate the bodacious, magic word of self-driving car models: “robust.” They showed that AVs, trained entirely in VISTA 2.0, were so robust in the real world that they could handle that elusive tail of challenging failures.

    Now, one guardrail humans rely on that can’t yet be simulated is human emotion. Think of the friendly wave, nod, or blinker switch of acknowledgement: these are the kinds of nuances the team wants to implement in future work.

    “The central algorithm of this research is how we can take a dataset and build a completely synthetic world for learning and autonomy,” says Amini. “It’s a platform that I believe one day could extend in many different axes across robotics. Not just autonomous driving, but many areas that rely on vision and complex behaviors. We’re excited to release VISTA 2.0 to help enable the community to collect their own datasets and convert them into virtual worlds where they can directly simulate their own virtual autonomous vehicles, drive around these virtual terrains, train autonomous vehicles in these worlds, and then can directly transfer them to full-sized, real self-driving cars.” 

    Amini and Wang wrote the paper alongside Zhijian Liu, MIT CSAIL PhD student; Igor Gilitschenski, assistant professor in computer science at the University of Toronto; Wilko Schwarting, AI research scientist and MIT CSAIL PhD ’20; Song Han, associate professor at MIT’s Department of Electrical Engineering and Computer Science; Sertac Karaman, associate professor of aeronautics and astronautics at MIT; and Daniela Rus, MIT professor and CSAIL director. The researchers presented the work at the IEEE International Conference on Robotics and Automation (ICRA) in Philadelphia. 

    This work was supported by the National Science Foundation and Toyota Research Institute. The team acknowledges the support of NVIDIA with the donation of the Drive AGX Pegasus.

  • Companies use MIT research to identify and respond to supply chain risks

    In February 2020, MIT professor David Simchi-Levi predicted the future. In an article in Harvard Business Review, he and his colleague warned that the new coronavirus outbreak would throttle supply chains and shutter tens of thousands of businesses across North America and Europe by mid-March.

    For Simchi-Levi, who had developed new models of supply chain resiliency and advised major companies on how to best shield themselves from supply chain woes, the signs of disruption were plain to see. Two years later, the professor of engineering systems at the MIT Schwarzman College of Computing and the Department of Civil and Environmental Engineering, and director of the MIT Data Science Lab has found a “flood of interest” from companies anxious to apply his Risk Exposure Index (REI) research to identify and respond to hidden risks in their own supply chains.

    His work on “stress tests” for critical supply chains and ways to guide global supply chain recovery were included in the 2022 Economic Report of the President presented to the U.S. Congress in April.

    It is rare that data science research can influence policy at the highest levels, Simchi-Levi says, but his models address something that business needs now: a way to plan for a new world of continuing global crisis without relying on historical precedent.

    “What the last two years showed is that you cannot plan just based on what happened last year or the last two years,” Simchi-Levi says.

    He recalled the famous quote, sometimes attributed to hockey great Wayne Gretzky, that good players don’t skate to where the puck is, but where the puck is going to be. “We are not focusing on the state of the supply chain right now, but what may happen six weeks from now, eight weeks from now, to prepare ourselves today to prevent the problems of the future.”

    Finding hidden risks

    At the heart of REI is a mathematical model of the supply chain that focuses on potential failures at different supply chain nodes, such as a flood at a supplier’s factory or a shortage of raw materials at another factory. By calculating variables such as time-to-recover (TTR), which measures how long it will take a particular node to be back at full function, and time-to-survive (TTS), which identifies the maximum duration that the supply chain can match supply with demand after a disruption, the model focuses on the impact of disruption on the supply chain, rather than the cause of disruption.
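
    One simple way to read the two variables together is to flag any node whose time-to-recover exceeds its time-to-survive, since the chain would run short before that node is back. The sketch below illustrates only that comparison. It is not the REI model itself, and the `Node` class, the `exposed_nodes` helper, and every number in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    ttr_weeks: float  # time-to-recover: weeks until the node is back at full function
    tts_weeks: float  # time-to-survive: weeks supply can still match demand


def exposed_nodes(nodes):
    """Return nodes where TTR exceeds TTS, ordered by the size of the gap."""
    risky = [n for n in nodes if n.ttr_weeks > n.tts_weeks]
    return sorted(risky, key=lambda n: n.ttr_weeks - n.tts_weeks, reverse=True)


# Made-up figures for illustration only.
nodes = [
    Node("strategic supplier", ttr_weeks=4, tts_weeks=6),
    Node("ten-cent part supplier", ttr_weeks=10, tts_weeks=2),
    Node("raw materials plant", ttr_weeks=5, tts_weeks=3),
]
names = [n.name for n in exposed_nodes(nodes)]
print(names)
```

    With these invented numbers the small supplier tops the list, because its recovery would take eight weeks longer than the chain could cover.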

    Even before the pandemic, catastrophic events such as the 2010 Iceland volcanic eruption and the 2011 Tohoku earthquake and tsunami in Japan were threatening these nodes. “For many years, companies from a variety of industries focused mostly on efficiency, cutting costs as much as possible, using strategies like outsourcing and offshoring,” Simchi-Levi says. “They were very successful doing this, but it has dramatically increased their exposure to risk.”

    Using their model, Simchi-Levi and colleagues began working with Ford Motor Company in 2013 to improve the company’s supply chain resiliency. The partnership uncovered some surprising hidden risks.

    To begin with, the researchers found out that Ford’s “strategic suppliers” (the nodes of the supply chain where the company spent large amounts of money each year) had only moderate exposure to risk. Instead, the biggest risk “tended to come from tiny suppliers that provide Ford with components that cost about 10 cents,” says Simchi-Levi.

    The analysis also found that risky suppliers are everywhere across the globe. “There is this idea that if you just move suppliers closer to market, to demand, to North America or to Mexico, you increase the resiliency of your supply chain. That is not supported by our data,” he says.

    Rewards of resiliency

    By creating a virtual representation, or “digital twin,” of the Ford supply chain, the researchers were able to test out strategies at each node to see what would increase supply chain resiliency. Should the company invest in more warehouses to store a key component? Should it shift production of a component to another factory?

    Companies are sometimes reluctant to invest in supply chain resiliency, Simchi-Levi says, but the analysis isn’t just about risk. “It’s also going to help you identify savings opportunities. The company may be building a lot of misplaced, costly inventory, for instance, and our method helps them to identify these inefficiencies and cut costs.”

    Since working with Ford, Simchi-Levi and colleagues have collaborated with many other companies, including a partnership with Accenture, to scale the REI technology to a variety of industries including high-tech, industrial equipment, home improvement retailers, fashion retailers, and consumer packaged goods.

    Annette Clayton, the CEO of Schneider Electric North America and previously its chief supply chain officer, has worked with Simchi-Levi for 17 years. “When I first went to work for Schneider, I asked David and his team to help us look at resiliency and inventory positioning in order to make the best cost, delivery, flexibility, and speed trade-offs for the North American supply chain,” she says. “As the pandemic unfolded, the very learnings in supply chain resiliency we had worked on before became even more important and we partnered with David and his team again.”

    “We have used TTR and TTS to determine places where we need to develop and duplicate supplier capability, from raw materials to assembled parts. We increased inventories where our time-to-recover because of extended logistics times exceeded our time-to-survive,” Clayton adds. “We have used TTR and TTS to prioritize our workload in supplier development, procurement and expanding our own manufacturing capacity.”

    The REI approach can even be applied to an entire country’s economy, as the U.N. Office for Disaster Risk Reduction has done for developing countries such as Thailand in the wake of disastrous flooding in 2011.

    Simchi-Levi and colleagues have been motivated by the pandemic to enhance the REI model with new features. “Because we have started collaborating with more companies, we have realized some interesting, company-specific business constraints,” he says, which are leading to more efficient ways of calculating hidden risk.

  • New CRISPR-based map ties every human gene to its function

    The Human Genome Project was an ambitious initiative to sequence every piece of human DNA. The project drew together collaborators from research institutions around the world, including MIT’s Whitehead Institute for Biomedical Research, and was finally completed in 2003. Now, nearly two decades later, MIT Professor Jonathan Weissman and colleagues have gone beyond the sequence to present the first comprehensive functional map of genes that are expressed in human cells. The data from this project, published online June 9 in Cell, ties each gene to its job in the cell, and is the culmination of years of collaboration on the single-cell sequencing method Perturb-seq.

    The data are available for other scientists to use. “It’s a big resource in the way the human genome is a big resource, in that you can go in and do discovery-based research,” says Weissman, who is also a member of the Whitehead Institute and an investigator with the Howard Hughes Medical Institute. “Rather than defining ahead of time what biology you’re going to be looking at, you have this map of the genotype-phenotype relationships and you can go in and screen the database without having to do any experiments.”

    The screen allowed the researchers to delve into diverse biological questions. They used it to explore the cellular effects of genes with unknown functions, to investigate the response of mitochondria to stress, and to screen for genes that cause chromosomes to be lost or gained, a phenotype that has proved difficult to study in the past. “I think this dataset is going to enable all sorts of analyses that we haven’t even thought up yet by people who come from other parts of biology, and suddenly they just have this available to draw on,” says former Weissman Lab postdoc Tom Norman, a co-senior author of the paper.

    Pioneering Perturb-seq

    The project takes advantage of the Perturb-seq approach, which makes it possible to follow the impact of turning genes on or off with unprecedented depth. This method was first published in 2016 by a group of researchers including Weissman and fellow MIT professor Aviv Regev, but could only be used on small sets of genes and at great expense.

    The massive Perturb-seq map was made possible by foundational work from Joseph Replogle, an MD-PhD student in Weissman’s lab and co-first author of the present paper. Replogle, in collaboration with Norman, who now leads a lab at Memorial Sloan Kettering Cancer Center; Britt Adamson, an assistant professor in the Department of Molecular Biology at Princeton University; and a group at 10x Genomics, set out to create a new version of Perturb-seq that could be scaled up. The researchers published a proof-of-concept paper in Nature Biotechnology in 2020. 

    The Perturb-seq method uses CRISPR-Cas9 genome editing to introduce genetic changes into cells, and then uses single-cell RNA sequencing to capture information about the RNAs that are expressed as a result of a given genetic change. Because RNAs control all aspects of how cells behave, this method can help decode the many cellular effects of genetic changes.

    Since their initial proof-of-concept paper, Weissman, Regev, and others have used this sequencing method on smaller scales. For example, the researchers used Perturb-seq in 2021 to explore how human and viral genes interact over the course of an infection with HCMV, a common herpesvirus.

    In the new study, Replogle and collaborators including Reuben Saunders, a graduate student in Weissman’s lab and co-first author of the paper, scaled up the method to the entire genome. Using human blood cancer cell lines as well as noncancerous cells derived from the retina, the team performed Perturb-seq across more than 2.5 million cells, and used the data to build a comprehensive map tying genotypes to phenotypes.

    Delving into the data

    Upon completing the screen, the researchers decided to put their new dataset to use and examine a few biological questions. “The advantage of Perturb-seq is it lets you get a big dataset in an unbiased way,” says Tom Norman. “No one knows entirely what the limits are of what you can get out of that kind of dataset. Now, the question is, what do you actually do with it?”

    The first, most obvious application was to look into genes with unknown functions. Because the screen also read out phenotypes of many known genes, the researchers could use the data to compare unknown genes to known ones and look for similar transcriptional outcomes, which could suggest the gene products worked together as part of a larger complex.
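
    The comparison described above, matching an unknown gene to known genes with similar transcriptional outcomes, can be sketched with a plain correlation between expression-change profiles. This is a toy illustration under stated assumptions, not the study's analysis pipeline: the gene labels, the four-value profiles, and the choice of Pearson correlation as the similarity measure are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def most_similar(unknown_profile, known_profiles):
    """Return the known gene whose profile correlates best with the unknown one."""
    return max(known_profiles, key=lambda gene: pearson(unknown_profile, known_profiles[gene]))


# Made-up profiles: each list holds the expression change of four readout
# genes after the named gene is perturbed.
known_profiles = {
    "KNOWN_GENE_A": [1.0, -0.5, 0.2, 0.0],
    "KNOWN_GENE_B": [-0.8, 0.9, 0.1, -0.3],
}
unknown_profile = [0.9, -0.4, 0.3, 0.1]
best_match = most_similar(unknown_profile, known_profiles)
print(best_match)
```

    A close match is only a hint that the two gene products may work together, which is why a candidate such as a new complex member still needs experimental confirmation.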

    The mutation of one gene called C7orf26 in particular stood out. Researchers noticed that genes whose removal led to a similar phenotype were part of a protein complex called Integrator that played a role in creating small nuclear RNAs. The Integrator complex is made up of many smaller subunits — previous studies had suggested 14 individual proteins — and the researchers were able to confirm that C7orf26 made up a 15th component of the complex.

    They also discovered that the 15 subunits worked together in smaller modules to perform specific functions within the Integrator complex. “Absent this thousand-foot-high view of the situation, it was not so clear that these different modules were so functionally distinct,” says Saunders.

    Another perk of Perturb-seq is that because the assay focuses on single cells, the researchers could use the data to look at more complex phenotypes that become muddied when they are studied together with data from other cells. “We often take all the cells where ‘gene X’ is knocked down and average them together to look at how they changed,” Weissman says. “But sometimes when you knock down a gene, different cells that are losing that same gene behave differently, and that behavior may be missed by the average.”
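
    The point about averaging can be shown with a few numbers. In the toy example below, the values are invented: cells losing the same gene respond in two opposite ways, so the average change looks like nothing happened while the per-cell spread reveals the effect.

```python
from statistics import mean, pstdev

# Hypothetical per-cell expression changes for one readout gene after the
# same knockdown: half the cells go up, half go down.
knockdown_cells = [2.0, -2.0, 2.1, -1.9, 1.9, -2.1]

average_change = mean(knockdown_cells)
cell_to_cell_spread = pstdev(knockdown_cells)

print(f"average change: {average_change:.2f}")
print(f"cell-to-cell spread: {cell_to_cell_spread:.2f}")
```

    A bulk measurement would report only the first number. A single-cell readout keeps the individual values, so the split into two groups stays visible.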

    The researchers found that a subset of genes whose removal led to different outcomes from cell to cell were responsible for chromosome segregation. Their removal was causing cells to lose a chromosome or pick up an extra one, a condition known as aneuploidy. “You couldn’t predict what the transcriptional response to losing this gene was because it depended on the secondary effect of what chromosome you gained or lost,” Weissman says. “We realized we could then turn this around and create this composite phenotype looking for signatures of chromosomes being gained and lost. In this way, we’ve done the first genome-wide screen for factors that are required for the correct segregation of DNA.”

    “I think the aneuploidy study is the most interesting application of this data so far,” Norman says. “It captures a phenotype that you can only get using a single-cell readout. You can’t go after it any other way.”

    The researchers also used their dataset to study how mitochondria responded to stress. Mitochondria, which evolved from free-living bacteria, carry 13 genes in their genomes. Within the nuclear DNA, around 1,000 genes are somehow related to mitochondrial function. “People have been interested for a long time in how nuclear and mitochondrial DNA are coordinated and regulated in different cellular conditions, especially when a cell is stressed,” Replogle says.

    The researchers found that when they perturbed different mitochondria-related genes, the nuclear genome responded similarly to many different genetic changes. However, the mitochondrial genome responses were much more variable. 

    “There’s still an open question of why mitochondria still have their own DNA,” said Replogle. “A big-picture takeaway from our work is that one benefit of having a separate mitochondrial genome might be having localized or very specific genetic regulation in response to different stressors.”

    “If you have one mitochondria that’s broken, and another one that is broken in a different way, those mitochondria could be responding differentially,” Weissman says.

    In the future, the researchers hope to use Perturb-seq on different types of cells besides the cancer cell line they started in. They also hope to continue to explore their map of gene functions, and hope others will do the same. “This really is the culmination of many years of work by the authors and other collaborators, and I’m really pleased to see it continue to succeed and expand,” says Norman.

  • Hallucinating to better text translation

    As babies, we babble and imitate our way to learning languages. We don’t start off reading raw text, which requires fundamental knowledge and understanding about the world, as well as the advanced ability to interpret and infer descriptions and relationships. Rather, humans begin our language journey slowly, by pointing and interacting with our environment, basing our words and perceiving their meaning through the context of the physical and social world. Eventually, we can craft full sentences to communicate complex ideas.

    Similarly, when humans begin learning and translating into another language, the incorporation of other sensory information, like multimedia, paired with the new and unfamiliar words, like flashcards with images, improves language acquisition and retention. Then, with enough practice, humans can accurately translate new, unseen sentences in context without the accompanying media; however, imagining a picture based on the original text helps.

    This is the basis of a new machine learning model, called VALHALLA, by researchers from MIT, IBM, and the University of California at San Diego, in which a trained neural network sees a source sentence in one language, hallucinates an image of what it looks like, and then uses both to translate into a target language. The team found that their method demonstrates improved accuracy of machine translation over text-only translation. Further, it provided an additional boost for cases with long sentences, under-resourced languages, and instances where part of the source sentence is inaccessible to the machine translator.

    As a core task within the AI field of natural language processing (NLP), machine translation is an “eminently practical technology that’s being used by millions of people every day,” says study co-author Yoon Kim, assistant professor in MIT’s Department of Electrical Engineering and Computer Science with affiliations in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT-IBM Watson AI Lab. With recent, significant advances in deep learning, “there’s been an interesting development in how one might use non-text information — for example, images, audio, or other grounding information — to tackle practical tasks involving language,” says Kim, because “when humans are performing language processing tasks, we’re doing so within a grounded, situated world.” The pairing of hallucinated images and text during inference, the team postulated, imitates that process, providing context for improved performance over current state-of-the-art techniques, which utilize text-only data.

    This research will be presented at the IEEE / CVF Computer Vision and Pattern Recognition Conference this month. Kim’s co-authors are UC San Diego graduate student Yi Li and Professor Nuno Vasconcelos, along with research staff members Rameswar Panda, Chun-fu “Richard” Chen, Rogerio Feris, and IBM Director David Cox of IBM Research and the MIT-IBM Watson AI Lab.

    Learning to hallucinate from images

    When we learn new languages and learn to translate, we’re often provided with examples and practice before venturing out on our own. The same is true for machine-translation systems; however, if images are used during training, these AI methods also require visual aids for testing, limiting their applicability, says Panda.

    “In real-world scenarios, you might not have an image with respect to the source sentence. So, our motivation was basically: Instead of using an external image during inference as input, can we use visual hallucination — the ability to imagine visual scenes — to improve machine translation systems?” says Panda.

    To do this, the team used an encoder-decoder architecture with two transformers, a type of neural network model that’s suited for sequence-dependent data, like language, and that can pay attention to key words and semantics of a sentence. One transformer generates a visual hallucination, and the other performs multimodal translation using outputs from the first transformer.

    During training, there are two streams of translation: a source sentence and a ground-truth image that is paired with it, and the same source sentence that is visually hallucinated to make a text-image pair. First the ground-truth image and sentence are tokenized into representations that can be handled by transformers; for the case of the sentence, each word is a token. The source sentence is tokenized again, but this time passed through the visual hallucination transformer, outputting a hallucination, a discrete image representation of the sentence. The researchers incorporated an autoregression that compares the ground-truth and hallucinated representations for congruency — e.g., homonyms: a reference to an animal “bat” isn’t hallucinated as a baseball bat. The hallucination transformer then uses the difference between them to optimize its predictions and visual output, making sure the context is consistent.

    The two sets of tokens are then simultaneously passed through the multimodal translation transformer, each containing the sentence representation and either the hallucinated or ground-truth image. The tokenized text translation outputs are compared with the goal of being similar to each other and to the target sentence in another language. Any differences are then relayed back to the translation transformer for further optimization.
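
    The training signals described above can be sketched as a single combined objective. The following is a minimal illustration, not the team's implementation: the function names, the use of KL divergence as the "similar to each other" term, and the two weighting parameters are all assumptions made for the example, and the hallucination loss is treated as a number computed elsewhere.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct target token."""
    return -math.log(probs[target_index])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def training_loss(p_ground_truth, p_hallucinated, target_index,
                  hallucination_loss, consistency_weight=1.0,
                  hallucination_weight=1.0):
    """Combine the loss terms for one target token (hypothetical sketch).

    p_ground_truth: token distribution from the stream that saw the real image.
    p_hallucinated: token distribution from the stream that saw the hallucination.
    hallucination_loss: mismatch between real and hallucinated image tokens,
        assumed to be computed elsewhere.
    """
    # Both streams should match the target sentence.
    translation = (cross_entropy(p_ground_truth, target_index)
                   + cross_entropy(p_hallucinated, target_index))
    # The two streams should also agree with each other.
    consistency = kl_divergence(p_ground_truth, p_hallucinated)
    return (translation
            + consistency_weight * consistency
            + hallucination_weight * hallucination_loss)


# Toy example over a three-word target vocabulary; index 0 is the correct word.
loss = training_loss([0.7, 0.2, 0.1], [0.6, 0.3, 0.1], target_index=0,
                     hallucination_loss=0.5)
print(round(loss, 3))
```

    When the two streams produce identical distributions, the consistency term vanishes and only the translation and hallucination terms remain.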

    For testing, the ground-truth image stream drops off, since images likely wouldn’t be available in everyday scenarios.

    “To the best of our knowledge, we haven’t seen any work which actually uses a hallucination transformer jointly with a multimodal translation system to improve machine translation performance,” says Panda.

    Visualizing the target text

    To test their method, the team put VALHALLA up against other state-of-the-art multimodal and text-only translation methods. They used public benchmark datasets containing ground-truth images with source sentences, and a dataset for translating text-only news articles. The researchers measured its performance over 13 tasks, ranging from translation on well-resourced languages (like English, German, and French), under-resourced languages (like English to Romanian) and non-English (like Spanish to French). The group also tested varying transformer model sizes, how accuracy changes with the sentence length, and translation under limited textual context, where portions of the text were hidden from the machine translators.

    The team observed significant improvements over text-only translation methods, including better data efficiency, and found that smaller models performed better than the larger base model. As sentences became longer, VALHALLA’s advantage over other methods grew, which the researchers attributed to the addition of more ambiguous words. In cases where part of the sentence was masked, VALHALLA could recover and translate the original text, which the team found surprising.

    Further unexpected findings arose: “Where there weren’t as many training [image and] text pairs, [like for under-resourced languages], improvements were more significant, which indicates that grounding in images helps in low-data regimes,” says Kim. “Another thing that was quite surprising to me was this improved performance, even on types of text that aren’t necessarily easily connectable to images. For example, maybe it’s not so surprising if this helps in translating visually salient sentences, like the ‘there is a red car in front of the house.’ [However], even in text-only [news article] domains, the approach was able to improve upon text-only systems.”

    While VALHALLA performs well, the researchers note that it does have limitations: It requires pairs of sentences to be annotated with an image, which could make training data more expensive to obtain. It also performs better in its grounded domain than on the text-only news articles. Moreover, Kim and Panda note, a technique like VALHALLA is still a black box, with the assumption that hallucinated images are providing helpful information, and the team plans to investigate what and how the model is learning in order to validate their methods.

    In the future, the team plans to explore other means of improving translation. “Here, we only focus on images, but there are other types of multimodal information — for example, speech, video or touch, or other sensory modalities,” says Panda. “We believe such multimodal grounding can lead to even more efficient machine translation models, potentially benefiting translation across many low-resource languages spoken in the world.”

    This research was supported, in part, by the MIT-IBM Watson AI Lab and the National Science Foundation.

  • in

    Making data visualization more accessible for blind and low-vision individuals

    Data visualizations on the web are largely inaccessible for blind and low-vision individuals who use screen readers, an assistive technology that reads on-screen elements as text-to-speech. This excludes millions of people from the opportunity to probe and interpret insights that are often presented through charts, such as election results, health statistics, and economic indicators. 

    When a designer attempts to make a visualization accessible, best practices call for including a few sentences of text that describe the chart and a link to the underlying data table — a far cry from the rich reading experience available to sighted users.

    An interdisciplinary team of researchers from MIT and elsewhere is striving to create screen-reader-friendly data visualizations that offer a similarly rich experience. They prototyped several visualization structures that provide text descriptions at varying levels of detail, enabling a screen-reader user to drill down from high-level data to more detailed information using just a few keystrokes.

    The MIT team embarked on an iterative co-design process with collaborator Daniel Hajas, a researcher at University College London who works with the Global Disability Innovation Hub and lost his sight at age 16. They collaborated to develop prototypes and ran a detailed user study with blind and low-vision individuals to gather feedback.

    “Researchers might see some connections between problems and be aware of potential solutions, but very often they miss it by a little bit. Insights from people who have the lived experience of a certain specific, measurable problem are really important for a lot of disability-related solutions. I think we found a really nice fit,” says Hajas.

    They created a framework to help designers think systematically about how to develop accessible visualizations. In the future, they plan to use their prototypes and design framework to build a user-friendly tool that could convert visualizations into accessible formats.

    MIT collaborators include co-lead authors and Computer Science and Artificial Intelligence Laboratory (CSAIL) graduate students Jonathan Zong, Crystal Lee, and Alan Lundgard, as well as JiWoong Jang, an undergraduate at Carnegie Mellon University who worked on this project during MIT’s Summer Research Program (MSRP), and senior author Arvind Satyanarayan, assistant professor of computer science who leads the Visualization Group in CSAIL. The research paper, which will be presented at the Eurographics Conference on Visualization, won a best paper honorable mention award.

    “Push what is possible”

    The researchers defined three design dimensions as key to making accessible visualizations: structure, navigation, and description. Structure involves arranging the information into a hierarchy. Navigation refers to how the user moves through different levels of detail. Description is how the information is spoken, including how much information is conveyed.

    Using these design dimensions, they developed several visualization prototypes that emphasized ease-of-navigation for screen-reader users. One prototype, known as multiview, enabled individuals to use the up and down arrows to navigate between different levels of information (like the chart title as the top level, the legend as the second level, etc.), and the right and left arrow keys to cycle through information on the same level (such as adjacent scatterplots). Another prototype, known as target, included the same arrow key navigation but also a drop-down menu of key chart locations so the user could quickly jump to an area of interest.
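
    The arrow-key navigation described above amounts to moving a cursor through a tree of descriptions. The sketch below is a hypothetical illustration rather than the researchers' prototype: the `Node` and `Cursor` names, the example chart contents, and the choice to wrap around at the end of a level are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity so sibling lookup is unambiguous
class Node:
    description: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, description):
        """Attach a child one level below this node and return it."""
        child = Node(description, parent=self)
        self.children.append(child)
        return child


class Cursor:
    """Tracks the node a screen reader would currently announce."""

    def __init__(self, root):
        self.node = root

    def down(self):
        # Down arrow: move to the first item on the next, more detailed level.
        if self.node.children:
            self.node = self.node.children[0]
        return self.node.description

    def up(self):
        # Up arrow: return to the less detailed level above.
        if self.node.parent is not None:
            self.node = self.node.parent
        return self.node.description

    def _sideways(self, step):
        parent = self.node.parent
        if parent is not None:
            siblings = parent.children
            index = (siblings.index(self.node) + step) % len(siblings)
            self.node = siblings[index]
        return self.node.description

    def right(self):
        return self._sideways(+1)

    def left(self):
        return self._sideways(-1)


# Hypothetical chart: the title is the top level, with the legend and axes below it.
chart = Node("Chart title")
legend = chart.add("Legend")
chart.add("X axis")
chart.add("Y axis")
legend.add("Series A")
legend.add("Series B")

cursor = Cursor(chart)
print(cursor.down())   # Legend
print(cursor.right())  # X axis
```

    A jump menu like the one in the target prototype could be layered on top by keeping a dictionary from location names to nodes and assigning `cursor.node` directly.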

    “Our goal is not just to work within existing standards to make them serviceable. We really set out to do grounded speculation and imagine where we can push what is possible with these existing standards. We didn’t want to limit ourselves to refitting tools that were designed for images,” says Zong.

    They tested these prototypes and an accessible data table, the existing best practice for accessible visualizations, with 13 blind and visually impaired screen-reader users. They asked users to rate each tool on several criteria, including how easy it was to learn and how easy it was to locate data or answer questions.

    “One thing I thought was really interesting was how much people were constantly testing their own hypotheses or trying to make specific patterns as they moved through the visualization. The implication for navigation is that you want to be able to orient yourself within the visualization so you know where the limits are,” says Lee. “Can you accurately and easily know where the walls are in the room you are exploring?”

    Improved insights

    Users said both prototypes enabled them to more rapidly identify patterns in the data. Scrolling from a high level to deeper levels of information helped them gain insights more easily than when browsing the data table, they said. They also enjoyed faster navigation using the menu in the target prototype.

    But the data table got top marks for ease of use.

    “I expected people to be disappointed with the everyday tools when compared to the new prototypes, but they still clung to the data table a bit, likely because of their familiarity with it. That shows that principles like familiarity, learnability, and usability still matter. No matter how ‘good’ our new invention is, if it is not easy enough to learn, people might stick with an older version,” Hajas says.

    Drawing on these insights, the researchers are refining the prototypes and using them to build a software package that can be used with existing design tools to give visualizations an accessible, navigable structure.

    They also want to explore multimodal solutions. Some study participants used different devices together, like screen readers and braille displays, or data sonification tools that convey information using non-speech audio. How these tools can complement each other when applied to a visualization is still an open question, Zong says.

    In the long run, they hope their work might lead to careful rethinking of web accessibility standards.

    “There is no one-size-fits-all solution for accessibility. While existing standards don’t presume that, they only offer simple approaches, like data tables and alt text. One of the key benefits of our research contribution is that we are proposing a framework — different preferences and data representations are situated at different points in this design space,” says Lundgard.

    “We have been working hard toward reducing the inequities that screen-reader users face when extracting information from online data visualizations for the past few years. So, we are really appreciative of this work and the knowledge that it adds to the existing literature,” says Ather Sharif, a graduate student who researches accessibility and visualization in the labs of professors Jacob Wobbrock and Katharina Reinecke at the Paul G. Allen School of Computer Science and Engineering of the University of Washington at Seattle, and who was not involved with this work.

    “I like to think of it as a movement where we’re all finally coming together and improving the experiences of a demographic that has been largely ignored, especially when presenting data through visualizations. Kudos to Jonathan, Arvind, and their team for this insightful and timely work! I am looking forward to what’s next,” adds Sharif, who is lead author of several recent papers related to accessible data visualizations.

    Amy Bower, a senior scientist in the Department of Physical Oceanography at the Woods Hole Oceanographic Institution who suffers from a degenerative retinal disease and uses a screen reader extensively in her work as a researcher and also for basic living tasks, found the researchers’ explanations of the importance of co-design to be powerful and compelling.  

    “As a blind scientist, I’m constantly searching for effective tools that will allow me to access the information conveyed in data visualizations. The layered approach taken by these researchers, which provides the option to get the ‘big picture’ from the data as well as drill down into the data points themselves, allows the user to choose how they want to explore the data,” says Bower, who also was not involved with this work. “I think the ability to freely explore the data is necessary not just to learn the ‘story’ that the data are telling, but to allow a blind researcher such as myself to formulate the next questions that need to be tackled to advance understanding in any field of study.”

    This work was supported, in part, by the National Science Foundation.

  • in

    Cracking the case of Arctic sea ice breakup

    Despite its below-freezing temperatures, the Arctic is warming twice as fast as the rest of the planet. As Arctic sea ice melts, fewer bright surfaces are available to reflect sunlight back into space. When fractures open in the ice cover, the water underneath gets exposed. Dark, ice-free water absorbs the sun’s energy, heating the ocean and driving further melting — a vicious cycle. This warming in turn melts glacial ice, contributing to rising sea levels.

    Warming climate and rising sea levels endanger the nearly 40 percent of the U.S. population living in coastal areas, the billions of people who depend on the ocean for food and their livelihoods, and species such as polar bears and Arctic foxes. Reduced ice coverage is also making the once-impassable region more accessible, opening up new shipping lanes and ports. Interest in using these emerging trans-Arctic routes for product transit, extraction of natural resources (e.g., oil and gas), and military activity is turning an area traditionally marked by low tension and cooperation into one of global geopolitical competition.

    As the Arctic opens up, predicting when and where the sea ice will fracture becomes increasingly important in strategic decision-making. However, huge gaps exist in our understanding of the physical processes contributing to ice breakup. Researchers at MIT Lincoln Laboratory seek to help close these gaps by turning a data-sparse environment into a data-rich one. They envision deploying a distributed set of unattended sensors across the Arctic that will persistently detect and geolocate ice fracturing events. Concurrently, the network will measure various environmental conditions, including water temperature and salinity, wind speed and direction, and ocean currents at different depths. By correlating these fracturing events and environmental conditions, they hope to discover meaningful insights about what is causing the sea ice to break up. Such insights could help predict the future state of Arctic sea ice to inform climate modeling, climate change planning, and policy decision-making at the highest levels.

    “We’re trying to study the relationship between ice cracking, climate change, and heat flow in the ocean,” says Andrew March, an assistant leader of Lincoln Laboratory’s Advanced Undersea Systems and Technology Group. “Do cracks in the ice cause warm water to rise and more ice to melt? Do undersea currents and waves cause cracking? Does cracking cause undersea waves? These are the types of questions we aim to investigate.”

    Arctic access

    In March 2022, Ben Evans and Dave Whelihan, both researchers in March’s group, traveled for 16 hours across three flights to Prudhoe Bay, located on the North Slope of Alaska. From there, they boarded a small specialized aircraft and flew another 90 minutes to a three-and-a-half-mile-long sheet of ice floating 160 nautical miles offshore in the Arctic Ocean. In the weeks before their arrival, the U.S. Navy’s Arctic Submarine Laboratory had transformed this inhospitable ice floe into a temporary operating base called Ice Camp Queenfish, named after the first Sturgeon-class submarine to operate under the ice and the fourth to reach the North Pole. The ice camp featured a 2,500-foot-long runway, a command center, sleeping quarters to accommodate up to 60 personnel, a dining tent, and an extremely limited internet connection.

    At Queenfish, for the next four days, Evans and Whelihan joined U.S. Navy, Army, Air Force, Marine Corps, and Coast Guard members, and members of the Royal Canadian Air Force and Navy and United Kingdom Royal Navy, who were participating in Ice Exercise (ICEX) 2022. Over the course of about three weeks, more than 200 personnel stationed at Queenfish, Prudhoe Bay, and aboard two U.S. Navy submarines participated in this biennial exercise. The goals of ICEX 2022 were to assess U.S. operational readiness in the Arctic; increase our country’s experience in the region; advance our understanding of the Arctic environment; and continue building relationships with other services, allies, and partner organizations to ensure a free and peaceful Arctic. The infrastructure provided for ICEX concurrently enables scientists to conduct research in an environment — either in person or by sending their research equipment for exercise organizers to deploy on their behalf — that would be otherwise extremely difficult and expensive to access.

    In the Arctic, windchill temperatures can plummet to as low as 60 degrees Fahrenheit below zero, cold enough to freeze exposed skin within minutes. Winds and ocean currents can drift the entire camp beyond the reach of nearby emergency rescue aircraft, and the ice can crack at any moment. To ensure the safety of participants, a team of Navy meteorological specialists continually monitors the ever-changing conditions. The original camp location for ICEX 2022 had to be evacuated and relocated after a massive crack formed in the ice, delaying Evans’ and Whelihan’s trip. Even the newly selected site had a large crack form behind the camp and another crack that necessitated moving a number of tents.

    “Such cracking events are only going to increase as the climate warms, so it’s more critical now than ever to understand the physical processes behind them,” Whelihan says. “Such an understanding will require building technology that can persist in the environment despite these incredibly harsh conditions. So, it’s a challenge not only from a scientific perspective but also an engineering one.”

    “The weather always gets a vote, dictating what you’re able to do out here,” adds Evans. “The Arctic Submarine Laboratory does a lot of work to construct the camp and make it a safe environment where researchers like us can come to do good science. ICEX is really the only opportunity we have to go onto the sea ice in a place this remote to collect data.”

    A legacy of sea ice experiments

    Though this trip was Whelihan’s and Evans’ first to the Arctic region, staff from the laboratory’s Advanced Undersea Systems and Technology Group have been conducting experiments at ICEX since 2018. However, because of the Arctic’s remote location and extreme conditions, data collection has rarely been continuous over long periods of time or widespread across large areas. The team now hopes to change that by building low-cost, expendable sensing platforms consisting of co-located devices that can be left unattended for automated, persistent, near-real-time monitoring. 

    “The laboratory’s extensive expertise in rapid prototyping, seismo-acoustic signal processing, remote sensing, and oceanography make us a natural fit to build this sensor network,” says Evans.

    In the months leading up to the Arctic trip, the team collected seismometer data at Firepond, part of the laboratory’s Haystack Observatory site in Westford, Massachusetts. Through this local data collection, they aimed to gain a sense of what anthropogenic (human-induced) noise would look like so they could begin to anticipate the kinds of signatures they might see in the Arctic. They also collected ice melting/fracturing data during a thaw cycle and correlated these data with the weather conditions (air temperature, humidity, and pressure). Through this analysis, they detected an increase in seismic signals as the temperature rose above 32 F — an indication that air temperature and ice cracking may be related.

    A sensing network

    At ICEX, the team deployed various commercial off-the-shelf sensors and new sensors developed by the laboratory and University of New Hampshire (UNH) to assess their resiliency in the frigid environment and to collect an initial dataset.

    “One aspect that differentiates these experiments from those of the past is that we concurrently collected seismo-acoustic data and environmental parameters,” says Evans.

    The commercial technologies were seismometers to detect the vibrational energy released when sea ice fractures or collides with other ice floes; a hydrophone (underwater microphone) array to record the acoustic energy created by ice-fracturing events; a sound speed profiler to measure the speed of sound through the water column; and a conductivity, temperature, and depth (CTD) profiler to measure the salinity (related to conductivity), temperature, and pressure (related to depth) throughout the water column. The speed of sound in the ocean primarily depends on these three quantities. 
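
    To make the dependence on temperature, salinity, and depth concrete, here is a small sketch using Medwin's simplified empirical formula for the speed of sound in seawater. This is a commonly cited textbook approximation chosen for illustration, not necessarily what the team or their instruments use, and it is nominally fitted for temperatures from 0 to 35 degrees Celsius, so values for colder Arctic water are only indicative.

```python
def sound_speed(temperature_c, salinity_ppt, depth_m):
    """Approximate speed of sound in seawater in meters per second.

    Uses Medwin's simplified empirical formula:
    temperature in degrees Celsius, salinity in parts per thousand,
    depth in meters (nominally valid down to about 1,000 m).
    """
    t = temperature_c
    return (1449.2
            + 4.6 * t
            - 0.055 * t ** 2
            + 0.00029 * t ** 3
            + (1.34 - 0.010 * t) * (salinity_ppt - 35.0)
            + 0.016 * depth_m)


# Hypothetical CTD readings at two depths in the water column.
for depth, temp, sal in [(10.0, -1.5, 30.0), (35.0, -1.0, 32.0)]:
    print(depth, round(sound_speed(temp, sal, depth), 1))
```

    In this approximation, warmer, saltier, or deeper water each raises the sound speed, which is why a CTD profile can be cross-checked against a direct sound speed profile.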

    To precisely measure the temperature across the entire water column at one location, they deployed an array of transistor-based temperature sensors developed by the laboratory’s Advanced Materials and Microsystems Group in collaboration with the Advanced Functional Fabrics of America Manufacturing Innovation Institute. The small temperature sensors run along the length of a thread-like polymer fiber embedded with multiple conductors. This fiber platform, which can support a broad range of sensors, can be unspooled hundreds of feet below the water’s surface to concurrently measure temperature or other water properties — the fiber deployed in the Arctic also contained accelerometers to measure depth — at many points in the water column. Traditionally, temperature profiling has required moving a device up and down through the water column.

    The team also deployed a high-frequency echosounder supplied by Anthony Lyons and Larry Mayer, collaborators at UNH’s Center for Coastal and Ocean Mapping. This active sonar uses acoustic energy to detect internal waves, or waves occurring beneath the ocean’s surface.

    “You may think of the ocean as a homogenous body of water, but it’s not,” Evans explains. “Different currents can exist as you go down in depth, much like how you can get different winds when you go up in altitude. The UNH echosounder allows us to see the different currents in the water column, as well as ice roughness when we turn the sensor to look upward.”

    “The reason we care about currents is that we believe they will tell us something about how warmer water from the Atlantic Ocean is coming into contact with sea ice,” adds Whelihan. “Not only is that water melting ice but it also has lower salt content, resulting in oceanic layers and affecting how long ice lasts and where it lasts.”

    Back home, the team has begun analyzing their data. For the seismic data, this analysis involves distinguishing any ice events from various sources of anthropogenic noise, including generators, snowmobiles, footsteps, and aircraft. Similarly, the researchers know their hydrophone array acoustic data are contaminated by energy from a sound source that another research team participating in ICEX placed in the water. Based on their physics, icequakes — the seismic events that occur when ice cracks — have characteristic signatures that can be used to identify them. One approach is to manually find an icequake and use that signature as a guide for finding other icequakes in the dataset.
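
    Using one manually identified icequake as a guide for finding others is, in signal-processing terms, a form of template matching. The sketch below shows one simple way to do it with normalized cross-correlation; the function names, the 0.9 threshold, and the toy waveform are assumptions for illustration and do not come from the team's analysis.

```python
import math


def normalized_cross_correlation(signal, template):
    """Correlation score in [-1, 1] between the template and each signal window."""
    n = len(template)
    t_mean = sum(template) / n
    t_centered = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(x * x for x in t_centered))
    scores = []
    for start in range(len(signal) - n + 1):
        window = signal[start:start + n]
        w_mean = sum(window) / n
        w_centered = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w_centered))
        if t_norm == 0 or w_norm == 0:
            scores.append(0.0)  # flat window or flat template: no match
        else:
            dot = sum(a * b for a, b in zip(w_centered, t_centered))
            scores.append(dot / (w_norm * t_norm))
    return scores


def detect_events(signal, template, threshold=0.9):
    """Return start indices where the signal closely resembles the template."""
    scores = normalized_cross_correlation(signal, template)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Toy waveform: the reference event appears twice, once at double amplitude.
reference_event = [0.0, 1.0, -1.0, 0.5]
recording = ([0.0, 0.0, 0.0] + [0.0, 2.0, -2.0, 1.0]
             + [0.0, 0.0, 0.0, 0.0] + reference_event + [0.0, 0.0])
print(detect_events(recording, reference_event))
```

    Because the score is normalized, a louder or quieter copy of the same waveform still matches, while real recordings would also need filtering to suppress the anthropogenic noise mentioned above.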

    From their water column profiling sensors, they identified an interesting evolution in the sound speed profile 30 to 40 meters below the ocean surface, related to a mass of colder water moving in later in the day. The group’s physical oceanographer believes this change in the profile is due to water coming up from the Bering Sea, water that initially comes from the Atlantic Ocean. The UNH-supplied echosounder also generated an interesting signal at a similar depth.

    “Our supposition is that this result has something to do with the large sound speed variation we detected, either directly because of reflections off that layer or because of plankton, which tend to rise on top of that layer,” explains Evans.  

    A future predictive capability

    Going forward, the team will continue mining their collected data and use these data to begin building algorithms capable of automatically detecting and localizing — and ultimately predicting — ice events correlated with changes in environmental conditions. To complement their experimental data, they have initiated conversations with organizations that model the physical behavior of sea ice, including the National Oceanic and Atmospheric Administration and the National Ice Center. Merging the laboratory’s expertise in sensor design and signal processing with their expertise in ice physics would provide a more complete understanding of how the Arctic is changing.

    The laboratory team will also start exploring cost-effective engineering approaches for integrating the sensors into packages hardened for deployment in the harsh environment of the Arctic.

    “Until these sensors are truly unattended, the human factor of usability is front and center,” says Whelihan. “Because it’s so cold, equipment can break accidentally. For example, at ICEX 2022, our waterproof enclosure for the seismometers survived, but the enclosure for its power supply, which was made out of a cheaper plastic, shattered in my hand when I went to pick it up.”

    The sensor packages will not only need to withstand the frigid environment but also be able to “phone home” over some sort of satellite data link and sustain their power. The team plans to investigate whether waste heat from processing can keep the instruments warm and how energy could be harvested from the Arctic environment.

    Before the next ICEX scheduled for 2024, they hope to perform preliminary testing of their sensor packages and concepts in Arctic-like environments. While attending ICEX 2022, they engaged with several other attendees — including the U.S. Navy, Arctic Submarine Laboratory, National Ice Center, and University of Alaska Fairbanks (UAF) — and identified cold room experimentation as one area of potential collaboration. Testing can also be performed at outdoor locations a bit closer to home and more easily accessible, such as the Great Lakes in Michigan and a UAF-maintained site in Barrow, Alaska. In the future, the laboratory team may have an opportunity to accompany U.S. Coast Guard personnel on ice-breaking vessels traveling from Alaska to Greenland. The team is also thinking about possible venues for collecting data far removed from human noise sources.

    “Since I’ve told colleagues, friends, and family I was going to the Arctic, I’ve had a lot of interesting conversations about climate change and what we’re doing there and why we’re doing it,” Whelihan says. “People don’t have an intrinsic, automatic understanding of this environment and its impact because it’s so far removed from us. But the Arctic plays a crucial role in helping to keep the global climate in balance, so it’s imperative we understand the processes leading to sea ice fractures.”

    This work is funded through Lincoln Laboratory’s internally administered R&D portfolio on climate.

  • in

    In bias we trust?

    When the stakes are high, machine-learning models are sometimes used to aid human decision-makers. For instance, a model could predict which law school applicants are most likely to pass the bar exam to help an admissions officer determine which students should be accepted.

    These models often have millions of parameters, so how they make predictions is nearly impossible for researchers to fully understand, let alone an admissions officer with no machine-learning experience. Researchers sometimes employ explanation methods that mimic a larger model by creating simple approximations of its predictions. These approximations, which are far easier to understand, help users determine whether to trust the model’s predictions.

    But are these explanation methods fair? If an explanation method provides better approximations for men than for women, or for white people than for Black people, it may encourage users to trust the model’s predictions for some people but not for others.

    MIT researchers took a hard look at the fairness of some widely used explanation methods. They found that the approximation quality of these explanations can vary dramatically between subgroups and that the quality is often significantly lower for minoritized subgroups.

    In practice, this means that if the approximation quality is lower for female applicants, there is a mismatch between the explanations and the model’s predictions that could lead the admissions officer to wrongly reject more women than men.

    Once the MIT researchers saw how pervasive these fairness gaps are, they tried several techniques to level the playing field. They were able to shrink some gaps, but couldn’t eradicate them.

    “What this means in the real world is that people might incorrectly trust predictions more for some subgroups than for others. So, improving explanation models is important, but communicating the details of these models to end users is equally important. These gaps exist, so users may want to adjust their expectations as to what they are getting when they use these explanations,” says lead author Aparna Balagopalan, a graduate student in the Healthy ML group of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

    Balagopalan wrote the paper with CSAIL graduate students Haoran Zhang and Kimia Hamidieh; CSAIL postdoc Thomas Hartvigsen; Frank Rudzicz, associate professor of computer science at the University of Toronto; and senior author Marzyeh Ghassemi, an assistant professor and head of the Healthy ML Group. The research will be presented at the ACM Conference on Fairness, Accountability, and Transparency.

    High fidelity

    Simplified explanation models can approximate predictions of a more complex machine-learning model in a way that humans can grasp. An effective explanation model maximizes a property known as fidelity, which measures how well it matches the larger model’s predictions.

    Rather than focusing on average fidelity for the overall explanation model, the MIT researchers studied fidelity for subgroups of people in the model’s dataset. In a dataset with men and women, the fidelity should be very similar for each group, and both groups should have fidelity close to that of the overall explanation model.

    “When you are just looking at the average fidelity across all instances, you might be missing out on artifacts that could exist in the explanation model,” Balagopalan says.

    They developed two metrics to measure fidelity gaps, or disparities in fidelity between subgroups. One is the difference between the average fidelity across the entire explanation model and the fidelity for the worst-performing subgroup. The second calculates the absolute difference in fidelity between all possible pairs of subgroups and then computes the average.
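
    The two gap metrics can be sketched in a few lines. In this illustration, fidelity is taken to be the fraction of cases where the explanation model's prediction agrees with the larger model's prediction; that definition, the function names, and the toy data are assumptions made for the example, and the paper may measure fidelity differently.

```python
from itertools import combinations


def fidelity(model_preds, explainer_preds):
    """Fraction of instances where the explanation matches the model."""
    matches = sum(m == e for m, e in zip(model_preds, explainer_preds))
    return matches / len(model_preds)


def subgroup_fidelities(model_preds, explainer_preds, groups):
    """Fidelity computed separately for each subgroup label."""
    by_group = {}
    for m, e, g in zip(model_preds, explainer_preds, groups):
        by_group.setdefault(g, []).append(m == e)
    return {g: sum(hits) / len(hits) for g, hits in by_group.items()}


def worst_case_gap(model_preds, explainer_preds, groups):
    """Overall fidelity minus the fidelity of the worst-performing subgroup."""
    overall = fidelity(model_preds, explainer_preds)
    per_group = subgroup_fidelities(model_preds, explainer_preds, groups)
    return overall - min(per_group.values())


def mean_pairwise_gap(model_preds, explainer_preds, groups):
    """Average absolute fidelity difference over all pairs of subgroups."""
    values = list(subgroup_fidelities(model_preds, explainer_preds, groups).values())
    pairs = list(combinations(values, 2))
    if not pairs:
        return 0.0  # a single subgroup has no gap to measure
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Toy data: the explanation is perfect for group "a" but right half the time for "b".
model = [1, 1, 0, 0, 1, 0, 1, 0]
explainer = [1, 1, 0, 0, 1, 0, 0, 1]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(worst_case_gap(model, explainer, group))     # 0.25
print(mean_pairwise_gap(model, explainer, group))  # 0.5
```

    Here the overall fidelity of 0.75 hides the fact that one subgroup sees 0.5, which is the kind of disparity that an average alone would miss.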

    With these metrics, they searched for fidelity gaps using two types of explanation models that were trained on four real-world datasets for high-stakes situations, such as predicting whether a patient dies in the ICU, whether a defendant reoffends, or whether a law school applicant will pass the bar exam. Each dataset contained protected attributes, like the sex and race of individual people. Protected attributes are features that may not be used for decisions, often due to laws or organizational policies. Which attributes count as protected can vary with the task and the decision setting.

    The researchers found clear fidelity gaps for all datasets and explanation models. The fidelity for disadvantaged groups was often much lower, by up to 21 percent in some instances. The law school dataset had a fidelity gap of 7 percent between race subgroups, meaning the approximations for some subgroups were wrong 7 percent more often on average. If there are 10,000 applicants from these subgroups in the dataset, for example, a significant portion could be wrongly rejected, Balagopalan explains.

    “I was surprised by how pervasive these fidelity gaps are in all the datasets we evaluated. It is hard to overemphasize how commonly explanations are used as a ‘fix’ for black-box machine-learning models. In this paper, we are showing that the explanation methods themselves are imperfect approximations that may be worse for some subgroups,” says Ghassemi.

    Narrowing the gaps

    After identifying fidelity gaps, the researchers tried some machine-learning approaches to fix them. They trained the explanation models to identify regions of a dataset that could be prone to low fidelity and then focus more on those samples. They also tried using balanced datasets with an equal number of samples from all subgroups.
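
    The balanced-dataset idea can be illustrated with a short Python sketch. The article does not say how the researchers balanced their data, so this assumes simple downsampling of every subgroup to the size of the smallest one; the function name and toy data are hypothetical.

```python
import random
from collections import defaultdict


def balanced_subsample(samples, groups, seed=0):
    """Return (sample, group) pairs with an equal number of samples per subgroup.

    Assumption: balance is reached by randomly downsampling each subgroup
    to the size of the smallest subgroup.
    """
    rng = random.Random(seed)

    # Bucket the samples by subgroup label.
    by_group = defaultdict(list)
    for s, g in zip(samples, groups):
        by_group[g].append(s)

    # Every subgroup is cut down to the size of the smallest one.
    n = min(len(members) for members in by_group.values())

    balanced = []
    for g in sorted(by_group):
        for s in rng.sample(by_group[g], n):
            balanced.append((s, g))

    # Shuffle so subgroups are interleaved rather than stored in blocks.
    rng.shuffle(balanced)
    return balanced


# Toy data (hypothetical): seven samples from group "a", three from group "b".
samples = list(range(10))
groups = ["a"] * 7 + ["b"] * 3
balanced = balanced_subsample(samples, groups)
print(balanced)
```

    Downsampling discards data from the larger subgroups; oversampling or reweighting the smaller ones are alternatives with different trade-offs.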

    These robust training strategies did reduce some fidelity gaps, but they didn’t eliminate them.

    The researchers then modified the explanation models to explore why fidelity gaps occur in the first place. Their analysis revealed that an explanation model might indirectly use protected group information, like sex or race, that it could learn from the dataset, even if group labels are hidden.

    They want to explore this conundrum more in future work. They also plan to further study the implications of fidelity gaps in the context of real-world decision making.

    Balagopalan is excited to see that concurrent work on explanation fairness from an independent lab has arrived at similar conclusions, highlighting the importance of understanding this problem well.

    As she looks to the next phase in this research, she has some words of warning for machine-learning users.

    “Choose the explanation model carefully. But even more importantly, think carefully about the goals of using an explanation model and who it eventually affects,” she says.

    This work was funded, in part, by the MIT-IBM Watson AI Lab, the Quanta Research Institute, a Canadian Institute for Advanced Research AI Chair, and Microsoft Research.