More stories

  • in

    Researchers release open-source photorealistic simulator for autonomous driving

    Hyper-realistic virtual worlds have been heralded as the best driving schools for autonomous vehicles (AVs), since they’ve proven fruitful test beds for safely trying out dangerous driving scenarios. Tesla, Waymo, and other self-driving companies all rely heavily on data to enable expensive and proprietary photorealistic simulators, since testing and gathering nuanced I-almost-crashed data usually isn’t the most easy or desirable to recreate. 

    To that end, scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) created “VISTA 2.0,” a data-driven simulation engine where vehicles can learn to drive in the real world and recover from near-crash scenarios. What’s more, all of the code is being open-sourced to the public. 

    “Today, only companies have software like the type of simulation environments and capabilities of VISTA 2.0, and this software is proprietary. With this release, the research community will have access to a powerful new tool for accelerating the research and development of adaptive robust control for autonomous driving,” says MIT Professor and CSAIL Director Daniela Rus, senior author on a paper about the research. 

    Play video

    VISTA is a data-driven, photorealistic simulator for autonomous driving. It can simulate not just live video but LiDAR data and event cameras, and also incorporate other simulated vehicles to model complex driving situations. VISTA is open source and the code can be found below.

    VISTA 2.0 builds off of the team’s previous model, VISTA, and it’s fundamentally different from existing AV simulators since it’s data-driven — meaning it was built and photorealistically rendered from real-world data — thereby enabling direct transfer to reality. While the initial iteration supported only single car lane-following with one camera sensor, achieving high-fidelity data-driven simulation required rethinking the foundations of how different sensors and behavioral interactions can be synthesized. 

    Enter VISTA 2.0: a data-driven system that can simulate complex sensor types and massively interactive scenarios and intersections at scale. With much less data than previous models, the team was able to train autonomous vehicles that could be substantially more robust than those trained on large amounts of real-world data. 

    “This is a massive jump in capabilities of data-driven simulation for autonomous vehicles, as well as the increase of scale and ability to handle greater driving complexity,” says Alexander Amini, CSAIL PhD student and co-lead author on two new papers, together with fellow PhD student Tsun-Hsuan Wang. “VISTA 2.0 demonstrates the ability to simulate sensor data far beyond 2D RGB cameras, but also extremely high dimensional 3D lidars with millions of points, irregularly timed event-based cameras, and even interactive and dynamic scenarios with other vehicles as well.” 

    The team was able to scale the complexity of the interactive driving tasks for things like overtaking, following, and negotiating, including multiagent scenarios in highly photorealistic environments. 

    Training AI models for autonomous vehicles involves hard-to-secure fodder of different varieties of edge cases and strange, dangerous scenarios, because most of our data (thankfully) is just run-of-the-mill, day-to-day driving. Logically, we can’t just crash into other cars just to teach a neural network how to not crash into other cars.

    Recently, there’s been a shift away from more classic, human-designed simulation environments to those built up from real-world data. The latter have immense photorealism, but the former can easily model virtual cameras and lidars. With this paradigm shift, a key question has emerged: Can the richness and complexity of all of the sensors that autonomous vehicles need, such as lidar and event-based cameras that are more sparse, accurately be synthesized? 

    Lidar sensor data is much harder to interpret in a data-driven world — you’re effectively trying to generate brand-new 3D point clouds with millions of points, only from sparse views of the world. To synthesize 3D lidar point clouds, the team used the data that the car collected, projected it into a 3D space coming from the lidar data, and then let a new virtual vehicle drive around locally from where that original vehicle was. Finally, they projected all of that sensory information back into the frame of view of this new virtual vehicle, with the help of neural networks. 

    Together with the simulation of event-based cameras, which operate at speeds greater than thousands of events per second, the simulator was capable of not only simulating this multimodal information, but also doing so all in real time — making it possible to train neural nets offline, but also test online on the car in augmented reality setups for safe evaluations. “The question of if multisensor simulation at this scale of complexity and photorealism was possible in the realm of data-driven simulation was very much an open question,” says Amini. 

    With that, the driving school becomes a party. In the simulation, you can move around, have different types of controllers, simulate different types of events, create interactive scenarios, and just drop in brand new vehicles that weren’t even in the original data. They tested for lane following, lane turning, car following, and more dicey scenarios like static and dynamic overtaking (seeing obstacles and moving around so you don’t collide). With the multi-agency, both real and simulated agents interact, and new agents can be dropped into the scene and controlled any which way. 

    Taking their full-scale car out into the “wild” — a.k.a. Devens, Massachusetts — the team saw  immediate transferability of results, with both failures and successes. They were also able to demonstrate the bodacious, magic word of self-driving car models: “robust.” They showed that AVs, trained entirely in VISTA 2.0, were so robust in the real world that they could handle that elusive tail of challenging failures. 

    Now, one guardrail humans rely on that can’t yet be simulated is human emotion. It’s the friendly wave, nod, or blinker switch of acknowledgement, which are the type of nuances the team wants to implement in future work. 

    “The central algorithm of this research is how we can take a dataset and build a completely synthetic world for learning and autonomy,” says Amini. “It’s a platform that I believe one day could extend in many different axes across robotics. Not just autonomous driving, but many areas that rely on vision and complex behaviors. We’re excited to release VISTA 2.0 to help enable the community to collect their own datasets and convert them into virtual worlds where they can directly simulate their own virtual autonomous vehicles, drive around these virtual terrains, train autonomous vehicles in these worlds, and then can directly transfer them to full-sized, real self-driving cars.” 

    Amini and Wang wrote the paper alongside Zhijian Liu, MIT CSAIL PhD student; Igor Gilitschenski, assistant professor in computer science at the University of Toronto; Wilko Schwarting, AI research scientist and MIT CSAIL PhD ’20; Song Han, associate professor at MIT’s Department of Electrical Engineering and Computer Science; Sertac Karaman, associate professor of aeronautics and astronautics at MIT; and Daniela Rus, MIT professor and CSAIL director. The researchers presented the work at the IEEE International Conference on Robotics and Automation (ICRA) in Philadelphia. 

    This work was supported by the National Science Foundation and Toyota Research Institute. The team acknowledges the support of NVIDIA with the donation of the Drive AGX Pegasus. More

  • in

    New CRISPR-based map ties every human gene to its function

    The Human Genome Project was an ambitious initiative to sequence every piece of human DNA. The project drew together collaborators from research institutions around the world, including MIT’s Whitehead Institute for Biomedical Research, and was finally completed in 2003. Now, over two decades later, MIT Professor Jonathan Weissman and colleagues have gone beyond the sequence to present the first comprehensive functional map of genes that are expressed in human cells. The data from this project, published online June 9 in Cell, ties each gene to its job in the cell, and is the culmination of years of collaboration on the single-cell sequencing method Perturb-seq.

    The data are available for other scientists to use. “It’s a big resource in the way the human genome is a big resource, in that you can go in and do discovery-based research,” says Weissman, who is also a member of the Whitehead Institute and an investigator with the Howard Hughes Medical Institute. “Rather than defining ahead of time what biology you’re going to be looking at, you have this map of the genotype-phenotype relationships and you can go in and screen the database without having to do any experiments.”

    The screen allowed the researchers to delve into diverse biological questions. They used it to explore the cellular effects of genes with unknown functions, to investigate the response of mitochondria to stress, and to screen for genes that cause chromosomes to be lost or gained, a phenotype that has proved difficult to study in the past. “I think this dataset is going to enable all sorts of analyses that we haven’t even thought up yet by people who come from other parts of biology, and suddenly they just have this available to draw on,” says former Weissman Lab postdoc Tom Norman, a co-senior author of the paper.

    Pioneering Perturb-seq

    The project takes advantage of the Perturb-seq approach that makes it possible to follow the impact of turning on or off genes with unprecedented depth. This method was first published in 2016 by a group of researchers including Weissman and fellow MIT professor Aviv Regev, but could only be used on small sets of genes and at great expense.

    The massive Perturb-seq map was made possible by foundational work from Joseph Replogle, an MD-PhD student in Weissman’s lab and co-first author of the present paper. Replogle, in collaboration with Norman, who now leads a lab at Memorial Sloan Kettering Cancer Center; Britt Adamson, an assistant professor in the Department of Molecular Biology at Princeton University; and a group at 10x Genomics, set out to create a new version of Perturb-seq that could be scaled up. The researchers published a proof-of-concept paper in Nature Biotechnology in 2020. 

    The Perturb-seq method uses CRISPR-Cas9 genome editing to introduce genetic changes into cells, and then uses single-cell RNA sequencing to capture information about the RNAs that are expressed resulting from a given genetic change. Because RNAs control all aspects of how cells behave, this method can help decode the many cellular effects of genetic changes.

    Since their initial proof-of-concept paper, Weissman, Regev, and others have used this sequencing method on smaller scales. For example, the researchers used Perturb-seq in 2021 to explore how human and viral genes interact over the course of an infection with HCMV, a common herpesvirus.

    In the new study, Replogle and collaborators including Reuben Saunders, a graduate student in Weissman’s lab and co-first author of the paper, scaled up the method to the entire genome. Using human blood cancer cell lines as well noncancerous cells derived from the retina, he performed Perturb-seq across more than 2.5 million cells, and used the data to build a comprehensive map tying genotypes to phenotypes.

    Delving into the data

    Upon completing the screen, the researchers decided to put their new dataset to use and examine a few biological questions. “The advantage of Perturb-seq is it lets you get a big dataset in an unbiased way,” says Tom Norman. “No one knows entirely what the limits are of what you can get out of that kind of dataset. Now, the question is, what do you actually do with it?”

    The first, most obvious application was to look into genes with unknown functions. Because the screen also read out phenotypes of many known genes, the researchers could use the data to compare unknown genes to known ones and look for similar transcriptional outcomes, which could suggest the gene products worked together as part of a larger complex.

    The mutation of one gene called C7orf26 in particular stood out. Researchers noticed that genes whose removal led to a similar phenotype were part of a protein complex called Integrator that played a role in creating small nuclear RNAs. The Integrator complex is made up of many smaller subunits — previous studies had suggested 14 individual proteins — and the researchers were able to confirm that C7orf26 made up a 15th component of the complex.

    They also discovered that the 15 subunits worked together in smaller modules to perform specific functions within the Integrator complex. “Absent this thousand-foot-high view of the situation, it was not so clear that these different modules were so functionally distinct,” says Saunders.

    Another perk of Perturb-seq is that because the assay focuses on single cells, the researchers could use the data to look at more complex phenotypes that become muddied when they are studied together with data from other cells. “We often take all the cells where ‘gene X’ is knocked down and average them together to look at how they changed,” Weissman says. “But sometimes when you knock down a gene, different cells that are losing that same gene behave differently, and that behavior may be missed by the average.”

    The researchers found that a subset of genes whose removal led to different outcomes from cell to cell were responsible for chromosome segregation. Their removal was causing cells to lose a chromosome or pick up an extra one, a condition known as aneuploidy. “You couldn’t predict what the transcriptional response to losing this gene was because it depended on the secondary effect of what chromosome you gained or lost,” Weissman says. “We realized we could then turn this around and create this composite phenotype looking for signatures of chromosomes being gained and lost. In this way, we’ve done the first genome-wide screen for factors that are required for the correct segregation of DNA.”

    “I think the aneuploidy study is the most interesting application of this data so far,” Norman says. “It captures a phenotype that you can only get using a single-cell readout. You can’t go after it any other way.”

    The researchers also used their dataset to study how mitochondria responded to stress. Mitochondria, which evolved from free-living bacteria, carry 13 genes in their genomes. Within the nuclear DNA, around 1,000 genes are somehow related to mitochondrial function. “People have been interested for a long time in how nuclear and mitochondrial DNA are coordinated and regulated in different cellular conditions, especially when a cell is stressed,” Replogle says.

    The researchers found that when they perturbed different mitochondria-related genes, the nuclear genome responded similarly to many different genetic changes. However, the mitochondrial genome responses were much more variable. 

    “There’s still an open question of why mitochondria still have their own DNA,” said Replogle. “A big-picture takeaway from our work is that one benefit of having a separate mitochondrial genome might be having localized or very specific genetic regulation in response to different stressors.”

    “If you have one mitochondria that’s broken, and another one that is broken in a different way, those mitochondria could be responding differentially,” Weissman says.

    In the future, the researchers hope to use Perturb-seq on different types of cells besides the cancer cell line they started in. They also hope to continue to explore their map of gene functions, and hope others will do the same. “This really is the culmination of many years of work by the authors and other collaborators, and I’m really pleased to see it continue to succeed and expand,” says Norman. More

  • in

    Hallucinating to better text translation

    As babies, we babble and imitate our way to learning languages. We don’t start off reading raw text, which requires fundamental knowledge and understanding about the world, as well as the advanced ability to interpret and infer descriptions and relationships. Rather, humans begin our language journey slowly, by pointing and interacting with our environment, basing our words and perceiving their meaning through the context of the physical and social world. Eventually, we can craft full sentences to communicate complex ideas.

    Similarly, when humans begin learning and translating into another language, the incorporation of other sensory information, like multimedia, paired with the new and unfamiliar words, like flashcards with images, improves language acquisition and retention. Then, with enough practice, humans can accurately translate new, unseen sentences in context without the accompanying media; however, imagining a picture based on the original text helps.

    This is the basis of a new machine learning model, called VALHALLA, by researchers from MIT, IBM, and the University of California at San Diego, in which a trained neural network sees a source sentence in one language, hallucinates an image of what it looks like, and then uses both to translate into a target language. The team found that their method demonstrates improved accuracy of machine translation over text-only translation. Further, it provided an additional boost for cases with long sentences, under-resourced languages, and instances where part of the source sentence is inaccessible to the machine translator.

    As a core task within the AI field of natural language processing (NLP), machine translation is an “eminently practical technology that’s being used by millions of people every day,” says study co-author Yoon Kim, assistant professor in MIT’s Department of Electrical Engineering and Computer Science with affiliations in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT-IBM Watson AI Lab. With recent, significant advances in deep learning, “there’s been an interesting development in how one might use non-text information — for example, images, audio, or other grounding information — to tackle practical tasks involving language” says Kim, because “when humans are performing language processing tasks, we’re doing so within a grounded, situated world.” The pairing of hallucinated images and text during inference, the team postulated, imitates that process, providing context for improved performance over current state-of-the-art techniques, which utilize text-only data.

    This research will be presented at the IEEE / CVF Computer Vision and Pattern Recognition Conference this month. Kim’s co-authors are UC San Diego graduate student Yi Li and Professor Nuno Vasconcelos, along with research staff members Rameswar Panda, Chun-fu “Richard” Chen, Rogerio Feris, and IBM Director David Cox of IBM Research and the MIT-IBM Watson AI Lab.

    Learning to hallucinate from images

    When we learn new languages and to translate, we’re often provided with examples and practice before venturing out on our own. The same is true for machine-translation systems; however, if images are used during training, these AI methods also require visual aids for testing, limiting their applicability, says Panda.

    “In real-world scenarios, you might not have an image with respect to the source sentence. So, our motivation was basically: Instead of using an external image during inference as input, can we use visual hallucination — the ability to imagine visual scenes — to improve machine translation systems?” says Panda.

    To do this, the team used an encoder-decoder architecture with two transformers, a type of neural network model that’s suited for sequence-dependent data, like language, that can pay attention key words and semantics of a sentence. One transformer generates a visual hallucination, and the other performs multimodal translation using outputs from the first transformer.

    During training, there are two streams of translation: a source sentence and a ground-truth image that is paired with it, and the same source sentence that is visually hallucinated to make a text-image pair. First the ground-truth image and sentence are tokenized into representations that can be handled by transformers; for the case of the sentence, each word is a token. The source sentence is tokenized again, but this time passed through the visual hallucination transformer, outputting a hallucination, a discrete image representation of the sentence. The researchers incorporated an autoregression that compares the ground-truth and hallucinated representations for congruency — e.g., homonyms: a reference to an animal “bat” isn’t hallucinated as a baseball bat. The hallucination transformer then uses the difference between them to optimize its predictions and visual output, making sure the context is consistent.

    The two sets of tokens are then simultaneously passed through the multimodal translation transformer, each containing the sentence representation and either the hallucinated or ground-truth image. The tokenized text translation outputs are compared with the goal of being similar to each other and to the target sentence in another language. Any differences are then relayed back to the translation transformer for further optimization.

    For testing, the ground-truth image stream drops off, since images likely wouldn’t be available in everyday scenarios.

    “To the best of our knowledge, we haven’t seen any work which actually uses a hallucination transformer jointly with a multimodal translation system to improve machine translation performance,” says Panda.

    Visualizing the target text

    To test their method, the team put VALHALLA up against other state-of-the-art multimodal and text-only translation methods. They used public benchmark datasets containing ground-truth images with source sentences, and a dataset for translating text-only news articles. The researchers measured its performance over 13 tasks, ranging from translation on well-resourced languages (like English, German, and French), under-resourced languages (like English to Romanian) and non-English (like Spanish to French). The group also tested varying transformer model sizes, how accuracy changes with the sentence length, and translation under limited textual context, where portions of the text were hidden from the machine translators.

    The team observed significant improvements over text-only translation methods, improving data efficiency, and that smaller models performed better than the larger base model. As sentences became longer, VALHALLA’s performance over other methods grew, which the researchers attributed to the addition of more ambiguous words. In cases where part of the sentence was masked, VALHALLA could recover and translate the original text, which the team found surprising.

    Further unexpected findings arose: “Where there weren’t as many training [image and] text pairs, [like for under-resourced languages], improvements were more significant, which indicates that grounding in images helps in low-data regimes,” says Kim. “Another thing that was quite surprising to me was this improved performance, even on types of text that aren’t necessarily easily connectable to images. For example, maybe it’s not so surprising if this helps in translating visually salient sentences, like the ‘there is a red car in front of the house.’ [However], even in text-only [news article] domains, the approach was able to improve upon text-only systems.”

    While VALHALLA performs well, the researchers note that it does have limitations, requiring pairs of sentences to be annotated with an image, which could make it more expensive to obtain. It also performs better in its ground domain and not the text-only news articles. Moreover, Kim and Panda note, a technique like VALHALLA is still a black box, with the assumption that hallucinated images are providing helpful information, and the team plans to investigate what and how the model is learning in order to validate their methods.

    In the future, the team plans to explore other means of improving translation. “Here, we only focus on images, but there are other types of a multimodal information — for example, speech, video or touch, or other sensory modalities,” says Panda. “We believe such multimodal grounding can lead to even more efficient machine translation models, potentially benefiting translation across many low-resource languages spoken in the world.”

    This research was supported, in part, by the MIT-IBM Watson AI Lab and the National Science Foundation. More

  • in

    Making data visualization more accessible for blind and low-vision individuals

    Data visualizations on the web are largely inaccessible for blind and low-vision individuals who use screen readers, an assistive technology that reads on-screen elements as text-to-speech. This excludes millions of people from the opportunity to probe and interpret insights that are often presented through charts, such as election results, health statistics, and economic indicators. 

    When a designer attempts to make a visualization accessible, best practices call for including a few sentences of text that describe the chart and a link to the underlying data table — a far cry from the rich reading experience available to sighted users.

    An interdisciplinary team of researchers from MIT and elsewhere is striving to create screen-reader-friendly data visualizations that offer a similarly rich experience. They prototyped several visualization structures that provide text descriptions at varying levels of detail, enabling a screen-reader user to drill down from high-level data to more detailed information using just a few keystrokes.

    The MIT team embarked on an iterative co-design process with collaborator Daniel Hajas, a researcher at University College London who works with the Global Disability Innovation Hub and lost his sight at age 16. They collaborated to develop prototypes and ran a detailed user study with blind and low-vision individuals to gather feedback.

    “Researchers might see some connections between problems and be aware of potential solutions, but very often they miss it by a little bit. Insights from people who have the lived experience of a certain specific, measurable problem are really important for a lot of disability-related solutions. I think we found a really nice fit,” says Hajas.

    They created a framework to help designers think systematically about how to develop accessible visualizations. In the future, they plan to use their prototypes and design framework to build a user-friendly tool that could convert visualizations into accessible formats.

    MIT collaborators include co-lead authors and Computer Science and Artificial Intelligence Laboratory (CSAIL) graduate students Jonathan Zong, Crystal Lee, and Alan Lundgard, as well as JiWoong Jang, an undergraduate at Carnegie Mellon University who worked on this project during MIT’s Summer Research Program (MSRP), and senior author Arvind Satyanarayan, assistant professor of computer science who leads the Visualization Group in CSAIL. The research paper, which will be presented at the Eurographics Conference on Visualization, won a best paper honorable mention award.

    “Push what is possible”

    The researchers defined three design dimensions as key to making accessible visualizations: structure, navigation, and description. Structure involves arranging the information into a hierarchy. Navigation refers to how the user moves through different levels of detail. Description is how the information is spoken, including how much information is conveyed.

    Using these design dimensions, they developed several visualization prototypes that emphasized ease-of-navigation for screen-reader users. One prototype, known as multiview, enabled individuals to use the up and down arrows to navigate between different levels of information (like the chart title as the top level, the legend as the second level, etc.), and the right and left arrow keys to cycle through information on the same level (such as adjacent scatterplots). Another prototype, known as target, included the same arrow key navigation but also a drop-down menu of key chart locations so the user could quickly jump to an area of interest.

    “Our goal is not just to work within existing standards to make them serviceable. We really set out to do grounded speculation and imagine where we can push what is possible with these existing standards. We didn’t want to limit ourselves to refitting tools that were designed for images,” says Zong.

    They tested these prototypes and an accessible data table, the existing best practice for accessible visualizations, with 13 blind and visually impaired screen-reader users. They asked users to rate each tool on several criteria, including how easy it was to learn and how easy it was to locate data or answer questions.

    “One thing I thought was really interesting was how much people were constantly testing their own hypotheses or trying to make specific patterns as they moved through the visualization. The implication for navigation is that you want to be able to orient yourself within the visualization so you know where the limits are,” says Lee. “Can you accurately and easily know where the walls are in the room you are exploring?”

    Improved insights

    Users said both prototypes enabled them to more rapidly identify patterns in the data. Scrolling from a high level to deeper levels of information helped them gain insights more easily than when browsing the data table, they said. They also enjoyed faster navigation using the menu in the target prototype.

    But the data table got top marks for ease of use.

    “I expected people to be disappointed with the everyday tools when compared to the new prototypes, but they still clung to the data table a bit, likely because of their familiarity with it. That shows that principles like familiarity, learnability, and usability still matter. No matter how ‘good’ our new invention is, if it is not easy enough to learn, people might stick with an older version,” Hajas says.

    Drawing on these insights, the researchers are refining the prototypes and using them to build a software package that can be used with existing design tools to give visualizations an accessible, navigable structure.

    They also want to explore multimodal solutions. Some study participants used different devices together, like screen readers and braille displays, or data sonification tools that convey information using non-speech audio. How these tools can complement each other when applied to a visualization is still an open question, Zong says.

    In the long-run, they hope their work might lead to careful rethinking of web accessibility standards.

    “There is no one-size-fits-all solution for accessibility. While existing standards don’t presume that, they only offer simple approaches, like data tables and alt text. One of the key benefits of our research contribution is that we are proposing a framework — different preferences and data representations are situated at different points in this design space,” says Lundgard.

    “We have been working hard toward reducing the inequities that screen-reader users face when extracting information from online data visualizations for the past few years. So, we are really appreciative of this work and the knowledge that it adds to the existing literature,” says Ather Sharif, a graduate student who researches accessibility and visualization in the labs of professors Jacob Wobbrock and Katharina Reinecke at the Paul G. Allen School of Computer Science and Engineering of the University of Washington at Seattle, and who was not involved with this work.

    “I like to think of it as a movement where we’re all finally coming together and improving the experiences of a demographic that has been largely ignored, especially when presenting data through visualizations. Kudos to Jonathan, Arvind, and their team for this insightful and timely work! I am looking forward to what’s next,” adds Sharif, who is lead author of several recent papers related to accessible data visualizations.

    Amy Bower, a senior scientist in the Department of Physical Oceanography at the Woods Hole Oceanographic Institution who suffers from a degenerative retinal disease and uses a screen reader extensively in her work as a researcher and also for basic living tasks, found the researchers’ explanations of the importance of co-design to be powerful and compelling.  

    “As a blind scientist, I’m constantly searching for effective tools that will allow me to access the information conveyed in data visualizations. The layered approach taken by these researchers, which provides the option to get the ‘big picture’ from the data as well as drill down into the data points themselves, allows the user to choose how they want to explore the data,” says Bower, who also was not involved with this work. “I think the ability to freely explore the data is necessary not just to learn the ‘story’ that the data are telling, but to allow a blind researcher such as myself to formulate the next questions that need to be tackled to advance understanding in any field of study.”

    This work was supported, in part, by the National Science Foundation.   More

  • in

    In bias we trust?

    When the stakes are high, machine-learning models are sometimes used to aid human decision-makers. For instance, a model could predict which law school applicants are most likely to pass the bar exam to help an admissions officer determine which students should be accepted.

    These models often have millions of parameters, so how they make predictions is nearly impossible for researchers to fully understand, let alone an admissions officer with no machine-learning experience. Researchers sometimes employ explanation methods that mimic a larger model by creating simple approximations of its predictions. These approximations, which are far easier to understand, help users determine whether to trust the model’s predictions.

    But are these explanation methods fair? If an explanation method provides better approximations for men than for women, or for white people than for Black people, it may encourage users to trust the model’s predictions for some people but not for others.

    MIT researchers took a hard look at the fairness of some widely used explanation methods. They found that the approximation quality of these explanations can vary dramatically between subgroups and that the quality is often significantly lower for minoritized subgroups.

    In practice, this means that if the approximation quality is lower for female applicants, there is a mismatch between the explanations and the model’s predictions that could lead the admissions officer to wrongly reject more women than men.

    Once the MIT researchers saw how pervasive these fairness gaps are, they tried several techniques to level the playing field. They were able to shrink some gaps, but couldn’t eradicate them.

    “What this means in the real-world is that people might incorrectly trust predictions more for some subgroups than for others. So, improving explanation models is important, but communicating the details of these models to end users is equally important. These gaps exist, so users may want to adjust their expectations as to what they are getting when they use these explanations,” says lead author Aparna Balagopalan, a graduate student in the Healthy ML group of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

    Balagopalan wrote the paper with CSAIL graduate students Haoran Zhang and Kimia Hamidieh; CSAIL postdoc Thomas Hartvigsen; Frank Rudzicz, associate professor of computer science at the University of Toronto; and senior author Marzyeh Ghassemi, an assistant professor and head of the Healthy ML Group. The research will be presented at the ACM Conference on Fairness, Accountability, and Transparency.

    High fidelity

    Simplified explanation models can approximate predictions of a more complex machine-learning model in a way that humans can grasp. An effective explanation model maximizes a property known as fidelity, which measures how well it matches the larger model’s predictions.

    Rather than focusing on average fidelity for the overall explanation model, the MIT researchers studied fidelity for subgroups of people in the model’s dataset. In a dataset with men and women, the fidelity should be very similar for each group, and both groups should have fidelity close to that of the overall explanation model.

    “When you are just looking at the average fidelity across all instances, you might be missing out on artifacts that could exist in the explanation model,” Balagopalan says.

    They developed two metrics to measure fidelity gaps, or disparities in fidelity between subgroups. One is the difference between the average fidelity across the entire explanation model and the fidelity for the worst-performing subgroup. The second calculates the absolute difference in fidelity between all possible pairs of subgroups and then computes the average.

    With these metrics, they searched for fidelity gaps using two types of explanation models that were trained on four real-world datasets for high-stakes situations, such as predicting whether a patient dies in the ICU, whether a defendant reoffends, or whether a law school applicant will pass the bar exam. Each dataset contained protected attributes, like the sex and race of individual people. Protected attributes are features that may not be used for decisions, often due to laws or organizational policies. The definition for these can vary based on the task specific to each decision setting.

    The researchers found clear fidelity gaps for all datasets and explanation models. The fidelity for disadvantaged groups was often much lower, up to 21 percent in some instances. The law school dataset had a fidelity gap of 7 percent between race subgroups, meaning the approximations for some subgroups were wrong 7 percent more often on average. If there are 10,000 applicants from these subgroups in the dataset, for example, a significant portion could be wrongly rejected, Balagopalan explains.

    “I was surprised by how pervasive these fidelity gaps are in all the datasets we evaluated. It is hard to overemphasize how commonly explanations are used as a ‘fix’ for black-box machine-learning models. In this paper, we are showing that the explanation methods themselves are imperfect approximations that may be worse for some subgroups,” says Ghassemi.

    Narrowing the gaps

    After identifying fidelity gaps, the researchers tried some machine-learning approaches to fix them. They trained the explanation models to identify regions of a dataset that could be prone to low fidelity and then focus more on those samples. They also tried using balanced datasets with an equal number of samples from all subgroups.

    These robust training strategies did reduce some fidelity gaps, but they didn’t eliminate them.

    The researchers then modified the explanation models to explore why fidelity gaps occur in the first place. Their analysis revealed that an explanation model might indirectly use protected group information, like sex or race, that it could learn from the dataset, even if group labels are hidden.

    They want to explore this conundrum more in future work. They also plan to further study the implications of fidelity gaps in the context of real-world decision making.

    Balagopalan is excited to see that concurrent work on explanation fairness from an independent lab has arrived at similar conclusions, highlighting the importance of understanding this problem well.

    As she looks to the next phase in this research, she has some words of warning for machine-learning users.

    “Choose the explanation model carefully. But even more importantly, think carefully about the goals of using an explanation model and who it eventually affects,” she says.

    This work was funded, in part, by the MIT-IBM Watson AI Lab, the Quanta Research Institute, a Canadian Institute for Advanced Research AI Chair, and Microsoft Research. More

  • in

    Emery Brown wins a share of 2022 Gruber Neuroscience Prize

    The Gruber Foundation announced on May 17 that Emery N. Brown, the Edward Hood Taplin Professor of Medical Engineering and Computational Neuroscience at MIT, has won the 2022 Gruber Neuroscience Prize along with neurophysicists Laurence Abbott of Columbia University, Terrence Sejnowski of the Salk Institute for Biological Studies, and Haim Sompolinsky of the Hebrew University of Jerusalem.

    The foundation says it honored the four recipients for their influential contributions to the fields of computational and theoretical neuroscience. As datasets have grown ever larger and more complex, these fields have increasingly helped scientists unravel the mysteries of how the brain functions in both health and disease. The prize, which includes a total $500,000 award, will be presented in San Diego, California, on Nov. 13 at the annual meeting of the Society for Neuroscience.

    “These four remarkable scientists have applied their expertise in mathematical and statistical analysis, physics, and machine learning to create theories, mathematical models, and tools that have greatly advanced how we study and understand the brain,” says Joshua Sanes, professor of molecular and cellular biology and founding director of the Center for Brain Science at Harvard University and member of the selection advisory board to the prize. “Their insights and research have not only transformed how experimental neuroscientists do their research, but also are leading to promising new ways of providing clinical care.”

    Brown, who is an investigator in The Picower Institute for Learning and Memory and the Institute for Medical Engineering and Science at MIT, an anesthesiologist at Massachusetts General Hospital, and a professor at Harvard Medical School, says: “It is a pleasant surprise and tremendous honor to be named a co-recipient of the 2022 Gruber Prize in Neuroscience. I am especially honored to share this award with three luminaries in computational and theoretical neuroscience.”

    Brown’s early groundbreaking findings in neuroscience included a novel algorithm that decodes the position of an animal by observing the activity of a small group of place cells in the animal’s brain, a discovery he made while working with fellow Picower Institute investigator Matt Wilson in the 1990s. The resulting state-space algorithm for point processes not only offered much better decoding with fewer neurons than previous approaches, but it also established a new framework for specifying dynamically the relationship between the spike trains (the timing sequence of firing neurons) in the brain and factors from the outside world.

    “One of the basic questions at the time was whether an animal holds a representation of where it is in its mind — in the hippocampus,” Brown says. “We were able to show that it did, and we could show that with only 30 neurons.”

    After introducing this state-space paradigm to neuroscience, Brown went on to refine the original idea and apply it to other dynamic situations — to simultaneously track neural activity and learning, for example, and to define with precision anesthesia-induced loss of consciousness, as well as its subsequent recovery. In the early 2000s, Brown put together a team to specifically study anesthesia’s effects on the brain.

    Through experimental research and mathematical modeling, Brown and his team showed that the altered arousal states produced by the main classes of anesthesia medications can be characterized by analyzing the oscillatory patterns observed in the EEG along with the locations of their molecular targets, and the anatomy and physiology of the neural circuits that connect those locations. He has established, including in recent papers with Picower Professor Earl K. Miller, that a principal way in which anesthetics produce unconsciousness is by producing oscillations that impair how different brain regions communicate with each other.

    The result of Brown’s research has been a new paradigm for brain monitoring during general anesthesia for surgery, one that allows an anesthesiologist to dose the patient based on EEG readouts (neural oscillations) of the patient’s anesthetic state rather than purely on vital sign responses. This pioneering approach promises to revolutionize how anesthesia medications are delivered to patients, and also shed light on other altered states of arousal such as sleep and coma.

    To advance that vision, Brown recently discussed how he is working to develop a new research center at MIT and MGH to further integrate anesthesiology with neuroscience research. The Brain Arousal State Control Innovation Center, he said, would not only advance anesthesiology care but also harness insights gained from anesthesiology research to improve other aspects of clinical neuroscience.

    “By demonstrating that physics and mathematics can make an enormous contribution to neuroscience, doctors Abbott, Brown, Sejnowski, and Sompolinsky have inspired an entire new generation of physicists and other quantitative scientists to follow in their footsteps,” says Frances Jensen, professor and chair of the Department of Neurology and co-director of the Penn Medicine Translational Neuroscience Center within the Perelman School of Medicine at the University of Pennsylvania, and chair of the Selection Advisory Board to the prize. “The ramifications for neuroscience have been broad and profound. It is a great pleasure to be honoring each of them with this prestigious award.”

    This report was adapted from materials provided by the Gruber Foundation. More

  • in

    Artificial intelligence predicts patients’ race from their medical images

    The miseducation of algorithms is a critical problem; when artificial intelligence mirrors unconscious thoughts, racism, and biases of the humans who generated these algorithms, it can lead to serious harm. Computer programs, for example, have wrongly flagged Black defendants as twice as likely to reoffend as someone who’s white. When an AI used cost as a proxy for health needs, it falsely named Black patients as healthier than equally sick white ones, as less money was spent on them. Even AI used to write a play relied on using harmful stereotypes for casting. 

    Removing sensitive features from the data seems like a viable tweak. But what happens when it’s not enough? 

    Examples of bias in natural language processing are boundless — but MIT scientists have investigated another important, largely underexplored modality: medical images. Using both private and public datasets, the team found that AI can accurately predict self-reported race of patients from medical images alone. Using imaging data of chest X-rays, limb X-rays, chest CT scans, and mammograms, the team trained a deep learning model to identify race as white, Black, or Asian — even though the images themselves contained no explicit mention of the patient’s race. This is a feat even the most seasoned physicians cannot do, and it’s not clear how the model was able to do this. 

    In an attempt to tease out and make sense of the enigmatic “how” of it all, the researchers ran a slew of experiments. To investigate possible mechanisms of race detection, they looked at variables like differences in anatomy, bone density, resolution of images — and many more, and the models still prevailed with high ability to detect race from chest X-rays. “These results were initially confusing, because the members of our research team could not come anywhere close to identifying a good proxy for this task,” says paper co-author Marzyeh Ghassemi, an assistant professor in the MIT Department of Electrical Engineering and Computer Science and the Institute for Medical Engineering and Science (IMES), who is an affiliate of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and of the MIT Jameel Clinic. “Even when you filter medical images past where the images are recognizable as medical images at all, deep models maintain a very high performance. That is concerning because superhuman capacities are generally much more difficult to control, regulate, and prevent from harming people.”

    In a clinical setting, algorithms can help tell us whether a patient is a candidate for chemotherapy, dictate the triage of patients, or decide if a movement to the ICU is necessary. “We think that the algorithms are only looking at vital signs or laboratory tests, but it’s possible they’re also looking at your race, ethnicity, sex, whether you’re incarcerated or not — even if all of that information is hidden,” says paper co-author Leo Anthony Celi, principal research scientist in IMES at MIT and associate professor of medicine at Harvard Medical School. “Just because you have representation of different groups in your algorithms, that doesn’t guarantee it won’t perpetuate or magnify existing disparities and inequities. Feeding the algorithms with more data with representation is not a panacea. This paper should make us pause and truly reconsider whether we are ready to bring AI to the bedside.” 

    The study, “AI recognition of patient race in medical imaging: a modeling study,” was published in Lancet Digital Health on May 11. Celi and Ghassemi wrote the paper alongside 20 other authors in four countries.

    To set up the tests, the scientists first showed that the models were able to predict race across multiple imaging modalities, various datasets, and diverse clinical tasks, as well as across a range of academic centers and patient populations in the United States. They used three large chest X-ray datasets, and tested the model on an unseen subset of the dataset used to train the model and a completely different one. Next, they trained the racial identity detection models for non-chest X-ray images from multiple body locations, including digital radiography, mammography, lateral cervical spine radiographs, and chest CTs to see whether the model’s performance was limited to chest X-rays. 

    The team covered many bases in an attempt to explain the model’s behavior: differences in physical characteristics between different racial groups (body habitus, breast density), disease distribution (previous studies have shown that Black patients have a higher incidence for health issues like cardiac disease), location-specific or tissue specific differences, effects of societal bias and environmental stress, the ability of deep learning systems to detect race when multiple demographic and patient factors were combined, and if specific image regions contributed to recognizing race. 

    What emerged was truly staggering: The ability of the models to predict race from diagnostic labels alone was much lower than the chest X-ray image-based models. 

    For example, the bone density test used images where the thicker part of the bone appeared white, and the thinner part appeared more gray or translucent. Scientists assumed that since Black people generally have higher bone mineral density, the color differences helped the AI models to detect race. To cut that off, they clipped the images with a filter, so the model couldn’t color differences. It turned out that cutting off the color supply didn’t faze the model — it still could accurately predict races. (The “Area Under the Curve” value, meaning the measure of the accuracy of a quantitative diagnostic test, was 0.94–0.96). As such, the learned features of the model appeared to rely on all regions of the image, meaning that controlling this type of algorithmic behavior presents a messy, challenging problem. 

    The scientists acknowledge limited availability of racial identity labels, which caused them to focus on Asian, Black, and white populations, and that their ground truth was a self-reported detail. Other forthcoming work will include potentially looking at isolating different signals before image reconstruction, because, as with bone density experiments, they couldn’t account for residual bone tissue that was on the images. 

    Notably, other work by Ghassemi and Celi led by MIT student Hammaad Adam has found that models can also identify patient self-reported race from clinical notes even when those notes are stripped of explicit indicators of race. Just as in this work, human experts are not able to accurately predict patient race from the same redacted clinical notes.

    “We need to bring social scientists into the picture. Domain experts, which are usually the clinicians, public health practitioners, computer scientists, and engineers are not enough. Health care is a social-cultural problem just as much as it’s a medical problem. We need another group of experts to weigh in and to provide input and feedback on how we design, develop, deploy, and evaluate these algorithms,” says Celi. “We need to also ask the data scientists, before any exploration of the data, are there disparities? Which patient groups are marginalized? What are the drivers of those disparities? Is it access to care? Is it from the subjectivity of the care providers? If we don’t understand that, we won’t have a chance of being able to identify the unintended consequences of the algorithms, and there’s no way we’ll be able to safeguard the algorithms from perpetuating biases.”

    “The fact that algorithms ‘see’ race, as the authors convincingly document, can be dangerous. But an important and related fact is that, when used carefully, algorithms can also work to counter bias,” says Ziad Obermeyer, associate professor at the University of California at Berkeley, whose research focuses on AI applied to health. “In our own work, led by computer scientist Emma Pierson at Cornell, we show that algorithms that learn from patients’ pain experiences can find new sources of knee pain in X-rays that disproportionately affect Black patients — and are disproportionately missed by radiologists. So just like any tool, algorithms can be a force for evil or a force for good — which one depends on us, and the choices we make when we build algorithms.”

    The work is supported, in part, by the National Institutes of Health. More

  • in

    Zero-trust architecture may hold the answer to cybersecurity insider threats

    For years, organizations have taken a defensive “castle-and-moat” approach to cybersecurity, seeking to secure the perimeters of their networks to block out any malicious actors. Individuals with the right credentials were assumed to be trustworthy and allowed access to a network’s systems and data without having to reauthorize themselves at each access attempt. However, organizations today increasingly store data in the cloud and allow employees to connect to the network remotely, both of which create vulnerabilities to this traditional approach. A more secure future may require a “zero-trust architecture,” in which users must prove their authenticity each time they access a network application or data.

    In May 2021, President Joe Biden’s Executive Order on Improving the Nation’s Cybersecurity outlined a goal for federal agencies to implement zero-trust security. Since then, MIT Lincoln Laboratory has been performing a study on zero-trust architectures, with the goals of reviewing their implementation in government and industry, identifying technical gaps and opportunities, and developing a set of recommendations for the United States’ approach to a zero-trust system.

    The study team’s first step was to define the term “zero trust” and understand the misperceptions in the field surrounding the concept. Some of these misperceptions suggest that a zero-trust architecture requires entirely new equipment to implement, or that it makes systems so “locked down” they’re not usable. 

    “Part of the reason why there is a lot of confusion about what zero trust is, is because it takes what the cybersecurity world has known about for many years and applies it in a different way,” says Jeffrey Gottschalk, the assistant head of Lincoln Laboratory’s Cyber Security and Information Sciences Division and study’s co-lead. “It is a paradigm shift in terms of how to think about security, but holistically it takes a lot of things that we already know how to do — such as multi-factor authentication, encryption, and software-defined networking­ — and combines them in different ways.”

    Play video

    Presentation: Overview of Zero Trust Architectures

    Recent high-profile cybersecurity incidents — such as those involving the National Security Agency, the U.S. Office of Personnel Management, Colonial Pipeline, SolarWinds, and Sony Pictures — highlight the vulnerability of systems and the need to rethink cybersecurity approaches.

    The study team reviewed recent, impactful cybersecurity incidents to identify which security principles were most responsible for the scale and impact of the attack. “We noticed that while a number of these attacks exploited previously unknown implementation vulnerabilities (also known as ‘zero-days’), the vast majority actually were due to the exploitation of operational security principles,” says Christopher Roeser, study co-lead and the assistant head of the Homeland Protection and Air Traffic Control Division, “that is, the gaining of individuals’ credentials, and the movement within a well-connected network that allows users to gather a significant amount of information or have very widespread effects.”

    In other words, the malicious actor had “breached the moat” and effectively became an insider.

    Zero-trust security principles could protect against this type of insider threat by treating every component, service, and user of a system as continuously exposed to and potentially compromised by a malicious actor. A user’s identity is verified each time that they request to access a new resource, and every access is mediated, logged, and analyzed. It’s like putting trip wires all over the inside of a network system, says Gottschalk. “So, when an adversary trips over that trip wire, you’ll get a signal and can validate that signal and see what’s going on.”

    In practice, a zero-trust approach could look like replacing a single-sign-on system, which lets users sign in just once for access to multiple applications, with a cloud-based identity that is known and verified. “Today, a lot of organizations have different ways that people authenticate and log onto systems, and many of those have been aggregated for expediency into single-sign-on capabilities, just to make it easier for people to log onto their systems. But we envision a future state that embraces zero trust, where identity verification is enabled by cloud-based identity that’s portable and ubiquitous, and very secure itself.”

    While conducting their study, the team spoke to approximately 10 companies and government organizations that have adopted zero-trust implementations — either through cloud services, in-house management, or a combination of both. They found the hybrid approach to be a good model for government organizations to adopt. They also found that the implementation could take from three to five years. “We talked to organizations that have actually done implementations of zero trust, and all of them have indicated that significant organizational commitment and change was required to be able to implement them,” Gottschalk says.

    But a key takeaway from the study is that there isn’t a one-size-fits-all approach to zero trust. “It’s why we think that having test-bed and pilot efforts are going to be very important to balance out zero-trust security with the mission needs of those systems,” Gottschalk says. The team also recognizes the importance of conducting ongoing research and development beyond initial zero-trust implementations, to continue to address evolving threats.

    Lincoln Laboratory will present further findings from the study at its upcoming Cyber Technology for National Security conference, which will be held June 28-29. The conference will also offer a short course for attendees to learn more about the benefits and implementations of zero-trust architectures.  More