More stories

  • To the brain, reading computer code is not the same as reading language

    In some ways, learning to program a computer is similar to learning a new language. It requires learning new symbols and terms, which must be organized correctly to instruct the computer what to do. The computer code must also be clear enough that other programmers can read and understand it.
    In spite of those similarities, MIT neuroscientists have found that reading computer code does not activate the regions of the brain that are involved in language processing. Instead, it activates a distributed network called the multiple demand network, which is also recruited for complex cognitive tasks such as solving math problems or crossword puzzles.
    However, although reading computer code activates the multiple demand network, it appears to rely more on different parts of the network than math or logic problems do, suggesting that coding does not precisely replicate the cognitive demands of mathematics either.
    “Understanding computer code seems to be its own thing. It’s not the same as language, and it’s not the same as math and logic,” says Anna Ivanova, an MIT graduate student and the lead author of the study.
    Evelina Fedorenko, the Frederick A. and Carole J. Middleton Career Development Associate Professor of Neuroscience and a member of the McGovern Institute for Brain Research, is the senior author of the paper, which appears today in eLife. Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory and Tufts University were also involved in the study.
    Language and cognition
    A major focus of Fedorenko’s research is the relationship between language and other cognitive functions. In particular, she has been studying the question of whether other functions rely on the brain’s language network, which includes Broca’s area and other regions in the left hemisphere of the brain. In previous work, her lab has shown that music and math do not appear to activate this language network.
    “Here, we were interested in exploring the relationship between language and computer programming, partially because computer programming is such a new invention that we know that there couldn’t be any hardwired mechanisms that make us good programmers,” Ivanova says.
    There are two schools of thought regarding how the brain learns to code, she says. One holds that in order to be good at programming, you must be good at math. The other suggests that because of the parallels between coding and language, language skills might be more relevant. To shed light on this issue, the researchers set out to study whether brain activity patterns while reading computer code would overlap with language-related brain activity.
    The two programming languages that the researchers focused on in this study are known for their readability — Python and ScratchJr, a visual programming language designed for children ages 5 and older. The subjects in the study were all young adults proficient in the language they were being tested on. While the programmers lay in a functional magnetic resonance imaging (fMRI) scanner, the researchers showed them snippets of code and asked them to predict what action the code would produce.
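    For a sense of what such a trial might look like, here is a hypothetical Python snippet of roughly the kind a participant could be shown (it is not taken from the study’s materials); the task would be to predict what it prints.

```python
# Hypothetical example of a code-comprehension stimulus (not from the study's
# actual materials): a participant reads the snippet and predicts its output.
prices = [3, 8, 5, 12]
total = 0
for p in prices:
    if p > 4:
        total += p
print(total)  # a participant should predict: 25
```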
    The researchers saw little to no response to code in the language regions of the brain. Instead, they found that the coding task mainly activated the so-called multiple demand network. This network, whose activity is spread throughout the frontal and parietal lobes of the brain, is typically recruited for tasks that require holding many pieces of information in mind at once, and is responsible for our ability to perform a wide variety of mental tasks.
    “It does pretty much anything that’s cognitively challenging, that makes you think hard,” Ivanova says.
    Previous studies have shown that math and logic problems seem to rely mainly on the multiple demand regions in the left hemisphere, while tasks that involve spatial navigation activate the right hemisphere more than the left. The MIT team found that reading computer code appears to activate both the left and right sides of the multiple demand network, and ScratchJr activated the right side slightly more than the left. This finding goes against the hypothesis that math and coding rely on the same brain mechanisms.
    Effects of experience
    The researchers say that while they didn’t identify any regions that appear to be exclusively devoted to programming, such specialized brain activity might develop in people who have much more coding experience.
    “It’s possible that if you take people who are professional programmers, who have spent 30 or 40 years coding in a particular language, you may start seeing some specialization, or some crystallization of parts of the multiple demand system,” Fedorenko says. “In people who are familiar with coding and can efficiently do these tasks, but have had relatively limited experience, it just doesn’t seem like you see any specialization yet.”
    In a companion paper appearing in the same issue of eLife, a team of researchers from Johns Hopkins University also reported that solving code problems activates the multiple demand network rather than the language regions.
    The findings suggest there isn’t a definitive answer to whether coding should be taught as a math-based skill or a language-based skill. In part, that’s because learning to program may draw on both language and multiple demand systems, even if — once learned — programming doesn’t rely on the language regions, the researchers say.
    “There have been claims from both camps — it has to be together with math, it has to be together with language,” Ivanova says. “But it looks like computer science educators will have to develop their own approaches for teaching code most effectively.”
    The research was funded by the National Science Foundation, the Department of Brain and Cognitive Sciences at MIT, and the McGovern Institute for Brain Research.

  • Building machines that better understand human goals

    In a classic experiment on human social intelligence by psychologists Felix Warneken and Michael Tomasello, an 18-month-old toddler watches a man carry a stack of books towards an unopened cabinet. When the man reaches the cabinet, he clumsily bangs the books against the door of the cabinet several times, then makes a puzzled noise.
    Something remarkable happens next: the toddler offers to help. Having inferred the man’s goal, the toddler walks up to the cabinet and opens its doors, allowing the man to place his books inside. But how is the toddler, with such limited life experience, able to make this inference? 
    Recently, computer scientists have redirected this question toward computers: How can machines do the same? 
    The critical component to engineering this type of understanding is arguably what makes us most human: our mistakes. Just as the toddler could infer the man’s goal merely from his failure, machines that infer our goals need to account for our mistaken actions and plans. 
    In the quest to capture this social intelligence in machines, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Department of Brain and Cognitive Sciences created an algorithm capable of inferring goals and plans, even when those plans might fail. 
    This type of research could eventually be used to improve a range of assistive technologies, collaborative or caretaking robots, and digital assistants like Siri and Alexa. 
    “This ability to account for mistakes could be crucial for building machines that robustly infer and act in our interests,” says Tan Zhi-Xuan, PhD student in MIT’s Department of Electrical Engineering and Computer Science (EECS) and the lead author on a new paper about the research. “Otherwise, AI systems might wrongly infer that, since we failed to achieve our higher-order goals, those goals weren’t desired after all. We’ve seen what happens when algorithms feed on our reflexive and unplanned usage of social media, leading us down paths of dependency and polarization. Ideally, the algorithms of the future will recognize our mistakes, bad habits, and irrationalities and help us avoid, rather than reinforce, them.” 
    To create their model, the team used Gen, a new AI programming platform recently developed at MIT, to combine symbolic AI planning with Bayesian inference. Bayesian inference provides an optimal way to combine uncertain beliefs with new data, and is widely used for financial risk evaluation, diagnostic testing, and election forecasting.
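    As a rough illustration of the Bayesian ingredient, the toy Python sketch below updates a prior over a handful of candidate goals using made-up likelihoods for an observed action sequence; the team’s actual system is built with Gen and couples this inference with symbolic planning, which this example does not do.

```python
# Minimal sketch of Bayesian goal inference over a discrete set of goals.
# Goals, likelihood values, and the scenario are invented for illustration.
goals = ["open_cabinet", "stack_books", "leave_room"]
prior = {g: 1.0 / len(goals) for g in goals}

# Hypothetical likelihoods: probability of the observed actions
# ("walk to cabinet", "bump door") under each candidate goal.
likelihood = {"open_cabinet": 0.60, "stack_books": 0.25, "leave_room": 0.05}

unnormalized = {g: prior[g] * likelihood[g] for g in goals}
evidence = sum(unnormalized.values())
posterior = {g: p / evidence for g, p in unnormalized.items()}

for g, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{g}: {p:.2f}")
```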
    The team’s model performed 20 to 150 times faster than an existing baseline method called Bayesian Inverse Reinforcement Learning (BIRL), which learns an agent’s objectives, values, or rewards by observing its behavior, and attempts to compute full policies or plans in advance. The new model was accurate 75 percent of the time in inferring goals. 
    “AI is in the process of abandoning the ‘standard model’ where a fixed, known objective is given to the machine,” says Stuart Russell, the Smith-Zadeh Professor of Engineering at the University of California at Berkeley. “Instead, the machine knows that it doesn’t know what we want, which means that research on how to infer goals and preferences from human behavior becomes a central topic in AI. This paper takes that goal seriously; in particular, it is a step towards modeling — and hence inverting — the actual process by which humans generate behavior from goals and preferences.”
    How it works 
    While there’s been considerable work on inferring the goals and desires of agents, much of this work has assumed that agents act optimally to achieve their goals. 
    However, the team was particularly inspired by a common way humans plan that is largely suboptimal: not planning everything out in advance, but rather forming only partial plans, executing them, and then planning again from there. While this can lead to mistakes from not thinking far enough ahead, it also reduces the cognitive load.
    For example, imagine you’re watching your friend prepare food, and you would like to help by figuring out what they’re cooking. You guess the next few steps your friend might take: maybe preheating the oven, then making dough for an apple pie. You then “keep” only the partial plans that remain consistent with what your friend actually does, and then you repeat the process by planning ahead just a few steps from there. 
    Once you’ve seen your friend make the dough, you can restrict the possibilities only to baked goods, and guess that they might slice apples next, or get some pecans for a pie mix. Eventually, you’ll have eliminated all the plans for dishes that your friend couldn’t possibly be making, keeping only the possible plans (i.e., pie recipes). Once you’re sure enough which dish it is, you can offer to help.
    The team’s inference algorithm, called “Sequential Inverse Plan Search (SIPS)”, follows this sequence to infer an agent’s goals, as it only makes partial plans at each step, and cuts unlikely plans early on. Since the model only plans a few steps ahead each time, it also accounts for the possibility that the agent — your friend — might be doing the same. This includes the possibility of mistakes due to limited planning, such as not realizing you might need two hands free before opening the refrigerator. By detecting these potential failures in advance, the team hopes the model could be used by machines to better offer assistance.
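    The sketch below illustrates, in heavily simplified Python, the filtering logic described above: goal hypotheses are re-weighted against short partial plans rather than complete ones. The recipes, likelihood values, and planner are all invented for illustration and are not the SIPS implementation.

```python
# Heavily simplified sketch of the filtering idea behind SIPS: keep a weighted
# set of goal hypotheses, plan only a few steps ahead under each one, and
# down-weight hypotheses whose short-horizon plans disagree with what the
# agent actually did. All names and numbers here are illustrative.

def partial_plan(goal, step, horizon=2):
    """Hypothetical short-horizon planner: the next few actions an agent
    pursuing `goal` might take, starting from the current step."""
    recipes = {
        "apple_pie":   ["preheat_oven", "make_dough", "slice_apples"],
        "pecan_pie":   ["preheat_oven", "make_dough", "chop_pecans"],
        "fruit_salad": ["wash_fruit", "slice_apples", "mix_fruit"],
    }
    return recipes[goal][step:step + horizon]

def update_weights(weights, observed_action, step):
    new_weights = {}
    for goal, w in weights.items():
        plan = partial_plan(goal, step)
        # Noisy-agreement likelihood: high if the observed action appears in
        # the short plan, small (but nonzero, to allow for mistakes) otherwise.
        likelihood = 0.9 if observed_action in plan else 0.05
        new_weights[goal] = w * likelihood
    total = sum(new_weights.values())
    return {g: w / total for g, w in new_weights.items()}

weights = {"apple_pie": 1 / 3, "pecan_pie": 1 / 3, "fruit_salad": 1 / 3}
observations = ["preheat_oven", "make_dough", "slice_apples"]
for step, action in enumerate(observations):
    weights = update_weights(weights, action, step)
    print(step, {g: round(w, 2) for g, w in weights.items()})
```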
    “One of our early insights was that if you want to infer someone’s goals, you don’t need to think further ahead than they do. We realized this could be used not just to speed up goal inference, but also to infer intended goals from actions that are too shortsighted to succeed, leading us to shift from scaling up algorithms to exploring ways to resolve more fundamental limitations of current AI systems,” says Vikash Mansinghka, a principal research scientist at MIT and one of Tan Zhi-Xuan’s co-advisors, along with Joshua Tenenbaum, MIT professor in brain and cognitive sciences. “This is part of our larger moonshot — to reverse-engineer 18-month-old human common sense.” 
    The work builds conceptually on earlier cognitive models from Tenenbaum’s group, showing how simpler inferences that children and even 10-month-old infants make about others’ goals can be modeled quantitatively as a form of Bayesian inverse planning.
    While to date the researchers have explored inference only in relatively small planning problems over fixed sets of goals, through future work they plan to explore richer hierarchies of human goals and plans. By encoding or learning these hierarchies, machines might be able to infer a much wider variety of goals, as well as the deeper purposes they serve.
    “Though this work represents only a small initial step, my hope is that this research will lay some of the philosophical and conceptual groundwork necessary to build machines that truly understand human goals, plans and values,” says Xuan. “This basic approach of modeling humans as imperfect reasoners feels very promising. It now allows us to infer when plans are mistaken, and perhaps it will eventually allow us to infer when people hold mistaken beliefs, assumptions, and guiding principles as well.”
    Zhi-Xuan, Mansinghka, and Tenenbaum wrote the paper alongside EECS graduate student Jordyn Mann and PhD student Tom Silver. They virtually presented their work last week at the Conference on Neural Information Processing Systems (NeurIPS 2020).

  • Model could help determine quarantine measures needed to reduce Covid-19’s spread

    Some of the research described in this article has been published on a preprint server but has not yet been peer-reviewed by experts in the field.
    As Covid-19 infections soar across the U.S., some states are tightening restrictions and reinstituting quarantine measures to slow the virus’ spread. A model developed by MIT researchers shows a direct link between the number of people who become infected and how effectively a state maintains its quarantine measures.
    The researchers described their model in a paper published in Cell Patterns in November, showing that the system could recapitulate the effects that quarantine measures had on viral spread in countries around the world. In their next study, recently posted to the preprint server medRxiv, they drilled into data from the United States last spring and summer. That earlier surge in infections, they found, was strongly related to a drop in “quarantine strength” — a measure the team defines as the ability to keep infected individuals from infecting others.
    The latest study focuses on last spring and early summer, when the southern and west-central United States saw a precipitous rise in infections as states in those regions reopened and relaxed quarantine measures. The researchers used their model to calculate the quarantine strength in these states, many of which were early to reopen following initial lockdowns in the spring.
    If these states had not reopened so early, or had reopened but strictly enforced measures such as mask-wearing and social distancing, the model calculates that more than 40 percent of infections could have been avoided in all states that the researchers considered. In particular, the study estimates, if Texas and Florida had maintained stricter quarantine measures, more than 100,000 infections could have been avoided in each of those states.
    “If you look at these numbers, simple actions on an individual level can lead to huge reductions in the number of infections and can massively influence the global statistics of this pandemic,” says lead author Raj Dandekar, a graduate student in MIT’s Department of Civil and Environmental Engineering. 
    As the country battles a winter wave of new infections, and states are once again tightening restrictions, the team hopes the model can help policymakers determine the level of quarantine measures to put in place.
    “What I think we have learned quantitatively is, jumping around from hyper-quarantine to no quarantine and back to hyper-quarantine definitely doesn’t work,” says co-author Christopher Rackauckas, an applied mathematics instructor at MIT. “Instead, good consistent application of policy would have been a much more effective tool.”
    The new paper’s MIT co-authors also include undergraduate Emma Wang and professor of mechanical engineering George Barbastathis.
    Strength learning
    The team’s model is a modification of a standard SIR model, an epidemiological model that is used to predict the way a disease spreads, based on the number of people who are either “susceptible,” “infectious,” or “recovered.” Dandekar and his colleagues enhanced an SIR model with a neural network that they trained to process real Covid-19 data.
    The machine-learning-enhanced model learns to identify patterns in data of infected and recovered cases, and from these data, it calculates the number of infected individuals who are not transmitting the virus to others (presumably because the infected individuals are following some sort of quarantining measures). This value is what the researchers label as “quarantine strength,” which reflects how effective a region is in quarantining an infected individual. The model can process data over time to see how a region’s quarantine strength evolves.
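    The toy Python sketch below shows one way a time-varying quarantine strength term can be folded into an SIR-type model. In the study, that term is learned by a neural network trained on reported case data; here it is just a hand-written step function, and every parameter value is made up.

```python
import numpy as np
from scipy.integrate import odeint

# Simplified sketch of an SIR-type model augmented with a time-varying
# "quarantine strength" Q(t): the rate at which infected individuals are
# effectively removed from circulation. Illustrative parameters only.

N = 1_000_000            # population
beta, gamma = 0.3, 0.1   # transmission and recovery rates (per day)

def Q(t):
    # Hypothetical quarantine strength: weak at first, stronger after day 30.
    return 0.02 if t < 30 else 0.15

def deriv(y, t):
    S, I, R, T = y       # susceptible, infectious, recovered, quarantined
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I - Q(t) * I
    dR = gamma * I
    dT = Q(t) * I
    return [dS, dI, dR, dT]

t = np.linspace(0, 120, 121)
y0 = [N - 500, 500, 0, 0]            # start from the 500th reported infection
S, I, R, T = odeint(deriv, y0, t).T
print(f"peak infectious: {int(I.max()):,} on day {int(t[I.argmax()])}")
```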
    The researchers developed the model in early February and have since applied it to Covid-19 data from more than 70 countries, finding that it has accurately simulated the on-the-ground quarantine situation in European, South American, and Asian countries that were initially hard-hit by the virus.
    “When we look at these countries to see when quarantines were instituted, and we compare that with results for the trained quarantine strength signal, we see a very strong correlation,” Rackauckas says. “The quarantine strength in our model changes a day or two after policies are instituted, among all countries. Those results validated the model.”
    The team published these country-level results last month in Cell Patterns, and are also hosting the results at covid19ml.org, where users can click on a map of the world to see how a given country’s quarantine strength has changed over time.
    What if states had delayed?
    Once the researchers validated the model at the country level, they applied it to individual states in the U.S., to see not only how a state’s quarantine measures evolved over time, but how the number of infections would have changed if a state modified its quarantine strength, for instance by delaying reopening.
    They focused on the south and west-central U.S., where many states were early to reopen and subsequently experienced rapid surges in infections. The team used the model to calculate the quarantine strength for Arizona, Florida, Louisiana, Nevada, Oklahoma, South Carolina, Tennessee, Texas, and Utah, all of which opened before May 15. They also modeled New York, New Jersey, and Illinois — states that delayed reopening to late May and early June.
    They fed the model the number of infected and recovered individuals that was reported for each state, starting from when the 500th infection was reported in each state, up until mid-July. They also noted the day on which each state’s stay-at-home order was lifted, effectively signaling the state’s reopening.
    For every state, the quarantine strength declined soon after reopening; the steepness of this decline, and the subsequent rise in infections, was strongly related to how early a state reopened. States that reopened earlier, such as South Carolina and Tennessee, showed a steeper drop in quarantine strength and a higher rate of daily cases.
    “Instead of just saying that reopening early is bad, we are actually quantifying here how bad it was,” Dandekar says.
    Meanwhile, states like New York and New Jersey, which delayed reopening or enforced quarantine measures such as mask-wearing even after reopening, kept a more or less steady quarantine strength, with no significant rise in infections. 
    “Now that we can give a measure of quarantine strength that matches reality, we can say, ‘What if we kept everything constant? How much difference would the southern states have had in their outlook?’” Rackauckas says.
    Next, the team reversed its model to estimate the number of infections that would have occurred if a given state maintained a steady quarantine strength even after reopening. In this scenario, more than 40 percent of infections could have been avoided in each state they modeled. In Texas and Florida, that percentage amounts to about 100,000 preventable cases for each state.
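    Continuing the toy sketch above, a counterfactual of this kind can be approximated by re-integrating the same equations with the quarantine strength held at its pre-reopening level; the numbers below are illustrative, not the study’s estimates.

```python
# Continuing the toy sketch above (reuses N, beta, gamma, y0, t, odeint):
# compare an "actual" run, in which quarantine strength drops after reopening,
# with a counterfactual run in which it is held constant.

def Q_actual(t):
    return 0.15 if t < 60 else 0.05   # hypothetical drop after "reopening"

def Q_counterfactual(t):
    return 0.15                       # strength held steady

def total_infected(Q_fn):
    def deriv(y, t):
        S, I, R, T = y
        dS = -beta * S * I / N
        dI = beta * S * I / N - gamma * I - Q_fn(t) * I
        return [dS, dI, gamma * I, Q_fn(t) * I]
    S, I, R, T = odeint(deriv, y0, t).T
    return N - S[-1]                  # rough cumulative infections by the last day

avoided = total_infected(Q_actual) - total_infected(Q_counterfactual)
print(f"infections avoided in the counterfactual: {int(avoided):,}")
```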
    Conceivably, as the pandemic continues to ebb and surge, policymakers could use the model to calculate the quarantine strength needed to keep a state’s current infections below a certain number. They could then look back through the data for a point in time when the state exhibited that same quarantine strength, and use the restrictions that were in place then as a guide to the policies they might adopt now.
    “What is the rate of growth of the disease that we’re comfortable with, and what would be the quarantine policies that would get us there?” Rackauckas says. “Is it everyone holing up in their houses, or is it everyone allowed to go to restaurants, but once a week? That’s what the model can kind of tell us. It can give us more of a refined quantitative view of that question.”
    This research was funded, in part, by the Intelligence Advanced Research Projects Activity (IARPA).

  • Automating material-matching for movies and video games

    Very few of us who play video games or watch movies filled with computer-generated imagery ever take the time to sit back and appreciate all the handiwork that makes their graphics so thrilling and immersive.
    One key aspect of this is texture. The glossy pictures we see on our screens often appear seamlessly rendered, but they require huge amounts of work behind the scenes. When effects studios create scenes in computer-assisted design programs, they first build 3D models of all the objects they plan to put in the scene, and then give a texture to each generated object: for example, making a wood table appear glossy, polished, or matte.
    If a designer is trying to recreate a particular texture from the real world, they may find themselves digging around online trying to find a close match that can be stitched together for the scene. But most of the time you can’t just take a photo of an object and use it in a scene — you have to create a set of “maps” that quantify different properties like roughness or light levels.
    There are programs that have made this process easier than ever before, like the Adobe Substance software that helped produce the photorealistic ruins of Las Vegas in “Blade Runner 2049.” However, these so-called “procedural” programs can take months to learn, and creating a particular texture can still involve painstaking hours or even days.

    Even the design of a simple leather shoe can be made up of dozens of different textures.

    A team led by researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed an approach that they say can make texturing even less tedious, to the point where you can snap a picture of something you see in a store, and then go recreate the material on your home laptop. 
    “Imagine being able to take a photo of a pair of jeans that you can then have your character wear in a video game,” says PhD student Liang Shi, lead author of a paper about the new “MATch” project. “We believe this system would further close the gap between ‘virtual’ and ‘reality.’”
    Shi says that the goal of MATch is to “significantly simplify and expedite the creation of synthetic materials using machine learning.” The team evaluated MATch on both rendered synthetic materials and real materials captured on camera, and showed that it can reconstruct materials more accurately and at a higher resolution than existing state-of-the-art methods. 
    Developed in collaboration with researchers at Adobe, one core element of the system is a new library called “DiffMat” that essentially provides the various building blocks for constructing different textured materials.
    The team’s framework involves dozens of so-called “procedural graphs” made up of different nodes that all act like mini-Instagram filters: they take some input and transform it in a certain artistic way to produce an output. 
    “A graph simply defines a way to combine hundreds of such filters to achieve a very complex visual effect, like a particular texture,” says Shi. “The neural network selects the most appropriate combinations of filter nodes until it perceptually matches the appearance of the user’s input image.”
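    As a loose illustration of that idea, the Python sketch below chains two made-up parametric “filter nodes” and uses gradient descent to recover parameters that reproduce a target texture. The real DiffMat library and MATch pipeline are far more elaborate, and nothing here reflects their actual API.

```python
import torch

# Toy illustration of a differentiable procedural graph: two parametric
# "filter nodes" are chained, and their parameters are optimized so the
# output matches a target texture. Nodes, parameters, and target are made up.

def brightness_node(img, gain):
    return img * gain

def contrast_node(img, amount):
    return (img - img.mean()) * amount + img.mean()

base = torch.rand(64, 64)                                          # fixed noise pattern
target = contrast_node(brightness_node(base, 0.7), 1.8).detach()   # stand-in for a photo

gain = torch.tensor(1.0, requires_grad=True)
amount = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.Adam([gain, amount], lr=0.05)

for step in range(300):
    out = contrast_node(brightness_node(base, gain), amount)
    loss = ((out - target) ** 2).mean()   # the real system uses a perceptual loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"recovered gain={gain.item():.2f}, contrast={amount.item():.2f}")
```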
    Moving forward, Shi says that Adobe is interested in incorporating the team’s work into future versions of Substance. In terms of next steps, the team would like to go beyond inputting just a single flat sample, and to instead be able to capture materials from images of curved objects, or with multiple materials in the image.
    They also hope to expand the pipeline to handle more complex materials whose properties vary with orientation. (For example, a piece of wood has visible lines running in one direction, “with the grain”; wood is stronger with the grain than against it.)
    Shi co-wrote the paper with MIT Professor Wojciech Matusik, alongside MIT graduate student Beichen Li and Adobe researchers Miloš Hašan, Kalyan Sunkavalli, Radomír Měch, and Tamy Boubekeur. The paper will be presented virtually this month at the SIGGRAPH Asia computer graphics conference.
    The work is supported, in part, by the National Science Foundation.

  • Better learning with shape-shifting objects

    Have you ever seen a fancy ergonomic chair that seems to magically mold to a person’s body? Such products got researchers at MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) thinking about other everyday objects that could be made to shape-shift to help their users — not only to get things done, but to actually improve their skills in particular areas. 
    One idea they came up with: a basketball hoop that helps you train more effectively by shrinking and rising as you make shots more consistently.
    The thinking is that a beginner could start by using the basket at a lower height and with a wider hoop diameter. As they proceed to make baskets more consistently, the hoop automatically shrinks and rises until it reaches regulation size. 

    Video: “An Adaptive Basketball Hoop for Training Motor Skills”

    Led by MIT Professor Stefanie Mueller, the researchers say that these sorts of adaptive tools could help people who can’t afford coaches or personal trainers to learn different skills or train for sports. They hope that the idea might be particularly timely during the pandemic, with the sudden cancellation of so many in-person gym classes.
    Mueller and colleagues have already started to develop several other prototype tools, including a bicycle with raisable training wheels, an armband that helps golfers keep their arms straight, and even adaptive life jackets and high-heeled shoes. 
    With the basketball hoop, the CSAIL team tested it under two different conditions. In “manually adaptive” mode, the user is the one who changes the hoop’s height and width; in “auto-adaptive” mode, the hoop itself automatically adjusts so that the user is always learning at an “optimal challenge point” where the task is neither too easy nor too hard. 
    Experimental results showed that training on the auto-adaptive hoop led to better performance than with either the static hoop or the manually-adaptive mode — which lead author Dishita Turakhia says is an indication that people often over- or under-challenge themselves and “aren’t all that good at assessing their skill levels.” 
    Users found that, compared to adjusting the hoop themselves, the auto-adaptive system was not just more effective but also more enjoyable and less distracting, since it freed them from constantly having to decide whether to make the task more difficult.
    “It’s interesting in that it’s objectively measuring performance,” says Fraser Anderson, a senior principal research scientist at Autodesk who was not involved in the study. “You don’t have to rely on your own sense of whether or not you’ve mastered a skill: the system can do that and take out the self-doubt, overconfidence, or guesswork.”
    The system’s algorithm for determining shot accuracy is somewhat crude at the moment: It essentially gives the shooter one point if the ball goes through the net, and half a point if it hits the backboard. If the shooter’s average after at least four shots is 0.75 points or greater, the hoop will shrink and rise a set amount, and the whole process will then repeat. (Turakhia says that, with a greater number of sensors and cameras, the hoop could sense a wider range of skills and adapt accordingly.) 
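    A minimal sketch of that adaptation rule in Python: the scoring scheme and the 0.75 threshold over at least four shots follow the description above, while the specific height and diameter increments are invented for illustration.

```python
class AdaptiveHoop:
    REGULATION_HEIGHT = 3.05    # meters
    REGULATION_DIAMETER = 0.46  # meters

    def __init__(self, height=2.4, diameter=0.8):
        # Hypothetical beginner settings: lower basket, wider hoop.
        self.height, self.diameter = height, diameter
        self.scores = []

    def record_shot(self, result):
        """result: 'made', 'backboard', or 'miss'."""
        self.scores.append({"made": 1.0, "backboard": 0.5, "miss": 0.0}[result])
        if len(self.scores) >= 4 and sum(self.scores) / len(self.scores) >= 0.75:
            # Raise and shrink the hoop by a set amount, then restart the window.
            self.height = min(self.height + 0.15, self.REGULATION_HEIGHT)
            self.diameter = max(self.diameter - 0.08, self.REGULATION_DIAMETER)
            self.scores = []

hoop = AdaptiveHoop()
for shot in ["made", "backboard", "made", "made", "miss", "made", "made", "made"]:
    hoop.record_shot(shot)
    print(f"height={hoop.height:.2f} m, diameter={hoop.diameter:.2f} m")
```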
    The team plans to continue to work on adaptive tools for other use cases, including rehabilitation and workplace training. Anderson says he could even imagine an adaptive approach being used in medical schools to help surgeons improve their skills.
    Turakhia and Mueller wrote the paper alongside master’s student Andrew Wong and former graduate students Yini Qi ’17, MNG ’18 and Lotta Blumberg ’18, MNG ’19. They will present the paper virtually in February at the Association for Computing Machinery’s Conference on Tangible, Embedded and Embodied Interaction (TEI). The project was supported, in part, by the MIT Integrated Learning Initiative.

  • Neuroscientists find a way to make object-recognition models perform better

    Computer vision models known as convolutional neural networks can be trained to recognize objects nearly as accurately as humans do. However, these models have one significant flaw: Very small changes to an image, which would be nearly imperceptible to a human viewer, can trick them into making egregious errors such as classifying a cat as a tree.
    A team of neuroscientists from MIT, Harvard University, and IBM has developed a way to alleviate this vulnerability by adding to these models a new layer that is designed to mimic the earliest stage of the brain’s visual processing system. In a new study, they showed that this layer greatly improved the models’ robustness against this type of mistake.
    “Just by making the models more similar to the brain’s primary visual cortex, in this single stage of processing, we see quite significant improvements in robustness across many different types of perturbations and corruptions,” says Tiago Marques, an MIT postdoc and one of the lead authors of the study.
    Convolutional neural networks are often used in artificial intelligence applications such as self-driving cars, automated assembly lines, and medical diagnostics. Harvard graduate student Joel Dapello, who is also a lead author of the study, adds that “implementing our new approach could potentially make these systems less prone to error and more aligned with human vision.”
    “Good scientific hypotheses of how the brain’s visual system works should, by definition, match the brain in both its internal neural patterns and its remarkable robustness. This study shows that achieving those scientific gains directly leads to engineering and application gains,” says James DiCarlo, the head of MIT’s Department of Brain and Cognitive Sciences, an investigator in the Center for Brains, Minds, and Machines and the McGovern Institute for Brain Research, and the senior author of the study.
    The study, which is being presented at the NeurIPS conference this month, is also co-authored by MIT graduate student Martin Schrimpf, MIT visiting student Franziska Geiger, and MIT-IBM Watson AI Lab Director David Cox.
    Mimicking the brain
    Recognizing objects is one of the visual system’s primary functions. In just a small fraction of a second, visual information flows through the ventral visual stream to the brain’s inferior temporal cortex, where neurons contain information needed to classify objects. At each stage in the ventral stream, the brain performs different types of processing. The very first stage in the ventral stream, V1, is one of the most well-characterized parts of the brain and contains neurons that respond to simple visual features such as edges.
    “It’s thought that V1 detects local edges or contours of objects, and textures, and does some type of segmentation of the images at a very small scale. Then that information is later used to identify the shape and texture of objects downstream,” Marques says. “The visual system is built in this hierarchical way, where in early stages neurons respond to local features such as small, elongated edges.”
    For many years, researchers have been trying to build computer models that can identify objects as well as the human visual system. Today’s leading computer vision systems are already loosely guided by our current knowledge of the brain’s visual processing. However, neuroscientists still don’t know enough about how the entire ventral visual stream is connected to build a model that precisely mimics it, so they borrow techniques from the field of machine learning to train convolutional neural networks on a specific set of tasks. Using this process, a model can learn to identify objects after being trained on millions of images.
    Many of these convolutional networks perform very well, but in most cases, researchers don’t know exactly how the network is solving the object-recognition task. In 2013, researchers from DiCarlo’s lab showed that some of these neural networks could not only accurately identify objects, but they could also predict how neurons in the primate brain would respond to the same objects much better than existing alternative models. However, these neural networks are still not able to perfectly predict responses along the ventral visual stream, particularly at the earliest stages of object recognition, such as V1.
    These models are also vulnerable to so-called “adversarial attacks.” This means that small changes to an image, such as changing the colors of a few pixels, can lead the model to completely confuse an object for something different — a type of mistake that a human viewer would not make.
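    The snippet below sketches the simplest version of such an attack, the fast gradient sign method: each pixel is nudged slightly in the direction that most increases the loss. The tiny untrained network and random image are stand-ins; with a trained model, a perturbation this small is typically enough to flip the prediction while the image looks unchanged to a person.

```python
import torch
import torch.nn as nn

# Minimal sketch of a fast-gradient-sign adversarial perturbation against a
# tiny, untrained stand-in network. Real attacks target trained object-
# recognition models applied to real photographs.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
model.eval()

image = torch.rand(1, 3, 32, 32, requires_grad=True)
label = model(image).argmax(dim=1)             # the model's original prediction

loss = nn.functional.cross_entropy(model(image), label)
loss.backward()

epsilon = 0.03                                  # imperceptibly small per-pixel change
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

print("original prediction: ", label.item())
print("perturbed prediction:", model(adversarial).argmax(dim=1).item())
```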
    As a first step in their study, the researchers analyzed the performance of 30 of these models and found that models whose internal responses better matched the brain’s V1 responses were also less vulnerable to adversarial attacks. That is, having a more brain-like V1 seemed to make the model more robust. To further test and take advantage of that idea, the researchers decided to create their own model of V1, based on existing neuroscientific models, and place it at the front of convolutional neural networks that had already been developed to perform object recognition.
    When the researchers added their V1 layer, which is also implemented as a convolutional neural network, to three of these models, they found that these models became about four times more resistant to making mistakes on images perturbed by adversarial attacks. The models were also less vulnerable to misidentifying objects that were blurred or distorted due to other corruptions.
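    As a rough architectural sketch (not the researchers’ actual model, which also includes nonlinearities and neural noise among other components), the Python code below prepends a frozen bank of Gabor filters, the classic model of V1 simple cells, to a small trainable network.

```python
import math
import torch
import torch.nn as nn

# Simplified sketch of a fixed, V1-inspired front end: a bank of Gabor filters
# (oriented edge detectors, as found in primary visual cortex) with frozen
# weights, placed in front of an ordinary trainable CNN. Illustrative only.

def gabor_kernel(size, theta, sigma=2.0, wavelength=4.0):
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    y, x = ax.view(-1, 1), ax.view(1, -1)
    x_r = x * math.cos(theta) + y * math.sin(theta)
    y_r = -x * math.sin(theta) + y * math.cos(theta)
    envelope = torch.exp(-(x_r ** 2 + y_r ** 2) / (2 * sigma ** 2))
    carrier = torch.cos(2 * math.pi * x_r / wavelength)
    return envelope * carrier

class V1FrontEnd(nn.Module):
    def __init__(self, n_orientations=8, kernel_size=7):
        super().__init__()
        kernels = torch.stack([gabor_kernel(kernel_size, i * math.pi / n_orientations)
                               for i in range(n_orientations)])
        weight = kernels[:, None, :, :].repeat(1, 3, 1, 1) / 3   # apply to RGB
        self.conv = nn.Conv2d(3, n_orientations, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.conv.weight = nn.Parameter(weight, requires_grad=False)  # frozen filters

    def forward(self, x):
        return torch.relu(self.conv(x))

# Prepend the fixed front end to a small trainable classifier, a stand-in for
# an off-the-shelf object-recognition network.
model = nn.Sequential(
    V1FrontEnd(),
    nn.Conv2d(8, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),
)
print(model(torch.rand(1, 3, 64, 64)).shape)   # torch.Size([1, 10])
```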
    “Adversarial attacks are a big, open problem for the practical deployment of deep neural networks. The fact that adding neuroscience-inspired elements can improve robustness substantially suggests that there is still a lot that AI can learn from neuroscience, and vice versa,” Cox says.
    Better defense
    Currently, the best defense against adversarial attacks is a computationally expensive process of training models to recognize the altered images. One advantage of the new V1-based model is that it doesn’t require any additional training. It is also better able to handle a wide range of distortions, beyond adversarial attacks.
    The researchers are now trying to identify the key features of their V1 model that allow it to do a better job resisting adversarial attacks, which could help them make future models even more robust. It could also help them learn more about how the human brain is able to recognize objects.
    “One big advantage of the model is that we can map components of the model to particular neuronal populations in the brain,” Dapello says. “We can use this as a tool for novel neuroscientific discoveries, and also continue developing this model to improve its performance under this challenging task.”
    The research was funded by the PhRMA Foundation Postdoctoral Fellowship in Informatics, the Semiconductor Research Corporation, DARPA, the MIT Shoemaker Fellowship, the U.S. Office of Naval Research, the Simons Foundation, and the MIT-IBM Watson AI Lab.