More stories

  • A language learning system that pays attention — more efficiently than ever before

    Human language can be inefficient. Some words are vital. Others, expendable.
    Reread the first sentence of this story. Just two words, “language” and “inefficient,” convey almost the entire meaning of the sentence. The importance of key words underlies a popular new tool for natural language processing (NLP) by computers: the attention mechanism. When coded into a broader NLP algorithm, the attention mechanism homes in on key words rather than treating every word with equal importance. That yields better results in NLP tasks like detecting positive or negative sentiment or predicting which words should come next in a sentence.
    The attention mechanism’s accuracy often comes at the expense of speed and computing power, however. It runs slowly on general-purpose processors like those you might find in consumer-grade computers. So, MIT researchers have designed a combined software-hardware system, dubbed SpAtten, specialized to run the attention mechanism. SpAtten enables more streamlined NLP with less computing power.
    “Our system is similar to how the human brain processes language,” says Hanrui Wang. “We read very fast and just focus on key words. That’s the idea with SpAtten.”
    The research will be presented this month at the IEEE International Symposium on High-Performance Computer Architecture. Wang is the paper’s lead author and a PhD student in the Department of Electrical Engineering and Computer Science. Co-authors include Zhekai Zhang and their advisor, Assistant Professor Song Han.
    Since its introduction in 2015, the attention mechanism has been a boon for NLP. It’s built into state-of-the-art NLP models like Google’s BERT and OpenAI’s GPT-3. The attention mechanism’s key innovation is selectivity — it can infer which words or phrases in a sentence are most important, based on comparisons with word patterns the algorithm has previously encountered in a training phase. Despite the attention mechanism’s rapid adoption into NLP models, it’s not without cost.
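    For readers who want to see the idea in code, here is a toy sketch of scaled dot-product attention in Python (an illustration of the general mechanism, not code from SpAtten or any specific model); the softmax weights play the role of per-word importance scores:

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)   # for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        """Scaled dot-product attention over a sentence of n tokens.
        Q, K, V: (n, d) arrays of query, key, and value vectors.
        Returns the attended output and the (n, n) weight matrix whose
        rows show how strongly each token attends to every other token."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)        # pairwise token similarities
        weights = softmax(scores, axis=-1)   # each row sums to 1
        return weights @ V, weights

    # Toy example: 5 tokens with 8-dimensional embeddings.
    rng = np.random.default_rng(0)
    Q = K = V = rng.normal(size=(5, 8))
    out, w = attention(Q, K, V)
    print(w.round(2))   # large entries mark the "key words" each token focuses on
    ```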
    NLP models require a hefty load of computer power, thanks in part to the high memory demands of the attention mechanism. “This part is actually the bottleneck for NLP models,” says Wang. One challenge he points to is the lack of specialized hardware to run NLP models with the attention mechanism. General-purpose processors, like CPUs and GPUs, have trouble with the attention mechanism’s complicated sequence of data movement and arithmetic. And the problem will get worse as NLP models grow more complex, especially for long sentences. “We need algorithmic optimizations and dedicated hardware to process the ever-increasing computational demand,” says Wang.
    The researchers developed a system called SpAtten to run the attention mechanism more efficiently. Their design encompasses both specialized software and hardware. One key software advance is SpAtten’s use of “cascade pruning,” or eliminating unnecessary data from the calculations. Once the attention mechanism helps pick a sentence’s key words (called tokens), SpAtten prunes away unimportant tokens and eliminates the corresponding computations and data movements. The attention mechanism also includes multiple computation branches (called heads). Similar to tokens, the unimportant heads are identified and pruned away. Once dispatched, the extraneous tokens and heads don’t factor into the algorithm’s downstream calculations, reducing both computational load and memory access.
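    A toy sketch of the idea behind token pruning (my own simplified illustration, not SpAtten’s actual implementation): score each token by the total attention it receives, then drop the lowest-scoring tokens from all downstream computation. Head pruning works analogously on a per-head importance score.

    ```python
    import numpy as np

    def cascade_token_prune(tokens, attn_weights, keep_ratio=0.5):
        """Drop the least-attended tokens so later layers never touch them.
        tokens:       list of n token strings
        attn_weights: (heads, n, n) attention weights from one layer
        keep_ratio:   fraction of tokens to keep"""
        # One simple importance score: total attention a token receives,
        # summed over heads and query positions.
        importance = attn_weights.sum(axis=(0, 1))            # shape (n,)
        n_keep = max(1, int(len(tokens) * keep_ratio))
        keep = np.sort(np.argsort(importance)[-n_keep:])      # keep original token order
        return [tokens[i] for i in keep], keep

    tokens = ["human", "language", "can", "be", "inefficient"]
    attn = np.random.default_rng(1).random((4, 5, 5))          # 4 heads, 5 tokens
    kept, idx = cascade_token_prune(tokens, attn, keep_ratio=0.4)
    print(kept)   # only the highest-importance tokens survive downstream
    ```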
    To further trim memory use, the researchers also developed a technique called “progressive quantization.” The method allows the algorithm to wield data in smaller bitwidth chunks and fetch as few of them as possible from memory. Lower data precision, corresponding to smaller bitwidth, is used for simple sentences, and higher precision is used for complicated ones. Intuitively, it’s like fetching the phrase “cmptr progm” as the low-precision version of “computer program.”
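    One way to picture progressive quantization, as a rough sketch under my own simplifying assumptions (not the paper’s exact scheme): compute with only the most significant bits first, and fall back to higher precision only when the low-bit result is too close to call.

    ```python
    import numpy as np

    def quantize(x, bits):
        """Uniformly quantize an array to the given bitwidth over its own range."""
        levels = 2 ** bits - 1
        lo, hi = x.min(), x.max()
        step = (hi - lo) / levels or 1.0
        return np.round((x - lo) / step) * step + lo

    def progressive_scores(scores, low_bits=4, high_bits=8, margin=0.1):
        """Use low-precision attention scores when the top token is clear,
        and re-fetch a row at higher precision only when it is ambiguous."""
        out = quantize(scores, low_bits)
        for i, row in enumerate(out):
            top2 = np.sort(row)[-2:]
            if top2[1] - top2[0] < margin:               # too close at low precision
                out[i] = quantize(scores[i], high_bits)  # fetch more bits for this row
        return out

    scores = np.random.default_rng(2).normal(size=(6, 6))
    print(progressive_scores(scores).round(2))
    ```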
    Alongside these software advances, the researchers also developed a hardware architecture specialized to run SpAtten and the attention mechanism while minimizing memory access. Their architecture design employs a high degree of “parallelism,” meaning multiple operations are processed simultaneously on multiple processing elements, which is useful because the attention mechanism analyzes every word of a sentence at once. The design enables SpAtten to rank the importance of tokens and heads (for potential pruning) in a small number of computer clock cycles. Overall, the software and hardware components of SpAtten combine to eliminate unnecessary or inefficient data manipulation, focusing only on the tasks needed to complete the user’s goal.
    The philosophy behind the system is captured in its name. SpAtten is a portmanteau of “sparse attention,” and the researchers note in the paper that SpAtten is “homophonic with ‘spartan,’ meaning simple and frugal.” Wang says, “That’s just like our technique here: making the sentence more concise.” That concision was borne out in testing.
    The researchers coded a simulation of SpAtten’s hardware design — they haven’t fabricated a physical chip yet — and tested it against competing general-purpose processors. SpAtten ran more than 100 times faster than the next best competitor (a TITAN Xp GPU). Further, SpAtten was more than 1,000 times more energy efficient than competitors, indicating that SpAtten could help trim NLP’s substantial electricity demands.
    The researchers also integrated SpAtten into their previous work, to help validate their philosophy that hardware and software are best designed in tandem. They built a specialized NLP model architecture for SpAtten, using their Hardware-Aware Transformer (HAT) framework, and achieved a roughly two times speedup over a more general model.
    The researchers think SpAtten could be useful to companies that employ NLP models for the majority of their artificial intelligence workloads. “Our vision for the future is that new algorithms and hardware that remove the redundancy in languages will reduce cost and save on the power budget for data center NLP workloads,” says Wang.
    On the opposite end of the spectrum, SpAtten could bring NLP to smaller, personal devices. “We can improve the battery life for mobile phone or IoT devices,” says Wang, referring to internet-connected “things” — televisions, smart speakers, and the like. “That’s especially important because in the future, numerous IoT devices will interact with humans by voice and natural language, so NLP will be the first application we want to employ.”
    Han says SpAtten’s focus on efficiency and redundancy removal is the way forward in NLP research. “Human brains are sparsely activated [by key words]. NLP models that are sparsely activated will be promising in the future,” he says. “Not all words are equal — pay attention only to the important ones.”

  • Fabricating fully functional drones

    From Star Trek’s replicators to Richie Rich’s wishing machine, popular culture has a long history of parading flashy machines that can instantly output any item to a user’s delight. 
    While 3D printers have now made it possible to produce a range of objects that include product models, jewelry, and novelty toys, we still lack the ability to fabricate more complex devices that are essentially ready-to-go right out of the printer. 
    A group from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) recently developed a new system to print functional, custom-made devices and robots, without human intervention. Their single system uses a three-ingredient recipe that lets users create structural geometry, print traces, and assemble electronic components like sensors and actuators. 

    Video: “LaserFactory: Fabricating fully functional devices”

    “LaserFactory” has two parts that work in harmony: a software toolkit that allows users to design custom devices, and a hardware platform that fabricates them. 
    CSAIL PhD student Martin Nisser says that this type of “one-stop shop” could be beneficial for product developers, makers, researchers, and educators looking to rapidly prototype things like wearables, robots, and printed electronics. 
    “Making fabrication inexpensive, fast, and accessible to a layman remains a challenge,” says Nisser, lead author on a paper about LaserFactory that will be presented at the ACM Conference on Human Factors in Computing Systems in May. “By leveraging widely available manufacturing platforms like 3D printers and laser cutters, LaserFactory is the first system that integrates these capabilities and automates the full pipeline for making functional devices in one system.” 
    Inside LaserFactory 
    Let’s say a user has aspirations to create their own drone. They’d first design their device by placing components on it from a parts library, and then draw circuit traces, which are the copper or aluminum lines on a printed circuit board that allow electricity to flow between electronic components. They’d then finalize the drone’s geometry in the 2D editor. In this case, they’d place propellers and batteries on the canvas, wire them up to make electrical connections, and draw the perimeter to define the quadcopter’s shape. 
    The user can then preview their design before the software translates their custom blueprint into machine instructions. The commands are embedded into a single fabrication file for LaserFactory to make the device in one go, aided by the standard laser cutter software. On the hardware side, an add-on that prints circuit traces and assembles components is clipped onto the laser cutter. 
    Like a chef following a recipe, LaserFactory automatically cuts the geometry, dispenses silver for circuit traces, picks and places components, and finally cures the silver to make the traces conductive, securing the components in place to complete fabrication. 
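    To make the pipeline concrete, here is a purely hypothetical sketch in Python of how a design might be flattened into ordered fabrication steps (all class and function names are invented for illustration; they are not LaserFactory’s actual file format or API):

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class Component:
        part: str            # e.g. "propeller_motor", "battery"
        x: float
        y: float

    @dataclass
    class Trace:
        points: list         # polyline of (x, y) waypoints for dispensed silver

    @dataclass
    class Design:
        outline: list        # (x, y) vertices of the device perimeter
        components: list = field(default_factory=list)
        traces: list = field(default_factory=list)

    def to_fabrication_steps(design):
        """Flatten a design into the ordered steps described above:
        cut the geometry, dispense silver traces, pick and place parts, cure."""
        steps = [("laser_cut", design.outline)]
        steps += [("dispense_silver", t.points) for t in design.traces]
        steps += [("pick_and_place", (c.part, c.x, c.y)) for c in design.components]
        steps.append(("cure_silver", None))
        return steps

    drone = Design(
        outline=[(0, 0), (120, 0), (120, 120), (0, 120)],
        components=[Component("battery", 60, 60), Component("propeller_motor", 10, 10)],
        traces=[Trace([(60, 60), (10, 10)])],
    )
    for step in to_fabrication_steps(drone):
        print(step)
    ```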
    The device is then fully functional, and in the case of the drone, it can immediately take off to begin a task — a feature that could in theory be used for diverse jobs such as delivery or search-and-rescue operations.
    As a future avenue, the team hopes to increase the quality and resolution of the circuit traces, which would allow for denser and more complex electronics. 
    As well as fine-tuning the current system, the researchers hope to build on this technology by exploring how to create a fuller range of 3D geometries, potentially through integrating traditional 3D printing into the process. 
    “Beyond engineering, we’re also thinking about how this kind of one-stop shop for fabrication devices could be optimally integrated into today’s existing supply chains for manufacturing, and what challenges we may need to solve to allow for that to happen,” says Nisser. “In the future, people shouldn’t be expected to have an engineering degree to build robots, any more than they should have a computer science degree to install software.” 
    This research is based upon work supported by the National Science Foundation. The work was also supported by a Microsoft Research Faculty Fellowship and The Royal Swedish Academy of Sciences.

  • Examining the world through signals and systems

    There’s a mesmerizing video animation on YouTube of simulated, self-driving traffic streaming through a six-lane, four-way intersection. Dozens of cars flow through the streets, pausing, turning, slowing, and speeding up to avoid colliding with their neighbors. And not a single car stopping. But what if even one of those vehicles was not autonomous? What if only one was?
    In the coming decades, autonomous vehicles will play a growing role in society, whether keeping drivers safer, making deliveries, or increasing accessibility and mobility for elderly or disabled passengers.
    But MIT Assistant Professor Cathy Wu argues that autonomous vehicles are just part of a complex transport system that may involve individual self-driving cars, delivery fleets, human drivers, and a range of last-mile solutions to get passengers to their doorstep — not to mention road infrastructure like highways, roundabouts, and, yes, intersections.
    Transport today accounts for about one-third of U.S. energy consumption. The decisions we make today about autonomous vehicles could have a big impact on this number — ranging from a 40 percent decrease in energy use to a doubling of energy consumption.
    So how can we better understand the problem of integrating autonomous vehicles into the transportation system? Equally important, how can we use this understanding to guide us toward better-functioning systems?
    Wu, who joined the Laboratory for Information and Decision Systems (LIDS) and MIT in 2019, is the Gilbert W. Winslow Assistant Professor of Civil and Environmental Engineering as well as a core faculty member of the MIT Institute for Data, Systems, and Society. Growing up in a Philadelphia-area family of electrical engineers, Wu sought a field that would enable her to harness engineering skills to solve societal challenges. 
    During her years as an undergraduate at MIT, she reached out to Professor Seth Teller of the Computer Science and Artificial Intelligence Laboratory to discuss her interest in self-driving cars.
    Teller, who passed away in 2014, met her questions with warm advice, says Wu. “He told me, ‘If you have an idea of what your passion in life is, then you have to go after it as hard as you possibly can. Only then can you hope to find your true passion.’
    “Anyone can tell you to go after your dreams, but his insight was that dreams and ambitions are not always clear from the start. It takes hard work to find and pursue your passion.” 
    Chasing that passion, Wu would go on to work with Teller, as well as in Professor Daniela Rus’s Distributed Robotics Laboratory, and finally as a graduate student at the University of California at Berkeley, where she won the IEEE Intelligent Transportation Systems Society’s best PhD award in 2019.
    In graduate school, Wu had an epiphany: She realized that for autonomous vehicles to fulfill their promise of fewer accidents, time saved, lower emissions, and greater socioeconomic and physical accessibility, these goals must be explicitly designed-for, whether as physical infrastructure, algorithms used by vehicles and sensors, or deliberate policy decisions.
    At LIDS, Wu uses a type of machine learning called reinforcement learning to study how traffic systems behave, and how autonomous vehicles in those systems ought to behave to get the best possible outcomes.
    Reinforcement learning, which was most famously used by AlphaGo, DeepMind’s human-beating Go program, is a powerful class of methods that capture the idea behind trial-and-error — given an objective, a learning agent repeatedly attempts to achieve the objective, failing and learning from its mistakes in the process.
    In a traffic system, the objectives might be to maximize the overall average velocity of vehicles, to minimize travel time, to minimize energy consumption, and so on.
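    To make the trial-and-error idea concrete, here is a toy sketch in Python (my own illustration, not the researchers’ code): a single automated car follows a stop-and-go leader, the per-step reward is the car’s velocity with a penalty for tailgating, and a crude random search over a linear policy plays the role of the learning step.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def simulate(policy, horizon=200, dt=0.5):
        """One toy car-following episode; returns the average reward,
        a stand-in for 'maximize average velocity without crashing'."""
        gap, v_av = 20.0, 5.0
        total = 0.0
        for t in range(horizon):
            v_lead = 5.0 + 2.0 * np.sin(0.05 * t)            # stop-and-go leader
            accel = policy(np.array([gap, v_av, v_lead]))     # the agent's decision
            v_av = max(0.0, v_av + accel * dt)
            gap += (v_lead - v_av) * dt
            total += v_av - (100.0 if gap < 2.0 else 0.0)     # big penalty for tailgating
        return total / horizon

    # Trial and error: perturb a linear policy, keep whatever scores best.
    best_w, best_score = np.zeros(3), -np.inf
    for trial in range(300):
        w = best_w + rng.normal(scale=0.5, size=3)
        score = simulate(lambda s: float(np.clip(0.1 * (w @ s), -2.0, 2.0)))
        if score > best_score:
            best_w, best_score = w, score
    print(f"best average-velocity proxy: {best_score:.2f}")
    ```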
    When studying common components of traffic networks such as grid roads, bottlenecks, and on- and off-ramps, Wu and her colleagues have found that reinforcement learning can match, and in some cases exceed, the performance of current traffic control strategies. And more importantly, reinforcement learning can shed new light on complex networked systems — which have long evaded classical control techniques. For instance, if just 5 to 10 percent of vehicles on the road were autonomous and used reinforcement learning, that could eliminate congestion and boost vehicle speeds by 30 to 140 percent. And the learning from one scenario often translates well to others. These insights could one day help to inform public policy or business decisions.
    In the course of this research, Wu and her colleagues helped improve a class of reinforcement learning methods called policy gradient methods. Their advancements turned out to be a general improvement to most existing deep reinforcement learning methods.
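    Policy gradient methods make that learning step explicit by nudging the policy’s parameters in the direction that makes high-return actions more likely. A minimal, generic REINFORCE-style update (textbook form, not the team’s specific improvement) looks like this:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def reinforce_step(theta, episodes=32, lr=0.05):
        """One REINFORCE update for a Gaussian policy over a 1-D action:
        gradient ~ E[ grad log pi(a|s) * (return - baseline) ]."""
        grads, returns = [], []
        for _ in range(episodes):
            s = np.array([1.0, rng.normal(), rng.normal()])   # toy state (with a bias feature)
            mean = theta @ s
            a = mean + rng.normal()                           # sample action from pi(a|s)
            ret = -(a - 1.0) ** 2                             # toy return: action 1.0 is best
            grads.append((a - mean) * s)                      # grad of log N(a; mean, 1) w.r.t. theta
            returns.append(ret)
        baseline = np.mean(returns)                           # simple variance reduction
        grad = np.mean([(r - baseline) * g for r, g in zip(returns, grads)], axis=0)
        return theta + lr * grad

    theta = np.zeros(3)
    for _ in range(200):
        theta = reinforce_step(theta)
    print(theta)   # theta[0] should drift toward 1.0, the action the toy return rewards
    ```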
    But reinforcement learning techniques will need to be continually improved to keep up with the scale and shifts in infrastructure and changing behavior patterns. And research findings will need to be translated into action by urban planners, auto makers and other organizations.
    Today, Wu is collaborating with public agencies in Taiwan and Indonesia to use insights from her work to guide better dialogues and decisions. Could changing traffic signals, or nudging drivers to shift their behavior, be other ways to achieve lower emissions or smoother traffic? 
    “I’m surprised by this work every day,” says Wu. “We set out to answer a question about self-driving cars, and it turns out you can pull apart the insights, apply them in other ways, and then this leads to new exciting questions to answer.”
    Wu is happy to have found her intellectual home at LIDS, which she describes as a “very deep, intellectual, friendly, and welcoming place.” And she counts among her research inspirations MIT course 6.003 (Signals and Systems) — a class she encourages everyone to take — taught in the tradition of professors Alan Oppenheim (Research Laboratory of Electronics) and Alan Willsky (LIDS). “The course taught me that so much in this world could be fruitfully examined through the lens of signals and systems, be it electronics or institutions or society,” she says. “I am just realizing as I’m saying this, that I’ve been empowered by LIDS thinking all along!”
    Research and teaching through a pandemic haven’t been easy, but Wu is making the best of a challenging first year as faculty. (“I’ve been working from home in Cambridge — my short walking commute is irrelevant at this point,” she says wryly.) To unwind, she enjoys running, listening to podcasts covering topics ranging from science to history, and reverse-engineering her favorite Trader Joe’s frozen foods.
    She’s also been working on two Covid-related projects born at MIT: One explores how data from the environment, such as data collected by internet-of-things-connected thermometers, can help identify emerging community outbreaks. Another project asks if it’s possible to ascertain how contagious the virus is on public transport, and how different factors might decrease the transmission risk.
    Both are in their early stages, Wu says. “We hope to contribute a bit to the pool of knowledge that can help decision-makers somewhere. It’s been very enlightening and rewarding to do this and see all the other efforts going on around MIT.”

  • Byte-sized learning

    For 50 years, MIT students have taken advantage of Independent Activities Period, a special mini-term, only four weeks long, tucked between the end of the fall and beginning of the spring semesters. This year, IAP looked a little different, as Covid-19 precautions led instructors to shift their classes online, but the term’s spirit of innovative, creative exploration remained.
    In keeping with that spirit, IAP offerings from the Department of Electrical Engineering and Computer Science (EECS) ranged from the playful to the profound.  
    Many MIT students take advantage of the short IAP session to delve into a topic or field outside their regular comfort zone. “Deep Learning For Art, Aesthetics, and Creativity” was designed to accommodate exactly that sort of exploration. The course’s instructor, Ali Jahanian, says his inspiration for teaching the IAP stems from his own experiences with the arts. “I have a background in design, painting, and visual arts, and always got inspired from that kind of creative work. In my research, I have been working on understanding and quantifying aesthetics and design, but after my PhD thesis, I got more involved with the intriguing notion of learning by creating,” he reports. “While many of my colleagues are working on understanding intelligence, I am specifically interested in the angle of creativity and innovation of our intelligence. For instance, how can we generalize from data to create data that is out-of-distribution of the seen datasets?”
    Jahanian’s approach is specifically geared to appeal not only to computer scientists, but to anyone with curiosity about the artistic possibilities of AI. “The idea of the course is to help students understand how can we use AI for creativity, and how creativity can help us learn and develop better AI. First they need to understand what’s happening right now, in 2021, around AI and creativity — then they need to understand the boundaries of those notions and how they can push that boundary forward,” says Jahanian, who sees the use of artificial intelligence as an inherently creative act. “To me, AI is fascinating because it’s a reflection of how we are. We have the desire of replicating and recreating ourselves, and that is why artists get joy out of creating.”
    The applications of a creative AI — or one that can predict the human aesthetic preference — are numerous. “We can certainly learn and quantify our taste. Those kinds of quantitative algorithms have broad applications in understanding our emotions and feeling; for instance, if you have a robot which communicates with you, in your home, you want that robot to understand what you want, what you like, what is your personal taste.” More immediately, Jahanian believes his students will benefit from the tangible nature of art as a tool for learning. “I think of this process as learning by defining a problem; if the problem is something interesting and tangible, something we can relate to, then we have a better chance of engaging ourselves,” says Jahanian, noting that the scholarly term for the satisfying feeling of solving a problem is “visceral aesthetics.” “Hopefully, the students will get motivated to learn something about AI through this cool topic. Sometimes it is hard for students to understand how a loss function works, but if I could see it visually while trying to match two images, maybe that can help me to intuitively understand loss function or the math behind it. Maybe that even helps me become motivated to learn more.”
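    As one concrete version of the visual intuition Jahanian describes, a pixel-wise loss for matching two images fits in a few lines (an illustrative example of my own, not course material): watching the loss fall as one image is nudged toward another is the “visual” way to see what a loss function does.

    ```python
    import numpy as np

    def mse_loss(image, target):
        """Mean squared error between two same-shaped images:
        0 when they match exactly, larger the further apart they are."""
        return float(np.mean((image - target) ** 2))

    rng = np.random.default_rng(0)
    target = rng.random((64, 64))      # the image we are trying to reproduce
    guess = np.zeros((64, 64))         # start from a blank canvas

    for step in range(5):
        grad = 2 * (guess - target)    # gradient of the per-pixel squared error
        guess = guess - 0.3 * grad     # take a step downhill
        print(step, round(mse_loss(guess, target), 5))
    ```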
    For those who like a bit of risk with their learning, the Pokerbots competition, now going on its 10th year, offered the chance to win real prizes by designing and deploying competitive, card-playing bots in a virtual tournament. “I joined freshman year because I was interested in the intersection of math, computer science, and game theory; Pokerbots seemed like a good way to combine all three,” says Shreyas Srinivasan, now a junior in mathematics with computer science who is serving as the president of MIT Pokerbots for the second year. “The project of building a bot forces you to apply the concepts you’ve learned, dive deep into them, and increase your understanding. It leaves you with a sense of accomplishment and the unique experience of creating a bot that can demolish human players in poker.”
    Srinivasan isn’t exaggerating the bots’ power. “There is a good comparison to be made to chess and other games that have been solved with computational processes,” says Stephen Otremba, a senior in Course 6-3 (Computer Science and Engineering) and head instructor of this year’s Pokerbots course. “With the progress we’ve made in the realms of machine and reinforcement learning, programs have been able to advance more quickly than human play.” Each Pokerbots attendee had the chance to test that claim; at the end of each IAP, the organizers have traditionally set aside time for the players to try their luck against their own bots. “Over the last few years, in the game variants that we set, the bots are able to beat their creators handily,” reports Srinivasan. Those variants change every year — this year, the organizers took their inspiration from popular minigame Blotto by providing each player with an individual pool of chips. The players then created a bot that can advantageously allocate the pool across three simultaneous poker games. Fittingly, this year’s competition was primarily sponsored by quantitative trading firms. Srinivasan explains why: “Both trading and poker revolve around risk management, and this year’s game variant is all about managing the resource of your starting stack of chips across three boards. It is similar to trading where you are managing a portfolio of different stocks and securities and seeking maximum returns from that allocation.”
    Both Srinivasan and Otremba acknowledge the changes that Covid brought to their beloved competition, but those changes are not without a silver lining. “Something we’re considering implementing in future years is continuing to livestream our lectures, because it means that more students are able to attend,” says Srinivasan, who also reports greater ease in booking guest speakers over Zoom. This year, one of those guest speakers included famed computational poker researcher Noam Brown, now an AI researcher at Facebook, who developed Libratus, one of the first AIs capable of beating professional poker players. Whether Pokerbots students go on to reign on the competitive circuit or join a trading firm, their experience is sure to serve them well. Says Otremba: “The Pokerbots competition not only provided me an outlet in a real project environment, but taught me about putting my ideas into code, collaborating with a team effectively, learning all the stages that software development goes through; handling all the challenges like debugging and theory problems. It provided me with a great way to get some valuable experience in the software area and acted like a springboard to even more challenging projects.”
    “Code For Good” was another offering for students who wanted to use their technical skills to make a positive difference in the world. The long-running workshop partnered with nonprofit organizations to tackle technical projects, giving students a chance to contribute valuable time and expertise to organizations that align with their personal areas of interest. “I really believe in the mission of Code For Good, helping nonprofits solve their challenges with technology,” says Lucy Liao, a senior majoring in 6-3 who joined Code For Good in her junior year as an organizer, a project management role that helps link a team of students with a nonprofit. “It was an interesting experience because, in addition to thinking about projects and code, there was a lot of communications as well — talking to the nonprofits, making sure everyone is aware of the deadlines, keeping the projects moving smoothly,” Liao reports.
    Those projects run the gamut, from discrete and contained to sprawling and ambitious. Senior Victoria Juan explains the application process: “Throughout the school year, the nonprofits apply for help through a Google Form which asks them to include details about themselves, like what kind of project they’re looking for and what kind of resources or budget they already have. At the beginning of IAP, we make the projects known to the students, who make their choices based on the project; sometimes, people are passionate about the mission of the nonprofit as well.” Those nonprofits range from a peace-building organization based in Afghanistan to the Cambridge, Massachusetts, YWCA.  
    While the Code For Good team used to exert a degree of editorial control over the scope of the projects, they more recently experimented with allowing the students to take on as much as they feel they can accomplish — with surprising results. “There were quite a few projects which I thought were so big, and I was not really sure if they were doable within one month, but I’ve been surprised by how much progress our students have been able to make over the course of one January,” says Liao. “It’s cool to see what people have been able to accomplish in such a short time.”
    Besides the resume-building software accomplishments, Code For Good students are able to add leadership skills to their portfolio. Says Juan: “Helping to run the IAP was a really valuable experience because I had the chance to experience being the teacher and the leader of a classroom: making deadlines, running weekly presentations, to feel that responsibility and to make sure that the teams and projects are running on schedule — or, if they’re not, help them find a way to resolve those issues.” And Liao points out the surprisingly important role of communications in this tech-forward club: “A big part of the experience is learning to communicate with nontechnical people. We work often with people at MIT who are very technical, and so that’s easy for us — but with people who aren’t familiar with computers and are less comfortable with technical lingo, there are challenges. I think that’s valuable communications experience to get.”
    Not too bad for a byte-sized semester!

  • Machine-learning model helps determine protein structures

    Cryo-electron microscopy (cryo-EM) allows scientists to produce high-resolution, three-dimensional images of tiny molecules such as proteins. This technique works best for imaging proteins that exist in only one conformation, but MIT researchers have now developed a machine-learning algorithm that helps them identify multiple possible structures that a protein can take.
    Unlike AI techniques that aim to predict protein structure from sequence data alone, cryo-EM determines protein structure experimentally: the technique produces hundreds of thousands, or even millions, of two-dimensional images of protein samples frozen in a thin layer of ice. Computer algorithms then piece together these images, taken from different angles, into a three-dimensional representation of the protein in a process termed reconstruction.
    In a Nature Methods paper, the MIT researchers report new AI-based software for reconstructing multiple structures and motions of the imaged protein — a major goal in the protein science community. Instead of using the traditional representation of protein structure as electron-scattering intensities on a 3D lattice, which is impractical for modeling multiple structures, the researchers introduced a new neural network architecture that can efficiently generate the full ensemble of structures in a single model.
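    A heavily simplified sketch of that idea (my paraphrase of the general approach, not the published architecture): rather than storing density on a 3D grid, a small network maps a 3D coordinate plus a per-particle latent “conformation code” to a density value, so a single set of weights can represent a whole continuum of structures. The sketch below uses PyTorch; layer sizes are arbitrary.

    ```python
    import torch
    import torch.nn as nn

    class EnsembleDensityNet(nn.Module):
        """Maps (3-D coordinate, latent conformation code) -> density value.
        Sweeping the latent code moves through the ensemble of structures
        that one network encodes."""
        def __init__(self, latent_dim=8, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, coords, z):
            # coords: (N, 3) query points in the reconstruction volume
            # z:      (latent_dim,) conformation code inferred from the 2-D images
            z = z.expand(coords.shape[0], -1)
            return self.net(torch.cat([coords, z], dim=-1)).squeeze(-1)

    model = EnsembleDensityNet()
    coords = torch.rand(1000, 3) * 2 - 1                 # points in [-1, 1]^3
    density_a = model(coords, torch.zeros(8))            # one conformation
    density_b = model(coords, torch.ones(8))             # a different conformation
    print(density_a.shape, density_b.shape)
    ```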
    “With the broad representation power of neural networks, we can extract structural information from noisy images and visualize detailed movements of macromolecular machines,” says Ellen Zhong, an MIT graduate student and the lead author of the paper.

    Graduate student Ellen Zhong shows how her team combines cryo-electron microscopy and machine learning to visualize molecules in 3D.

    With their software, they discovered protein motions from imaging datasets where only a single static 3D structure was originally identified. They also visualized large-scale flexible motions of the spliceosome — a protein complex that coordinates the splicing of the protein coding sequences of transcribed RNA.
    “Our idea was to try to use machine-learning techniques to better capture the underlying structural heterogeneity, and to allow us to inspect the variety of structural states that are present in a sample,” says Joseph Davis, the Whitehead Career Development Assistant Professor in MIT’s Department of Biology.
    Davis and Bonnie Berger, the Simons Professor of Mathematics at MIT and head of the Computation and Biology group at the Computer Science and Artificial Intelligence Laboratory, are the senior authors of the study, which appears today in Nature Methods. MIT postdoc Tristan Bepler is also an author of the paper.
    Visualizing a multistep process
    The researchers demonstrated the utility of their new approach by analyzing structures that form during the process of assembling ribosomes — the cell organelles responsible for reading messenger RNA and translating it into proteins. Davis began studying the structure of ribosomes while a postdoc at the Scripps Research Institute. Ribosomes have two major subunits, each of which contains many individual proteins that are assembled in a multistep process.
    To study the steps of ribosome assembly in detail, Davis stalled the process at different points and then took electron microscope images of the resulting structures. At some points, blocking assembly resulted in accumulation of just a single structure, suggesting that there is only one way for that step to occur. However, blocking other points resulted in many different structures, suggesting that the assembly could occur in a variety of ways.
    Because some of these experiments generated so many different protein structures, traditional cryo-EM reconstruction tools did not work well to determine what those structures were.
    “In general, it’s an extremely challenging problem to try to figure out how many states you have when you have a mixture of particles,” Davis says.
    After starting his lab at MIT in 2017, he teamed up with Berger to use machine learning to develop a model that can use the two-dimensional images produced by cryo-EM to generate all of the three-dimensional structures found in the original sample.
    In the new Nature Methods study, the researchers demonstrated the power of the technique by using it to identify a new ribosomal state that hadn’t been seen before. Previous studies had suggested that as a ribosome is assembled, large structural elements, which are akin to the foundation for a building, form first. Only after this foundation is formed are the “active sites” of the ribosome, which read messenger RNA and synthesize proteins, added to the structure.
    In the new study, however, the researchers found that in a very small subset of ribosomes, about 1 percent, a structure that is normally added at the end actually appears before assembly of the foundation. To account for that, Davis hypothesizes that it might be too energetically expensive for cells to ensure that every single ribosome is assembled in the correct order.
    “The cells are likely evolved to find a balance between what they can tolerate, which is maybe a small percentage of these types of potentially deleterious structures, and what it would cost to completely remove them from the assembly pathway,” he says.
    Viral proteins
    The researchers are now using this technique to study the coronavirus spike protein, which is the viral protein that binds to receptors on human cells and allows the virus to enter. The spike protein has three subunits, each with a receptor binding domain (RBD) that can point either up or down.
    “For me, watching the pandemic unfold over the past year has emphasized how important front-line antiviral drugs will be in battling similar viruses, which are likely to emerge in the future. As we start to think about how one might develop small molecule compounds to force all of the RBDs into the ‘down’ state so that they can’t interact with human cells, understanding exactly what the ‘up’ state looks like and how much conformational flexibility there is will be informative for drug design. We hope our new technique can reveal these sorts of structural details,” Davis says.
    The research was funded by the National Science Foundation Graduate Research Fellowship Program, the National Institutes of Health, and the MIT Jameel Clinic for Machine Learning and Health. This work was also supported by the MIT Satori computation cluster hosted at the MGHPCC.

  • Epigenomic map reveals circuitry of 30,000 human disease regions

    Twenty years ago this month, the first draft of the human genome was publicly released. One of the major surprises that came from that project was the revelation that only 1.5 percent of the human genome consists of protein-coding genes.
    Over the past two decades, it has become apparent that those noncoding stretches of DNA, originally thought to be “junk DNA,” play critical roles in development and gene regulation. In a new study published today, a team of researchers from MIT presents the most comprehensive map yet of this noncoding DNA.
    This map provides in-depth annotation of epigenomic marks — modifications indicating which genes are turned on or off in different types of cells — across 833 tissues and cell types, a significant increase over what has been covered before. The researchers also identified groups of regulatory elements that control specific biological programs, and they uncovered candidate mechanisms of action for about 30,000 genetic variants linked to 540 specific traits.
    “What we’re delivering is really the circuitry of the human genome. Twenty years later, we not only have the genes, we not only have the noncoding annotations, but we have the modules, the upstream regulators, the downstream targets, the disease variants, and the interpretation of these disease variants,” says Manolis Kellis, a professor of computer science, a member of MIT’s Computer Science and Artificial Intelligence Laboratory and of the Broad Institute of MIT and Harvard, and the senior author of the new study.
    MIT graduate student Carles Boix is the lead author of the paper, which appears today in Nature. Other authors of the paper are MIT graduate student Benjamin James and former MIT postdocs Yongjin Park and Wouter Meuleman, who are now principal investigators at the University of British Columbia and the Altius Institute for Biomedical Sciences, respectively. The researchers have made all of their data publicly available for the broader scientific community to use.
    Epigenomic control
    Layered atop the human genome — the sequence of nucleotides that makes up the genetic code — is the epigenome. The epigenome consists of chemical marks that help determine which genes are expressed at different times, and in different cells. These marks include histone modifications, DNA methylation, and how accessible a given stretch of DNA is.
    “Epigenomics directly reads the marks used by our cells to remember what to turn on and what to turn off in every cell type, and in every tissue of our body. They act as post-it notes, highlighters, and underlining,” Kellis says. “Epigenomics allows us to peek at what each cell marked as important in every cell type, and thus understand how the genome actually functions.”
    Mapping these epigenomic annotations can reveal genetic control elements, and the cell types in which different elements are active. These control elements can be grouped into clusters or modules that function together to control specific biological functions. Some of these elements are enhancers, which are bound by proteins that activate gene expression, while others are repressors that turn genes off.
    The new map, EpiMap (Epigenome Integration across Multiple Annotation Projects), builds on and combines data from several large-scale mapping consortia, including ENCODE, Roadmap Epigenomics, and Genomics of Gene Regulation.
    The researchers assembled a total of 833 biosamples, representing diverse tissues and cell types, each of which was mapped with a slightly different subset of epigenomic marks, making it difficult to fully integrate data across the multiple consortia. They then filled in the missing datasets by combining available data for similar marks and biosamples, and used the resulting compendium of 10,000 marks across 833 biosamples to study gene regulation and human disease.
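    A toy sketch of the kind of imputation involved (a deliberately simplified illustration, not the actual EpiMap pipeline): predict a missing mark in one biosample as a similarity-weighted average of the same mark in biosamples that do have it, judging similarity by a mark that was measured everywhere.

    ```python
    import numpy as np

    def impute_mark(target, reference, observed, k=3):
        """Fill in a missing epigenomic mark for some biosamples.
        target:    (n_samples, n_positions) signal for the mark of interest;
                   rows where observed[i] is False get overwritten.
        reference: (n_samples, n_positions) signal for a mark measured everywhere,
                   used to judge which biosamples are similar.
        observed:  boolean array of length n_samples.
        k:         number of most-similar observed biosamples to average."""
        target = target.copy()
        corr = np.corrcoef(reference)                    # biosample-by-biosample similarity
        have = np.where(observed)[0]
        for i in np.where(~observed)[0]:
            best = have[np.argsort(corr[i, have])[-k:]]  # k most similar observed samples
            weights = corr[i, best].clip(min=0) + 1e-9
            target[i] = weights @ target[best] / weights.sum()
        return target

    rng = np.random.default_rng(0)
    reference = rng.random((6, 50))
    target = rng.random((6, 50))
    observed = np.array([True, True, True, True, False, False])
    print(impute_mark(target, reference, observed).shape)
    ```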
    The researchers annotated more than 2 million enhancer sites, covering only 0.8 percent of the genome in each biosample, and collectively 13 percent of the genome. They grouped them into 300 modules based on their activity patterns, and linked them to the biological processes they control, the regulators that control them, and the short sequence motifs that mediate this control. The researchers also predicted 3.3 million links between control elements and the genes that they target, based on their coordinated activity patterns, representing the most complete circuitry of the human genome to date.
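    A schematic sketch of both steps (illustrative only; EpiMap’s actual methods are more involved): cluster enhancers into modules by their activity pattern across biosamples, and link an enhancer to a gene when their activity profiles are strongly correlated. Random data is used here, so any “links” found are just chance correlations.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    enhancer_activity = rng.random((200, 20))   # 200 enhancers x 20 biosamples
    gene_expression = rng.random((30, 20))      # 30 genes x 20 biosamples

    # 1) Group enhancers into "modules" by their activity pattern.
    module_of = KMeans(n_clusters=5, random_state=0, n_init=10).fit_predict(enhancer_activity)
    print("module sizes:", np.bincount(module_of))

    # 2) Link enhancers to genes whose activity is highly correlated.
    def linked_pairs(enhancers, genes, threshold=0.5):
        links = []
        for i, e in enumerate(enhancers):
            for j, g in enumerate(genes):
                r = np.corrcoef(e, g)[0, 1]
                if r > threshold:
                    links.append((i, j, round(float(r), 3)))
        return links

    print("candidate enhancer-gene links:", len(linked_pairs(enhancer_activity, gene_expression)))
    ```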
    Disease links
    Since the final draft of the human genome was completed in 2003, researchers have performed thousands of genome-wide association studies (GWAS), revealing common genetic variants that predispose their carriers to a particular trait or disease.
    These studies have yielded about 120,000 variants, but only 7 percent of these are located within protein-coding genes, leaving 93 percent that lie in regions of noncoding DNA.
    How noncoding variants act is extremely difficult to resolve, however, for many reasons. First, genetic variants are inherited in blocks, making it difficult to pinpoint causal variants among dozens of variants in each disease-associated region. Moreover, noncoding variants can act at large distances, sometimes millions of nucleotides away, making it difficult to find their target gene of action. They are also extremely dynamic, making it difficult to know which tissue they act in. Lastly, understanding their upstream regulators remains an unsolved problem.
    In this study, the researchers were able to address these questions and provide candidate mechanistic insights for more than 30,000 of these noncoding GWAS variants. The researchers found that variants associated with the same trait tended to be enriched in specific tissues that are biologically relevant to the trait. For example, genetic variants linked to intelligence were found to be in noncoding regions active in the brain, while variants associated with cholesterol level are in regions active in the liver.
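    That kind of enrichment can be checked with a standard contingency-table test; here is a generic sketch with made-up counts (not EpiMap’s statistics):

    ```python
    from scipy.stats import fisher_exact

    # Hypothetical counts: of 1,000 trait-associated variants, 120 fall inside
    # enhancers active in the brain; of 100,000 background variants, 4,000 do.
    table = [[120, 1000 - 120],
             [4000, 100000 - 4000]]
    odds_ratio, p_value = fisher_exact(table, alternative="greater")
    print(f"odds ratio {odds_ratio:.2f}, p = {p_value:.2e}")
    # A large odds ratio with a small p-value suggests the trait's variants
    # are enriched in brain enhancers, pointing to the brain as a relevant tissue.
    ```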
    The researchers also showed that some traits or diseases are affected by enhancers active in many different tissue types. For example, they found that genetic variants associated with coronary artery disease (CAD) were active in adipose tissue, coronary arteries, and the liver, among many other tissues.
    Kellis’ lab is now working with diverse collaborators to pursue their leads in specific diseases, guided by these genome-wide predictions. They are profiling heart tissue from patients with coronary artery disease, microglia from Alzheimer’s patients, and muscle, adipose, and blood from obesity patients, which are predicted mediators of these diseases based on the current paper and his lab’s previous work.
    Many other labs are already using the EpiMap data to pursue studies of diverse diseases. “We hope that our predictions will be used broadly in industry and in academia to help elucidate genetic variants and their mechanisms of action, help target therapies to the most promising targets, and help accelerate drug development for many disorders,” Kellis says.
    The research was funded by the National Institutes of Health.

  • An origami-inspired medical patch for sealing internal injuries

    Many surgeries today are performed via minimally invasive procedures, in which a small incision is made and miniature cameras and surgical tools are threaded through the body to remove tumors and repair damaged tissues and organs. The process results in less pain and shorter recovery times compared to open surgery.
    While many procedures can be performed in this way, surgeons can face challenges at an important step in the process: the sealing of internal wounds and tears.
    Taking inspiration from origami, MIT engineers have now designed a medical patch that can be folded around minimally invasive surgical tools and delivered through airways, intestines, and other narrow spaces, to patch up internal injuries. The patch resembles a foldable, paper-like film when dry. Once it makes contact with wet tissues or organs, it transforms into a stretchy gel, similar to a contact lens, and can stick to an injured site.
    In contrast to existing surgical adhesives, the team’s new tape is designed to resist contamination when exposed to bacteria and bodily fluids. Over time, the patch can safely biodegrade away. The team has published its results in the journal Advanced Materials.

    The researchers are working with clinicians and surgeons to optimize the design for surgical use, and they envision that the new bioadhesive could be delivered via minimally invasive surgical tools, operated by a surgeon either directly or remotely via a medical robot.
    “Minimally invasive surgery and robotic surgery are being increasingly adopted, as they decrease trauma and hasten recovery related to open surgery. However, the sealing of internal wounds is challenging in these surgeries,” says Xuanhe Zhao, professor of mechanical engineering and of civil and environmental engineering at MIT.
    “This patch technology spans many fields,” adds co-author Christoph Nabzdyk, a cardiac anesthesiologist and critical care physician at the Mayo Clinic in Rochester, Minnesota. “This could be used to repair a perforation from a colonoscopy, or seal solid organs or blood vessels after a trauma or elective surgical intervention. Instead of having to carry out a full open surgical approach, one could go from the inside to deliver a patch to seal a wound at least temporarily and maybe even long-term.”
    The study’s co-authors include lead authors Sarah Wu and Hyunwoo Yuk, and Jingjing Wu at MIT.
    Layered protection
    The bioadhesives currently used in minimally invasive surgeries are available mostly as biodegradable liquids and glues that can be spread over damaged tissues. When these glues solidify, however, they can stiffen over the softer underlying surface, creating an imperfect seal. Blood and other biological fluids can also contaminate glues, preventing successful adhesion to the injured site. Glues can also wash away before an injury has fully healed, and, after application, they can also cause inflammation and scar tissue formation.
    Given the limitations of current designs, the team aimed to engineer an alternative that would meet three functional requirements. It should be able to stick to the wet surface of an injured site, avoid binding to anything before reaching its destination, and, once applied, resist bacterial contamination and excessive inflammation.
    The team’s design meets all three requirements, in the form of a three-layered patch. The middle layer is the main bioadhesive, made from a hydrogel material that is embedded with compounds called NHS esters. When in contact with a wet surface, the adhesive absorbs any surrounding water and becomes pliable and stretchy, molding to a tissue’s contours. Simultaneously, the esters in the adhesive form strong covalent bonds with compounds on the tissue surface, creating a tight seal between the two materials. The design of this middle layer is based on previous work in Zhao’s group.
    The team then sandwiched the adhesive with two layers, each with a different protective effect. The bottom layer is made from a material coated with silicone oil, which acts to temporarily lubricate the adhesive, preventing it from sticking to other surfaces as it travels through the body. When the adhesive reaches its destination and is pressed lightly against an injured tissue, the silicone oil is squeezed out, allowing the adhesive to bind to the tissue. 
    The adhesive’s top layer consists of an elastomer film embedded with zwitterionic polymers, or molecular chains made from both positive and negative ions that act to attract any surrounding water molecules to the elastomer’s surface. In this way, the adhesive’s outward-facing layer forms a water-based skin, or barrier against bacteria and other contaminants.
    “In minimally invasive surgery, you don’t have the luxury of easily accessing a site to apply an adhesive,” Yuk says. “You really are battling a lot of random contaminants and body fluids on your way to your destination.”
    Fit for robots
    In a series of demonstrations, the researchers showed that the new bioadhesive strongly adheres to animal tissue samples, even after being submerged in beakers of fluid, including blood, for long periods of time.
    They also used origami-inspired techniques to fold the adhesive around instruments commonly used in minimally invasive surgeries, such as a balloon catheter and a surgical stapler. They threaded these tools through animal models of major airways and vessels, including the trachea, esophagus, aorta, and intestines. By inflating the balloon catheter or applying light pressure to the stapler, they were able to stick the patch onto torn tissues and organs, and found no signs of contamination on or near the patched-up site up to one month after its application.
    The researchers envision that the new bioadhesive could be manufactured in prefolded configurations that surgeons can easily fit around minimally invasive instruments as well as on tools that are currently being used in robotic surgery. They are seeking to collaborate with designers to integrate the bioadhesive into robotic surgery platforms.
    “We believe that the conceptual novelty in the form and function of this patch represents an exciting step toward overcoming translational barriers in robotic surgery and facilitating the wider clinical adoption of bioadhesive materials,” Wu says.
    This research was supported, in part, by the National Science Foundation.

  • Robust artificial intelligence tools to predict future cancer

    To catch cancer earlier, we need to predict who is going to get it in the future. Artificial intelligence (AI) tools have bolstered this complex task of forecasting risk, but the adoption of AI in medicine has been limited by poor performance on new patient populations and by neglect of racial minorities. 
    Two years ago, a team of scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Jameel Clinic (J-Clinic) demonstrated a deep learning system to predict cancer risk using just a patient’s mammogram. The model showed significant promise and even improved inclusivity: It was equally accurate for both white and Black women, which is especially important given that Black women are 43 percent more likely to die from breast cancer. 
    But to integrate image-based risk models into clinical care and make them widely available, the researchers say the models needed both algorithmic improvements and large-scale validation across several hospitals to prove their robustness. 
    To that end, they tailored their new “Mirai” algorithm to capture the unique requirements of risk modeling. Mirai jointly models a patient’s risk across multiple future time points, and can optionally benefit from clinical risk factors such as age or family history, if they are available. The algorithm is also designed to produce predictions that are consistent across minor variances in clinical environments, like the choice of mammography machine.  

    Robust artificial intelligence tools may be used to predict future breast cancer.

    The team trained Mirai on the same dataset of over 200,000 exams from Massachusetts General Hospital (MGH) from their prior work, and validated it on test sets from MGH, the Karolinska Institute in Sweden, and Chang Gung Memorial Hospital in Taiwan. Mirai is now installed at MGH, and the team’s collaborators are actively working on integrating the model into care. 
    Mirai was significantly more accurate than prior methods in predicting cancer risk and identifying high-risk groups across all three datasets. When comparing high-risk cohorts on the MGH test set, the team found that their model identified nearly two times more future cancer diagnoses compared with the current clinical standard, the Tyrer-Cuzick model. Mirai was similarly accurate across patients of different races, age groups, and breast density categories in the MGH test set, and across different cancer subtypes in the Karolinska test set. 
    “Improved breast cancer risk models enable targeted screening strategies that achieve earlier detection, and less screening harm than existing guidelines,” says Adam Yala, CSAIL PhD student and lead author on a paper about Mirai that was published this week in Science Translational Medicine. “Our goal is to make these advances part of the standard of care. We are partnering with clinicians from Novant Health in North Carolina, Emory in Georgia, Maccabi in Israel, TecSalud in Mexico, Apollo in India, and Barretos in Brazil to further validate the model on diverse populations and study how to best clinically implement it.” 
    How it works 
    Despite the wide adoption of breast cancer screening, the researchers say the practice is riddled with controversy: More-aggressive screening strategies aim to maximize the benefits of early detection, whereas less-frequent screenings aim to reduce false positives, anxiety, and costs for those who will never even develop breast cancer.  
    Current clinical guidelines use risk models to determine which patients should be recommended for supplemental imaging and MRI. Some guidelines use risk models with just age to determine if, and how often, a woman should get screened; others combine multiple factors related to age, hormones, genetics, and breast density to determine further testing. Despite decades of effort, the accuracy of risk models used in clinical practice remains modest.  
    Recently, deep learning mammography-based risk models have shown promising performance. To bring this technology to the clinic, the team identified three innovations they believe are critical for risk modeling: jointly modeling time, the optional use of non-image risk factors, and methods to ensure consistent performance across clinical settings. 
    1. Time
    Inherent to risk modeling is learning from patients with different amounts of follow-up, and assessing risk at different time points: this can determine how often patients get screened, whether they should have supplemental imaging, or even whether they should consider preventive treatments. 
    Although it’s possible to train separate models to assess risk for each time point, this approach can result in risk assessments that don’t make sense — like predicting that a patient has a higher risk of developing cancer within two years than they do within five years. To address this, the team designed their model to predict risk at all time points simultaneously, by using a tool called an “additive-hazard layer.” 
    The additive-hazard layer works as follows: Their network predicts a patient’s risk at a time point, such as five years, as an extension of their risk at the previous time point, such as four years. In doing so, their model can learn from data with variable amounts of follow-up, and then produce self-consistent risk assessments. 
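    A minimal PyTorch sketch of such an additive-hazard layer (my reading of the description above, not the released Mirai code): predict a non-negative hazard for each year and define the k-year risk from the base risk plus the accumulated hazards, which guarantees that risk never decreases as the horizon grows.

    ```python
    import torch
    import torch.nn as nn

    class AdditiveHazardLayer(nn.Module):
        """Turns an exam-level feature vector into monotonically
        non-decreasing risks for years 1..max_followup."""
        def __init__(self, feature_dim, max_followup=5):
            super().__init__()
            self.base = nn.Linear(feature_dim, 1)              # baseline risk score
            self.hazards = nn.Linear(feature_dim, max_followup)

        def forward(self, features):
            base = self.base(features)                         # (batch, 1)
            hazard = torch.relu(self.hazards(features))        # non-negative yearly increments
            cumulative = torch.cumsum(hazard, dim=-1)          # risk grows with the horizon
            return torch.sigmoid(base + cumulative)            # (batch, max_followup) risks in [0, 1]

    layer = AdditiveHazardLayer(feature_dim=512)
    risks = layer(torch.randn(2, 512))
    print(risks)                                               # each row is non-decreasing, year 1 -> 5
    assert torch.all(risks[:, 1:] >= risks[:, :-1])
    ```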
    2. Non-image risk factors
    While this method primarily focuses on mammograms, the team wanted to also use non-image risk factors such as age and hormonal factors if they were available — but not require them at the time of the test. One approach would be to add these factors as an input to the model with the image, but this design would prevent the majority of hospitals (such as Karolinska and CGMH), which don’t have this infrastructure, from using the model. 
    For Mirai to benefit from risk factors without requiring them, the network learns to predict that information during training, and if the factors aren’t available at test time, it can use its own predicted values instead. Mammograms are rich sources of health information, so many traditional risk factors, such as age and menopausal status, can be easily predicted from the imaging. As a result of this design, the same model can be used by any clinic globally; a clinic that does have the additional information can supply it, and the model will use it. 
    3. Consistent performance across clinical environments
    To incorporate deep-learning risk models into clinical guidelines, the models must perform consistently across diverse clinical environments, and their predictions cannot be affected by minor variations like which machine the mammogram was taken on. Even across a single hospital, the scientists found that standard training did not produce consistent predictions before and after a change in mammography machines, as the algorithm could learn to rely on different cues specific to the environment. To de-bias the model, the team used an adversarial scheme in which the model specifically learns mammogram representations that are invariant to the source clinical environment, in order to produce consistent predictions. 
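    One common way to implement that kind of adversarial de-biasing is a gradient-reversal setup, sketched below in generic form (a domain-adversarial illustration, not necessarily the exact scheme used for Mirai): an auxiliary classifier tries to guess which clinical environment a mammogram came from, while the encoder is trained to make that guess impossible.

    ```python
    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        """Identity on the forward pass; flips the gradient sign on the backward
        pass, so the encoder learns features the domain classifier cannot use."""
        @staticmethod
        def forward(ctx, x):
            return x.view_as(x)
        @staticmethod
        def backward(ctx, grad_output):
            return -grad_output

    encoder = nn.Sequential(nn.Linear(256, 128), nn.ReLU())
    risk_head = nn.Linear(128, 5)        # 5-year risk logits
    domain_head = nn.Linear(128, 4)      # which of 4 machines / sites produced the image

    x = torch.randn(8, 256)              # toy stand-in for image features
    risk_target = torch.rand(8, 5)
    domain_target = torch.randint(0, 4, (8,))

    features = encoder(x)
    risk_loss = nn.functional.binary_cross_entropy_with_logits(risk_head(features), risk_target)
    domain_loss = nn.functional.cross_entropy(domain_head(GradReverse.apply(features)), domain_target)
    (risk_loss + domain_loss).backward()  # the encoder receives a reversed gradient from the domain loss
    ```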
    To further test these updates across diverse clinical settings, the scientists evaluated Mirai on new test sets from Karolinska in Sweden and Chang Gung Memorial Hospital in Taiwan, and found it obtained consistent performance. The team also analyzed the model’s performance across races, ages, and breast density categories in the MGH test set, and across cancer subtypes on the Karolinska dataset, and found it performed similarly across all subgroups. 
    “African-American women continue to present with breast cancer at younger ages, and often at later stages,” says Salewai Oseni, a breast surgeon at Massachusetts General Hospital who was not involved with the work. “This, coupled with the higher incidence of triple-negative breast cancer in this group, has resulted in increased breast cancer mortality. This study demonstrates the development of a risk model whose prediction has notable accuracy across race. The opportunity for its use clinically is high.” 
    Here’s how Mirai works, step by step (a schematic code sketch follows this list): 
    1. The mammogram image is put through something called an “image encoder.”
    2. Each image representation, as well as which view it came from, is aggregated with other images from other views to obtain a representation of the entire mammogram.
    3. With the mammogram, the patient’s traditional risk factors, such as the age, weight, and hormonal factors used by the Tyrer-Cuzick model, are incorporated; if they are unavailable, values predicted from the image itself are used. 
    4. With this information, the additive-hazard layer predicts a patient’s risk for each year over the next five years. 
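    Putting the four steps together, here is a compact schematic in PyTorch (illustrative module names and sizes only, not the released implementation); note how known risk factors are used when provided and the model’s own predictions are substituted otherwise:

    ```python
    import torch
    import torch.nn as nn

    class MiraiSketch(nn.Module):
        def __init__(self, feat=256, n_risk_factors=6, years=5):
            super().__init__()
            self.image_encoder = nn.Sequential(                      # step 1: image encoder
                nn.Flatten(), nn.Linear(64 * 64, feat), nn.ReLU())
            self.aggregate = nn.Linear(feat, feat)                    # step 2: stand-in for view aggregation
            self.risk_factor_head = nn.Linear(feat, n_risk_factors)   # step 3: age, hormonal factors, ...
            self.base = nn.Linear(feat + n_risk_factors, 1)           # step 4: additive-hazard risk
            self.hazard = nn.Linear(feat + n_risk_factors, years)

        def forward(self, views, risk_factors=None):
            # views: (batch, n_views, 64, 64) toy-sized mammogram views
            per_view = torch.stack(
                [self.image_encoder(v) for v in views.unbind(dim=1)], dim=1)
            exam = self.aggregate(per_view.mean(dim=1))               # combine the views
            predicted_rf = self.risk_factor_head(exam)
            rf = risk_factors if risk_factors is not None else predicted_rf
            z = torch.cat([exam, rf], dim=-1)
            hazards = torch.cumsum(torch.relu(self.hazard(z)), dim=-1)
            return torch.sigmoid(self.base(z) + hazards)              # (batch, years), non-decreasing

    model = MiraiSketch()
    print(model(torch.randn(2, 4, 64, 64)).shape)   # runs with or without known risk factors
    ```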
    Improving Mirai 
    Although the current model doesn’t look at any of the patient’s previous imaging results, changes in imaging over time contain a wealth of information. In the future the team aims to create methods that can effectively utilize a patient’s full imaging history.
    In a similar fashion, the team notes that the model could be further improved by utilizing “tomosynthesis,” a 3D X-ray technique used to screen asymptomatic patients for breast cancer. Beyond improving accuracy, additional research is required to determine how to adapt image-based risk models to different mammography devices with limited data. 
    “We know MRI can catch cancers earlier than mammography, and that earlier detection improves patient outcomes,” says Yala. “But for patients at low risk of cancer, the risk of false-positives can outweigh the benefits. With improved risk models, we can design more nuanced risk-screening guidelines that offer more sensitive screening, like MRI, to patients who will develop cancer, to get better outcomes while reducing unnecessary screening and over-treatment for the rest.” 
    “We’re both excited and humbled to ask the question if this AI system will work for African-American populations,” says Judy Gichoya, MD, MS and assistant professor of interventional radiology and informatics at Emory University, who was not involved with the work. “We’re extensively studying this question, and how to detect failure.” 
    Yala wrote the paper on Mirai alongside MIT research specialist Peter G. Mikhael, radiologist Fredrik Strand of Karolinska University Hospital, Gigin Lin of Chang Gung Memorial Hospital, Associate Professor Kevin Smith of KTH Royal Institute of Technology, Professor Yung-Liang Wan of Chang Gung University, Leslie Lamb of MGH, Kevin Hughes of MGH, senior author and Harvard Medical School Professor Constance Lehman of MGH, and senior author and MIT Professor Regina Barzilay. 
    The work was supported by grants from Susan G Komen, Breast Cancer Research Foundation, Quanta Computing, and the MIT Jameel Clinic. It was also supported by Chang Gung Medical Foundation Grant, and by Stockholm Läns Landsting HMT Grant.