More stories

  •

    Power when the sun doesn’t shine

    In 2016, at the huge Houston energy conference CERAWeek, MIT materials scientist Yet-Ming Chiang found himself talking to a Tesla executive about a thorny problem: how to store the output of solar panels and wind turbines for long durations.        

    Chiang, the Kyocera Professor of Materials Science and Engineering, and Mateo Jaramillo, a vice president at Tesla, knew that utilities lacked a cost-effective way to store renewable energy to cover peak levels of demand and to bridge the gaps during windless and cloudy days. They also knew that the scarcity of raw materials used in conventional energy storage devices needed to be addressed if renewables were ever going to displace fossil fuels on the grid at scale.

    Energy storage technologies can facilitate access to renewable energy sources, boost the stability and reliability of power grids, and ultimately accelerate grid decarbonization. The global market for these systems — essentially large batteries — is expected to grow tremendously in the coming years. A study by the nonprofit LDES (Long Duration Energy Storage) Council pegs the long-duration energy storage market at between 80 and 140 terawatt-hours by 2040. “That’s a really big number,” Chiang notes. “Every 10 people on the planet will need access to the equivalent of one EV [electric vehicle] battery to support their energy needs.”
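
    As a rough sanity check on Chiang’s comparison, the sketch below divides the LDES Council range by the number of 10-person groups. The population figure (about 9 billion people in 2040) is an assumption that does not come from the article, and the function name is purely illustrative.

```python
# Back-of-the-envelope check of the "one EV battery per 10 people" comparison.
# ASSUMPTION: world population of roughly 9 billion in 2040 (not from the article).
ASSUMED_POPULATION_2040 = 9_000_000_000


def kwh_per_group(market_twh: float, people_per_group: int = 10) -> float:
    """Storage available to each group of people, in kilowatt-hours."""
    total_kwh = market_twh * 1e9  # 1 TWh = 1e9 kWh
    groups = ASSUMED_POPULATION_2040 / people_per_group
    return total_kwh / groups


low = kwh_per_group(80)    # lower bound of the LDES Council range
high = kwh_per_group(140)  # upper bound of the LDES Council range
print(f"{low:.0f} to {high:.0f} kWh per 10 people")
```

    Under that population assumption, the range works out to roughly 89 to 156 kWh per 10 people, which is the same order of magnitude as an EV battery pack.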

    In 2017, one year after they met in Houston, Chiang and Jaramillo joined forces to co-found Form Energy in Somerville, Massachusetts, with MIT graduates Marco Ferrara SM ’06, PhD ’08 and William Woodford PhD ’13, and energy storage veteran Ted Wiley.

    “There is a burgeoning market for electrical energy storage because we want to achieve decarbonization as fast and as cost-effectively as possible,” says Ferrara, Form’s senior vice president in charge of software and analytics.

    Investors agreed. Over the next six years, Form Energy would raise more than $800 million in venture capital.

    Bridging gaps

    The simplest battery consists of an anode, a cathode, and an electrolyte. During discharge, with the help of the electrolyte, electrons flow from the negative anode to the positive cathode. During charge, external voltage reverses the process. The anode becomes the positive terminal, the cathode becomes the negative terminal, and electrons move back to where they started. Materials used for the anode, cathode, and electrolyte determine the battery’s weight, power, and cost “entitlement,” which is the total cost at the component level.

    During the 1980s and 1990s, the use of lithium revolutionized batteries, making them smaller, lighter, and able to hold a charge for longer. The storage devices Form Energy has devised are rechargeable batteries based on iron, which has several advantages over lithium. A big one is cost.

    Chiang once declared to the MIT Club of Northern California, “I love lithium-ion.” Two of the four MIT spinoffs Chiang founded center on innovative lithium-ion batteries. But at hundreds of dollars a kilowatt-hour (kWh) and with a storage capacity typically measured in hours, lithium-ion was ill-suited for the use he now had in mind.

    The approach Chiang envisioned had to be cost-effective enough to boost the attractiveness of renewables. Making solar and wind energy reliable enough for millions of customers meant storing it long enough to fill the gaps created by extreme weather conditions, grid outages, and lulls in the wind or a few days of clouds.

    To be competitive with legacy power plants, Chiang’s method had to come in at around $20 per kilowatt-hour of stored energy — one-tenth the cost of lithium-ion battery storage.

    But how to transition from expensive batteries that store and discharge over a couple of hours to some as-yet-undefined, cheap, longer-duration technology?

    “One big ball of iron”

    That’s where Ferrara comes in. Ferrara has a PhD in nuclear engineering from MIT and a PhD in electrical engineering and computer science from the University of L’Aquila in his native Italy. In 2017, as a research affiliate at the MIT Department of Materials Science and Engineering, he worked with Chiang to model the grid’s need to manage renewables’ intermittency.

    How intermittent depends on where you are. In the United States, for instance, there’s the windy Great Plains; the sun-drenched, relatively low-wind deserts of Arizona, New Mexico, and Nevada; and the often-cloudy Pacific Northwest.

    Ferrara, in collaboration with Professor Jessika Trancik of MIT’s Institute for Data, Systems, and Society and her MIT team, modeled four representative locations in the United States and concluded that energy storage with capacity costs below roughly $20/kWh and discharge durations of multiple days would allow a wind-solar mix to provide cost-competitive, firm electricity in resource-abundant locations.

    Now that they had a time frame, they turned their attention to materials. At the price point Form Energy was aiming for, lithium was out of the question. Chiang looked at plentiful and cheap sulfur. But a sulfur, sodium, water, and air battery had technical challenges.

    Thomas Edison once used iron as an electrode, and iron-air batteries were first studied in the 1960s. They were too heavy to make good transportation batteries. But this time, Chiang and team were looking at a battery that sat on the ground, so weight didn’t matter. Their priorities were cost and availability.

    “Iron is produced, mined, and processed on every continent,” Chiang says. “The Earth is one big ball of iron. We wouldn’t ever have to worry about even the most ambitious projections of how much storage that the world might use by mid-century.” If Form ever moves into the residential market, “it’ll be the safest battery you’ve ever parked at your house,” Chiang laughs. “Just iron, air, and water.”

    Scientists call it reversible rusting. While discharging, the battery takes in oxygen and converts iron to rust. Applying an electrical current converts the rusty pellets back to iron, and the battery “breathes out” oxygen as it charges. “In chemical terms, you have iron, and it becomes iron hydroxide,” Chiang says. “That means electrons were extracted. You get those electrons to go through the external circuit, and now you have a battery.”
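
    The discharge step Chiang describes is commonly written as the half-reactions below for an alkaline iron-air cell. These are textbook equations, not taken from the article or from Form Energy’s specific cell design, and they assume the iron hydroxide product Chiang mentions is iron(II) hydroxide.

```latex
\begin{align*}
\text{Iron electrode:}\quad & \mathrm{Fe} + 2\,\mathrm{OH^-} \rightarrow \mathrm{Fe(OH)_2} + 2\,e^- \\
\text{Air electrode:}\quad & \mathrm{O_2} + 2\,\mathrm{H_2O} + 4\,e^- \rightarrow 4\,\mathrm{OH^-} \\
\text{Overall:}\quad & 2\,\mathrm{Fe} + \mathrm{O_2} + 2\,\mathrm{H_2O} \rightarrow 2\,\mathrm{Fe(OH)_2}
\end{align*}
```

    Charging runs each reaction in reverse, which is when the cell releases oxygen.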

    Form Energy’s battery modules are approximately the size of a washer-and-dryer unit. They are stacked in 40-foot containers, and several containers are electrically connected with power conversion systems to build storage plants that can cover several acres.

    The right place at the right time

    The modules don’t look or act like anything utilities have contracted for before.

    That’s one of Form’s key challenges. “There is not widespread knowledge of needing these new tools for decarbonized grids,” Ferrara says. “That’s not the way utilities have typically planned. They’re looking at all the tools in the toolkit that exist today, which may not contemplate a multi-day energy storage asset.”

    Form Energy’s customers are largely traditional power companies seeking to expand their portfolios of renewable electricity. Some are in the process of decommissioning coal plants and shifting to renewables.

    Ferrara’s research pinpointing the need for very low-cost multi-day storage provides key data for power suppliers seeking to determine the most cost-effective way to integrate more renewable energy.

    Using the same modeling techniques, Ferrara and team show potential customers how the technology fits in with their existing system, how it competes with other technologies, and how, in some cases, it can operate synergistically with other storage technologies.

    “They may need a portfolio of storage technologies to fully balance renewables on different timescales of intermittency,” he says. But other than the technology developed at Form, “there isn’t much out there, certainly not within the cost entitlement of what we’re bringing to market.”  Thanks to Chiang and Jaramillo’s chance encounter in Houston, Form has a several-year lead on other companies working to address this challenge. 

    In June 2023, Form Energy closed its biggest deal to date for a single project: Georgia Power’s order for a 15-megawatt/1,500-megawatt-hour system. That order brings Form’s total amount of energy storage under contracts with utility customers to 40 megawatts/4 gigawatt-hours. To meet the demand, Form is building a new commercial-scale battery manufacturing facility in West Virginia.
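
    The power and energy ratings quoted above imply a discharge duration, which is simply energy divided by power. The short sketch below works through that arithmetic; the function and variable names are illustrative only.

```python
# Discharge duration implied by a storage system's power and energy ratings.
def duration_hours(power_mw: float, energy_mwh: float) -> float:
    """Hours the system can discharge at full rated power."""
    return energy_mwh / power_mw


georgia_power = duration_hours(power_mw=15, energy_mwh=1_500)
total_contracted = duration_hours(power_mw=40, energy_mwh=4_000)  # 4 GWh = 4,000 MWh
print(georgia_power, total_contracted)
```

    Both ratios come out to 100 hours, a little over four days of continuous discharge at rated power.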

    The fact that Form Energy is creating jobs in an area that lost more than 10,000 steel jobs over the past decade is not lost on Chiang. “And these new jobs are in clean tech. It’s super exciting to me personally to be doing something that benefits communities outside of our traditional technology centers.

    “This is the right time for so many reasons,” Chiang says. He says he and his Form Energy co-founders feel “tremendous urgency to get these batteries out into the world.”

    This article appears in the Winter 2024 issue of Energy Futures, the magazine of the MIT Energy Initiative.

  •

    On the hunt for sustainable materials

    By the time she started high school, Avni Singhal had attended six different schools in a variety of settings, from a traditional public school to a self-paced program. The transitions opened her eyes to how widely educational environments can vary, and made her think about that impact on students.

    “Experiencing so many different types of educational systems exposed me to different ways of looking at things and how that shapes people’s worldviews,” says Singhal.

    Now a fourth-year PhD student in the Department of Materials Science and Engineering, Singhal is still thinking about increasing opportunities for her fellow students, while also pursuing her research. She devotes herself to both developing sustainable materials and improving the graduate experience in her department.

    She recently completed her two-year term as a student representative on the department’s graduate studies committee. In this role, she helped revamp the communication around the qualifying exams and introduce student input to the faculty search process.

    “It’s given me a lot of insight into how our department works,” says Singhal. “It’s a chance to get to know faculty, bring up issues that students experience, and work on changing things that we think could be improved.”

    At the same time, Singhal uses atomistic simulations to model material properties, with an eye toward sustainability. She is a part of the Learning Matter Lab, a group that merges data science tools with engineering and physics-based simulation to better design and understand materials. As part of a computational group, Singhal has worked on a range of projects in collaboration with other labs that are looking to combine computing with other disciplines. Some of this work is sponsored by the MIT Climate and Sustainability Consortium, which facilitates connections across MIT labs and industry.

    Joining the Learning Matter Lab was a step out of Singhal’s comfort zone. She arrived at MIT from the University of California at Berkeley with a joint degree in materials science and bioengineering, as well as a degree in electrical engineering and computer science.

    “I was generally interested in doing work on environment-related applications,” says Singhal. “I was pretty hesitant at first to switch entirely to computation because it’s a very different type of lifestyle of research than what I was doing before.”

    Singhal has taken the challenge in stride, contributing to projects including improving carbon capture molecules and developing new deconstructable, degradable plastics. Not only does Singhal have to understand the technical details of her own work, she also needs to understand the big picture and how to best wield the expertise of her collaborators.

    “When I came in, I was very wide-eyed, thinking computation can do everything because I had never done it before,” says Singhal. “It’s that curve where you know a little bit about something, and you think it can do everything. And then as you learn more, you learn where it can and can’t help us, where it can be valuable, and how to figure out in what part of a project it’s useful.”

    Singhal applies a similarly critical lens when thinking about graduate school as a whole. She notes that access to information and resources is often the main factor determining who enters selective educational programs, and that such access becomes increasingly limited at the graduate level.

    “I realized just how much applying is a function of knowing how to do it,” says Singhal, who co-organized and volunteers with the DMSE Application Assistance Program. The program matches prospective applicants with current students to give feedback on their application materials and provide insight into what it’s like attending MIT. Some of the first students Singhal mentored through the program are now participants as well.

    “The further you get in your educational career, the more you realize how much assistance you got along the way to get where you are,” says Singhal. “That happens at every stage.”

    Looking toward the future, Singhal wants to continue to pursue research with a sustainability impact. She also wants to continue mentoring in some capacity but isn’t in a rush to figure out exactly what that will look like.

    “Grad school doesn’t mean I have to do one thing. I can stay open to all the possibilities of what comes next.”

  •

    Learning the language of molecules to predict their properties

    Discovering new materials and drugs typically involves a manual, trial-and-error process that can take decades and cost millions of dollars. To streamline this process, scientists often use machine learning to predict molecular properties and narrow down the molecules they need to synthesize and test in the lab.

    Researchers from MIT and the MIT-IBM Watson AI Lab have developed a new, unified framework that can simultaneously predict molecular properties and generate new molecules much more efficiently than popular deep-learning approaches.

    To teach a machine-learning model to predict a molecule’s biological or mechanical properties, researchers must show it millions of labeled molecular structures — a process known as training. Due to the expense of discovering molecules and the challenges of hand-labeling millions of structures, large training datasets are often hard to come by, which limits the effectiveness of machine-learning approaches.

    By contrast, the system created by the MIT researchers can effectively predict molecular properties using only a small amount of data. Their system has an underlying understanding of the rules that dictate how building blocks combine to produce valid molecules. These rules capture the similarities between molecular structures, which helps the system generate new molecules and predict their properties in a data-efficient manner.

    This method outperformed other machine-learning approaches on both small and large datasets, and was able to accurately predict molecular properties and generate viable molecules when given a dataset with fewer than 100 samples.

    “Our goal with this project is to use some data-driven methods to speed up the discovery of new molecules, so you can train a model to do the prediction without all of these cost-heavy experiments,” says lead author Minghao Guo, an electrical engineering and computer science (EECS) graduate student.

    Guo’s co-authors include MIT-IBM Watson AI Lab research staff members Veronika Thost, Payel Das, and Jie Chen; recent MIT graduates Samuel Song ’23 and Adithya Balachandran ’23; and senior author Wojciech Matusik, a professor of electrical engineering and computer science and a member of the MIT-IBM Watson AI Lab, who leads the Computational Design and Fabrication Group within the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference on Machine Learning.

    Learning the language of molecules

    To achieve the best results with machine-learning models, scientists need training datasets with millions of molecules that have similar properties to those they hope to discover. In reality, these domain-specific datasets are usually very small. So, researchers use models that have been pretrained on large datasets of general molecules, which they apply to a much smaller, targeted dataset. However, because these models haven’t acquired much domain-specific knowledge, they tend to perform poorly.

    The MIT team took a different approach. They created a machine-learning system that automatically learns the “language” of molecules — what is known as a molecular grammar — using only a small, domain-specific dataset. It uses this grammar to construct viable molecules and predict their properties.

    In language theory, one generates words, sentences, or paragraphs based on a set of grammar rules. You can think of a molecular grammar the same way. It is a set of production rules that dictate how to generate molecules or polymers by combining atoms and substructures.

    Just like a language grammar, which can generate a plethora of sentences using the same rules, one molecular grammar can represent a vast number of molecules. Molecules with similar structures use the same grammar production rules, and the system learns to understand these similarities.
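
    To make the idea of production rules concrete, here is a toy string-rewriting sketch. It is not the researchers’ system, which operates on molecular graphs rather than strings; the placeholder symbol `X`, the three rules, and the function name are all hypothetical and chosen only for illustration.

```python
# Toy illustration of production rules; NOT the researchers' graph grammar.
# Symbols and rules are hypothetical: "X" is a placeholder still to be
# expanded, and the other letters stand in for atoms.
from collections import deque

RULES = {
    "X": ["C", "CX", "C(O)X"],  # end the chain, extend it, or extend with a branch
}


def generate(start: str = "X", max_steps: int = 3) -> set[str]:
    """Return every fully expanded string reachable in at most max_steps rule applications."""
    finished: set[str] = set()
    queue = deque([(start, 0)])
    while queue:
        current, steps = queue.popleft()
        if "X" not in current:
            finished.add(current)  # no placeholder left: the string is complete
            continue
        if steps == max_steps:
            continue  # out of budget; drop unfinished strings
        for replacement in RULES["X"]:
            # Rewrite the leftmost placeholder with one production rule.
            queue.append((current.replace("X", replacement, 1), steps + 1))
    return finished


molecules = generate()
print(sorted(molecules))
```

    With these three rules, three steps already yield seven distinct strings, and each additional step roughly doubles the count, which is the sense in which a small grammar can represent a very large set of structures.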

    Since structurally similar molecules often have similar properties, the system uses its underlying knowledge of molecular similarity to predict properties of new molecules more efficiently. 

    “Once we have this grammar as a representation for all the different molecules, we can use it to boost the process of property prediction,” Guo says.

    The system learns the production rules for a molecular grammar using reinforcement learning — a trial-and-error process where the model is rewarded for behavior that gets it closer to achieving a goal.

    But because there could be billions of ways to combine atoms and substructures, the process to learn grammar production rules would be too computationally expensive for anything but the tiniest dataset.

    The researchers decoupled the molecular grammar into two parts. The first part, called a metagrammar, is a general, widely applicable grammar they design manually and give the system at the outset. Then it only needs to learn a much smaller, molecule-specific grammar from the domain dataset. This hierarchical approach speeds up the learning process.

    Big results, small datasets

    In experiments, the researchers’ new system simultaneously generated viable molecules and polymers, and predicted their properties more accurately than several popular machine-learning approaches, even when the domain-specific datasets had only a few hundred samples. Some other methods also required a costly pretraining step that the new system avoids.

    The technique was especially effective at predicting physical properties of polymers, such as the glass transition temperature, which is the temperature at which a material shifts from a hard, glassy state to a soft, rubbery one. Obtaining this information manually is often extremely costly because the experiments require extremely high temperatures and pressures.

    To push their approach further, the researchers cut one training set down by more than half — to just 94 samples. Their model still achieved results that were on par with methods trained using the entire dataset.

    “This grammar-based representation is very powerful. And because the grammar itself is a very general representation, it can be deployed to different kinds of graph-form data. We are trying to identify other applications beyond chemistry or material science,” Guo says.

    In the future, they also want to extend their current molecular grammar to include the 3D geometry of molecules and polymers, which is key to understanding the interactions between polymer chains. They are also developing an interface that would show a user the learned grammar production rules and solicit feedback to correct rules that may be wrong, boosting the accuracy of the system.

    This work is funded, in part, by the MIT-IBM Watson AI Lab and its member company, Evonik.

  •

    Is it topological? A new materials database has the answer

    What will it take to make our electronics smarter, faster, and more resilient? One idea is to build them from materials that are topological.

    Topology stems from a branch of mathematics that studies shapes that can be manipulated or deformed without losing certain core properties. A donut is a common example: If it were made of rubber, a donut could be twisted and squeezed into a completely new shape, such as a coffee mug, while retaining a key trait — namely, its center hole, which takes the form of the cup’s handle. The hole, in this case, is a topological trait, robust against certain deformations.

    In recent years, scientists have applied concepts of topology to the discovery of materials with similarly robust electronic properties. In 2007, researchers predicted the first electronic topological insulators, materials in which electrons behave in ways that are “topologically protected,” or persistent in the face of certain disruptions.

    Since then, scientists have searched for more topological materials with the aim of building better, more robust electronic devices. Until recently, only a handful of such materials had been identified, and they were therefore assumed to be rare.

    Now researchers at MIT and elsewhere have discovered that, in fact, topological materials are everywhere, if you know how to look for them.

    In a paper published today in Science, the team, led by Nicolas Regnault of Princeton University and the École Normale Supérieure Paris, reports harnessing the power of multiple supercomputers to map the electronic structure of more than 96,000 natural and synthetic crystalline materials. They applied sophisticated filters to determine whether and what kind of topological traits exist in each structure.

    Overall, they found that 90 percent of all known crystalline structures contain at least one topological property, and more than 50 percent of all naturally occurring materials exhibit some sort of topological behavior.

    “We found there’s a ubiquity — topology is everywhere,” says Benjamin Wieder, the study’s co-lead, and a postdoc in MIT’s Department of Physics.

    The team has compiled the newly identified materials into a new, freely accessible Topological Materials Database resembling a periodic table of topology. With this new library, scientists can quickly search materials of interest for any topological properties they might hold, and harness them to build ultra-low-power transistors, new magnetic memory storage, and other devices with robust electronic properties.

    The paper’s co-authors include co-lead author Maia Vergniory of the Donostia International Physics Center, Luis Elcoro of the University of the Basque Country, Stuart Parkin and Claudia Felser of the Max Planck Institute, and Andrei Bernevig of Princeton University.

    Beyond intuition

    The new study was motivated by a desire to speed up the traditional search for topological materials.

    “The way the original materials were found was through chemical intuition,” Wieder says. “That approach had a lot of early successes. But as we theoretically predicted more kinds of topological phases, it seemed intuition wasn’t getting us very far.”

    Wieder and his colleagues instead utilized an efficient and systematic method to root out signs of topology, or robust electronic behavior, in all known crystalline structures, also known as inorganic solid-state materials.

    For their study, the researchers looked to the Inorganic Crystal Structure Database, or ICSD, a repository into which researchers enter the atomic and chemical structures of crystalline materials that they have studied. The database includes materials found in nature, as well as those that have been synthesized and manipulated in the lab. The ICSD is currently the largest materials database in the world, containing over 193,000 crystals whose structures have been mapped and characterized.

    The team downloaded the entire ICSD, and after performing some data cleaning to weed out structures with corrupted files or incomplete data, the researchers were left with just over 96,000 processable structures. For each of these structures, they performed a set of calculations based on fundamental knowledge of the relation between chemical constituents, to produce a map of the material’s electronic structure, also known as the electron band structure.

    The team was able to efficiently carry out the complicated calculations for each structure using multiple supercomputers, which they then employed to perform a second set of operations, this time to screen for various known topological phases, or persistent electrical behavior in each crystal material.

    “We’re looking for signatures in the electronic structure in which certain robust phenomena should occur in this material,” explains Wieder, whose previous work involved refining and expanding the screening technique, known as topological quantum chemistry.

    From their high-throughput analysis, the team quickly discovered a surprisingly large number of materials that are naturally topological, without any experimental manipulation, as well as materials that can be manipulated, for instance with light or chemical doping, to exhibit some sort of robust electronic behavior. They also discovered a handful of materials that contained more than one topological state when exposed to certain conditions.

    “Topological phases of matter in 3D solid-state materials have been proposed as venues for observing and manipulating exotic effects, including the interconversion of electrical current and electron spin, the tabletop simulation of exotic theories from high-energy physics, and even, under the right conditions, the storage and manipulation of quantum information,” Wieder notes. 

    For experimentalists who are studying such effects, Wieder says the team’s new database now reveals a menagerie of new materials to explore.

    This research was funded, in part, by the U.S. Department of Energy, the National Science Foundation, and the Office of Naval Research.

  •

    Generating new molecules with graph grammar

    Chemical engineers and materials scientists are constantly looking for the next revolutionary material, chemical, and drug. The rise of machine-learning approaches is expediting the discovery process, which could otherwise take years. “Ideally, the goal is to train a machine-learning model on a few existing chemical samples and then allow it to produce as many manufacturable molecules of the same class as possible, with predictable physical properties,” says Wojciech Matusik, professor of electrical engineering and computer science at MIT. “If you have all these components, you can build new molecules with optimal properties, and you also know how to synthesize them. That’s the overall vision that people in that space want to achieve.”

    However, current techniques, mainly deep learning, require extensive datasets for training models, and many class-specific chemical datasets contain only a handful of example compounds, limiting the models’ ability to generalize and generate physical molecules that could be created in the real world.

    Now, a new paper from researchers at MIT and IBM tackles this problem using a generative graph model to build new synthesizable molecules within the same chemical class as their training data. To do this, they treat the formation of atoms and chemical bonds as a graph and develop a graph grammar (a linguistics analogy of systems and structures for word ordering) that contains a sequence of rules for building molecules, such as monomers and polymers. Using the grammar and production rules that were inferred from the training set, the model can not only reverse engineer its examples, but can also create new compounds in a systematic and data-efficient way. “We basically built a language for creating molecules,” says Matusik. “This grammar essentially is the generative model.”

    Matusik’s co-authors include MIT graduate students Minghao Guo, who is the lead author, and Beichen Li, as well as Veronika Thost, Payel Das, and Jie Chen, research staff members with IBM Research. Matusik, Thost, and Chen are affiliated with the MIT-IBM Watson AI Lab. Their method, which they’ve called data-efficient graph grammar (DEG), will be presented at the International Conference on Learning Representations.

    “We want to use this grammar representation for monomer and polymer generation, because this grammar is explainable and expressive,” says Guo. “With only a few number of the production rules, we can generate many kinds of structures.”

    A molecular structure can be thought of as a symbolic representation in a graph — a string of atoms (nodes) joined together by chemical bonds (edges). In this method, the researchers allow the model to take the chemical structure and collapse a substructure of the molecule down to one node; this may be two atoms connected by a bond, a short sequence of bonded atoms, or a ring of atoms. This is done repeatedly, creating the production rules as it goes, until a single node remains. The rules and grammar then could be applied in the reverse order to recreate the training set from scratch or combined in different combinations to produce new molecules of the same chemical class.
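
    The repeated collapsing described above can be sketched on a tiny graph. This is a minimal illustration under stated assumptions, not the DEG implementation: the molecule is a plain adjacency map, each step merges just two bonded nodes (rather than a longer sequence or a ring), the graph is assumed to be connected, and the pair to merge is picked alphabetically, whereas in DEG that choice is learned. All names here are hypothetical.

```python
# Minimal sketch of bottom-up contraction on a molecular graph.
# ASSUMPTIONS: connected graph, two-node merges only, alphabetical choice of
# which bond to collapse. DEG learns that choice; this sketch does not.

def contract_to_single_node(
    adjacency: dict[str, set[str]],
) -> list[tuple[str, tuple[str, str]]]:
    """Collapse bonded pairs until one node remains; return the recorded rules."""
    graph = {node: set(nbrs) for node, nbrs in adjacency.items()}  # work on a copy
    rules = []
    step = 0
    while len(graph) > 1:
        a = min(node for node, nbrs in graph.items() if nbrs)
        b = min(graph[a])
        step += 1
        merged = f"group{step}"
        # The merged node inherits every bond of a and b except the one between them.
        neighbors = (graph.pop(a) | graph.pop(b)) - {a, b}
        for n in neighbors:
            graph[n] -= {a, b}
            graph[n].add(merged)
        graph[merged] = neighbors
        rules.append((merged, (a, b)))  # read in reverse: "expand merged into a-b"
    return rules


# Ethanol's heavy atoms as a chain: C1 - C2 - O
ethanol = {"C1": {"C2"}, "C2": {"C1", "O"}, "O": {"C2"}}
rules = contract_to_single_node(ethanol)
print(rules)
```

    Each recorded pair can be read backward as a production rule, so replaying the list in reverse order rebuilds the original chain from a single node.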

    “Existing graph generation methods would produce one node or one edge sequentially at a time, but we are looking at higher-level structures and, specifically, exploiting chemistry knowledge, so that we don’t treat the individual atoms and bonds as the unit. This simplifies the generation process and also makes it more data-efficient to learn,” says Chen.

    Further, the researchers optimized the technique so that the bottom-up grammar was relatively simple and straightforward, and so that the molecules it generated could actually be synthesized.

    “If we switch the order of applying these production rules, we would get another molecule; what’s more, we can enumerate all the possibilities and generate tons of them,” says Chen. “Some of these molecules are valid and some of them not, so the learning of the grammar itself is actually to figure out a minimal collection of production rules, such that the percentage of molecules that can actually be synthesized is maximized.” While the researchers concentrated on three training sets of fewer than 33 samples each (acrylates, chain extenders, and isocyanates), they note that the process could be applied to any chemical class.

    To see how their method performed, the researchers tested DEG against other state-of-the-art models and techniques, looking at percentages of chemically valid and unique molecules, diversity of those created, success rate of retrosynthesis, and percentage of molecules belonging to the training data’s monomer class.

    “We clearly show that, for the synthesizability and membership, our algorithm outperforms all the existing methods by a very large margin, while it’s comparable for some other widely-used metrics,” says Guo. Further, “what is amazing about our algorithm is that we only need about 0.15 percent of the original dataset to achieve very similar results compared to state-of-the-art approaches that train on tens of thousands of samples. Our algorithm can specifically handle the problem of data sparsity.”

    In the immediate future, the team plans to address scaling up this grammar learning process to be able to generate large graphs, as well as produce and identify chemicals with desired properties.

    Down the road, the researchers see many applications for the DEG method, which is adaptable beyond generating new chemical structures. A graph is a very flexible representation, and many entities can be symbolized in this form: robots, vehicles, buildings, and electronic circuits, for example. “Essentially, our goal is to build up our grammar, so that our graphic representation can be widely used across many different domains,” says Guo. “DEG can automate the design of novel entities and structures,” adds Chen.

    This research was supported, in part, by the MIT-IBM Watson AI Lab and Evonik.

  • in

    Computational modeling guides development of new materials

    Metal-organic frameworks, a class of materials with porous molecular structures, have a variety of possible applications, such as capturing harmful gases and catalyzing chemical reactions. Made of metal atoms linked by organic molecules, they can be configured in hundreds of thousands of different ways.

    To help researchers sift through all of the possible metal-organic framework (MOF) structures and help identify the ones that would be most practical for a particular application, a team of MIT computational chemists has developed a model that can analyze the features of a MOF structure and predict if it will be stable enough to be useful.

    The researchers hope that these computational predictions will help cut the development time of new MOFs.

    “This will allow researchers to test the promise of specific materials before they go through the trouble of synthesizing them,” says Heather Kulik, an associate professor of chemical engineering at MIT.

    The MIT team is now working to develop MOFs that could be used to capture methane gas and convert it to useful compounds such as fuels.

    The researchers described their new model in two papers, one in the Journal of the American Chemical Society and one in Scientific Data. Graduate students Aditya Nandy and Gianmarco Terrones are the lead authors of the Scientific Data paper, and Nandy is also the lead author of the JACS paper. Kulik is the senior author of both papers.

    Modeling structure

    MOFs consist of metal atoms joined by organic molecules called linkers to create a rigid, cage-like structure. The materials also have many pores, which makes them useful for catalyzing reactions involving gases but can also make them less structurally stable.

    “The limitation in seeing MOFs realized at industrial scale is that although we can control their properties by controlling where each atom is in the structure, they’re not necessarily that stable, as far as materials go,” Kulik says. “They’re very porous and they can degrade under realistic conditions that we need for catalysis.”

    Scientists have been working on designing MOFs for more than 20 years, and thousands of possible structures have been published. A centralized repository contains about 10,000 of these structures but is not linked to any of the published findings on the properties of those structures.

    Kulik, who specializes in using computational modeling to discover structure-property relationships of materials, wanted to take a more systematic approach to analyzing and classifying the properties of MOFs.

    “When people make these now, it’s mostly trial and error. The MOF dataset is really promising because there are so many people excited about MOFs, so there’s so much to learn from what everyone’s been working on, but at the same time, it’s very noisy and it’s not systematic the way it’s reported,” she says.

    Kulik and her colleagues set out to analyze published reports of MOF structures and properties using a natural-language-processing algorithm. Using this algorithm, they scoured nearly 4,000 published papers, extracting information on the temperature at which a given MOF would break down. They also pulled out data on whether particular MOFs can withstand the conditions needed to remove solvents used to synthesize them and make sure they become porous.

    Once the researchers had this information, they used it to train two neural networks to predict MOFs’ thermal stability and stability during solvent removal, based on the molecules’ structure.

    “Before you start working with a material and thinking about scaling it up for different applications, you want to know will it hold up, or is it going to degrade in the conditions I would want to use it in?” Kulik says. “Our goal was to get better at predicting what makes a stable MOF.”

    Better stability

    Using the model, the researchers were able to identify certain features that influence stability. In general, simpler linkers with fewer chemical groups attached to them are more stable. Pore size is also important: Before the researchers did their analysis, it had been thought that MOFs with larger pores might be too unstable. However, the MIT team found that large-pore MOFs can be stable if other aspects of their structure counteract the large pore size.

    “Since MOFs have so many things that can vary at the same time, such as the metal, the linkers, the connectivity, and the pore size, it is difficult to nail down what governs stability across different families of MOFs,” Nandy says. “Our models enable researchers to make predictions on existing or new materials, many of which have yet to be made.”

    The researchers have made their data and models available online. Scientists interested in using the models can get recommendations for strategies to make an existing MOF more stable, and they can also add their own data and feedback on the predictions of the models.

    The MIT team is now using the model to try to identify MOFs that could be used to catalyze the conversion of methane gas to methanol, which could be used as fuel. Kulik also plans to use the model to create a new dataset of hypothetical MOFs that haven’t been built before but are predicted to have high stability. Researchers could then screen this dataset for a variety of properties.

    “People are interested in MOFs for things like quantum sensing and quantum computing, all sorts of different applications where you need metals distributed in this atomically precise way,” Kulik says.

    The research was funded by DARPA, the U.S. Office of Naval Research, the U.S. Department of Energy, a National Science Foundation Graduate Research Fellowship, a Career Award at the Scientific Interface from the Burroughs Wellcome Fund, and an AAAS Marion Milligan Mason Award.

  • in

    Using adversarial attacks to refine molecular energy predictions

    Neural networks (NNs) are increasingly being used to predict new materials, the rate and yield of chemical reactions, and drug-target interactions, among others. For these applications, they are orders of magnitude faster than traditional methods such as quantum mechanical simulations. 

    The price for this agility, however, is reliability. Because machine learning models only interpolate, they may fail when used outside the domain of training data.

    But the part that worried Rafael Gómez-Bombarelli, the Jeffrey Cheah Career Development Professor in the MIT Department of Materials Science and Engineering, and graduate students Daniel Schwalbe-Koda and Aik Rui Tan was that establishing the limits of these machine learning (ML) models is tedious and labor-intensive. 

    This is particularly true for predicting “potential energy surfaces” (PES), or the map of a molecule’s energy in all its configurations. These surfaces encode the complexities of a molecule into flatlands, valleys, peaks, troughs, and ravines. The most stable configurations of a system are usually in the deep pits: quantum mechanical chasms from which atoms and molecules typically do not escape.

    In a recent Nature Communications paper, the research team presented a way to demarcate the “safe zone” of a neural network by using “adversarial attacks.” Adversarial attacks have been studied for other classes of problems, such as image classification, but this is the first time that they are being used to sample molecular geometries in a PES. 

    “People have been using uncertainty for active learning for years in ML potentials. The key difference is that they need to run the full ML simulation and evaluate if the NN was reliable, and if it wasn’t, acquire more data, retrain and re-simulate. Meaning that it takes a long time to nail down the right model, and one has to run the ML simulation many times,” explains Gómez-Bombarelli.

    The Gómez-Bombarelli lab at MIT works on a synergistic synthesis of first-principles simulation and machine learning that greatly speeds up this process. The actual simulations are run only for a small fraction of these molecules, and all those data are fed into a neural network that learns how to predict the same properties for the rest of the molecules. They have successfully demonstrated these methods for a growing class of novel materials that includes catalysts for producing hydrogen from water, cheaper polymer electrolytes for electric vehicles, zeolites for molecular sieving, magnetic materials, and more.

    The challenge, however, is that these neural networks are only as smart as the data they are trained on. Considering the PES map, 99 percent of the data may fall into one pit, totally missing valleys that are of more interest.

    Such wrong predictions can have disastrous consequences — think of a self-driving car that fails to identify a person crossing the street.

    One way to find out the uncertainty of a model is to run the same data through multiple versions of it. 

    For this project, the researchers had multiple neural networks predict the potential energy surface from the same data. Where the network is fairly sure of the prediction, the variation between the outputs of different networks is minimal and the surfaces largely converge. When the network is uncertain, the predictions of different models vary widely, producing a range of outputs, any of which could be the correct surface. 

    The spread in the predictions of a “committee of neural networks” is the “uncertainty” at that point. A good model should not just indicate the best prediction, but also indicate the uncertainty about each of these predictions. It’s as if the neural network says, “This property for material A will have a value of X and I’m highly confident about it.”
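
    The committee idea can be sketched in a few lines. The example below assumes three hypothetical stand-in models (simple functions, not trained neural networks) that agree near zero and drift apart farther out, and it uses the standard deviation of their outputs as the spread; the exact uncertainty measure used in the paper may differ.

```python
from statistics import mean, pstdev

# Hypothetical committee members: plain functions standing in for trained networks.
def model_a(x):
    return x**2

def model_b(x):
    return x**2 + 0.1 * x**3

def model_c(x):
    return x**2 - 0.1 * x**3

def committee_predict(models, x):
    """Return the committee's mean prediction and its spread at input x."""
    predictions = [m(x) for m in models]
    return mean(predictions), pstdev(predictions)

committee = [model_a, model_b, model_c]
for x in (0.0, 1.0, 2.0):
    best_guess, spread = committee_predict(committee, x)
    print(f"x={x}: prediction={best_guess:.3f}, uncertainty={spread:.3f}")
```

    Where the members agree, the spread is zero and the prediction can be trusted; as they diverge, the growing spread flags inputs where any one of the outputs could be the correct value.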

    This could have been an elegant solution but for the sheer scale of the combinatorial space. “Each simulation (which is ground feed for the neural network) may take from tens to thousands of CPU hours,” explains Schwalbe-Koda. For the results to be meaningful, multiple models must be run over a sufficient number of points in the PES, an extremely time-consuming process. 

    Instead, the new approach only samples data points from regions of low prediction confidence, corresponding to specific geometries of a molecule. These molecules are then stretched or deformed slightly so that the uncertainty of the neural network committee is maximized. Additional data are computed for these molecules through simulations and then added to the initial training pool. 

    The neural networks are trained again, and a new set of uncertainties is calculated. This process is repeated until the uncertainty associated with various points on the surface becomes well-defined and cannot be decreased any further.
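
    The loop described above can be sketched on a one-dimensional toy problem. Everything here is an assumption made for illustration: true_energy stands in for the expensive simulation, the committee members are small random-feature networks fitted by least squares rather than the researchers' models, seed geometries are drawn at random from the training set, and the attack is a brute-force search over small displacements in place of the gradient-based optimization an adversarial attack would normally use.

```python
import numpy as np

def true_energy(x):
    """Stand-in for an expensive simulation: a 1D double-well potential."""
    return x**4 - 2.0 * x**2

def train_committee(xs, ys, n_models=5, n_hidden=20, seed=0):
    """Fit small random-feature networks that differ only in hidden weights."""
    rng = np.random.default_rng(seed)
    committee = []
    for _ in range(n_models):
        w = rng.normal(size=n_hidden)
        b = rng.normal(size=n_hidden)
        hidden = np.tanh(np.outer(xs, w) + b)   # shape (n_points, n_hidden)
        beta, *_ = np.linalg.lstsq(hidden, ys, rcond=None)
        committee.append((w, b, beta))
    return committee

def uncertainty(committee, x):
    """Committee spread (standard deviation) of the predicted energy at x."""
    preds = [np.tanh(x * w + b) @ beta for w, b, beta in committee]
    return float(np.std(preds))

def attack(committee, x0, max_shift=0.3):
    """Displace x0 slightly to the point where the committee disagrees most."""
    shifts = np.arange(-10, 11) / 10 * max_shift   # includes a zero shift
    candidates = x0 + shifts
    scores = [uncertainty(committee, c) for c in candidates]
    return float(candidates[int(np.argmax(scores))])

def adversarial_active_learning(xs, ys, n_rounds=3, n_attacks=4, seed=0):
    """Alternate between attacking the committee and retraining on new labels."""
    rng = np.random.default_rng(seed)
    for _ in range(n_rounds):
        committee = train_committee(xs, ys, seed=seed)
        seeds = rng.choice(xs, size=n_attacks)
        new_xs = np.array([attack(committee, s) for s in seeds])
        xs = np.concatenate([xs, new_xs])
        ys = np.concatenate([ys, true_energy(new_xs)])   # "run the simulation"
    return xs, ys

# All initial data sit in one well (near x = 1), mimicking a lopsided dataset.
x_init = np.linspace(0.8, 1.2, 8)
x_final, y_final = adversarial_active_learning(x_init, true_energy(x_init))
```

    Each round adds only a handful of new labels, chosen where the committee is least sure of itself, so the expensive function is called far less often than it would be if the whole surface were sampled. A fixed number of rounds is used here for simplicity; the article describes stopping once the uncertainty can no longer be reduced.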

    Gómez-Bombarelli explains, “We aspire to have a model that is perfect in the regions we care about (i.e., the ones that the simulation will visit) without having had to run the full ML simulation, by making sure that we make it very good in high-likelihood regions where it isn’t.”

    The paper presents several examples of this approach, including predicting complex supramolecular interactions in zeolites. These materials are cavernous crystals that act as molecular sieves with high shape selectivity. They find applications in catalysis, gas separation, and ion exchange, among others.

    Because performing simulations of large zeolite structures is very costly, the researchers show how their method can provide significant savings in computational simulations. They used more than 15,000 examples to train a neural network to predict the potential energy surfaces for these systems. Despite the large cost required to generate the dataset, the final results were mediocre, with only around 80 percent of the neural network-based simulations being successful. To improve the performance of the model using traditional active learning methods, the researchers calculated an additional 5,000 data points, which improved the performance of the neural network potentials to 92 percent.

    However, when the adversarial approach was used to retrain the neural networks, the authors saw a performance jump to 97 percent using only 500 extra points. That’s a remarkable result, the researchers say, especially considering that each of these extra points takes hundreds of CPU hours.
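
    A quick back-of-the-envelope comparison makes the gap concrete. The calculation below uses only the figures quoted in the article and assumes that both improvements are measured from the same 80 percent baseline, which the article does not state explicitly; the function name is made up for this sketch.

```python
BASELINE_SUCCESS = 80.0   # percent, after training on the initial 15,000 examples

def gain_per_thousand_points(final_success, extra_points, baseline=BASELINE_SUCCESS):
    """Percentage points of success gained per 1,000 extra simulated data points."""
    return (final_success - baseline) / extra_points * 1000

traditional = gain_per_thousand_points(92.0, 5000)   # traditional active learning
adversarial = gain_per_thousand_points(97.0, 500)    # adversarial sampling
ratio = adversarial / traditional                    # relative data efficiency
points_saved = 5000 - 500                            # simulations avoided
```

    Under that assumption, the adversarial approach gains roughly 34 percentage points per 1,000 new data points versus about 2.4 for the traditional route, while avoiding 4,500 simulations that each cost hundreds of CPU hours.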

    This could be the most realistic method to probe the limits of models that researchers use to predict the behavior of materials and the progress of chemical reactions.