More stories

  • Fast-tracking fusion energy’s arrival with AI and accessibility

As the impacts of climate change continue to grow, so does interest in fusion’s potential as a clean energy source. While fusion reactions have been studied in laboratories since the 1930s, there are still many critical questions scientists must answer to make fusion power a reality, and time is of the essence. As part of its strategy to accelerate fusion energy’s arrival and reach carbon neutrality by 2050, the U.S. Department of Energy (DoE) has announced new funding for a project led by researchers at MIT’s Plasma Science and Fusion Center (PSFC) and four collaborating institutions.

Cristina Rea, a research scientist and group leader at the PSFC, will serve as the primary investigator for the newly funded three-year collaboration to pilot the integration of fusion data into a system that can be read by AI-powered tools. The PSFC, together with scientists from William & Mary, the University of Wisconsin at Madison, Auburn University, and the nonprofit HDF Group, plans to create a holistic fusion data platform, the elements of which could offer unprecedented access for researchers, especially underrepresented students. The project aims to encourage diverse participation in fusion and data science, both in academia and the workforce, through outreach programs led by the group’s co-investigators, of whom four out of five are women.

The DoE’s award, part of a $29 million funding package for seven projects across 19 institutions, will support the group’s efforts to distribute data produced by fusion devices like the PSFC’s Alcator C-Mod, a donut-shaped “tokamak” that used powerful magnets to control and confine fusion reactions. Alcator C-Mod operated from 1991 to 2016, and its data are still being studied, thanks in part to the PSFC’s commitment to the free exchange of knowledge.

    Currently, there are nearly 50 public experimental magnetic confinement-type fusion devices; however, both historical and current data from these devices can be difficult to access. Some fusion databases require signing user agreements, and not all data are catalogued and organized the same way. Moreover, it can be difficult to leverage machine learning, a class of AI tools, for data analysis and to enable scientific discovery without time-consuming data reorganization. The result is fewer scientists working on fusion, greater barriers to discovery, and a bottleneck in harnessing AI to accelerate progress.

The project’s proposed data platform addresses technical barriers by being FAIR — Findable, Accessible, Interoperable, Reusable — and by adhering to UNESCO’s Open Science (OS) recommendations to improve the transparency and inclusivity of science; all of the researchers’ deliverables will adhere to FAIR and OS principles, as required by the DoE. The platform’s databases will be built using MDSplusML, an upgraded version of the MDSplus open-source software developed by PSFC researchers in the 1980s to catalogue the results of Alcator C-Mod’s experiments. Today, nearly 40 fusion research institutes use MDSplus to store and provide external access to their fusion data. The release of MDSplusML aims to continue that legacy of open collaboration.
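
    For a concrete sense of how openly shared fusion data are typically read today, here is a minimal sketch using the MDSplus Python thin-client interface; the server address, tree name, shot number, and signal path are illustrative placeholders rather than details of the new platform, whose MDSplusML interface has not yet been released.

    ```python
    # Minimal sketch: reading one signal from a remote MDSplus data server.
    # The host, tree, shot number, and node path are placeholders for illustration.
    from MDSplus import Connection

    conn = Connection("mds-server.example.org")   # hypothetical fusion data server
    conn.openTree("cmod", 1160930033)             # placeholder tree name and shot number

    signal = conn.get(r"\ip")                     # e.g., a stored plasma-current signal
    values = signal.data()                        # numpy array of samples
    times = conn.get(r"dim_of(\ip)").data()       # matching time base

    print(f"{values.size} samples from {times[0]:.3f} s to {times[-1]:.3f} s")
    ```

    A platform layered on stores like this, with consistent cataloguing and machine-learning-ready structure, is essentially what the collaboration proposes.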

    The researchers intend to address barriers to participation for women and disadvantaged groups not only by improving general access to fusion data, but also through a subsidized summer school that will focus on topics at the intersection of fusion and machine learning, which will be held at William & Mary for the next three years.

    Of the importance of their research, Rea says, “This project is about responding to the fusion community’s needs and setting ourselves up for success. Scientific advancements in fusion are enabled via multidisciplinary collaboration and cross-pollination, so accessibility is absolutely essential. I think we all understand now that diverse communities have more diverse ideas, and they allow faster problem-solving.”

    The collaboration’s work also aligns with vital areas of research identified in the International Atomic Energy Agency’s “AI for Fusion” Coordinated Research Project (CRP). Rea was selected as the technical coordinator for the IAEA’s CRP emphasizing community engagement and knowledge access to accelerate fusion research and development. In a letter of support written for the group’s proposed project, the IAEA stated that, “the work [the researchers] will carry out […] will be beneficial not only to our CRP but also to the international fusion community in large.”

    PSFC Director and Hitachi America Professor of Engineering Dennis Whyte adds, “I am thrilled to see PSFC and our collaborators be at the forefront of applying new AI tools while simultaneously encouraging and enabling extraction of critical data from our experiments.”

“Having the opportunity to lead such an important project is extremely meaningful, and I feel a responsibility to show that women are leaders in STEM,” says Rea. “We have an incredible team, strongly motivated to improve our fusion ecosystem and to contribute to making fusion energy a reality.”

  • A new dataset of Arctic images will spur artificial intelligence research

As the U.S. Coast Guard (USCG) icebreaker Healy takes part in a voyage across the North Pole this summer, it is capturing images of the Arctic to further the study of this rapidly changing region. Lincoln Laboratory researchers installed a camera system aboard the Healy while the ship was in port in Seattle, before it embarked on a three-month science mission on July 11. The resulting dataset, which will be one of the first of its kind, will be used to develop artificial intelligence tools that can analyze Arctic imagery.

    “This dataset not only can help mariners navigate more safely and operate more efficiently, but also help protect our nation by providing critical maritime domain awareness and an improved understanding of how AI analysis can be brought to bear in this challenging and unique environment,” says Jo Kurucar, a researcher in Lincoln Laboratory’s AI Software Architectures and Algorithms Group, which led this project.

As the planet warms and sea ice melts, Arctic passages are opening up to more traffic, from military vessels to ships conducting illegal fishing. These movements may pose national security challenges to the United States. The opening Arctic also raises questions about how its climate, wildlife, and geography are changing.

    Today, very few imagery datasets of the Arctic exist to study these changes. Overhead images from satellites or aircraft can only provide limited information about the environment. An outward-looking camera attached to a ship can capture more details of the setting and different angles of objects, such as other ships, in the scene. These types of images can then be used to train AI computer-vision tools, which can help the USCG plan naval missions and automate analysis. According to Kurucar, USCG assets in the Arctic are spread thin and can benefit greatly from AI tools, which can act as a force multiplier.

    The Healy is the USCG’s largest and most technologically advanced icebreaker. Given its current mission, it was a fitting candidate to be equipped with a new sensor to gather this dataset. The laboratory research team collaborated with the USCG Research and Development Center to determine the sensor requirements. Together, they developed the Cold Region Imaging and Surveillance Platform (CRISP).

    “Lincoln Laboratory has an excellent relationship with the Coast Guard, especially with the Research and Development Center. Over a decade, we’ve established ties that enabled the deployment of the CRISP system,” says Amna Greaves, the CRISP project lead and an assistant leader in the AI Software Architectures and Algorithms Group. “We have strong ties not only because of the USCG veterans working at the laboratory and in our group, but also because our technology missions are complementary. Today it was deploying infrared sensing in the Arctic; tomorrow it could be operating quadruped robot dogs on a fast-response cutter.”

    The CRISP system comprises a long-wave infrared camera, manufactured by Teledyne FLIR (for forward-looking infrared), that is designed for harsh maritime environments. The camera can stabilize itself during rough seas and image in complete darkness, fog, and glare. It is paired with a GPS-enabled time-synchronized clock and a network video recorder to record both video and still imagery along with GPS-positional data.  

    The camera is mounted at the front of the ship’s fly bridge, and the electronics are housed in a ruggedized rack on the bridge. The system can be operated manually from the bridge or be placed into an autonomous surveillance mode, in which it slowly pans back and forth, recording 15 minutes of video every three hours and a still image once every 15 seconds.
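
    As a rough illustration of that autonomous surveillance mode, a duty-cycle loop along the following lines would implement the cadence described above; the camera interface and function names are hypothetical, not the actual CRISP control software.

    ```python
    import time

    VIDEO_PERIOD_S = 3 * 60 * 60   # record a video clip every three hours
    VIDEO_LENGTH_S = 15 * 60       # each clip runs 15 minutes
    STILL_PERIOD_S = 15            # capture a still image every 15 seconds

    def surveillance_loop(camera):
        """Hypothetical duty-cycle loop matching the cadence described above."""
        last_video = -VIDEO_PERIOD_S   # record the first clip right away
        last_still = -STILL_PERIOD_S
        start = time.monotonic()
        while True:
            now = time.monotonic() - start
            if now - last_video >= VIDEO_PERIOD_S:
                # In the real system this would run alongside still captures, not block them.
                camera.record_video(duration_s=VIDEO_LENGTH_S)  # assumed camera API
                last_video = now
            if now - last_still >= STILL_PERIOD_S:
                camera.capture_still()                          # assumed camera API
                last_still = now
            time.sleep(1.0)
    ```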

    “The installation of the equipment was a unique and fun experience. As with any good project, our expectations going into the install did not meet reality,” says Michael Emily, the project’s IT systems administrator who traveled to Seattle for the install. Working with the ship’s crew, the laboratory team had to quickly adjust their route for running cables from the camera to the observation station after they discovered that the expected access points weren’t in fact accessible. “We had 100-foot cables made for this project just in case of this type of scenario, which was a good thing because we only had a few inches to spare,” Emily says.

    The CRISP project team plans to publicly release the dataset, anticipated to be about 4 terabytes in size, once the USCG science mission concludes in the fall.

    The goal in releasing the dataset is to enable the wider research community to develop better tools for those operating in the Arctic, especially as this region becomes more navigable. “Collecting and publishing the data allows for faster and greater progress than what we could accomplish on our own,” Kurucar adds. “It also enables the laboratory to engage in more advanced AI applications while others make more incremental advances using the dataset.”

In addition to the dataset, the laboratory team plans to release a baseline object-detection model from which others can build their own models. More advanced AI applications planned for development include classifiers for specific objects in the scene and the ability to identify and track objects across images.
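
    To give a sense of what building on such a baseline might look like, here is a minimal inference sketch using a generic pretrained detector from torchvision. The CRISP baseline model and its interface are not yet public, so the model choice and the trick of replicating a single-channel infrared frame into three channels are assumptions made purely for illustration.

    ```python
    import torch
    import torchvision

    # Generic pretrained detector as a stand-in for a released baseline model.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    # A long-wave infrared frame is single-channel; replicate it to three channels
    # so it matches the detector's expected input (an assumption for illustration).
    ir_frame = torch.rand(1, 512, 640)        # placeholder image, values in [0, 1]
    image = ir_frame.repeat(3, 1, 1)

    with torch.no_grad():
        predictions = model([image])[0]       # dict of boxes, labels, scores

    keep = predictions["scores"] > 0.5
    print(predictions["boxes"][keep], predictions["labels"][keep])
    ```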

    Beyond assisting with USCG missions, this project could create an influential dataset for researchers looking to apply AI to data from the Arctic to help combat climate change, says Paul Metzger, who leads the AI Software Architectures and Algorithms Group.

    Metzger adds that the group was honored to be a part of this project and is excited to see the advances that come from applying AI to novel challenges facing the United States: “I’m extremely proud of how our group applies AI to the highest-priority challenges in our nation, from predicting outbreaks of Covid-19 and assisting the U.S. European Command in their support of Ukraine to now employing AI in the Arctic for maritime awareness.”

Once the dataset is available, it will be free to download on the Lincoln Laboratory dataset website.

  • Celebrating open data

    The inaugural MIT Prize for Open Data, which included a $2,500 cash prize, was recently awarded to 10 individual and group research projects. Presented jointly by the School of Science and the MIT Libraries, the prize recognizes MIT-affiliated researchers who make their data openly accessible and reusable by others. The prize winners and 16 honorable mention recipients were honored at the Open Data @ MIT event held Oct. 28 at Hayden Library. 

    “By making data open, researchers create opportunities for novel uses of their data and for new insights to be gleaned,” says Chris Bourg, director of MIT Libraries. “Open data accelerates scholarly progress and discovery, advances equity in scholarly participation, and increases transparency, replicability, and trust in science.” 

    Recognizing shared values

    Spearheaded by Bourg and Rebecca Saxe, associate dean of the School of Science and John W. Jarve (1978) Professor of Brain and Cognitive Sciences, the MIT Prize for Open Data was launched to highlight the value of open data at MIT and to encourage the next generation of researchers. Nominations were solicited from across the Institute, with a focus on trainees: research technicians, undergraduate or graduate students, or postdocs.

    “By launching an MIT-wide prize and event, we aimed to create visibility for the scholars who create, use, and advocate for open data,” says Saxe. “Highlighting this research and creating opportunities for networking would also help open-data advocates across campus find each other.” 

    Recognizing researchers who share data was also one of the recommendations of the Ad Hoc Task Force on Open Access to MIT’s Research, which Bourg co-chaired with Hal Abelson, Class of 1922 Professor, Department of Electrical Engineering and Computer Science. An annual award was one of the strategies put forth by the task force to further the Institute’s mission to disseminate the fruits of its research and scholarship as widely as possible.

    Strong competition

    Winners and honorable mentions were chosen from more than 70 nominees, representing all five schools, the MIT Schwarzman College of Computing, and several research centers across MIT. A committee composed of faculty, staff, and a graduate student made the selections:

    • Yunsie Chung, graduate student in the Department of Chemical Engineering, won for SolProp, the largest open-source dataset with temperature-dependent solubility values of organic compounds.
    • Matthew Groh, graduate student, MIT Media Lab, accepted on behalf of the team behind the Fitzpatrick 17k dataset, an open dataset consisting of nearly 17,000 images of skin disease alongside skin disease and skin tone annotations.
    • Tom Pollard, research scientist at the Institute for Medical Engineering and Science, accepted on behalf of the PhysioNet team. This data-sharing platform enables thousands of clinical and machine-learning research studies each year and allows researchers to share sensitive resources that would not be possible through typical data sharing platforms.
    • Joseph Replogle, graduate student with the Whitehead Institute for Biomedical Research, was recognized for the Genome-wide Perturb-seq dataset, the largest publicly available, single-cell transcriptional dataset collected to date.
    • Pedro Reynolds-Cuéllar, graduate student with the MIT Media Lab/Art, Culture, and Technology, and Diana Duarte, co-founder at Diversa, won for Retos, an open-data platform for detailed documentation and sharing of local innovations from under-resourced settings.
    • Maanas Sharma, an undergraduate student, led States of Emergency, a nationwide project analyzing and grading the responses of prison systems to Covid-19 using data scraped from public databases and manually collected data.
    • Djuna von Maydell, graduate student in the Department of Brain and Cognitive Sciences, created the first publicly available dataset of single-cell gene expression from postmortem human brain tissue of patients who are carriers of APOE4, the major Alzheimer’s disease risk gene.
    • Raechel Walker, graduate researcher in the MIT Media Lab, and her collaborators created a Data Activism Curriculum for high school students through the Mayor’s Summer Youth Employment Program in Cambridge, Massachusetts. Students learned how to use data science to recognize, mitigate, and advocate for people who are disproportionately impacted by systemic inequality.
    • Suyeol Yun, graduate student in the Department of Political Science, was recognized for DeepWTO, a project creating open data for use in legal natural language processing research using cases from the World Trade Organization.
    • Jonathan Zheng, graduate student in the Department of Chemical Engineering, won for an open IUPAC dataset for acid dissociation constants, or “pKas,” physicochemical properties that govern how acidic a chemical is in a solution.

    A full list of winners and honorable mentions is available on the Open Data @ MIT website.

    A campus-wide celebration

Awards were presented at a celebratory event held in the Nexus in Hayden Library during International Open Access Week. School of Science Dean Nergis Mavalvala kicked off the program by describing the long and proud history of open scholarship at MIT, citing the Institute-wide faculty open access policy and the launch of the open-source digital repository DSpace. “When I was a graduate student, we were trying to figure out how to share our theses during the days of the nascent internet,” she said. “With DSpace, MIT was figuring it out for us.”

    The centerpiece of the program was a series of five-minute presentations from the prize winners on their research. Presenters detailed the ways they created, used, or advocated for open data, and the value that openness brings to their respective fields. Winner Djuna von Maydell, a graduate student in Professor Li-Huei Tsai’s lab who studies the genetic causes of neurodegeneration, underscored why it is important to share data, particularly data obtained from postmortem human brains. 

    “This is data generated from human brains, so every data point stems from a living, breathing human being, who presumably made this donation in the hope that we would use it to advance knowledge and uncover truth,” von Maydell said. “To maximize the probability of that happening, we have to make it available to the scientific community.” 

MIT community members who would like to learn more about making their research data open can consult MIT Libraries’ Data Services team.

  • Researchers release open-source photorealistic simulator for autonomous driving

Hyper-realistic virtual worlds have been heralded as the best driving schools for autonomous vehicles (AVs), since they’ve proven to be fruitful test beds for safely trying out dangerous driving scenarios. Tesla, Waymo, and other self-driving companies all rely heavily on data to power expensive, proprietary photorealistic simulators, since nuanced I-almost-crashed data usually aren’t easy or desirable to gather and recreate.

    To that end, scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) created “VISTA 2.0,” a data-driven simulation engine where vehicles can learn to drive in the real world and recover from near-crash scenarios. What’s more, all of the code is being open-sourced to the public. 

    “Today, only companies have software like the type of simulation environments and capabilities of VISTA 2.0, and this software is proprietary. With this release, the research community will have access to a powerful new tool for accelerating the research and development of adaptive robust control for autonomous driving,” says MIT Professor and CSAIL Director Daniela Rus, senior author on a paper about the research. 

VISTA is a data-driven, photorealistic simulator for autonomous driving. It can simulate not just live video but lidar data and event cameras, and can also incorporate other simulated vehicles to model complex driving situations. VISTA is open source, and the code is publicly available.

VISTA 2.0 builds on the team’s previous model, VISTA, and it’s fundamentally different from existing AV simulators since it’s data-driven — meaning it was built and photorealistically rendered from real-world data — thereby enabling direct transfer to reality. While the initial iteration supported only single-car lane-following with one camera sensor, achieving high-fidelity data-driven simulation required rethinking the foundations of how different sensors and behavioral interactions can be synthesized.

    Enter VISTA 2.0: a data-driven system that can simulate complex sensor types and massively interactive scenarios and intersections at scale. With much less data than previous models, the team was able to train autonomous vehicles that could be substantially more robust than those trained on large amounts of real-world data. 
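
    The core idea can be sketched with a toy example: start a virtual vehicle at random offsets from a recorded trajectory, "render" an observation from that perturbed viewpoint, and reward a policy for steering back toward the lane center. Everything below (the two-number observation, the linear policy, the crude search) is a simplified stand-in for illustration, not the VISTA 2.0 API or training procedure.

    ```python
    import numpy as np

    # Toy lane-keeping "simulator": the observation stands in for a rendered frame.
    rng = np.random.default_rng(0)

    def rollout(gains, steps=200, dt=0.1, speed=10.0):
        """Run one episode from a perturbed start; return negative mean |offset|."""
        offset = rng.uniform(-1.0, 1.0)    # lateral displacement from the recorded lane center
        heading = rng.uniform(-0.1, 0.1)   # heading error in radians
        cost = 0.0
        for _ in range(steps):
            obs = np.array([offset, heading])   # stand-in for a resynthesized sensor view
            steer = float(gains @ obs)          # linear "policy"
            heading += steer * dt               # simple kinematic update
            offset += speed * np.sin(heading) * dt
            cost += abs(offset)
        return -cost / steps

    # Crude policy search: perturb the gains and keep whatever improves the score.
    gains = np.zeros(2)
    best = np.mean([rollout(gains) for _ in range(5)])
    for _ in range(200):
        candidate = gains + rng.normal(scale=0.1, size=2)
        score = np.mean([rollout(candidate) for _ in range(5)])
        if score > best:
            gains, best = candidate, score

    print("learned steering gains:", gains, "average offset:", -best)
    ```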

    “This is a massive jump in capabilities of data-driven simulation for autonomous vehicles, as well as the increase of scale and ability to handle greater driving complexity,” says Alexander Amini, CSAIL PhD student and co-lead author on two new papers, together with fellow PhD student Tsun-Hsuan Wang. “VISTA 2.0 demonstrates the ability to simulate sensor data far beyond 2D RGB cameras, but also extremely high dimensional 3D lidars with millions of points, irregularly timed event-based cameras, and even interactive and dynamic scenarios with other vehicles as well.” 

    The team was able to scale the complexity of the interactive driving tasks for things like overtaking, following, and negotiating, including multiagent scenarios in highly photorealistic environments. 

Training AI models for autonomous vehicles requires hard-to-come-by data covering many varieties of edge cases and strange, dangerous scenarios, because most of our data (thankfully) is just run-of-the-mill, day-to-day driving. Logically, we can’t crash into other cars just to teach a neural network how not to crash into other cars.

    Recently, there’s been a shift away from more classic, human-designed simulation environments to those built up from real-world data. The latter have immense photorealism, but the former can easily model virtual cameras and lidars. With this paradigm shift, a key question has emerged: Can the richness and complexity of all of the sensors that autonomous vehicles need, such as lidar and event-based cameras that are more sparse, accurately be synthesized? 

    Lidar sensor data is much harder to interpret in a data-driven world — you’re effectively trying to generate brand-new 3D point clouds with millions of points, only from sparse views of the world. To synthesize 3D lidar point clouds, the team used the data that the car collected, projected it into a 3D space coming from the lidar data, and then let a new virtual vehicle drive around locally from where that original vehicle was. Finally, they projected all of that sensory information back into the frame of view of this new virtual vehicle, with the help of neural networks. 
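
    The geometric core of that reprojection step (setting aside the neural rendering that fills in detail the original sensor never saw) can be sketched as a change of coordinates into the displaced virtual vehicle's frame; the pose values below are placeholders.

    ```python
    import numpy as np

    def to_virtual_frame(points_world, virtual_position, virtual_yaw):
        """Express world-frame lidar points in the frame of a displaced virtual vehicle.

        points_world: (N, 3) array of points accumulated from the real drive.
        virtual_position: (3,) world-frame position of the virtual vehicle.
        virtual_yaw: heading of the virtual vehicle in radians (about the z axis).
        """
        c, s = np.cos(virtual_yaw), np.sin(virtual_yaw)
        R = np.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])   # rotation taking vehicle frame to world frame
        # Invert the pose: p_vehicle = R^T (p_world - t)
        return (points_world - virtual_position) @ R

    # Example: view the recorded cloud from a vehicle shifted 1.5 m laterally
    # and rotated 5 degrees (placeholder values, random stand-in points).
    cloud = np.random.rand(1000, 3) * 50.0
    local = to_virtual_frame(cloud, np.array([0.0, 1.5, 0.0]), np.deg2rad(5.0))
    ```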

Together with the simulation of event-based cameras, which can produce upwards of thousands of events per second, the simulator was capable of not only simulating this multimodal information, but also doing so in real time. This makes it possible not only to train neural nets offline, but also to test them online on the car in augmented-reality setups for safe evaluation. “The question of if multisensor simulation at this scale of complexity and photorealism was possible in the realm of data-driven simulation was very much an open question,” says Amini.
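
    Event cameras are commonly simulated from ordinary video by emitting an event whenever a pixel's log brightness changes by more than a contrast threshold. The short sketch below illustrates that general idea; it is a textbook-style approximation, not the model or code used in VISTA 2.0.

    ```python
    import numpy as np

    def events_from_frames(prev_frame, next_frame, threshold=0.2):
        """Approximate event-camera output from two consecutive grayscale frames.

        An event fires wherever the change in log intensity exceeds the contrast
        threshold; polarity is +1 for brightening, -1 for darkening. This is a
        generic approximation, not the simulator's actual event model.
        """
        eps = 1e-3                      # avoid taking the log of zero
        diff = np.log(next_frame + eps) - np.log(prev_frame + eps)
        rows, cols = np.nonzero(np.abs(diff) > threshold)
        polarities = np.sign(diff[rows, cols]).astype(np.int8)
        return rows, cols, polarities

    # Example with random stand-in frames (intensities in [0, 1]).
    f0 = np.random.rand(480, 640)
    f1 = np.clip(f0 + np.random.normal(scale=0.1, size=f0.shape), 0.0, 1.0)
    rows, cols, pol = events_from_frames(f0, f1)
    print(f"{rows.size} events generated between frames")
    ```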

With that, the driving school becomes a party. In the simulation, you can move around, use different types of controllers, simulate different types of events, create interactive scenarios, and just drop in brand-new vehicles that weren’t even in the original data. They tested for lane following, lane turning, car following, and dicier scenarios like static and dynamic overtaking (seeing obstacles and moving around so you don’t collide). With multi-agent support, both real and simulated agents interact, and new agents can be dropped into the scene and controlled any which way.

Taking their full-scale car out into the “wild” — a.k.a. Devens, Massachusetts — the team saw immediate transferability of results, with both failures and successes. They were also able to demonstrate the bodacious, magic word of self-driving car models: “robust.” They showed that AVs, trained entirely in VISTA 2.0, were so robust in the real world that they could handle that elusive tail of challenging failures.

Now, one guardrail humans rely on that can’t yet be simulated is human emotion: the friendly wave, nod, or blinker switch of acknowledgement, the kind of nuance the team wants to implement in future work.

    “The central algorithm of this research is how we can take a dataset and build a completely synthetic world for learning and autonomy,” says Amini. “It’s a platform that I believe one day could extend in many different axes across robotics. Not just autonomous driving, but many areas that rely on vision and complex behaviors. We’re excited to release VISTA 2.0 to help enable the community to collect their own datasets and convert them into virtual worlds where they can directly simulate their own virtual autonomous vehicles, drive around these virtual terrains, train autonomous vehicles in these worlds, and then can directly transfer them to full-sized, real self-driving cars.” 

    Amini and Wang wrote the paper alongside Zhijian Liu, MIT CSAIL PhD student; Igor Gilitschenski, assistant professor in computer science at the University of Toronto; Wilko Schwarting, AI research scientist and MIT CSAIL PhD ’20; Song Han, associate professor at MIT’s Department of Electrical Engineering and Computer Science; Sertac Karaman, associate professor of aeronautics and astronautics at MIT; and Daniela Rus, MIT professor and CSAIL director. The researchers presented the work at the IEEE International Conference on Robotics and Automation (ICRA) in Philadelphia. 

This work was supported by the National Science Foundation and Toyota Research Institute. The team acknowledges the support of NVIDIA with the donation of the Drive AGX Pegasus.