More stories

  • in

    3 Questions: Marking the 10th anniversary of the Higgs boson discovery

    This July 4 marks 10 years since the discovery of the Higgs boson, the long-sought particle that imparts mass to all elementary particles. The elusive particle was the last missing piece in the Standard Model of particle physics, which is our most complete model of the universe.

    In early summer of 2012, signs of the Higgs particle were detected in the Large Hadron Collider (LHC), the world’s largest particle accelerator, which is operated by CERN, the European Organization for Nuclear Research. The LHC is engineered to smash together billions upon billions of protons for the chance at producing the Higgs boson and other particles that are predicted to have been created in the early universe.

    In analyzing the products of countless proton-on-proton collisions, scientists registered a Higgs-like signal in the accelerator’s two independent detectors, ATLAS and CMS (the Compact Muon Solenoid). Specifically, the teams observed signs that a new particle had been created and then decayed to two photons, two Z bosons or two W bosons, and that this new particle was likely the Higgs boson.

    The discovery was revealed within the CMS collaboration, which includes over 3,000 scientists, on June 15, and ATLAS and CMS announced their respective observations to the world on July 4. More than 50 MIT physicists and students contributed to the CMS experiment, including Christoph Paus, professor of physics, who was one of the experiment’s two lead investigators to organize the search for the Higgs boson.

    As the LHC prepares to start back up on July 5 with “Run 3,” MIT News spoke with Paus about what physicists have learned about the Higgs particle in the last 10 years, and what they hope to discover with this next deluge of particle data.

    Q: Looking back, what do you remember as the key moments leading up to the Higgs boson’s discovery?

    A: I remember that by the end of 2011, we had taken a significant amount of data, and there were some first hints that there could be something, but nothing that was conclusive enough. It was clear to everybody that we were entering the critical phase of a potential discovery. We still wanted to improve our searches, and so we decided, which I felt was one of the most important decisions we took, that we had to remove the bias — that is, remove our knowledge about where the signal could appear. Because it’s dangerous as a scientist to say, “I know the solution,” which can influence the result unconsciously. So, we made that decision together in the coordination group and said, we are going to get rid of this bias by doing what people refer to as a “blind” analysis. This allowed the analyzers to focus on the technical aspects, making sure everything was correct without having to worry about being influenced by what they saw.

    Then, of course, there had to be the moment where we unblind the data and really look to see, is the Higgs there or not. And about two weeks before the scheduled presentations on July 4 where we eventually announced the discovery, there was a meeting on June 15 to show the analysis with its results to the collaboration. The most significant analysis turned out to be the two-photon analysis. One of my students, Joshua Bendavid PhD ’13, was leading that analysis, and the night before the meeting, only he and another person on the team were allowed to unblind the data. They were working until 2 in the morning, when they finally pushed a button to see what it looks like. And they were the first in CMS to have that moment of seeing that [the Higgs boson] was there. Another student of mine who was working on this analysis, Mingming Yang PhD ’15, presented the results of that search to the Collaboration at CERN that following afternoon. It was a very exciting moment for all of us. The room was hot and filled with electricity.

    The scientific process of the discovery was very well-designed and executed, and I think it can serve as a blueprint for how people should do such searches.

    Q: What more have scientists learned of the Higgs boson since the particle’s detection?

    A: At the time of the discovery, something interesting happened I did not really expect. While we were always talking about the Higgs boson before, we became very careful once we saw that “narrow peak.” How could we be sure that it was the Higgs boson and not something else? It certainly looked like the Higgs boson, but our vision was quite blurry. It could have turned out in the following years that it was not the Higgs boson. But as we now know, with so much more data, everything is completely consistent with what the Higgs boson is predicted to look like, so we became comfortable with calling the narrow resonance not just a Higgs-like particle but rather simply the Higgs boson. And there were a few milestones that made sure this is really the Higgs as we know it.

    The initial discovery was based on Higgs bosons decaying to two photons, two Z bosons or two W bosons. That was only a small fraction of decays that the Higgs could undergo. There are many more. The amount of decays of the Higgs boson into a particular set of particles depends critically on their masses. This characteristic feature is essential to confirm that we are really dealing with the Higgs boson.

    What we found since then is that the Higgs boson does not only decay to bosons, but also to fermions, which is not obvious because bosons are force carrier particles while fermions are matter particles. The first new decay was the decay to tau leptons, the heavier sibling of the electron. The next step was the observation of the Higgs boson decaying to b quarks, the heaviest quark that the Higgs can decay to. The b quark is the heaviest sibling of the down quark, which is a building block of protons and neutrons and thus all atomic nuclei around us. These two fermions are part of the heaviest generation of fermions in the standard model. Only recently the Higgs boson was observed to decay to muons, the charged lepton of the second and thus lighter generation, at the expected rate. Also, the direct coupling to the heaviest quark, the top quark, was established; together with the muon, this spans four orders of magnitude in mass, and the Higgs coupling behaves as expected over this wide range.

    Q: As the Large Hadron Collider gears up for its new “Run 3,” what do you hope to discover next?

    A: One very interesting question that Run 3 might give us some first hints on is the self-coupling of the Higgs boson. As the Higgs couples to any massive particle, it can also couple to itself. It is unlikely that there is enough data to make a discovery, but first hints of this coupling would be very exciting to see, and this constitutes a fundamentally different test than what has been done so far.

    Another interesting aspect that more data will help to elucidate is the question of whether the Higgs boson might be a portal and decay to invisible particles that could be candidates for explaining the mystery of dark matter in the universe. This is not predicted in our standard model and thus would unveil the Higgs boson as an imposter.

    Of course, we want to double down on all the measurements we have made so far and see whether they continue to line up with our expectations.

    This is true also for the upcoming major upgrade of the LHC (runs starting in 2029) for what we refer to as the High Luminosity LHC (HL-LHC). Another factor of 10 more events will be accumulated during this program, which for the Higgs boson means we will be able to observe its self-coupling. For the far future, there are plans for a Future Circular Collider, which could ultimately measure the total decay width of the Higgs boson independent of its decay mode, which would be another important and very precise test whether the Higgs boson is an imposter.

    Like any other good physicist, I hope though that we can find a crack in the armor of the Standard Model, which is so far holding up all too well. There are a number of very important observations, for example the nature of dark matter, that cannot be explained using the Standard Model. All of our future studies, from Run 3 starting on July 5 to the FCC in the far future, will give us access to entirely uncharted territory. New phenomena can pop up, and I like to be optimistic.

  • in

    Exploring emerging topics in artificial intelligence policy

    Members of the public sector, private sector, and academia convened for the second AI Policy Forum Symposium last month to explore critical directions and questions posed by artificial intelligence in our economies and societies.

    The virtual event, hosted by the AI Policy Forum (AIPF) — an undertaking by the MIT Schwarzman College of Computing to bridge high-level principles of AI policy with the practices and trade-offs of governing — brought together an array of distinguished panelists to delve into four cross-cutting topics: law, auditing, health care, and mobility.

    In the last year there have been substantial changes in the regulatory and policy landscape around AI in several countries — most notably in Europe with the development of the European Union Artificial Intelligence Act, the first attempt by a major regulator to propose a law on artificial intelligence. In the United States, the National AI Initiative Act of 2020, which became law in January 2021, is providing a coordinated program across federal government to accelerate AI research and application for economic prosperity and security gains. Finally, China recently advanced several new regulations of its own.

    Each of these developments represents a different approach to legislating AI, but what makes a good AI law? And when should AI legislation be based on binding rules with penalties versus establishing voluntary guidelines?

    Jonathan Zittrain, professor of international law at Harvard Law School and director of the Berkman Klein Center for Internet and Society, says the self-regulatory approach taken during the expansion of the internet had its limitations with companies struggling to balance their interests with those of their industry and the public.

    “One lesson might be that actually having representative government take an active role early on is a good idea,” he says. “It’s just that they’re challenged by the fact that there appears to be two phases in this environment of regulation. One, too early to tell, and two, too late to do anything about it. In AI I think a lot of people would say we’re still in the ‘too early to tell’ stage but given that there’s no middle zone before it’s too late, it might still call for some regulation.”

    A theme that came up repeatedly throughout the first panel on AI laws — a conversation moderated by Dan Huttenlocher, dean of the MIT Schwarzman College of Computing and chair of the AI Policy Forum — was the notion of trust. “If you told me the truth consistently, I would say you are an honest person. If AI could provide something similar, something that I can say is consistent and is the same, then I would say it’s trusted AI,” says Bitange Ndemo, professor of entrepreneurship at the University of Nairobi and the former permanent secretary of Kenya’s Ministry of Information and Communication.

    Eva Kaili, vice president of the European Parliament, adds that “In Europe, whenever you use something, like any medication, you know that it has been checked. You know you can trust it. You know the controls are there. We have to achieve the same with AI.” Kaili further stresses that building trust in AI systems will not only lead to people using more applications in a safe manner, but that AI itself will reap benefits as greater amounts of data will be generated as a result.

    The rapidly increasing applicability of AI across fields has prompted the need to address both the opportunities and challenges of emerging technologies and the impact they have on social and ethical issues such as privacy, fairness, bias, transparency, and accountability. In health care, for example, new techniques in machine learning have shown enormous promise for improving quality and efficiency, but questions of equity, data access and privacy, safety and reliability, and immunology and global health surveillance remain at large.

    MIT’s Marzyeh Ghassemi, an assistant professor in the Department of Electrical Engineering and Computer Science and the Institute for Medical Engineering and Science, and David Sontag, an associate professor of electrical engineering and computer science, collaborated with Ziad Obermeyer, an associate professor of health policy and management at the University of California Berkeley School of Public Health, to organize AIPF Health Wide Reach, a series of sessions to discuss issues of data sharing and privacy in clinical AI. The organizers assembled experts devoted to AI, policy, and health from around the world with the goal of understanding what can be done to decrease barriers to access to high-quality health data to advance more innovative, robust, and inclusive research results while being respectful of patient privacy.

    Over the course of the series, members of the group presented on a topic of expertise and were tasked with proposing concrete policy approaches to the challenge discussed. Drawing on these wide-ranging conversations, participants unveiled their findings during the symposium, covering nonprofit and government success stories and limited access models; upside demonstrations; legal frameworks, regulation, and funding; technical approaches to privacy; and infrastructure and data sharing. The group then discussed some of their recommendations that are summarized in a report that will be released soon.

    One of the findings calls for the need to make more data available for research use. Recommendations that stem from this finding include updating regulations to promote data sharing and enable easier access to safe harbors, such as the one the Health Insurance Portability and Accountability Act (HIPAA) provides for de-identification, as well as expanding funding for private health institutions to curate datasets, among others. Another finding, to remove barriers to data for researchers, supports a recommendation to decrease obstacles to research and development on federally created health data. “If this is data that should be accessible because it’s funded by some federal entity, we should easily establish the steps that are going to be part of gaining access to that so that it’s a more inclusive and equitable set of research opportunities for all,” says Ghassemi. The group also recommends taking a careful look at the ethical principles that govern data sharing. While there are already many principles proposed around this, Ghassemi says that “obviously you can’t satisfy all levers or buttons at once, but we think that this is a trade-off that’s very important to think through intelligently.”

    In addition to law and health care, other facets of AI policy explored during the event included auditing and monitoring AI systems at scale, and the role AI plays in mobility and the range of technical, business, and policy challenges for autonomous vehicles in particular.

    The AI Policy Forum Symposium was an effort to bring together communities of practice with the shared aim of designing the next chapter of AI. In his closing remarks, Aleksander Madry, the Cadence Design Systems Professor of Computing at MIT and faculty co-lead of the AI Policy Forum, emphasized the importance of collaboration and the need for different communities to communicate with each other in order to truly make an impact in the AI policy space.

    “The dream here is that we all can meet together — researchers, industry, policymakers, and other stakeholders — and really talk to each other, understand each other’s concerns, and think together about solutions,” Madry said. “This is the mission of the AI Policy Forum and this is what we want to enable.”

  • in

    Robots play with play dough

    The inner child in many of us feels an overwhelming sense of joy when stumbling across a pile of the fluorescent, rubbery mixture of water, salt, and flour that put goo on the map: play dough. (Even if this happens rarely in adulthood.)

    While manipulating play dough is fun and easy for 2-year-olds, the shapeless sludge is hard for robots to handle. Machines have become increasingly reliable with rigid objects, but manipulating soft, deformable objects comes with a laundry list of technical challenges. Most importantly, as with most flexible structures, moving one part is likely to affect everything else.

    Scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Stanford University recently let robots try their hand at playing with the modeling compound, but not for nostalgia’s sake. Their new system learns directly from visual inputs to let a robot with a two-fingered gripper see, simulate, and shape doughy objects. “RoboCraft” could reliably plan a robot’s behavior to pinch and release play dough to make various letters, including ones it had never seen. With just 10 minutes of data, the two-finger gripper rivaled human counterparts that teleoperated the machine, performing on par, and at times even better, on the tested tasks.

    “Modeling and manipulating objects with high degrees of freedom are essential capabilities for robots to learn how to enable complex industrial and household interaction tasks, like stuffing dumplings, rolling sushi, and making pottery,” says Yunzhu Li, CSAIL PhD student and author on a new paper about RoboCraft. “While there’s been recent advances in manipulating clothes and ropes, we found that objects with high plasticity, like dough or plasticine — despite ubiquity in those household and industrial settings — was a largely underexplored territory. With RoboCraft, we learn the dynamics models directly from high-dimensional sensory data, which offers a promising data-driven avenue for us to perform effective planning.” 

    With undefined, smooth material, the whole structure needs to be accounted for before you can do any type of efficient and effective modeling and planning. RoboCraft turns the images into graphs of small particles and, using a graph neural network as the dynamics model, makes more accurate predictions about how the material changes shape.

    Typically, researchers have used complex physics simulators to model and understand force and dynamics being applied to objects, but RoboCraft simply uses visual data. The inner workings of the system rely on three parts to shape soft material into, say, an “R.”

    The first part — perception — is all about learning to “see.” It uses cameras to collect raw, visual sensor data from the environment, which are then turned into little clouds of particles to represent the shapes. A graph-based neural network then uses said particle data to learn to “simulate” the object’s dynamics, or how it moves. Then, algorithms help plan the robot’s behavior so it learns to “shape” a blob of dough, armed with the training data from the many pinches. While the letters are a bit loose, they’re indubitably representative. 
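    The see, simulate, shape loop described above can be illustrated with a toy sketch. This is not the RoboCraft code: the function names, the pinch parameterization, the hand-written `toy_dynamics` (standing in for the learned graph neural network), and the random-shooting planner are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Action = Tuple[float, float]  # hypothetical pinch: (x position, squeeze depth in [0, 1])


def perceive(image: List[List[int]]) -> List[Point]:
    """'See': turn a binary image into a particle cloud, one particle per filled pixel."""
    return [(float(c), float(r))
            for r, row in enumerate(image)
            for c, v in enumerate(row) if v]


def toy_dynamics(particles: List[Point], action: Action) -> List[Point]:
    """'Simulate': stand-in for a learned model. A pinch at x pulls particles
    within one unit of x toward the cloud's mean height by the depth fraction."""
    x_pinch, depth = action
    mid = sum(y for _, y in particles) / len(particles)
    moved = []
    for x, y in particles:
        if abs(x - x_pinch) <= 1.0:  # only particles under the gripper move
            y = y + depth * (mid - y)
        moved.append((x, y))
    return moved


def chamfer(a: List[Point], b: List[Point]) -> float:
    """Symmetric average nearest-neighbor distance between two particle clouds."""
    def one_way(p: List[Point], q: List[Point]) -> float:
        return sum(min(math.dist(u, v) for v in q) for u in p) / len(p)
    return one_way(a, b) + one_way(b, a)


def plan(particles: List[Point], target: List[Point],
         dynamics: Callable[[List[Point], Action], List[Point]],
         n_samples: int = 200, seed: int = 0) -> Action:
    """'Shape': sample candidate pinches, simulate each, keep the lowest-cost one."""
    rng = random.Random(seed)
    xs = [x for x, _ in particles]
    best_action, best_cost = (xs[0], 0.0), float("inf")
    for _ in range(n_samples):
        action = (rng.uniform(min(xs), max(xs)), rng.uniform(0.0, 1.0))
        cost = chamfer(dynamics(particles, action), target)
        if cost < best_cost:
            best_action, best_cost = action, cost
    return best_action
```

    In this sketch, a target shape would be given as another particle cloud, and `plan` would be called repeatedly, re-perceiving the dough after each pinch. A real system would replace `toy_dynamics` with a model trained on recorded pinches.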

    Besides cutesy shapes, the team is (actually) working on making dumplings from dough and a prepared filling. Right now, with just a two-finger gripper, it’s a big ask. RoboCraft would need additional tools: a rolling pin, a stamp, and a mold (a baker needs multiple tools to cook; so do robots).

    Further in the future, the scientists envision using RoboCraft for assistance with household tasks and chores, which could be of particular help to the elderly or those with limited mobility. To accomplish this, given the many obstructions that could arise, a much more adaptive representation of the dough or item would be needed, as well as exploration into what class of models might be suitable to capture the underlying structural systems.

    “RoboCraft essentially demonstrates that this predictive model can be learned in very data-efficient ways to plan motion. In the long run, we are thinking about using various tools to manipulate materials,” says Li. “If you think about dumpling or dough making, just one gripper wouldn’t be able to solve it. Helping the model understand and accomplish longer-horizon planning tasks, such as, how the dough will deform given the current tool, movements and actions, is a next step for future work.” 

    Li wrote the paper alongside Haochen Shi, Stanford master’s student; Huazhe Xu, Stanford postdoc; Zhiao Huang, PhD student at the University of California at San Diego; and Jiajun Wu, assistant professor at Stanford. They will present the research at the Robotics: Science and Systems conference in New York City. The work is in part supported by the Stanford Institute for Human-Centered AI (HAI), the Samsung Global Research Outreach (GRO) Program, the Toyota Research Institute (TRI), and Amazon, Autodesk, Salesforce, and Bosch.

  • in

    Mining social media data for social good

    For Erin Walk, who has loved school since she was a little girl, pursuing a graduate degree always seemed like a given. As a mechanical engineering major at Harvard University with a minor in government, she figured that going to graduate school in engineering would be the next logical step. However, during her senior year, a class on the “Technology of War” changed her trajectory, sparking her interest in technology and policy.

    “[Warfare] seems like a very dark reason for this interest to blossom … but I was so interested in how these technological developments including cyberwar had such a large impact on the entire course of world history,” Walk says. The class took a starkly different perspective from her engineering classes, which often focused on how a revolutionary technology was built. Instead, Walk was challenged to think about “the implications of what this [technology] could do.” 

    Now, Walk is studying the intersection between data science, policy, and technology as a graduate student in the Social and Engineering Systems program (SES), part of the Institute for Data, Systems, and Society (IDSS). Her research has demonstrated the value and bias inherent in social media data, with a focus on how to mine social media data to better understand the conflict in Syria. 

    Using data for social good

    With a newfound interest in policy developing just as college was drawing to a close, Walk says, “I realized I did not know what I wanted to do research on for five whole years, and the idea of getting a PhD started to feel very daunting.” Instead, she decided to work for a web security company in Washington, as a member of the policy team. “Being in school can be this fast process where you feel like you are being pushed through a tube and all of a sudden you come out the other end. Work gave me a lot more mental time to think about what I enjoyed and what was important to me,” she says.

    Walk served as a liaison between think tanks and nonprofits in Washington that worked to provide services and encourage policies that enable equitable technology distribution. The role helped her identify what held her interest: corporate social responsibility projects that addressed access to technology, in this case, by donating free web security services to nonprofit organizations and to election websites. She became curious about how access to data and to the Internet can be beneficial for education, and how such access can be leveraged to establish connections to populations that are otherwise hard to reach, such as refugees, marginalized groups, or activist communities that rely on anonymity for safety.

    Walk knew she wanted to pursue this kind of tech activism work, but she also recognized that staying with a company driven by profits would not be the best avenue to fulfill her personal career aspirations. Graduate school seemed like the best option to both learn the data science skills she needed, and pursue full-time research focusing on technology and policy.

    Finding new ways to tap social media data

    With these goals in mind, Walk joined the SES graduate program in IDSS. “This program for me had the most balance,” she says. “I have a lot of leeway to explore whatever kind of research I want, provided it has an impact component and a data component.”

    During her first year, she intended to explore a variety of research advisors to find the right fit. Instead, during her first few months on MIT’s campus, she sat down for an introductory meeting with her now-research advisor, Fotini Christia, the Ford International Professor in the Social Sciences, and walked out with a project. Her new task: analyzing “how different social media sources are used differently by groups within the conflict, and how those different narratives present themselves online. So much social science research tends to use just Twitter, or just Facebook, to draw conclusions. It is important to understand how your data set might be skewed,” she says.

    Walk’s current research focuses on another novel way to tap social media. Scholars traditionally use geographic data to understand population movements, but her research has demonstrated that social media can also be a ripe data source. She is analyzing how social media discussions differ in places with and without refugees, with a particular focus on places where refugees have returned to their homelands, including Syria.

    “Now that the [Syrian] civil war has been going on for so long, there is a lot of discussion on how to bring refugees back in [to their homelands],” Walk says. Her research adds to this discussion by using social media sources to understand and predict the factors that encourage refugees to return, such as economic opportunities and decreases in local violence. Her goal is to harness some of the social media data to provide policymakers and nonprofits with information on how to address repatriation and related issues.

    Walk attributes much of her growth as a graduate student to the influence of collaborators, especially Professor Kiran Garimella at Rutgers’ Department of Library and Information Science. “So much of being a graduate student is feeling like you have a stupid question and figuring out who you can be vulnerable with in asking that stupid question,” she says. “I am very lucky to have a lot of those people in my life.”

    Encouraging the next generation

    Now, as a third-year student, Walk is the one whom others go to with their “stupid questions.” This desire to mentor and share her knowledge extends beyond the laboratory. “Something I discovered is that I really like talking to and advising people who are in a similar position to where I was. It is fulfilling to work with smart people close to my age who are just trying to figure out the answers to these meaty life issues that I have also struggled with,” she says.

    This realization led Walk to a position as a resident advisor at Harvard University’s Mather House, an undergraduate dormitory and community center. Walk became a faculty dean aide during her first year at MIT, and since then has served as a full-time Mather House resident tutor. “Every year I advise a new class of students, and I just become invested in their process. I get to talk to people about their lives, about their classes, about what is making them excited and about what is making them sad,” she says.

    After she graduates, Walk plans to explore issues that have a positive, tangible impact on policy outcomes and people, perhaps in an academic lab or in a nonprofit organization. Two such issues that particularly intrigue her are internet access and privacy for underserved populations. Regardless of the issues, she will continue to draw from both political science and data science. “One of my favorite things about being a part of interdisciplinary research is that [experts in] political science and computer science approach these issues so differently, and it is very grounding to have both of those perspectives. Political science thinks so carefully about measurement, population selection, and research design … [while] computer science has so many interesting methods that should be used in other disciplines,” she says.

    No matter what the future holds, Walk already has a sense of contentment. She admits that “my path was much less linear than I expected. I don’t think I even realized that a field like this existed.” Nevertheless, she says with a laugh, “I think that little-girl me would be very proud of present-day me.”

  • in

    Researchers release open-source photorealistic simulator for autonomous driving

    Hyper-realistic virtual worlds have been heralded as the best driving schools for autonomous vehicles (AVs), since they’ve proven fruitful test beds for safely trying out dangerous driving scenarios. Tesla, Waymo, and other self-driving companies all rely heavily on data to enable expensive and proprietary photorealistic simulators, since testing and gathering nuanced I-almost-crashed data usually isn’t the easiest or most desirable to recreate.

    To that end, scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) created “VISTA 2.0,” a data-driven simulation engine where vehicles can learn to drive in the real world and recover from near-crash scenarios. What’s more, all of the code is being open-sourced to the public. 

    “Today, only companies have software like the type of simulation environments and capabilities of VISTA 2.0, and this software is proprietary. With this release, the research community will have access to a powerful new tool for accelerating the research and development of adaptive robust control for autonomous driving,” says MIT Professor and CSAIL Director Daniela Rus, senior author on a paper about the research. 

    VISTA is a data-driven, photorealistic simulator for autonomous driving. It can simulate not just live video but LiDAR data and event cameras, and also incorporate other simulated vehicles to model complex driving situations. VISTA is open source and the code can be found below.

    VISTA 2.0 builds off of the team’s previous model, VISTA, and it’s fundamentally different from existing AV simulators since it’s data-driven — meaning it was built and photorealistically rendered from real-world data — thereby enabling direct transfer to reality. While the initial iteration supported only single car lane-following with one camera sensor, achieving high-fidelity data-driven simulation required rethinking the foundations of how different sensors and behavioral interactions can be synthesized. 

    Enter VISTA 2.0: a data-driven system that can simulate complex sensor types and massively interactive scenarios and intersections at scale. With much less data than previous models, the team was able to train autonomous vehicles that could be substantially more robust than those trained on large amounts of real-world data. 

    “This is a massive jump in capabilities of data-driven simulation for autonomous vehicles, as well as the increase of scale and ability to handle greater driving complexity,” says Alexander Amini, CSAIL PhD student and co-lead author on two new papers, together with fellow PhD student Tsun-Hsuan Wang. “VISTA 2.0 demonstrates the ability to simulate sensor data far beyond 2D RGB cameras, but also extremely high dimensional 3D lidars with millions of points, irregularly timed event-based cameras, and even interactive and dynamic scenarios with other vehicles as well.” 

    The team was able to scale the complexity of the interactive driving tasks for things like overtaking, following, and negotiating, including multiagent scenarios in highly photorealistic environments. 

    Training AI models for autonomous vehicles involves hard-to-secure fodder of different varieties of edge cases and strange, dangerous scenarios, because most of our data (thankfully) is just run-of-the-mill, day-to-day driving. Logically, we can’t just crash into other cars just to teach a neural network how to not crash into other cars.

    Recently, there’s been a shift away from more classic, human-designed simulation environments to those built up from real-world data. The latter have immense photorealism, but the former can easily model virtual cameras and lidars. With this paradigm shift, a key question has emerged: Can the richness and complexity of all of the sensors that autonomous vehicles need, such as lidar and event-based cameras that are more sparse, accurately be synthesized? 

    Lidar sensor data is much harder to interpret in a data-driven world — you’re effectively trying to generate brand-new 3D point clouds with millions of points, only from sparse views of the world. To synthesize 3D lidar point clouds, the team used the data that the car collected, projected it into a 3D space coming from the lidar data, and then let a new virtual vehicle drive around locally from where that original vehicle was. Finally, they projected all of that sensory information back into the frame of view of this new virtual vehicle, with the help of neural networks. 
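
    The geometric part of that re-projection can be sketched in a few lines. This is only an illustration under stated assumptions, not the VISTA 2.0 code: the function names, the flat-ground (x, y, yaw) pose, and the sample point are all hypothetical, and the neural-network step that fills in the sparse result is not modeled.

```python
import math

def vehicle_to_world(points, pose):
    """Lift (x, y, z) points from a vehicle's local frame into the world frame.

    pose = (x, y, yaw): hypothetical flat-ground pose of the vehicle.
    """
    px, py, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return [(px + c * x - s * y, py + s * x + c * y, z) for x, y, z in points]

def world_to_vehicle(points, pose):
    """Re-express world-frame points in the local frame of a vehicle at `pose`."""
    px, py, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    local = []
    for x, y, z in points:
        dx, dy = x - px, y - py
        # Rotate by -yaw so the vehicle's heading becomes the local +x axis.
        local.append((c * dx + s * dy, -s * dx + c * dy, z))
    return local

# One lidar return recorded 10 m ahead of the original car (made-up value).
recorded_scan = [(10.0, 0.0, 0.5)]
recorded_pose = (0.0, 0.0, 0.0)
# A virtual car placed 1 m to the left of the recorded trajectory.
virtual_pose = (0.0, 1.0, 0.0)

world_points = vehicle_to_world(recorded_scan, recorded_pose)
virtual_scan = world_to_vehicle(world_points, virtual_pose)
```

    From the virtual car's viewpoint the same return now sits 1 m to its right, which is the shift a learned model would then have to densify into a plausible full scan.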

    Together with the simulation of event-based cameras, which operate at speeds greater than thousands of events per second, the simulator was capable of not only simulating this multimodal information, but also doing so all in real time — making it possible to train neural nets offline, but also test online on the car in augmented reality setups for safe evaluations. “The question of if multisensor simulation at this scale of complexity and photorealism was possible in the realm of data-driven simulation was very much an open question,” says Amini. 

    With that, the driving school becomes a party. In the simulation, you can move around, have different types of controllers, simulate different types of events, create interactive scenarios, and just drop in brand new vehicles that weren’t even in the original data. They tested for lane following, lane turning, car following, and more dicey scenarios like static and dynamic overtaking (seeing obstacles and moving around so you don’t collide). With the multi-agency, both real and simulated agents interact, and new agents can be dropped into the scene and controlled any which way. 

    Taking their full-scale car out into the “wild” (a.k.a. Devens, Massachusetts), the team saw immediate transferability of results, with both failures and successes. They were also able to demonstrate the bodacious, magic word of self-driving car models: “robust.” They showed that AVs, trained entirely in VISTA 2.0, were so robust in the real world that they could handle that elusive tail of challenging failures. 

    Now, one guardrail humans rely on that can’t yet be simulated is human emotion. It’s the friendly wave, nod, or blinker switch of acknowledgement, which are the type of nuances the team wants to implement in future work. 

    “The central algorithm of this research is how we can take a dataset and build a completely synthetic world for learning and autonomy,” says Amini. “It’s a platform that I believe one day could extend in many different axes across robotics. Not just autonomous driving, but many areas that rely on vision and complex behaviors. We’re excited to release VISTA 2.0 to help enable the community to collect their own datasets and convert them into virtual worlds where they can directly simulate their own virtual autonomous vehicles, drive around these virtual terrains, train autonomous vehicles in these worlds, and then can directly transfer them to full-sized, real self-driving cars.” 

    Amini and Wang wrote the paper alongside Zhijian Liu, MIT CSAIL PhD student; Igor Gilitschenski, assistant professor in computer science at the University of Toronto; Wilko Schwarting, AI research scientist and MIT CSAIL PhD ’20; Song Han, associate professor at MIT’s Department of Electrical Engineering and Computer Science; Sertac Karaman, associate professor of aeronautics and astronautics at MIT; and Daniela Rus, MIT professor and CSAIL director. The researchers presented the work at the IEEE International Conference on Robotics and Automation (ICRA) in Philadelphia. 

    This work was supported by the National Science Foundation and Toyota Research Institute. The team acknowledges the support of NVIDIA with the donation of the Drive AGX Pegasus.

  • in

    Companies use MIT research to identify and respond to supply chain risks

    In February 2020, MIT professor David Simchi-Levi predicted the future. In an article in Harvard Business Review, he and his colleague warned that the new coronavirus outbreak would throttle supply chains and shutter tens of thousands of businesses across North America and Europe by mid-March.

    For Simchi-Levi, who had developed new models of supply chain resiliency and advised major companies on how to best shield themselves from supply chain woes, the signs of disruption were plain to see. Two years later, the professor of engineering systems at the MIT Schwarzman College of Computing and the Department of Civil and Environmental Engineering, and director of the MIT Data Science Lab has found a “flood of interest” from companies anxious to apply his Risk Exposure Index (REI) research to identify and respond to hidden risks in their own supply chains.

    His work on “stress tests” for critical supply chains and ways to guide global supply chain recovery were included in the 2022 Economic Report of the President presented to the U.S. Congress in April.

    It is rare that data science research can influence policy at the highest levels, Simchi-Levi says, but his models reflect something that business needs now: a way to plan for a new world of continuing global crisis, without relying on historical precedent.

    “What the last two years showed is that you cannot plan just based on what happened last year or the last two years,” Simchi-Levi says.

    He recalled the famous quote, sometimes attributed to hockey great Wayne Gretzky, that good players don’t skate to where the puck is, but where the puck is going to be. “We are not focusing on the state of the supply chain right now, but what may happen six weeks from now, eight weeks from now, to prepare ourselves today to prevent the problems of the future.”

    Finding hidden risks

    At the heart of REI is a mathematical model of the supply chain that focuses on potential failures at different supply chain nodes — a flood at a supplier’s factory, or a shortage of raw materials at another factory, for instance. By calculating variables such as “time-to-recover” (TTR), which measures how long it will take a particular node to be back at full function, and time-to-survive (TTS), which identifies the maximum duration that the supply chain can match supply with demand after a disruption, the model focuses on the impact of disruption on the supply chain, rather than the cause of disruption.
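
    One simple way to read the two quantities together is to flag any node whose time-to-recover exceeds its time-to-survive. The sketch below illustrates only that comparison, not the REI model itself; the class, function names, and supplier figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    ttr_weeks: float  # time-to-recover: how long until the node is back at full function
    tts_weeks: float  # time-to-survive: how long supply can still match demand

def exposure_gap(node):
    """Weeks of unmet demand if this node fails (zero when TTS covers TTR)."""
    return max(0.0, node.ttr_weeks - node.tts_weeks)

def rank_exposed(nodes):
    """Return the nodes with a positive gap, worst first."""
    exposed = [n for n in nodes if exposure_gap(n) > 0]
    return sorted(exposed, key=exposure_gap, reverse=True)

# Made-up suppliers, for illustration only.
suppliers = [
    Node("strategic engine plant", ttr_weeks=4, tts_weeks=6),
    Node("10-cent clip supplier", ttr_weeks=12, tts_weeks=2),
    Node("wiring harness plant", ttr_weeks=5, tts_weeks=3),
]

for node in rank_exposed(suppliers):
    print(f"{node.name}: exposed for {exposure_gap(node):.0f} weeks")
```

    Because the ranking depends only on TTR and TTS, it says nothing about what caused the disruption, which matches the impact-over-cause emphasis described above.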

    Even before the pandemic, catastrophic events such as the 2010 Iceland volcanic eruption and the 2011 Tohoku earthquake and tsunami in Japan were threatening these nodes. “For many years, companies from a variety of industries focused mostly on efficiency, cutting costs as much as possible, using strategies like outsourcing and offshoring,” Simchi-Levi says. “They were very successful doing this, but it has dramatically increased their exposure to risk.”

    Using their model, Simchi-Levi and colleagues began working with Ford Motor Company in 2013 to improve the company’s supply chain resiliency. The partnership uncovered some surprising hidden risks.

    To begin with, the researchers found out that Ford’s “strategic suppliers” (the nodes of the supply chain where the company spent large amounts of money each year) had only moderate exposure to risk. Instead, the biggest risk “tended to come from tiny suppliers that provide Ford with components that cost about 10 cents,” says Simchi-Levi.

    The analysis also found that risky suppliers are everywhere across the globe. “There is this idea that if you just move suppliers closer to market, to demand, to North America or to Mexico, you increase the resiliency of your supply chain. That is not supported by our data,” he says.

    Rewards of resiliency

    By creating a virtual representation, or “digital twin,” of the Ford supply chain, the researchers were able to test out strategies at each node to see what would increase supply chain resiliency. Should the company invest in more warehouses to store a key component? Should it shift production of a component to another factory?

    Companies are sometimes reluctant to invest in supply chain resiliency, Simchi-Levi says, but the analysis isn’t just about risk. “It’s also going to help you identify savings opportunities. The company may be building a lot of misplaced, costly inventory, for instance, and our method helps them to identify these inefficiencies and cut costs.”

    Since working with Ford, Simchi-Levi and colleagues have collaborated with many other companies, including a partnership with Accenture, to scale the REI technology to a variety of industries including high-tech, industrial equipment, home improvement retailers, fashion retailers, and consumer packaged goods.

    Annette Clayton, the CEO of Schneider Electric North America and previously its chief supply chain officer, has worked with Simchi-Levi for 17 years. “When I first went to work for Schneider, I asked David and his team to help us look at resiliency and inventory positioning in order to make the best cost, delivery, flexibility, and speed trade-offs for the North American supply chain,” she says. “As the pandemic unfolded, the very learnings in supply chain resiliency we had worked on before became even more important and we partnered with David and his team again.”

    “We have used TTR and TTS to determine places where we need to develop and duplicate supplier capability, from raw materials to assembled parts. We increased inventories where our time-to-recover because of extended logistics times exceeded our time-to-survive,” Clayton adds. “We have used TTR and TTS to prioritize our workload in supplier development, procurement and expanding our own manufacturing capacity.”

    The REI approach can even be applied to an entire country’s economy, as the U.N. Office for Disaster Risk Reduction has done for developing countries such as Thailand in the wake of disastrous flooding in 2011.

    Simchi-Levi and colleagues have been motivated by the pandemic to enhance the REI model with new features. “Because we have started collaborating with more companies, we have realized some interesting, company-specific business constraints,” he says, which are leading to more efficient ways of calculating hidden risk.

  • in

    New CRISPR-based map ties every human gene to its function

    The Human Genome Project was an ambitious initiative to sequence every piece of human DNA. The project drew together collaborators from research institutions around the world, including MIT’s Whitehead Institute for Biomedical Research, and was finally completed in 2003. Now, nearly two decades later, MIT Professor Jonathan Weissman and colleagues have gone beyond the sequence to present the first comprehensive functional map of genes that are expressed in human cells. The data from this project, published online June 9 in Cell, ties each gene to its job in the cell, and is the culmination of years of collaboration on the single-cell sequencing method Perturb-seq.

    The data are available for other scientists to use. “It’s a big resource in the way the human genome is a big resource, in that you can go in and do discovery-based research,” says Weissman, who is also a member of the Whitehead Institute and an investigator with the Howard Hughes Medical Institute. “Rather than defining ahead of time what biology you’re going to be looking at, you have this map of the genotype-phenotype relationships and you can go in and screen the database without having to do any experiments.”

    The screen allowed the researchers to delve into diverse biological questions. They used it to explore the cellular effects of genes with unknown functions, to investigate the response of mitochondria to stress, and to screen for genes that cause chromosomes to be lost or gained, a phenotype that has proved difficult to study in the past. “I think this dataset is going to enable all sorts of analyses that we haven’t even thought up yet by people who come from other parts of biology, and suddenly they just have this available to draw on,” says former Weissman Lab postdoc Tom Norman, a co-senior author of the paper.

    Pioneering Perturb-seq

    The project takes advantage of the Perturb-seq approach that makes it possible to follow the impact of turning on or off genes with unprecedented depth. This method was first published in 2016 by a group of researchers including Weissman and fellow MIT professor Aviv Regev, but could only be used on small sets of genes and at great expense.

    The massive Perturb-seq map was made possible by foundational work from Joseph Replogle, an MD-PhD student in Weissman’s lab and co-first author of the present paper. Replogle, in collaboration with Norman, who now leads a lab at Memorial Sloan Kettering Cancer Center; Britt Adamson, an assistant professor in the Department of Molecular Biology at Princeton University; and a group at 10x Genomics, set out to create a new version of Perturb-seq that could be scaled up. The researchers published a proof-of-concept paper in Nature Biotechnology in 2020. 

    The Perturb-seq method uses CRISPR-Cas9 genome editing to introduce genetic changes into cells, and then uses single-cell RNA sequencing to capture information about the RNAs that are expressed resulting from a given genetic change. Because RNAs control all aspects of how cells behave, this method can help decode the many cellular effects of genetic changes.

    Since their initial proof-of-concept paper, Weissman, Regev, and others have used this sequencing method on smaller scales. For example, the researchers used Perturb-seq in 2021 to explore how human and viral genes interact over the course of an infection with HCMV, a common herpesvirus.

    In the new study, Replogle and collaborators including Reuben Saunders, a graduate student in Weissman’s lab and co-first author of the paper, scaled up the method to the entire genome. Using human blood cancer cell lines as well as noncancerous cells derived from the retina, he performed Perturb-seq across more than 2.5 million cells, and used the data to build a comprehensive map tying genotypes to phenotypes.

    Delving into the data

    Upon completing the screen, the researchers decided to put their new dataset to use and examine a few biological questions. “The advantage of Perturb-seq is it lets you get a big dataset in an unbiased way,” says Tom Norman. “No one knows entirely what the limits are of what you can get out of that kind of dataset. Now, the question is, what do you actually do with it?”

    The first, most obvious application was to look into genes with unknown functions. Because the screen also read out phenotypes of many known genes, the researchers could use the data to compare unknown genes to known ones and look for similar transcriptional outcomes, which could suggest the gene products worked together as part of a larger complex.

    The mutation of one gene called C7orf26 in particular stood out. Researchers noticed that genes whose removal led to a similar phenotype were part of a protein complex called Integrator that played a role in creating small nuclear RNAs. The Integrator complex is made up of many smaller subunits — previous studies had suggested 14 individual proteins — and the researchers were able to confirm that C7orf26 made up a 15th component of the complex.

    They also discovered that the 15 subunits worked together in smaller modules to perform specific functions within the Integrator complex. “Absent this thousand-foot-high view of the situation, it was not so clear that these different modules were so functionally distinct,” says Saunders.

    Another perk of Perturb-seq is that because the assay focuses on single cells, the researchers could use the data to look at more complex phenotypes that become muddied when they are studied together with data from other cells. “We often take all the cells where ‘gene X’ is knocked down and average them together to look at how they changed,” Weissman says. “But sometimes when you knock down a gene, different cells that are losing that same gene behave differently, and that behavior may be missed by the average.”
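
    A toy calculation shows how an average can hide this kind of cell-to-cell variation. The expression values below are invented for illustration and are not taken from the study; the point is only that two opposite responses can cancel in the mean while remaining obvious in the spread.

```python
from statistics import mean, pstdev

# Hypothetical expression of one readout gene, one value per cell.
control_cells = [1.0, 1.1, 0.9, 1.0]
# After knocking down "gene X", half the cells go down and half go up.
knockdown_cells = [0.2, 1.8, 0.1, 1.9]

def summarize(values):
    """Bulk-style average plus the per-cell spread that single-cell data keeps."""
    return {"mean": mean(values), "spread": pstdev(values)}

control = summarize(control_cells)
knockdown = summarize(knockdown_cells)

# The averages are essentially identical, so a bulk comparison sees no change.
mean_shift = knockdown["mean"] - control["mean"]
# The spread is far larger after knockdown, which only per-cell data reveals.
spread_ratio = knockdown["spread"] / control["spread"]
```

    Here the mean shift is essentially zero while the spread grows roughly twelvefold, the sort of signal that averaging would miss.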

    The researchers found that a subset of genes whose removal led to different outcomes from cell to cell were responsible for chromosome segregation. Their removal was causing cells to lose a chromosome or pick up an extra one, a condition known as aneuploidy. “You couldn’t predict what the transcriptional response to losing this gene was because it depended on the secondary effect of what chromosome you gained or lost,” Weissman says. “We realized we could then turn this around and create this composite phenotype looking for signatures of chromosomes being gained and lost. In this way, we’ve done the first genome-wide screen for factors that are required for the correct segregation of DNA.”

    “I think the aneuploidy study is the most interesting application of this data so far,” Norman says. “It captures a phenotype that you can only get using a single-cell readout. You can’t go after it any other way.”

    The researchers also used their dataset to study how mitochondria responded to stress. Mitochondria, which evolved from free-living bacteria, carry 13 genes in their genomes. Within the nuclear DNA, around 1,000 genes are somehow related to mitochondrial function. “People have been interested for a long time in how nuclear and mitochondrial DNA are coordinated and regulated in different cellular conditions, especially when a cell is stressed,” Replogle says.

    The researchers found that when they perturbed different mitochondria-related genes, the nuclear genome responded similarly to many different genetic changes. However, the mitochondrial genome responses were much more variable. 

    “There’s still an open question of why mitochondria still have their own DNA,” said Replogle. “A big-picture takeaway from our work is that one benefit of having a separate mitochondrial genome might be having localized or very specific genetic regulation in response to different stressors.”

    “If you have one mitochondria that’s broken, and another one that is broken in a different way, those mitochondria could be responding differentially,” Weissman says.

    In the future, the researchers hope to use Perturb-seq on different types of cells besides the cancer cell line they started in. They also hope to continue to explore their map of gene functions, and hope others will do the same. “This really is the culmination of many years of work by the authors and other collaborators, and I’m really pleased to see it continue to succeed and expand,” says Norman.

  • in

    Hallucinating to better text translation

    As babies, we babble and imitate our way to learning languages. We don’t start off reading raw text, which requires fundamental knowledge and understanding about the world, as well as the advanced ability to interpret and infer descriptions and relationships. Rather, humans begin our language journey slowly, by pointing and interacting with our environment, basing our words and perceiving their meaning through the context of the physical and social world. Eventually, we can craft full sentences to communicate complex ideas.

    Similarly, when humans begin learning and translating into another language, the incorporation of other sensory information, like multimedia, paired with the new and unfamiliar words, like flashcards with images, improves language acquisition and retention. Then, with enough practice, humans can accurately translate new, unseen sentences in context without the accompanying media; however, imagining a picture based on the original text helps.

    This is the basis of a new machine learning model, called VALHALLA, by researchers from MIT, IBM, and the University of California at San Diego, in which a trained neural network sees a source sentence in one language, hallucinates an image of what it looks like, and then uses both to translate into a target language. The team found that their method demonstrates improved accuracy of machine translation over text-only translation. Further, it provided an additional boost for cases with long sentences, under-resourced languages, and instances where part of the source sentence is inaccessible to the machine translator.

    As a core task within the AI field of natural language processing (NLP), machine translation is an “eminently practical technology that’s being used by millions of people every day,” says study co-author Yoon Kim, assistant professor in MIT’s Department of Electrical Engineering and Computer Science with affiliations in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT-IBM Watson AI Lab. With recent, significant advances in deep learning, “there’s been an interesting development in how one might use non-text information — for example, images, audio, or other grounding information — to tackle practical tasks involving language” says Kim, because “when humans are performing language processing tasks, we’re doing so within a grounded, situated world.” The pairing of hallucinated images and text during inference, the team postulated, imitates that process, providing context for improved performance over current state-of-the-art techniques, which utilize text-only data.

    This research will be presented at the IEEE / CVF Computer Vision and Pattern Recognition Conference this month. Kim’s co-authors are UC San Diego graduate student Yi Li and Professor Nuno Vasconcelos, along with research staff members Rameswar Panda, Chun-fu “Richard” Chen, Rogerio Feris, and IBM Director David Cox of IBM Research and the MIT-IBM Watson AI Lab.

    Learning to hallucinate from images

    When we learn new languages and to translate, we’re often provided with examples and practice before venturing out on our own. The same is true for machine-translation systems; however, if images are used during training, these AI methods also require visual aids for testing, limiting their applicability, says Panda.

    “In real-world scenarios, you might not have an image with respect to the source sentence. So, our motivation was basically: Instead of using an external image during inference as input, can we use visual hallucination — the ability to imagine visual scenes — to improve machine translation systems?” says Panda.

    To do this, the team used an encoder-decoder architecture with two transformers, a type of neural network model that’s suited for sequence-dependent data, like language, that can pay attention to key words and semantics of a sentence. One transformer generates a visual hallucination, and the other performs multimodal translation using outputs from the first transformer.

    During training, there are two streams of translation: a source sentence and a ground-truth image that is paired with it, and the same source sentence that is visually hallucinated to make a text-image pair. First the ground-truth image and sentence are tokenized into representations that can be handled by transformers; for the case of the sentence, each word is a token. The source sentence is tokenized again, but this time passed through the visual hallucination transformer, outputting a hallucination, a discrete image representation of the sentence. The researchers incorporated an autoregression that compares the ground-truth and hallucinated representations for congruency — e.g., homonyms: a reference to an animal “bat” isn’t hallucinated as a baseball bat. The hallucination transformer then uses the difference between them to optimize its predictions and visual output, making sure the context is consistent.

    The two sets of tokens are then simultaneously passed through the multimodal translation transformer, each containing the sentence representation and either the hallucinated or ground-truth image. The tokenized text translation outputs are compared with the goal of being similar to each other and to the target sentence in another language. Any differences are then relayed back to the translation transformer for further optimization.
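
    The way these training signals might combine can be sketched as a single objective. This is a loose illustration, not the published VALHALLA loss: the use of cross-entropy and KL divergence, the equal weights, and every name below are assumptions made for the example.

```python
import math

def cross_entropy(target_ids, predicted_dists):
    """Mean negative log-likelihood of the target tokens."""
    total = sum(math.log(dist[t]) for t, dist in zip(target_ids, predicted_dists))
    return -total / len(target_ids)

def kl_divergence(p_dists, q_dists):
    """Mean per-position KL(p || q); zero when the two streams agree exactly."""
    total = 0.0
    for p, q in zip(p_dists, q_dists):
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(p_dists)

def training_loss(target_ids, dists_real_image, dists_hallucinated,
                  image_token_ids, hallucinated_image_dists,
                  consistency_weight=1.0, hallucination_weight=1.0):
    # Both streams should translate to the target sentence.
    translation = (cross_entropy(target_ids, dists_real_image)
                   + cross_entropy(target_ids, dists_hallucinated))
    # The two streams' outputs should also be similar to each other.
    consistency = kl_divergence(dists_real_image, dists_hallucinated)
    # The hallucinated image tokens should match the ground-truth image tokens.
    hallucination = cross_entropy(image_token_ids, hallucinated_image_dists)
    return (translation
            + consistency_weight * consistency
            + hallucination_weight * hallucination)

# Toy one-token example over a two-word vocabulary (made-up numbers).
agree = training_loss([0], [[0.5, 0.5]], [[0.5, 0.5]], [1], [[0.5, 0.5]])
disagree = training_loss([0], [[0.5, 0.5]], [[0.4, 0.6]], [1], [[0.5, 0.5]])
```

    In this toy case the loss rises when the hallucinated stream drifts away from the ground-truth stream, which is the pressure that pulls the two translations together.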

    For testing, the ground-truth image stream drops off, since images likely wouldn’t be available in everyday scenarios.

    “To the best of our knowledge, we haven’t seen any work which actually uses a hallucination transformer jointly with a multimodal translation system to improve machine translation performance,” says Panda.

    Visualizing the target text

    To test their method, the team put VALHALLA up against other state-of-the-art multimodal and text-only translation methods. They used public benchmark datasets containing ground-truth images with source sentences, and a dataset for translating text-only news articles. The researchers measured its performance over 13 tasks, spanning well-resourced language pairs (like English, German, and French), under-resourced pairs (like English to Romanian), and non-English pairs (like Spanish to French). The group also tested varying transformer model sizes, how accuracy changes with the sentence length, and translation under limited textual context, where portions of the text were hidden from the machine translators.

    The team observed significant improvements over text-only translation methods, improving data efficiency, and that smaller models performed better than the larger base model. As sentences became longer, VALHALLA’s performance over other methods grew, which the researchers attributed to the addition of more ambiguous words. In cases where part of the sentence was masked, VALHALLA could recover and translate the original text, which the team found surprising.

    Further unexpected findings arose: “Where there weren’t as many training [image and] text pairs, [like for under-resourced languages], improvements were more significant, which indicates that grounding in images helps in low-data regimes,” says Kim. “Another thing that was quite surprising to me was this improved performance, even on types of text that aren’t necessarily easily connectable to images. For example, maybe it’s not so surprising if this helps in translating visually salient sentences, like the ‘there is a red car in front of the house.’ [However], even in text-only [news article] domains, the approach was able to improve upon text-only systems.”

    While VALHALLA performs well, the researchers note that it does have limitations, requiring pairs of sentences to be annotated with an image, which could make it more expensive to obtain. It also performs better in its ground domain and not the text-only news articles. Moreover, Kim and Panda note, a technique like VALHALLA is still a black box, with the assumption that hallucinated images are providing helpful information, and the team plans to investigate what and how the model is learning in order to validate their methods.

    In the future, the team plans to explore other means of improving translation. “Here, we only focus on images, but there are other types of multimodal information: for example, speech, video or touch, or other sensory modalities,” says Panda. “We believe such multimodal grounding can lead to even more efficient machine translation models, potentially benefiting translation across many low-resource languages spoken in the world.”

    This research was supported, in part, by the MIT-IBM Watson AI Lab and the National Science Foundation.