More stories

  • A better way to study ocean currents

    To study ocean currents, scientists release GPS-tagged buoys in the ocean and record their velocities to reconstruct the currents that transport them. These buoy data are also used to identify “divergences,” which are areas where water rises up from below the surface or sinks beneath it.

    By accurately predicting currents and pinpointing divergences, scientists can more precisely forecast the weather, approximate how oil will spread after a spill, or measure energy transfer in the ocean. A new model that incorporates machine learning makes more accurate predictions than conventional models do, a new study reports.

    A multidisciplinary research team including computer scientists at MIT and oceanographers has found that a standard statistical model typically used on buoy data can struggle to accurately reconstruct currents or identify divergences because it makes unrealistic assumptions about the behavior of water.

    The researchers developed a new model that incorporates knowledge from fluid dynamics to better reflect the physics at work in ocean currents. They show that their method, which only requires a small amount of additional computational expense, is more accurate at predicting currents and identifying divergences than the traditional model.

    This new model could help oceanographers make more accurate estimates from buoy data, which would enable them to more effectively monitor the transportation of biomass (such as Sargassum seaweed), carbon, plastics, oil, and nutrients in the ocean. This information is also important for understanding and tracking climate change.

    “Our method captures the physical assumptions more appropriately and more accurately. In this case, we know a lot of the physics already. We are giving the model a little bit of that information so it can focus on learning the things that are important to us, like what are the currents away from the buoys, or what is this divergence and where is it happening?” says senior author Tamara Broderick, an associate professor in MIT’s Department of Electrical Engineering and Computer Science (EECS) and a member of the Laboratory for Information and Decision Systems and the Institute for Data, Systems, and Society.

    Broderick’s co-authors include lead author Renato Berlinghieri, an electrical engineering and computer science graduate student; Brian L. Trippe, a postdoc at Columbia University; David R. Burt and Ryan Giordano, MIT postdocs; Kaushik Srinivasan, an assistant researcher in atmospheric and ocean sciences at the University of California at Los Angeles; Tamay Özgökmen, professor in the Department of Ocean Sciences at the University of Miami; and Junfei Xia, a graduate student at the University of Miami. The research will be presented at the International Conference on Machine Learning.

    Diving into the data

    To estimate currents and find divergences, oceanographers have used a machine-learning technique known as a Gaussian process, which can make predictions even when data are sparse. To work well in this case, the Gaussian process must make assumptions about the data to generate a prediction.

    A standard way of applying a Gaussian process to ocean data assumes the latitude and longitude components of the current are unrelated. But this assumption isn’t physically accurate. For instance, this existing model implies that a current’s divergence and its vorticity (a whirling motion of fluid) operate on the same magnitude and length scales. Ocean scientists know this is not true, Broderick says. The previous model also assumes the frame of reference matters, which means fluid would behave differently in the latitude versus the longitude direction.
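
    To make that independence assumption concrete, the standard baseline amounts to fitting a separate Gaussian process to each velocity component, so the two fits cannot share any physical information. The sketch below is purely illustrative: it uses scikit-learn and synthetic buoy positions and velocities, not the researchers’ data or code.

    ```python
    # Illustrative baseline only: two independent Gaussian processes,
    # one per velocity component, fit to synthetic "buoy" data.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    buoy_positions = rng.uniform(0, 10, size=(50, 2))   # (longitude, latitude)
    buoy_velocities = rng.normal(0, 1, size=(50, 2))    # (u, v) current components

    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)

    # Independence assumption: u and v are modeled separately, so the fit cannot
    # encode relationships between divergence and vorticity.
    gp_u = GaussianProcessRegressor(kernel=kernel).fit(buoy_positions, buoy_velocities[:, 0])
    gp_v = GaussianProcessRegressor(kernel=kernel).fit(buoy_positions, buoy_velocities[:, 1])

    grid = np.stack(np.meshgrid(np.linspace(0, 10, 20),
                                np.linspace(0, 10, 20)), axis=-1).reshape(-1, 2)
    u_pred, v_pred = gp_u.predict(grid), gp_v.predict(grid)
    ```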

    “We were thinking we could address these problems with a model that incorporates the physics,” she says.

    They built a new model that uses what is known as a Helmholtz decomposition to accurately represent the principles of fluid dynamics. This method models an ocean current by breaking it down into a vorticity component (which captures the whirling motion) and a divergence component (which captures water rising or sinking).
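
    For reference, a two-dimensional Helmholtz decomposition (written here in generic notation, not taken from the paper) splits the current field into a gradient part whose Laplacian gives the divergence and a rotational part whose Laplacian gives the vorticity:

    ```latex
    % 2D Helmholtz decomposition of a velocity field F = (F_x, F_y)
    \mathbf{F} = \nabla\Phi + \nabla^{\perp}\Psi,
    \qquad
    \nabla^{\perp}\Psi = \Bigl(-\tfrac{\partial \Psi}{\partial y},\ \tfrac{\partial \Psi}{\partial x}\Bigr)

    % The two physical quantities of interest come from separate potentials:
    \nabla \cdot \mathbf{F} = \nabla^{2}\Phi \quad \text{(divergence)},
    \qquad
    \partial_x F_y - \partial_y F_x = \nabla^{2}\Psi \quad \text{(vorticity)}
    ```

    Modeling the two potentials separately lets divergence and vorticity have their own magnitudes and length scales, which is exactly the freedom the component-wise model lacks.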

    In this way, they give the model some basic physics knowledge that it uses to make more accurate predictions.

    This new model utilizes the same data as the old model. And while their method can be more computationally intensive, the researchers show that the additional cost is relatively small.

    Buoyant performance

    They evaluated the new model using synthetic and real ocean buoy data. Because the synthetic data were fabricated by the researchers, they could compare the model’s predictions to ground-truth currents and divergences. But simulation involves assumptions that may not reflect real life, so the researchers also tested their model using data captured by real buoys released in the Gulf of Mexico.

    Image: Trajectories of approximately 300 buoys released during the Grand LAgrangian Deployment (GLAD) in the Gulf of Mexico in the summer of 2013, to learn about ocean surface currents around the Deepwater Horizon oil spill site. The small, regular clockwise rotations are due to Earth’s rotation. (Credit: Consortium of Advanced Research for Transport of Hydrocarbons in the Environment)

    In each case, their method demonstrated superior performance for both tasks, predicting currents and identifying divergences, when compared to the standard Gaussian process and another machine-learning approach that used a neural network. For example, in one simulation that included a vortex adjacent to an ocean current, the new method correctly predicted no divergence while the previous Gaussian process method and the neural network method both predicted a divergence with very high confidence.

    The technique is also good at identifying vortices from a small set of buoys, Broderick adds.

    Now that they have demonstrated the effectiveness of using a Helmholtz decomposition, the researchers want to incorporate a time element into their model, since currents can vary over time as well as space. In addition, they want to better capture how noise impacts the data, such as winds that sometimes affect buoy velocity. Separating that noise from the data could make their approach more accurate.

    “Our hope is to take this noisily observed field of velocities from the buoys, and then say what is the actual divergence and actual vorticity, and predict away from those buoys, and we think that our new technique will be helpful for this,” she says.

    “The authors cleverly integrate known behaviors from fluid dynamics to model ocean currents in a flexible model,” says Massimiliano Russo, an associate biostatistician at Brigham and Women’s Hospital and instructor at Harvard Medical School, who was not involved with this work. “The resulting approach retains the flexibility to model the nonlinearity in the currents but can also characterize phenomena such as vortices and connected currents that would only be noticed if the fluid dynamic structure is integrated into the model. This is an excellent example of where a flexible model can be substantially improved with a well thought and scientifically sound specification.”

    This research is supported, in part, by the Office of Naval Research, a National Science Foundation (NSF) CAREER Award, and the Rosenstiel School of Marine, Atmospheric, and Earth Science at the University of Miami.

  • Success at the intersection of technology and finance

    Citadel founder and CEO Ken Griffin had some free advice for an at-capacity crowd of MIT students at the Wong Auditorium during a campus visit in April. “If you find yourself in a career where you’re not learning,” he told them, “it’s time to change jobs. In this world, if you’re not learning, you can find yourself irrelevant in the blink of an eye.”

    During a conversation with Bryan Landman ’11, senior quantitative research lead for Citadel’s Global Quantitative Strategies business, Griffin reflected on his career and offered predictions for the impact of technology on the finance sector. Citadel, which he launched in 1990, is now one of the world’s leading investment firms. Griffin also serves as non-executive chair of Citadel Securities, a market maker that is known as a key player in the modernization of markets and market structures.

    “We are excited to hear Ken share his perspective on how technology continues to shape the future of finance, including the emerging trends of quantum computing and AI,” said David Schmittlein, the John C Head III Dean and professor of marketing at MIT Sloan School of Management, who kicked off the program. The presentation was jointly sponsored by MIT Sloan, the MIT Schwarzman College of Computing, the School of Engineering, MIT Career Advising and Professional Development, and Citadel Securities Campus Recruiting.

    The future, in Griffin’s view, “is all about the application of engineering, software, and mathematics to markets. Successful entrepreneurs are those who have the tools to solve the unsolved problems of that moment in time.” He launched Citadel only one year after graduating from college. “History so far has been kind to the vision I had back in the late ’80s,” he said.

    Griffin realized very early in his career “that you could use a personal computer and quantitative finance to price traded securities in a way that was much more advanced than you saw on your typical equity trading desk on Wall Street.” Both businesses, he told the audience, are ultimately driven by research. “That’s where we formulate the ideas, and trading is how we monetize that research.”

    It’s also why Citadel and Citadel Securities employ several hundred software engineers. “We have a huge investment today in using modern technology to power our decision-making and trading,” said Griffin.

    One example of Citadel’s application of technology and science is the firm’s hiring of a meteorological team to expand the weather analytics expertise within its commodities business. While power supply is relatively easy to map and analyze, predicting demand is much more difficult. Citadel’s weather team feeds forecast data obtained from supercomputers to its traders. “Wind and solar are huge commodities,” Griffin explained, noting that the days with highest demand in the power market are cloudy, cold days with no wind. When you can forecast those days better than the market as a whole, that’s where you can identify opportunities, he added.

    Pros and cons of machine learning

    Asking about the impact of new technology on their sector, Landman noted that both Citadel and Citadel Securities are already leveraging machine learning. “In the market-making business,” Griffin said, “you see a real application for machine learning because you have so much data to parametrize the models with. But when you get into longer time horizon problems, machine learning starts to break down.”

    Griffin noted that the data obtained through machine learning is most helpful for investments with short time horizons, such as in its quantitative strategies business. “In our fundamental equities business,” he said, “machine learning is not as helpful as you would want because the underlying systems are not stationary.”

    Griffin was emphatic that “there has been a moment in time where being a really good statistician or really understanding machine-learning models was sufficient to make money. That won’t be the case for much longer.” One of the guiding principles at Citadel, he and Landman agreed, was that machine learning and other methodologies should not be used blindly. Each analyst has to cite the underlying economic theory driving their argument on investment decisions. “If you understand the problem in a different way than people who are just using the statistical models,” he said, “you have a real chance for a competitive advantage.”

    ChatGPT and a seismic shift

    Asked if ChatGPT will change history, Griffin predicted that the rise of capabilities in large language models will transform a substantial number of white collar jobs. “With open AI for most routine commercial legal documents, ChatGPT will do a better job writing a lease than a young lawyer. This is the first time we are seeing traditionally white-collar jobs at risk due to technology, and that’s a sea change.”

    Griffin urged MIT students to work with the smartest people they can find, as he did: “The magic of Citadel has been a testament to the idea that by surrounding yourself with bright, ambitious people, you can accomplish something special. I went to great lengths to hire the brightest people I could find and gave them responsibility and trust early in their careers.”

    Even more critical to success is the willingness to advocate for oneself, Griffin said, using Gerald Beeson, Citadel’s chief operating officer, as an example. Beeson, who started as an intern at the firm, “consistently sought more responsibility and had the foresight to train his own successors.” Urging students to take ownership of their careers, Griffin advised: “Make it clear that you’re willing to take on more responsibility, and think about what the roadblocks will be.”

    When microphones were handed to the audience, students inquired what changes Griffin would like to see in the hedge fund industry, how Citadel assesses the risk and reward of potential projects, and whether hedge funds should give back to the open source community. Asked about the role that Citadel — and its CEO — should play in “the wider society,” Griffin spoke enthusiastically of his belief in participatory democracy. “We need better people on both sides of the aisle,” he said. “I encourage all my colleagues to be politically active. It’s unfortunate when firms shut down political dialogue; we actually embrace it.”

    Closing on an optimistic note, Griffin urged the students in the audience to go after success, declaring, “The world is always awash in challenge and its shortcomings, but no matter what anybody says, you live at the greatest moment in the history of the planet. Make the most of it.”

  • Study: AI models fail to reproduce human judgements about rule violations

    In an effort to improve fairness or reduce backlogs, machine-learning models are sometimes designed to mimic human decision making, such as deciding whether social media posts violate toxic content policies.

    But researchers from MIT and elsewhere have found that these models often do not replicate human decisions about rule violations. If models are not trained with the right data, they are likely to make different, often harsher judgements than humans would.

    In this case, the “right” data are those that have been labeled by humans who were explicitly asked whether items defy a certain rule. Training involves showing a machine-learning model millions of examples of this “normative data” so it can learn a task.

    But data used to train machine-learning models are typically labeled descriptively — meaning humans are asked to identify factual features, such as, say, the presence of fried food in a photo. If “descriptive data” are used to train models that judge rule violations, such as whether a meal violates a school policy that prohibits fried food, the models tend to over-predict rule violations.

    This drop in accuracy could have serious implications in the real world. For instance, if a descriptive model is used to make decisions about whether an individual is likely to reoffend, the researchers’ findings suggest it may cast stricter judgements than a human would, which could lead to higher bail amounts or longer criminal sentences.

    “I think most artificial intelligence/machine-learning researchers assume that the human judgements in data and labels are biased, but this result is saying something worse. These models are not even reproducing already-biased human judgments because the data they’re being trained on has a flaw: Humans would label the features of images and text differently if they knew those features would be used for a judgment. This has huge ramifications for machine learning systems in human processes,” says Marzyeh Ghassemi, an assistant professor and head of the Healthy ML Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

    Ghassemi is senior author of a new paper detailing these findings, which was published today in Science Advances. Joining her on the paper are lead author Aparna Balagopalan, an electrical engineering and computer science graduate student; David Madras, a graduate student at the University of Toronto; David H. Yang, a former graduate student who is now co-founder of ML Estimation; Dylan Hadfield-Menell, an MIT assistant professor; and Gillian K. Hadfield, Schwartz Reisman Chair in Technology and Society and professor of law at the University of Toronto.

    Labeling discrepancy

    This study grew out of a different project that explored how a machine-learning model can justify its predictions. As they gathered data for that study, the researchers noticed that humans sometimes give different answers if they are asked to provide descriptive or normative labels about the same data.

    To gather descriptive labels, researchers ask labelers to identify factual features — does this text contain obscene language? To gather normative labels, researchers give labelers a rule and ask if the data violates that rule — does this text violate the platform’s explicit language policy?

    Surprised by this finding, the researchers launched a user study to dig deeper. They gathered four datasets to mimic different policies, such as a dataset of dog images that could be in violation of an apartment’s rule against aggressive breeds. Then they asked groups of participants to provide descriptive or normative labels.

    In each case, the descriptive labelers were asked to indicate whether three factual features were present in the image or text, such as whether the dog appears aggressive. Their responses were then used to craft judgements. (If a user said a photo contained an aggressive dog, then the policy was violated.) The labelers did not know the pet policy. On the other hand, normative labelers were given the policy prohibiting aggressive dogs, and then asked whether it had been violated by each image, and why.

    The researchers found that humans were significantly more likely to label an object as a violation in the descriptive setting. The disparity, which they computed as the average absolute difference in labels, ranged from 8 percent on a dataset of images used to judge dress code violations to 20 percent for the dog images.
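
    As a rough illustration of that kind of disparity calculation (the exact metric in the paper may differ, and the labels below are made up), one can compare the average violation rates produced by the two labeling schemes:

    ```python
    # Illustrative only: compare violation rates from descriptive vs. normative labeling.
    # 1 = labeled a violation, 0 = not a violation; these labels are invented.
    import numpy as np

    descriptive_labels = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])  # feature-based labelers
    normative_labels   = np.array([1, 0, 0, 1, 0, 0, 1, 0, 0, 0])  # policy-aware labelers

    disparity = abs(descriptive_labels.mean() - normative_labels.mean())
    print(f"Descriptive violation rate: {descriptive_labels.mean():.0%}")  # 60%
    print(f"Normative violation rate:   {normative_labels.mean():.0%}")    # 30%
    print(f"Disparity:                  {disparity:.0%}")                  # 30%
    ```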

    “While we didn’t explicitly test why this happens, one hypothesis is that maybe how people think about rule violations is different from how they think about descriptive data. Generally, normative decisions are more lenient,” Balagopalan says.

    Yet data are usually gathered with descriptive labels to train a model for a particular machine-learning task. These data are often repurposed later to train different models that perform normative judgements, like rule violations.

    Training troubles

    To study the potential impacts of repurposing descriptive data, the researchers trained two models to judge rule violations using one of their four data settings. They trained one model using descriptive data and the other using normative data, and then compared their performance.

    They found that if descriptive data are used to train a model, it will underperform a model trained to perform the same judgements using normative data. Specifically, the descriptive model is more likely to misclassify inputs by falsely predicting a rule violation. And the descriptive model’s accuracy was even lower when classifying objects that human labelers disagreed about.

    “This shows that the data do really matter. It is important to match the training context to the deployment context if you are training models to detect if a rule has been violated,” Balagopalan says.

    It can be very difficult for users to determine how data have been gathered; this information can be buried in the appendix of a research paper or not revealed by a private company, Ghassemi says.

    Improving dataset transparency is one way this problem could be mitigated. If researchers know how data were gathered, then they know how those data should be used. Another possible strategy is to fine-tune a descriptively trained model on a small amount of normative data. This idea, known as transfer learning, is something the researchers want to explore in future work.

    They also want to conduct a similar study with expert labelers, like doctors or lawyers, to see if it leads to the same label disparity.

    “The way to fix this is to transparently acknowledge that if we want to reproduce human judgment, we must only use data that were collected in that setting. Otherwise, we are going to end up with systems that are going to have extremely harsh moderations, much harsher than what humans would do. Humans would see nuance or make another distinction, whereas these models don’t,” Ghassemi says.

    This research was funded, in part, by the Schwartz Reisman Institute for Technology and Society, Microsoft Research, the Vector Institute, and a Canada Research Council Chair.

  • Researchers create a tool for accurately simulating complex systems

    Researchers often use simulations when designing new algorithms, since testing ideas in the real world can be both costly and risky. But since it’s impossible to capture every detail of a complex system in a simulation, they typically collect a small amount of real data that they replay while simulating the components they want to study.

    Known as trace-driven simulation (the small pieces of real data are called traces), this method sometimes results in biased outcomes. This means researchers might unknowingly choose an algorithm that is not the best one they evaluated, and which will perform worse on real data than the simulation predicted it would.

    MIT researchers have developed a new method that eliminates this source of bias in trace-driven simulation. By enabling unbiased trace-driven simulations, the new technique could help researchers design better algorithms for a variety of applications, including improving video quality on the internet and increasing the performance of data processing systems.

    The researchers’ machine-learning algorithm draws on the principles of causality to learn how the data traces were affected by the behavior of the system. In this way, they can replay the correct, unbiased version of the trace during the simulation.

    When compared to a previously developed trace-driven simulator, the researchers’ simulation method correctly predicted which newly designed algorithm would be best for video streaming — meaning the one that led to less rebuffering and higher visual quality. Existing simulators that do not account for bias would have pointed researchers to a worse-performing algorithm.

    “Data are not the only thing that matter. The story behind how the data are generated and collected is also important. If you want to answer a counterfactual question, you need to know the underlying data generation story so you only intervene on those things that you really want to simulate,” says Arash Nasr-Esfahany, an electrical engineering and computer science (EECS) graduate student and co-lead author of a paper on this new technique.

    He is joined on the paper by co-lead authors and fellow EECS graduate students Abdullah Alomar and Pouya Hamadanian; recent graduate student Anish Agarwal PhD ’21; and senior authors Mohammad Alizadeh, an associate professor of electrical engineering and computer science; and Devavrat Shah, the Andrew and Erna Viterbi Professor in EECS and a member of the Institute for Data, Systems, and Society and of the Laboratory for Information and Decision Systems. The research was recently presented at the USENIX Symposium on Networked Systems Design and Implementation.

    Specious simulations

    The MIT researchers studied trace-driven simulation in the context of video streaming applications.

    In video streaming, an adaptive bitrate algorithm continually decides the video quality, or bitrate, to transfer to a device based on real-time data on the user’s bandwidth. To test how different adaptive bitrate algorithms impact network performance, researchers can collect real data from users during a video stream for a trace-driven simulation.

    They use these traces to simulate what would have happened to network performance had the platform used a different adaptive bitrate algorithm in the same underlying conditions.

    Researchers have traditionally assumed that trace data are exogenous, meaning they aren’t affected by factors that are changed during the simulation. They would assume that, during the period when they collected the network performance data, the choices the bitrate adaptation algorithm made did not affect those data.

    But this is often a false assumption that results in biases about the behavior of new algorithms, making the simulation invalid, Alizadeh explains.

    “We recognized, and others have recognized, that this way of doing simulation can induce errors. But I don’t think people necessarily knew how significant those errors could be,” he says.

    To develop a solution, Alizadeh and his collaborators framed the issue as a causal inference problem. To collect an unbiased trace, one must understand the different causes that affect the observed data. Some causes are intrinsic to a system, while others are affected by the actions being taken.

    In the video streaming example, network performance is affected by the choices the bitrate adaptation algorithm made — but it’s also affected by intrinsic elements, like network capacity.

    “Our task is to disentangle these two effects, to try to understand what aspects of the behavior we are seeing are intrinsic to the system and how much of what we are observing is based on the actions that were taken. If we can disentangle these two effects, then we can do unbiased simulations,” he says.
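
    A deliberately crude toy example (invented numbers, and nothing like CausalSim’s actual machinery) shows why the disentangling matters: the throughput recorded in a trace already reflects the bitrate choices made during collection, so replaying it as if it were raw network capacity misjudges how a different algorithm would have fared.

    ```python
    # Toy illustration of trace-driven simulation bias; not CausalSim itself.
    # The true network capacity is latent, and the logged throughput depends on
    # the bitrate the data-collection algorithm happened to request.
    import numpy as np

    rng = np.random.default_rng(1)
    T = 1000
    capacity = rng.uniform(1.0, 8.0, size=T)       # latent, intrinsic to the network (Mbps)

    conservative = lambda prev: min(2.0, prev)     # policy used while collecting the trace
    aggressive = lambda prev: prev + 1.0           # new policy we want to evaluate

    def run(policy, channel):
        """Average achieved throughput; channel(t, requested) returns what was delivered."""
        achieved, tp = [], 2.0
        for t in range(T):
            tp = channel(t, policy(tp))
            achieved.append(tp)
        return np.mean(achieved)

    real_channel = lambda t, req: min(req, capacity[t])   # ground truth

    # Collect a trace with the conservative policy, as the platform did.
    trace, tp = [], 2.0
    for t in range(T):
        tp = real_channel(t, conservative(tp))
        trace.append(tp)
    trace = np.array(trace)

    # Naive trace-driven simulation: treat the logged throughput as exogenous capacity.
    naive_channel = lambda t, req: min(req, trace[t])

    print("aggressive policy, ground truth :", round(run(aggressive, real_channel), 2))
    print("aggressive policy, naive replay :", round(run(aggressive, naive_channel), 2))
    ```

    In this toy, the naive replay caps the new algorithm at whatever the conservative algorithm happened to achieve, so it badly underestimates it; recovering the latent capacity from the logged actions is the kind of correction the researchers are after.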

    Learning from data

    But researchers often cannot directly observe intrinsic properties. This is where the new tool, called CausalSim, comes in. The algorithm can learn the underlying characteristics of a system using only the trace data.

    CausalSim takes trace data that were collected through a randomized control trial, and estimates the underlying functions that produced those data. The model tells the researchers, under the exact same underlying conditions that a user experienced, how a new algorithm would change the outcome.

    Using a typical trace-driven simulator, bias might lead a researcher to select a worse-performing algorithm, even though the simulation indicates it should be better. CausalSim helps researchers select the best algorithm that was tested.

    The MIT researchers observed this in practice. When they used CausalSim to design an improved bitrate adaptation algorithm, it led them to select a new variant that had a stall rate that was nearly 1.4 times lower than a well-accepted competing algorithm, while achieving the same video quality. The stall rate is the amount of time a user spent rebuffering the video.

    By contrast, an expert-designed trace-driven simulator predicted the opposite. It indicated that this new variant should cause a stall rate that was nearly 1.3 times higher. The researchers tested the algorithm on real-world video streaming and confirmed that CausalSim was correct.

    “The gains we were getting in the new variant were very close to CausalSim’s prediction, while the expert simulator was way off. This is really exciting because this expert-designed simulator has been used in research for the past decade. If CausalSim can so clearly be better than this, who knows what we can do with it?” says Hamadanian.

    During a 10-month experiment, CausalSim consistently improved simulation accuracy, resulting in algorithms that made about half as many errors as those designed using baseline methods.

    In the future, the researchers want to apply CausalSim to situations where randomized control trial data are not available or where it is especially difficult to recover the causal dynamics of the system. They also want to explore how to design and monitor systems to make them more amenable to causal analysis.

  • Researchers develop novel AI-based estimator for manufacturing medicine

    When medical companies manufacture the pills and tablets that treat any number of illnesses, aches, and pains, they need to isolate the active pharmaceutical ingredient from a suspension and dry it. The process requires a human operator to monitor an industrial dryer, agitate the material, and watch for the compound to take on the right qualities for compressing into medicine. The job depends heavily on the operator’s observations.   

    Methods for making that process less subjective and a lot more efficient are the subject of a recent Nature Communications paper authored by researchers at MIT and Takeda. The paper’s authors devise a way to use physics and machine learning to categorize the rough surfaces that characterize particles in a mixture. The technique, which uses a physics-enhanced autocorrelation-based estimator (PEACE), could change pharmaceutical manufacturing processes for pills and powders, increasing efficiency and accuracy and resulting in fewer failed batches of pharmaceutical products.  

    “Failed batches or failed steps in the pharmaceutical process are very serious,” says Allan Myerson, a professor of practice in the MIT Department of Chemical Engineering and one of the study’s authors. “Anything that improves the reliability of the pharmaceutical manufacturing, reduces time, and improves compliance is a big deal.”

    The team’s work is part of an ongoing collaboration between Takeda and MIT, launched in 2020. The MIT-Takeda Program aims to leverage the experience of both MIT and Takeda to solve problems at the intersection of medicine, artificial intelligence, and health care.

    In pharmaceutical manufacturing, determining whether a compound is adequately mixed and dried ordinarily requires stopping an industrial-sized dryer and taking samples off the manufacturing line for testing. Researchers at Takeda thought artificial intelligence could improve the task and reduce stoppages that slow down production. Originally the research team planned to use videos to train a computer model to replace a human operator. But determining which videos to use to train the model still proved too subjective. Instead, the MIT-Takeda team decided to illuminate particles with a laser during filtration and drying, and measure particle size distribution using physics and machine learning. 

    “We just shine a laser beam on top of this drying surface and observe,” says Qihang Zhang, a doctoral student in MIT’s Department of Electrical Engineering and Computer Science and the study’s first author. 

    A physics-derived equation describes the interaction between the laser and the mixture, while machine learning characterizes the particle sizes. The technique doesn’t require stopping and restarting the drying process, which makes the entire job more secure and more efficient than standard operating procedure, according to George Barbastathis, professor of mechanical engineering at MIT and corresponding author of the study.
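
    The estimator itself is more involved, but the core idea of pulling a characteristic length scale out of an image’s autocorrelation can be sketched generically. The snippet below uses a synthetic image and an arbitrary 1/e cutoff; it is not the PEACE estimator from the paper.

    ```python
    # Generic sketch: estimate a correlation length from the 2D autocorrelation
    # of an intensity image (synthetic data; not the PEACE estimator).
    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.normal(size=(256, 256))
    # Smooth the noise so it has a correlation length of a few pixels.
    kernel = np.outer(np.hanning(9), np.hanning(9))
    image = np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel, s=image.shape)))

    # Autocorrelation via the Wiener-Khinchin theorem.
    spectrum = np.abs(np.fft.fft2(image - image.mean())) ** 2
    autocorr = np.fft.fftshift(np.real(np.fft.ifft2(spectrum)))
    autocorr /= autocorr.max()

    # Radial cut through the central peak.
    cy, cx = np.array(autocorr.shape) // 2
    profile = autocorr[cy, cx:]

    # Characteristic length: first lag where correlation drops below 1/e (arbitrary choice).
    corr_length_px = int(np.argmax(profile < 1 / np.e))
    print("estimated correlation length (pixels):", corr_length_px)
    ```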

    The machine learning algorithm also does not require many datasets to learn its job, because the physics allows for speedy training of the neural network.

    “We utilize the physics to compensate for the lack of training data, so that we can train the neural network in an efficient way,” says Zhang. “Only a tiny amount of experimental data is enough to get a good result.”

    Today, the only inline processes used for particle measurements in the pharmaceutical industry are for slurry products, where crystals float in a liquid. There is no method for measuring particles within a powder during mixing. Powders can be made from slurries, but when a liquid is filtered and dried its composition changes, requiring new measurements. In addition to making the process quicker and more efficient, using the PEACE mechanism makes the job safer because it requires less handling of potentially highly potent materials, the authors say. 

    The ramifications for pharmaceutical manufacturing could be significant, allowing drug production to be more efficient, sustainable, and cost-effective, by reducing the number of experiments companies need to conduct when making products. Monitoring the characteristics of a drying mixture is an issue the industry has long struggled with, according to Charles Papageorgiou, the director of Takeda’s Process Chemistry Development group and one of the study’s authors. 

    “It is a problem that a lot of people are trying to solve, and there isn’t a good sensor out there,” says Papageorgiou. “This is a pretty big step change, I think, with respect to being able to monitor, in real time, particle size distribution.”

    Papageorgiou said that the mechanism could have applications in other industrial pharmaceutical operations. At some point, data from the laser technology may be used to train video-based imaging, allowing manufacturers to use a camera for analysis rather than laser measurements. The company is now working to assess the tool on different compounds in its lab.

    The results come directly from collaboration between Takeda and three MIT departments: Mechanical Engineering, Chemical Engineering, and Electrical Engineering and Computer Science. Over the last three years, researchers at MIT and Takeda have worked together on 19 projects focused on applying machine learning and artificial intelligence to problems in the health-care and medical industry as part of the MIT-Takeda Program. 

    Often, it can take years for academic research to translate to industrial processes. But researchers are hopeful that direct collaboration could shorten that timeline. Takeda is within walking distance of MIT’s campus, which allowed researchers to set up tests in the company’s lab, and real-time feedback from Takeda helped MIT researchers structure their research based on the company’s equipment and operations.

    Combining the expertise and mission of both entities helps researchers ensure their experimental results will have real-world implications. The team has already filed for two patents and has plans to file for a third.

  • Study: Covid-19 has reduced diverse urban interactions

    The Covid-19 pandemic has reduced how often urban residents intersect with people from different income brackets, according to a new study led by MIT researchers.

    Examining the movement of people in four U.S. cities before and after the onset of the pandemic, the study found a 15 to 30 percent decrease in the number of visits residents were making to areas that are socioeconomically different than their own. In turn, this has reduced people’s opportunities to interact with others from varied social and economic spheres.

    “Income diversity of urban encounters decreased during the pandemic, and not just in the lockdown stages,” says Takahiro Yabe, a postdoc at the Media Lab and co-author of a newly published paper detailing the study’s results. “It decreased in the long term as well, after mobility patterns recovered.”

    Indeed, the study found a large immediate dropoff in urban movement in the spring of 2020, when new policies temporarily shuttered many types of institutions and businesses in the U.S. and much of the world due to the emergence of the deadly Covid-19 virus. But even after such restrictions were lifted and the overall amount of urban movement approached prepandemic levels, movement patterns within cities have narrowed; people now visit fewer places.

    “We see that changes like working from home, less exploration, more online shopping, all these behaviors add up,” says Esteban Moro, a research scientist at MIT’s Sociotechnical Systems Research Center (SSRC) and another of the paper’s co-authors. “Working from home is amazing and shopping online is great, but we are not seeing each other at the rates we were before.”

    The paper, “Behavioral changes during the Covid-19 pandemic decreased income diversity of urban encounters,” appears in Nature Communications. The co-authors are Yabe; Bernardo García Bulle Bueno, a doctoral candidate at MIT’s Institute for Data, Systems, and Society (IDSS); Xiaowen Dong, an associate professor at Oxford University; Alex Pentland, professor of media arts and sciences at MIT and the Toshiba Professor at the Media Lab; and Moro, who is also an associate professor at the University Carlos III of Madrid.

    A decline in exploration

    To conduct the study, the researchers examined anonymized cellphone data from 1 million users over a three-year period, starting in early 2019, with data focused on four U.S. cities: Boston, Dallas, Los Angeles, and Seattle. The researchers recorded visits to 433,000 specific “point of interest” locations in those cities, corroborated in part with records from Infogroup’s U.S. Business Database, an annual census of company information.  

    The researchers used U.S. Census Bureau data to categorize the socioeconomic status of the people in the study, placing everyone into one of four income quartiles, based on the average income of the census block (a small area) in which they live. The scholars made the same income-level assessment for every census block in the four cities, then recorded instances in which someone spent from 10 minutes to four hours in a census block other than their own, to see how often people visited areas in different income quartiles. 
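
    As a simplified illustration of that bookkeeping (invented visit records, and a generic entropy score rather than necessarily the paper’s exact metric), one can tally the share of a place’s visits coming from each income quartile and measure how evenly they are mixed:

    ```python
    # Illustrative only: score how evenly a place's visitors span income quartiles.
    # 1.0 = perfectly mixed across the four quartiles, 0.0 = all from one quartile.
    import numpy as np
    from collections import Counter

    # Each record: (place, visitor's home-census-block income quartile 1-4). Made-up data.
    visits = [
        ("museum", 1), ("museum", 2), ("museum", 3), ("museum", 4), ("museum", 2),
        ("grocery", 1), ("grocery", 1), ("grocery", 1), ("grocery", 2),
    ]

    def income_diversity(quartiles, n_groups=4):
        counts = Counter(quartiles)
        shares = np.array([counts.get(q, 0) for q in range(1, n_groups + 1)], dtype=float)
        shares /= shares.sum()
        nonzero = shares[shares > 0]
        entropy = -(nonzero * np.log(nonzero)).sum()
        return entropy / np.log(n_groups)   # normalize so a uniform mix scores 1.0

    for place in ("museum", "grocery"):
        quartiles = [q for p, q in visits if p == place]
        print(place, round(income_diversity(quartiles), 2))   # museum ~0.96, grocery ~0.41
    ```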

    Ultimately, the researchers found that by late 2021, the amount of urban movement overall was returning to prepandemic levels, but the scope of places residents were visiting had become more restricted.

    Among other things, people made many fewer visits to museums, leisure venues, transport sites, and coffee shops. Visits to grocery stores remained fairly constant — but people tend not to leave their socioeconomic circles for grocery shopping.

    “Early in the pandemic, people reduced their mobility radius significantly,” Yabe says. “By late 2021, that decrease flattened out, and the average dwell time people spent at places other than work and home recovered to prepandemic levels. What’s different is that exploration substantially decreased, around 5 to 10 percent. We also see less visitation to fun places.” He adds: “Museums are the most diverse places you can find, parks — they took the biggest hit during the pandemic. Places that are [more] segregated, like grocery stores, did not.”

    Overall, Moro notes, “When we explore less, we go to places that are less diverse.”

    Different cities, same pattern

    Because the study encompassed four cities with different types of policies about reopening public sites and businesses during the pandemic, the researchers could also evaluate what impact public health policies had on urban movement. But even in these different settings, the same phenomenon emerged, with a narrower range of mobility occurring by late 2021.

    “Despite the substantial differences in how cities dealt with Covid-19, the decrease in diversity and the behavioral changes were surprisingly similar across the four cities,” Yabe observes.

    The researchers emphasize that these changes in urban movement can have long-term societal effects. Prior research has shown a significant association between a diversity of social connections and greater economic success for people in lower-income groups. And while some interactions between people in different income quartiles might be brief and transactional, the evidence suggests that, on aggregate, other more substantial connections have also been reduced. Additionally, the scholars note, the narrowing of experience can also weaken civic ties and valuable political connections.

    “It’s creating an urban fabric that is actually more brittle, in the sense that we are less exposed to other people,” Moro says. “We don’t get to know other people in the city, and that is very important for policies and public opinion. We need to convince people that new policies and laws would be fair. And the only way to do that is to know other people’s needs. If we don’t see them around the city, that will be impossible.”

    At the same time, Yabe adds, “I think there is a lot we can do from a policy standpoint to bring people back to places that used to be a lot more diverse.” The researchers are currently developing further studies related to cultural and public institutions, as well as transportation issues, to try to evaluate urban connectivity in additional detail.

    “The quantity of our mobility has recovered,” Yabe says. “The quality has really changed, and we’re more segregated as a result.”

  • Drones navigate unseen environments with liquid neural networks

    In the vast, expansive skies where birds once ruled supreme, a new crop of aviators is taking flight. These pioneers of the air are not living creatures, but rather a product of deliberate innovation: drones. But these aren’t your typical flying bots, humming around like mechanical bees. Rather, they’re avian-inspired marvels that soar through the sky, guided by liquid neural networks to navigate ever-changing and unseen environments with precision and ease.

    Inspired by the adaptable nature of organic brains, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have introduced a method for robust flight navigation agents to master vision-based fly-to-target tasks in intricate, unfamiliar environments. The liquid neural networks, which can continuously adapt to new data inputs, showed prowess in making reliable decisions in unknown domains like forests, urban landscapes, and environments with added noise, rotation, and occlusion. These adaptable models, which outperformed many state-of-the-art counterparts in navigation tasks, could enable potential real-world drone applications like search and rescue, delivery, and wildlife monitoring.

    The researchers’ recent study, published today in Science Robotics, details how this new breed of agents can adapt to significant distribution shifts, a long-standing challenge in the field. The team’s new class of machine-learning algorithms, however, captures the causal structure of tasks from high-dimensional, unstructured data, such as pixel inputs from a drone-mounted camera. These networks can then extract crucial aspects of a task (i.e., understand the task at hand) and ignore irrelevant features, allowing acquired navigation skills to transfer seamlessly to new environments.

    “We are thrilled by the immense potential of our learning-based control approach for robots, as it lays the groundwork for solving problems that arise when training in one environment and deploying in a completely distinct environment without additional training,” says Daniela Rus, CSAIL director and the Andrew (1956) and Erna Viterbi Professor of Electrical Engineering and Computer Science at MIT. “Our experiments demonstrate that we can effectively teach a drone to locate an object in a forest during summer, and then deploy the model in winter, with vastly different surroundings, or even in urban settings, with varied tasks such as seeking and following. This adaptability is made possible by the causal underpinnings of our solutions. These flexible algorithms could one day aid in decision-making based on data streams that change over time, such as medical diagnosis and autonomous driving applications.”

    A daunting challenge was at the forefront: Do machine-learning systems understand the task they are given from data when flying drones to an unlabeled object? And, would they be able to transfer their learned skill and task to new environments with drastic changes in scenery, such as flying from a forest to an urban landscape? What’s more, unlike the remarkable abilities of our biological brains, deep learning systems struggle with capturing causality, frequently over-fitting their training data and failing to adapt to new environments or changing conditions. This is especially troubling for resource-limited embedded systems, like aerial drones, that need to traverse varied environments and respond to obstacles instantaneously. 

    The liquid networks, in contrast, offer promising preliminary indications of their capacity to address this crucial weakness in deep learning systems. The team’s system was first trained on data collected by a human pilot, to see how they transferred learned navigation skills to new environments under drastic changes in scenery and conditions. Unlike traditional neural networks that only learn during the training phase, the liquid neural net’s parameters can change over time, making them not only interpretable, but more resilient to unexpected or noisy data. 
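
    For readers curious what “parameters that can change over time” looks like in practice, here is a heavily simplified sketch in the spirit of published liquid time-constant networks: each neuron follows a small differential equation whose effective time constant depends on the current input, integrated step by step at run time. The class name, weights, and constants below are invented for illustration; this is not the network flown in the study.

    ```python
    # Simplified liquid time-constant (LTC)-style cell, for illustration only.
    # Hidden state follows dx/dt = -(1/tau + f(x, u)) * x + f(x, u) * A,
    # so the effective time constant depends on the input (the "liquid" behavior).
    import numpy as np

    class LTCCell:
        def __init__(self, n_inputs, n_hidden, seed=0):
            rng = np.random.default_rng(seed)
            self.W_in = rng.normal(0, 0.5, (n_hidden, n_inputs))
            self.W_rec = rng.normal(0, 0.5, (n_hidden, n_hidden))
            self.bias = np.zeros(n_hidden)
            self.tau = np.ones(n_hidden)            # base time constants
            self.A = rng.normal(0, 1.0, n_hidden)   # per-neuron resting targets

        def _gate(self, x, u):
            # Nonlinearity shared by the decay and drive terms.
            return 1 / (1 + np.exp(-(self.W_rec @ x + self.W_in @ u + self.bias)))

        def step(self, x, u, dt=0.05):
            # One explicit-Euler integration step of the ODE above.
            f = self._gate(x, u)
            dxdt = -(1 / self.tau + f) * x + f * self.A
            return x + dt * dxdt

    cell = LTCCell(n_inputs=3, n_hidden=8)
    x = np.zeros(8)
    for t in range(100):                            # feed a toy input stream
        u = np.array([np.sin(0.1 * t), np.cos(0.1 * t), 1.0])
        x = cell.step(x, u)
    print("final hidden state:", np.round(x, 3))
    ```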

    In a series of quadrotor closed-loop control experiments, the drones underwent range tests, stress tests, target rotation and occlusion, hiking with adversaries, triangular loops between objects, and dynamic target tracking. They tracked moving targets, and executed multi-step loops between objects in never-before-seen environments, surpassing the performance of other cutting-edge counterparts.

    The team believes that the ability to learn from limited expert data and understand a given task while generalizing to new environments could make autonomous drone deployment more efficient, cost-effective, and reliable. Liquid neural networks, they noted, could enable autonomous air mobility drones to be used for environmental monitoring, package delivery, autonomous vehicles, and robotic assistants. 

    “The experimental setup presented in our work tests the reasoning capabilities of various deep learning systems in controlled and straightforward scenarios,” says MIT CSAIL Research Affiliate Ramin Hasani. “There is still so much room left for future research and development on more complex reasoning challenges for AI systems in autonomous navigation applications, which has to be tested before we can safely deploy them in our society.”

    “Robust learning and performance in out-of-distribution tasks and scenarios are some of the key problems that machine learning and autonomous robotic systems have to conquer to make further inroads in society-critical applications,” says Alessio Lomuscio, professor of AI safety in the Department of Computing at Imperial College London. “In this context, the performance of liquid neural networks, a novel brain-inspired paradigm developed by the authors at MIT, reported in this study is remarkable. If these results are confirmed in other experiments, the paradigm here developed will contribute to making AI and robotic systems more reliable, robust, and efficient.”

    Clearly, the sky is no longer the limit, but rather a vast playground for the boundless possibilities of these airborne marvels. 

    Hasani and PhD student Makram Chahine; Patrick Kao ’22, MEng ’22; and PhD student Aaron Ray SM ’21 wrote the paper with Ryan Shubert ’20, MEng ’22; MIT postdocs Mathias Lechner and Alexander Amini; and Rus.

    This research was supported, in part, by Schmidt Futures, the U.S. Air Force Research Laboratory, the U.S. Air Force Artificial Intelligence Accelerator, and the Boeing Co.

  • Study: Shutting down nuclear power could increase air pollution

    Nearly 20 percent of today’s electricity in the United States comes from nuclear power. The U.S. has the largest nuclear fleet in the world, with 92 reactors scattered around the country. Many of these power plants have run for more than half a century and are approaching the end of their expected lifetimes.

    Policymakers are debating whether to retire the aging reactors or reinforce their structures to continue producing nuclear energy, which many consider a low-carbon alternative to climate-warming coal, oil, and natural gas.

    Now, MIT researchers say there’s another factor to consider in weighing the future of nuclear power: air quality. In addition to being a low carbon-emitting source, nuclear power is relatively clean in terms of the air pollution it generates. Without nuclear power, how would the pattern of air pollution shift, and who would feel its effects?

    The MIT team took on these questions in a new study appearing today in Nature Energy. They lay out a scenario in which every nuclear power plant in the country has shut down, and consider how other sources such as coal, natural gas, and renewable energy would fill the resulting energy needs throughout an entire year.

    Their analysis reveals that indeed, air pollution would increase, as coal, gas, and oil sources ramp up to compensate for nuclear power’s absence. This in itself may not be surprising, but the team has put numbers to the prediction, estimating that the increase in air pollution would have serious health effects, resulting in an additional 5,200 pollution-related deaths over a single year.

    If, however, more renewable energy sources become available to supply the energy grid, as they are expected to by the year 2030, air pollution would be curtailed, though not entirely. The team found that even under this heartier renewable scenario, there is still a slight increase in air pollution in some parts of the country, resulting in a total of 260 pollution-related deaths over one year.

    When they looked at the populations directly affected by the increased pollution, they found that Black or African American communities — a disproportionate number of whom live near fossil-fuel plants — experienced the greatest exposure.

    “This adds one more layer to the environmental health and social impacts equation when you’re thinking about nuclear shutdowns, where the conversation often focuses on local risks due to accidents and mining or long-term climate impacts,” says lead author Lyssa Freese, a graduate student in MIT’s Department of Earth, Atmospheric and Planetary Sciences (EAPS).

    “In the debate over keeping nuclear power plants open, air quality has not been a focus of that discussion,” adds study author Noelle Selin, a professor in MIT’s Institute for Data, Systems, and Society (IDSS) and EAPS. “What we found was that air pollution from fossil fuel plants is so damaging, that anything that increases it, such as a nuclear shutdown, is going to have substantial impacts, and for some people more than others.”

    The study’s MIT-affiliated co-authors also include Principal Research Scientist Sebastian Eastham and Guillaume Chossière SM ’17, PhD ’20, along with Alan Jenn of the University of California at Davis.

    Future phase-outs

    When nuclear power plants have closed in the past, fossil fuel use increased in response. In 1985, the closure of reactors in the Tennessee Valley prompted a spike in coal use, while the 2012 shutdown of a plant in California led to an increase in natural gas. In Germany, where nuclear power has almost completely been phased out, coal-fired power increased initially to fill the gap.

    Noting these trends, the MIT team wondered how the U.S. energy grid would respond if nuclear power were completely phased out.

    “We wanted to think about what future changes were expected in the energy grid,” Freese says. “We knew that coal use was declining, and there was a lot of work already looking at the impact of what that would have on air quality. But no one had looked at air quality and nuclear power, which we also noticed was on the decline.”

    In the new study, the team used an energy grid dispatch model developed by Jenn to assess how the U.S. energy system would respond to a shutdown of nuclear power. The model simulates the production of every power plant in the country and runs continuously to estimate, hour by hour, the energy demands in 64 regions across the country.

    Much like the way the actual energy market operates, the model chooses to turn a plant’s production up or down based on cost: Plants producing the cheapest energy at any given time are given priority to supply the grid over more costly energy sources.
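
    The dispatch model itself is far more detailed, but its basic merit-order logic can be sketched as follows. The plant names, capacities, costs, and demand below are invented; this is not the model developed by Jenn.

    ```python
    # Minimal merit-order dispatch sketch (not the study's grid model).
    # Plants are dispatched cheapest-first until the hour's demand is met.

    plants = [  # (name, capacity in MW, marginal cost in $/MWh): made-up numbers
        ("nuclear_1", 1000, 10),
        ("coal_1",     800, 35),
        ("gas_1",      600, 50),
        ("gas_peaker", 300, 90),
    ]

    def dispatch(demand_mw, fleet):
        """Return each plant's output for one hour of demand, cheapest plants first."""
        output, remaining = {}, demand_mw
        for name, capacity, cost in sorted(fleet, key=lambda p: p[2]):
            produced = min(capacity, max(remaining, 0))
            output[name] = produced
            remaining -= produced
        return output

    print(dispatch(2000, plants))                                      # nuclear runs first
    print(dispatch(2000, [p for p in plants if p[0] != "nuclear_1"]))  # nuclear retired
    ```

    With the nuclear plant retired in this toy fleet, the coal and gas units run at full output (and 300 MW of demand still goes unserved), which is the kind of substitution that drives the pollution increases the study estimates.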

    The team fed the model available data on each plant’s changing emissions and energy costs throughout an entire year. They then ran the model under different scenarios, including: an energy grid with no nuclear power, a baseline grid similar to today’s that includes nuclear power, and a grid with no nuclear power that also incorporates the additional renewable sources that are expected to be added by 2030.

    They combined each simulation with an atmospheric chemistry model to simulate how each plant’s various emissions travel around the country and to overlay these tracks onto maps of population density. For populations in the path of pollution, they calculated the risk of premature death based on their degree of exposure.

    System response

    Their analysis showed a clear pattern: Without nuclear power, air pollution worsened in general, mainly affecting regions on the East Coast, where nuclear power plants are mostly concentrated. Without those plants, the team observed an uptick in production from coal and gas plants, resulting in 5,200 pollution-related deaths across the country, compared to the baseline scenario.

    They also calculated that more people are likely to die prematurely due to climate impacts from the increase in carbon dioxide emissions, as the grid compensates for nuclear power’s absence. The climate-related effects from this additional influx of carbon dioxide could lead to 160,000 additional deaths over the next century.

    “We need to be thoughtful about how we’re retiring nuclear power plants if we are trying to think about them as part of an energy system,” Freese says. “Shutting down something that doesn’t have direct emissions itself can still lead to increases in emissions, because the grid system will respond.”

    “This might mean that we need to deploy even more renewables, in order to fill the hole left by nuclear, which is essentially a zero-emissions energy source,” Selin adds. “Otherwise we will have a reduction in air quality that we weren’t necessarily counting on.”

    This study was supported, in part, by the U.S. Environmental Protection Agency.