More stories

  • Artificial intelligence predicts patients’ race from their medical images

    The miseducation of algorithms is a critical problem; when artificial intelligence mirrors the unconscious thoughts, racism, and biases of the humans who generated these algorithms, it can lead to serious harm. Computer programs, for example, have wrongly flagged Black defendants as twice as likely to reoffend as white defendants. When an AI used cost as a proxy for health needs, it falsely labeled Black patients as healthier than equally sick white ones, because less money was spent on them. Even an AI used to write a play relied on harmful stereotypes for casting. 

    Removing sensitive features from the data seems like a viable tweak. But what happens when it’s not enough? 

    Examples of bias in natural language processing are boundless — but MIT scientists have investigated another important, largely underexplored modality: medical images. Using both private and public datasets, the team found that AI can accurately predict the self-reported race of patients from medical images alone. Using imaging data from chest X-rays, limb X-rays, chest CT scans, and mammograms, the team trained a deep learning model to identify race as white, Black, or Asian — even though the images themselves contained no explicit mention of the patient’s race. This is a feat even the most seasoned physicians cannot perform, and it is not clear how the model was able to do it. 
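
    To make the setup concrete, a classifier of this kind could be fine-tuned from a standard pretrained image network. The sketch below is a hypothetical illustration of that idea, not the authors’ code; the dataset layout, architecture, and hyperparameters are all invented.

```python
# Hypothetical sketch, not the authors' code: fine-tune a pretrained CNN to
# predict self-reported race (Asian, Black, white) from chest X-ray images.
# The dataset path, architecture, and hyperparameters are invented.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # X-rays are single-channel
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Assumed folder layout: xray_train/{asian,black,white}/*.png
train_set = datasets.ImageFolder("xray_train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet34(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 3)  # three self-reported race labels

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:           # one pass over the training data
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```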

    In an attempt to tease out and make sense of the enigmatic “how” of it all, the researchers ran a slew of experiments. To investigate possible mechanisms of race detection, they examined variables such as differences in anatomy, bone density, and image resolution, among many others, and the models still detected race from chest X-rays with high accuracy. “These results were initially confusing, because the members of our research team could not come anywhere close to identifying a good proxy for this task,” says paper co-author Marzyeh Ghassemi, an assistant professor in the MIT Department of Electrical Engineering and Computer Science and the Institute for Medical Engineering and Science (IMES), who is an affiliate of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and of the MIT Jameel Clinic. “Even when you filter medical images past where the images are recognizable as medical images at all, deep models maintain a very high performance. That is concerning because superhuman capacities are generally much more difficult to control, regulate, and prevent from harming people.”

    In a clinical setting, algorithms can help tell us whether a patient is a candidate for chemotherapy, dictate the triage of patients, or decide whether a transfer to the ICU is necessary. “We think that the algorithms are only looking at vital signs or laboratory tests, but it’s possible they’re also looking at your race, ethnicity, sex, whether you’re incarcerated or not — even if all of that information is hidden,” says paper co-author Leo Anthony Celi, principal research scientist in IMES at MIT and associate professor of medicine at Harvard Medical School. “Just because you have representation of different groups in your algorithms, that doesn’t guarantee it won’t perpetuate or magnify existing disparities and inequities. Feeding the algorithms with more data with representation is not a panacea. This paper should make us pause and truly reconsider whether we are ready to bring AI to the bedside.” 

    The study, “AI recognition of patient race in medical imaging: a modeling study,” was published in Lancet Digital Health on May 11. Celi and Ghassemi wrote the paper alongside 20 other authors in four countries.

    To set up the tests, the scientists first showed that the models were able to predict race across multiple imaging modalities, various datasets, and diverse clinical tasks, as well as across a range of academic centers and patient populations in the United States. They used three large chest X-ray datasets, testing the model both on an unseen subset of the dataset used for training and on a completely different one. Next, they trained the racial identity detection models on non-chest-X-ray images from multiple body locations, including digital radiography, mammography, lateral cervical spine radiographs, and chest CTs, to see whether the model’s performance was limited to chest X-rays. 

    The team covered many bases in an attempt to explain the model’s behavior: differences in physical characteristics between racial groups (body habitus, breast density), disease distribution (previous studies have shown that Black patients have a higher incidence of health issues such as cardiac disease), location-specific or tissue-specific differences, effects of societal bias and environmental stress, the ability of deep learning systems to detect race when multiple demographic and patient factors were combined, and whether specific image regions contributed to recognizing race. 

    What emerged was truly staggering: the models’ ability to predict race from diagnostic labels alone was much lower than that of the chest X-ray image-based models. 

    For example, the bone density test used images in which the thicker part of the bone appeared white and the thinner part appeared more gray or translucent. Scientists assumed that since Black people generally have higher bone mineral density, the color differences helped the AI models to detect race. To cut that off, they clipped the images with a filter so the model couldn’t see color differences. It turned out that removing the color information didn’t faze the model — it could still accurately predict race. (The area under the curve, or AUC, a measure of the accuracy of a quantitative diagnostic test, was 0.94–0.96.) As such, the learned features of the model appeared to rely on all regions of the image, meaning that controlling this type of algorithmic behavior presents a messy, challenging problem. 
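
    For readers unfamiliar with the metric, AUC summarizes how well a model’s scores rank positive cases above negative ones, with 1.0 meaning perfect ranking and 0.5 meaning chance. A minimal illustration using scikit-learn and made-up numbers (not the study’s data):

```python
# Illustration of the metric cited above: area under the ROC curve (AUC),
# computed with scikit-learn on invented scores, not the study's data.
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]                    # true labels
y_score = [0.1, 0.3, 0.9, 0.8, 0.7, 0.2, 0.95, 0.4]   # predicted probabilities

print(roc_auc_score(y_true, y_score))  # 1.0 = perfect ranking, 0.5 = chance
```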

    The scientists acknowledge the limited availability of racial identity labels, which led them to focus on Asian, Black, and white populations, and note that their ground truth was a self-reported detail. Forthcoming work may look at isolating different signals before image reconstruction, because, as with the bone density experiments, they couldn’t account for residual bone tissue in the images. 

    Notably, other work by Ghassemi and Celi led by MIT student Hammaad Adam has found that models can also identify patient self-reported race from clinical notes even when those notes are stripped of explicit indicators of race. Just as in this work, human experts are not able to accurately predict patient race from the same redacted clinical notes.

    “We need to bring social scientists into the picture. Domain experts, which are usually the clinicians, public health practitioners, computer scientists, and engineers are not enough. Health care is a social-cultural problem just as much as it’s a medical problem. We need another group of experts to weigh in and to provide input and feedback on how we design, develop, deploy, and evaluate these algorithms,” says Celi. “We need to also ask the data scientists, before any exploration of the data, are there disparities? Which patient groups are marginalized? What are the drivers of those disparities? Is it access to care? Is it from the subjectivity of the care providers? If we don’t understand that, we won’t have a chance of being able to identify the unintended consequences of the algorithms, and there’s no way we’ll be able to safeguard the algorithms from perpetuating biases.”

    “The fact that algorithms ‘see’ race, as the authors convincingly document, can be dangerous. But an important and related fact is that, when used carefully, algorithms can also work to counter bias,” says Ziad Obermeyer, associate professor at the University of California at Berkeley, whose research focuses on AI applied to health. “In our own work, led by computer scientist Emma Pierson at Cornell, we show that algorithms that learn from patients’ pain experiences can find new sources of knee pain in X-rays that disproportionately affect Black patients — and are disproportionately missed by radiologists. So just like any tool, algorithms can be a force for evil or a force for good — which one depends on us, and the choices we make when we build algorithms.”

    The work is supported, in part, by the National Institutes of Health.

  • Living better with algorithms

    Laboratory for Information and Decision Systems (LIDS) student Sarah Cen remembers the lecture that sent her down the track to an upstream question.

    At a talk on ethical artificial intelligence, the speaker brought up a variation on the famous trolley problem, which outlines a philosophical choice between two undesirable outcomes.

    The speaker’s scenario: Say a self-driving car is traveling down a narrow alley with an elderly woman walking on one side and a small child on the other, and no way to thread between both without a fatality. Who should the car hit?

    Then the speaker said: Let’s take a step back. Is this the question we should even be asking?

    That’s when things clicked for Cen. Instead of considering the point of impact, a self-driving car could have avoided choosing between two bad outcomes by making a decision earlier on — the speaker pointed out that, when entering the alley, the car could have determined that the space was narrow and slowed to a speed that would keep everyone safe.

    Recognizing that today’s AI safety approaches often resemble the trolley problem, focusing on downstream regulation such as liability after someone is left with no good choices, Cen wondered: What if we could design better upstream and downstream safeguards against such problems? This question has informed much of Cen’s work.

    “Engineering systems are not divorced from the social systems on which they intervene,” Cen says. Ignoring this fact risks creating tools that fail to be useful when deployed or, more worryingly, that are harmful.

    Cen arrived at LIDS in 2018 via a slightly roundabout route. She first got a taste for research during her undergraduate degree at Princeton University, where she majored in mechanical engineering. For her master’s degree, she changed course, working on radar solutions in mobile robotics (primarily for self-driving cars) at Oxford University. There, she developed an interest in AI algorithms, curious about when and why they misbehave. So, she came to MIT and LIDS for her doctoral research, working with Professor Devavrat Shah in the Department of Electrical Engineering and Computer Science, for a stronger theoretical grounding in information systems.

    Auditing social media algorithms

    Together with Shah and other collaborators, Cen has worked on a wide range of projects during her time at LIDS, many of which tie directly to her interest in the interactions between humans and computational systems. In one such project, Cen studies options for regulating social media. Her recent work provides a method for translating human-readable regulations into implementable audits.

    To get a sense of what this means, suppose that regulators require that any public health content — for example, on vaccines — not be vastly different for politically left- and right-leaning users. How should auditors check that a social media platform complies with this regulation? Can a platform be made to comply with the regulation without damaging its bottom line? And how does compliance affect the actual content that users do see?

    Designing an auditing procedure is difficult in large part because there are so many stakeholders when it comes to social media. Auditors have to inspect the algorithm without accessing sensitive user data. They also have to work around legally protected trade secrets, which can prevent them from getting a close look at the very algorithm they are auditing. Other considerations come into play as well, such as balancing the removal of misinformation with the protection of free speech.

    To meet these challenges, Cen and Shah developed an auditing procedure that does not need more than black-box access to the social media algorithm (which respects trade secrets), does not remove content (which avoids issues of censorship), and does not require access to users (which preserves users’ privacy).

    In their design process, the team also analyzed the properties of their auditing procedure, finding that it ensures a desirable property they call decision robustness. As good news for the platform, they show that a platform can pass the audit without sacrificing profits. Interestingly, they also found the audit naturally incentivizes the platform to show users diverse content, which is known to help reduce the spread of misinformation, counteract echo chambers, and more.

    Who gets good outcomes and who gets bad ones?

    In another line of research, Cen looks at whether people can receive good long-term outcomes when they not only compete for resources, but also don’t know upfront what resources are best for them.

    Some platforms, such as job-search platforms or ride-sharing apps, are part of what is called a matching market, which uses an algorithm to match one set of individuals (such as workers or riders) with another (such as employers or drivers). In many cases, individuals have matching preferences that they learn through trial and error. In labor markets, for example, workers learn their preferences about what kinds of jobs they want, and employers learn their preferences about the qualifications they seek from workers.

    But learning can be disrupted by competition. If workers with a particular background are repeatedly denied jobs in tech because of high competition for tech jobs, for instance, they may never get the knowledge they need to make an informed decision about whether they want to work in tech. Similarly, tech employers may never see and learn what these workers could do if they were hired.

    Cen’s work examines this interaction between learning and competition, studying whether it is possible for individuals on both sides of the matching market to walk away happy.

    Modeling such matching markets, Cen and Shah found that it is indeed possible to get to a stable outcome (workers aren’t incentivized to leave the matching market), with low regret (workers are happy with their long-term outcomes), fairness (happiness is evenly distributed), and high social welfare.

    Interestingly, it’s not obvious that it’s possible to get stability, low regret, fairness, and high social welfare simultaneously.  So another important aspect of the research was uncovering when it is possible to achieve all four criteria at once and exploring the implications of those conditions.

    What is the effect of X on Y?

    For the next few years, though, Cen plans to work on a new project, studying how to quantify the effect of an action X on an outcome Y when it’s expensive — or impossible — to measure this effect, focusing in particular on systems that have complex social behaviors.

    For instance, when Covid-19 cases surged in the pandemic, many cities had to decide what restrictions to adopt, such as mask mandates, business closures, or stay-home orders. They had to act fast and balance public health with community and business needs, public spending, and a host of other considerations.

    Typically, in order to estimate the effect of restrictions on the rate of infection, one might compare the rates of infection in areas that underwent different interventions. If one county has a mask mandate while its neighboring county does not, one might think comparing the counties’ infection rates would reveal the effectiveness of mask mandates. 

    But of course, no county exists in a vacuum. If, for instance, people from both counties gather every week to watch a football game in the maskless county, the two populations mix. These complex interactions matter, and Cen plans to study questions of cause and effect in such settings.

    “We’re interested in how decisions or interventions affect an outcome of interest, such as how criminal justice reform affects incarceration rates or how an ad campaign might change the public’s behaviors,” Cen says.

    Cen has also applied the principles of promoting inclusivity to her work in the MIT community.

    As one of three co-presidents of the Graduate Women in MIT EECS student group, she helped organize the inaugural GW6 research summit featuring the research of women graduate students — not only to showcase positive role models to students, but also to highlight the many successful graduate women at MIT who are not to be underestimated.

    Whether in computing or in the community, a system taking steps to address bias is one that enjoys legitimacy and trust, Cen says. “Accountability, legitimacy, trust — these principles play crucial roles in society and, ultimately, will determine which systems endure with time.”

  • On the road to cleaner, greener, and faster driving

    No one likes sitting at a red light. But signalized intersections aren’t just a minor nuisance for drivers; vehicles consume fuel and emit greenhouse gases while waiting for the light to change.

    What if motorists could time their trips so they arrive at the intersection when the light is green? While that might be just a lucky break for a human driver, it could be achieved more consistently by an autonomous vehicle that uses artificial intelligence to control its speed.

    In a new study, MIT researchers demonstrate a machine-learning approach that can learn to control a fleet of autonomous vehicles as they approach and travel through a signalized intersection in a way that keeps traffic flowing smoothly.

    Using simulations, they found that their approach reduces fuel consumption and emissions while improving average vehicle speed. The technique gets the best results if all cars on the road are autonomous, but even if only 25 percent use their control algorithm, it still leads to substantial fuel and emissions benefits.

    “This is a really interesting place to intervene. No one’s life is better because they were stuck at an intersection. With a lot of other climate change interventions, there is a quality-of-life difference that is expected, so there is a barrier to entry there. Here, the barrier is much lower,” says senior author Cathy Wu, the Gilbert W. Winslow Career Development Assistant Professor in the Department of Civil and Environmental Engineering and a member of the Institute for Data, Systems, and Society (IDSS) and the Laboratory for Information and Decision Systems (LIDS).

    The lead author of the study is Vindula Jayawardana, a graduate student in LIDS and the Department of Electrical Engineering and Computer Science. The research will be presented at the European Control Conference.

    Intersection intricacies

    While humans may drive past a green light without giving it much thought, intersections can present billions of different scenarios depending on the number of lanes, how the signals operate, the number of vehicles and their speeds, the presence of pedestrians and cyclists, etc.

    Typical approaches for tackling intersection control problems use mathematical models to solve one simple, ideal intersection. That looks good on paper, but likely won’t hold up in the real world, where traffic patterns are often about as messy as they come.

    Wu and Jayawardana shifted gears and approached the problem using a model-free technique known as deep reinforcement learning. Reinforcement learning is a trial-and-error method where the control algorithm learns to make a sequence of decisions. It is rewarded when it finds a good sequence. With deep reinforcement learning, the algorithm leverages assumptions learned by a neural network to find shortcuts to good sequences, even if there are billions of possibilities.

    This is useful for solving a long-horizon problem like this; the control algorithm must issue upwards of 500 acceleration instructions to a vehicle over an extended time period, Wu explains.

    “And we have to get the sequence right before we know that we have done a good job of mitigating emissions and getting to the intersection at a good speed,” she adds.

    But there’s an additional wrinkle. The researchers want the system to learn a strategy that reduces fuel consumption and limits the impact on travel time. These goals can be conflicting.

    “To reduce travel time, we want the car to go fast, but to reduce emissions, we want the car to slow down or not move at all. Those competing rewards can be very confusing to the learning agent,” Wu says.

    While it is challenging to solve this problem in its full generality, the researchers employed a workaround using a technique known as reward shaping. With reward shaping, they give the system some domain knowledge it is unable to learn on its own. In this case, they penalized the system whenever the vehicle came to a complete stop, so it would learn to avoid that action.
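
    As a rough illustration of that reward-shaping idea, the toy function below rewards speed, penalizes fuel use, and adds an explicit penalty for coming to a complete stop. The terms and weights are invented for this sketch and are not taken from the study.

```python
# A minimal sketch of reward shaping as described above: reward progress and
# fuel efficiency, and penalize a complete stop. Weights are illustrative only.
def shaped_reward(speed_m_s, fuel_rate_g_s, stop_penalty=1.0,
                  w_speed=0.1, w_fuel=0.05):
    reward = w_speed * speed_m_s - w_fuel * fuel_rate_g_s
    if speed_m_s < 0.1:          # treat near-zero speed as a full stop
        reward -= stop_penalty   # injected domain knowledge: avoid stopping
    return reward
```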

    Traffic tests

    Once they developed an effective control algorithm, they evaluated it using a traffic simulation platform with a single intersection. The control algorithm is applied to a fleet of connected autonomous vehicles, which can communicate with upcoming traffic lights to receive signal phase and timing information and observe their immediate surroundings. The control algorithm tells each vehicle how to accelerate and decelerate.

    Their system didn’t create any stop-and-go traffic as vehicles approached the intersection. (Stop-and-go traffic occurs when cars are forced to come to a complete stop because of stopped traffic ahead.) In simulations, more cars made it through in a single green phase, outperforming a model that simulates human drivers. When compared with other optimization methods also designed to avoid stop-and-go traffic, their technique produced larger reductions in fuel consumption and emissions. If every vehicle on the road is autonomous, their control system can reduce fuel consumption by 18 percent and carbon dioxide emissions by 25 percent, while boosting travel speeds by 20 percent.

    “A single intervention having 20 to 25 percent reduction in fuel or emissions is really incredible. But what I find interesting, and was really hoping to see, is this non-linear scaling. If we only control 25 percent of vehicles, that gives us 50 percent of the benefits in terms of fuel and emissions reduction. That means we don’t have to wait until we get to 100 percent autonomous vehicles to get benefits from this approach,” she says.

    Down the road, the researchers want to study interaction effects between multiple intersections. They also plan to explore how different intersection set-ups (number of lanes, signals, timings, etc.) can influence travel time, emissions, and fuel consumption. In addition, they intend to study how their control system could impact safety when autonomous vehicles and human drivers share the road. For instance, even though autonomous vehicles may drive differently than human drivers, slower roadways and roadways with more consistent speeds could improve safety, Wu says.

    While this work is still in its early stages, Wu sees this approach as one that could be more feasibly implemented in the near-term.

    “The aim in this work is to move the needle in sustainable mobility. We want to dream, as well, but these systems are big monsters of inertia. Identifying points of intervention that are small changes to the system but have significant impact is something that gets me up in the morning,” she says.  

    This work was supported, in part, by the MIT-IBM Watson AI Lab.

  • Technique protects privacy when making online recommendations

    Algorithms recommend products while we shop online or suggest songs we might like as we listen to music on streaming apps.

    These algorithms work by using personal information like our past purchases and browsing history to generate tailored recommendations. The sensitive nature of such data makes preserving privacy extremely important, but existing methods for solving this problem rely on heavy cryptographic tools requiring enormous amounts of computation and bandwidth.

    MIT researchers may have a better solution. They developed a privacy-preserving protocol that is so efficient it can run on a smartphone over a very slow network. Their technique safeguards personal data while ensuring recommendation results are accurate.

    In addition to user privacy, their protocol minimizes the unauthorized transfer of information from the database, known as leakage, even if a malicious agent tries to trick a database into revealing secret information.

    The new protocol could be especially useful in situations where data leaks could violate user privacy laws, like when a health care provider uses a patient’s medical history to search a database for other patients who had similar symptoms or when a company serves targeted advertisements to users under European privacy regulations.

    “This is a really hard problem. We relied on a whole string of cryptographic and algorithmic tricks to arrive at our protocol,” says Sacha Servan-Schreiber, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of the paper that presents this new protocol.

    Servan-Schreiber wrote the paper with fellow CSAIL graduate student Simon Langowski and their advisor and senior author Srinivas Devadas, the Edwin Sibley Webster Professor of Electrical Engineering. The research will be presented at the IEEE Symposium on Security and Privacy.

    The data next door

    The technique at the heart of algorithmic recommendation engines is known as a nearest neighbor search, which involves finding the data point in a database that is closest to a query point. Data points that are mapped nearby share similar attributes and are called neighbors.

    These searches involve a server that is linked with an online database which contains concise representations of data point attributes. In the case of a music streaming service, those attributes, known as feature vectors, could be the genre or popularity of different songs.

    To find a song recommendation, the client (user) sends a query to the server that contains a certain feature vector, like a genre of music the user likes or a compressed history of their listening habits. The server then provides the ID of a feature vector in the database that is closest to the client’s query, without revealing the actual vector. In the case of music streaming, that ID would likely be a song title. The client learns the recommended song title without learning the feature vector associated with it.
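
    Stripped of any privacy protection, the underlying retrieval step is simple; the toy sketch below just returns the ID of the closest database vector, with all names and numbers invented. The hard part, as the quote and sections below explain, is doing this computation without the server seeing the numbers involved.

```python
# A toy, non-private version of the nearest-neighbor step described above:
# the "server" returns only the ID of the closest feature vector. Illustrative.
import numpy as np

database = {
    "song_a": np.array([0.9, 0.1, 0.0]),   # hypothetical feature vectors
    "song_b": np.array([0.2, 0.8, 0.1]),
    "song_c": np.array([0.1, 0.2, 0.9]),
}

def nearest_neighbor_id(query):
    return min(database, key=lambda k: np.linalg.norm(database[k] - query))

print(nearest_neighbor_id(np.array([0.15, 0.75, 0.2])))  # -> "song_b"
```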

    “The server has to be able to do this computation without seeing the numbers it is doing the computation on. It can’t actually see the features, but still needs to give you the closest thing in the database,” says Langowski.

    To achieve this, the researchers created a protocol that relies on two separate servers that access the same database. Using two servers makes the process more efficient and enables the use of a cryptographic technique known as private information retrieval. This technique allows a client to query a database without revealing what it is searching for, Servan-Schreiber explains.
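
    To give a sense of what private information retrieval looks like, here is a minimal sketch of the classic textbook two-server construction based on XOR, in which each server sees only a random-looking set of indices. It illustrates the primitive only; it is not the protocol developed in this paper.

```python
# Textbook two-server PIR sketch (not the authors' protocol): the client asks
# each non-colluding server for the XOR of a random-looking subset of bits,
# and XORs the two answers to recover the bit it wants.
import random

def server_answer(bits, index_set):
    ans = 0
    for i in index_set:
        ans ^= bits[i]          # server XORs the requested bits
    return ans

def private_lookup(bits, i):
    n = len(bits)
    s1 = {j for j in range(n) if random.random() < 0.5}  # random subset
    s2 = s1 ^ {i}               # same subset with index i toggled
    a1 = server_answer(bits, s1)   # query to server 1
    a2 = server_answer(bits, s2)   # query to server 2
    return a1 ^ a2              # everything cancels except bit i

db = [1, 0, 1, 1, 0, 0, 1, 0]
print(private_lookup(db, 3))    # -> 1, without either server learning the index
```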

    Overcoming security challenges

    But while private information retrieval is secure on the client side, it doesn’t provide database privacy on its own. The database offers a set of candidate vectors — possible nearest neighbors — for the client, which are typically winnowed down later by the client using brute force. However, doing so can reveal a lot about the database to the client. The additional privacy challenge is to prevent the client from learning those extra vectors. 

    The researchers employed a tuning technique that eliminates many of the extra vectors in the first place, and then used a different trick, which they call oblivious masking, to hide any additional data points except for the actual nearest neighbor. This efficiently preserves database privacy, so the client won’t learn anything about the feature vectors in the database.  

    Once they designed this protocol, they tested it with a nonprivate implementation on four real-world datasets to determine how to tune the algorithm to maximize accuracy. Then, they used their protocol to conduct private nearest neighbor search queries on those datasets.

    Their technique requires a few seconds of server processing time per query and less than 10 megabytes of communication between the client and servers, even with databases that contained more than 10 million items. By contrast, other secure methods can require gigabytes of communication or hours of computation time. With each query, their method achieved greater than 95 percent accuracy (meaning that nearly every time it found the actual approximate nearest neighbor to the query point). 

    The techniques they used to enable database privacy will thwart a malicious client even if it sends false queries to try and trick the server into leaking information.

    “A malicious client won’t learn much more information than an honest client following protocol. And it protects against malicious servers, too. If one deviates from protocol, you might not get the right result, but they will never learn what the client’s query was,” Langowski says.

    In the future, the researchers plan to adjust the protocol so it can preserve privacy using only one server. This could enable it to be applied in more real-world situations, since it would not require the use of two noncolluding entities (which don’t share information with each other) to manage the database.  

    “Nearest neighbor search undergirds many critical machine-learning driven applications, from providing users with content recommendations to classifying medical conditions. However, it typically requires sharing a lot of data with a central system to aggregate and enable the search,” says Bayan Bruss, head of applied machine-learning research at Capital One, who was not involved with this work. “This research provides a key step towards ensuring that the user receives the benefits from nearest neighbor search while having confidence that the central system will not use their data for other purposes.”

  • MIT to launch new Office of Research Computing and Data

    As the computing and data needs of MIT’s research community continue to grow — both in their quantity and complexity — the Institute is launching a new effort to ensure that researchers have access to the advanced computing resources and data management services they need to do their best work. 

    At the core of this effort is the creation of the new Office of Research Computing and Data (ORCD), to be led by Professor Peter Fisher, who will step down as head of the Department of Physics to serve as the office’s inaugural director. The office, which formally opens in September, will build on and replace the MIT Research Computing Project, an initiative supported by the Office of the Vice President for Research, which contributed in recent years to improving the computing resources available to MIT researchers.

    “Almost every scientific field makes use of research computing to carry out our mission at MIT — and computing needs vary between different research groups. In my world, high-energy physics experiments need large amounts of storage and many identical general-purpose CPUs, while astrophysical theorists simulating the formation of galaxy clusters need relatively little storage, but many CPUs with high-speed connections between them,” says Fisher, the Thomas A. Frank (1977) Professor of Physics, who will take up the mantle of ORCD director on Sept. 1.

    “I envision ORCD to be, at a minimum, a centralized system with a spectrum of different capabilities to allow our MIT researchers to start their projects and understand the computational resources needed to execute them,” Fisher adds.

    The Office of Research Computing and Data will provide services spanning hardware, software, and cloud solutions, including data storage and retrieval, and offer advice, training, documentation, and data curation for MIT’s research community. It will also work to develop innovative solutions that address emerging or highly specialized needs, and it will advance strategic collaborations with industry.

    The exceptional performance of MIT’s endowment last year has provided a unique opportunity for MIT to distribute endowment funds to accelerate progress on an array of Institute priorities in fiscal year 2023, beginning July 1, 2022. On the basis of community input and visiting committee feedback, MIT’s leadership identified research computing as one such priority, enabling the expanded effort that the Institute commenced today. Future operation of ORCD will incorporate a cost-recovery model.

    In his new role, Fisher will report to Maria Zuber, MIT’s vice president for research, and coordinate closely with MIT Information Systems and Technology (IS&T), MIT Libraries, and the deans of the five schools and the MIT Schwarzman College of Computing, among others. He will also work closely with Provost Cindy Barnhart.

    “I am thrilled that Peter has agreed to take on this important role,” says Zuber. “Under his leadership, I am confident that we’ll be able to build on the important progress of recent years to deliver to MIT researchers best-in-class infrastructure, services, and expertise so they can maximize the performance of their research.”

    MIT’s research computing capabilities have grown significantly in recent years. Ten years ago, the Institute joined with a number of other Massachusetts universities to establish the Massachusetts Green High-Performance Computing Center (MGHPCC) in Holyoke to provide the high-performance, low-carbon computing power necessary to carry out cutting-edge research while reducing its environmental impact. MIT’s capacity at the MGHPCC is now almost fully utilized, however, and an expansion is underway.

    The need for more advanced computing capacity is not the only issue to be addressed. Over the last decade, there have been considerable advances in cloud computing, which is increasingly used in research computing, requiring the Institute to take a new look at how it works with cloud service providers and how it allocates cloud resources to departments, labs, and centers. And MIT’s longstanding model for research computing — which has been mostly decentralized — can lead to inefficiencies and inequities among departments, even as it offers flexibility.

    The Institute has been carefully assessing how to address these issues for several years, including in connection with the establishment of the MIT Schwarzman College of Computing. In August 2019, a college task force on computing infrastructure found a “campus-wide preference for an overarching organizational model of computing infrastructure that transcends a college or school and most logically falls under senior leadership.” The task force’s report also addressed the need for a better balance between centralized and decentralized research computing resources.

    “The needs for computing infrastructure and support vary considerably across disciplines,” says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing and the Henry Ellis Warren Professor of Electrical Engineering and Computer Science. “With the new Office of Research Computing and Data, the Institute is seizing the opportunity to transform its approach to supporting research computing and data, including not only hardware and cloud computing but also expertise. This move is a critical step forward in supporting MIT’s research and scholarship.”

    Over time, ORCD (pronounced “orchid”) aims to recruit a staff of professionals, including data scientists, engineers, and system and hardware administrators, who will enhance, support, and maintain MIT’s research computing infrastructure and ensure that all researchers on campus have access to a minimum level of advanced computing and data management.

    The new research computing and data effort is part of a broader push to modernize MIT’s information technology infrastructure and systems. “We are at an inflection point, where we have a significant opportunity to invest in core needs, replace or upgrade aging systems, and respond fully to the changing needs of our faculty, students, and staff,” says Mark Silis, MIT’s vice president for information systems and technology. “We are thrilled to have a new partner in the Office of Research Computing and Data as we embark on this important work.”

  • Artificial intelligence system learns concepts shared across video, audio, and text

    Humans observe the world through a combination of different modalities, like vision, hearing, and our understanding of language. Machines, on the other hand, interpret the world through data that algorithms can process.

    So, when a machine “sees” a photo, it must encode that photo into data it can use to perform a task like image classification. This process becomes more complicated when inputs come in multiple formats, like videos, audio clips, and images.

    “The main challenge here is, how can a machine align those different modalities? As humans, this is easy for us. We see a car and then hear the sound of a car driving by, and we know these are the same thing. But for machine learning, it is not that straightforward,” says Alexander Liu, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and first author of a paper tackling this problem. 

    Liu and his collaborators developed an artificial intelligence technique that learns to represent data in a way that captures concepts which are shared between visual and audio modalities. For instance, their method can learn that the action of a baby crying in a video is related to the spoken word “crying” in an audio clip.

    Using this knowledge, their machine-learning model can identify where a certain action is taking place in a video and label it.

    It performs better than other machine-learning methods at cross-modal retrieval tasks, which involve finding a piece of data, like a video, that matches a user’s query given in another form, like spoken language. Their model also makes it easier for users to see why the machine thinks the video it retrieved matches their query.

    This technique could someday be utilized to help robots learn about concepts in the world through perception, more like the way humans do.

    Joining Liu on the paper are CSAIL postdoc SouYoung Jin; grad students Cheng-I Jeff Lai and Andrew Rouditchenko; Aude Oliva, senior research scientist in CSAIL and MIT director of the MIT-IBM Watson AI Lab; and senior author James Glass, senior research scientist and head of the Spoken Language Systems Group in CSAIL. The research will be presented at the Annual Meeting of the Association for Computational Linguistics.

    Learning representations

    The researchers focus their work on representation learning, which is a form of machine learning that seeks to transform input data to make it easier to perform a task like classification or prediction.

    The representation learning model takes raw data, such as videos and their corresponding text captions, and encodes them by extracting features, or observations about objects and actions in the video. Then it maps those data points onto a grid, known as an embedding space. The model clusters similar data together as single points in the grid. Each of these data points, or vectors, is represented by an individual word.

    For instance, a video clip of a person juggling might be mapped to a vector labeled “juggling.”

    The researchers constrain the model so it can only use 1,000 words to label vectors. The model can decide which actions or concepts it wants to encode into a single vector, but it can only use 1,000 vectors. The model chooses the words it thinks best represent the data.

    Rather than encoding data from different modalities onto separate grids, their method employs a shared embedding space where two modalities can be encoded together. This enables the model to learn the relationship between representations from two modalities, like video that shows a person juggling and an audio recording of someone saying “juggling.”

    To help the system process data from multiple modalities, they designed an algorithm that guides the machine to encode similar concepts into the same vector.

    “If there is a video about pigs, the model might assign the word ‘pig’ to one of the 1,000 vectors. Then if the model hears someone saying the word ‘pig’ in an audio clip, it should still use the same vector to encode that,” Liu explains.
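
    A toy sketch of that shared-vector idea: embeddings from either modality are snapped to the nearest entry in one shared codebook (the paper’s 1,000 “words”; just four here), so related video and audio end up with the same representation. All numbers below are invented for illustration.

```python
# Toy illustration of a shared codebook across modalities: both embeddings are
# assigned to the nearest shared vector, so they get the same representation.
import numpy as np

codebook = np.array([           # shared vectors, one per learnable "word"
    [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0],
])

def assign_to_codeword(embedding):
    distances = np.linalg.norm(codebook - embedding, axis=1)
    return int(np.argmin(distances))    # index of the closest shared vector

video_embedding = np.array([0.9, 0.2])    # e.g. a video clip showing a pig
audio_embedding = np.array([0.8, -0.1])   # e.g. audio of someone saying "pig"

# Both modalities land on the same codeword index.
print(assign_to_codeword(video_embedding), assign_to_codeword(audio_embedding))
```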

    A better retriever

    They tested the model on cross-modal retrieval tasks using three datasets: a video-text dataset with video clips and text captions, a video-audio dataset with video clips and spoken audio captions, and an image-audio dataset with images and spoken audio captions.

    For example, in the video-audio dataset, the model chose 1,000 words to represent the actions in the videos. Then, when the researchers fed it audio queries, the model tried to find the clip that best matched those spoken words.

    “Just like a Google search, you type in some text and the machine tries to tell you the most relevant things you are searching for. Only we do this in the vector space,” Liu says.

    Not only was their technique more likely to find better matches than the models they compared it to, it is also easier to understand.

    Because the model could only use 1,000 total words to label vectors, a user can more easily see which words the machine used to conclude that the video and spoken words are similar. This could make the model easier to apply in real-world situations where it is vital that users understand how it makes decisions, Liu says.

    The model still has some limitations they hope to address in future work. For one, their research focused on data from two modalities at a time, but in the real world humans encounter many data modalities simultaneously, Liu says.

    “And we know 1,000 words works on this kind of dataset, but we don’t know if it can be generalized to a real-world problem,” he adds.

    Plus, the images and videos in their datasets contained simple objects or straightforward actions; real-world data are much messier. They also want to determine how well their method scales up when there is a wider diversity of inputs.

    This research was supported, in part, by the MIT-IBM Watson AI Lab and its member companies, Nexplore and Woodside, and by the MIT Lincoln Laboratory.

  • What words can convey

    From search engines to voice assistants, computers are getting better at understanding what we mean. That’s thanks to language-processing programs that make sense of a staggering number of words, without ever being told explicitly what those words mean. Such programs infer meaning instead through statistics — and a new study reveals that this computational approach can assign many kinds of information to a single word, just like the human brain.

    The study, published April 14 in the journal Nature Human Behaviour, was co-led by Gabriel Grand, a graduate student in electrical engineering and computer science who is affiliated with MIT’s Computer Science and Artificial Intelligence Laboratory, and Idan Blank PhD ’16, an assistant professor at the University of California at Los Angeles. The work was supervised by McGovern Institute for Brain Research investigator Ev Fedorenko, a cognitive neuroscientist who studies how the human brain uses and understands language, and Francisco Pereira at the National Institute of Mental Health. Fedorenko says the rich knowledge her team was able to find within computational language models demonstrates just how much can be learned about the world through language alone.

    The research team began its analysis of statistics-based language processing models in 2015, when the approach was new. Such models derive meaning by analyzing how often pairs of words co-occur in texts and using those relationships to assess the similarities of words’ meanings. For example, such a program might conclude that “bread” and “apple” are more similar to one another than they are to “notebook,” because “bread” and “apple” are often found in proximity to words like “eat” or “snack,” whereas “notebook” is not.
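
    A tiny worked example of that co-occurrence approach, using the same bread/apple/notebook intuition with an invented corpus and a one-word context window:

```python
# Small sketch of distributional similarity: count which words appear next to
# each other, then compare words by the cosine similarity of their count vectors.
from collections import Counter  # noqa: imported for clarity; counts kept in an array
import numpy as np

corpus = "eat bread snack eat apple snack read notebook write notebook".split()
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}

counts = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):                 # window of one word on each side
    for j in (i - 1, i + 1):
        if 0 <= j < len(corpus):
            counts[index[w], index[corpus[j]]] += 1

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(counts[index["bread"]], counts[index["apple"]]))     # higher
print(cosine(counts[index["bread"]], counts[index["notebook"]]))  # lower
```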

    The models were clearly good at measuring words’ overall similarity to one another. But most words carry many kinds of information, and their similarities depend on which qualities are being evaluated. “Humans can come up with all these different mental scales to help organize their understanding of words,” explains Grand, a former undergraduate researcher in the Fedorenko lab. For example, he says, “dolphins and alligators might be similar in size, but one is much more dangerous than the other.”

    Grand and Blank, who was then a graduate student at the McGovern Institute, wanted to know whether the models captured that same nuance. And if they did, how was the information organized?

    To learn how the information in such a model stacked up to humans’ understanding of words, the team first asked human volunteers to score words along many different scales: Were the concepts those words conveyed big or small, safe or dangerous, wet or dry? Then, having mapped where people position different words along these scales, they looked to see whether language processing models did the same.

    Grand explains that distributional semantic models use co-occurrence statistics to organize words into a huge, multidimensional matrix. The more similar words are to one another, the closer they are within that space. The dimensions of the space are vast, and there is no inherent meaning built into its structure. “In these word embeddings, there are hundreds of dimensions, and we have no idea what any dimension means,” he says. “We’re really trying to peer into this black box and say, ‘is there structure in here?’”

    Specifically, they asked whether the semantic scales they had asked their volunteers to use were represented in the model. So they looked to see where words in the space lined up along vectors defined by the extremes of those scales. Where did dolphins and tigers fall on a line from “big” to “small,” for example? And were they closer together along that line than they were on a line representing danger (“safe” to “dangerous”)?
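
    In spirit, that probing step amounts to projecting word vectors onto an axis drawn between the embeddings of a scale’s two extremes. The sketch below uses invented two-dimensional vectors purely for illustration; it is not the study’s data or code.

```python
# Toy version of probing a semantic scale: define an axis from "small" to "big"
# and project word vectors onto it. The 2-D embeddings are invented.
import numpy as np

emb = {
    "small":   np.array([-1.0,  0.0]),
    "big":     np.array([ 1.0,  0.0]),
    "dolphin": np.array([ 0.5,  0.3]),
    "tiger":   np.array([ 0.6, -0.2]),
}

axis = emb["big"] - emb["small"]          # direction of the "size" scale
axis = axis / np.linalg.norm(axis)

def position_on_scale(word):
    return float(emb[word] @ axis)        # signed projection onto the axis

print(position_on_scale("dolphin"), position_on_scale("tiger"))  # similar sizes
```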

    Across more than 50 sets of word categories and semantic scales, they found that the model had organized words very much like the human volunteers. Dolphins and tigers were judged to be similar in terms of size, but far apart on scales measuring danger or wetness. The model had organized the words in a way that represented many kinds of meaning — and it had done so based entirely on the words’ co-occurrences.

    That, Fedorenko says, tells us something about the power of language. “The fact that we can recover so much of this rich semantic information from just these simple word co-occurrence statistics suggests that this is one very powerful source of learning about things that you may not even have direct perceptual experience with.”

  • Engineers use artificial intelligence to capture the complexity of breaking waves

    Waves break once they swell to a critical height, before cresting and crashing into a spray of droplets and bubbles. These waves can be as large as a surfer’s point break and as small as a gentle ripple rolling to shore. For decades, the dynamics of how and when a wave breaks have been too complex to predict.

    Now, MIT engineers have found a new way to model how waves break. The team used machine learning along with data from wave-tank experiments to tweak equations that have traditionally been used to predict wave behavior. Engineers typically rely on such equations to help them design resilient offshore platforms and structures. But until now, the equations have not been able to capture the complexity of breaking waves.

    The updated model made more accurate predictions of how and when waves break, the researchers found. For instance, the model estimated a wave’s steepness just before breaking, and its energy and frequency after breaking, more accurately than the conventional wave equations.

    Their results, published today in the journal Nature Communications, will help scientists understand how a breaking wave affects the water around it. Knowing precisely how these waves interact can help hone the design of offshore structures. It can also improve predictions for how the ocean interacts with the atmosphere. Having better estimates of how waves break can help scientists predict, for instance, how much carbon dioxide and other atmospheric gases the ocean can absorb.

    “Wave breaking is what puts air into the ocean,” says study author Themis Sapsis, an associate professor of mechanical and ocean engineering and an affiliate of the Institute for Data, Systems, and Society at MIT. “It may sound like a detail, but if you multiply its effect over the area of the entire ocean, wave breaking starts becoming fundamentally important to climate prediction.”

    The study’s co-authors include lead author and MIT postdoc Debbie Eeltink, Hubert Branger and Christopher Luneau of Aix-Marseille University, Amin Chabchoub of Kyoto University, Jerome Kasparian of the University of Geneva, and T.S. van den Bremer of Delft University of Technology.

    Learning tank

    To predict the dynamics of a breaking wave, scientists typically take one of two approaches: They either attempt to precisely simulate the wave at the scale of individual molecules of water and air, or they run experiments to try and characterize waves with actual measurements. The first approach is computationally expensive and difficult to simulate even over a small area; the second requires a huge amount of time to run enough experiments to yield statistically significant results.

    The MIT team instead borrowed pieces from both approaches to develop a more efficient and accurate model using machine learning. The researchers started with a set of equations that is considered the standard description of wave behavior. They aimed to improve the model by “training” it on data from actual breaking-wave experiments.

    “We had a simple model that doesn’t capture wave breaking, and then we had the truth, meaning experiments that involve wave breaking,” Eeltink explains. “Then we wanted to use machine learning to learn the difference between the two.”

    The researchers obtained wave breaking data by running experiments in a 40-meter-long tank. The tank was fitted at one end with a paddle which the team used to initiate each wave. The team set the paddle to produce a breaking wave in the middle of the tank. Gauges along the length of the tank measured the water’s height as waves propagated down the tank.

    “It takes a lot of time to run these experiments,” Eeltink says. “Between each experiment you have to wait for the water to completely calm down before you launch the next experiment, otherwise they influence each other.”

    Safe harbor

    In all, the team ran about 250 experiments, the data from which they used to train a type of machine-learning algorithm known as a neural network. Specifically, the algorithm is trained to compare the real waves in experiments with the predicted waves in the simple model, and based on any differences between the two, the algorithm tunes the model to fit reality.
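
    Schematically, this setup can be pictured as residual learning: a small network is fit to the difference between a simple model’s prediction and measurements, and its output is added back as a correction. The data, stand-in model, and network below are placeholders, not the study’s equations or architecture.

```python
# Schematic only: a small network learns the residual between a stand-in
# "simple model" and synthetic measurements, then corrects the simple model.
import numpy as np
from sklearn.neural_network import MLPRegressor

def simple_model(x):                     # placeholder for the standard wave equations
    return np.sin(x)

x = np.linspace(0, 10, 200).reshape(-1, 1)
measured = np.sin(x) + 0.3 * np.sin(3 * x)        # pretend tank measurements
residual = (measured - simple_model(x)).ravel()   # what the simple model misses

net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
net.fit(x, residual)                              # learn the correction term

corrected = simple_model(x).ravel() + net.predict(x)
print(np.mean((corrected - measured.ravel()) ** 2))  # MSE of the corrected model
```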

    After training the algorithm on their experimental data, the team introduced the model to entirely new data — in this case, measurements from two independent experiments, each run at separate wave tanks with different dimensions. In these tests, they found the updated model made more accurate predictions than the simple, untrained model, for instance making better estimates of a breaking wave’s steepness.

    The new model also captured an essential property of breaking waves known as the “downshift,” in which the frequency of a wave is shifted to a lower value. The speed of a wave depends on its frequency. For ocean waves, lower frequencies move faster than higher frequencies. Therefore, after the downshift, the wave will move faster. The new model predicts the change in frequency, before and after each breaking wave, which could be especially relevant in preparing for coastal storms.

    “When you want to forecast when high waves of a swell would reach a harbor, and you want to leave the harbor before those waves arrive, then if you get the wave frequency wrong, then the speed at which the waves are approaching is wrong,” Eeltink says.

    The team’s updated wave model is in the form of an open-source code that others could potentially use, for instance in climate simulations of the ocean’s potential to absorb carbon dioxide and other atmospheric gases. The code can also be worked into simulated tests of offshore platforms and coastal structures.

    “The number one purpose of this model is to predict what a wave will do,” Sapsis says. “If you don’t model wave breaking right, it would have tremendous implications for how structures behave. With this, you could simulate waves to help design structures better, more efficiently, and without huge safety factors.”

    This research is supported, in part, by the Swiss National Science Foundation, and by the U.S. Office of Naval Research.