School of Engineering Archivi - technology-news.space - All about the world of technology!

Latest story

138 Shares119 Views

Method prevents an AI model from being overconfident about wrong answers

by Markus Andrews 31 July 2024, 04:00

People use large language models for a huge array of tasks, from translating an article to identifying financial fraud. However, despite the incredible capabilities and versatility of these models, they sometimes generate inaccurate responses.On top of that problem, the models can be overconfident about wrong answers or underconfident about correct ones, making it tough for a user to know when a model can be trusted.Researchers typically calibrate a machine-learning model to ensure its level of confidence lines up with its accuracy. A well-calibrated model should have less confidence about an incorrect prediction, and vice-versa. But because large language models (LLMs) can be applied to a seemingly endless collection of diverse tasks, traditional calibration methods are ineffective.Now, researchers from MIT and the MIT-IBM Watson AI Lab have introduced a calibration method tailored to large language models. Their method, called Thermometer, involves building a smaller, auxiliary model that runs on top of a large language model to calibrate it.Thermometer is more efficient than other approaches — requiring less power-hungry computation — while preserving the accuracy of the model and enabling it to produce better-calibrated responses on tasks it has not seen before.By enabling efficient calibration of an LLM for a variety of tasks, Thermometer could help users pinpoint situations where a model is overconfident about false predictions, ultimately preventing them from deploying that model in a situation where it may fail.“With Thermometer, we want to provide the user with a clear signal to tell them whether a model’s response is accurate or inaccurate, in a way that reflects the model’s uncertainty, so they know if that model is reliable,” says Maohao Shen, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on Thermometer.Shen is joined on the paper by Gregory Wornell, the Sumitomo Professor of Engineering who leads the Signals, Information, and Algorithms Laboratory in the Research Laboratory for Electronics, and is a member of the MIT-IBM Watson AI Lab; senior author Soumya Ghosh, a research staff member in the MIT-IBM Watson AI Lab; as well as others at MIT and the MIT-IBM Watson AI Lab. The research was recently presented at the International Conference on Machine Learning.Universal calibrationSince traditional machine-learning models are typically designed to perform a single task, calibrating them usually involves one task-specific method. On the other hand, since LLMs have the flexibility to perform many tasks, using a traditional method to calibrate that model for one task might hurt its performance on another task.Calibrating an LLM often involves sampling from the model multiple times to obtain different predictions and then aggregating these predictions to obtain better-calibrated confidence. However, because these models have billions of parameters, the computational costs of such approaches rapidly add up.“In a sense, large language models are universal because they can handle various tasks. So, we need a universal calibration method that can also handle many different tasks,” says Shen.With Thermometer, the researchers developed a versatile technique that leverages a classical calibration method called temperature scaling to efficiently calibrate an LLM for a new task.In this context, a “temperature” is a scaling parameter used to adjust a model’s confidence to be aligned with its prediction accuracy. Traditionally, one determines the right temperature using a labeled validation dataset of task-specific examples.Since LLMs are often applied to new tasks, labeled datasets can be nearly impossible to acquire. For instance, a user who wants to deploy an LLM to answer customer questions about a new product likely does not have a dataset containing such questions and answers.Instead of using a labeled dataset, the researchers train an auxiliary model that runs on top of an LLM to automatically predict the temperature needed to calibrate it for this new task.They use labeled datasets of a few representative tasks to train the Thermometer model, but then once it has been trained, it can generalize to new tasks in a similar category without the need for additional labeled data.A Thermometer model trained on a collection of multiple-choice question datasets, perhaps including one with algebra questions and one with medical questions, could be used to calibrate an LLM that will answer questions about geometry or biology, for instance.“The aspirational goal is for it to work on any task, but we are not quite there yet,” Ghosh says. The Thermometer model only needs to access a small part of the LLM’s inner workings to predict the right temperature that will calibrate its prediction for data points of a specific task. An efficient approachImportantly, the technique does not require multiple training runs and only slightly slows the LLM. Plus, since temperature scaling does not alter a model’s predictions, Thermometer preserves its accuracy.When they compared Thermometer to several baselines on multiple tasks, it consistently produced better-calibrated uncertainty measures while requiring much less computation.“As long as we train a Thermometer model on a sufficiently large number of tasks, it should be able to generalize well across any new task, just like a large language model, it is also a universal model,” Shen adds.The researchers also found that if they train a Thermometer model for a smaller LLM, it can be directly applied to calibrate a larger LLM within the same family.In the future, they want to adapt Thermometer for more complex text-generation tasks and apply the technique to even larger LLMs. The researchers also hope to quantify the diversity and number of labeled datasets one would need to train a Thermometer model so it can generalize to a new task.This research was funded, in part, by the MIT-IBM Watson AI Lab. More

More stories

138 Shares189 Views
in Data Management & Statistics
Study: When allocating scarce resources with AI, randomization can improve fairness
by Markus Andrews 24 July 2024, 04:00
Organizations are increasingly utilizing machine-learning models to allocate scarce resources or opportunities. For instance, such models can help companies screen resumes to choose job interview candidates or aid hospitals in ranking kidney transplant patients based on their likelihood of survival.When deploying a model, users typically strive to ensure its predictions are fair by reducing bias. This often involves techniques like adjusting the features a model uses to make decisions or calibrating the scores it generates.However, researchers from MIT and Northeastern University argue that these fairness methods are not sufficient to address structural injustices and inherent uncertainties. In a new paper, they show how randomizing a model’s decisions in a structured way can improve fairness in certain situations.For example, if multiple companies use the same machine-learning model to rank job interview candidates deterministically — without any randomization — then one deserving individual could be the bottom-ranked candidate for every job, perhaps due to how the model weighs answers provided in an online form. Introducing randomization into a model’s decisions could prevent one worthy person or group from always being denied a scarce resource, like a job interview.Through their analysis, the researchers found that randomization can be especially beneficial when a model’s decisions involve uncertainty or when the same group consistently receives negative decisions.They present a framework one could use to introduce a specific amount of randomization into a model’s decisions by allocating resources through a weighted lottery. This method, which an individual can tailor to fit their situation, can improve fairness without hurting the efficiency or accuracy of a model.“Even if you could make fair predictions, should you be deciding these social allocations of scarce resources or opportunities strictly off scores or rankings? As things scale, and we see more and more opportunities being decided by these algorithms, the inherent uncertainties in these scores can be amplified. We show that fairness may require some sort of randomization,” says Shomik Jain, a graduate student in the Institute for Data, Systems, and Society (IDSS) and lead author of the paper.Jain is joined on the paper by Kathleen Creel, assistant professor of philosophy and computer science at Northeastern University; and senior author Ashia Wilson, the Lister Brothers Career Development Professor in the Department of Electrical Engineering and Computer Science and a principal investigator in the Laboratory for Information and Decision Systems (LIDS). The research will be presented at the International Conference on Machine Learning.Considering claimsThis work builds off a previous paper in which the researchers explored harms that can occur when one uses deterministic systems at scale. They found that using a machine-learning model to deterministically allocate resources can amplify inequalities that exist in training data, which can reinforce bias and systemic inequality. “Randomization is a very useful concept in statistics, and to our delight, satisfies the fairness demands coming from both a systemic and individual point of view,” Wilson says.In this paper, they explored the question of when randomization can improve fairness. They framed their analysis around the ideas of philosopher John Broome, who wrote about the value of using lotteries to award scarce resources in a way that honors all claims of individuals.A person’s claim to a scarce resource, like a kidney transplant, can stem from merit, deservingness, or need. For instance, everyone has a right to life, and their claims on a kidney transplant may stem from that right, Wilson explains.“When you acknowledge that people have different claims to these scarce resources, fairness is going to require that we respect all claims of individuals. If we always give someone with a stronger claim the resource, is that fair?” Jain says.That sort of deterministic allocation could cause systemic exclusion or exacerbate patterned inequality, which occurs when receiving one allocation increases an individual’s likelihood of receiving future allocations. In addition, machine-learning models can make mistakes, and a deterministic approach could cause the same mistake to be repeated.Randomization can overcome these problems, but that doesn’t mean all decisions a model makes should be randomized equally.Structured randomizationThe researchers use a weighted lottery to adjust the level of randomization based on the amount of uncertainty involved in the model’s decision-making. A decision that is less certain should incorporate more randomization.“In kidney allocation, usually the planning is around projected lifespan, and that is deeply uncertain. If two patients are only five years apart, it becomes a lot harder to measure. We want to leverage that level of uncertainty to tailor the randomization,” Wilson says.The researchers used statistical uncertainty quantification methods to determine how much randomization is needed in different situations. They show that calibrated randomization can lead to fairer outcomes for individuals without significantly affecting the utility, or effectiveness, of the model.“There is a balance to be had between overall utility and respecting the rights of the individuals who are receiving a scarce resource, but oftentimes the tradeoff is relatively small,” says Wilson.However, the researchers emphasize there are situations where randomizing decisions would not improve fairness and could harm individuals, such as in criminal justice contexts.But there could be other areas where randomization can improve fairness, such as college admissions, and the researchers plan to study other use cases in future work. They also want to explore how randomization can affect other factors, such as competition or prices, and how it could be used to improve the robustness of machine-learning models.“We are hoping our paper is a first move toward illustrating that there might be a benefit to randomization. We are offering randomization as a tool. How much you are going to want to do it is going to be up to all the stakeholders in the allocation to decide. And, of course, how they decide is another research question all together,” says Wilson. More
163 Shares169 Views
in Data Management & Statistics
AI model identifies certain breast tumor stages likely to progress to invasive cancer
by Markus Andrews 22 July 2024, 18:00
Ductal carcinoma in situ (DCIS) is a type of preinvasive tumor that sometimes progresses to a highly deadly form of breast cancer. It accounts for about 25 percent of all breast cancer diagnoses.Because it is difficult for clinicians to determine the type and stage of DCIS, patients with DCIS are often overtreated. To address this, an interdisciplinary team of researchers from MIT and ETH Zurich developed an AI model that can identify the different stages of DCIS from a cheap and easy-to-obtain breast tissue image. Their model shows that both the state and arrangement of cells in a tissue sample are important for determining the stage of DCIS.Because such tissue images are so easy to obtain, the researchers were able to build one of the largest datasets of its kind, which they used to train and test their model. When they compared its predictions to conclusions of a pathologist, they found clear agreement in many instances.In the future, the model could be used as a tool to help clinicians streamline the diagnosis of simpler cases without the need for labor-intensive tests, giving them more time to evaluate cases where it is less clear if DCIS will become invasive.“We took the first step in understanding that we should be looking at the spatial organization of cells when diagnosing DCIS, and now we have developed a technique that is scalable. From here, we really need a prospective study. Working with a hospital and getting this all the way to the clinic will be an important step forward,” says Caroline Uhler, a professor in the Department of Electrical Engineering and Computer Science (EECS) and the Institute for Data, Systems, and Society (IDSS), who is also director of the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard and a researcher at MIT’s Laboratory for Information and Decision Systems (LIDS).Uhler, co-corresponding author of a paper on this research, is joined by lead author Xinyi Zhang, a graduate student in EECS and the Eric and Wendy Schmidt Center; co-corresponding author GV Shivashankar, professor of mechogenomics at ETH Zurich jointly with the Paul Scherrer Institute; and others at MIT, ETH Zurich, and the University of Palermo in Italy. The open-access research was published July 20 in Nature Communications.Combining imaging with AI Between 30 and 50 percent of patients with DCIS develop a highly invasive stage of cancer, but researchers don’t know the biomarkers that could tell a clinician which tumors will progress.Researchers can use techniques like multiplexed staining or single-cell RNA sequencing to determine the stage of DCIS in tissue samples. However, these tests are too expensive to be performed widely, Shivashankar explains.In previous work, these researchers showed that a cheap imagining technique known as chromatin staining could be as informative as the much costlier single-cell RNA sequencing.For this research, they hypothesized that combining this single stain with a carefully designed machine-learning model could provide the same information about cancer stage as costlier techniques.First, they created a dataset containing 560 tissue sample images from 122 patients at three different stages of disease. They used this dataset to train an AI model that learns a representation of the state of each cell in a tissue sample image, which it uses to infer the stage of a patient’s cancer.However, not every cell is indicative of cancer, so the researchers had to aggregate them in a meaningful way.They designed the model to create clusters of cells in similar states, identifying eight states that are important markers of DCIS. Some cell states are more indicative of invasive cancer than others. The model determines the proportion of cells in each state in a tissue sample.Organization matters“But in cancer, the organization of cells also changes. We found that just having the proportions of cells in every state is not enough. You also need to understand how the cells are organized,” says Shivashankar.With this insight, they designed the model to consider proportion and arrangement of cell states, which significantly boosted its accuracy.“The interesting thing for us was seeing how much spatial organization matters. Previous studies had shown that cells which are close to the breast duct are important. But it is also important to consider which cells are close to which other cells,” says Zhang.When they compared the results of their model with samples evaluated by a pathologist, it had clear agreement in many instances. In cases that were not as clear-cut, the model could provide information about features in a tissue sample, like the organization of cells, that a pathologist could use in decision-making.This versatile model could also be adapted for use in other types of cancer, or even neurodegenerative conditions, which is one area the researchers are also currently exploring.“We have shown that, with the right AI techniques, this simple stain can be very powerful. There is still much more research to do, but we need to take the organization of cells into account in more of our studies,” Uhler says.This research was funded, in part, by the Eric and Wendy Schmidt Center at the Broad Institute, ETH Zurich, the Paul Scherrer Institute, the Swiss National Science Foundation, the U.S. National Institutes of Health, the U.S. Office of Naval Research, the MIT Jameel Clinic for Machine Learning and Health, the MIT-IBM Watson AI Lab, and a Simons Investigator Award. More
138 Shares109 Views
in Data Management & Statistics
How to assess a general-purpose AI model’s reliability before it’s deployed
by Markus Andrews 16 July 2024, 04:00
Foundation models are massive deep-learning models that have been pretrained on an enormous amount of general-purpose, unlabeled data. They can be applied to a variety of tasks, like generating images or answering customer questions.But these models, which serve as the backbone for powerful artificial intelligence tools like ChatGPT and DALL-E, can offer up incorrect or misleading information. In a safety-critical situation, such as a pedestrian approaching a self-driving car, these mistakes could have serious consequences.To help prevent such mistakes, researchers from MIT and the MIT-IBM Watson AI Lab developed a technique to estimate the reliability of foundation models before they are deployed to a specific task.They do this by considering a set of foundation models that are slightly different from one another. Then they use their algorithm to assess the consistency of the representations each model learns about the same test data point. If the representations are consistent, it means the model is reliable.When they compared their technique to state-of-the-art baseline methods, it was better at capturing the reliability of foundation models on a variety of downstream classification tasks.Someone could use this technique to decide if a model should be applied in a certain setting, without the need to test it on a real-world dataset. This could be especially useful when datasets may not be accessible due to privacy concerns, like in health care settings. In addition, the technique could be used to rank models based on reliability scores, enabling a user to select the best one for their task.“All models can be wrong, but models that know when they are wrong are more useful. The problem of quantifying uncertainty or reliability is more challenging for these foundation models because their abstract representations are difficult to compare. Our method allows one to quantify how reliable a representation model is for any given input data,” says senior author Navid Azizan, the Esther and Harold E. Edgerton Assistant Professor in the MIT Department of Mechanical Engineering and the Institute for Data, Systems, and Society (IDSS), and a member of the Laboratory for Information and Decision Systems (LIDS).He is joined on a paper about the work by lead author Young-Jin Park, a LIDS graduate student; Hao Wang, a research scientist at the MIT-IBM Watson AI Lab; and Shervin Ardeshir, a senior research scientist at Netflix. The paper will be presented at the Conference on Uncertainty in Artificial Intelligence.Measuring consensusTraditional machine-learning models are trained to perform a specific task. These models typically make a concrete prediction based on an input. For instance, the model might tell you whether a certain image contains a cat or a dog. In this case, assessing reliability could be a matter of looking at the final prediction to see if the model is right.But foundation models are different. The model is pretrained using general data, in a setting where its creators don’t know all downstream tasks it will be applied to. Users adapt it to their specific tasks after it has already been trained.Unlike traditional machine-learning models, foundation models don’t give concrete outputs like “cat” or “dog” labels. Instead, they generate an abstract representation based on an input data point.To assess the reliability of a foundation model, the researchers used an ensemble approach by training several models which share many properties but are slightly different from one another.“Our idea is like measuring the consensus. If all those foundation models are giving consistent representations for any data in our dataset, then we can say this model is reliable,” Park says.But they ran into a problem: How could they compare abstract representations?“These models just output a vector, comprised of some numbers, so we can’t compare them easily,” he adds.They solved this problem using an idea called neighborhood consistency.For their approach, the researchers prepare a set of reliable reference points to test on the ensemble of models. Then, for each model, they investigate the reference points located near that model’s representation of the test point.By looking at the consistency of neighboring points, they can estimate the reliability of the models.Aligning the representationsFoundation models map data points to what is known as a representation space. One way to think about this space is as a sphere. Each model maps similar data points to the same part of its sphere, so images of cats go in one place and images of dogs go in another.But each model would map animals differently in its own sphere, so while cats may be grouped near the South Pole of one sphere, another model could map cats somewhere in the Northern Hemisphere.The researchers use the neighboring points like anchors to align those spheres so they can make the representations comparable. If a data point’s neighbors are consistent across multiple representations, then one should be confident about the reliability of the model’s output for that point.When they tested this approach on a wide range of classification tasks, they found that it was much more consistent than baselines. Plus, it wasn’t tripped up by challenging test points that caused other methods to fail.Moreover, their approach can be used to assess reliability for any input data, so one could evaluate how well a model works for a particular type of individual, such as a patient with certain characteristics.“Even if the models all have average performance overall, from an individual point of view, you’d prefer the one that works best for that individual,” Wang says.However, one limitation comes from the fact that they must train an ensemble of foundation models, which is computationally expensive. In the future, they plan to find more efficient ways to build multiple models, perhaps by using small perturbations of a single model.“With the current trend of using foundational models for their embeddings to support various downstream tasks — from fine-tuning to retrieval augmented generation — the topic of quantifying uncertainty at the representation level is increasingly important, but challenging, as embeddings on their own have no grounding. What matters instead is how embeddings of different inputs are related to one another, an idea that this work neatly captures through the proposed neighborhood consistency score,” says Marco Pavone, an associate professor in the Department of Aeronautics and Astronautics at Stanford University, who was not involved with this work. “This is a promising step towards high quality uncertainty quantifications for embedding models, and I’m excited to see future extensions which can operate without requiring model-ensembling to really enable this approach to scale to foundation-size models.”This work is funded, in part, by the MIT-IBM Watson AI Lab, MathWorks, and Amazon. More
63 Shares129 Views
in Data Management & Statistics
Machine learning and the microscope
by Markus Andrews 12 July 2024, 04:00
With recent advances in imaging, genomics and other technologies, the life sciences are awash in data. If a biologist is studying cells taken from the brain tissue of Alzheimer’s patients, for example, there could be any number of characteristics they want to investigate — a cell’s type, the genes it’s expressing, its location within the tissue, or more. However, while cells can now be probed experimentally using different kinds of measurements simultaneously, when it comes to analyzing the data, scientists usually can only work with one type of measurement at a time.Working with “multimodal” data, as it’s called, requires new computational tools, which is where Xinyi Zhang comes in.The fourth-year MIT PhD student is bridging machine learning and biology to understand fundamental biological principles, especially in areas where conventional methods have hit limitations. Working in the lab of MIT Professor Caroline Uhler in the Department of Electrical Engineering and Computer Science, the Laboratory for Information and Decision Systems, and the Institute for Data, Systems, and Society, and collaborating with researchers at the Eric and Wendy Schmidt Center at the Broad Institute and elsewhere, Zhang has led multiple efforts to build computational frameworks and principles for understanding the regulatory mechanisms of cells.“All of these are small steps toward the end goal of trying to answer how cells work, how tissues and organs work, why they have disease, and why they can sometimes be cured and sometimes not,” Zhang says.The activities Zhang pursues in her down time are no less ambitious. The list of hobbies she has taken up at the Institute include sailing, skiing, ice skating, rock climbing, performing with MIT’s Concert Choir, and flying single-engine planes. (She earned her pilot’s license in November 2022.)“I guess I like to go to places I’ve never been and do things I haven’t done before,” she says with signature understatement.Uhler, her advisor, says that Zhang’s quiet humility leads to a surprise “in every conversation.”“Every time, you learn something like, ‘Okay, so now she’s learning to fly,’” Uhler says. “It’s just amazing. Anything she does, she does for the right reasons. She wants to be good at the things she cares about, which I think is really exciting.”Zhang first became interested in biology as a high school student in Hangzhou, China. She liked that her teachers couldn’t answer her questions in biology class, which led her to see it as the “most interesting” topic to study.Her interest in biology eventually turned into an interest in bioengineering. After her parents, who were middle school teachers, suggested studying in the United States, she majored in the latter alongside electrical engineering and computer science as an undergraduate at the University of California at Berkeley.Zhang was ready to dive straight into MIT’s EECS PhD program after graduating in 2020, but the Covid-19 pandemic delayed her first year. Despite that, in December 2022, she, Uhler, and two other co-authors published a paper in Nature Communications.The groundwork for the paper was laid by Xiao Wang, one of the co-authors. She had previously done work with the Broad Institute in developing a form of spatial cell analysis that combined multiple forms of cell imaging and gene expression for the same cell while also mapping out the cell’s place in the tissue sample it came from — something that had never been done before.This innovation had many potential applications, including enabling new ways of tracking the progression of various diseases, but there was no way to analyze all the multimodal data the method produced. In came Zhang, who became interested in designing a computational method that could.The team focused on chromatin staining as their imaging method of choice, which is relatively cheap but still reveals a great deal of information about cells. The next step was integrating the spatial analysis techniques developed by Wang, and to do that, Zhang began designing an autoencoder.Autoencoders are a type of neural network that typically encodes and shrinks large amounts of high-dimensional data, then expand the transformed data back to its original size. In this case, Zhang’s autoencoder did the reverse, taking the input data and making it higher-dimensional. This allowed them to combine data from different animals and remove technical variations that were not due to meaningful biological differences.In the paper, they used this technology, abbreviated as STACI, to identify how cells and tissues reveal the progression of Alzheimer’s disease when observed under a number of spatial and imaging techniques. The model can also be used to analyze any number of diseases, Zhang says.Given unlimited time and resources, her dream would be to build a fully complete model of human life. Unfortunately, both time and resources are limited. Her ambition isn’t, however, and she says she wants to keep applying her skills to solve the “most challenging questions that we don’t have the tools to answer.”She’s currently working on wrapping up a couple of projects, one focused on studying neurodegeneration by analyzing frontal cortex imaging and another on predicting protein images from protein sequences and chromatin imaging.“There are still many unanswered questions,” she says. “I want to pick questions that are biologically meaningful, that help us understand things we didn’t know before.” More
75 Shares169 Views
in Data Management & Statistics
When to trust an AI model
by Markus Andrews 11 July 2024, 18:45
Because machine-learning models can give false predictions, researchers often equip them with the ability to tell a user how confident they are about a certain decision. This is especially important in high-stake settings, such as when models are used to help identify disease in medical images or filter job applications.But a model’s uncertainty quantifications are only useful if they are accurate. If a model says it is 49 percent confident that a medical image shows a pleural effusion, then 49 percent of the time, the model should be right.MIT researchers have introduced a new approach that can improve uncertainty estimates in machine-learning models. Their method not only generates more accurate uncertainty estimates than other techniques, but does so more efficiently.In addition, because the technique is scalable, it can be applied to huge deep-learning models that are increasingly being deployed in health care and other safety-critical situations.This technique could give end users, many of whom lack machine-learning expertise, better information they can use to determine whether to trust a model’s predictions or if the model should be deployed for a particular task.“It is easy to see these models perform really well in scenarios where they are very good, and then assume they will be just as good in other scenarios. This makes it especially important to push this kind of work that seeks to better calibrate the uncertainty of these models to make sure they align with human notions of uncertainty,” says lead author Nathan Ng, a graduate student at the University of Toronto who is a visiting student at MIT.Ng wrote the paper with Roger Grosse, an assistant professor of computer science at the University of Toronto; and senior author Marzyeh Ghassemi, an associate professor in the Department of Electrical Engineering and Computer Science and a member of the Institute of Medical Engineering Sciences and the Laboratory for Information and Decision Systems. The research will be presented at the International Conference on Machine Learning.Quantifying uncertaintyUncertainty quantification methods often require complex statistical calculations that don’t scale well to machine-learning models with millions of parameters. These methods also require users to make assumptions about the model and data used to train it.The MIT researchers took a different approach. They use what is known as the minimum description length principle (MDL), which does not require the assumptions that can hamper the accuracy of other methods. MDL is used to better quantify and calibrate uncertainty for test points the model has been asked to label.The technique the researchers developed, known as IF-COMP, makes MDL fast enough to use with the kinds of large deep-learning models deployed in many real-world settings.MDL involves considering all possible labels a model could give a test point. If there are many alternative labels for this point that fit well, its confidence in the label it chose should decrease accordingly.“One way to understand how confident a model is would be to tell it some counterfactual information and see how likely it is to believe you,” Ng says.For example, consider a model that says a medical image shows a pleural effusion. If the researchers tell the model this image shows an edema, and it is willing to update its belief, then the model should be less confident in its original decision.With MDL, if a model is confident when it labels a datapoint, it should use a very short code to describe that point. If it is uncertain about its decision because the point could have many other labels, it uses a longer code to capture these possibilities.The amount of code used to label a datapoint is known as stochastic data complexity. If the researchers ask the model how willing it is to update its belief about a datapoint given contrary evidence, the stochastic data complexity should decrease if the model is confident.But testing each datapoint using MDL would require an enormous amount of computation.Speeding up the processWith IF-COMP, the researchers developed an approximation technique that can accurately estimate stochastic data complexity using a special function, known as an influence function. They also employed a statistical technique called temperature-scaling, which improves the calibration of the model’s outputs. This combination of influence functions and temperature-scaling enables high-quality approximations of the stochastic data complexity.In the end, IF-COMP can efficiently produce well-calibrated uncertainty quantifications that reflect a model’s true confidence. The technique can also determine whether the model has mislabeled certain data points or reveal which data points are outliers.The researchers tested their system on these three tasks and found that it was faster and more accurate than other methods.“It is really important to have some certainty that a model is well-calibrated, and there is a growing need to detect when a specific prediction doesn’t look quite right. Auditing tools are becoming more necessary in machine-learning problems as we use large amounts of unexamined data to make models that will be applied to human-facing problems,” Ghassemi says.IF-COMP is model-agnostic, so it can provide accurate uncertainty quantifications for many types of machine-learning models. This could enable it to be deployed in a wider range of real-world settings, ultimately helping more practitioners make better decisions.“People need to understand that these systems are very fallible and can make things up as they go. A model may look like it is highly confident, but there are a ton of different things it is willing to believe given evidence to the contrary,” Ng says.In the future, the researchers are interested in applying their approach to large language models and studying other potential use cases for the minimum description length principle. More
150 Shares199 Views
in Data Management & Statistics
MIT ARCLab announces winners of inaugural Prize for AI Innovation in Space
by Markus Andrews 11 July 2024, 17:55
Satellite density in Earth’s orbit has increased exponentially in recent years, with lower costs of small satellites allowing governments, researchers, and private companies to launch and operate some 2,877 satellites into orbit in 2023 alone. This includes increased geostationary Earth orbit (GEO) satellite activity, which brings technologies with global-scale impact, from broadband internet to climate surveillance. Along with the manifold benefits of these satellite-enabled technologies, however, come increased safety and security risks, as well as environmental concerns. More accurate and efficient methods of monitoring and modeling satellite behavior are urgently needed to prevent collisions and other disasters.To address this challenge, the MIT Astrodynamics, Space Robotic, and Controls Laboratory (ARCLab) launched the MIT ARCLab Prize for AI Innovation in Space: a first-of-its-kind competition asking contestants to harness AI to characterize satellites’ patterns of life (PoLs) — the long-term behavioral narrative of a satellite in orbit — using purely passively collected information. Following the call for participants last fall, 126 teams used machine learning to create algorithms to label and time-stamp the behavioral modes of GEO satellites over a six-month period, competing for accuracy and efficiency.With support from the U.S. Department of the Air Force-MIT AI Accelerator, the challenge offers a total of $25,000. A team of judges from ARCLab and MIT Lincoln Laboratory evaluated the submissions based on clarity, novelty, technical depth, and reproducibility, assigning each entry a score out of 100 points. Now the judges have announced the winners and runners-up:First prize: David Baldsiefen — Team Hawaii2024With a winning score of 96, Baldsiefen will be awarded $10,000 and is invited to join the ARCLab team in presenting at a poster session at the Advanced Maui Optical and Space Surveillance Technologies (AMOS) Conference in Hawaii this fall. One evaluator noted, “Clear and concise report, with very good ideas such as the label encoding of the localizer. Decisions on the architectures and the feature engineering are well reasoned. The code provided is also well documented and structured, allowing an easy reproducibility of the experimentation.”Second prize: Binh Tran, Christopher Yeung, Kurtis Johnson, Nathan Metzger — Team Millennial-IUPWith a score of 94.2, Y, Millennial-IUP will be awarded $5,000 and will also join the ARCLab team at the AMOS conference. One evaluator said, “The models chosen were sensible and justified, they made impressive efforts in efficiency gains… They used physics to inform their models and this appeared to be reproducible. Overall it was an easy to follow, concise report without much jargon.”Third Prize: Isaac Haik and Francois Porcher — Team QR_IsWith a score of 94, Haik and Porcher will share the third prize of $3,000 and will also be invited to the AMOS conference with the ARCLab team. One evaluator noted, “This informative and interesting report describes the combination of ML and signal processing techniques in a compelling way, assisted by informative plots, tables, and sequence diagrams. The author identifies and describes a modular approach to class detection and their assessment of feature utility, which they correctly identify is not evenly useful across classes… Any lack of mission expertise is made up for by a clear and detailed discussion of the benefits and pitfalls of the methods they used and discussion of what they learned.”The fourth- through seventh-place scoring teams will each receive $1,000 and a certificate of excellence.“The goal of this competition was to foster an interdisciplinary approach to problem-solving in the space domain by inviting AI development experts to apply their skills in this new context of orbital capacity. And all of our winning teams really delivered — they brought technical skill, novel approaches, and expertise to a very impressive round of submissions.” says Professor Richard Linares, who heads ARCLab.Active modeling with passive dataThroughout a GEO satellite’s time in orbit, operators issue commands to place them in various behavioral modes—station-keeping, longitudinal shifts, end-of-life behaviors, and so on. Satellite Patterns of Life (PoLs) describe on-orbit behavior composed of sequences of both natural and non-natural behavior modes.ARCLab has developed a groundbreaking benchmarking tool for geosynchronous satellite pattern-of-life characterization and created the Satellite Pattern-of-Life Identification Dataset (SPLID), comprising real and synthetic space object data. The challenge participants used this tool to create algorithms that use AI to map out the on-orbit behaviors of a satellite.The goal of the MIT ARCLab Prize for AI Innovation in Space is to encourage technologists and enthusiasts to bring innovation and new skills sets to well-established challenges in aerospace. The team aims to hold the competition in 2025 and 2026 to explore other topics and invite experts in AI to apply their skills to new challenges. More
200 Shares109 Views
in Data Management & Statistics
Community members receive 2024 MIT Excellence Awards, Collier Medal, and Staff Award for Distinction in Service
by Markus Andrews 11 July 2024, 17:00
On Wednesday, June 5, 13 individuals and four teams were awarded MIT Excellence Awards — the highest awards for staff at the Institute. Colleagues holding signs, waving pompoms, and cheering gathered in Kresge Auditorium to show their support for the honorees. In addition to the Excellence Awards, staff members were honored with the Collier Medal, the Staff Award for Distinction in Service, and the Gordon Y. Billard Award. The Collier Medal honors the memory of Officer Sean Collier, who gave his life protecting and serving MIT; it celebrates an individual or group whose actions demonstrate the importance of community. The Staff Award for Distinction in Service is presented to a staff member whose service results in a positive lasting impact on the Institute.The Gordon Y. Billard Award is given annually to staff, faculty, or an MIT-affiliated individual(s) who has given “special service of outstanding merit performed for the Institute.” This year, for the first time, this award was presented at the MIT Excellence Awards and Collier Medal celebration. The 2024 MIT Excellence Award recipients and their award categories are: Innovative Solutions Nanotechnology Material Core Staff, Koch Institute for Integrative Cancer Research, Office of the Vice President for Research (Margaret Bisher, Giovanni de Nola, David Mankus, and Dong Soo Yun)Bringing Out the Best Salvatore Ieni James Kelsey Lauren PouchakServing Our Community Megan Chester Alessandra Davy-Falconi David Randall Days Weekend Team, Department of Custodial Services, Department of Facilities: Karen Melisa Betancourth, Ana Guerra Chavarria, Yeshi Khando, Joao Pacheco, and Kevin Salazar IMES/HST Academic Office Team, Institute for Medical Engineering and Science, School of Engineering: Traci Anderson, Joseph R. Stein, and Laurie Ward Team Leriche, Department of Custodial Services, Department of Facilities: Anthony Anzalone, David Solomon Carrasco, Larrenton Forrest, Michael Leriche, and Joe VieiraEmbracing Diversity, Equity, and Inclusion Bhaskar Pant Jessica TamOutstanding Contributor Paul W. Barone Marcia G. Davidson Steven Kooi Tianjiao Lei Andrew H. Mack
2024 MIT Excellence Awards + Collier Medal Ceremony
The 2024 Collier Medal recipient was Benjamin B. Lewis, a graduate student in the Institute for Data, Systems and Society in the MIT Schwarzman College of Computing. Last spring, he founded the Cambridge branch of End Overdose, a nonprofit dedicated to reducing drug-related overdose deaths. Through his efforts, more than 600 members of the Greater Boston community, including many at MIT, have been trained to administer lifesaving treatment at critical moments.This year’s recipient of the 2024 Staff Award for Distinction in Service was Diego F. Arango (Department of Custodial Services, Department of Facilities), daytime custodian in Building 46. He was nominated by no fewer than 36 staff, faculty, students, and researchers for creating a positive working environment and for offering “help whenever, wherever, and to whomever needs it.”Three community members were honored with a 2024 Gordon Y. Billard AwardDeborah G. Douglas, senior director of collections and curator of science and technology, MIT MuseumRonald Hasseltine, assistant provost for research administration, Office of the Vice President for ResearchRichard K. Lester, vice provost for international activities and Japan Steel Industry Professor of Nuclear Science and Engineering, School of EngineeringPresenters included President Sally Kornbluth; MIT Chief of Police John DiFava and Deputy Chief Steven DeMarco; Vice President for Human Resources Ramona Allen; Executive Vice President and Treasurer Glen Shor; Provost Cynthia Barnhart; Lincoln Laboratory director Eric Evans; Chancellor Melissa Nobles; and Dean of the School of Engineering Anantha Chandrakasan.Visit the MIT Human Resources website for more information about the award recipients, categories, and to view photos and video of the event. More
150 Shares169 Views
in Data Management & Statistics
Study finds health risks in switching ships from diesel to ammonia fuel
by Markus Andrews 11 July 2024, 04:00
As container ships the size of city blocks cross the oceans to deliver cargo, their huge diesel engines emit large quantities of air pollutants that drive climate change and have human health impacts. It has been estimated that maritime shipping accounts for almost 3 percent of global carbon dioxide emissions and the industry’s negative impacts on air quality cause about 100,000 premature deaths each year.Decarbonizing shipping to reduce these detrimental effects is a goal of the International Maritime Organization, a U.N. agency that regulates maritime transport. One potential solution is switching the global fleet from fossil fuels to sustainable fuels such as ammonia, which could be nearly carbon-free when considering its production and use.But in a new study, an interdisciplinary team of researchers from MIT and elsewhere caution that burning ammonia for maritime fuel could worsen air quality further and lead to devastating public health impacts, unless it is adopted alongside strengthened emissions regulations.Ammonia combustion generates nitrous oxide (N2O), a greenhouse gas that is about 300 times more potent than carbon dioxide. It also emits nitrogen in the form of nitrogen oxides (NO and NO2, referred to as NOx), and unburnt ammonia may slip out, which eventually forms fine particulate matter in the atmosphere. These tiny particles can be inhaled deep into the lungs, causing health problems like heart attacks, strokes, and asthma.The new study indicates that, under current legislation, switching the global fleet to ammonia fuel could cause up to about 600,000 additional premature deaths each year. However, with stronger regulations and cleaner engine technology, the switch could lead to about 66,000 fewer premature deaths than currently caused by maritime shipping emissions, with far less impact on global warming.“Not all climate solutions are created equal. There is almost always some price to pay. We have to take a more holistic approach and consider all the costs and benefits of different climate solutions, rather than just their potential to decarbonize,” says Anthony Wong, a postdoc in the MIT Center for Global Change Science and lead author of the study.His co-authors include Noelle Selin, an MIT professor in the Institute for Data, Systems, and Society and the Department of Earth, Atmospheric and Planetary Sciences (EAPS); Sebastian Eastham, a former principal research scientist who is now a senior lecturer at Imperial College London; Christine Mounaïm-Rouselle, a professor at the University of Orléans in France; Yiqi Zhang, a researcher at the Hong Kong University of Science and Technology; and Florian Allroggen, a research scientist in the MIT Department of Aeronautics and Astronautics. The research appears this week in Environmental Research Letters.Greener, cleaner ammoniaTraditionally, ammonia is made by stripping hydrogen from natural gas and then combining it with nitrogen at extremely high temperatures. This process is often associated with a large carbon footprint. The maritime shipping industry is betting on the development of “green ammonia,” which is produced by using renewable energy to make hydrogen via electrolysis and to generate heat.“In theory, if you are burning green ammonia in a ship engine, the carbon emissions are almost zero,” Wong says.But even the greenest ammonia generates nitrous oxide (N2O), nitrogen oxides (NOx) when combusted, and some of the ammonia may slip out, unburnt. This nitrous oxide would escape into the atmosphere, where the greenhouse gas would remain for more than 100 years. At the same time, the nitrogen emitted as NOx and ammonia would fall to Earth, damaging fragile ecosystems. As these emissions are digested by bacteria, additional N2O is produced.NOx and ammonia also mix with gases in the air to form fine particulate matter. A primary contributor to air pollution, fine particulate matter kills an estimated 4 million people each year.“Saying that ammonia is a ‘clean’ fuel is a bit of an overstretch. Just because it is carbon-free doesn’t necessarily mean it is clean and good for public health,” Wong says.A multifaceted modelThe researchers wanted to paint the whole picture, capturing the environmental and public health impacts of switching the global fleet to ammonia fuel. To do so, they designed scenarios to measure how pollutant impacts change under certain technology and policy assumptions.From a technological point of view, they considered two ship engines. The first burns pure ammonia, which generates higher levels of unburnt ammonia but emits fewer nitrogen oxides. The second engine technology involves mixing ammonia with hydrogen to improve combustion and optimize the performance of a catalytic converter, which controls both nitrogen oxides and unburnt ammonia pollution.They also considered three policy scenarios: current regulations, which only limit NOx emissions in some parts of the world; a scenario that adds ammonia emission limits over North America and Western Europe; and a scenario that adds global limits on ammonia and NOx emissions.The researchers used a ship track model to calculate how pollutant emissions change under each scenario and then fed the results into an air quality model. The air quality model calculates the impact of ship emissions on particulate matter and ozone pollution. Finally, they estimated the effects on global public health.One of the biggest challenges came from a lack of real-world data, since no ammonia-powered ships are yet sailing the seas. Instead, the researchers relied on experimental ammonia combustion data from collaborators to build their model.“We had to come up with some clever ways to make that data useful and informative to both the technology and regulatory situations,” he says.A range of outcomesIn the end, they found that with no new regulations and ship engines that burn pure ammonia, switching the entire fleet would cause 681,000 additional premature deaths each year.“While a scenario with no new regulations is not very realistic, it serves as a good warning of how dangerous ammonia emissions could be. And unlike NOx, ammonia emissions from shipping are currently unregulated,” Wong says.However, even without new regulations, using cleaner engine technology would cut the number of premature deaths down to about 80,000, which is about 20,000 fewer than are currently attributed to maritime shipping emissions. With stronger global regulations and cleaner engine technology, the number of people killed by air pollution from shipping could be reduced by about 66,000.“The results of this study show the importance of developing policies alongside new technologies,” Selin says. “There is a potential for ammonia in shipping to be beneficial for both climate and air quality, but that requires that regulations be designed to address the entire range of potential impacts, including both climate and air quality.”Ammonia’s air quality impacts would not be felt uniformly across the globe, and addressing them fully would require coordinated strategies across very different contexts. Most premature deaths would occur in East Asia, since air quality regulations are less stringent in this region. Higher levels of existing air pollution cause the formation of more particulate matter from ammonia emissions. In addition, shipping volume over East Asia is far greater than elsewhere on Earth, compounding these negative effects.In the future, the researchers want to continue refining their analysis. They hope to use these findings as a starting point to urge the marine industry to share engine data they can use to better evaluate air quality and climate impacts. They also hope to inform policymakers about the importance and urgency of updating shipping emission regulations.This research was funded by the MIT Climate and Sustainability Consortium. More

School of Engineering

Latest story

Method prevents an AI model from being overconfident about wrong answers

More stories

Study: When allocating scarce resources with AI, randomization can improve fairness

AI model identifies certain breast tumor stages likely to progress to invasive cancer

How to assess a general-purpose AI model’s reliability before it’s deployed

Machine learning and the microscope

When to trust an AI model

MIT ARCLab announces winners of inaugural Prize for AI Innovation in Space

Community members receive 2024 MIT Excellence Awards, Collier Medal, and Staff Award for Distinction in Service

Study finds health risks in switching ships from diesel to ammonia fuel

ITALIAN LANGUAGE

ENGLISH LANGUAGE