Algorithms Archivi - technology-news.space - All about the world of technology!

Latest story

138 Shares119 Views

Method prevents an AI model from being overconfident about wrong answers

by Markus Andrews 31 July 2024, 04:00

People use large language models for a huge array of tasks, from translating an article to identifying financial fraud. However, despite the incredible capabilities and versatility of these models, they sometimes generate inaccurate responses.On top of that problem, the models can be overconfident about wrong answers or underconfident about correct ones, making it tough for a user to know when a model can be trusted.Researchers typically calibrate a machine-learning model to ensure its level of confidence lines up with its accuracy. A well-calibrated model should have less confidence about an incorrect prediction, and vice-versa. But because large language models (LLMs) can be applied to a seemingly endless collection of diverse tasks, traditional calibration methods are ineffective.Now, researchers from MIT and the MIT-IBM Watson AI Lab have introduced a calibration method tailored to large language models. Their method, called Thermometer, involves building a smaller, auxiliary model that runs on top of a large language model to calibrate it.Thermometer is more efficient than other approaches — requiring less power-hungry computation — while preserving the accuracy of the model and enabling it to produce better-calibrated responses on tasks it has not seen before.By enabling efficient calibration of an LLM for a variety of tasks, Thermometer could help users pinpoint situations where a model is overconfident about false predictions, ultimately preventing them from deploying that model in a situation where it may fail.“With Thermometer, we want to provide the user with a clear signal to tell them whether a model’s response is accurate or inaccurate, in a way that reflects the model’s uncertainty, so they know if that model is reliable,” says Maohao Shen, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on Thermometer.Shen is joined on the paper by Gregory Wornell, the Sumitomo Professor of Engineering who leads the Signals, Information, and Algorithms Laboratory in the Research Laboratory for Electronics, and is a member of the MIT-IBM Watson AI Lab; senior author Soumya Ghosh, a research staff member in the MIT-IBM Watson AI Lab; as well as others at MIT and the MIT-IBM Watson AI Lab. The research was recently presented at the International Conference on Machine Learning.Universal calibrationSince traditional machine-learning models are typically designed to perform a single task, calibrating them usually involves one task-specific method. On the other hand, since LLMs have the flexibility to perform many tasks, using a traditional method to calibrate that model for one task might hurt its performance on another task.Calibrating an LLM often involves sampling from the model multiple times to obtain different predictions and then aggregating these predictions to obtain better-calibrated confidence. However, because these models have billions of parameters, the computational costs of such approaches rapidly add up.“In a sense, large language models are universal because they can handle various tasks. So, we need a universal calibration method that can also handle many different tasks,” says Shen.With Thermometer, the researchers developed a versatile technique that leverages a classical calibration method called temperature scaling to efficiently calibrate an LLM for a new task.In this context, a “temperature” is a scaling parameter used to adjust a model’s confidence to be aligned with its prediction accuracy. Traditionally, one determines the right temperature using a labeled validation dataset of task-specific examples.Since LLMs are often applied to new tasks, labeled datasets can be nearly impossible to acquire. For instance, a user who wants to deploy an LLM to answer customer questions about a new product likely does not have a dataset containing such questions and answers.Instead of using a labeled dataset, the researchers train an auxiliary model that runs on top of an LLM to automatically predict the temperature needed to calibrate it for this new task.They use labeled datasets of a few representative tasks to train the Thermometer model, but then once it has been trained, it can generalize to new tasks in a similar category without the need for additional labeled data.A Thermometer model trained on a collection of multiple-choice question datasets, perhaps including one with algebra questions and one with medical questions, could be used to calibrate an LLM that will answer questions about geometry or biology, for instance.“The aspirational goal is for it to work on any task, but we are not quite there yet,” Ghosh says. The Thermometer model only needs to access a small part of the LLM’s inner workings to predict the right temperature that will calibrate its prediction for data points of a specific task. An efficient approachImportantly, the technique does not require multiple training runs and only slightly slows the LLM. Plus, since temperature scaling does not alter a model’s predictions, Thermometer preserves its accuracy.When they compared Thermometer to several baselines on multiple tasks, it consistently produced better-calibrated uncertainty measures while requiring much less computation.“As long as we train a Thermometer model on a sufficiently large number of tasks, it should be able to generalize well across any new task, just like a large language model, it is also a universal model,” Shen adds.The researchers also found that if they train a Thermometer model for a smaller LLM, it can be directly applied to calibrate a larger LLM within the same family.In the future, they want to adapt Thermometer for more complex text-generation tasks and apply the technique to even larger LLMs. The researchers also hope to quantify the diversity and number of labeled datasets one would need to train a Thermometer model so it can generalize to a new task.This research was funded, in part, by the MIT-IBM Watson AI Lab. More

More stories

138 Shares189 Views
in Data Management & Statistics
Study: When allocating scarce resources with AI, randomization can improve fairness
by Markus Andrews 24 July 2024, 04:00
Organizations are increasingly utilizing machine-learning models to allocate scarce resources or opportunities. For instance, such models can help companies screen resumes to choose job interview candidates or aid hospitals in ranking kidney transplant patients based on their likelihood of survival.When deploying a model, users typically strive to ensure its predictions are fair by reducing bias. This often involves techniques like adjusting the features a model uses to make decisions or calibrating the scores it generates.However, researchers from MIT and Northeastern University argue that these fairness methods are not sufficient to address structural injustices and inherent uncertainties. In a new paper, they show how randomizing a model’s decisions in a structured way can improve fairness in certain situations.For example, if multiple companies use the same machine-learning model to rank job interview candidates deterministically — without any randomization — then one deserving individual could be the bottom-ranked candidate for every job, perhaps due to how the model weighs answers provided in an online form. Introducing randomization into a model’s decisions could prevent one worthy person or group from always being denied a scarce resource, like a job interview.Through their analysis, the researchers found that randomization can be especially beneficial when a model’s decisions involve uncertainty or when the same group consistently receives negative decisions.They present a framework one could use to introduce a specific amount of randomization into a model’s decisions by allocating resources through a weighted lottery. This method, which an individual can tailor to fit their situation, can improve fairness without hurting the efficiency or accuracy of a model.“Even if you could make fair predictions, should you be deciding these social allocations of scarce resources or opportunities strictly off scores or rankings? As things scale, and we see more and more opportunities being decided by these algorithms, the inherent uncertainties in these scores can be amplified. We show that fairness may require some sort of randomization,” says Shomik Jain, a graduate student in the Institute for Data, Systems, and Society (IDSS) and lead author of the paper.Jain is joined on the paper by Kathleen Creel, assistant professor of philosophy and computer science at Northeastern University; and senior author Ashia Wilson, the Lister Brothers Career Development Professor in the Department of Electrical Engineering and Computer Science and a principal investigator in the Laboratory for Information and Decision Systems (LIDS). The research will be presented at the International Conference on Machine Learning.Considering claimsThis work builds off a previous paper in which the researchers explored harms that can occur when one uses deterministic systems at scale. They found that using a machine-learning model to deterministically allocate resources can amplify inequalities that exist in training data, which can reinforce bias and systemic inequality. “Randomization is a very useful concept in statistics, and to our delight, satisfies the fairness demands coming from both a systemic and individual point of view,” Wilson says.In this paper, they explored the question of when randomization can improve fairness. They framed their analysis around the ideas of philosopher John Broome, who wrote about the value of using lotteries to award scarce resources in a way that honors all claims of individuals.A person’s claim to a scarce resource, like a kidney transplant, can stem from merit, deservingness, or need. For instance, everyone has a right to life, and their claims on a kidney transplant may stem from that right, Wilson explains.“When you acknowledge that people have different claims to these scarce resources, fairness is going to require that we respect all claims of individuals. If we always give someone with a stronger claim the resource, is that fair?” Jain says.That sort of deterministic allocation could cause systemic exclusion or exacerbate patterned inequality, which occurs when receiving one allocation increases an individual’s likelihood of receiving future allocations. In addition, machine-learning models can make mistakes, and a deterministic approach could cause the same mistake to be repeated.Randomization can overcome these problems, but that doesn’t mean all decisions a model makes should be randomized equally.Structured randomizationThe researchers use a weighted lottery to adjust the level of randomization based on the amount of uncertainty involved in the model’s decision-making. A decision that is less certain should incorporate more randomization.“In kidney allocation, usually the planning is around projected lifespan, and that is deeply uncertain. If two patients are only five years apart, it becomes a lot harder to measure. We want to leverage that level of uncertainty to tailor the randomization,” Wilson says.The researchers used statistical uncertainty quantification methods to determine how much randomization is needed in different situations. They show that calibrated randomization can lead to fairer outcomes for individuals without significantly affecting the utility, or effectiveness, of the model.“There is a balance to be had between overall utility and respecting the rights of the individuals who are receiving a scarce resource, but oftentimes the tradeoff is relatively small,” says Wilson.However, the researchers emphasize there are situations where randomizing decisions would not improve fairness and could harm individuals, such as in criminal justice contexts.But there could be other areas where randomization can improve fairness, such as college admissions, and the researchers plan to study other use cases in future work. They also want to explore how randomization can affect other factors, such as competition or prices, and how it could be used to improve the robustness of machine-learning models.“We are hoping our paper is a first move toward illustrating that there might be a benefit to randomization. We are offering randomization as a tool. How much you are going to want to do it is going to be up to all the stakeholders in the allocation to decide. And, of course, how they decide is another research question all together,” says Wilson. More
138 Shares109 Views
in Data Management & Statistics
How to assess a general-purpose AI model’s reliability before it’s deployed
by Markus Andrews 16 July 2024, 04:00
Foundation models are massive deep-learning models that have been pretrained on an enormous amount of general-purpose, unlabeled data. They can be applied to a variety of tasks, like generating images or answering customer questions.But these models, which serve as the backbone for powerful artificial intelligence tools like ChatGPT and DALL-E, can offer up incorrect or misleading information. In a safety-critical situation, such as a pedestrian approaching a self-driving car, these mistakes could have serious consequences.To help prevent such mistakes, researchers from MIT and the MIT-IBM Watson AI Lab developed a technique to estimate the reliability of foundation models before they are deployed to a specific task.They do this by considering a set of foundation models that are slightly different from one another. Then they use their algorithm to assess the consistency of the representations each model learns about the same test data point. If the representations are consistent, it means the model is reliable.When they compared their technique to state-of-the-art baseline methods, it was better at capturing the reliability of foundation models on a variety of downstream classification tasks.Someone could use this technique to decide if a model should be applied in a certain setting, without the need to test it on a real-world dataset. This could be especially useful when datasets may not be accessible due to privacy concerns, like in health care settings. In addition, the technique could be used to rank models based on reliability scores, enabling a user to select the best one for their task.“All models can be wrong, but models that know when they are wrong are more useful. The problem of quantifying uncertainty or reliability is more challenging for these foundation models because their abstract representations are difficult to compare. Our method allows one to quantify how reliable a representation model is for any given input data,” says senior author Navid Azizan, the Esther and Harold E. Edgerton Assistant Professor in the MIT Department of Mechanical Engineering and the Institute for Data, Systems, and Society (IDSS), and a member of the Laboratory for Information and Decision Systems (LIDS).He is joined on a paper about the work by lead author Young-Jin Park, a LIDS graduate student; Hao Wang, a research scientist at the MIT-IBM Watson AI Lab; and Shervin Ardeshir, a senior research scientist at Netflix. The paper will be presented at the Conference on Uncertainty in Artificial Intelligence.Measuring consensusTraditional machine-learning models are trained to perform a specific task. These models typically make a concrete prediction based on an input. For instance, the model might tell you whether a certain image contains a cat or a dog. In this case, assessing reliability could be a matter of looking at the final prediction to see if the model is right.But foundation models are different. The model is pretrained using general data, in a setting where its creators don’t know all downstream tasks it will be applied to. Users adapt it to their specific tasks after it has already been trained.Unlike traditional machine-learning models, foundation models don’t give concrete outputs like “cat” or “dog” labels. Instead, they generate an abstract representation based on an input data point.To assess the reliability of a foundation model, the researchers used an ensemble approach by training several models which share many properties but are slightly different from one another.“Our idea is like measuring the consensus. If all those foundation models are giving consistent representations for any data in our dataset, then we can say this model is reliable,” Park says.But they ran into a problem: How could they compare abstract representations?“These models just output a vector, comprised of some numbers, so we can’t compare them easily,” he adds.They solved this problem using an idea called neighborhood consistency.For their approach, the researchers prepare a set of reliable reference points to test on the ensemble of models. Then, for each model, they investigate the reference points located near that model’s representation of the test point.By looking at the consistency of neighboring points, they can estimate the reliability of the models.Aligning the representationsFoundation models map data points to what is known as a representation space. One way to think about this space is as a sphere. Each model maps similar data points to the same part of its sphere, so images of cats go in one place and images of dogs go in another.But each model would map animals differently in its own sphere, so while cats may be grouped near the South Pole of one sphere, another model could map cats somewhere in the Northern Hemisphere.The researchers use the neighboring points like anchors to align those spheres so they can make the representations comparable. If a data point’s neighbors are consistent across multiple representations, then one should be confident about the reliability of the model’s output for that point.When they tested this approach on a wide range of classification tasks, they found that it was much more consistent than baselines. Plus, it wasn’t tripped up by challenging test points that caused other methods to fail.Moreover, their approach can be used to assess reliability for any input data, so one could evaluate how well a model works for a particular type of individual, such as a patient with certain characteristics.“Even if the models all have average performance overall, from an individual point of view, you’d prefer the one that works best for that individual,” Wang says.However, one limitation comes from the fact that they must train an ensemble of foundation models, which is computationally expensive. In the future, they plan to find more efficient ways to build multiple models, perhaps by using small perturbations of a single model.“With the current trend of using foundational models for their embeddings to support various downstream tasks — from fine-tuning to retrieval augmented generation — the topic of quantifying uncertainty at the representation level is increasingly important, but challenging, as embeddings on their own have no grounding. What matters instead is how embeddings of different inputs are related to one another, an idea that this work neatly captures through the proposed neighborhood consistency score,” says Marco Pavone, an associate professor in the Department of Aeronautics and Astronautics at Stanford University, who was not involved with this work. “This is a promising step towards high quality uncertainty quantifications for embedding models, and I’m excited to see future extensions which can operate without requiring model-ensembling to really enable this approach to scale to foundation-size models.”This work is funded, in part, by the MIT-IBM Watson AI Lab, MathWorks, and Amazon. More
125 Shares179 Views
in Data Management & Statistics
“They can see themselves shaping the world they live in”
by Markus Andrews 8 July 2024, 19:00
During the journey from the suburbs to the city, the tree canopy often dwindles down as skyscrapers rise up. A group of New England Innovation Academy students wondered why that is.“Our friend Victoria noticed that where we live in Marlborough there are lots of trees in our own backyards. But if you drive just 30 minutes to Boston, there are almost no trees,” said high school junior Ileana Fournier. “We were struck by that duality.”This inspired Fournier and her classmates Victoria Leeth and Jessie Magenyi to prototype a mobile app that illustrates Massachusetts deforestation trends for Day of AI, a free, hands-on curriculum developed by the MIT Responsible AI for Social Empowerment and Education (RAISE) initiative, headquartered in the MIT Media Lab and in collaboration with the MIT Schwarzman College of Computing and MIT Open Learning. They were among a group of 20 students from New England Innovation Academy who shared their projects during the 2024 Day of AI global celebration hosted with the Museum of Science.The Day of AI curriculum introduces K-12 students to artificial intelligence. Now in its third year, Day of AI enables students to improve their communities and collaborate on larger global challenges using AI. Fournier, Leeth, and Magenyi’s TreeSavers app falls under the Telling Climate Stories with Data module, one of four new climate-change-focused lessons.“We want you to be able to express yourselves creatively to use AI to solve problems with critical-thinking skills,” Cynthia Breazeal, director of MIT RAISE, dean for digital learning at MIT Open Learning, and professor of media arts and sciences, said during this year’s Day of AI global celebration at the Museum of Science. “We want you to have an ethical and responsible way to think about this really powerful, cool, and exciting technology.”Moving from understanding to actionDay of AI invites students to examine the intersection of AI and various disciplines, such as history, civics, computer science, math, and climate change. With the curriculum available year-round, more than 10,000 educators across 114 countries have brought Day of AI activities to their classrooms and homes.The curriculum gives students the agency to evaluate local issues and invent meaningful solutions. “We’re thinking about how to create tools that will allow kids to have direct access to data and have a personal connection that intersects with their lived experiences,” Robert Parks, curriculum developer at MIT RAISE, said at the Day of AI global celebration.Before this year, first-year Jeremie Kwapong said he knew very little about AI. “I was very intrigued,” he said. “I started to experiment with ChatGPT to see how it reacts. How close can I get this to human emotion? What is AI’s knowledge compared to a human’s knowledge?”In addition to helping students spark an interest in AI literacy, teachers around the world have told MIT RAISE that they want to use data science lessons to engage students in conversations about climate change. Therefore, Day of AI’s new hands-on projects use weather and climate change to show students why it’s important to develop a critical understanding of dataset design and collection when observing the world around them.“There is a lag between cause and effect in everyday lives,” said Parks. “Our goal is to demystify that, and allow kids to access data so they can see a long view of things.”Tools like MIT App Inventor — which allows anyone to create a mobile application — help students make sense of what they can learn from data. Fournier, Leeth, and Magenyi programmed TreeSavers in App Inventor to chart regional deforestation rates across Massachusetts, identify ongoing trends through statistical models, and predict environmental impact. The students put that “long view” of climate change into practice when developing TreeSavers’ interactive maps. Users can toggle between Massachusetts’s current tree cover, historical data, and future high-risk areas.Although AI provides fast answers, it doesn’t necessarily offer equitable solutions, said David Sittenfeld, director of the Center for the Environment at the Museum of Science. The Day of AI curriculum asks students to make decisions on sourcing data, ensuring unbiased data, and thinking responsibly about how findings could be used.“There’s an ethical concern about tracking people’s data,” said Ethan Jorda, a New England Innovation Academy student. His group used open-source data to program an app that helps users track and reduce their carbon footprint.Christine Cunningham, senior vice president of STEM Learning at the Museum of Science, believes students are prepared to use AI responsibly to make the world a better place. “They can see themselves shaping the world they live in,” said Cunningham. “Moving through from understanding to action, kids will never look at a bridge or a piece of plastic lying on the ground in the same way again.”Deepening collaboration on earth and beyondThe 2024 Day of AI speakers emphasized collaborative problem solving at the local, national, and global levels.“Through different ideas and different perspectives, we’re going to get better solutions,” said Cunningham. “How do we start young enough that every child has a chance to both understand the world around them but also to move toward shaping the future?”Presenters from MIT, the Museum of Science, and NASA approached this question with a common goal — expanding STEM education to learners of all ages and backgrounds.“We have been delighted to collaborate with the MIT RAISE team to bring this year’s Day of AI celebration to the Museum of Science,” says Meg Rosenburg, manager of operations at the Museum of Science Centers for Public Science Learning. “This opportunity to highlight the new climate modules for the curriculum not only perfectly aligns with the museum’s goals to focus on climate and active hope throughout our Year of the Earthshot initiative, but it has also allowed us to bring our teams together and grow a relationship that we are very excited to build upon in the future.”Rachel Connolly, systems integration and analysis lead for NASA’s Science Activation Program, showed the power of collaboration with the example of how human comprehension of Saturn’s appearance has evolved. From Galileo’s early telescope to the Cassini space probe, modern imaging of Saturn represents 400 years of science, technology, and math working together to further knowledge.“Technologies, and the engineers who built them, advance the questions we’re able to ask and therefore what we’re able to understand,” said Connolly, research scientist at MIT Media Lab.New England Innovation Academy students saw an opportunity for collaboration a little closer to home. Emmett Buck-Thompson, Jeff Cheng, and Max Hunt envisioned a social media app to connect volunteers with local charities. Their project was inspired by Buck-Thompson’s father’s difficulties finding volunteering opportunities, Hunt’s role as the president of the school’s Community Impact Club, and Cheng’s aspiration to reduce screen time for social media users. Using MIT App Inventor, their combined ideas led to a prototype with the potential to make a real-world impact in their community.The Day of AI curriculum teaches the mechanics of AI, ethical considerations and responsible uses, and interdisciplinary applications for different fields. It also empowers students to become creative problem solvers and engaged citizens in their communities and online. From supporting volunteer efforts to encouraging action for the state’s forests to tackling the global challenge of climate change, today’s students are becoming tomorrow’s leaders with Day of AI.“We want to empower you to know that this is a tool you can use to make your community better, to help people around you with this technology,” said Breazeal.Other Day of AI speakers included Tim Ritchie, president of the Museum of Science; Michael Lawrence Evans, program director of the Boston Mayor’s Office of New Urban Mechanics; Dava Newman, director of the MIT Media Lab; and Natalie Lao, executive director of the App Inventor Foundation. More
88 Shares149 Views
in Data Management & Statistics
MIT researchers introduce generative AI for databases
by Markus Andrews 8 July 2024, 04:00
A new tool makes it easier for database users to perform complicated statistical analyses of tabular data without the need to know what is going on behind the scenes.GenSQL, a generative AI system for databases, could help users make predictions, detect anomalies, guess missing values, fix errors, or generate synthetic data with just a few keystrokes.For instance, if the system were used to analyze medical data from a patient who has always had high blood pressure, it could catch a blood pressure reading that is low for that particular patient but would otherwise be in the normal range.GenSQL automatically integrates a tabular dataset and a generative probabilistic AI model, which can account for uncertainty and adjust their decision-making based on new data.Moreover, GenSQL can be used to produce and analyze synthetic data that mimic the real data in a database. This could be especially useful in situations where sensitive data cannot be shared, such as patient health records, or when real data are sparse.This new tool is built on top of SQL, a programming language for database creation and manipulation that was introduced in the late 1970s and is used by millions of developers worldwide.“Historically, SQL taught the business world what a computer could do. They didn’t have to write custom programs, they just had to ask questions of a database in high-level language. We think that, when we move from just querying data to asking questions of models and data, we are going to need an analogous language that teaches people the coherent questions you can ask a computer that has a probabilistic model of the data,” says Vikash Mansinghka ’05, MEng ’09, PhD ’09, senior author of a paper introducing GenSQL and a principal research scientist and leader of the Probabilistic Computing Project in the MIT Department of Brain and Cognitive Sciences.When the researchers compared GenSQL to popular, AI-based approaches for data analysis, they found that it was not only faster but also produced more accurate results. Importantly, the probabilistic models used by GenSQL are explainable, so users can read and edit them.“Looking at the data and trying to find some meaningful patterns by just using some simple statistical rules might miss important interactions. You really want to capture the correlations and the dependencies of the variables, which can be quite complicated, in a model. With GenSQL, we want to enable a large set of users to query their data and their model without having to know all the details,” adds lead author Mathieu Huot, a research scientist in the Department of Brain and Cognitive Sciences and member of the Probabilistic Computing Project.They are joined on the paper by Matin Ghavami and Alexander Lew, MIT graduate students; Cameron Freer, a research scientist; Ulrich Schaechtel and Zane Shelby of Digital Garage; Martin Rinard, an MIT professor in the Department of Electrical Engineering and Computer Science and member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Feras Saad ’15, MEng ’16, PhD ’22, an assistant professor at Carnegie Mellon University. The research was recently presented at the ACM Conference on Programming Language Design and Implementation.Combining models and databasesSQL, which stands for structured query language, is a programming language for storing and manipulating information in a database. In SQL, people can ask questions about data using keywords, such as by summing, filtering, or grouping database records.However, querying a model can provide deeper insights, since models can capture what data imply for an individual. For instance, a female developer who wonders if she is underpaid is likely more interested in what salary data mean for her individually than in trends from database records.The researchers noticed that SQL didn’t provide an effective way to incorporate probabilistic AI models, but at the same time, approaches that use probabilistic models to make inferences didn’t support complex database queries.They built GenSQL to fill this gap, enabling someone to query both a dataset and a probabilistic model using a straightforward yet powerful formal programming language.A GenSQL user uploads their data and probabilistic model, which the system automatically integrates. Then, she can run queries on data that also get input from the probabilistic model running behind the scenes. This not only enables more complex queries but can also provide more accurate answers.For instance, a query in GenSQL might be something like, “How likely is it that a developer from Seattle knows the programming language Rust?” Just looking at a correlation between columns in a database might miss subtle dependencies. Incorporating a probabilistic model can capture more complex interactions. Plus, the probabilistic models GenSQL utilizes are auditable, so people can see which data the model uses for decision-making. In addition, these models provide measures of calibrated uncertainty along with each answer.For instance, with this calibrated uncertainty, if one queries the model for predicted outcomes of different cancer treatments for a patient from a minority group that is underrepresented in the dataset, GenSQL would tell the user that it is uncertain, and how uncertain it is, rather than overconfidently advocating for the wrong treatment.Faster and more accurate resultsTo evaluate GenSQL, the researchers compared their system to popular baseline methods that use neural networks. GenSQL was between 1.7 and 6.8 times faster than these approaches, executing most queries in a few milliseconds while providing more accurate results.They also applied GenSQL in two case studies: one in which the system identified mislabeled clinical trial data and the other in which it generated accurate synthetic data that captured complex relationships in genomics.Next, the researchers want to apply GenSQL more broadly to conduct largescale modeling of human populations. With GenSQL, they can generate synthetic data to draw inferences about things like health and salary while controlling what information is used in the analysis.They also want to make GenSQL easier to use and more powerful by adding new optimizations and automation to the system. In the long run, the researchers want to enable users to make natural language queries in GenSQL. Their goal is to eventually develop a ChatGPT-like AI expert one could talk to about any database, which grounds its answers using GenSQL queries. This research is funded, in part, by the Defense Advanced Research Projects Agency (DARPA), Google, and the Siegel Family Foundation. More
88 Shares149 Views
in Data Management & Statistics
A data-driven approach to making better choices
by Markus Andrews 6 June 2024, 20:45
Imagine a world in which some important decision — a judge’s sentencing recommendation, a child’s treatment protocol, which person or business should receive a loan — was made more reliable because a well-designed algorithm helped a key decision-maker arrive at a better choice. A new MIT economics course is investigating these interesting possibilities.Class 14.163 (Algorithms and Behavioral Science) is a new cross-disciplinary course focused on behavioral economics, which studies the cognitive capacities and limitations of human beings. The course was co-taught this past spring by assistant professor of economics Ashesh Rambachan and visiting lecturer Sendhil Mullainathan.Rambachan studies the economic applications of machine learning, focusing on algorithmic tools that drive decision-making in the criminal justice system and consumer lending markets. He also develops methods for determining causation using cross-sectional and dynamic data.Mullainathan will soon join the MIT departments of Electrical Engineering and Computer Science and Economics as a professor. His research uses machine learning to understand complex problems in human behavior, social policy, and medicine. Mullainathan co-founded the Abdul Latif Jameel Poverty Action Lab (J-PAL) in 2003.The new course’s goals are both scientific (to understand people) and policy-driven (to improve society by improving decisions). Rambachan believes that machine-learning algorithms provide new tools for both the scientific and applied goals of behavioral economics.“The course investigates the deployment of computer science, artificial intelligence (AI), economics, and machine learning in service of improved outcomes and reduced instances of bias in decision-making,” Rambachan says.There are opportunities, Rambachan believes, for constantly evolving digital tools like AI, machine learning, and large language models (LLMs) to help reshape everything from discriminatory practices in criminal sentencing to health-care outcomes among underserved populations.Students learn how to use machine learning tools with three main objectives: to understand what they do and how they do it, to formalize behavioral economics insights so they compose well within machine learning tools, and to understand areas and topics where the integration of behavioral economics and algorithmic tools might be most fruitful.Students also produce ideas, develop associated research, and see the bigger picture. They’re led to understand where an insight fits and see where the broader research agenda is leading. Participants can think critically about what supervised LLMs can (and cannot) do, to understand how to integrate those capacities with the models and insights of behavioral economics, and to recognize the most fruitful areas for the application of what investigations uncover.The dangers of subjectivity and biasAccording to Rambachan, behavioral economics acknowledges that biases and mistakes exist throughout our choices, even absent algorithms. “The data used by our algorithms exist outside computer science and machine learning, and instead are often produced by people,” he continues. “Understanding behavioral economics is therefore essential to understanding the effects of algorithms and how to better build them.”Rambachan sought to make the course accessible regardless of attendees’ academic backgrounds. The class included advanced degree students from a variety of disciplines.By offering students a cross-disciplinary, data-driven approach to investigating and discovering ways in which algorithms might improve problem-solving and decision-making, Rambachan hopes to build a foundation on which to redesign existing systems of jurisprudence, health care, consumer lending, and industry, to name a few areas.“Understanding how data are generated can help us understand bias,” Rambachan says. “We can ask questions about producing a better outcome than what currently exists.”Useful tools for re-imagining social operationsEconomics doctoral student Jimmy Lin was skeptical about the claims Rambachan and Mullainathan made when the class began, but changed his mind as the course continued.“Ashesh and Sendhil started with two provocative claims: The future of behavioral science research will not exist without AI, and the future of AI research will not exist without behavioral science,” Lin says. “Over the course of the semester, they deepened my understanding of both fields and walked us through numerous examples of how economics informed AI research and vice versa.”Lin, who’d previously done research in computational biology, praised the instructors’ emphasis on the importance of a “producer mindset,” thinking about the next decade of research rather than the previous decade. “That’s especially important in an area as interdisciplinary and fast-moving as the intersection of AI and economics — there isn’t an old established literature, so you’re forced to ask new questions, invent new methods, and create new bridges,” he says.The speed of change to which Lin alludes is a draw for him, too. “We’re seeing black-box AI methods facilitate breakthroughs in math, biology, physics, and other scientific disciplines,” Lin says. “AI can change the way we approach intellectual discovery as researchers.”An interdisciplinary future for economics and social systemsStudying traditional economic tools and enhancing their value with AI may yield game-changing shifts in how institutions and organizations teach and empower leaders to make choices.“We’re learning to track shifts, to adjust frameworks and better understand how to deploy tools in service of a common language,” Rambachan says. “We must continually interrogate the intersection of human judgment, algorithms, AI, machine learning, and LLMs.”Lin enthusiastically recommended the course regardless of students’ backgrounds. “Anyone broadly interested in algorithms in society, applications of AI across academic disciplines, or AI as a paradigm for scientific discovery should take this class,” he says. “Every lecture felt like a goldmine of perspectives on research, novel application areas, and inspiration on how to produce new, exciting ideas.”The course, Rambachan says, argues that better-built algorithms can improve decision-making across disciplines. “By building connections between economics, computer science, and machine learning, perhaps we can automate the best of human choices to improve outcomes while minimizing or eliminating the worst,” he says.Lin remains excited about the course’s as-yet unexplored possibilities. “It’s a class that makes you excited about the future of research and your own role in it,” he says. More
63 Shares99 Views
in Data Management & Statistics
A community collaboration for progress
by Markus Andrews 22 May 2024, 20:30
While decades of discriminatory policies and practices continue to fuel the affordable housing crisis in the United States, less than three miles from the MIT campus exists a beacon of innovation and community empowerment.“We are very proud to continue MIT’s long-standing partnership with Camfield Estates,” says Catherine D’Ignazio, associate professor of urban science and planning. “Camfield has long been an incubator of creative ideas focused on uplifting their community.”D’Ignazio co-leads a research team focused on housing as part of the MIT Initiative for Combatting Systemic Racism (ICSR) led by the Institute for Data, Systems, and Society (IDSS). The group researches the uneven impacts of data, AI, and algorithmic systems on housing in the United States, as well as ways that these same tools could be used to address racial disparities. The Camfield Tenant Association is a research partner providing insight into the issue and relevant data, as well as opportunities for MIT researchers to solve real challenges and make a local impact.
Play video
MIT Initiative on Combatting Systemic Racism – Housing Video: MIT Sociotechnical Systems Research Center
Formerly known as “Camfield Gardens,” the 102-unit housing development in Roxbury, Massachusetts, was among the pioneering sites in the 1990s to engage in the U.S. Department of Housing and Urban Development’s (HUD) program aimed at revitalizing disrepaired public housing across the country. This also served as the catalyst for their collaboration with MIT, which began in the early 2000s.“The program gave Camfield the money and energy to tear everything on the site down and build it back up anew, in addition to allowing them to buy the property from the city for $1 and take full ownership of the site,” explains Nolen Scruggs, a master’s student in the MIT Department of Urban Studies and Planning (DUSP) who has worked with Camfield over the past few years as part of ICSR’s housing vertical team. “At the time, MIT graduate students helped start a ‘digital divide’ bridge gap program that later evolved into the tech lab that is still there today, continuing to enable residents to learn computer skills and things they might need to get a hand up.”Because of that early collaboration, Camfield Estates reached out to MIT in 2022 to start a new chapter of collaboration with students. Scruggs spent a few months building a team of students from Harvard University, Wentworth Institute of Technology, and MIT to work on a housing design project meant to help the Camfield Tenants Association prepare for their looming redevelopment needs.“One of the things that’s been really important to the work of the ICSR housing vertical is historical context,” says Peko Hosoi, a professor of mechanical engineering and mathematics who co-leads the ICSR Housing vertical with D’Ignazio. “We didn’t get to the place we are right now with housing in an instant. There’s a lot of things that have happened in the U.S. like redlining, predatory lending, and different ways of investing in infrastructure that add important contexts.”“Quantitative methods are a great way to look across macroscale phenomena, but our team recognizes and values qualitative and participatory methods as well, to get a more grounded picture of what community needs really are and what kinds of innovations can bubble up from communities themselves,” D’Ignazio adds. “This is where the partnership with Camfield Estates comes in, which Nolen has been leading.”Finding creative solutionsBefore coming to MIT, Scruggs, a proud New Yorker, worked on housing issues while interning for his local congressperson, House Minority Leader Hakeem Jeffries. He called residents to discuss their housing concerns, learning about the affordability issues that were making it hard for lower- and middle-income families to find places to live.“Having this behind-the-scenes experience set the stage for my involvement in Camfield,” Scruggs says, recalling his start at Camfield conducting participatory action research, meeting with Camfield seniors to discuss and capture their concerns.Scruggs says the biggest issue they have been trying to tackle with Camfield is twofold: creating more space for new residents while also helping current residents achieve their end goal of homeownership.“This speaks to some of the larger issues our group at ICSR is working on in terms of housing affordability,” he says. “With Camfield it is looking at where can people with Section 8 vouchers move, what limits do they have, and what barriers do they face — whether it’s through big tech systems, or individual preferences coming from landlords.”Scruggs adds, “The discrimination those people face while trying to find a house, lock it down, talk to a bank, etc. — it can be very, very difficult and discouraging.” Scruggs says one attempt to combat this issue would be through hiring a caseworker to assist people through the process — one of many ideas that came from a Camfield collaboration with the FHLBank Affordable Housing Development Competition.As part of the competition, the goal for Scruggs’s team was to help Camfield tenants understand all of their options and their potential trade-offs, so that in the end they can make informed decisions about what they want to do with their space.“So often redevelopment schemes don’t ensure people can come back.” Scruggs says. “There are specific design proposals being made to ensure that the structure of people’s lifestyles wouldn’t be disrupted.”Scruggs says that tentative recommendations discussed with tenant association president Paulette Ford include replacing the community center with a high-rise development that would increase the number of units available.“I think they are thinking really creatively about their options,” Hosoi says. “Paulette Ford, and her mother before her, have always referred to Camfield as a ‘hand up,’ with the idea that people come to Camfield to live until they can afford a home of their own locally.”Scruggs’s other partnership with Camfield involves working with MIT undergraduate Amelie Nagle as part of the Undergraduate Research Opportunities Program to create programing that will teach computer design and coding to Camfield community kids — in the very TechLab that goes back to MIT and Camfield’s first collaboration.“Nolen has a real commitment to community-led knowledge production,” says D’Ignazio. “It has been a pleasure to work with him and see how he takes all his urban planning skills (GIS, mapping, urban design, photography, and more) to work in respectful ways that foreground community innovation.”She adds: “We are hopeful that the process will yield some high-quality architectural and planning ideas, and help Camfield take the next step towards realizing their innovative vision.” More
100 Shares199 Views
in Data Management & Statistics
Fostering research, careers, and community in materials science
by Markus Andrews 1 May 2024, 20:25
Gabrielle Wood, a junior at Howard University majoring in chemical engineering, is on a mission to improve the sustainability and life cycles of natural resources and materials. Her work in the Materials Initiative for Comprehensive Research Opportunity (MICRO) program has given her hands-on experience with many different aspects of research, including MATLAB programming, experimental design, data analysis, figure-making, and scientific writing.Wood is also one of 10 undergraduates from 10 universities around the United States to participate in the first MICRO Summit earlier this year. The internship program, developed by the MIT Department of Materials Science and Engineering (DMSE), first launched in fall 2021. Now in its third year, the program continues to grow, providing even more opportunities for non-MIT undergraduate students — including the MICRO Summit and the program’s expansion to include Northwestern University.“I think one of the most valuable aspects of the MICRO program is the ability to do research long term with an experienced professor in materials science and engineering,” says Wood. “My school has limited opportunities for undergraduate research in sustainable polymers, so the MICRO program allowed me to gain valuable experience in this field, which I would not otherwise have.”Like Wood, Griheydi Garcia, a senior chemistry major at Manhattan College, values the exposure to materials science, especially since she is not able to learn as much about it at her home institution.“I learned a lot about crystallography and defects in materials through the MICRO curriculum, especially through videos,” says Garcia. “The research itself is very valuable, as well, because we get to apply what we’ve learned through the videos in the research we do remotely.”Expanding research opportunitiesFrom the beginning, the MICRO program was designed as a fully remote, rigorous education and mentoring program targeted toward students from underserved backgrounds interested in pursuing graduate school in materials science or related fields. Interns are matched with faculty to work on their specific research interests.Jessica Sandland ’99, PhD ’05, principal lecturer in DMSE and co-founder of MICRO, says that research projects for the interns are designed to be work that they can do remotely, such as developing a machine-learning algorithm or a data analysis approach.“It’s important to note that it’s not just about what the program and faculty are bringing to the student interns,” says Sandland, a member of the MIT Digital Learning Lab, a joint program between MIT Open Learning and the Institute’s academic departments. “The students are doing real research and work, and creating things of real value. It’s very much an exchange.”Cécile Chazot PhD ’22, now an assistant professor of materials science and engineering at Northwestern University, had helped to establish MICRO at MIT from the very beginning. Once at Northwestern, she quickly realized that expanding MICRO to Northwestern would offer even more research opportunities to interns than by relying on MIT alone — leveraging the university’s strong materials science and engineering department, as well as offering resources for biomaterials research through Northwestern’s medical school. The program received funding from 3M and officially launched at Northwestern in fall 2023. Approximately half of the MICRO interns are now in the program with MIT and half are with Northwestern. Wood and Garcia both participate in the program via Northwestern.“By expanding to another school, we’ve been able to have interns work with a much broader range of research projects,” says Chazot. “It has become easier for us to place students with faculty and research that match their interests.”Building communityThe MICRO program received a Higher Education Innovation grant from the Abdul Latif Jameel World Education Lab, part of MIT Open Learning, to develop an in-person summit. In January 2024, interns visited MIT for three days of presentations, workshops, and campus tours — including a tour of the MIT.nano building — as well as various community-building activities.“A big part of MICRO is the community,” says Chazot. “A highlight of the summit was just seeing the students come together.”The summit also included panel discussions that allowed interns to gain insights and advice from graduate students and professionals. The graduate panel discussion included MIT graduate students Sam Figueroa (mechanical engineering), Isabella Caruso (DMSE), and Eliana Feygin (DMSE). The career panel was led by Chazot and included Jatin Patil PhD ’23, head of product at SiTration; Maureen Reitman ’90, ScD ’93, group vice president and principal engineer at Exponent; Lucas Caretta PhD ’19, assistant professor of engineering at Brown University; Raquel D’Oyen ’90, who holds a PhD from Northwestern University and is a senior engineer at Raytheon; and Ashley Kaiser MS ’19, PhD ’21, senior process engineer at 6K.Students also had an opportunity to share their work with each other through research presentations. Their presentations covered a wide range of topics, including: developing a computer program to calculate solubility parameters for polymers used in textile manufacturing; performing a life-cycle analysis of a photonic chip and evaluating its environmental impact in comparison to a standard silicon microchip; and applying machine learning algorithms to scanning transmission electron microscopy images of CrSBr, a two-dimensional magnetic material. “The summit was wonderful and the best academic experience I have had as a first-year college student,” says MICRO intern Gabriella La Cour, who is pursuing a major in chemistry and dual degree biomedical engineering at Spelman College and participates in MICRO through MIT. “I got to meet so many students who were all in grades above me … and I learned a little about how to navigate college as an upperclassman.” “I actually have an extremely close friendship with one of the students, and we keep in touch regularly,” adds La Cour. “Professor Chazot gave valuable advice about applications and recommendation letters that will be useful when I apply to REUs [Research Experiences for Undergraduates] and graduate schools.”Looking to the future, MICRO organizers hope to continue to grow the program’s reach.“We would love to see other schools taking on this model,” says Sandland. “There are a lot of opportunities out there. The more departments, research groups, and mentors that get involved with this program, the more impact it can have.” More
113 Shares149 Views
in Data Management & Statistics
An AI dataset carves new paths to tornado detection
by Markus Andrews 29 April 2024, 17:55
The return of spring in the Northern Hemisphere touches off tornado season. A tornado’s twisting funnel of dust and debris seems an unmistakable sight. But that sight can be obscured to radar, the tool of meteorologists. It’s hard to know exactly when a tornado has formed, or even why.
A new dataset could hold answers. It contains radar returns from thousands of tornadoes that have hit the United States in the past 10 years. Storms that spawned tornadoes are flanked by other severe storms, some with nearly identical conditions, that never did. MIT Lincoln Laboratory researchers who curated the dataset, called TorNet, have now released it open source. They hope to enable breakthroughs in detecting one of nature’s most mysterious and violent phenomena.
“A lot of progress is driven by easily available, benchmark datasets. We hope TorNet will lay a foundation for machine learning algorithms to both detect and predict tornadoes,” says Mark Veillette, the project’s co-principal investigator with James Kurdzo. Both researchers work in the Air Traffic Control Systems Group.
Along with the dataset, the team is releasing models trained on it. The models show promise for machine learning’s ability to spot a twister. Building on this work could open new frontiers for forecasters, helping them provide more accurate warnings that might save lives.
Swirling uncertainty
About 1,200 tornadoes occur in the United States every year, causing millions to billions of dollars in economic damage and claiming 71 lives on average. Last year, one unusually long-lasting tornado killed 17 people and injured at least 165 others along a 59-mile path in Mississippi.
Yet tornadoes are notoriously difficult to forecast because scientists don’t have a clear picture of why they form. “We can see two storms that look identical, and one will produce a tornado and one won’t. We don’t fully understand it,” Kurdzo says.
A tornado’s basic ingredients are thunderstorms with instability caused by rapidly rising warm air and wind shear that causes rotation. Weather radar is the primary tool used to monitor these conditions. But tornadoes lay too low to be detected, even when moderately close to the radar. As the radar beam with a given tilt angle travels further from the antenna, it gets higher above the ground, mostly seeing reflections from rain and hail carried in the “mesocyclone,” the storm’s broad, rotating updraft. A mesocyclone doesn’t always produce a tornado.
With this limited view, forecasters must decide whether or not to issue a tornado warning. They often err on the side of caution. As a result, the rate of false alarms for tornado warnings is more than 70 percent. “That can lead to boy-who-cried-wolf syndrome,” Kurdzo says.
In recent years, researchers have turned to machine learning to better detect and predict tornadoes. However, raw datasets and models have not always been accessible to the broader community, stifling progress. TorNet is filling this gap.
The dataset contains more than 200,000 radar images, 13,587 of which depict tornadoes. The rest of the images are non-tornadic, taken from storms in one of two categories: randomly selected severe storms or false-alarm storms (those that led a forecaster to issue a warning but that didn’t produce a tornado).
Each sample of a storm or tornado comprises two sets of six radar images. The two sets correspond to different radar sweep angles. The six images portray different radar data products, such as reflectivity (showing precipitation intensity) or radial velocity (indicating if winds are moving toward or away from the radar).
A challenge in curating the dataset was first finding tornadoes. Within the corpus of weather radar data, tornadoes are extremely rare events. The team then had to balance those tornado samples with difficult non-tornado samples. If the dataset were too easy, say by comparing tornadoes to snowstorms, an algorithm trained on the data would likely over-classify storms as tornadic.
“What’s beautiful about a true benchmark dataset is that we’re all working with the same data, with the same level of difficulty, and can compare results,” Veillette says. “It also makes meteorology more accessible to data scientists, and vice versa. It becomes easier for these two parties to work on a common problem.”
Both researchers represent the progress that can come from cross-collaboration. Veillette is a mathematician and algorithm developer who has long been fascinated by tornadoes. Kurdzo is a meteorologist by training and a signal processing expert. In grad school, he chased tornadoes with custom-built mobile radars, collecting data to analyze in new ways.
“This dataset also means that a grad student doesn’t have to spend a year or two building a dataset. They can jump right into their research,” Kurdzo says.
This project was funded by Lincoln Laboratory’s Climate Change Initiative, which aims to leverage the laboratory’s diverse technical strengths to help address climate problems threatening human health and global security.
Chasing answers with deep learning
Using the dataset, the researchers developed baseline artificial intelligence (AI) models. They were particularly eager to apply deep learning, a form of machine learning that excels at processing visual data. On its own, deep learning can extract features (key observations that an algorithm uses to make a decision) from images across a dataset. Other machine learning approaches require humans to first manually label features.
“We wanted to see if deep learning could rediscover what people normally look for in tornadoes and even identify new things that typically aren’t searched for by forecasters,” Veillette says.
The results are promising. Their deep learning model performed similar to or better than all tornado-detecting algorithms known in literature. The trained algorithm correctly classified 50 percent of weaker EF-1 tornadoes and over 85 percent of tornadoes rated EF-2 or higher, which make up the most devastating and costly occurrences of these storms.
They also evaluated two other types of machine-learning models, and one traditional model to compare against. The source code and parameters of all these models are freely available. The models and dataset are also described in a paper submitted to a journal of the American Meteorological Society (AMS). Veillette presented this work at the AMS Annual Meeting in January.
“The biggest reason for putting our models out there is for the community to improve upon them and do other great things,” Kurdzo says. “The best solution could be a deep learning model, or someone might find that a non-deep learning model is actually better.”
TorNet could be useful in the weather community for others uses too, such as for conducting large-scale case studies on storms. It could also be augmented with other data sources, like satellite imagery or lightning maps. Fusing multiple types of data could improve the accuracy of machine learning models.
Taking steps toward operations
On top of detecting tornadoes, Kurdzo hopes that models might help unravel the science of why they form.
“As scientists, we see all these precursors to tornadoes — an increase in low-level rotation, a hook echo in reflectivity data, specific differential phase (KDP) foot and differential reflectivity (ZDR) arcs. But how do they all go together? And are there physical manifestations we don’t know about?” he asks.
Teasing out those answers might be possible with explainable AI. Explainable AI refers to methods that allow a model to provide its reasoning, in a format understandable to humans, of why it came to a certain decision. In this case, these explanations might reveal physical processes that happen before tornadoes. This knowledge could help train forecasters, and models, to recognize the signs sooner.
“None of this technology is ever meant to replace a forecaster. But perhaps someday it could guide forecasters’ eyes in complex situations, and give a visual warning to an area predicted to have tornadic activity,” Kurdzo says.
Such assistance could be especially useful as radar technology improves and future networks potentially grow denser. Data refresh rates in a next-generation radar network are expected to increase from every five minutes to approximately one minute, perhaps faster than forecasters can interpret the new information. Because deep learning can process huge amounts of data quickly, it could be well-suited for monitoring radar returns in real time, alongside humans. Tornadoes can form and disappear in minutes.
But the path to an operational algorithm is a long road, especially in safety-critical situations, Veillette says. “I think the forecaster community is still, understandably, skeptical of machine learning. One way to establish trust and transparency is to have public benchmark datasets like this one. It’s a first step.”
The next steps, the team hopes, will be taken by researchers across the world who are inspired by the dataset and energized to build their own algorithms. Those algorithms will in turn go into test beds, where they’ll eventually be shown to forecasters, to start a process of transitioning into operations.
In the end, the path could circle back to trust.
“We may never get more than a 10- to 15-minute tornado warning using these tools. But if we could lower the false-alarm rate, we could start to make headway with public perception,” Kurdzo says. “People are going to use those warnings to take the action they need to save their lives.” More

Algorithms

Latest story

Method prevents an AI model from being overconfident about wrong answers

More stories

Study: When allocating scarce resources with AI, randomization can improve fairness

How to assess a general-purpose AI model’s reliability before it’s deployed

MIT researchers introduce generative AI for databases

Fostering research, careers, and community in materials science

An AI dataset carves new paths to tornado detection

ITALIAN LANGUAGE

ENGLISH LANGUAGE