Books and authors Archives - technology-news.space - All about the world of technology!

Latest story


3 Questions: Catherine D’Ignazio on data science and a quest for justice

by Markus Andrews 19 June 2024, 04:00

As long as we apply data science to society, we should remember that our data may have flaws, biases, and absences. That is one motif of MIT Associate Professor Catherine D’Ignazio’s new book, “Counting Feminicide,” published this spring by the MIT Press. In it, D’Ignazio explores the world of Latin American activists who began using media accounts and other sources to tabulate how many women had been killed in their countries as the result of gender-based violence, and found that their own numbers differed greatly from official statistics.
Some of these activists have become prominent public figures, and others less so, but all of them have produced work providing lessons about collecting data, sharing it, and applying data to projects supporting human liberty and dignity. Now, their stories are reaching a new audience thanks to D’Ignazio, an associate professor of urban science and planning in MIT’s Department of Urban Studies and Planning, and director of MIT’s Data and Feminism Lab. She is also hosting an ongoing, transnational book club about the work. MIT News spoke with D’Ignazio about the new book and how activists are expanding the traditional practice of data science.
Q: What is your book about?
A: Three things. It’s a book that documents the rise of data activism as a really interesting form of citizen data science. Increasingly, because of the availability of data and tools, gathering and doing your own data analysis is a growing form of social activism. We characterize it in the book as a citizenship practice. People are using data to make knowledge claims and put political demands out there for their institutions to respond to.
Another takeaway is that from observing data activists, there are ways they approach data science that are very different from how it’s usually taught. Among other things, when undertaking work about inequality and violence, there’s a connection with the rows of data. It’s about memorializing people who have been lost. Mainstream data scientists can learn a lot from this.
The third thing is about feminicide itself and missing information. The main reason people start collecting data about feminicide is because their institutions aren’t doing it. This includes our institutions here in the United States. We’re talking about violence against women that the state is neglecting to count, classify, or take action on. So, activists step into these gaps and do this to the best of their ability, and they have been quite effective. The media will go to the activists, who end up becoming authorities on feminicide.
Q: Can you elaborate on the differences between the practices of these data activists and more standard data science?
A: One difference is what I’ll call the intimacy and proximity to the rows of the data set. In conventional data science, when you’re analyzing data, typically you’re not also the data collector. However, these activists and groups are involved across the entire pipeline. As a result, there’s a connection and humanization to each line of the data set. For example, there is a school nurse in Texas who runs the site Women Count USA, and she will spend many hours trying to find photographs of victims of feminicide, which represents unusual care paid to each row of a dataset.
Another point is the sophistication that the data activists have around what their data represent and what the biases are in the data. In mainstream AI and data science, we’re still having conversations where people seem surprised that there is bias in datasets. But I was impressed with the critical sophistication with which the activists approached their data. They gather information from the media and are familiar with the biases media have, and are aware their data is not comprehensive but is still useful. We can hold those two things together. It’s often more comprehensive data than what the institutions themselves have or will release to the public.
Q: You did not just chronicle the work of activists, but engaged with them as well, and report about that in the book. What did you work on with them?
A: One big component in the book is the participatory technology development that we engaged in with the activists, and one chapter is a case study of our work with activists to co-design machine learning and AI technology that supports their work. Our team was brainstorming about a system for the activists that would automatically find cases, verify them, and put them right in the database. Interestingly, the activists pushed back on that. They did not want full automation. They felt being, in effect, witnesses is an important part of the work. The emotional burden is an important part of the work and very central to it, too. That’s not something I might always expect to hear from data scientists.
Keeping the human in the loop also means the human makes the final decision over whether a specific item constitutes feminicide or not. Handling it like that aligns with the fact that there are multiple definitions of feminicide, which is a complicated thing from a computational perspective. The proliferation of definitions about what counts as feminicide is a reflection of the fact that this is an ongoing global, transnational conversation. Feminicide has been codified in many laws, especially in Latin American countries, but none of those single laws is definitive. And no single activist definition is definitive. People are creating this together, through dialogue and struggle, so any computational system has to be designed with that understanding of the democratic process in mind.

More stories

in Data Management & Statistics
Learning how to learn
by Markus Andrews 19 October 2023, 20:50
Suppose you need to be on today’s only ferry to Martha’s Vineyard, which leaves at 2 p.m. It takes about 30 minutes (on average) to drive from where you are to the terminal. What time should you leave?
This is one of many common real-life examples used by Richard “Dick” Larson, a post-tenure professor in the MIT Institute for Data, Systems, and Society (IDSS), to explore exemplary problem-solving in his new book “Model Thinking for Everyday Life: How to Make Smarter Decisions.”
Larson’s book synthesizes a lifelong career as an MIT professor and researcher, highlighting crucial skills underpinning all empirical, rational, and critical thinking. “Critical thinkers are energetic detectives … always seeking the facts,” he says. “Additional facts may surface that can result in modified conclusions … A critical thinker is aware of the pitfalls of human intuition.”
For Larson, “model” thinking means not only thinking aided by conceptual and/or mathematical models, but a broader mode of critical thought that is informed by STEM concepts and worthy of emulation.
In the ferry example, a key concept at play is uncertainty. Accounting for uncertainty is a core challenge faced by systems engineers, operations researchers, and modelers of complex networks — all hats Larson has worn in over half a century at MIT.
Uncertainty complicates all prediction and decision-making, and while statistics offers tactics for managing uncertainty, “Model Thinking” is not a math textbook. There are equations for the math-curious, but it doesn’t take a degree from MIT to understand that
an average of 30 minutes would cover a range of times, some shorter, some longer;
outliers can exist in the data, like the time construction traffic added 30 minutes;
“about 30 minutes” is a prediction based on past experience, not current information (road closures, accidents, etc.); and
the consequence for missing the ferry is not a delay of hours, but a full day — which might completely disrupt the trip or its purpose.
And so, without doing much explicit math, you calculate variables, weigh the likelihood of different outcomes against the consequences of failure, and choose a departure time. Larson’s conclusion is one championed by dads everywhere: Leave on the earlier side, just in case.
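The reasoning in the ferry example can be made concrete with a small sketch. It assumes, purely for illustration, that drive times follow a normal distribution with a mean of 30 minutes and a standard deviation of 8 minutes; the article gives only the mean, and real drive times are usually skewed toward long delays. The function name is hypothetical.

```python
from statistics import NormalDist


def minutes_to_allow(mean_drive, sd_drive, on_time_probability):
    """Return how many minutes before sailing to leave so that the drive
    finishes in time with the requested probability.

    Assumes normally distributed drive times (an illustrative assumption).
    """
    drive_time = NormalDist(mu=mean_drive, sigma=sd_drive)
    # inv_cdf gives the drive time that is not exceeded with this probability.
    return drive_time.inv_cdf(on_time_probability)


if __name__ == "__main__":
    # 30 comes from the article; 8 is an assumed spread, not a measured one.
    for probability in (0.5, 0.9, 0.99):
        allowance = minutes_to_allow(30, 8, probability)
        print(f"{probability:.0%} chance of arriving in time: "
              f"allow {allowance:.0f} minutes")
```

Leaving exactly 30 minutes ahead gives only an even chance of making the boat under these assumptions. Because missing the ferry costs a full day, the higher probabilities and their larger allowances are the relevant ones.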
“The world’s most important, invisible profession”
Throughout Larson’s career at MIT, he has focused on the science of solving problems and making better decisions. “Faced with a new problem, people often lack the ability to frame and formulate it using basic principles,” argues Larson. “Our emphasis is on problem framing and formulation, with mathematics and physics playing supporting roles.”
This is operations research, which Larson calls “the world’s most important invisible profession.” Formalized as a field during World War II, operations research uses data and models to try to derive the “physics” of complex systems. The goal is typically optimizing things like scheduling, routing, simulation, prediction, planning, logistics, and queueing, for which Larson is especially well-known. A frequent media expert on the subject, he earned the moniker “Dr. Q,” and his research has led to new approaches for easing congestion in urban traffic, fast-food lines, and banks.
Larson’s experience with complex systems provides a wealth of examples to draw on, but he is keen to demonstrate that his purview includes everyday decisions, and that “Model Thinking” is a book for everyone.
“Everybody uses models, whether they realize it or not,” he says. “If you have a bunch of errands to do, and you try to plan out the order to do them so you don’t have to drive as much, that’s more or less the ‘traveling salesman’ problem, a classic from operations research. Or when someone is shopping for groceries and thinking about how much of each product they need — they’re basically using an inventory management model of their pantry.”
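The errand-planning comparison can be illustrated with a short sketch. It uses the nearest-neighbor heuristic, a simple greedy approach to the traveling salesman problem that is not guaranteed to find the shortest route. The errand names, coordinates, and function names are hypothetical and are not taken from the book.

```python
from math import dist

# Hypothetical errand locations on a flat grid (arbitrary distance units).
ERRANDS = {
    "home": (0, 0),
    "bank": (2, 1),
    "grocery": (5, 0),
    "pharmacy": (1, 5),
    "post office": (6, 3),
}


def nearest_neighbor_route(places, start):
    """Greedy heuristic: always drive to the closest stop not yet visited,
    then return to the start. Fast, but not guaranteed to be optimal."""
    unvisited = set(places) - {start}
    route = [start]
    while unvisited:
        here = places[route[-1]]
        # sorted() makes tie-breaking deterministic (alphabetical).
        closest = min(sorted(unvisited), key=lambda name: dist(here, places[name]))
        route.append(closest)
        unvisited.remove(closest)
    route.append(start)
    return route


def route_length(places, route):
    """Total straight-line distance driven along the route."""
    return sum(dist(places[a], places[b]) for a, b in zip(route, route[1:]))


if __name__ == "__main__":
    route = nearest_neighbor_route(ERRANDS, "home")
    print(" -> ".join(route))
    print(f"total distance: {route_length(ERRANDS, route):.1f}")
```

For a handful of errands, checking every ordering is also feasible. The greedy rule is shown here because it is close to what people do informally when planning a trip.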
Larson’s takeaway is that since we all use conceptual models for thinking, planning, and decision-making, then understanding how our minds use models, and learning to use them more intentionally, can lead to clearer thinking, better planning, and smarter decision-making — especially when they are grounded in principles drawn from math and physics.
Passion for the process
Teaching STEM principles has long been a mission of Larson’s, who co-founded MIT BLOSSOMS (Blended Learning Open Source Science or Math Studies) with his late wife, Mary Elizabeth Murray. BLOSSOMS provides free, interactive STEM lessons and videos for primary school students around the world. Some of the exercises in “Model Thinking” refer to these videos as well.
“A child’s educational opportunities shouldn’t be limited by where they were born or the wealth of their parents,” says Larson of the enterprise.
It was also Murray who encouraged Larson to write “Model Thinking.” “She saw how excited I was about it,” he says. “I had the choice of writing a textbook on queuing, say, or something else. It didn’t excite me at all.”
Larson’s passion is for the process, not the answer. Throughout the book, he marks off opportunities for active learning with an icon showing the two tools necessary to complete each task: a sharpened pencil and a blank sheet of paper.
“Many of us in the age of instant Google searches have lost the ability — or perhaps the patience — to undertake multistep problems,” he argues.
Model thinkers, on the other hand, understand and remember solutions better for having thought through the steps, and can better apply what they’ve learned to future problems. Larson’s “homework” is to do critical thinking, not just read about it. By working through thought experiments and scenarios, readers can achieve a deeper understanding of concepts like selection bias, random incidence, and orders of magnitude, all of which can present counterintuitive examples to the uninitiated.
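Random incidence is one of the counterintuitive ideas mentioned above, and a small worked example shows why. The bus schedule below is hypothetical: buses alternate between 5-minute and 25-minute gaps. A rider who shows up at a random moment is more likely to land in a long gap, so the gap that rider experiences is longer on average than the schedule's average gap.

```python
def mean_gap(gaps):
    """Average gap between buses, as the operator would report it."""
    return sum(gaps) / len(gaps)


def mean_gap_seen_by_random_arrival(gaps):
    """Average length of the gap that a randomly arriving rider lands in.

    A gap of length g is hit with probability proportional to g, so the
    length-weighted mean is sum(g * g) / sum(g).
    """
    return sum(g * g for g in gaps) / sum(gaps)


if __name__ == "__main__":
    gaps = [5, 25]  # hypothetical alternating gaps, in minutes
    print(f"average gap on the schedule: {mean_gap(gaps):.1f} minutes")
    print(f"average gap a random rider lands in: "
          f"{mean_gap_seen_by_random_arrival(gaps):.1f} minutes")
```

The two averages agree only when every gap has the same length.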
For Larson, who jokes that he is “an evangelist for models,” there is no better way to learn than by doing — except perhaps to teach. “Teaching a difficult topic is our best way to learn it ourselves, is an unselfish act, and bonds the teacher and learner,” he writes.
In his long career as an educator and education advocate, Larson says he has always remained a learner himself. His love for learning illuminates every page of “Model Thinking,” which he hopes will provide others with the enjoyment and satisfaction that comes from learning new things and solving complex problems.
“You will learn how to learn,” Larson says. “And you will enjoy it!”
in Data Management & Statistics
Understanding viral justice
by Markus Andrews 17 July 2023, 20:10
In the wake of the Covid-19 pandemic, the word “viral” has a new resonance, and it’s not necessarily positive. Ruha Benjamin, a scholar who investigates the social dimensions of science, medicine, and technology, advocates a shift in perspective. She thinks justice can also be contagious. That’s the premise of Benjamin’s award-winning book “Viral Justice: How We Grow the World We Want,” as she shared with MIT Libraries staff on a June 14 visit.
“If this pandemic has taught us anything, it’s that something almost undetectable can be deadly, and that we can transmit it without even knowing,” said Benjamin, professor of African American studies at Princeton University. “Doesn’t this imply that small things, seemingly minor actions, decisions, or habits, could have exponential effects in the other direction, tipping the scales towards justice?”
To seek a more just world, Benjamin exhorted library staff to notice the ways exclusion is built into our daily lives, showing examples of park benches with armrests at regular intervals. On the surface they appear welcoming, but they also make lying down — or sleeping — impossible. This idea is taken to the extreme with “Pay and Sit,” an art installation by Fabian Brunsing in the form of a bench that deploys sharp spikes on the seat if the user doesn’t pay a meter. It serves as a powerful metaphor for discriminatory design.
“Dr. Benjamin’s keynote was seriously mind-blowing,” said Cherry Ibrahim, human resources generalist in the MIT Libraries. “One part that really grabbed my attention was when she talked about benches purposely designed to prevent unhoused people from sleeping on them. There are these hidden spikes in our community that we might not even realize because they don’t directly impact us.”
Benjamin urged the audience to look for those “spikes,” which new technologies can make even more insidious — gender and racial bias in facial recognition, the use of racial data in software used to predict student success, algorithmic bias in health care — often in the guise of progress. She coined the term “the New Jim Code” to describe the combination of coded bias and the imagined objectivity we ascribe to technology.
“At the MIT Libraries, we’re deeply concerned with combating inequities through our work, whether it’s democratizing access to data or investigating ways disparate communities can participate in scholarship with minimal bias or barriers,” says Director of Libraries Chris Bourg. “It’s our mission to remove the ‘spikes’ in the systems through which we create, use, and share knowledge.”
Calling out the harms encoded into our digital world is critical, argues Benjamin, but we must also create alternatives. This is where the collective power of individuals can be transformative. Benjamin shared examples of those who are “re-imagining the default settings of technology and society,” citing initiatives like the Data for Black Lives movement and the Detroit Community Technology Project. “I’m interested in the way that everyday people are changing the digital ecosystem and demanding different kinds of rights and responsibilities and protections,” she said.
In 2020, Benjamin founded the Ida B. Wells Just Data Lab with a goal of bringing together students, educators, activists, and artists to develop a critical and creative approach to data conception, production, and circulation. Its projects have examined different aspects of data and racial inequality: assessing the impact of Covid-19 on student learning; providing resources that confront the experience of Black mourning, grief, and mental health; or developing a playbook for Black maternal mental health. Through the lab’s student-led projects, Benjamin sees the next generation re-imagining technology in ways that respond to the needs of marginalized people.
“If inequity is woven into the very fabric of our society (we see it from policing to education to health care to work), then each twist, coil, and code is a chance for us to weave new patterns, practices, and politics,” she said. “The vastness of the problems that we’re up against will be their undoing.”
in Data Management & Statistics
Q&A: A fresh look at data science
by Markus Andrews 12 January 2023, 18:30
As the leaders of a developing field, data scientists must often deal with a frustratingly slippery question: What is data science, precisely, and what is it good for?
Alfred Spector is a visiting scholar in the MIT Department of Electrical Engineering and Computer Science (EECS), an influential developer of distributed computing systems and applications, and a successful tech executive with companies including IBM and Google. Along with three co-authors — Peter Norvig at Stanford University and Google, Chris Wiggins at Columbia University and The New York Times, and Jeannette M. Wing at Columbia — Spector recently published “Data Science in Context: Foundations, Challenges, Opportunities” (Cambridge University Press), which provides a broad, conversational overview of the wide-ranging field driving change in sectors ranging from health care to transportation to commerce to entertainment.
Here, Spector talks about data-driven life, what makes a good data scientist, and how his book came together during the height of the Covid-19 pandemic.
Q: One of the most common buzzwords Americans hear is “data-driven,” but many might not know what that term is supposed to mean. Can you unpack it for us?
A: Data-driven broadly refers to techniques or algorithms powered by data — they either provide insight or reach conclusions, say, a recommendation or a prediction. The algorithms power models which are increasingly woven into the fabric of science, commerce, and life, and they often provide excellent results. The list of their successes is really too long to even begin to list. However, one concern is that the proliferation of data makes it easy for us as students, scientists, or just members of the public to jump to erroneous conclusions. As just one example, our own confirmation biases make us prone to believing some data elements or insights “prove” something we already believe to be true. Additionally, we often tend to see causal relationships where the data only shows correlation. It might seem paradoxical, but data science makes critical reading and analysis of data all the more important.
Q: What, to your mind, makes a good data scientist?
A: [In talking to students and colleagues] I optimistically emphasize the power of data science and the importance of gaining the computational, statistical, and machine learning skills to apply it. But, I also remind students that we are obligated to solve problems well. In our book, Chris [Wiggins] paraphrases danah boyd, who says that a successful application of data science is not one that merely meets some technical goal, but one that actually improves lives. More specifically, I exhort practitioners to provide a real solution to problems, or else clearly identify what we are not solving so that people see the limitations of our work. We should be extremely clear so that we do not generate harmful results or lead others to erroneous conclusions. I also remind people that all of us, including scientists and engineers, are human and subject to the same human foibles as everyone else, such as various biases.
Q: You discuss Covid-19 in your book. While some short-range models for mortality were very accurate during the heart of the pandemic, you note the failure of long-range models to predict any of 2020’s four major geotemporal Covid waves in the United States. Do you feel Covid was a uniquely hard situation to model?
A: Covid was particularly difficult to predict over the long term because of many factors — the virus was changing, human behavior was changing, political entities changed their minds. Also, we didn’t have fine-grained mobility data (perhaps, for good reasons), and we lacked sufficient scientific understanding of the virus, particularly in the first year.
I think there are many other domains which are similarly difficult. Our book teases out many reasons why data-driven models may not be applicable. Perhaps it’s too difficult to get or hold the necessary data. Perhaps the past doesn’t predict the future. If data models are being used in life-and-death situations, we may not be able to make them sufficiently dependable; this is particularly true as we’ve seen all the motivations that bad actors have to find vulnerabilities. So, as we continue to apply data science, we need to think through all the requirements we have, and the capability of the field to meet them. They often align, but not always. And, as data science seeks to solve problems into ever more important areas such as human health, education, transportation safety, etc., there will be many challenges.
Q: Let’s talk about the power of good visualization. You mention the popular, early-2000s Baby Name Voyager website as one that changed your view on the importance of data visualization. Tell us how that happened.
A: That website, recently reborn as the Name Grapher, had two characteristics that I thought were brilliant. First, it had a really natural interface, where you type the initial characters of a name and it shows a frequency graph of all the names beginning with those letters, and their popularity over time. Second, it’s so much better than a spreadsheet with 140 columns representing years and rows representing names, despite the fact it contains no extra information. It also provided instantaneous feedback with its display graph dynamically changing as you type. To me, this showed the power of a very simple transformation that is done correctly.
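The transformation Spector describes, narrowing a large names-by-years table to the names that match what has been typed so far, can be sketched in a few lines. The counts below are invented placeholders, and the function name is hypothetical. Nothing here reflects how the actual website is implemented.

```python
# Hypothetical popularity counts by year. A real table would span many more
# years and names; these numbers are invented for illustration only.
NAME_COUNTS = {
    "Ada": {1900: 120, 1950: 40, 2000: 15},
    "Adam": {1900: 30, 1950: 200, 2000: 450},
    "Alice": {1900: 300, 1950: 180, 2000: 90},
    "Ben": {1900: 60, 1950: 110, 2000: 210},
}


def names_matching(prefix, counts):
    """Return only the rows whose name starts with the typed prefix,
    ignoring letter case. An empty prefix matches every name."""
    typed = prefix.casefold()
    return {
        name: series
        for name, series in counts.items()
        if name.casefold().startswith(typed)
    }


if __name__ == "__main__":
    # Simulate the display narrowing as a visitor types "a", then "ad".
    for typed in ("a", "ad", "ada"):
        matches = names_matching(typed, NAME_COUNTS)
        print(f"typed {typed!r}: {sorted(matches)}")
```

The filter adds no information to the table. It only selects which rows to chart at each keystroke, which is the point of the interview answer.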
Q: When you and your co-authors began planning “Data Science in Context,” what did you hope to offer?
A: We portray present data science as a field that’s already had enormous benefits, that provides even more future opportunities, but one that requires equally enormous care in its use. Referencing the word “context” in the title, we explain that the proper use of data science must consider the specifics of the application, the laws and norms of the society in which the application is used, and even the time period of its deployment. And, importantly for an MIT audience, the practice of data science must go beyond just the data and the model to the careful consideration of an application’s objectives, its security, privacy, abuse, and resilience risks, and even the understandability it conveys to humans. Within this expansive notion of context, we finally explain that data scientists must also carefully consider ethical trade-offs and societal implications.
Q: How did you keep focus throughout the process?
A: Much like in open-source projects, I played both the coordinating author role and also the role of overall librarian of all the material, but we all made significant contributions. Chris Wiggins is very knowledgeable on the Belmont principles and applied ethics; he was the major contributor of those sections. Peter Norvig, as the coauthor of a bestselling AI textbook, was particularly involved in the sections on building models and causality. Jeannette Wing worked with me very closely on our seven-element Analysis Rubric and recognized that a checklist for data science practitioners would end up being one of our book’s most important contributions.
From a nuts-and-bolts perspective, we wrote the book during Covid, using one large shared Google doc with weekly video conferences. Amazingly enough, Chris, Jeannette, and I didn’t meet in person at all, and Peter and I met only once — sitting outdoors on a wooden bench on the Stanford campus.
Q: That is an unusual way to write a book! Do you recommend it?
A: It would be nice to have had more social interaction, but a shared document, at least with a coordinating author, worked pretty well for something up to this size. The benefit is that we always had a single, coherent textual base, not dissimilar to how a programming team works together.
This is a condensed, edited version of a longer interview that originally appeared on the MIT EECS website.
