Data Management & Statistics Archivi - Page 42 of 58 - technology-news.space

3 Questions: Peko Hosoi on the data-driven reasoning behind MIT’s Covid-19 policies for the fall
by Markus Andrews 25 August 2021, 04:00
As students, faculty, and staff prepare for a full return to the MIT campus in the weeks ahead, procedures for entering buildings, navigating classrooms and labs, and interacting with friends and colleagues will likely take some getting used to.
The Institute recently reinforced its policies for indoor masking and has also continued to require regular testing for people who live, work, or study on campus — procedures that apply to both vaccinated and unvaccinated individuals. Vaccination is required for all students, faculty, and staff on campus unless a medical or religious exemption is granted.
These and other policies adopted by MIT to control the spread of Covid-19 have been informed by modeling efforts from a volunteer group of MIT faculty, students, and postdocs. The collaboration, dubbed Isolat, was co-founded by Anette “Peko” Hosoi, the Neil and Jane Pappalardo Professor of Mechanical Engineering and associate dean in the School of Engineering.
The group, which is organized through MIT’s Institute for Data, Systems, and Society (IDSS), has run numerous models to show how measures such as mask wearing, testing, ventilation, and quarantining could affect Covid-19’s spread. These models have helped to shape MIT’s Covid-19 policies throughout the pandemic, including its procedures for returning to campus this fall.
Hosoi spoke with MIT News about the data-backed reasoning behind some of these procedures, including indoor masking and regular testing, and how a “generous community” will help MIT safely weather the virus and its variants.
Q: Take us through how you have been modeling Covid-19 and its variants, in regard to helping MIT shape its Covid policies. What’s the approach you’ve taken, and why?
A: The approach we’re taking uses a simple counting exercise developed in IDSS to estimate the balance of testing, masking, and vaccination that is required to keep the virus in check. The underlying objective is to find infected people faster, on average, than they can infect others, which is captured in a simple algebraic expression. Our objective can be accomplished either by speeding up the rate of finding infected people (i.e. increasing testing frequency) or slowing down the rate of infection (i.e. increasing masking and vaccination) or by a combination of both. To give you a sense of the numbers, balances for different levels of testing are shown in the chart below for a vaccine efficacy of 67 percent and a contagious period of 18 days (which are the CDC’s latest parameters for the Delta variant).
The vertical axis shows the now-famous reproduction number R0, i.e. the average number of people that one infected person will infect throughout the course of their illness. These R0 are averages for the population, and in specific circumstances the spreading could be more than that.
Each blue line represents a different testing frequency: Below the line, the virus is controlled; above the line, it spreads. For example, the dotted blue line shows the boundary if we rely solely on vaccination with no testing. In that case, even if everyone is vaccinated, we can only control up to an R0 of about 3. Unfortunately, the CDC places R0 of the Delta variant somewhere between 5 and 9, so vaccination alone is insufficient to control the spread. (As an aside, this also means that given the efficacy estimates for the current vaccines, herd immunity is not possible.)
Next consider the dashed blue line, which represents the stability boundary if we test everyone once per week. If our vaccination rate is greater than about 90 percent, testing one time per week can control even the CDC’s most pessimistic estimate for the Delta variant’s R0.
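The vaccination-only boundary described above can be checked with a few lines of arithmetic. The sketch below is not the Isolat group's expression, which the interview does not reproduce. It assumes the textbook relation in which the effective reproduction number equals R0 multiplied by the fraction of the population left unprotected, and the function name is hypothetical. Only the 67 percent efficacy and the R0 range of 5 to 9 come from the interview.

```python
def max_controllable_r0(vaccine_efficacy, vaccinated_fraction):
    """Largest R0 that vaccination alone can hold at an effective R of 1.

    Assumption (not from the interview): R_eff = R0 * (1 - efficacy * coverage),
    the standard simplification used for herd-immunity estimates.
    """
    unprotected_fraction = 1.0 - vaccine_efficacy * vaccinated_fraction
    return 1.0 / unprotected_fraction


efficacy = 0.67  # vaccine efficacy quoted in the interview
threshold = max_controllable_r0(efficacy, vaccinated_fraction=1.0)
print(f"Max controllable R0 at full vaccination: {threshold:.2f}")

for delta_r0 in (5, 9):  # range the interview attributes to the CDC for Delta
    controlled = delta_r0 < threshold
    print(f"R0 = {delta_r0}: controlled by vaccination alone? {controlled}")
```

Under this assumption the threshold comes out close to 3, consistent with the figure Hosoi cites for the no-testing line, and both ends of the quoted Delta range fall above it. The testing lines in the chart depend on details of the group's model that are not given here, so they are not reproduced.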
Q: In returning to campus over the next few weeks, indoor masking and regular testing are required of every MIT community member, even those who are vaccinated. What in your modeling has shown that each of these policies is necessary?
A: Given that the chart above shows that vaccination and weekly testing are sufficient to control the virus, one should certainly ask “Why have we reinstated indoor masking?” The answer is related to the fact that, as a university, our population turns over once a year; every September we bring in a few thousand new people. Those people are coming from all over the world, and some of them may not have had the opportunity to get vaccinated yet. The good news is that MIT Medical has vaccines and will be administering them to any unvaccinated students as soon as they arrive; the bad news is that, as we all know, it takes three to five weeks for resistance to build up, depending on the vaccine. This means that we should think of August and September as a transition period during which the vaccination rates may fluctuate as new people arrive.
The other revelation that has informed our policies for September is the recent report from the CDC that infected vaccinated people carry roughly the same viral load as unvaccinated infected people. This suggests that vaccinated people — although they are highly unlikely to get seriously ill — are a consequential part of the transmission chain and can pass the virus along to others. So, in order to avoid giving the virus to people who are not yet fully vaccinated during the transition period, we all need to exercise a little extra care to give the newly vaccinated time for their immune systems to ramp up.
Q: As the fall progresses, what signs are you looking for that might shift decisions on masking and testing on campus?
A: Eventually we will have to shift responsibility toward individuals rather than institutions, and allow people to make decisions about masks and testing based on their own risk tolerance. The success of the vaccines in suppressing severe illness will enable us to shift to a position in which our objective is not necessarily to control the spread of the virus, but rather to reduce the risk of serious outcomes to an acceptable level. There are many people who believe we need to make this adjustment and wean ourselves off pandemic living. They are right; we cannot continue like this forever. However, we have not played all our cards yet, and, in my opinion, we need to carefully consider what’s left in our hand before we abdicate institutional responsibility.
The final ace we have to play is vaccinating kids. It is important to remember that we have many people in our community with kids who are too young to be vaccinated and, understandably, those parents do not want to bring Covid home to their children. Furthermore, our campus is not just a workplace; it is also home to thousands of people, some of whom have children living in our residences or attending an MIT childcare center. Given that context, and the high probability that a vaccine will be approved for children in the near future, it is my belief that our community has the empathy and fortitude to try to keep the virus in check until parents have the option to protect their children with vaccines.
Bearing in mind that children constitute an unprotected portion of our population, let me return to the original question and speculate on the fate of masks and testing in the fall. Regarding testing, the analysis suggests that we cannot give that up entirely if we would like to control the spread of the virus. Second, control of the virus is not the only benefit we get from testing. It also gives us situational awareness, serves as an early warning beacon, and provides information that individual members of the community can use as they make decisions about their own risk budget. Personally, I’ve been testing for a year now and I find it easy and reassuring. Honestly, it’s nice to know that I’m Covid-free before I see friends (outside!) or go home to my family.
Regarding masks, there is always uncertainty around whether a new variant will arise or whether vaccine efficacy will fade, but, given the current parameters and our analysis, my hope is that we will be in a position to provide some relief on the mask mandate once the incoming members of our population have been fully vaccinated. I also suspect that whenever the mask mandate is lifted, masks are not likely to go away. There are certainly situations in which I will continue to wear a mask regardless of the mandate, and many in our community will continue to feel safer wearing masks even when they are not required.
I believe that we are a generous community and that we will be willing to take precautions to help keep each other healthy. The students who were on campus last year did an outstanding job, and they have given me a tremendous amount of faith that we can be considerate and good to one another even in extremely trying times.
Last-mile routing research challenge awards $175,000 to three winning teams
by Markus Andrews 24 August 2021, 19:10
Routing is one of the most studied problems in operations research; even small improvements in routing efficiency can save companies money and result in energy savings and reduced environmental impacts. Now, three teams of researchers from universities around the world have received prize money totaling $175,000 for their innovative route optimization models.
The three teams were the winners of the Amazon Last-Mile Routing Research Challenge, through which the MIT Center for Transportation & Logistics (MIT CTL) and Amazon engaged with a global community of researchers across a range of disciplines, from computer science to business operations to supply chain management, challenging them to build data-driven route optimization models leveraging massive historical route execution data.
First announced in February, the research challenge attracted more than 2,000 participants from around the world. Two hundred twenty-nine researcher teams formed during the spring to independently develop solutions that incorporated driver know-how into route optimization models with the intent that they would outperform traditional optimization approaches. Out of the 48 teams whose models qualified for the final round of the challenge, three teams’ work stood out above the rest. Amazon provided real operational training data for the models and evaluated submissions, with technical support from MIT CTL scientists.
In real life, drivers frequently deviate from planned and mathematically optimized route sequences. Drivers carry information about which roads are hard to navigate when traffic is bad, when and where they can easily find parking, which stops can be conveniently served together, and many other factors that existing optimization models simply don’t capture.
Each model addressed the challenge data in a unique way. The methodological approaches chosen by the participants frequently combined traditional exact and heuristic optimization approaches with nontraditional machine learning methods. On the machine learning side, the most commonly adopted methods were different variants of artificial neural networks, as well as inverse reinforcement learning approaches.
There were 45 submissions that reached the finalist phase, with team members hailing from 29 countries. Entrants spanned all levels of higher education from final-year undergraduate students to retired faculty. Entries were assessed in a double-blind review process so that the judges would not know what team was attached to each entry.
The third-place prize of $25,000 was awarded to Okan Arslan and Rasit Abay. Okan is a professor at HEC Montréal, and Rasit is a doctoral student at the University of New South Wales in Australia. The runner-up prize of $50,000 was awarded to MIT’s own Xiaotong Guo, Qingyi Wang, and Baichuan Mo, all doctoral students. The top prize of $100,000 was awarded to Professor William Cook of the University of Waterloo in Canada, Professor Stephan Held of the University of Bonn in Germany, and Professor Emeritus Keld Helsgaun of Roskilde University in Denmark. A webinar congratulating all winners and contestants was held on July 30.
Top-performing teams may be interviewed by Amazon for research roles in the company’s Last Mile organization. MIT CTL will publish and promote short technical papers written by all finalists and might invite top-performing teams to present at MIT. Further, a team led by Matthias Winkenbach, director of the MIT Megacity Logistics Lab, will guest-edit a special issue of Transportation Science, one of the most renowned academic journals in this field, featuring academic papers on topics related to the problem tackled by the research challenge.
Helping companies optimize their websites and mobile apps
by Markus Andrews 24 August 2021, 04:00
Creating a good customer experience increasingly means creating a good digital experience. But metrics like pageviews and clicks offer limited insight into how much customers actually like a digital product.
That’s the problem the digital optimization company Amplitude is solving. Amplitude gives companies a clearer picture of how users interact with their digital products to help them understand exactly which features to promote or improve.
“It’s all about using product data to drive your business,” says Amplitude CEO Spenser Skates ’10, who co-founded the company with Curtis Liu ’10 and Stanford University graduate Jeffrey Wang. “Mobile apps and websites are really complex. The average app or website will have thousands of things you can do with it. The question is how you know which of those things are driving a great user experience and which parts are really frustrating for users.”
Amplitude’s database can gather millions of details about how users behave inside an app or website and allow customers to explore that information without needing data science degrees.
“It provides an interface for very easy, accessible ways of looking at your data, understanding your data, and asking questions of that data,” Skates says.
Amplitude, which recently announced it will be going public, is already helping 23 of the 100 largest companies in the U.S. Customers include media companies like NBC, tech companies like Twitter, and retail companies like Walmart.
“Our platform helps businesses understand how people are using their apps and websites so they can create better versions of their products,” Skates says. “It’s all about creating a really compelling product.”
Learning entrepreneurship
The founders say their years at MIT were among the best of their lives. Skates and Liu were undergraduates from 2006 to 2010. Skates majored in biological engineering while Liu majored in mathematics and electrical engineering and computer science. The two first met as opponents in MIT’s Battlecode competition, in which students use artificial intelligence algorithms to control teams of robots that compete in a strategy game against other teams. The following year they teamed up.
“There are a lot of parallels between what you’re trying to do in Battlecode and what you end up having to do in the early stages of a startup,” Liu says. “You have limited resources, limited time, and you’re trying to accomplish a goal. What we found is trying a lot of different things, putting our ideas out there and testing them with real data, really helped us focus on the things that actually mattered. That method of iteration and continual improvement set the foundation for how we approach building products and startups.”
Liu and Skates next participated in the MIT $100K Entrepreneurship Competition with an idea for a cloud-based music streaming service. After graduation, Skates began working in finance and Liu got a job at Google, but they continued pursuing startup ideas on the side, including a website that let alumni see where their classmates ended up and a marketplace for finding photographers.
A year after graduation, the founders decided to quit their jobs and work on a startup full time. Skates moved into Liu’s apartment in San Francisco, setting up a mattress on the floor, and they began working on a project that became Sonalight, a voice recognition app. As part of the project, the founders built an internal system to understand where users got stuck in the app and what features were used the most.
Despite getting over 100,000 downloads, the founders decided Sonalight was a little too early for its time and started thinking their analytics feature could be useful to other companies. They spoke with about 30 different product teams to learn more about what companies wanted from their digital analytics. Amplitude was officially founded in 2012.
Amplitude gathers fine details about digital product usage, parsing out individual features and actions to give customers a better view of how their products are being used. Using the data in Amplitude’s intuitive, no-code interface, customers can make strategic decisions like whether to launch a feature or change a distribution channel.
The platform is designed to ease the bottlenecks that arise when executives, product teams, salespeople, and marketers want to answer questions about customer experience or behavior but need the data science team to crunch the numbers for them.
“It’s a very collaborative interface to encourage customers to work together to understand how users are engaging with their apps,” Skates says.
Amplitude’s database also uses machine learning to segment users, predict user outcomes, and uncover novel correlations. Earlier this year, the company unveiled a service called Recommend that helps companies create personalized user experiences across their entire platform in minutes. The service goes beyond demographics to personalize customer experiences based on what users have done or seen before within the product.
“We’re very conscious on the privacy front,” Skates says. “A lot of analytics companies will resell your data to third parties or use it for advertising purposes. We don’t do any of that. We’re only here to provide product insights to our customers. We’re not using data to track you across the web. Everyone expects Netflix to use the data on what you’ve watched before to recommend what to watch next. That’s effectively what we’re helping other companies do.”
Optimizing digital experiences
The meditation app Calm is on a mission to help users build habits that improve their mental wellness. Using Amplitude, the company learned that users most often use the app to get better sleep and reduce stress. The insights helped Calm’s team double down on content geared toward those goals, launching “sleep stories” to help users unwind at the end of each day and adding content around anxiety relief and relaxation. Sleep stories are now Calm’s most popular type of content, and Calm has grown rapidly to reach millions of people around the world.
Calm’s story shows the power of letting user behavior drive product decisions. Amplitude has also helped the online fundraising site GoFundMe increase donations by showing users more compelling campaigns and the exercise bike company Peloton realize the importance of social features like leaderboards.
Moving forward, the founders believe Amplitude’s platform will continue helping companies adapt to an increasingly digital world in which users expect more compelling, personalized experiences.
“If you think about the online experience for companies today compared to 10 years ago, now [digital] is the main point of contact, whether you’re a media company streaming content, a retail company, or a finance company,” Skates says. “That’s only going to continue. That’s where we’re trying to help.”
Smarter regulation of global shipping emissions could improve air quality and health outcomes
by Markus Andrews 17 August 2021, 22:00
Emissions from shipping activities around the world account for nearly 3 percent of total human-caused greenhouse gas emissions, and could increase by up to 50 percent by 2050, making them an important and often overlooked target for global climate mitigation. At the same time, shipping-related emissions of additional pollutants, particularly nitrogen and sulfur oxides, pose a significant threat to global health, as they degrade air quality enough to cause premature deaths.
The main source of shipping emissions is the combustion of heavy fuel oil in large diesel engines, which disperses pollutants into the air over coastal areas. The nitrogen and sulfur oxides emitted from these engines contribute to the formation of PM2.5, airborne particulates with diameters of up to 2.5 micrometers that are linked to respiratory and cardiovascular diseases. Previous studies have estimated that PM2.5 from shipping emissions contributes to about 60,000 cardiopulmonary and lung cancer deaths each year, and that IMO 2020, an international policy that caps engine fuel sulfur content at 0.5 percent, could reduce PM2.5 concentrations enough to lower annual premature mortality by 34 percent.
Global shipping emissions arise from both domestic (between ports in the same country) and international (between ports of different countries) shipping activities, and are governed by national and international policies, respectively. Consequently, effective mitigation of the air quality and health impacts of global shipping emissions will require that policymakers quantify the relative contributions of domestic and international shipping activities to these adverse impacts in an integrated global analysis.
A new study in the journal Environmental Research Letters provides that kind of analysis for the first time. To that end, the study’s co-authors — researchers from MIT and the Hong Kong University of Science and Technology — implement a three-step process. First, they create global shipping emission inventories for domestic and international vessels based on ship activity records of the year 2015 from the Automatic Identification System (AIS). Second, they apply an atmospheric chemistry and transport model to this data to calculate PM2.5 concentrations generated by that year’s domestic and international shipping activities. Finally, they apply a model that estimates mortalities attributable to these pollutant concentrations.
The researchers find that approximately 94,000 premature deaths were associated with PM2.5 exposure due to maritime shipping in 2015 — 83 percent international and 17 percent domestic. While international shipping accounted for the vast majority of the global health impact, some regions experienced significant health burdens from domestic shipping operations. This is especially true in East Asia: In China, 44 percent of shipping-related premature deaths were attributable to domestic shipping activities.
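To put the percentages above in absolute terms, the split can be worked out directly. This is a rough illustration only: the study's total is itself approximate ("approximately 94,000"), so the resulting counts should be read as order-of-magnitude figures, and the variable names are chosen here for illustration.

```python
# Figures quoted in the article for 2015 (total is approximate).
total_premature_deaths = 94_000
share_percent = {"international": 83, "domestic": 17}

# Integer arithmetic avoids floating-point rounding noise.
deaths_by_source = {
    source: total_premature_deaths * pct // 100
    for source, pct in share_percent.items()
}

for source, deaths in deaths_by_source.items():
    print(f"{source}: about {deaths:,} premature deaths")
```

On these numbers, international shipping accounts for roughly 78,000 of the premature deaths and domestic shipping for roughly 16,000.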
“By comparing the health impacts from international and domestic shipping at the global level, our study could help inform decision-makers’ efforts to coordinate shipping emissions policies across multiple scales, and thereby reduce the air quality and health impacts of these emissions more effectively,” says Yiqi Zhang, a researcher at the Hong Kong University of Science and Technology who led the study as a visiting student supported by the MIT Joint Program on the Science and Policy of Global Change.
In addition to estimating the air-quality and health impacts of domestic and international shipping, the researchers evaluate potential health outcomes under different shipping emissions-control policies that are either currently in effect or likely to be implemented in different regions in the near future.
They estimate about 30,000 avoided deaths per year under a scenario consistent with IMO 2020, an international regulation limiting the sulfur content in shipping fuel oil to 0.5 percent, a finding that tracks with previous studies. Further strengthening regulations on sulfur content would yield only slight improvement; limiting sulfur content to 0.1 percent reduces annual shipping-attributable PM2.5-related premature deaths by an additional 5,000. In contrast, regulating nitrogen oxides through a Tier III NOx Standard would produce far greater benefits than a 0.1-percent sulfur cap, with 33,000 further avoided deaths.
“Areas with high proportions of mortalities contributed by domestic shipping could effectively use domestic regulations to implement controls,” says study co-author Noelle Selin, a professor at MIT’s Institute for Data, Systems and Society and Department of Earth, Atmospheric and Planetary Sciences, and a faculty affiliate of the MIT Joint Program. “For other regions where much damage comes from international vessels, further international cooperation is required to mitigate impacts.”
“AI for Impact” lives up to its name
by Markus Andrews 16 August 2021, 21:00
For entrepreneurial MIT students looking to put their skills to work for a greater good, the Media Arts and Sciences class MAS.664 (AI for Impact) has been a destination point. With the onset of the pandemic, that goal came into even sharper focus. Just weeks before the campus shut down in 2020, a team of students from the class launched a project that would make significant strides toward an open-source platform to identify coronavirus exposures without compromising personal privacy.
Their work was at the heart of Safe Paths, one of the earliest contact tracing apps in the United States. The students joined with volunteers from other universities, medical centers, and companies to publish their code, alongside a well-received white paper describing the privacy-preserving, decentralized protocol, all while working with organizations wishing to launch the app within their communities. The app and related software eventually got spun out into the nonprofit PathCheck Foundation, which today engages with public health entities and is providing exposure notifications in Guam, Cyprus, Hawaii, Minnesota, Alabama, and Louisiana.
The formation of Safe Paths demonstrates the special sense among MIT researchers that “we can launch something that can help people around the world,” notes Media Lab Associate Professor Ramesh Raskar, who teaches the class together with Media Lab Professor Alex “Sandy” Pentland and Media Lab Lecturer Joost Bonsen. “To have that kind of passion and ambition — but also the confidence that what you create here can actually be deployed globally — is kind of amazing.”
AI for Impact, created by Pentland, began meeting two decades ago under the course name Development Ventures, and has nurtured multiple thriving businesses. Examples of class ventures that Pentland incubated or co-founded include Dimagi, Cogito, Ginger, Prosperia, and Sanergy.
The aim-high challenge posed to each class is to come up with a business plan that touches a billion people, and it can’t all be in one country, Pentland explains. Not every class effort becomes a business, “but 20 percent to 30 percent of students start something, which is great for an entrepreneur class,” says Pentland.
Opportunities for Impact
The numbers behind Dimagi, for instance, are striking. Its core product CommCare has helped front-line health workers provide care for more than 400 million people in more than 130 countries around the world. When it comes to maternal and child care, Dimagi’s platform has registered one in every 110 pregnancies worldwide. This past year, several governments around the world deployed CommCare applications for Covid-19 response — from Sierra Leone and Somalia to New York and Colorado.
Spinoffs like Cogito, Prosperia, and Ginger have likewise grown into highly successful companies. Cogito helps a million people a day gain access to the health care they need; Prosperia helps manage social support payments to 80 million people in Latin America; and Ginger handles mental health services for over 1 million people.
The passion behind these and other class ventures points to a central idea of the class, Pentland notes: MIT students are often looking for ways to build entrepreneurial businesses that enable positive social change.
During the spring 2021 class, for example, a number of promising student projects included tools to help residents of poor communities transition to owning their homes rather than renting, and to take better control of their community health.
“It’s clear that the people who are graduating from here want to do something significant with their lives … they want to have an impact on their world,” Pentland says. “This class enables them to meet other people who are interested in doing the same thing, and offers them some help in starting a company to do it.”
Many of the students who join the class come in with a broad set of interests. Guest lectures, case studies of other social entrepreneurship projects, and an introduction to a broad ecosystem of expertise and funding then help students refine their general ideas into specific and viable projects.
A path toward confronting a pandemic
Raskar began co-teaching the class in 2019, and brought a “Big AI” focus to the Development Ventures class, inspired by an AI for Impact team he had set up at his former employer, Facebook. “What I realized is that companies like Google or Facebook or Amazon actually have enough data about all of us that they can solve major problems in our society — climate, transportation, health, and so on,” he says. “This is something we should think about more seriously: how to use AI and data for positive social impact, while protecting privacy.”
Early into the spring 2020 class, as students were beginning to consider their own projects, Raskar approached the class about the emerging coronavirus outbreak. Students like Kristen Vilcans recognized the urgency, and the opportunity. She and 10 other students joined forces to work on a project that would focus on Covid-19.
“Students felt empowered to do something to help tackle the spread of this alarming new virus,” Raskar recalls. “They immediately began to develop data- and AI-based solutions to one of the most critical pieces of addressing a pandemic: halting the chain of infections. They created and launched one of the first digital contact tracing and exposure notification solutions in the U.S., developing an early alert system that engaged the public and protected privacy.”
Raskar looks back on the moment when a core group of students coalesced into a team. “It was very rare for a significant part of the class to just come together saying, ‘let’s do this, right away.’ It became as much a movement as a venture.”
Group discussions soon began to center around an open-source, privacy-first digital set of tools for Covid-19 contact tracing. For the next two weeks, right up to the campus shutdown in March 2020, the team took over two adjacent conference rooms in the Media Lab, and started a Slack messaging channel devoted to the project. As the team members reached out to an ever-wider circle of friends, colleagues, and mentors, the number of participants grew to nearly 1,600 people, coming together virtually from all corners of the world.
Kaushal Jain, a Harvard Business School student who had cross-registered for the spring 2020 class to get to know the MIT ecosystem, was also an early participant in Safe Paths. He wrote up an initial plan for the venture and began working with external organizations to figure out how to structure it into a nonprofit company. Jain eventually became the project’s lead for funding and partnerships.
Vilcans, a graduate student in system design and management, served as Safe Paths’ communications lead through July 2020, while still working a part-time job at Draper Laboratory and taking classes.
“There are these moments when you want to dive in, you want to contribute and you want to work nonstop,” she says, adding that the experience was also a wake-up call on how to manage burnout, and how to balance what you need as a person while contributing to a high-impact team. “That’s important to understand as a leader for the future.”
MIT recognized Vilcans’ contributions later that year with the 2020 SDM Student Award for Leadership, Innovation, and Systems Thinking.
Jain, too, says the class gave him more than he could have expected.
“I made strong friendships with like-minded people from very different backgrounds,” he says. “One key thing that I learned was to be flexible about the kind of work you want to do. Be open and see if there’s an opportunity, either through crisis or through something that you believe could really change a lot of things in the world. And then just go for it.”
Exact symbolic artificial intelligence for faster, better assessment of AI fairness
by Markus Andrews 9 August 2021, 18:30
The justice system, banks, and private companies use algorithms to make decisions that have profound impacts on people’s lives. Unfortunately, those algorithms are sometimes biased — disproportionately impacting people of color as well as individuals in lower income classes when they apply for loans or jobs, or even when courts decide what bail should be set while a person awaits trial.
MIT researchers have developed a new artificial intelligence programming language that can assess the fairness of algorithms more exactly, and more quickly, than available alternatives.
Their Sum-Product Probabilistic Language (SPPL) is a probabilistic programming system. Probabilistic programming is an emerging field at the intersection of programming languages and artificial intelligence that aims to make AI systems much easier to develop, with early successes in computer vision, common-sense data cleaning, and automated data modeling. Probabilistic programming languages make it much easier for programmers to define probabilistic models and carry out probabilistic inference — that is, work backward to infer probable explanations for observed data.
“There are previous systems that can solve various fairness questions. Our system is not the first; but because our system is specialized and optimized for a certain class of models, it can deliver solutions thousands of times faster,” says Feras Saad, a PhD student in electrical engineering and computer science (EECS) and first author on a recent paper describing the work. Saad adds that the speedups are not insignificant: The system can be up to 3,000 times faster than previous approaches.
SPPL gives fast, exact solutions to probabilistic inference questions such as “How likely is the model to recommend a loan to someone over age 40?” or “Generate 1,000 synthetic loan applicants, all under age 30, whose loans will be approved.” These inference results are based on SPPL programs that encode probabilistic models of what kinds of applicants are likely, a priori, and also how to classify them. Fairness questions that SPPL can answer include “Is there a difference between the probability of recommending a loan to an immigrant and nonimmigrant applicant with the same socioeconomic status?” or “What’s the probability of a hire, given that the candidate is qualified for the job and from an underrepresented group?”
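To make the idea of an exact fairness query concrete, here is a minimal sketch in plain Python. It does not use SPPL or its API. The applicant prior, the toy decision tree, and every name in it are hypothetical, chosen only to show how a question like "How likely is the model to recommend a loan to someone over age 40?" can be answered exactly by summing probability mass rather than by sampling.

```python
from fractions import Fraction
from itertools import product

# Hypothetical prior over applicants; all numbers are made up for illustration.
AGE_PRIOR = {
    "under_30": Fraction(3, 10),
    "30_to_40": Fraction(3, 10),
    "over_40": Fraction(2, 5),
}
INCOME_PRIOR = {"low": Fraction(1, 2), "high": Fraction(1, 2)}


def recommend_loan(age, income):
    """Toy decision-tree classifier; the rules are hypothetical."""
    if income == "high":
        return True
    return age == "over_40"


def probability(event, given=lambda age, income: True):
    """Exact P(event | given), summing prior mass over every (age, income) pair."""
    joint = Fraction(0)
    evidence = Fraction(0)
    for age, income in product(AGE_PRIOR, INCOME_PRIOR):
        # Assumption: age and income are independent under the prior.
        mass = AGE_PRIOR[age] * INCOME_PRIOR[income]
        if given(age, income):
            evidence += mass
            if event(age, income):
                joint += mass
    return joint / evidence


p_over_40 = probability(recommend_loan, given=lambda age, income: age == "over_40")
p_under_30 = probability(recommend_loan, given=lambda age, income: age == "under_30")
print("P(recommend | over 40) =", p_over_40)
print("P(recommend | under 30) =", p_under_30)
print("gap =", p_over_40 - p_under_30)
```

Because the arithmetic uses exact fractions, the reported gap carries no sampling error. Brute-force enumeration like this only works for tiny discrete models, though; it is a stand-in for the idea, not for how SPPL achieves its speed.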
SPPL is different from most probabilistic programming languages, as SPPL only allows users to write probabilistic programs for which it can automatically deliver exact probabilistic inference results. SPPL also makes it possible for users to check how fast inference will be, and therefore avoid writing slow programs. In contrast, other probabilistic programming languages such as Gen and Pyro allow users to write down probabilistic programs where the only known ways to do inference are approximate — that is, the results include errors whose nature and magnitude can be hard to characterize.
Error from approximate probabilistic inference is tolerable in many AI applications. But it is undesirable to have inference errors corrupting results in socially impactful applications of AI, such as automated decision-making, and especially in fairness analysis.
Jean-Baptiste Tristan, associate professor at Boston College and former research scientist at Oracle Labs, who was not involved in the new research, says, “I’ve worked on fairness analysis in academia and in real-world, large-scale industry settings. SPPL offers improved flexibility and trustworthiness over other PPLs on this challenging and important class of problems due to the expressiveness of the language, its precise and simple semantics, and the speed and soundness of the exact symbolic inference engine.”
SPPL avoids errors by restricting to a carefully designed class of models that still includes a broad class of AI algorithms, including the decision tree classifiers that are widely used for algorithmic decision-making. SPPL works by compiling probabilistic programs into a specialized data structure called a “sum-product expression.” SPPL further builds on the emerging theme of using probabilistic circuits as a representation that enables efficient probabilistic inference. This approach extends prior work on sum-product networks to models and queries expressed via a probabilistic programming language. However, Saad notes that this approach comes with limitations: “SPPL is substantially faster for analyzing the fairness of a decision tree, for example, but it can’t analyze models like neural networks. Other systems can analyze both neural networks and decision trees, but they tend to be slower and give inexact answers.”
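The general flavor of a sum-product representation can be sketched in a few lines of Python. This is an illustrative toy, not SPPL's data structure or API: the class names, the two-group loan model, and its weights are all assumptions. Leaves hold a distribution over one variable, product nodes multiply the probabilities of independent children, and sum nodes take a weighted mixture.

```python
from fractions import Fraction


class Leaf:
    """Distribution over a single discrete variable."""

    def __init__(self, var, dist):
        self.var, self.dist = var, dist

    def prob(self, event):
        allowed = event.get(self.var)
        if allowed is None:  # the event does not constrain this variable
            return Fraction(1)
        return sum((p for v, p in self.dist.items() if v in allowed), Fraction(0))


class Product:
    """Children over disjoint, independent variables: probabilities multiply."""

    def __init__(self, *children):
        self.children = children

    def prob(self, event):
        result = Fraction(1)
        for child in self.children:
            result *= child.prob(event)
        return result


class Sum:
    """Weighted mixture of children; the weights are assumed to sum to 1."""

    def __init__(self, *weighted_children):
        self.weighted_children = weighted_children  # (weight, child) pairs

    def prob(self, event):
        return sum((w * c.prob(event) for w, c in self.weighted_children), Fraction(0))


# Hypothetical model: two applicant groups with different approval rates.
model = Sum(
    (Fraction(3, 5), Product(
        Leaf("group", {"A": Fraction(1)}),
        Leaf("approved", {True: Fraction(3, 4), False: Fraction(1, 4)}))),
    (Fraction(2, 5), Product(
        Leaf("group", {"B": Fraction(1)}),
        Leaf("approved", {True: Fraction(1, 2), False: Fraction(1, 2)}))),
)

# An event maps each constrained variable to its set of allowed values.
p_approved = model.prob({"approved": {True}})
p_approved_given_b = (
    model.prob({"approved": {True}, "group": {"B"}}) / model.prob({"group": {"B"}})
)
print(p_approved, p_approved_given_b)
```

Each query is a single bottom-up pass over the expression, and a conditional probability is just the ratio of two such passes. That is the property that makes exact inference cheap for models that fit this shape.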
“SPPL shows that exact probabilistic inference is practical, not just theoretically possible, for a broad class of probabilistic programs,” says Vikash Mansinghka, an MIT principal research scientist and senior author on the paper. “In my lab, we’ve seen symbolic inference driving speed and accuracy improvements in other inference tasks that we previously approached via approximate Monte Carlo and deep learning algorithms. We’ve also been applying SPPL to probabilistic programs learned from real-world databases, to quantify the probability of rare events, generate synthetic proxy data given constraints, and automatically screen data for probable anomalies.”
The new SPPL probabilistic programming language was presented in June at the ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI), in a paper that Saad co-authored with MIT EECS Professor Martin Rinard and Mansinghka. SPPL is implemented in Python and is available open source.
Finding common ground in Malden
by Markus Andrews 4 August 2021, 20:15
When disparate groups convene around a common goal, exciting things can happen.
That is the inspiring story unfolding in Malden, Massachusetts, a city of about 60,000 — nearly half people of color — where a new type of community coalition continues to gain momentum on its plan to build a climate-resilient waterfront park along its river. The Malden River Works (MRW) project, recipient of the inaugural Leventhal City Prize, is seeking to connect to a contiguous greenway network where neighboring cities already have visitors coming to their parks and enjoying recreational boating. More important, the MRW is changing the model for how cities address civic growth, community engagement, equitable climate resilience, and environmental justice.
The MRW’s steering committee consists of eight resident leaders of color, a resident environmental advocate, and three city representatives. One of the committee’s primary responsibilities is providing direction to the MRW’s project team, which includes urban designers, watershed and climate resilience planners, and a community outreach specialist. MIT’s Kathleen Vandiver, director of the Community Outreach Education and Engagement Core at MIT’s Center for Environmental Health Sciences (CEHS), and Marie Law Adams MArch ’06, a lecturer in the School of Architecture and Planning’s Department of Urban Studies and Planning (DUSP), serve on the project team.
“This governance structure is somewhat unusual,” says Adams. “More typical is having city government as the primary decision-maker. It is important that one of the first things our team did was build a steering committee that is the decision maker on this project.”
Evan Spetrini ’18 is the senior planner and policy manager for the Malden Redevelopment Authority and sits on both the steering committee and project team. He says placing the decision-making power with the steering committee and building it to be representative of marginalized communities was intentional.
“Changing that paradigm of power and decision-making in planning processes was the way we approached social resilience,” says Spetrini. “We have always intended this project to be a model for future planning projects in Malden.”
This model ushers in a new chapter in the history of a city founded in 1640.
Located about six miles north of Boston, Malden was home to mills and factories that used the Malden River for power, and a site for industrial waste over the last two centuries. Decades after the city’s industrial decline, there is little to no public access to the river. Many residents were not even aware there was a river in their city. Before the project was under way, Vandiver initiated a collaborative effort to evaluate the quality of the river’s water. In partnership with the Mystic River Watershed Association, Gradient Corporation, and CEHS, water samples were tested and a risk analysis was conducted.
“Having the study done made it clear the public could safely enjoy boating on the water,” says Vandiver. “It was a breakthrough that allowed people to see the river as an amenity.”
A team effort
Marcia Manong had never seen the river, but the Malden resident was persuaded to join the steering committee with the promise the project would be inclusive and of value to the community. Manong has been involved with civic engagement most of her life in the United States and for 20 years in South Africa.
“It wasn’t going to be a marginalized, token-ized engagement,” says Manong. “It was clear to me that they were looking for people that would actually be sitting at the table.”
Manong agreed to recruit additional people of color to join the team. From the beginning, she says, language was a huge barrier, given that nearly half of Malden’s residents do not speak English at home. Finding the translation efforts at their public events to be inadequate, the steering committee directed more funds to be made available for translation in several languages when public meetings began being held over Zoom this past year.
“It’s unusual for most cities to spend this money, but our population is so diverse that we require it,” says Manong. “We have to do it. If the steering committee wasn’t raising this issue with the rest of the team, perhaps this would be overlooked.”
Another alteration the steering committee has made is how the project engages with the community. While public attendance at meetings had been successful before the pandemic, Manong says they are “constantly working” to reach new people. One method has been to request invitations to attend the virtual meetings of other organizations to keep them apprised of the project.
“We’ve said that people feel most comfortable when they’re in their own surroundings, so why not go where the people are instead of trying to get them to where we are,” says Manong.
Buoyed by the $100,000 grant from MIT’s Norman B. Leventhal Center for Advanced Urbanism (LCAU) in 2019, the project team worked with Malden’s Department of Public Works, which is located along the river, to redesign its site and buildings and to study how to create a flood-resistant public open space as well as an elevated greenway path, connecting with other neighboring cities’ paths. The park’s plans also call for 75 new trees to reduce urban heat island effect, open lawn for gathering, and a dock for boating on the river.
“The storm water infrastructure in these cities is old and isn’t going to be able to keep up with increased precipitation,” says Adams. “We’re looking for ways to store as much water as possible on the DPW site so we can hold it and release it more gradually into the river to avoid flooding.”
The project along the 2.3-mile-long river continues to receive attention. Recently, the city of Malden was awarded a 2021 Accelerating Climate Resilience Grant of more than $50,000 from the state’s Metropolitan Area Planning Council and the Barr Foundation to support the project. Last fall, the project was awarded a $150,015 Municipal Vulnerability Preparedness Action Grant. Both awards are being directed to fund engineering work to refine the project’s design.
“We — and in general, the planning profession — are striving to create more community empowerment in decision-making as to what happens to their community,” says Spetrini. “Putting the power in the community ensures that it’s actually responding to the needs of the community.”
Contagious enthusiasm
Manong says she’s happy she got involved with the project and believes the new governance structure is making a difference.
“This project is definitely engaging with communities of color in a manner that is transformative and that is looking to build a long-lasting power dynamic built on trust,” she says. “It’s a new energized civic engagement and we’re making that happen. It’s very exciting.”
Spetrini finds the challenge of creating an open space that’s publicly accessible and alongside an active work site professionally compelling.
“There is a way to preserve the industrial employment base while also giving the public greater access to this natural resource,” he says. “It has real implications for other communities to follow this type of model.”
Despite the pandemic this past year, enthusiasm for the project is palpable. For Spetrini, a Malden resident, it’s building “the first significant piece of what has been envisioned as the Malden River Greenway.” Adams sees the total project as a way to build social resilience as well as garnering community interest in climate resilience. For Vandiver, it’s the implications for improved community access.
“From a health standpoint, everybody has learned from Covid-19 that the health aspects of walking in nature are really restorative,” says Vandiver. “Creating greater green space gives more attention to health issues. These are seemingly small side benefits, but they’re huge for mental health benefits.”
Leventhal City Prize’s next cycle
The Leventhal City Prize was established by the LCAU to catalyze innovative, interdisciplinary urban design and planning approaches worldwide to improve both the environment and the quality of life for residents. Support for the LCAU was provided by the Muriel and Norman B. Leventhal Family Foundation and the Sherry and Alan Leventhal Family Foundation.
“We’re thrilled with inaugural recipients of the award and the extensive work they’ve undertaken that is being held up as an exemplary model for others to learn from,” says Sarah Williams, LCAU director and a professor in DUSP. “Their work reflects the prize’s intent. We look forward to catalyzing these types of collaborative partnership in the next prize cycle.”
Submissions for the next cycle of the Leventhal City Prize will open in early 2022.
A comprehensive study of technological change
by Markus Andrews 2 August 2021, 18:00
The societal impacts of technological change can be seen in many domains, from messenger RNA vaccines and automation to drones and climate change. The pace of that technological change can affect its impact, and how quickly a technology improves in performance can be an indicator of its future importance. For decision-makers like investors, entrepreneurs, and policymakers, predicting which technologies are fast improving (and which are overhyped) can mean the difference between success and failure.
New research from MIT aims to assist in the prediction of technology performance improvement using U.S. patents as a dataset. The study describes 97 percent of the U.S. patent system as a set of 1,757 discrete technology domains, and quantitatively assesses each domain for its improvement potential.
“The rate of improvement can only be empirically estimated when substantial performance measurements are made over long time periods,” says Anuraag Singh SM ’20, lead author of the paper. “In some large technological fields, including software and clinical medicine, such measures have rarely, if ever, been made.”
A previous MIT study provided empirical measures for 30 technological domains, but the patent sets identified for those technologies cover less than 15 percent of the patents in the U.S. patent system. The major purpose of this new study is to provide predictions of the performance improvement rates for the thousands of domains not accessed by empirical measurement. To accomplish this, the researchers developed a method using a new probability-based algorithm, machine learning, natural language processing, and patent network analytics.
Overlap and centrality
A technology domain, as the researchers define it, consists of sets of artifacts fulfilling a specific function using a specific branch of scientific knowledge. To find the patents that best represent a domain, the team built on previous research conducted by co-author Chris Magee, a professor of the practice of engineering systems within the Institute for Data, Systems, and Society (IDSS). Magee and his colleagues found that by looking for patent overlap between the U.S. and international patent-classification systems, they could quickly identify patents that best represent a technology. The researchers ultimately created a correspondence of all patents within the U.S. patent system to a set of 1,757 technology domains.
To estimate performance improvement, Singh employed a method refined by co-authors Magee and Giorgio Triulzi, a researcher with the Sociotechnical Systems Research Center (SSRC) within IDSS and an assistant professor at Universidad de los Andes in Colombia. Their method is based on the average “centrality” of patents in the patent citation network. Centrality refers to multiple criteria for determining the ranking or importance of nodes within a network.
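As a rough illustration of averaging centrality over a domain, the sketch below uses normalized in-degree (the share of other patents that cite a given patent) on a toy citation graph. This is an assumption made for simplicity: the article does not say which centrality measure the researchers use, and the patent IDs, citation edges, and domain labels here are invented.

```python
from collections import defaultdict

# Hypothetical citation edges, written as (citing_patent, cited_patent).
CITATIONS = [("P3", "P1"), ("P4", "P1"), ("P4", "P2"), ("P5", "P3"), ("P5", "P1")]

# Hypothetical assignment of each patent to a technology domain.
DOMAIN = {
    "P1": "software",
    "P2": "mechanical",
    "P3": "software",
    "P4": "mechanical",
    "P5": "software",
}


def in_degree_centrality(edges, nodes):
    """Citations received by each patent, divided by the number of other patents."""
    received = defaultdict(int)
    for _citing, cited in edges:
        received[cited] += 1
    denominator = len(nodes) - 1
    return {node: received[node] / denominator for node in nodes}


def mean_centrality_by_domain(centrality, domain_of):
    """Average the per-patent centrality scores within each domain."""
    totals, sizes = defaultdict(float), defaultdict(int)
    for patent, score in centrality.items():
        totals[domain_of[patent]] += score
        sizes[domain_of[patent]] += 1
    return {domain: totals[domain] / sizes[domain] for domain in totals}


centrality = in_degree_centrality(CITATIONS, list(DOMAIN))
domain_scores = mean_centrality_by_domain(centrality, DOMAIN)
print(domain_scores)
```

In-degree is only the simplest of many centrality criteria. A real analysis would likely also need to account for factors such as patent age, since older patents have had more time to accumulate citations.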
“Our method provides predictions of performance improvement rates for nearly all definable technologies for the first time,” says Singh.
Those rates vary — from a low of 2 percent per year for the “Mechanical skin treatment — Hair removal and wrinkles” domain to a high of 216 percent per year for the “Dynamic information exchange and support systems integrating multiple channels” domain. The researchers found that most technologies improve slowly; more than 80 percent of technologies improve at less than 25 percent per year. Notably, the number of patents in a technological area was not a strong indicator of a higher improvement rate.
“Fast-improving domains are concentrated in a few technological areas,” says Magee. “The domains that show improvement rates greater than the predicted rate for integrated chips — 42 percent, from Moore’s law — are predominantly based upon software and algorithms.”
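To put these annual rates in perspective, the short Python sketch below converts an improvement rate into a doubling time and a ten-year improvement factor. It assumes each rate compounds annually, so that "42 percent per year" means performance is multiplied by 1.42 each year; that reading and the function names are assumptions, not something the article spells out.

```python
import math


def doubling_time(annual_rate):
    """Years for performance to double at a fixed annual rate (0.42 means 42 percent)."""
    return math.log(2) / math.log(1 + annual_rate)


def improvement_factor(annual_rate, years):
    """Total multiplicative improvement after compounding for the given number of years."""
    return (1 + annual_rate) ** years


# Rates quoted in the article: slowest domain, integrated chips, fastest domain.
for label, rate in [("slowest domain", 0.02), ("integrated chips", 0.42), ("fastest domain", 2.16)]:
    print(
        f"{label}: doubles in {doubling_time(rate):.1f} years, "
        f"improves {improvement_factor(rate, 10):,.1f}x over a decade"
    )
```

Under this reading, a 42 percent annual rate corresponds to doubling roughly every two years, which matches the usual statement of Moore’s law, while a 2 percent rate takes about 35 years to double.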
TechNext Inc.
The researchers built an online interactive system where domains corresponding to technology-related keywords can be found along with their improvement rates. Users can input a keyword describing a technology and the system returns a prediction of improvement for the technological domain, an automated measure of the quality of the match between the keyword and the domain, and patent sets so that the reader can judge the semantic quality of the match.
Moving forward, the researchers have founded a new MIT spinoff called TechNext Inc. to further refine this technology and use it to help leaders make better decisions, from budgets to investment priorities to technology policy. Like any inventors, Magee and his colleagues want to protect their intellectual property rights. To that end, they have applied for a patent for their novel system and its unique methodology.
“Technologies that improve faster win the market,” says Singh. “Our search system enables technology managers, investors, policymakers, and entrepreneurs to quickly look up predictions of improvement rates for specific technologies.”
Adds Magee: “Our goal is to bring greater accuracy, precision, and repeatability to the as-yet fuzzy art of technology forecasting.”
