More stories

  • in

    Deep-learning technique predicts clinical treatment outcomes

    When it comes to treatment strategies for critically ill patients, clinicians want to be able to consider all their options and timing of administration, and make the optimal decision for their patients. While clinician experience and study has helped them to be successful in this effort, not all patients are the same, and treatment decisions at this crucial time could mean the difference between patient improvement and quick deterioration. Therefore, it would be helpful for doctors to be able to take a patient’s previous known health status and received treatments and use that to predict that patient’s health outcome under different treatment scenarios, in order to pick the best path.

    Now, a deep-learning technique, called G-Net, from researchers at MIT and IBM provides a window into causal counterfactual prediction, affording physicians the opportunity to explore how a patient might fare under different treatment plans. The foundation of G-Net is the g-computation algorithm, a causal inference method that estimates the effect of dynamic exposures in the presence of measured confounding variables — ones that may influence both treatments and outcomes. Unlike previous implementations of the g-computation framework, which have used linear modeling approaches, G-Net uses recurrent neural networks (RNN), which have node connections that allow them to better model temporal sequences with complex and nonlinear dynamics, like those found in the physiological and clinical time series data. In this way, physicians can develop alternative plans based on patient history and test them before making a decision.

    “Our ultimate goal is to develop a machine learning technique that would allow doctors to explore various ‘What if’ scenarios and treatment options,” says Li-wei Lehman, MIT research scientist in the MIT Institute for Medical Engineering and Science and an MIT-IBM Watson AI Lab project lead. “A lot of work has been done in terms of deep learning for counterfactual prediction but [it’s] been focusing on a point exposure setting,” or a static, time-varying treatment strategy, which doesn’t allow for adjustment of treatments as patient history changes. However, her team’s new prediction approach provides for treatment plan flexibility and chances for treatment alteration over time as patient covariate history and past treatments change. “G-Net is the first deep-learning approach based on g-computation that can predict both the population-level and individual-level treatment effects under dynamic and time varying treatment strategies.”

    The research, which was recently published in the Proceedings of Machine Learning Research, was co-authored by Rui Li MEng ’20, Stephanie Hu MEng ’21, former MIT postdoc Mingyu Lu MD, graduate student Yuria Utsumi, IBM research staff member Prithwish Chakraborty, IBM Research director of Hybrid Cloud Services Daby Sow, IBM data scientist Piyush Madan, IBM research scientist Mohamed Ghalwash, and IBM research scientist Zach Shahn.

    Tracking disease progression

    To build, validate, and test G-Net’s predictive abilities, the researchers considered the circulatory system in septic patients in the ICU. During critical care, doctors need to make trade-offs and judgement calls, such as ensuring the organs are receiving adequate blood supply without overworking the heart. For this, they could give intravenous fluids to patients to increase blood pressure; however, too much can cause edema. Alternatively, physicians can administer vasopressors, which act to contract blood vessels and raise blood pressure.

    In order to mimic this and demonstrate G-Net’s proof-of-concept, the team used CVSim, a mechanistic model of a human cardiovascular system that’s governed by 28 input variables characterizing the system’s current state, such as arterial pressure, central venous pressure, total blood volume, and total peripheral resistance, and modified it to simulate various disease processes (e.g., sepsis or blood loss) and effects of interventions (e.g., fluids and vasopressors). The researchers used CVSim to generate observational patient data for training and for “ground truth” comparison against counterfactual prediction. In their G-Net architecture, the researchers ran two RNNs to handle and predict variables that are continuous, meaning they can take on a range of values, like blood pressure, and categorical variables, which have discrete values, like the presence or absence of pulmonary edema. The researchers simulated the health trajectories of thousands of “patients” exhibiting symptoms under one treatment regime, let’s say A, for 66 timesteps, and used them to train and validate their model.

    Testing G-Net’s prediction capability, the team generated two counterfactual datasets. Each contained roughly 1,000 known patient health trajectories, which were created from CVSim using the same “patient” condition as the starting point under treatment A. Then at timestep 33, treatment changed to plan B or C, depending on the dataset. The team then performed 100 prediction trajectories for each of these 1,000 patients, whose treatment and medical history was known up until timestep 33 when a new treatment was administered. In these cases, the prediction agreed well with the “ground-truth” observations for individual patients and averaged population-level trajectories.

    A cut above the rest

    Since the g-computation framework is flexible, the researchers wanted to examine G-Net’s prediction using different nonlinear models — in this case, long short-term memory (LSTM) models, which are a type of RNN that can learn from previous data patterns or sequences — against the more classical linear models and a multilayer perception model (MLP), a type of neural network that can make predictions using a nonlinear approach. Following a similar setup as before, the team found that the error between the known and predicted cases was smallest in the LSTM models compared to the others. Since G-Net is able to model the temporal patterns of the patient’s ICU history and past treatment, whereas a linear model and MLP cannot, it was better able to predict the patient’s outcome.

    The team also compared G-Net’s prediction in a static, time-varying treatment setting against two state-of-the-art deep-learning based counterfactual prediction approaches, a recurrent marginal structural network (rMSN) and a counterfactual recurrent neural network (CRN), as well as a linear model and an MLP. For this, they investigated a model for tumor growth under no treatment, radiation, chemotherapy, and both radiation and chemotherapy scenarios. “Imagine a scenario where there’s a patient with cancer, and an example of a static regime would be if you only give a fixed dosage of chemotherapy, radiation, or any kind of drug, and wait until the end of your trajectory,” comments Lu. For these investigations, the researchers generated simulated observational data using tumor volume as the primary influence dictating treatment plans and demonstrated that G-Net outperformed the other models. One potential reason could be because g-computation is known to be more statistically efficient than rMSN and CRN, when models are correctly specified.

    While G-Net has done well with simulated data, more needs to be done before it can be applied to real patients. Since neural networks can be thought of as “black boxes” for prediction results, the researchers are beginning to investigate the uncertainty in the model to help ensure safety. In contrast to these approaches that recommend an “optimal” treatment plan without any clinician involvement, “as a decision support tool, I believe that G-Net would be more interpretable, since the clinicians would input treatment strategies themselves,” says Lehman, and “G-Net will allow them to be able to explore different hypotheses.” Further, the team has moved on to using real data from ICU patients with sepsis, bringing it one step closer to implementation in hospitals.

    “I think it is pretty important and exciting for real-world applications,” says Hu. “It’d be helpful to have some way to predict whether or not a treatment might work or what the effects might be — a quicker iteration process for developing these hypotheses for what to try, before actually trying to implement them in in a years-long, potentially very involved and very invasive type of clinical trial.”

    This research was funded by the MIT-IBM Watson AI Lab. More

  • in

    The downside of machine learning in health care

    While working toward her dissertation in computer science at MIT, Marzyeh Ghassemi wrote several papers on how machine-learning techniques from artificial intelligence could be applied to clinical data in order to predict patient outcomes. “It wasn’t until the end of my PhD work that one of my committee members asked: ‘Did you ever check to see how well your model worked across different groups of people?’”

    That question was eye-opening for Ghassemi, who had previously assessed the performance of models in aggregate, across all patients. Upon a closer look, she saw that models often worked differently — specifically worse — for populations including Black women, a revelation that took her by surprise. “I hadn’t made the connection beforehand that health disparities would translate directly to model disparities,” she says. “And given that I am a visible minority woman-identifying computer scientist at MIT, I am reasonably certain that many others weren’t aware of this either.”

    In a paper published Jan. 14 in the journal Patterns, Ghassemi — who earned her doctorate in 2017 and is now an assistant professor in the Department of Electrical Engineering and Computer Science and the MIT Institute for Medical Engineering and Science (IMES) — and her coauthor, Elaine Okanyene Nsoesie of Boston University, offer a cautionary note about the prospects for AI in medicine. “If used carefully, this technology could improve performance in health care and potentially reduce inequities,” Ghassemi says. “But if we’re not actually careful, technology could worsen care.”

    It all comes down to data, given that the AI tools in question train themselves by processing and analyzing vast quantities of data. But the data they are given are produced by humans, who are fallible and whose judgments may be clouded by the fact that they interact differently with patients depending on their age, gender, and race, without even knowing it.

    Furthermore, there is still great uncertainty about medical conditions themselves. “Doctors trained at the same medical school for 10 years can, and often do, disagree about a patient’s diagnosis,” Ghassemi says. That’s different from the applications where existing machine-learning algorithms excel — like object-recognition tasks — because practically everyone in the world will agree that a dog is, in fact, a dog.

    Machine-learning algorithms have also fared well in mastering games like chess and Go, where both the rules and the “win conditions” are clearly defined. Physicians, however, don’t always concur on the rules for treating patients, and even the win condition of being “healthy” is not widely agreed upon. “Doctors know what it means to be sick,” Ghassemi explains, “and we have the most data for people when they are sickest. But we don’t get much data from people when they are healthy because they’re less likely to see doctors then.”

    Even mechanical devices can contribute to flawed data and disparities in treatment. Pulse oximeters, for example, which have been calibrated predominately on light-skinned individuals, do not accurately measure blood oxygen levels for people with darker skin. And these deficiencies are most acute when oxygen levels are low — precisely when accurate readings are most urgent. Similarly, women face increased risks during “metal-on-metal” hip replacements, Ghassemi and Nsoesie write, “due in part to anatomic differences that aren’t taken into account in implant design.” Facts like these could be buried within the data fed to computer models whose output will be undermined as a result.

    Coming from computers, the product of machine-learning algorithms offers “the sheen of objectivity,” according to Ghassemi. But that can be deceptive and dangerous, because it’s harder to ferret out the faulty data supplied en masse to a computer than it is to discount the recommendations of a single possibly inept (and maybe even racist) doctor. “The problem is not machine learning itself,” she insists. “It’s people. Human caregivers generate bad data sometimes because they are not perfect.”

    Nevertheless, she still believes that machine learning can offer benefits in health care in terms of more efficient and fairer recommendations and practices. One key to realizing the promise of machine learning in health care is to improve the quality of data, which is no easy task. “Imagine if we could take data from doctors that have the best performance and share that with other doctors that have less training and experience,” Ghassemi says. “We really need to collect this data and audit it.”

    The challenge here is that the collection of data is not incentivized or rewarded, she notes. “It’s not easy to get a grant for that, or ask students to spend time on it. And data providers might say, ‘Why should I give my data out for free when I can sell it to a company for millions?’ But researchers should be able to access data without having to deal with questions like: ‘What paper will I get my name on in exchange for giving you access to data that sits at my institution?’

    “The only way to get better health care is to get better data,” Ghassemi says, “and the only way to get better data is to incentivize its release.”

    It’s not only a question of collecting data. There’s also the matter of who will collect it and vet it. Ghassemi recommends assembling diverse groups of researchers — clinicians, statisticians, medical ethicists, and computer scientists — to first gather diverse patient data and then “focus on developing fair and equitable improvements in health care that can be deployed in not just one advanced medical setting, but in a wide range of medical settings.”

    The objective of the Patterns paper is not to discourage technologists from bringing their expertise in machine learning to the medical world, she says. “They just need to be cognizant of the gaps that appear in treatment and other complexities that ought to be considered before giving their stamp of approval to a particular computer model.” More

  • in

    The promise and pitfalls of artificial intelligence explored at TEDxMIT event

    Scientists, students, and community members came together last month to discuss the promise and pitfalls of artificial intelligence at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) for the fourth TEDxMIT event held at MIT. 

    Attendees were entertained and challenged as they explored “the good and bad of computing,” explained CSAIL Director Professor Daniela Rus, who organized the event with John Werner, an MIT fellow and managing director of Link Ventures; MIT sophomore Lucy Zhao; and grad student Jessica Karaguesian. “As you listen to the talks today,” Rus told the audience, “consider how our world is made better by AI, and also our intrinsic responsibilities for ensuring that the technology is deployed for the greater good.”

    Rus mentioned some new capabilities that could be enabled by AI: an automated personal assistant that could monitor your sleep phases and wake you at the optimal time, as well as on-body sensors that monitor everything from your posture to your digestive system. “Intelligent assistance can help empower and augment our lives. But these intriguing possibilities should only be pursued if we can simultaneously resolve the challenges that these technologies bring,” said Rus. 

    The next speaker, CSAIL principal investigator and professor of electrical engineering and computer science Manolis Kellis, started off by suggesting what sounded like an unattainable goal — using AI to “put an end to evolution as we know it.” Looking at it from a computer science perspective, he said, what we call evolution is basically a brute force search. “You’re just exploring all of the search space, creating billions of copies of every one of your programs, and just letting them fight against each other. This is just brutal. And it’s also completely slow. It took us billions of years to get here.” Might it be possible, he asked, to speed up evolution and make it less messy?

    The answer, Kellis said, is that we can do better, and that we’re already doing better: “We’re not killing people like Sparta used to, throwing the weaklings off the mountain. We are truly saving diversity.”

    Knowledge, moreover, is now being widely shared, passed on “horizontally” through accessible information sources, he noted, rather than “vertically,” from parent to offspring. “I would like to argue that competition in the human species has been replaced by collaboration. Despite having a fixed cognitive hardware, we have software upgrades that are enabled by culture, by the 20 years that our children spend in school to fill their brains with everything that humanity has learned, regardless of which family came up with it. This is the secret of our great acceleration” — the fact that human advancement in recent centuries has vastly out-clipped evolution’s sluggish pace.

    The next step, Kellis said, is to harness insights about evolution in order to combat an individual’s genetic susceptibility to disease. “Our current approach is simply insufficient,” he added. “We’re treating manifestations of disease, not the causes of disease.” A key element in his lab’s ambitious strategy to transform medicine is to identify “the causal pathways through which genetic predisposition manifests. It’s only by understanding these pathways that we can truly manipulate disease causation and reverse the disease circuitry.” 

    Kellis was followed by Aleksander Madry, MIT professor of electrical engineering and computer science and CSAIL principal investigator, who told the crowd, “progress in AI is happening, and it’s happening fast.” Computer programs can routinely beat humans in games like chess, poker, and Go. So should we be worried about AI surpassing humans? 

    Madry, for one, is not afraid — or at least not yet. And some of that reassurance stems from research that has led him to the following conclusion: Despite its considerable success, AI, especially in the form of machine learning, is lazy. “Think about being lazy as this kind of smart student who doesn’t really want to study for an exam. Instead, what he does is just study all the past years’ exams and just look for patterns. Instead of trying to actually learn, he just tries to pass the test. And this is exactly the same way in which current AI is lazy.”

    A machine-learning model might recognize grazing sheep, for instance, simply by picking out pictures that have green grass in them. If a model is trained to identify fish from photos of anglers proudly displaying their catches, Madry explained, “the model figures out that if there’s a human holding something in the picture, I will just classify it as a fish.” The consequences can be more serious for an AI model intended to pick out malignant tumors. If the model is trained on images containing rulers that indicate the size of tumors, the model may end up selecting only those photos that have rulers in them.

    This leads to Madry’s biggest concerns about AI in its present form. “AI is beating us now,” he noted. “But the way it does it [involves] a little bit of cheating.” He fears that we will apply AI “in some way in which this mismatch between what the model actually does versus what we think it does will have some catastrophic consequences.” People relying on AI, especially in potentially life-or-death situations, need to be much more mindful of its current limitations, Madry cautioned.

    There were 10 speakers altogether, and the last to take the stage was MIT associate professor of electrical engineering and computer science and CSAIL principal investigator Marzyeh Ghassemi, who laid out her vision for how AI could best contribute to general health and well-being. But in order for that to happen, its models must be trained on accurate, diverse, and unbiased medical data.

    It’s important to focus on the data, Ghassemi stressed, because these models are learning from us. “Since our data is human-generated … a neural network is learning how to practice from a doctor. But doctors are human, and humans make mistakes. And if a human makes a mistake, and we train an AI from that, the AI will, too. Garbage in, garbage out. But it’s not like the garbage is distributed equally.”

    She pointed out that many subgroups receive worse care from medical practitioners, and members of these subgroups die from certain conditions at disproportionately high rates. This is an area, Ghassemi said, “where AI can actually help. This is something we can fix.” Her group is developing machine-learning models that are robust, private, and fair. What’s holding them back is neither algorithms nor GPUs. It’s data. Once we collect reliable data from diverse sources, Ghassemi added, we might start reaping the benefits that AI can bring to the realm of health care.

    In addition to CSAIL speakers, there were talks from members across MIT’s Institute for Data, Systems, and Society; the MIT Mobility Initiative; the MIT Media Lab; and the SENSEable City Lab.

    The proceedings concluded on that hopeful note. Rus and Werner then thanked everyone for coming. “Please continue to reflect about the good and bad of computing,” Rus urged. “And we look forward to seeing you back here in May for the next TEDxMIT event.”

    The exact theme of the spring 2022 gathering will have something to do with “superpowers.” But — if December’s mind-bending presentations were any indication — the May offering is almost certain to give its attendees plenty to think about. And maybe provide the inspiration for a startup or two. More

  • in

    Reducing food waste to increase access to affordable foods

    About a third of the world’s food supply never gets eaten. That means the water, labor, energy, and fertilizer that went into growing, processing, and distributing the food is wasted.

    On the other end of the supply chain are cash-strapped consumers, who have been further distressed in recent years by factors like the Covid-19 pandemic and inflation.

    Spoiler Alert, a company founded by two MIT alumni, is helping companies bridge the gap between food waste and food insecurity with a platform connecting major food and beverage brands with discount grocers, retailers, and nonprofits. The platform helps brands discount or donate excess and short-dated inventory days, weeks, and months before it expires.

    “There is a tremendous amount of underutilized data that exists in the manufacturing and distribution space that results in good food going to waste,” says Ricky Ashenfelter MBA ’15, who co-founded the company with Emily Malina MBA ’15.

    Spoiler Alert helps brands manage distressed inventory data, create offers for potential buyers, and review and accept bids. The platform is designed to work with companies’ existing inventory and fulfillment systems, using automation and pricing intelligence to further streamline sales.

    “At a high level, we’re a waste-prevention software built for sales and supply-chain teams,” Ashenfelter says. “You can think of it as a private [business-to-business] eBay of sorts.”

    Spoiler Alert is working with global companies like Nestle, Kraft Heinz, and Danone, as well as discount grocers like the United Grocery Outlet and Misfits Market. Those brands are already using the platform to reduce food waste and get more food on people’s tables.

    “Project Drawdown [a nonprofit working on climate solutions] has identified food waste as the number one priority to address the global climate crisis, so these types of corporate initiatives can be really powerful from an environmental standpoint,” Ashenfelter says, noting the nonprofit estimates food waste accounts for 8 percent of global greenhouse gas emissions. “Contrast that with growing levels of food insecurity and folks not being able to access affordable nutrition, and you start to see how tackling supply-chain inefficiency can have a dramatic impact from both an environmental and a social lens. That’s what motivates us.”

    Untapped data for change

    Ashenfelter came to MIT’s Sloan School of Management after several years in sustainability software and management consulting within the retail and consumer products industries.

    “I was really attracted to transitioning into something much more entrepreneurial, and to leverage not only Sloan’s focus on entrepreneurship, but also the broader MIT ecosystem’s focus on technology, entrepreneurship, clean tech innovation, and other themes along that front,” he says.

    Ashenfelter met Malina at one of Sloan’s admitted students events in 2013, and the founders soon set out to use data to decrease food waste.

    “For us, the idea was clear: How do we better leverage data to manage excess and short-dated inventory?” Ashenfelter says. “How we go about that has evolved over the last six years, but it’s all rooted in solving an enormous climate problem, solving a major food insecurity problem, and from a capitalistic standpoint, helping businesses cut costs and generate revenue from otherwise wasted products.”

    The founders spent many hours in the Martin Trust Center for MIT Entrepreneurship with support from the Sloan Sustainability Initiative, and used Spoiler Alert as a case study in nearly every class they took, thinking through product development, sales, marketing, pricing, and more through their coursework.

    “We brought our idea into just about every action learning class that we could at Sloan and MIT,” Ashenfelter says.

    They also participated in the MIT $100K Entrepreneurship Competition and received support from the Venture Mentoring Service and the IDEAS Global Challenge program.

    Upon graduation, the founders initially began building a platform to facilitate donations of excess inventory, but soon learned big companies’ processes for discounting that inventory were also highly manual. Today, more than 90 percent of Spoiler Alert’s transaction volume is discounted, with the remainder donated.

    Different teams within an organization can upload excess inventory reports to Spoiler Alert’s system, eliminating the need to manually aggregate datasets and preparing what the industry refers to as “blowout lists” to sell. Spoiler Alert uses machine-learning-based tools to help both parties with pricing and negotiations to close deals more quickly.

    “Companies are taking pretty manual and slow approaches to deciding [what to do with excess inventory],” Ashenfelter says. “And when you have slow decision-making, you’re losing days or even weeks of shelf life on that product. That can be the difference between selling product versus donating, and donating versus dumping.”

    Once a deal has been made, Spoiler Alert automatically generates the forms and workflows needed by fulfillment teams to get the product out the door. The relationships companies build on the platform are also a major driver for cutting down waste.

    “We’re providing suppliers with the ability to control where their discounted and donated product ends up,” Ashenfelter says. “That’s really powerful because it allows these CPG brands to ensure that this product is, in many cases, getting to affordable nutrition outlets in underserved communities.”

    Ashenfelter says the majority of inventory goes to regional and national discount grocers, supplemented with extensive purchasing from local and nonprofit grocery chains.

    “Everything we do is oriented around helping sell as much product as possible to a reputable set of buyers at the most fair, equitable prices possible,” Ashenfelter says.

    Scaling for impact

    The pandemic has disrupted many aspects of the food supply chains. But Ashenfelter says it has also accelerated the adoption of digital solutions that can better manage such volatility.

    When Campbell began using Spoiler Alert’s system in 2019, for instance, it achieved a 36 percent increase in discount sales and a 27 percent increase in donations over the first five months.

    Ashenfelter says the results have proven that companies’ sustainability targets can go hand in hand with initiatives that boost their bottom lines. In fact, because Spoiler Alert focuses so much on the untapped revenue associated with food waste, many customers don’t even realize Spoiler Alert is a sustainability company until after they’ve signed on.

    “What’s neat about this program is that it becomes an incredibly powerful case study internally for how sustainability and operational outcomes aren’t in conflict and can drive both business results as well as overall environmental impact,” Ashenfelter says.

    Going forward, Spoiler Alert will continue building out algorithmic solutions that could further cut down on waste internationally and across a wider array of products.

    “At every step in our process, we’re collecting a tremendous amount of data in terms of what is and isn’t selling, at what price point, to which buyers, out of which geographies, and with how much remaining shelf life,” Ashenfelter explains. “We are only starting to scratch the surface in terms of bringing our recommendations engine to life for our suppliers and buyers. Ultimately our goal is to power the waste-free economy, and rooted in that is making better decisions faster, in collaboration with a growing ecosystem of supply chain partners, and with as little manual intervention as possible.” More

  • in

    End-to-end supply chain transparency

    For years, companies have managed their extended supply chains with intermittent audits and certifications while attempting to persuade their suppliers to adhere to certain standards and codes of conduct. But they’ve lacked the concrete data necessary to prove their supply chains were working as they should. They most likely had baseline data about their suppliers — what they bought and who they bought it from — but knew little else about the rest of the supply chain.

    With Sourcemap, companies can now trace their supply chains from raw material to finished good with certainty, keeping track of the mines and farms that produce the commodities they rely on to take their goods to market. This unprecedented level of transparency provides Sourcemap’s customers with the assurance that the entire end-to-end supply chain operates within their standards while living up to social and environmental targets.

    And they’re doing it at scale for large multinationals across the food, agricultural, automotive, tech, and apparel industries. Thanks to Sourcemap founder and CEO Leonardo Bonanni MA ’03, SM ’05, PhD ’10, companies like VF Corporation, owner of brands like Timberland, The North Face, Mars, Hershey, and Ferrero, now have enough data to confidently tell the story of how they’re sourcing their raw materials.

    “Coming from the Media Lab, we recognized early on the power of the cloud, the power of social networking-type databases and smartphone diffusion around the world,” says Bonanni of his company’s MIT roots. Rather than providing intermittent glances at the supply chain via an auditor, Sourcemap collects data continuously, in real-time, every step of the way, flagging anything that could indicate counterfeiting, adulteration, fraud, waste, or abuse.

    “We’ve taken our customers from a situation where they had very little control to a world where they have direct visibility over their entire global operations, even allowing them to see ahead of time — before a container reaches the port — whether there is any indication that there might be something wrong with it,” says Bonanni.

    The key problem Sourcemap addresses is a lack of data in companies’ supply chain management databases. According to Bonanni, most Sourcemap customers have invested millions of dollars in enterprise resource planning (ERP) databases, which provide information about internal operations and direct suppliers, but fall short when it comes to global operations, where their secondary and tertiary suppliers operate. Built on relational databases, ERP systems have been around for more than 40 years and work well for simple, static data structures. But they aren’t agile enough to handle big data and rapidly evolving, complex data structures

    Sourcemap, on the other hand, uses NoSQL (non-relational) database technology, which is more flexible, cost-efficient, and scalable. “Our platform is like a LinkedIn for the supply chain,” explains Bonanni. Customers provide information about where they buy their raw materials, the suppliers get invited to the network and provide information to validate those relationships, right down to the farms and the mines where the raw materials are extracted — which is often where the biggest risks lie.

    Initially, the entire supply chain database of a Sourcemap customer might amount to a few megabytes of spreadsheets listing their purchase orders and the names of their suppliers. Sourcemap delivers terabytes of data that paint a detailed picture of the supply chain, capturing everything, right down to the moment a farmer in West Africa delivers cocoa beans to a warehouse, onto a truck heading to a port, to a factory, all the way to the finished goods.

    “We’ve seen the amount of data collected grow by a factor of 1 million, which tells us that the world is finally ready for full visibility of supply chains,” says Bonanni. “The fact is that we’ve seen supply chain transparency go from a fringe concern to a broad-based requirement as a license to operate in most of Europe and North America,” says Bonanni.

    These days, disruptions in supply chains, combined with price volatility and new laws requiring companies to prove that the goods they import were not made illegally (such as by causing deforestation or involving forced or child labor), means that companies are often required to know where they source their raw materials from, even if they only import the materials through an intermediary.

    Sourcemap uses its full suite of tools to walk customers through a step-by-step process that maps their suppliers while measuring performance, ultimately verifying the entire supply chain and providing them with the confidence to import goods while being customs-compliant. At the end of the day, Sourcemap customers can communicate to their stakeholders and the end consumer exactly where their commodities come from while ensuring that social, environmental, and compliance standards are met.

    The company was recently named to the newest cohort of firms honored by the MIT Startup Exchange (STEX) as STEX25 startups. Bonanni is quick to point out the benefits of STEX and of MIT’s Industrial Liaison Program (ILP): “Our best feedback and our most constructive relationships have been with companies that sponsored our research early on at the Media Lab and ILP,” he says. “The innovative exchange of ideas inherent in the MIT startup ecosystem has helped to build up Sourcemap as a company and to grow supply chain transparency as a future-facing technology that more and more companies are now scrambling to adopt.” More

  • in

    Helping companies optimize their websites and mobile apps

    Creating a good customer experience increasingly means creating a good digital experience. But metrics like pageviews and clicks offer limited insight into how much customers actually like a digital product.

    That’s the problem the digital optimization company Amplitude is solving. Amplitude gives companies a clearer picture into how users interact with their digital products to help them understand exactly which features to promote or improve.

    “It’s all about using product data to drive your business,” says Amplitude CEO Spenser Skates ’10, who co-founded the company with Curtis Liu ’10 and Stanford University graduate Jeffrey Wang. “Mobile apps and websites are really complex. The average app or website will have thousands of things you can do with it. The question is how you know which of those things are driving a great user experience and which parts are really frustrating for users.”

    Amplitude’s database can gather millions of details about how users behave inside an app or website and allow customers to explore that information without needing data science degrees.

    “It provides an interface for very easy, accessible ways of looking at your data, understanding your data, and asking questions of that data,” Skates says.

    Amplitude, which recently announced it will be going public, is already helping 23 of the 100 largest companies in the U.S. Customers include media companies like NBC, tech companies like Twitter, and retail companies like Walmart.

    “Our platform helps businesses understand how people are using their apps and websites so they can create better versions of their products,” Skates says. “It’s all about creating a really compelling product.”

    Learning entrepreneurship

    The founders say their years at MIT were among the best of their lives. Skates and Liu were undergraduates from 2006 to 2010. Skates majored in biological engineering while Liu majored in mathematics and electrical engineering and computer science. The two first met as opponents in MIT’s Battlecode competition, in which students use artificial intelligence algorithms to control teams of robots that compete in a strategy game against other teams. The following year they teamed up.

    “There are a lot of parallels between what you’re trying to do in Battlecode and what you end up having to do in the early stages of a startup,” Liu says. “You have limited resources, limited time, and you’re trying to accomplish a goal. What we found is trying a lot of different things, putting our ideas out there and testing them with real data, really helped us focus on the things that actually mattered. That method of iteration and continual improvement set the foundation for how we approach building products and startups.”

    Liu and Skates next participated in the MIT $100K Entrepreneurship Competition with an idea for a cloud-based music streaming service. After graduation, Skates began working in finance and Liu got a job at Google, but they continued pursuing startup ideas on the side, including a website that let alumni see where their classmates ended up and a marketplace for finding photographers.

    A year after graduation, the founders decided to quit their jobs and work on a startup full time. Skates moved into Liu’s apartment in San Francisco, setting up a mattress on the floor, and they began working on a project that became Sonalight, a voice recognition app. As part of the project, the founders built an internal system to understand where users got stuck in the app and what features were used the most.

    Despite getting over 100,000 downloads, the founders decided Sonalight was a little too early for its time and started thinking their analytics feature could be useful to other companies. They spoke with about 30 different product teams to learn more about what companies wanted from their digital analytics. Amplitude was officially founded in 2012.

    Amplitude gathers fine details about digital product usage, parsing out individual features and actions to give customers a better view of how their products are being used. Using the data in Amplitude’s intuitive, no-code interface, customers can make strategic decisions like whether to launch a feature or change a distribution channel.

    The platform is designed to ease the bottlenecks that arise when executives, product teams, salespeople, and marketers want to answer questions about customer experience or behavior but need the data science team to crunch the numbers for them.

    “It’s a very collaborative interface to encourage customers to work together to understand how users are engaging with their apps,” Skates says.

    Amplitude’s database also uses machine learning to segment users, predict user outcomes, and uncover novel correlations. Earlier this year, the company unveiled a service called Recommend that helps companies create personalized user experiences across their entire platform in minutes. The service goes beyond demographics to personalize customer experiences based on what users have done or seen before within the product.

    “We’re very conscious on the privacy front,” Skates says. “A lot of analytics companies will resell your data to third parties or use it for advertising purposes. We don’t do any of that. We’re only here to provide product insights to our customers. We’re not using data to track you across the web. Everyone expects Netflix to use the data on what you’ve watched before to recommend what to watch next. That’s effectively what we’re helping other companies do.”

    Optimizing digital experiences

    The meditation app Calm is on a mission to help users build habits that improve their mental wellness. Using Amplitude, the company learned that users most often use the app to get better sleep and reduce stress. The insights helped Calm’s team double down on content geared toward those goals, launching “sleep stories” to help users unwind at the end of each day and adding content around anxiety relief and relaxation. Sleep stories are now Calm’s most popular type of content, and Calm has grown rapidly to millions of people around the world.

    Calm’s story shows the power of letting user behavior drive product decisions. Amplitude has also helped the online fundraising site GoFundMe increase donations by showing users more compelling campaigns and the exercise bike company Peloton realize the importance of social features like leaderboards.

    Moving forward, the founders believe Amplitude’s platform will continue helping companies adapt to an increasingly digital world in which users expect more compelling, personalized experiences.

    “If you think about the online experience for companies today compared to 10 years ago, now [digital] is the main point of contact, whether you’re a media company streaming content, a retail company, or a finance company,” Skates says. “That’s only going to continue. That’s where we’re trying to help.” More

  • in

    A comprehensive study of technological change

    The societal impacts of technological change can be seen in many domains, from messenger RNA vaccines and automation to drones and climate change. The pace of that technological change can affect its impact, and how quickly a technology improves in performance can be an indicator of its future importance. For decision-makers like investors, entrepreneurs, and policymakers, predicting which technologies are fast improving (and which are overhyped) can mean the difference between success and failure.

    New research from MIT aims to assist in the prediction of technology performance improvement using U.S. patents as a dataset. The study describes 97 percent of the U.S. patent system as a set of 1,757 discrete technology domains, and quantitatively assesses each domain for its improvement potential.

    “The rate of improvement can only be empirically estimated when substantial performance measurements are made over long time periods,” says Anuraag Singh SM ’20, lead author of the paper. “In some large technological fields, including software and clinical medicine, such measures have rarely, if ever, been made.”

    A previous MIT study provided empirical measures for 30 technological domains, but the patent sets identified for those technologies cover less than 15 percent of the patents in the U.S. patent system. The major purpose of this new study is to provide predictions of the performance improvement rates for the thousands of domains not accessed by empirical measurement. To accomplish this, the researchers developed a method using a new probability-based algorithm, machine learning, natural language processing, and patent network analytics.

    Overlap and centrality

    A technology domain, as the researchers define it, consists of sets of artifacts fulfilling a specific function using a specific branch of scientific knowledge. To find the patents that best represent a domain, the team built on previous research conducted by co-author Chris Magee, a professor of the practice of engineering systems within the Institute for Data, Systems, and Society (IDSS). Magee and his colleagues found that by looking for patent overlap between the U.S. and international patent-classification systems, they could quickly identify patents that best represent a technology. The researchers ultimately created a correspondence of all patents within the U.S. patent system to a set of 1,757 technology domains.

    To estimate performance improvement, Singh employed a method refined by co-authors Magee and Giorgio Triulzi, a researcher with the Sociotechnical Systems Research Center (SSRC) within IDSS and an assistant professor at Universidad de los Andes in Colombia. Their method is based on the average “centrality” of patents in the patent citation network. Centrality refers to multiple criteria for determining the ranking or importance of nodes within a network.

    “Our method provides predictions of performance improvement rates for nearly all definable technologies for the first time,” says Singh.

    Those rates vary — from a low of 2 percent per year for the “Mechanical skin treatment — Hair removal and wrinkles” domain to a high of 216 percent per year for the “Dynamic information exchange and support systems integrating multiple channels” domain. The researchers found that most technologies improve slowly; more than 80 percent of technologies improve at less than 25 percent per year. Notably, the number of patents in a technological area was not a strong indicator of a higher improvement rate.

    “Fast-improving domains are concentrated in a few technological areas,” says Magee. “The domains that show improvement rates greater than the predicted rate for integrated chips — 42 percent, from Moore’s law — are predominantly based upon software and algorithms.”

    TechNext Inc.

    The researchers built an online interactive system where domains corresponding to technology-related keywords can be found along with their improvement rates. Users can input a keyword describing a technology and the system returns a prediction of improvement for the technological domain, an automated measure of the quality of the match between the keyword and the domain, and patent sets so that the reader can judge the semantic quality of the match.

    Moving forward, the researchers have founded a new MIT spinoff called TechNext Inc. to further refine this technology and use it to help leaders make better decisions, from budgets to investment priorities to technology policy. Like any inventors, Magee and his colleagues want to protect their intellectual property rights. To that end, they have applied for a patent for their novel system and its unique methodology.

    “Technologies that improve faster win the market,” says Singh. “Our search system enables technology managers, investors, policymakers, and entrepreneurs to quickly look up predictions of improvement rates for specific technologies.”

    Adds Magee: “Our goal is to bring greater accuracy, precision, and repeatability to the as-yet fuzzy art of technology forecasting.” More