    A simpler method for learning to control a robot

    Researchers from MIT and Stanford University have devised a new machine-learning approach that could be used to control a robot, such as a drone or autonomous vehicle, more effectively and efficiently in dynamic environments where conditions can change rapidly.

    This technique could help an autonomous vehicle learn to compensate for slippery road conditions to avoid going into a skid, allow a robotic free-flyer to tow different objects in space, or enable a drone to closely follow a downhill skier despite being buffeted by strong winds.

    The researchers’ approach incorporates certain structure from control theory into the process for learning a model in such a way that leads to an effective method of controlling complex dynamics, such as those caused by impacts of wind on the trajectory of a flying vehicle. One way to think about this structure is as a hint that can help guide how to control a system.

    “The focus of our work is to learn intrinsic structure in the dynamics of the system that can be leveraged to design more effective, stabilizing controllers,” says Navid Azizan, the Esther and Harold E. Edgerton Assistant Professor in the MIT Department of Mechanical Engineering and the Institute for Data, Systems, and Society (IDSS), and a member of the Laboratory for Information and Decision Systems (LIDS). “By jointly learning the system’s dynamics and these unique control-oriented structures from data, we’re able to naturally create controllers that function much more effectively in the real world.”

    Using this structure in a learned model, the researchers’ technique immediately extracts an effective controller from the model, as opposed to other machine-learning methods that require a controller to be derived or learned separately with additional steps. With this structure, their approach is also able to learn an effective controller using fewer data than other approaches. This could help their learning-based control system achieve better performance faster in rapidly changing environments.

    “This work tries to strike a balance between identifying structure in your system and just learning a model from data,” says lead author Spencer M. Richards, a graduate student at Stanford University. “Our approach is inspired by how roboticists use physics to derive simpler models for robots. Physical analysis of these models often yields a useful structure for the purposes of control — one that you might miss if you just tried to naively fit a model to data. Instead, we try to identify similarly useful structure from data that indicates how to implement your control logic.”

    Additional authors of the paper are Jean-Jacques Slotine, professor of mechanical engineering and of brain and cognitive sciences at MIT, and Marco Pavone, associate professor of aeronautics and astronautics at Stanford. The research will be presented at the International Conference on Machine Learning (ICML).

    Learning a controller

    Determining the best way to control a robot to accomplish a given task can be a difficult problem, even when researchers know how to model everything about the system.

    A controller is the logic that enables a drone to follow a desired trajectory, for example. This controller would tell the drone how to adjust its rotor forces to compensate for the effect of winds that can knock it off a stable path to reach its goal.

    This drone is a dynamical system — a physical system that evolves over time. In this case, its position and velocity change as it flies through the environment. If such a system is simple enough, engineers can derive a controller by hand. 

    Modeling a system by hand intrinsically captures a certain structure based on the physics of the system. For instance, if a robot were modeled manually using differential equations, these would capture the relationship between velocity, acceleration, and force. Acceleration is the rate of change in velocity over time, which is determined by the mass of and forces applied to the robot.
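
    The relationship described above can be sketched as a one-dimensional point mass advanced with a simple forward-Euler step. This is only an illustration: the function name, the time step, and the choice of integrator are assumptions, not details from the research.

```python
def euler_step(position, velocity, force, mass, dt):
    """Advance a 1D point mass by one time step of length dt."""
    # Newton's second law: acceleration is force divided by mass.
    acceleration = force / mass
    # Velocity changes at the rate given by acceleration.
    new_velocity = velocity + acceleration * dt
    # Position changes at the rate given by the current velocity.
    new_position = position + velocity * dt
    return new_position, new_velocity


# Hypothetical example: a 2 kg robot pushed with 2 N for 0.1 s.
next_position, next_velocity = euler_step(0.0, 1.0, force=2.0, mass=2.0, dt=0.1)
```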

    But often the system is too complex to be exactly modeled by hand. Aerodynamic effects, like the way swirling wind pushes a flying vehicle, are notoriously difficult to derive manually, Richards explains. Researchers would instead take measurements of the drone’s position, velocity, and rotor speeds over time, and use machine learning to fit a model of this dynamical system to the data. But these approaches typically don’t learn a control-based structure. This structure is useful in determining how to best set the rotor speeds to direct the motion of the drone over time.

    Once they have modeled the dynamical system, many existing approaches also use data to learn a separate controller for the system.

    “Other approaches that try to learn dynamics and a controller from data as separate entities are a bit detached philosophically from the way we normally do it for simpler systems. Our approach is more reminiscent of deriving models by hand from physics and linking that to control,” Richards says.

    Identifying structure

    The team from MIT and Stanford developed a technique that uses machine learning to learn the dynamics model, but in such a way that the model has some prescribed structure that is useful for controlling the system.

    With this structure, they can extract a controller directly from the dynamics model, rather than using data to learn an entirely separate model for the controller.

    “We found that beyond learning the dynamics, it’s also essential to learn the control-oriented structure that supports effective controller design. Our approach of learning state-dependent coefficient factorizations of the dynamics has outperformed the baselines in terms of data efficiency and tracking capability, proving to be successful in efficiently and effectively controlling the system’s trajectory,” Azizan says. 
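
    To make the idea of a state-dependent coefficient factorization concrete, the sketch below hand-writes one for a toy scalar system and extracts a controller from it by solving a scalar Riccati equation at each state, a classical technique. It does not reproduce the researchers' learned model or their algorithm; the toy dynamics, the function names, and the cost weights q and r are all assumptions made for illustration.

```python
import math


def a_of_x(x):
    """State-dependent coefficient for f(x) = x**3, since x**3 == a_of_x(x) * x."""
    return x * x


def riccati_gain(a, b=1.0, q=1.0, r=1.0):
    """Feedback gain from the scalar Riccati equation 2*a*p - (b*p)**2 / r + q = 0."""
    # Positive root of the quadratic in p.
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r


def simulate(x0, dt=1e-3, steps=5000, b=1.0):
    """Forward-Euler simulation of x' = x**3 + b*u with u = -gain(x) * x."""
    x = x0
    for _ in range(steps):
        u = -riccati_gain(a_of_x(x), b) * x
        x += dt * (x ** 3 + b * u)
    return x


# The uncontrolled system x' = x**3 diverges from x0 = 1.5; the controlled one decays.
final_state = simulate(1.5)
```

    The controller here comes directly out of the factored model, with no separate controller-learning step, which is the property the factorization is meant to provide.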

    When they tested this approach, their controller closely followed desired trajectories, outpacing all the baseline methods. The controller extracted from their learned model nearly matched the performance of a ground-truth controller, which is built using the exact dynamics of the system.

    “By making simpler assumptions, we got something that actually worked better than other complicated baseline approaches,” Richards adds.

    The researchers also found that their method was data-efficient, which means it achieved high performance even with few data. For instance, it could effectively model a highly dynamic rotor-driven vehicle using only 100 data points. Methods that used multiple learned components saw their performance drop much faster with smaller datasets.

    This efficiency could make their technique especially useful in situations where a drone or robot needs to learn quickly in rapidly changing conditions.

    Plus, their approach is general and could be applied to many types of dynamical systems, from robotic arms to free-flying spacecraft operating in low-gravity environments.

    In the future, the researchers are interested in developing models that are more physically interpretable, and that would be able to identify very specific information about a dynamical system, Richards says. This could lead to better-performing controllers.

    “Despite its ubiquity and importance, nonlinear feedback control remains an art, making it especially suitable for data-driven and learning-based methods. This paper makes a significant contribution to this area by proposing a method that jointly learns system dynamics, a controller, and control-oriented structure,” says Nikolai Matni, an assistant professor in the Department of Electrical and Systems Engineering at the University of Pennsylvania, who was not involved with this work. “What I found particularly exciting and compelling was the integration of these components into a joint learning algorithm, such that control-oriented structure acts as an inductive bias in the learning process. The result is a data-efficient learning process that outputs dynamic models that enjoy intrinsic structure that enables effective, stable, and robust control. While the technical contributions of the paper are excellent themselves, it is this conceptual contribution that I view as most exciting and significant.”

    This research is supported, in part, by the NASA University Leadership Initiative and the Natural Sciences and Engineering Research Council of Canada.

    A new dataset of Arctic images will spur artificial intelligence research

    As the U.S. Coast Guard (USCG) icebreaker Healy takes part in a voyage across the North Pole this summer, it is capturing images of the Arctic to further the study of this rapidly changing region. Lincoln Laboratory researchers installed a camera system aboard the Healy while at port in Seattle before it embarked on a three-month science mission on July 11. The resulting dataset, which will be one of the first of its kind, will be used to develop artificial intelligence tools that can analyze Arctic imagery.

    “This dataset not only can help mariners navigate more safely and operate more efficiently, but also help protect our nation by providing critical maritime domain awareness and an improved understanding of how AI analysis can be brought to bear in this challenging and unique environment,” says Jo Kurucar, a researcher in Lincoln Laboratory’s AI Software Architectures and Algorithms Group, which led this project.

    As the planet warms and sea ice melts, Arctic passages are opening up to more traffic, including both military vessels and ships conducting illegal fishing. These movements may pose national security challenges to the United States. The opening Arctic also raises questions about how its climate, wildlife, and geography are changing.

    Today, very few imagery datasets of the Arctic exist to study these changes. Overhead images from satellites or aircraft can only provide limited information about the environment. An outward-looking camera attached to a ship can capture more details of the setting and different angles of objects, such as other ships, in the scene. These types of images can then be used to train AI computer-vision tools, which can help the USCG plan naval missions and automate analysis. According to Kurucar, USCG assets in the Arctic are spread thin and can benefit greatly from AI tools, which can act as a force multiplier.

    The Healy is the USCG’s largest and most technologically advanced icebreaker. Given its current mission, it was a fitting candidate to be equipped with a new sensor to gather this dataset. The laboratory research team collaborated with the USCG Research and Development Center to determine the sensor requirements. Together, they developed the Cold Region Imaging and Surveillance Platform (CRISP).

    “Lincoln Laboratory has an excellent relationship with the Coast Guard, especially with the Research and Development Center. Over a decade, we’ve established ties that enabled the deployment of the CRISP system,” says Amna Greaves, the CRISP project lead and an assistant leader in the AI Software Architectures and Algorithms Group. “We have strong ties not only because of the USCG veterans working at the laboratory and in our group, but also because our technology missions are complementary. Today it was deploying infrared sensing in the Arctic; tomorrow it could be operating quadruped robot dogs on a fast-response cutter.”

    The CRISP system comprises a long-wave infrared camera, manufactured by Teledyne FLIR (for forward-looking infrared), that is designed for harsh maritime environments. The camera can stabilize itself during rough seas and image in complete darkness, fog, and glare. It is paired with a GPS-enabled time-synchronized clock and a network video recorder to record both video and still imagery along with GPS-positional data.  

    The camera is mounted at the front of the ship’s fly bridge, and the electronics are housed in a ruggedized rack on the bridge. The system can be operated manually from the bridge or be placed into an autonomous surveillance mode, in which it slowly pans back and forth, recording 15 minutes of video every three hours and a still image once every 15 seconds.
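
    As a rough check on what that schedule implies, the arithmetic below tallies one full day of capture. It assumes the system stays in autonomous surveillance mode for all 24 hours, which the article does not state; the function and key names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_capture(video_minutes=15, video_period_hours=3, still_period_seconds=15):
    """Tally one day of recording under the stated autonomous-mode schedule."""
    clips_per_day = 24 // video_period_hours
    return {
        "video_clips": clips_per_day,
        "video_minutes": clips_per_day * video_minutes,
        "still_images": SECONDS_PER_DAY // still_period_seconds,
    }


totals = daily_capture()
```

    Under that assumption, a day yields 8 video clips (120 minutes in total) and 5,760 still images.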

    “The installation of the equipment was a unique and fun experience. As with any good project, our expectations going into the install did not meet reality,” says Michael Emily, the project’s IT systems administrator who traveled to Seattle for the install. Working with the ship’s crew, the laboratory team had to quickly adjust their route for running cables from the camera to the observation station after they discovered that the expected access points weren’t in fact accessible. “We had 100-foot cables made for this project just in case of this type of scenario, which was a good thing because we only had a few inches to spare,” Emily says.

    The CRISP project team plans to publicly release the dataset, anticipated to be about 4 terabytes in size, once the USCG science mission concludes in the fall.

    The goal in releasing the dataset is to enable the wider research community to develop better tools for those operating in the Arctic, especially as this region becomes more navigable. “Collecting and publishing the data allows for faster and greater progress than what we could accomplish on our own,” Kurucar adds. “It also enables the laboratory to engage in more advanced AI applications while others make more incremental advances using the dataset.”

    On top of providing the dataset, the laboratory team plans to provide a baseline object-detection model, from which others can make progress on their own models. More advanced AI applications planned for development are classifiers for specific objects in the scene and the ability to identify and track objects across images.

    Beyond assisting with USCG missions, this project could create an influential dataset for researchers looking to apply AI to data from the Arctic to help combat climate change, says Paul Metzger, who leads the AI Software Architectures and Algorithms Group.

    Metzger adds that the group was honored to be a part of this project and is excited to see the advances that come from applying AI to novel challenges facing the United States: “I’m extremely proud of how our group applies AI to the highest-priority challenges in our nation, from predicting outbreaks of Covid-19 and assisting the U.S. European Command in their support of Ukraine to now employing AI in the Arctic for maritime awareness.”

    Once the dataset is available, it will be free to download on the Lincoln Laboratory dataset website.

    A faster way to teach a robot

    Imagine purchasing a robot to perform household tasks. This robot was built and trained in a factory on a certain set of tasks and has never seen the items in your home. When you ask it to pick up a mug from your kitchen table, it might not recognize your mug (perhaps because this mug is painted with an unusual image, say, of MIT’s mascot, Tim the Beaver). So, the robot fails.

    “Right now, the way we train these robots, when they fail, we don’t really know why. So you would just throw up your hands and say, ‘OK, I guess we have to start over.’ A critical component that is missing from this system is enabling the robot to demonstrate why it is failing so the user can give it feedback,” says Andi Peng, an electrical engineering and computer science (EECS) graduate student at MIT.

    Peng and her collaborators at MIT, New York University, and the University of California at Berkeley created a framework that enables humans to quickly teach a robot what they want it to do, with a minimal amount of effort.

    When a robot fails, the system uses an algorithm to generate counterfactual explanations that describe what needed to change for the robot to succeed. For instance, maybe the robot would have been able to pick up the mug if the mug were a certain color. It shows these counterfactuals to the human and asks for feedback on why the robot failed. Then the system utilizes this feedback and the counterfactual explanations to generate new data it uses to fine-tune the robot.

    Fine-tuning involves tweaking a machine-learning model that has already been trained to perform one task, so it can perform a second, similar task.

    The researchers tested this technique in simulations and found that it could teach a robot more efficiently than other methods. The robots trained with this framework performed better, while the training process consumed less of a human’s time.

    This framework could help robots learn faster in new environments without requiring a user to have technical knowledge. In the long run, this could be a step toward enabling general-purpose robots to efficiently perform daily tasks for the elderly or individuals with disabilities in a variety of settings.

    Peng, the lead author, is joined by co-authors Aviv Netanyahu, an EECS graduate student; Mark Ho, an assistant professor at the Stevens Institute of Technology; Tianmin Shu, an MIT postdoc; Andreea Bobu, a graduate student at UC Berkeley; and senior authors Julie Shah, an MIT professor of aeronautics and astronautics and the director of the Interactive Robotics Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Pulkit Agrawal, a professor in CSAIL. The research will be presented at the International Conference on Machine Learning.

    On-the-job training

    Robots often fail due to distribution shift — the robot is presented with objects and spaces it did not see during training, and it doesn’t understand what to do in this new environment.

    One way to retrain a robot for a specific task is imitation learning. The user could demonstrate the correct task to teach the robot what to do. If a user tries to teach a robot to pick up a mug, but demonstrates with a white mug, the robot could learn that all mugs are white. It may then fail to pick up a red, blue, or “Tim-the-Beaver-brown” mug.

    Training a robot to recognize that a mug is a mug, regardless of its color, could take thousands of demonstrations.

    “I don’t want to have to demonstrate with 30,000 mugs. I want to demonstrate with just one mug. But then I need to teach the robot so it recognizes that it can pick up a mug of any color,” Peng says.

    To accomplish this, the researchers’ system determines what specific object the user cares about (a mug) and what elements aren’t important for the task (perhaps the color of the mug doesn’t matter). It uses this information to generate new, synthetic data by changing these “unimportant” visual concepts. This process is known as data augmentation.

    The framework has three steps. First, it shows the task that caused the robot to fail. Then it collects a demonstration from the user of the desired actions and generates counterfactuals by searching over all features in the space that show what needed to change for the robot to succeed.

    The system shows these counterfactuals to the user and asks for feedback to determine which visual concepts do not impact the desired action. Then it uses this human feedback to generate many new augmented demonstrations.

    In this way, the user could demonstrate picking up one mug, but the system would produce demonstrations showing the desired action with thousands of different mugs by altering the color. It uses these data to fine-tune the robot.
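
    The augmentation step can be sketched as follows, assuming a demonstration is a set of named visual features plus a sequence of actions. The data layout, the function name, and the feature values are hypothetical; the actual system works on richer observations and is not shown here.

```python
from itertools import product


def augment(demo, irrelevant_values):
    """Copy one demonstration once per combination of task-irrelevant feature values.

    demo: {"features": {...}, "actions": [...]}
    irrelevant_values: maps each feature the user marked as unimportant
    to the values it may take.
    """
    names = list(irrelevant_values)
    augmented = []
    for combo in product(*(irrelevant_values[name] for name in names)):
        features = dict(demo["features"])
        features.update(zip(names, combo))  # vary only the irrelevant features
        augmented.append({"features": features, "actions": list(demo["actions"])})
    return augmented


# One user demonstration with a white mug...
demo = {"features": {"object": "mug", "color": "white"},
        "actions": ["reach", "grasp", "lift"]}
# ...expanded after the user indicates that color does not matter.
augmented = augment(demo, {"color": ["red", "blue", "brown"]})
```

    Each augmented demonstration keeps the same object and actions, so the fine-tuned policy sees that the desired behavior does not depend on the varied feature.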

    Creating counterfactual explanations and soliciting feedback from the user are critical for the technique to succeed, Peng says.

    From human reasoning to robot reasoning

    Because their work seeks to put the human in the training loop, the researchers tested their technique with human users. They first conducted a study in which they asked people if counterfactual explanations helped them identify elements that could be changed without affecting the task.

    “It was so clear right off the bat. Humans are so good at this type of counterfactual reasoning. And this counterfactual step is what allows human reasoning to be translated into robot reasoning in a way that makes sense,” she says.

    Then they applied their framework to three simulations in which robots were tasked with navigating to a goal object, picking up a key and unlocking a door, and picking up a desired object and then placing it on a tabletop. In each instance, their method enabled the robot to learn faster than with other techniques, while requiring fewer demonstrations from users.

    Moving forward, the researchers hope to test this framework on real robots. They also want to focus on reducing the time it takes the system to create new data using generative machine-learning models.

    “We want robots to do what humans do, and we want them to do it in a semantically meaningful way. Humans tend to operate in this abstract space, where they don’t think about every single property in an image. At the end of the day, this is really about enabling a robot to learn a good, human-like representation at an abstract level,” Peng says.

    This research is supported, in part, by a National Science Foundation Graduate Research Fellowship, Open Philanthropy, an Apple AI/ML Fellowship, Hyundai Motor Corporation, the MIT-IBM Watson AI Lab, and the National Science Foundation Institute for Artificial Intelligence and Fundamental Interactions.

    Understanding viral justice

    In the wake of the Covid-19 pandemic, the word “viral” has a new resonance, and it’s not necessarily positive. Ruha Benjamin, a scholar who investigates the social dimensions of science, medicine, and technology, advocates a shift in perspective. She thinks justice can also be contagious. That’s the premise of Benjamin’s award-winning book “Viral Justice: How We Grow the World We Want,” as she shared with MIT Libraries staff on a June 14 visit. 

    “If this pandemic has taught us anything, it’s that something almost undetectable can be deadly, and that we can transmit it without even knowing,” said Benjamin, professor of African American studies at Princeton University. “Doesn’t this imply that small things, seemingly minor actions, decisions, or habits, could have exponential effects in the other direction, tipping the scales towards justice?” 

    To seek a more just world, Benjamin exhorted library staff to notice the ways exclusion is built into our daily lives, showing examples of park benches with armrests at regular intervals. On the surface they appear welcoming, but they also make lying down — or sleeping — impossible. This idea is taken to the extreme with “Pay and Sit,” an art installation by Fabian Brunsing in the form of a bench that deploys sharp spikes on the seat if the user doesn’t pay a meter. It serves as a powerful metaphor for discriminatory design. 

    “Dr. Benjamin’s keynote was seriously mind-blowing,” said Cherry Ibrahim, human resources generalist in the MIT Libraries. “One part that really grabbed my attention was when she talked about benches purposely designed to prevent unhoused people from sleeping on them. There are these hidden spikes in our community that we might not even realize because they don’t directly impact us.” 

    Benjamin urged the audience to look for those “spikes,” which new technologies can make even more insidious — gender and racial bias in facial recognition, the use of racial data in software used to predict student success, algorithmic bias in health care — often in the guise of progress. She coined the term “the New Jim Code” to describe the combination of coded bias and the imagined objectivity we ascribe to technology. 

    “At the MIT Libraries, we’re deeply concerned with combating inequities through our work, whether it’s democratizing access to data or investigating ways disparate communities can participate in scholarship with minimal bias or barriers,” says Director of Libraries Chris Bourg. “It’s our mission to remove the ‘spikes’ in the systems through which we create, use, and share knowledge.”

    Calling out the harms encoded into our digital world is critical, argues Benjamin, but we must also create alternatives. This is where the collective power of individuals can be transformative. Benjamin shared examples of those who are “re-imagining the default settings of technology and society,” citing initiatives like the Data for Black Lives movement and the Detroit Community Technology Project. “I’m interested in the way that everyday people are changing the digital ecosystem and demanding different kinds of rights and responsibilities and protections,” she said.

    In 2020, Benjamin founded the Ida B. Wells Just Data Lab with a goal of bringing together students, educators, activists, and artists to develop a critical and creative approach to data conception, production, and circulation. Its projects have examined different aspects of data and racial inequality: assessing the impact of Covid-19 on student learning; providing resources that confront the experience of Black mourning, grief, and mental health; or developing a playbook for Black maternal mental health. Through the lab’s student-led projects Benjamin sees the next generation re-imagining technology in ways that respond to the needs of marginalized people.

    “If inequity is woven into the very fabric of our society (we see it from policing to education to health care to work), then each twist, coil, and code is a chance for us to weave new patterns, practices, and politics,” she said. “The vastness of the problems that we’re up against will be their undoing.”

    System tracks movement of food through global humanitarian supply chain

    Although more than enough food is produced to feed everyone in the world, as many as 828 million people face hunger today. Poverty, social inequity, climate change, natural disasters, and political conflicts all contribute to inhibiting access to food. For decades, the U.S. Agency for International Development (USAID) Bureau for Humanitarian Assistance (BHA) has been a leader in global food assistance, supplying millions of metric tons of food to recipients worldwide. Alleviating hunger — and the conflict and instability hunger causes — is critical to U.S. national security.

    But BHA is only one player within a large, complex supply chain in which food gets handed off between more than 100 partner organizations before reaching its final destination. Traditionally, the movement of food through the supply chain has been a black-box operation, with stakeholders largely out of the loop about what happens to the food once it leaves their custody. This lack of direct visibility into operations is due to siloed data repositories, insufficient data sharing among stakeholders, and different data formats that operators must manually sort through and standardize. As a result, accurate, real-time information — such as where food shipments are at any given time, which shipments are affected by delays or food recalls, and when shipments have arrived at their final destination — is lacking. A centralized system capable of tracing food along its entire journey, from manufacture through delivery, would enable a more effective humanitarian response to food-aid needs.

    In 2020, a team from MIT Lincoln Laboratory began engaging with BHA to create an intelligent dashboard for their supply-chain operations. This dashboard brings together the expansive food-aid datasets from BHA’s existing systems into a single platform, with tools for visualizing and analyzing the data. When the team started developing the dashboard, they quickly realized the need for considerably more data than BHA had access to.

    “That’s where traceability comes in, with each handoff partner contributing key pieces of information as food moves through the supply chain,” explains Megan Richardson, a researcher in the laboratory’s Humanitarian Assistance and Disaster Relief Systems Group.

    Richardson and the rest of the team have been working with BHA and their partners to scope, build, and implement such an end-to-end traceability system. This system consists of serialized, unique identifiers (IDs) — akin to fingerprints — that are assigned to individual food items at the time they are produced. These individual IDs remain linked to items as they are aggregated along the supply chain, first domestically and then internationally. For example, individually tagged cans of vegetable oil get packaged into cartons; cartons are placed onto pallets and transported via railway and truck to warehouses; pallets are loaded onto shipping containers at U.S. ports; and pallets are unloaded and cartons are unpackaged overseas.
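
    One way to picture how item-level IDs stay linked through aggregation is a parent map from each unit to the unit that currently contains it. This is a minimal sketch under that assumption; the ID formats and function names are invented for illustration and are not the laboratory's design.

```python
# Maps each unit's ID to the ID of the unit that currently contains it.
parent = {}


def pack(child_ids, container_id):
    """Record that the given units were aggregated into a larger unit."""
    for child_id in child_ids:
        parent[child_id] = container_id


def containment_chain(unit_id):
    """Return the unit followed by every enclosing unit, innermost first."""
    chain = [unit_id]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def items_inside(unit_id):
    """Return the IDs of all individual items nested inside a unit."""
    children = [child for child, holder in parent.items() if holder == unit_id]
    if not children:
        return [unit_id]  # nothing packed inside: treat the unit as an item
    items = []
    for child in children:
        items.extend(items_inside(child))
    return items


# Hypothetical IDs: cans into cartons, cartons onto a pallet, pallet into a container.
pack(["can-0001", "can-0002"], "carton-01")
pack(["can-0003"], "carton-02")
pack(["carton-01", "carton-02"], "pallet-A")
pack(["pallet-A"], "container-X")
```

    With this structure, a recall on one shipping container can be resolved down to the individual cans it holds, and a single can can be traced back up to its carton, pallet, and container.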

    With a trace

    Today, visibility at the single-item level doesn’t exist. Most suppliers mark pallets with a lot number (a lot is a batch of items produced in the same run), but this is for internal purposes (i.e., to track issues stemming back to their production supply, like over-enriched ingredients or machinery malfunction), not data sharing. So, organizations know which supplier lot a pallet and carton are associated with, but they can’t track the unique history of an individual carton or item within that pallet. As the lots move further downstream toward their final destination, they are often mixed with lots from other productions, and possibly other commodity types altogether, because of space constraints. On the international side, such mixing and the lack of granularity make it difficult to quickly pull commodities out of the supply chain if food safety concerns arise. Current response times can span several months.

    “Commodities are grouped differently at different stages of the supply chain, so it is logical to track them in those groupings where needed,” Richardson says. “Our item-level granularity serves as a form of Rosetta Stone to enable stakeholders to efficiently communicate throughout these stages. We’re trying to enable a way to track not only the movement of commodities, including through their lot information, but also any problems arising independent of lot, like exposure to high humidity levels in a warehouse. Right now, we have no way to associate commodities with histories that may have resulted in an issue.”

    “You can now track your checked luggage across the world and the fish on your dinner plate,” adds Brice MacLaren, also a researcher in the laboratory’s Humanitarian Assistance and Disaster Relief Systems Group. “So, this technology isn’t new, but it’s new to BHA as they evolve their methodology for commodity tracing. The traceability system needs to be versatile, working across a wide variety of operators who take custody of the commodity along the supply chain and fitting into their existing best practices.”

    As food products make their way through the supply chain, operators at each receiving point would be able to scan these IDs via a Lincoln Laboratory-developed mobile application (app) to indicate a product’s current location and transaction status — for example, that it is en route on a particular shipping container or stored in a certain warehouse. This information would get uploaded to a secure traceability server. By scanning a product, operators would also see its history up until that point.   
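
    The scan-and-upload flow can be sketched as an append-only event log keyed by item ID. This in-memory class merely stands in for the secure traceability server; the class, field names, and example values are hypothetical, not the app's real interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanEvent:
    item_id: str
    location: str
    status: str      # for example "en route" or "stored"
    timestamp: str   # ISO 8601 in UTC, so string order matches time order


class TraceLog:
    """In-memory stand-in for a traceability server (illustrative only)."""

    def __init__(self):
        self._events = {}

    def record_scan(self, event):
        """Store a scan and return the item's history up to this point."""
        self._events.setdefault(event.item_id, []).append(event)
        return self.history(event.item_id)

    def history(self, item_id):
        """Return all recorded scans for an item, oldest first."""
        return sorted(self._events.get(item_id, []), key=lambda e: e.timestamp)


log = TraceLog()
log.record_scan(ScanEvent("carton-01", "Houston port", "en route", "2023-07-01T09:00:00Z"))
history = log.record_scan(
    ScanEvent("carton-01", "warehouse 7", "stored", "2023-08-15T14:30:00Z"))
```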

    Hitting the mark

    At the laboratory, the team tested the feasibility of their traceability technology, exploring different ways to mark and scan items. For marking, they considered barcodes and radio-frequency identification (RFID) tags; for scanning, they considered handheld and fixed scanners. Their analysis revealed that 2D barcodes (specifically data matrices) and smartphone-based scanners were the most feasible options in terms of how the technology works and how it fits into existing operations and infrastructure.

    “We needed to come up with a solution that would be practical and sustainable in the field,” MacLaren says. “While scanners can automatically read any RFID tags in close proximity as someone is walking by, they can’t discriminate exactly where the tags are coming from. RFID is expensive, and it’s hard to read commodities in bulk. On the other hand, a phone can scan a barcode on a particular box and tell you that code goes with that box. The challenge then becomes figuring out how to present the codes for people to easily scan without significantly interrupting their usual processes for handling and moving commodities.” 

    As the team learned from partner representatives in Kenya and Djibouti, offloading at the ports is a chaotic, fast operation. At manual warehouses, porters fling bags over their shoulders or stack cartons atop their heads any which way they can and run them to a drop point; at bagging terminals, commodities come down a conveyor belt and land this way or that way. With this variability come several questions: How many barcodes do you need on an item? Where should they be placed? What size should they be? What will they cost? The laboratory team is considering these questions, keeping in mind that the answers will vary depending on the type of commodity; vegetable oil cartons will have different specifications than, say, 50-kilogram bags of wheat or peas.

    Leaving a mark

    Leveraging results from their testing and insights from international partners, the team has been running a traceability pilot evaluating how their proposed system meshes with real-world domestic and international operations. The current pilot features a domestic component in Houston, Texas, and an international component in Ethiopia, and focuses on tracking individual cartons of vegetable oil and identifying damaged cans. The Ethiopian team with Catholic Relief Services recently received a container filled with pallets of uniquely barcoded cartons of vegetable oil cans (in the next pilot, the cans will be barcoded, too). They are now scanning items and collecting data on product damage by using smartphones with the laboratory-developed mobile traceability app on which they were trained. 

    “The partners in Ethiopia are comparing a couple lid types to determine whether some are more resilient than others,” Richardson says. “With the app — which is designed to scan commodities, collect transaction data, and keep history — the partners can take pictures of damaged cans and see if a trend with the lid type emerges.”

    Next, the team will run a series of pilots with the World Food Programme (WFP), the world’s largest humanitarian organization. The first pilot will focus on data connectivity and interoperability, and the team will engage with suppliers to directly print barcodes on individual commodities instead of applying barcode labels to packaging, as they did in the initial feasibility testing. The WFP will provide input on which of their operations are best suited for testing the traceability system, considering factors like the network bandwidth of WFP staff and local partners, the commodity types being distributed, and the country context for scanning. The BHA will likely also prioritize locations for system testing.

    “Our goal is to provide an infrastructure to enable as close to real-time data exchange as possible between all parties, given intermittent power and connectivity in these environments,” MacLaren says.

    In subsequent pilots, the team will try to integrate their approach with existing systems that partners rely on for tracking procurements, inventory, and movement of commodities under their custody so that this information is automatically pushed to the traceability server. The team also hopes to add a capability for real-time alerting of statuses, like the departure and arrival of commodities at a port or the exposure of unclaimed commodities to the elements. Real-time alerts would enable stakeholders to more efficiently respond to food-safety events. Currently, partners are forced to take a conservative approach, pulling out more commodities from the supply chain than are actually suspect, to reduce risk of harm. Both BHA and WFP are interested in testing out a food-safety event during one of the pilots to see how the traceability system works in enabling rapid communication response.

    To implement this technology at scale will require some standardization for marking different commodity types as well as give and take among the partners on best practices for handling commodities. It will also require an understanding of country regulations and partner interactions with subcontractors, government entities, and other stakeholders.

    “Within several years, I think it’s possible for BHA to use our system to mark and trace all their food procured in the United States and sent internationally,” MacLaren says.

    Once collected, the trove of traceability data could be harnessed for other purposes, among them analyzing historical trends, predicting future demand, and assessing the carbon footprint of commodity transport. In the future, a similar traceability system could scale for nonfood items, including medical supplies distributed to disaster victims, resources like generators and water trucks localized in emergency-response scenarios, and vaccines administered during pandemics. Several groups at the laboratory are also interested in such a system to track items such as tools deployed in space or equipment people carry through different operational environments.

    “When we first started this program, colleagues were asking why the laboratory was involved in simple tasks like making a dashboard, marking items with barcodes, and using hand scanners,” MacLaren says. “Our impact here isn’t about the technology; it’s about providing a strategy for coordinated food-aid response and successfully implementing that strategy. Most importantly, it’s about people getting fed.”

  • in

    A new way to look at data privacy

    Imagine that a team of scientists has developed a machine-learning model that can predict whether a patient has cancer from lung scan images. They want to share this model with hospitals around the world so clinicians can start using it in diagnosis.

    But there’s a problem. To teach their model how to predict cancer, they showed it millions of real lung scan images, a process called training. Those sensitive data, which are now encoded into the inner workings of the model, could potentially be extracted by a malicious agent. The scientists can prevent this by adding noise, or more generic randomness, to the model that makes it harder for an adversary to guess the original data. However, perturbation reduces a model’s accuracy, so the less noise one can add, the better.

    MIT researchers have developed a technique that enables the user to potentially add the smallest amount of noise possible, while still ensuring the sensitive data are protected.

    The researchers created a new privacy metric, which they call Probably Approximately Correct (PAC) Privacy, and built a framework based on this metric that can automatically determine the minimal amount of noise that needs to be added. Moreover, this framework does not need knowledge of the inner workings of a model or its training process, which makes it easier to use for different types of models and applications.

    In several cases, the researchers show that the amount of noise required to protect sensitive data from adversaries is far less with PAC Privacy than with other approaches. This could help engineers create machine-learning models that provably hide training data, while maintaining accuracy in real-world settings.

    “PAC Privacy exploits the uncertainty or entropy of the sensitive data in a meaningful way, and this allows us to add, in many cases, an order of magnitude less noise. This framework allows us to understand the characteristics of arbitrary data processing and privatize it automatically without artificial modifications. While we are in the early days and we are doing simple examples, we are excited about the promise of this technique,” says Srini Devadas, the Edwin Sibley Webster Professor of Electrical Engineering and co-author of a new paper on PAC Privacy.

    Devadas wrote the paper with lead author Hanshen Xiao, an electrical engineering and computer science graduate student. The research will be presented at the International Cryptology Conference (Crypto 2023).

    Defining privacy

    A fundamental question in data privacy is: How much sensitive data could an adversary recover from a machine-learning model with noise added to it?

    Differential Privacy, one popular privacy definition, says privacy is achieved if an adversary who observes the released model cannot infer whether an arbitrary individual’s data was used in the training process. But provably preventing an adversary from distinguishing data usage often requires large amounts of noise to obscure it. This noise reduces the model’s accuracy.
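
    For reference, the standard formal statement of differential privacy, which the article does not spell out, can be written as follows. Here the symbols are the conventional ones rather than notation from the researchers: a randomized training procedure M is (ε, δ)-differentially private if, for every pair of datasets D and D' that differ in one individual’s record and every set S of possible output models, the bound below holds.

```latex
\Pr\left[\mathcal{M}(D) \in S\right] \;\le\; e^{\varepsilon}\,\Pr\left[\mathcal{M}(D') \in S\right] + \delta
```

    Smaller values of ε and δ make the two output distributions harder to tell apart, which is why meeting the bound typically calls for more noise.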

    PAC Privacy looks at the problem a bit differently. It characterizes how hard it would be for an adversary to reconstruct any part of randomly sampled or generated sensitive data after noise has been added, rather than only focusing on the distinguishability problem.

    For instance, if the sensitive data are images of human faces, differential privacy would focus on whether the adversary can tell if someone’s face was in the dataset. PAC Privacy, on the other hand, could look at whether an adversary could extract a silhouette — an approximation — that someone could recognize as a particular individual’s face.

    Once they established the definition of PAC Privacy, the researchers created an algorithm that automatically tells the user how much noise to add to a model to prevent an adversary from confidently reconstructing a close approximation of the sensitive data. This algorithm guarantees privacy even if the adversary has infinite computing power, Xiao says.

    To find the optimal amount of noise, the PAC Privacy algorithm relies on the uncertainty, or entropy, in the original data from the viewpoint of the adversary.

    This automatic technique takes samples randomly from a data distribution or a large data pool and runs the user’s machine-learning training algorithm on that subsampled data to produce an output learned model. It does this many times on different subsamplings and compares the variance across all outputs. This variance determines how much noise one must add — a smaller variance means less noise is needed.
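
    The subsample-and-compare loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the researchers’ algorithm: the “training algorithm” is just a sample mean, the function names are hypothetical, and `noise_scale` is a made-up knob standing in for the calibration that the PAC Privacy framework derives from the user’s privacy target.

```python
import random
import statistics


def train(sample):
    """Stand-in training algorithm: the 'learned model' is the sample mean."""
    return statistics.fmean(sample)


def output_variance(data_pool, train_fn, subsample_size, trials, rng):
    """Run train_fn on many random subsamples and measure how much its output varies."""
    outputs = [train_fn(rng.sample(data_pool, subsample_size)) for _ in range(trials)]
    return statistics.pvariance(outputs)


def release_with_noise(model, variance, noise_scale, rng):
    """Add zero-mean Gaussian noise whose standard deviation grows with the variance.

    noise_scale is hypothetical; it is not the bound used by PAC Privacy.
    """
    return model + rng.gauss(0.0, noise_scale * variance ** 0.5)


rng = random.Random(0)
pool = [rng.gauss(50.0, 10.0) for _ in range(2000)]

# Outputs from small subsamples fluctuate more than outputs from large ones.
var_small = output_variance(pool, train, subsample_size=20, trials=300, rng=rng)
var_large = output_variance(pool, train, subsample_size=500, trials=300, rng=rng)
noisy_model = release_with_noise(train(pool), var_large, noise_scale=1.0, rng=rng)

print("variance with 20-point subsamples:", var_small)
print("variance with 500-point subsamples:", var_large)
print("noisy released model:", noisy_model)
```

    Because the procedure only calls `train_fn` and looks at its outputs, it treats the training algorithm as a black box, which mirrors the article’s point that no knowledge of the model’s inner workings is needed.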

    Algorithm advantages

    Unlike other privacy approaches, the PAC Privacy algorithm does not need knowledge of the inner workings of a model or of the training process.

    When implementing PAC Privacy, a user can specify their desired level of confidence at the outset. For instance, perhaps the user wants a guarantee that an adversary will not be more than 1 percent confident that they have successfully reconstructed the sensitive data to within 5 percent of its actual value. The PAC Privacy algorithm automatically tells the user the optimal amount of noise that needs to be added to the output model before it is shared publicly, in order to achieve those goals.

    “The noise is optimal, in the sense that if you add less than we tell you, all bets could be off. But the effect of adding noise to neural network parameters is complicated, and we are making no promises on the utility drop the model may experience with the added noise,” Xiao says.

    This points to one limitation of PAC Privacy — the technique does not tell the user how much accuracy the model will lose once the noise is added. PAC Privacy also involves repeatedly training a machine-learning model on many subsamplings of data, so it can be computationally expensive.  

    To improve PAC Privacy, one approach is to modify a user’s machine-learning training process so it is more stable, meaning that the output model it produces does not change very much when the input data is subsampled from a data pool. This stability would create smaller variances between subsample outputs, so not only would the PAC Privacy algorithm need to be run fewer times to identify the optimal amount of noise, but it would also need to add less noise.
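
    The link between stability and variance can be seen in a toy comparison. The sketch below is an assumption-laden illustration, not a method from the paper: it contrasts a plain mean with a clipped mean (a hypothetical, more stable “training process”) on a data pool that contains a few extreme values, and measures how much each output varies across the same random subsamples.

```python
import random
import statistics


def mean_model(sample):
    """Unstable 'training': a plain mean, easily swayed by extreme values."""
    return statistics.fmean(sample)


def clipped_mean_model(sample, low=0.0, high=100.0):
    """More stable 'training': clip each value to [low, high] before averaging."""
    return statistics.fmean([min(max(x, low), high) for x in sample])


def variance_across_subsamples(pool, train_fn, size, trials, seed):
    """Variance of train_fn's output over random subsamples of the pool."""
    rng = random.Random(seed)
    outputs = [train_fn(rng.sample(pool, size)) for _ in range(trials)]
    return statistics.pvariance(outputs)


rng = random.Random(1)
# Mostly well-behaved data plus a handful of extreme outliers.
pool = [rng.gauss(50.0, 10.0) for _ in range(990)] + [10_000.0] * 10

# The same seed gives both models identical subsamples, so the comparison is fair.
raw_var = variance_across_subsamples(pool, mean_model, size=100, trials=200, seed=2)
stable_var = variance_across_subsamples(pool, clipped_mean_model, size=100, trials=200, seed=2)

print("plain mean variance:", raw_var)
print("clipped mean variance:", stable_var)
```

    The clipped version changes far less from one subsample to the next, which is the property that would let a variance-driven procedure get away with less added noise.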

    An added benefit of stabler models is that they often have less generalization error, which means they can make more accurate predictions on previously unseen data, a win-win situation between machine learning and privacy, Devadas adds.

    “In the next few years, we would love to look a little deeper into this relationship between stability and privacy, and the relationship between privacy and generalization error. We are knocking on a door here, but it is not clear yet where the door leads,” he says.

    “Obfuscating the usage of an individual’s data in a model is paramount to protecting their privacy. However, to do so can come at the cost of the data’s and therefore model’s utility,” says Jeremy Goodsitt, senior machine learning engineer at Capital One, who was not involved with this research. “PAC provides an empirical, black-box solution, which can reduce the added noise compared to current practices while maintaining equivalent privacy guarantees. In addition, its empirical approach broadens its reach to more data consuming applications.”

    This research is funded, in part, by DSTA Singapore, Cisco Systems, Capital One, and a MathWorks Fellowship.

  • in

    Making sense of all things data

    Data, and more specifically using data, is not a new concept, but it remains an elusive one. It comes with terms like “the internet of things” (IoT) and “the cloud,” and no matter how often those are explained, smart people can still be confused. And then there’s the amount of information available and the speed with which it comes in. Software is omnipresent. It’s in coffeemakers and watches, gathering data every second. The question becomes how to take all the new technology and take advantage of the potential insights and analytics. It’s not a small ask.

    “Putting our arms around what digital transformation is can be difficult to do,” says Abel Sanchez. But as the executive director and research director of MIT’s Geospatial Data Center, that’s exactly what he does with his work in helping industries and executives shift their operations in order to make sense of their data and be able to use it to help their bottom lines.

    Handling the pace

    Data can lead to making better business decisions. That’s not a new or surprising insight, but as Sanchez says, people still tend to work off of intuition. Part of the problem is that they don’t know what to do with their available data, and there’s usually plenty of it. Another part is that so much information is being produced from so many sources. As soon as a person wakes up and turns on their phone or starts their car, software is running. The data comes in fast, but because it’s also complex, “it outperforms people,” he says.

    Take Uber as an example: once a person clicks on the app for a ride, predictive models start firing at the rate of 1 million per second. It’s all in order to optimize the trip, taking into account factors such as school schedules, roadway conditions, traffic, and a driver’s availability. It’s helpful for the task, but it’s something that “no human would be able to do,” he says.

    The solution requires a few components. One is a new way to store data. In the past, the classic was creating the “perfect library,” which was too structured. The response to that was to create a “data lake,” where all the information would go in and somehow people would make sense of it. “This also failed,” Sanchez says.

    Data storage needs to be reimagined, and a key element of that is greater accessibility. In most corporations, only 10-20 percent of employees have the access and technical skill to work with the data. The rest have to go through a centralized resource and get into a queue, an inefficient system. The goal, Sanchez says, is to democratize the information by going to a modern stack, which would convert what he calls “dormant data” into “active data.” The result? Better decisions could be made.

    The first big step companies need to take is finding the will to make the change. Part of it is an investment of money, but it’s also an attitude shift. Corporations can have an embedded culture where things have always been done a certain way and deviating from that is resisted because it’s different. But when it comes to data, a new approach is needed. Managing and curating the information can no longer rest in the hands of one person with the institutional memory. It’s not possible. It’s also not practical because companies are losing out on efficiency and productivity, because with technology, “What used to take years to do, now you can do in days,” Sanchez says.

    The new player

    The above exemplifies what’s been involved with coordinating data along four intertwined components: IoT, AI, the cloud, and security. The first two create the information, which then gets stored in the cloud, but it’s all for naught without robust security. But one relative newcomer has come into the picture. It’s blockchain technology, a term that is often said but still not fully understood, adding further to the confusion.

    Sanchez says that information has been handled and organized a certain way with the World Wide Web. Blockchain is an opportunity to be more nimble and productive by offering the chance to have an accepted identity, currency, and logic that works on a global scale. The holdup has always been that there’s never been any agreement on those three components on a global scale. It leads to people being shut out, inefficiency, and lost business.

    One example, Sanchez says, of blockchain’s potential is with hospitals. In the United States, they’re private and information has to be constantly integrated from doctors, insurance companies, labs, government regulators, and pharmaceutical companies. It leads to repeated steps to do something as simple as recognizing a patient’s identity, which often can’t be agreed upon. With blockchain, these various entities can create a consortium using open source code with no barriers to access, and it could quickly and easily identify a patient because its members have set up an agreement, and with it “remove that level of effort.” It’s an incremental step, but one that can be built upon and that reduces cost and risk.

    Another example — “one of the best examples,” Sanchez says — is what was done in Indonesia. Most of the rice, corn, and wheat that comes from this area is produced from smallholder farms. For the people making loans, it’s expensive to understand the risk of cultivating these plots of land. Compounding that is that these farmers don’t have state-issued identities or credit records, so, “They don’t exist in the modern economic sense,” he says. They don’t have access to loans, and banks are losing out on potential good customers.

    With this project, blockchain allowed local people to gather information about the farms on their smartphones. Banks could acquire the information and compensate the people with tokens, thereby incentivizing the work. The bank would see the creditworthiness of the farms, and farmers could end up getting fair loans.

    In the end, it creates a beneficial circle for the banks, farmers, and community, but it also represents what can be done with digital transformation by allowing businesses to optimize their processes, make better decisions, and ultimately profit.

    “It’s a tremendous new platform,” Sanchez says. “This is the promise.”

  • in

    Statistics, operations research, and better algorithms

    In this day and age, many companies and institutions are not just data-driven, but data-intensive. Insurers, health providers, government agencies, and social media platforms are all heavily dependent on data-rich models and algorithms to identify the characteristics of the people who use them, and to nudge their behavior in various ways.

    That doesn’t mean organizations are always using optimal models, however. Determining efficient algorithms is a research area of its own — and one where Rahul Mazumder happens to be a leading expert.

    Mazumder, an associate professor in the MIT Sloan School of Management and an affiliate of the Operations Research Center, works both to expand the techniques of model-building and to refine models that apply to particular problems. His work pertains to a wealth of areas, including statistics and operations research, with applications in finance, health care, advertising, online recommendations, and more.

    “There is engineering involved, there is science involved, there is implementation involved, there is theory involved, it’s at the junction of various disciplines,” says Mazumder, who is also affiliated with the Center for Statistics and Data Science and the MIT-IBM Watson AI Lab.

    There is also a considerable amount of practical-minded judgment, logic, and common-sense decision-making at play, in order to bring the right techniques to bear on any individual task.

    “Statistics is about having data coming from a physical system, or computers, or humans, and you want to make sense of the data,” Mazumder says. “And you make sense of it by building models because that gives some pattern to a dataset. But of course, there is a lot of subjectivity in that. So, there is subjectivity in statistics, but also mathematical rigor.”

    Over roughly the last decade, Mazumder, often working with co-authors, has published about 40 peer-reviewed papers, won multiple academic awards, collaborated with major companies on their work, and helped advise graduate students. For his research and teaching, Mazumder was granted tenure by MIT last year.

    From deep roots to new tools

    Mazumder grew up in Kolkata, India, where his father was a professor at the Indian Statistical Institute and his mother was a schoolteacher. Mazumder received his undergraduate and master’s degrees from the Indian Statistical Institute as well, although without really focusing on the same areas as his father, whose work was in fluid mechanics.

    For his doctoral work, Mazumder attended Stanford University, where he earned his PhD in 2012. After a year as a postdoc at MIT’s Operations Research Center, he joined the faculty at Columbia University, then moved to MIT in 2015.

    While Mazumder’s work has many facets, his research portfolio does have notable central achievements. Mazumder has helped combine ideas from two branches of optimization to facilitate addressing computational problems in statistics. One of these branches, discrete optimization, uses discrete variables — integers — to find the best candidate among a finite set of options. This can relate to operational efficiency: What is the shortest route someone might take while making a designated set of stops? Convex optimization, on the other hand, encompasses an array of algorithms that can obtain the best solution for what Mazumder calls “nicely behaved” mathematical functions. They are typically applied to optimize continuous decisions in financial portfolio allocation and health care outcomes, among other things.
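
    The contrast between the two branches can be made concrete with two toy problems. The sketch below is illustrative only and does not reproduce Mazumder’s methods: the stops and their coordinates are made up, the discrete problem is solved by brute-force enumeration of a finite set of routes, and the convex problem is a one-variable quadratic minimized with plain gradient descent.

```python
from itertools import permutations
from math import dist

# Discrete optimization: choose the best ordering from a finite set of candidates.
stops = {"A": (0, 0), "B": (0, 3), "C": (4, 3), "D": (4, 0)}  # hypothetical stops
start = "A"


def route_length(order):
    """Length of the round trip start -> order... -> start."""
    path = (start, *order, start)
    return sum(dist(stops[a], stops[b]) for a, b in zip(path, path[1:]))


others = [name for name in stops if name != start]
best_order = min(permutations(others), key=route_length)
best_length = route_length(best_order)


# Convex optimization: a "nicely behaved" function has a single minimum
# that simple gradient steps reach from any starting point.
def minimize_convex(grad, x0, step=0.1, iters=200):
    """Plain gradient descent on a one-dimensional convex function."""
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x


# f(x) = (x - 3)^2 + 1 has gradient 2 * (x - 3) and its minimum at x = 3.
x_star = minimize_convex(lambda x: 2 * (x - 3), x0=0.0)

print("best route:", (start, *best_order, start), "length:", best_length)
print("convex minimizer:", x_star)
```

    Enumeration works here only because four stops yield six candidate routes; the number of orderings grows factorially with the number of stops, which is one reason discrete problems call for more sophisticated algorithms.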

    In some recent papers, such as “Fast best subset selection: Coordinate descent and local combinatorial optimization algorithms,” co-authored with Hussein Hazimeh and published in Operations Research in 2020, and in “Sparse regression at scale: branch-and-bound rooted in first-order optimization,” co-authored with Hazimeh and A. Saab and published in Mathematical Programming in 2022, Mazumder has found ways to combine ideas from the two branches.

    “The tools and techniques we are using are new for the class of statistical problems because we are combining different developments in convex optimization and exploring that within discrete optimization,” Mazumder says.

    As new as these tools are, however, Mazumder likes working on techniques that “have old roots,” as he puts it. The two types of optimization methods were considered less separate in the 1950s or 1960s, he says, then grew apart.

    “I like to go back and see how things developed,” Mazumder says. “If I look back in history at [older] papers, it’s actually very fascinating. One thing was developed, another was developed, another was developed kind of independently, and after a while you see connections across them. If I go back, I see some parallels. And that actually helps in my thought process.”

    Predictions and parsimony

    Mazumder’s work is often aimed at simplifying the model or algorithm being applied to a problem. In some instances, bigger models would require enormous amounts of processing power, so simpler methods can provide equally good results while using fewer resources. In other cases, such as at the finance and tech firms Mazumder has sometimes collaborated with, simpler models may work better by having fewer moving parts.

    “There is a notion of parsimony involved,” Mazumder says. Genomic studies aim to find particularly influential genes; similarly, tech giants may benefit from simpler models of consumer behavior, not more complex ones, when they are recommending a movie to you.

    Very often, Mazumder says, modeling “is a very large-scale prediction problem. But we don’t think all the features or attributes are going to be important. A small collection is going to be important. Why? Because if you think about movies, there are not really 20,000 different movies; there are genres of movies. If you look at individual users, there are hundreds of millions of users, but really they are grouped together into cliques. Can you capture the parsimony in a model?”

    One part of his career that does not lend itself to parsimony, Mazumder feels, is crediting others. In conversation he emphasizes how grateful he is to his mentors in academia, and how much of his work is developed in concert with collaborators and, in particular, his students at MIT. 

    “I really, really like working with my students,” Mazumder says. “I perceive my students as my colleagues. Some of these problems, I thought they could not be solved, but then we just made it work. Of course, no method is perfect. But the fact we can use ideas from different areas in optimization with very deep roots, to address problems of core statistics and machine learning interest, is very exciting.”

    Teaching and doing research at MIT, Mazumder says, allows him to push forward on difficult problems — while also being pushed along by the interest and work of others around him.

    “MIT is a very vibrant community,” Mazumder says. “The thing I find really fascinating is, people here are very driven. They want to make a change in whatever area they are working in. And I also feel motivated to do this.”