More stories

  • in

    Generating opportunities with generative AI

    Talking with retail executives back in 2010, Rama Ramakrishnan came to two realizations. First, although retail systems that offered customers personalized recommendations were getting a great deal of attention, these systems often provided little payoff for retailers. Second, for many of the firms, most customers shopped only once or twice a year, so companies didn’t really know much about them.

    “But by being very diligent about noting down the interactions a customer has with a retailer or an e-commerce site, we can create a very nice and detailed composite picture of what that person does and what they care about,” says Ramakrishnan, professor of the practice at the MIT Sloan School of Management. “Once you have that, then you can apply proven algorithms from machine learning.”

    These realizations led Ramakrishnan to found CQuotient, a startup whose software has now become the foundation for Salesforce’s widely adopted AI e-commerce platform. “On Black Friday alone, CQuotient technology probably sees and interacts with over a billion shoppers on a single day,” he says.

    After a highly successful entrepreneurial career, in 2019 Ramakrishnan returned to MIT Sloan, where he had earned master’s and PhD degrees in operations research in the 1990s. He teaches students “not just how these amazing technologies work, but also how do you take these technologies and actually put them to use pragmatically in the real world,” he says.

    Additionally, Ramakrishnan enjoys participating in MIT executive education. “This is a great opportunity for me to convey the things that I have learned, but also as importantly, to learn what’s on the minds of these senior executives, and to guide them and nudge them in the right direction,” he says.

    For example, executives are understandably concerned about the need for massive amounts of data to train machine learning systems. He can now guide them to a wealth of models that are pre-trained for specific tasks. “The ability to use these pre-trained AI models, and very quickly adapt them to your particular business problem, is an incredible advance,” says Ramakrishnan.

    Rama Ramakrishnan – Utilizing AI in Real World Applications for Intelligent WorkVideo: MIT Industrial Liaison Program

    Understanding AI categories

    “AI is the quest to imbue computers with the ability to do cognitive tasks that typically only humans can do,” he says. Understanding the history of this complex, supercharged landscape aids in exploiting the technologies.

    The traditional approach to AI, which basically solved problems by applying if/then rules learned from humans, proved useful for relatively few tasks. “One reason is that we can do lots of things effortlessly, but if asked to explain how we do them, we can’t actually articulate how we do them,” Ramakrishnan comments. Also, those systems may be baffled by new situations that don’t match up to the rules enshrined in the software.

    Machine learning takes a dramatically different approach, with the software fundamentally learning by example. “You give it lots of examples of inputs and outputs, questions and answers, tasks and responses, and get the computer to automatically learn how to go from the input to the output,” he says. Credit scoring, loan decision-making, disease prediction, and demand forecasting are among the many tasks conquered by machine learning.

    But machine learning only worked well when the input data was structured, for instance in a spreadsheet. “If the input data was unstructured, such as images, video, audio, ECGs, or X-rays, it wasn’t very good at going from that to a predicted output,” Ramakrishnan says. That means humans had to manually structure the unstructured data to train the system.

    Around 2010 deep learning began to overcome that limitation, delivering the ability to directly work with unstructured input data, he says. Based on a longstanding AI strategy known as neural networks, deep learning became practical due to the global flood tide of data, the availability of extraordinarily powerful parallel processing hardware called graphics processing units (originally invented for video games) and advances in algorithms and math.

    Finally, within deep learning, the generative AI software packages appearing last year can create unstructured outputs, such as human-sounding text, images of dogs, and three-dimensional models. Large language models (LLMs) such as OpenAI’s ChatGPT go from text inputs to text outputs, while text-to-image models such as OpenAI’s DALL-E can churn out realistic-appearing images.

    Rama Ramakrishnan – Making Note of Little Data to Improve Customer ServiceVideo: MIT Industrial Liaison Program

    What generative AI can (and can’t) do

    Trained on the unimaginably vast text resources of the internet, a LLM’s “fundamental capability is to predict the next most likely, most plausible word,” Ramakrishnan says. “Then it attaches the word to the original sentence, predicts the next word again, and keeps on doing it.”

    “To the surprise of many, including a lot of researchers, an LLM can do some very complicated things,” he says. “It can compose beautifully coherent poetry, write Seinfeld episodes, and solve some kinds of reasoning problems. It’s really quite remarkable how next-word prediction can lead to these amazing capabilities.”

    “But you have to always keep in mind that what it is doing is not so much finding the correct answer to your question as finding a plausible answer your question,” Ramakrishnan emphasizes. Its content may be factually inaccurate, irrelevant, toxic, biased, or offensive.

    That puts the burden on users to make sure that the output is correct, relevant, and useful for the task at hand. “You have to make sure there is some way for you to check its output for errors and fix them before it goes out,” he says.

    Intense research is underway to find techniques to address these shortcomings, adds Ramakrishnan, who expects many innovative tools to do so.

    Finding the right corporate roles for LLMs

    Given the astonishing progress in LLMs, how should industry think about applying the software to tasks such as generating content?

    First, Ramakrishnan advises, consider costs: “Is it a much less expensive effort to have a draft that you correct, versus you creating the whole thing?” Second, if the LLM makes a mistake that slips by, and the mistaken content is released to the outside world, can you live with the consequences?

    “If you have an application which satisfies both considerations, then it’s good to do a pilot project to see whether these technologies can actually help you with that particular task,” says Ramakrishnan. He stresses the need to treat the pilot as an experiment rather than as a normal IT project.

    Right now, software development is the most mature corporate LLM application. “ChatGPT and other LLMs are text-in, text-out, and a software program is just text-out,” he says. “Programmers can go from English text-in to Python text-out, as well as you can go from English-to-English or English-to-German. There are lots of tools which help you write code using these technologies.”

    Of course, programmers must make sure the result does the job properly. Fortunately, software development already offers infrastructure for testing and verifying code. “This is a beautiful sweet spot,” he says, “where it’s much cheaper to have the technology write code for you, because you can very quickly check and verify it.”

    Another major LLM use is content generation, such as writing marketing copy or e-commerce product descriptions. “Again, it may be much cheaper to fix ChatGPT’s draft than for you to write the whole thing,” Ramakrishnan says. “However, companies must be very careful to make sure there is a human in the loop.”

    LLMs also are spreading quickly as in-house tools to search enterprise documents. Unlike conventional search algorithms, an LLM chatbot can offer a conversational search experience, because it remembers each question you ask. “But again, it will occasionally make things up,” he says. “In terms of chatbots for external customers, these are very early days, because of the risk of saying something wrong to the customer.”

    Overall, Ramakrishnan notes, we’re living in a remarkable time to grapple with AI’s rapidly evolving potentials and pitfalls. “I help companies figure out how to take these very transformative technologies and put them to work, to make products and services much more intelligent, employees much more productive, and processes much more efficient,” he says. More

  • in

    Improving accessibility of online graphics for blind users

    The beauty of a nice infographic published alongside a news or magazine story is that it makes numeric data more accessible to the average reader. But for blind and visually impaired users, such graphics often have the opposite effect.

    For visually impaired users — who frequently rely on screen-reading software that speaks words or numbers aloud as the user moves a cursor across the screen — a graphic may be nothing more than a few words of alt text, such as a chart’s title. For instance, a map of the United States displaying population rates by county might have alt text in the HTML that says simply, “A map of the United States with population rates by county.” The data has been buried in an image, making it entirely inaccessible.

    “Charts have these various visual features that, as a [sighted] reader, you can shift your attention around, look at high-level patterns, look at individual data points, and you can do this on the fly,” says Jonathan Zong, a 2022 MIT Morningside Academy for Design (MAD) Fellow and PhD student in computer science, who points out that even when a graphic includes alt text that interprets the data, the visually impaired user must accept the findings as presented.

    “If you’re [blind and] using a screen reader, the text description imposes a linear predefined reading order. So, you’re beholden to the decisions that the person who wrote the text made about what information was important to include.”

    While some graphics do include data tables that a screen reader can read, it requires the user to remember all the data from each row and column as they move on to the next one. According to the National Federation of the Blind, Zong says, there are 7 million people living in the United States with visual disabilities, and nearly 97 percent of top-level pages on the internet are not accessible to screen readers. The problem, he points out, is an especially difficult one for blind researchers to get around. Some researchers with visual impairments rely on a sighted collaborator to read and help interpret graphics in peer-reviewed research.

    Working with the Visualization Group at the Computer Science and Artificial Intelligence Lab (CSAIL) on a project led by Associate Professor Arvind Satyanarayan that includes Daniel Hajas, a blind researcher and innovation manager at the Global Disability Innovation Hub in England, Zong and others have written an open-source Javascript software program named Olli that solves this problem when it’s included on a website. Olli is able to go from big-picture analysis of a chart to the finest grain of detail to give the user the ability to select the degree of granularity that interests them.

    “We want to design richer screen-reader experiences for visualization with a hierarchical structure, multiple ways to navigate, and descriptions at varying levels of granularity to provide self-guided, open-ended exploration for the user.”

    Next steps with Olli are incorporating multi-sensory software to integrate text and visuals with sound, such as having a musical note that moves up or down the harmonic scale to indicate the direction of data on a linear graph, and possibly even developing tactile interpretations of data. Like most of the MAD Fellows, Zong integrates his science and engineering skills with design and art to create solutions to real-world problems affecting individuals. He’s been recognized for his work in both the visual arts and computer science. He holds undergraduate degrees in computer science and visual arts with a focus on graphic design from Princeton University, where his research was on the ethics of data collection.

    “The throughline is the idea that design can help us make progress on really tough social and ethical questions,” Zong says, calling software for accessible data visualization an “intellectually rich area for design.” “We’re thinking about ways to translate charts and graphs into text descriptions that can get read aloud as speech, or thinking about other kinds of audio mappings to sonify data, and we’re even exploring some tactile methods to understand data,” he says.

    “I get really excited about design when it’s a way to both create things that are useful to people in everyday life and also make progress on larger conversations about technology and society. I think working in accessibility is a great way to do that.”

    Another problem at the intersection of technology and society is the ethics of taking user data from social media for large-scale studies without the users’ awareness. While working as a summer graduate research fellow at Cornell’s Citizens and Technology Lab, Zong helped create an open-source software called Bartleby that can be used in large anonymous data research studies. After researchers collect data, but before analysis, Bartleby would automatically send an email message to every user whose data was included, alert them to that fact and offer them the choice to review the resulting data table and opt out of the study. Bartleby was honored in the student category of Fast Company’s Innovation by Design Awards for 2022. In November the same year, Forbes magazine named Jonathan Zong in its Forbes 30 Under 30 in Science 2023 list for his work in data visualization accessibility.

    The underlying theme to all Zong’s work is the exploration of autonomy and agency, even in his artwork, which is heavily inclusive of text and semiotic play. In “Public Display,” he created a handmade digital display font by erasing parts of celebrity faces that were taken from a facial recognition dataset. The piece was exhibited in 2020 in MIT’s Wiesner Gallery, and received the third-place prize in the MIT Schnitzer Prize in the Visual Arts that year. The work deals not only with the neurological aspects of distinguishing faces from typefaces, but also with the implications for erasing individuals’ identities through the practice of using facial recognition programs that often target individuals in communities of color in unfair ways. Another of his works, “Biometric Sans,” a typography system that stretches letters based on a person’s typing speed, will be included in a show at the Harvard Science Center sometime next fall.

    “MAD, particularly the large events MAD jointly hosted, played a really important function in showing the rest of MIT that this is the kind of work we value. This is what design can look like and is capable of doing. I think it all contributes to that culture shift where this kind of interdisciplinary work can be valued, recognized, and serve the public.

    “There are shared ideas around embodiment and representation that tie these different pursuits together for me,” Zong says. “In the ethics work, and the art on surveillance, I’m thinking about whether data collectors are representing people the way they want to be seen through data. And similarly, the accessibility work is about whether we can make systems that are flexible to the way people want to use them.” More

  • in

    Fast-tracking fusion energy’s arrival with AI and accessibility

    As the impacts of climate change continue to grow, so does interest in fusion’s potential as a clean energy source. While fusion reactions have been studied in laboratories since the 1930s, there are still many critical questions scientists must answer to make fusion power a reality, and time is of the essence. As part of their strategy to accelerate fusion energy’s arrival and reach carbon neutrality by 2050, the U.S. Department of Energy (DoE) has announced new funding for a project led by researchers at MIT’s Plasma Science and Fusion Center (PSFC) and four collaborating institutions.

    Cristina Rea, a research scientist and group leader at the PSFC, will serve as the primary investigator for the newly funded three-year collaboration to pilot the integration of fusion data into a system that can be read by AI-powered tools. The PSFC, together with scientists from William & Mary, the University of Wisconsin at Madison, Auburn University, and the nonprofit HDF Group, plan to create a holistic fusion data platform, the elements of which could offer unprecedented access for researchers, especially underrepresented students. The project aims to encourage diverse participation in fusion and data science, both in academia and the workforce, through outreach programs led by the group’s co-investigators, of whom four out of five are women. 

    The DoE’s award, part of a $29 million funding package for seven projects across 19 institutions, will support the group’s efforts to distribute data produced by fusion devices like the PSFC’s Alcator C-Mod, a donut-shaped “tokamak” that utilized powerful magnets to control and confine fusion reactions. Alcator C-Mod operated from 1991 to 2016 and its data are still being studied, thanks in part to the PSFC’s commitment to the free exchange of knowledge.

    Currently, there are nearly 50 public experimental magnetic confinement-type fusion devices; however, both historical and current data from these devices can be difficult to access. Some fusion databases require signing user agreements, and not all data are catalogued and organized the same way. Moreover, it can be difficult to leverage machine learning, a class of AI tools, for data analysis and to enable scientific discovery without time-consuming data reorganization. The result is fewer scientists working on fusion, greater barriers to discovery, and a bottleneck in harnessing AI to accelerate progress.

    The project’s proposed data platform addresses technical barriers by being FAIR — Findable, Interoperable, Accessible, Reusable — and by adhering to UNESCO’s Open Science (OS) recommendations to improve the transparency and inclusivity of science; all of the researchers’ deliverables will adhere to FAIR and OS principles, as required by the DoE. The platform’s databases will be built using MDSplusML, an upgraded version of the MDSplus open-source software developed by PSFC researchers in the 1980s to catalogue the results of Alcator C-Mod’s experiments. Today, nearly 40 fusion research institutes use MDSplus to store and provide external access to their fusion data. The release of MDSplusML aims to continue that legacy of open collaboration.

    The researchers intend to address barriers to participation for women and disadvantaged groups not only by improving general access to fusion data, but also through a subsidized summer school that will focus on topics at the intersection of fusion and machine learning, which will be held at William & Mary for the next three years.

    Of the importance of their research, Rea says, “This project is about responding to the fusion community’s needs and setting ourselves up for success. Scientific advancements in fusion are enabled via multidisciplinary collaboration and cross-pollination, so accessibility is absolutely essential. I think we all understand now that diverse communities have more diverse ideas, and they allow faster problem-solving.”

    The collaboration’s work also aligns with vital areas of research identified in the International Atomic Energy Agency’s “AI for Fusion” Coordinated Research Project (CRP). Rea was selected as the technical coordinator for the IAEA’s CRP emphasizing community engagement and knowledge access to accelerate fusion research and development. In a letter of support written for the group’s proposed project, the IAEA stated that, “the work [the researchers] will carry out […] will be beneficial not only to our CRP but also to the international fusion community in large.”

    PSFC Director and Hitachi America Professor of Engineering Dennis Whyte adds, “I am thrilled to see PSFC and our collaborators be at the forefront of applying new AI tools while simultaneously encouraging and enabling extraction of critical data from our experiments.”

    “Having the opportunity to lead such an important project is extremely meaningful, and I feel a responsibility to show that women are leaders in STEM,” says Rea. “We have an incredible team, strongly motivated to improve our fusion ecosystem and to contribute to making fusion energy a reality.” More

  • in

    System tracks movement of food through global humanitarian supply chain

    Although more than enough food is produced to feed everyone in the world, as many as 828 million people face hunger today. Poverty, social inequity, climate change, natural disasters, and political conflicts all contribute to inhibiting access to food. For decades, the U.S. Agency for International Development (USAID) Bureau for Humanitarian Assistance (BHA) has been a leader in global food assistance, supplying millions of metric tons of food to recipients worldwide. Alleviating hunger — and the conflict and instability hunger causes — is critical to U.S. national security.

    But BHA is only one player within a large, complex supply chain in which food gets handed off between more than 100 partner organizations before reaching its final destination. Traditionally, the movement of food through the supply chain has been a black-box operation, with stakeholders largely out of the loop about what happens to the food once it leaves their custody. This lack of direct visibility into operations is due to siloed data repositories, insufficient data sharing among stakeholders, and different data formats that operators must manually sort through and standardize. As a result, accurate, real-time information — such as where food shipments are at any given time, which shipments are affected by delays or food recalls, and when shipments have arrived at their final destination — is lacking. A centralized system capable of tracing food along its entire journey, from manufacture through delivery, would enable a more effective humanitarian response to food-aid needs.

    In 2020, a team from MIT Lincoln Laboratory began engaging with BHA to create an intelligent dashboard for their supply-chain operations. This dashboard brings together the expansive food-aid datasets from BHA’s existing systems into a single platform, with tools for visualizing and analyzing the data. When the team started developing the dashboard, they quickly realized the need for considerably more data than BHA had access to.

    “That’s where traceability comes in, with each handoff partner contributing key pieces of information as food moves through the supply chain,” explains Megan Richardson, a researcher in the laboratory’s Humanitarian Assistance and Disaster Relief Systems Group.

    Richardson and the rest of the team have been working with BHA and their partners to scope, build, and implement such an end-to-end traceability system. This system consists of serialized, unique identifiers (IDs) — akin to fingerprints — that are assigned to individual food items at the time they are produced. These individual IDs remain linked to items as they are aggregated along the supply chain, first domestically and then internationally. For example, individually tagged cans of vegetable oil get packaged into cartons; cartons are placed onto pallets and transported via railway and truck to warehouses; pallets are loaded onto shipping containers at U.S. ports; and pallets are unloaded and cartons are unpackaged overseas.

    With a trace

    Today, visibility at the single-item level doesn’t exist. Most suppliers mark pallets with a lot number (a lot is a batch of items produced in the same run), but this is for internal purposes (i.e., to track issues stemming back to their production supply, like over-enriched ingredients or machinery malfunction), not data sharing. So, organizations know which supplier lot a pallet and carton are associated with, but they can’t track the unique history of an individual carton or item within that pallet. As the lots move further downstream toward their final destination, they are often mixed with lots from other productions, and possibly other commodity types altogether, because of space constraints. On the international side, such mixing and the lack of granularity make it difficult to quickly pull commodities out of the supply chain if food safety concerns arise. Current response times can span several months.

    “Commodities are grouped differently at different stages of the supply chain, so it is logical to track them in those groupings where needed,” Richardson says. “Our item-level granularity serves as a form of Rosetta Stone to enable stakeholders to efficiently communicate throughout these stages. We’re trying to enable a way to track not only the movement of commodities, including through their lot information, but also any problems arising independent of lot, like exposure to high humidity levels in a warehouse. Right now, we have no way to associate commodities with histories that may have resulted in an issue.”

    “You can now track your checked luggage across the world and the fish on your dinner plate,” adds Brice MacLaren, also a researcher in the laboratory’s Humanitarian Assistance and Disaster Relief Systems Group. “So, this technology isn’t new, but it’s new to BHA as they evolve their methodology for commodity tracing. The traceability system needs to be versatile, working across a wide variety of operators who take custody of the commodity along the supply chain and fitting into their existing best practices.”

    As food products make their way through the supply chain, operators at each receiving point would be able to scan these IDs via a Lincoln Laboratory-developed mobile application (app) to indicate a product’s current location and transaction status — for example, that it is en route on a particular shipping container or stored in a certain warehouse. This information would get uploaded to a secure traceability server. By scanning a product, operators would also see its history up until that point.   

    Hitting the mark

    At the laboratory, the team tested the feasibility of their traceability technology, exploring different ways to mark and scan items. In their testing, they considered barcodes and radio-frequency identification (RFID) tags and handheld and fixed scanners. Their analysis revealed 2D barcodes (specifically data matrices) and smartphone-based scanners were the most feasible options in terms of how the technology works and how it fits into existing operations and infrastructure.

    “We needed to come up with a solution that would be practical and sustainable in the field,” MacLaren says. “While scanners can automatically read any RFID tags in close proximity as someone is walking by, they can’t discriminate exactly where the tags are coming from. RFID is expensive, and it’s hard to read commodities in bulk. On the other hand, a phone can scan a barcode on a particular box and tell you that code goes with that box. The challenge then becomes figuring out how to present the codes for people to easily scan without significantly interrupting their usual processes for handling and moving commodities.” 

    As the team learned from partner representatives in Kenya and Djibouti, offloading at the ports is a chaotic, fast operation. At manual warehouses, porters fling bags over their shoulders or stack cartons atop their heads any which way they can and run them to a drop point; at bagging terminals, commodities come down a conveyor belt and land this way or that way. With this variability comes several questions: How many barcodes do you need on an item? Where should they be placed? What size should they be? What will they cost? The laboratory team is considering these questions, keeping in mind that the answers will vary depending on the type of commodity; vegetable oil cartons will have different specifications than, say, 50-kilogram bags of wheat or peas.

    Leaving a mark

    Leveraging results from their testing and insights from international partners, the team has been running a traceability pilot evaluating how their proposed system meshes with real-world domestic and international operations. The current pilot features a domestic component in Houston, Texas, and an international component in Ethiopia, and focuses on tracking individual cartons of vegetable oil and identifying damaged cans. The Ethiopian team with Catholic Relief Services recently received a container filled with pallets of uniquely barcoded cartons of vegetable oil cans (in the next pilot, the cans will be barcoded, too). They are now scanning items and collecting data on product damage by using smartphones with the laboratory-developed mobile traceability app on which they were trained. 

    “The partners in Ethiopia are comparing a couple lid types to determine whether some are more resilient than others,” Richardson says. “With the app — which is designed to scan commodities, collect transaction data, and keep history — the partners can take pictures of damaged cans and see if a trend with the lid type emerges.”

    Next, the team will run a series of pilots with the World Food Program (WFP), the world’s largest humanitarian organization. The first pilot will focus on data connectivity and interoperability, and the team will engage with suppliers to directly print barcodes on individual commodities instead of applying barcode labels to packaging, as they did in the initial feasibility testing. The WFP will provide input on which of their operations are best suited for testing the traceability system, considering factors like the network bandwidth of WFP staff and local partners, the commodity types being distributed, and the country context for scanning. The BHA will likely also prioritize locations for system testing.

    “Our goal is to provide an infrastructure to enable as close to real-time data exchange as possible between all parties, given intermittent power and connectivity in these environments,” MacLaren says.

    In subsequent pilots, the team will try to integrate their approach with existing systems that partners rely on for tracking procurements, inventory, and movement of commodities under their custody so that this information is automatically pushed to the traceability server. The team also hopes to add a capability for real-time alerting of statuses, like the departure and arrival of commodities at a port or the exposure of unclaimed commodities to the elements. Real-time alerts would enable stakeholders to more efficiently respond to food-safety events. Currently, partners are forced to take a conservative approach, pulling out more commodities from the supply chain than are actually suspect, to reduce risk of harm. Both BHA and WHP are interested in testing out a food-safety event during one of the pilots to see how the traceability system works in enabling rapid communication response.

    To implement this technology at scale will require some standardization for marking different commodity types as well as give and take among the partners on best practices for handling commodities. It will also require an understanding of country regulations and partner interactions with subcontractors, government entities, and other stakeholders.

    “Within several years, I think it’s possible for BHA to use our system to mark and trace all their food procured in the United States and sent internationally,” MacLaren says.

    Once collected, the trove of traceability data could be harnessed for other purposes, among them analyzing historical trends, predicting future demand, and assessing the carbon footprint of commodity transport. In the future, a similar traceability system could scale for nonfood items, including medical supplies distributed to disaster victims, resources like generators and water trucks localized in emergency-response scenarios, and vaccines administered during pandemics. Several groups at the laboratory are also interested in such a system to track items such as tools deployed in space or equipment people carry through different operational environments.

    “When we first started this program, colleagues were asking why the laboratory was involved in simple tasks like making a dashboard, marking items with barcodes, and using hand scanners,” MacLaren says. “Our impact here isn’t about the technology; it’s about providing a strategy for coordinated food-aid response and successfully implementing that strategy. Most importantly, it’s about people getting fed.” More

  • in

    Making sense of all things data

    Data, and more specifically using data, is not a new concept, but it remains an elusive one. It comes with terms like “the internet of things” (IoT) and “the cloud,” and no matter how often those are explained, smart people can still be confused. And then there’s the amount of information available and the speed with which it comes in. Software is omnipresent. It’s in coffeemakers and watches, gathering data every second. The question becomes how to take all the new technology and take advantage of the potential insights and analytics. It’s not a small ask.

    “Putting our arms around what digital transformation is can be difficult to do,” says Abel Sanchez. But as the executive director and research director of MIT’s Geospatial Data Center, that’s exactly what he does with his work in helping industries and executives shift their operations in order to make sense of their data and be able to use it to help their bottom lines.

    Play video

    Handling the pace

    Data can lead to making better business decisions. That’s not a new or surprising insight, but as Sanchez says, people still tend to work off of intuition. Part of the problem is that they don’t know what to do with their available data, and there’s usually plenty of available data. Part of that problem is that there’s so much information being produced from so many sources. As soon as a person wakes up and turns on their phone or starts their car, software is running. It’s coming in fast, but because it’s also complex, “it outperforms people,” he says.

    As an example with Uber, once a person clicks on the app for a ride, predictive models start firing at the rate of 1 million per second. It’s all in order to optimize the trip, taking into account factors such as school schedules, roadway conditions, traffic, and a driver’s availability. It’s helpful for the task, but it’s something that “no human would be able to do,” he says. 

    The solution requires a few components. One is a new way to store data. In the past, the classic was creating the “perfect library,” which was too structured. The response to that was to create a “data lake,” where all the information would go in and somehow people would make sense of it. “This also failed,” Sanchez says.

    Data storage needs to be re-imaged, in which a key element is greater accessibility. In most corporations, only 10-20 percent of employees have the access and technical skill to work with the data. The rest have to go through a centralized resource and get into a queue, an inefficient system. The goal, Sanchez says, is to democratize the information by going to a modern stack, which would convert what he calls “dormant data” into “active data.” The result? Better decisions could be made.

    The first, big step companies need to take is the will to make the change. Part of it is an investment of money, but it’s also an attitude shift. Corporations can have an embedded culture where things have always been done a certain way and deviating from that is resisted because it’s different. But when it comes to data, a new approach is needed. Managing and curating the information can no longer rest in the hands of one person with the institutional memory. It’s not possible. It’s also not practical because companies are losing out on efficiency and productivity, because with technology, “What use to take years to do, now you can do in days,” Sanchez says.

    Play video

    The new player

    The above exemplifies what’s been involved with coordinating data along four intertwined components: IoT, AI, the cloud, and security. The first two create the information, which then gets stored in the cloud, but it’s all for naught without robust security. But one relative newcomer has come into the picture. It’s blockchain technology, a term that is often said but still not fully understood, adding further to the confusion.

    Sanchez says that information has been handled and organized a certain way with the World Wide Web. Blockchain is an opportunity to be more nimble and productive by offering the chance to have an accepted identity, currency, and logic that works on a global scale. The holdup has always been that there’s never been any agreement on those three components on a global scale. It leads to people being shut out, inefficiency, and lost business.

    One example, Sanchez says, of blockchain’s potential is with hospitals. In the United States, they’re private and information has to be constantly integrated from doctors, insurance companies, labs, government regulators, and pharmaceutical companies. It leads to repeated steps to do something as simple as recognizing a patient’s identity, which often can’t be agreed upon. With blockchain, these various entities can create a consortium using open source code with no barriers of access, and it could quickly and easily identify a patient because it set up an agreement, and with it “remove that level of effort.” It’s an incremental step, but one which can be built upon that reduces cost and risk.

    Another example — “one of the best examples,” Sanchez says — is what was done in Indonesia. Most of the rice, corn, and wheat that comes from this area is produced from smallholder farms. For the people making loans, it’s expensive to understand the risk of cultivating these plots of land. Compounding that is that these farmers don’t have state-issued identities or credit records, so, “They don’t exist in the modern economic sense,” he says. They don’t have access to loans, and banks are losing out on potential good customers.

    With this project, blockchain allowed local people to gather information about the farms on their smartphones. Banks could acquire the information and compensate the people with tokens, thereby incentivizing the work. The bank would see the creditworthiness of the farms, and farmers could end up getting fair loans.

    In the end, it creates a beneficial circle for the banks, farmers, and community, but it also represents what can be done with digital transformation by allowing businesses to optimize their processes, make better decisions, and ultimately profit.

    “It’s a tremendous new platform,” Sanchez says. “This is the promise.” More

  • in

    Report: CHIPS Act just the first step in addressing threats to US leadership in advanced computing

    When Liu He, a Chinese economist, politician, and “chip czar,” was tapped to lead the charge in a chipmaking arms race with the United States, his message lingered in the air, leaving behind a dewy glaze of tension: “For our country, technology is not just for growth… it is a matter of survival.”

    Once upon a time, the United States’ early technological prowess positioned the nation to outpace foreign rivals and cultivate a competitive advantage for domestic businesses. Yet, 30 years later, America’s lead in advanced computing is continuing to wane. What happened?

    A new report from an MIT researcher and two colleagues sheds light on the decline in U.S. leadership. The scientists looked at high-level measures to examine the shrinkage: overall capabilities, supercomputers, applied algorithms, and semiconductor manufacturing. Through their analysis, they found that not only has China closed the computing gap with the U.S., but nearly 80 percent of American leaders in the field believe that their Chinese competitors are improving capabilities faster — which, the team says, suggests a “broad threat to U.S. competitiveness.”

    To delve deeply into the fray, the scientists conducted the Advanced Computing Users Survey, sampling 120 top-tier organizations, including universities, national labs, federal agencies, and industry. The team estimates that this group comprises one-third and one-half of all the most significant computing users in the United States.

    “Advanced computing is crucial to scientific improvement, economic growth and the competitiveness of U.S. companies,” says Neil Thompson, director of the FutureTech Research Project at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), who helped lead the study.

    Thompson, who is also a principal investigator at MIT’s Initiative on the Digital Economy, wrote the paper with Chad Evans, executive vice president and secretary and treasurer to the board at the Council on Competitiveness, and Daniel Armbrust, who is the co-founder, initial CEO, and member of the board of directors at Silicon Catalyst and former president of SEMATECH, the semiconductor consortium that developed industry roadmaps.

    The semiconductor, supercomputer, and algorithm bonanza

    Supercomputers — the room-sized, “giant calculators” of the hardware world — are an industry no longer dominated by the United States. Through 2015, about half of the most powerful computers were sitting firmly in the U.S., and China was growing slowly from a very slow base. But in the past six years, China has swiftly caught up, reaching near parity with America.

    This disappearing lead matters. Eighty-four percent of U.S. survey respondents said they’re computationally constrained in running essential programs. “This result was telling, given who our respondents are: the vanguard of American research enterprises and academic institutions with privileged access to advanced national supercomputing resources,” says Thompson. 

    With regards to advanced algorithms, historically, the U.S. has fronted the charge, with two-thirds of all significant improvements dominated by U.S.-born inventors. But in recent decades, U.S. dominance in algorithms has relied on bringing in foreign talent to work in the U.S., which the researchers say is now in jeopardy. China has outpaced the U.S. and many other countries in churning out PhDs in STEM fields since 2007, with one report postulating a near-distant future (2025) where China will be home to nearly twice as many PhDs than in the U.S. China’s rise in algorithms can also be seen with the “Gordon Bell Prize,” an achievement for outstanding work in harnessing the power of supercomputers in varied applications. U.S. winners historically dominated the prize, but China has now equaled or surpassed Americans’ performance in the past five years.

    While the researchers note the CHIPS and Science Act of 2022 is a critical step in re-establishing the foundation of success for advanced computing, they propose recommendations to the U.S. Office of Science and Technology Policy. 

    First, they suggest democratizing access to U.S. supercomputing by building more mid-tier systems that push boundaries for many users, as well as building tools so users scaling up computations can have less up-front resource investment. They also recommend increasing the pool of innovators by funding many more electrical engineers and computer scientists being trained with longer-term US residency incentives and scholarships. Finally, in addition to this new framework, the scientists urge taking advantage of what already exists, via providing the private sector access to experimentation with high-performance computing through supercomputing sites in academia and national labs.

    All that and a bag of chips

    Computing improvements depend on continuous advances in transistor density and performance, but creating robust, new chips necessitate a harmonious blend of design and manufacturing.

    Over the last six years, China was not known as the savants of noteworthy chips. In fact, in the past five decades, the U.S. designed most of them. But this changed in the past six years when China created the HiSilicon Kirin 9000, propelling itself to the international frontier. This success was mainly obtained through partnerships with leading global chip designers that began in the 2000s. Now, China now has 14 companies among the world’s top 50 fabless designers. A decade ago, there was only one. 

    Competitive semiconductor manufacturing has been more mixed, where U.S.-led policies and internal execution issues have slowed China’s rise, but as of July 2022, the Semiconductor Manufacturing International Corporation (SMIC) has evidence of 7 nanometer logic, which was not expected until much later. However, with extreme ultraviolet export restrictions, progress below 7 nm means domestic technology development would be expensive. Currently, China is only at parity or better in two out of 12 segments of the semiconductor supply chain. Still, with government policy and investments, the team expects a whopping increase to seven segments in 10 years. So, for the moment, the U.S. retains leadership in hardware manufacturing, but with fewer dimensions of advantage.

    The authors recommend that the White House Office of Science and Technology Policy work with key national agencies, such as the U.S. Department of Defense, U.S. Department of Energy, and the National Science Foundation, to define initiatives to build the hardware and software systems needed for important computing paradigms and workloads critical for economic and security goals. “It is crucial that American enterprises can get the benefit of faster computers,” says Thompson. “With Moore’s Law slowing down, the best way to do this is to create a portfolio of specialized chips (or “accelerators”) that are customized to our needs.”

    The scientists further believe that to lead the next generation of computing, four areas must be addressed. First, by issuing grand challenges to the CHIPS Act National Semiconductor Technology Center, researchers and startups would be motivated to invest in research and development and to seek startup capital for new technologies in areas such as spintronics, neuromorphics, optical and quantum computing, and optical interconnect fabrics. By supporting allies in passing similar acts, overall investment in these technologies would increase, and supply chains would become more aligned and secure. Establishing test beds for researchers to test algorithms on new computing architectures and hardware would provide an essential platform for innovation and discovery. Finally, planning for post-exascale systems that achieve higher levels of performance through next-generation advances would ensure that current commercial technologies don’t limit future computing systems.

    “The advanced computing landscape is in rapid flux — technologically, economically, and politically, with both new opportunities for innovation and rising global rivalries,” says Daniel Reed, Presidential Professor and professor of computer science and electrical and computer engineering at the University of Utah. “The transformational insights from both deep learning and computational modeling depend on both continued semiconductor advances and their instantiation in leading edge, large-scale computing systems — hyperscale clouds and high-performance computing systems. Although the U.S. has historically led the world in both advanced semiconductors and high-performance computing, other nations have recognized that these capabilities are integral to 21st century economic competitiveness and national security, and they are investing heavily.”

    The research was funded, in part, through Thompson’s grant from Good Ventures, which supports his FutureTech Research Group. The paper is being published by the Georgetown Public Policy Review. More

  • in

    Researchers release open-source photorealistic simulator for autonomous driving

    Hyper-realistic virtual worlds have been heralded as the best driving schools for autonomous vehicles (AVs), since they’ve proven fruitful test beds for safely trying out dangerous driving scenarios. Tesla, Waymo, and other self-driving companies all rely heavily on data to enable expensive and proprietary photorealistic simulators, since testing and gathering nuanced I-almost-crashed data usually isn’t the most easy or desirable to recreate. 

    To that end, scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) created “VISTA 2.0,” a data-driven simulation engine where vehicles can learn to drive in the real world and recover from near-crash scenarios. What’s more, all of the code is being open-sourced to the public. 

    “Today, only companies have software like the type of simulation environments and capabilities of VISTA 2.0, and this software is proprietary. With this release, the research community will have access to a powerful new tool for accelerating the research and development of adaptive robust control for autonomous driving,” says MIT Professor and CSAIL Director Daniela Rus, senior author on a paper about the research. 

    Play video

    VISTA is a data-driven, photorealistic simulator for autonomous driving. It can simulate not just live video but LiDAR data and event cameras, and also incorporate other simulated vehicles to model complex driving situations. VISTA is open source and the code can be found below.

    VISTA 2.0 builds off of the team’s previous model, VISTA, and it’s fundamentally different from existing AV simulators since it’s data-driven — meaning it was built and photorealistically rendered from real-world data — thereby enabling direct transfer to reality. While the initial iteration supported only single car lane-following with one camera sensor, achieving high-fidelity data-driven simulation required rethinking the foundations of how different sensors and behavioral interactions can be synthesized. 

    Enter VISTA 2.0: a data-driven system that can simulate complex sensor types and massively interactive scenarios and intersections at scale. With much less data than previous models, the team was able to train autonomous vehicles that could be substantially more robust than those trained on large amounts of real-world data. 

    “This is a massive jump in capabilities of data-driven simulation for autonomous vehicles, as well as the increase of scale and ability to handle greater driving complexity,” says Alexander Amini, CSAIL PhD student and co-lead author on two new papers, together with fellow PhD student Tsun-Hsuan Wang. “VISTA 2.0 demonstrates the ability to simulate sensor data far beyond 2D RGB cameras, but also extremely high dimensional 3D lidars with millions of points, irregularly timed event-based cameras, and even interactive and dynamic scenarios with other vehicles as well.” 

    The team was able to scale the complexity of the interactive driving tasks for things like overtaking, following, and negotiating, including multiagent scenarios in highly photorealistic environments. 

    Training AI models for autonomous vehicles involves hard-to-secure fodder of different varieties of edge cases and strange, dangerous scenarios, because most of our data (thankfully) is just run-of-the-mill, day-to-day driving. Logically, we can’t just crash into other cars just to teach a neural network how to not crash into other cars.

    Recently, there’s been a shift away from more classic, human-designed simulation environments to those built up from real-world data. The latter have immense photorealism, but the former can easily model virtual cameras and lidars. With this paradigm shift, a key question has emerged: Can the richness and complexity of all of the sensors that autonomous vehicles need, such as lidar and event-based cameras that are more sparse, accurately be synthesized? 

    Lidar sensor data is much harder to interpret in a data-driven world — you’re effectively trying to generate brand-new 3D point clouds with millions of points, only from sparse views of the world. To synthesize 3D lidar point clouds, the team used the data that the car collected, projected it into a 3D space coming from the lidar data, and then let a new virtual vehicle drive around locally from where that original vehicle was. Finally, they projected all of that sensory information back into the frame of view of this new virtual vehicle, with the help of neural networks. 

    Together with the simulation of event-based cameras, which operate at speeds greater than thousands of events per second, the simulator was capable of not only simulating this multimodal information, but also doing so all in real time — making it possible to train neural nets offline, but also test online on the car in augmented reality setups for safe evaluations. “The question of if multisensor simulation at this scale of complexity and photorealism was possible in the realm of data-driven simulation was very much an open question,” says Amini. 

    With that, the driving school becomes a party. In the simulation, you can move around, have different types of controllers, simulate different types of events, create interactive scenarios, and just drop in brand new vehicles that weren’t even in the original data. They tested for lane following, lane turning, car following, and more dicey scenarios like static and dynamic overtaking (seeing obstacles and moving around so you don’t collide). With the multi-agency, both real and simulated agents interact, and new agents can be dropped into the scene and controlled any which way. 

    Taking their full-scale car out into the “wild” — a.k.a. Devens, Massachusetts — the team saw  immediate transferability of results, with both failures and successes. They were also able to demonstrate the bodacious, magic word of self-driving car models: “robust.” They showed that AVs, trained entirely in VISTA 2.0, were so robust in the real world that they could handle that elusive tail of challenging failures. 

    Now, one guardrail humans rely on that can’t yet be simulated is human emotion. It’s the friendly wave, nod, or blinker switch of acknowledgement, which are the type of nuances the team wants to implement in future work. 

    “The central algorithm of this research is how we can take a dataset and build a completely synthetic world for learning and autonomy,” says Amini. “It’s a platform that I believe one day could extend in many different axes across robotics. Not just autonomous driving, but many areas that rely on vision and complex behaviors. We’re excited to release VISTA 2.0 to help enable the community to collect their own datasets and convert them into virtual worlds where they can directly simulate their own virtual autonomous vehicles, drive around these virtual terrains, train autonomous vehicles in these worlds, and then can directly transfer them to full-sized, real self-driving cars.” 

    Amini and Wang wrote the paper alongside Zhijian Liu, MIT CSAIL PhD student; Igor Gilitschenski, assistant professor in computer science at the University of Toronto; Wilko Schwarting, AI research scientist and MIT CSAIL PhD ’20; Song Han, associate professor at MIT’s Department of Electrical Engineering and Computer Science; Sertac Karaman, associate professor of aeronautics and astronautics at MIT; and Daniela Rus, MIT professor and CSAIL director. The researchers presented the work at the IEEE International Conference on Robotics and Automation (ICRA) in Philadelphia. 

    This work was supported by the National Science Foundation and Toyota Research Institute. The team acknowledges the support of NVIDIA with the donation of the Drive AGX Pegasus. More

  • in

    Making data visualization more accessible for blind and low-vision individuals

    Data visualizations on the web are largely inaccessible for blind and low-vision individuals who use screen readers, an assistive technology that reads on-screen elements as text-to-speech. This excludes millions of people from the opportunity to probe and interpret insights that are often presented through charts, such as election results, health statistics, and economic indicators. 

    When a designer attempts to make a visualization accessible, best practices call for including a few sentences of text that describe the chart and a link to the underlying data table — a far cry from the rich reading experience available to sighted users.

    An interdisciplinary team of researchers from MIT and elsewhere is striving to create screen-reader-friendly data visualizations that offer a similarly rich experience. They prototyped several visualization structures that provide text descriptions at varying levels of detail, enabling a screen-reader user to drill down from high-level data to more detailed information using just a few keystrokes.

    The MIT team embarked on an iterative co-design process with collaborator Daniel Hajas, a researcher at University College London who works with the Global Disability Innovation Hub and lost his sight at age 16. They collaborated to develop prototypes and ran a detailed user study with blind and low-vision individuals to gather feedback.

    “Researchers might see some connections between problems and be aware of potential solutions, but very often they miss it by a little bit. Insights from people who have the lived experience of a certain specific, measurable problem are really important for a lot of disability-related solutions. I think we found a really nice fit,” says Hajas.

    They created a framework to help designers think systematically about how to develop accessible visualizations. In the future, they plan to use their prototypes and design framework to build a user-friendly tool that could convert visualizations into accessible formats.

    MIT collaborators include co-lead authors and Computer Science and Artificial Intelligence Laboratory (CSAIL) graduate students Jonathan Zong, Crystal Lee, and Alan Lundgard, as well as JiWoong Jang, an undergraduate at Carnegie Mellon University who worked on this project during MIT’s Summer Research Program (MSRP), and senior author Arvind Satyanarayan, assistant professor of computer science who leads the Visualization Group in CSAIL. The research paper, which will be presented at the Eurographics Conference on Visualization, won a best paper honorable mention award.

    “Push what is possible”

    The researchers defined three design dimensions as key to making accessible visualizations: structure, navigation, and description. Structure involves arranging the information into a hierarchy. Navigation refers to how the user moves through different levels of detail. Description is how the information is spoken, including how much information is conveyed.

    Using these design dimensions, they developed several visualization prototypes that emphasized ease-of-navigation for screen-reader users. One prototype, known as multiview, enabled individuals to use the up and down arrows to navigate between different levels of information (like the chart title as the top level, the legend as the second level, etc.), and the right and left arrow keys to cycle through information on the same level (such as adjacent scatterplots). Another prototype, known as target, included the same arrow key navigation but also a drop-down menu of key chart locations so the user could quickly jump to an area of interest.

    “Our goal is not just to work within existing standards to make them serviceable. We really set out to do grounded speculation and imagine where we can push what is possible with these existing standards. We didn’t want to limit ourselves to refitting tools that were designed for images,” says Zong.

    They tested these prototypes and an accessible data table, the existing best practice for accessible visualizations, with 13 blind and visually impaired screen-reader users. They asked users to rate each tool on several criteria, including how easy it was to learn and how easy it was to locate data or answer questions.

    “One thing I thought was really interesting was how much people were constantly testing their own hypotheses or trying to make specific patterns as they moved through the visualization. The implication for navigation is that you want to be able to orient yourself within the visualization so you know where the limits are,” says Lee. “Can you accurately and easily know where the walls are in the room you are exploring?”

    Improved insights

    Users said both prototypes enabled them to more rapidly identify patterns in the data. Scrolling from a high level to deeper levels of information helped them gain insights more easily than when browsing the data table, they said. They also enjoyed faster navigation using the menu in the target prototype.

    But the data table got top marks for ease of use.

    “I expected people to be disappointed with the everyday tools when compared to the new prototypes, but they still clung to the data table a bit, likely because of their familiarity with it. That shows that principles like familiarity, learnability, and usability still matter. No matter how ‘good’ our new invention is, if it is not easy enough to learn, people might stick with an older version,” Hajas says.

    Drawing on these insights, the researchers are refining the prototypes and using them to build a software package that can be used with existing design tools to give visualizations an accessible, navigable structure.

    They also want to explore multimodal solutions. Some study participants used different devices together, like screen readers and braille displays, or data sonification tools that convey information using non-speech audio. How these tools can complement each other when applied to a visualization is still an open question, Zong says.

    In the long-run, they hope their work might lead to careful rethinking of web accessibility standards.

    “There is no one-size-fits-all solution for accessibility. While existing standards don’t presume that, they only offer simple approaches, like data tables and alt text. One of the key benefits of our research contribution is that we are proposing a framework — different preferences and data representations are situated at different points in this design space,” says Lundgard.

    “We have been working hard toward reducing the inequities that screen-reader users face when extracting information from online data visualizations for the past few years. So, we are really appreciative of this work and the knowledge that it adds to the existing literature,” says Ather Sharif, a graduate student who researches accessibility and visualization in the labs of professors Jacob Wobbrock and Katharina Reinecke at the Paul G. Allen School of Computer Science and Engineering of the University of Washington at Seattle, and who was not involved with this work.

    “I like to think of it as a movement where we’re all finally coming together and improving the experiences of a demographic that has been largely ignored, especially when presenting data through visualizations. Kudos to Jonathan, Arvind, and their team for this insightful and timely work! I am looking forward to what’s next,” adds Sharif, who is lead author of several recent papers related to accessible data visualizations.

    Amy Bower, a senior scientist in the Department of Physical Oceanography at the Woods Hole Oceanographic Institution who suffers from a degenerative retinal disease and uses a screen reader extensively in her work as a researcher and also for basic living tasks, found the researchers’ explanations of the importance of co-design to be powerful and compelling.  

    “As a blind scientist, I’m constantly searching for effective tools that will allow me to access the information conveyed in data visualizations. The layered approach taken by these researchers, which provides the option to get the ‘big picture’ from the data as well as drill down into the data points themselves, allows the user to choose how they want to explore the data,” says Bower, who also was not involved with this work. “I think the ability to freely explore the data is necessary not just to learn the ‘story’ that the data are telling, but to allow a blind researcher such as myself to formulate the next questions that need to be tackled to advance understanding in any field of study.”

    This work was supported, in part, by the National Science Foundation.   More