More stories

  • A new chip for decoding data transmissions demonstrates record-breaking energy efficiency

    Imagine using an online banking app to deposit money into your account. Like all information sent over the internet, those communications could be corrupted by noise that inserts errors into the data.

    To overcome this problem, senders encode data before they are transmitted, and then a receiver uses a decoding algorithm to correct errors and recover the original message. In some instances, data are received with reliability information that helps the decoder figure out which parts of a transmission are likely errors.

    Researchers at MIT and elsewhere have developed a decoder chip that employs a new statistical model to use this reliability information in a way that is much simpler and faster than conventional techniques.

    Their chip uses a universal decoding algorithm the team previously developed, which can unravel any error correcting code. Typically, decoding hardware can only process one particular type of code. This new, universal decoder chip has broken the record for energy-efficient decoding, performing between 10 and 100 times better than other hardware.

    This advance could enable mobile devices with fewer chips, since they would no longer need separate hardware for multiple codes. This would reduce the amount of material needed for fabrication, cutting costs and improving sustainability. By making the decoding process less energy intensive, the chip could also improve device performance and lengthen battery life. It could be especially useful for demanding applications like augmented and virtual reality and 5G networks.

    “This is the first time anyone has broken below the 1 picojoule-per-bit barrier for decoding. That is roughly the same amount of energy you need to transmit a bit inside the system. It had been a big symbolic threshold, but it also changes the balance in the receiver of what might be the most pressing part from an energy perspective — we can move that away from the decoder to other elements,” says Muriel Médard, the School of Science NEC Professor of Software Science and Engineering, a professor in the Department of Electrical Engineering and Computer Science, and a co-author of a paper presenting the new chip.

    Médard’s co-authors include lead author Arslan Riaz, a graduate student at Boston University (BU); Rabia Tugce Yazicigil, assistant professor of electrical and computer engineering at BU; and Ken R. Duffy, then director of the Hamilton Institute at Maynooth University and now a professor at Northeastern University, as well as others from MIT, BU, and Maynooth University. The work is being presented at the International Solid-State Circuits Conference.

    Smarter sorting

    Digital data are transmitted over a network in the form of bits (0s and 1s). A sender encodes data by adding an error-correcting code, which is a redundant string of 0s and 1s that can be viewed as a hash. Information about this hash is held in a specific code book. A decoding algorithm at the receiver, designed for this particular code, uses its code book and the hash structure to retrieve the original information, which may have been jumbled by noise. Since each algorithm is code-specific, and most require dedicated hardware, a device would need many chips to decode different codes.

    The researchers previously demonstrated GRAND (Guessing Random Additive Noise Decoding), a universal decoding algorithm that can crack any code. GRAND works by guessing the noise that affected the transmission, subtracting that noise pattern from the received data, and then checking what remains in a code book. It guesses a series of noise patterns in the order they are likely to occur.
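    The guess-and-check loop is simple enough to sketch in a few lines of Python. The example below assumes a small Hamming-style code and a plain weight-ordered guessing schedule, purely for illustration rather than as the code or parameters used on the MIT chip, and shows how guessing noise and checking a code book recovers a corrupted code word.

    ```python
    # A minimal sketch of the GRAND guess-and-check loop on a toy Hamming(7,4)-style
    # code. The code, bit width, and noise ordering are illustrative assumptions.
    from itertools import combinations

    def syndrome(word, n_bits=7):
        """Hamming(7,4)-style check: XOR the 1-indexed positions of the set bits.
        A word belongs to the code book exactly when the syndrome is zero."""
        s = 0
        for i in range(n_bits):
            if (word >> i) & 1:
                s ^= i + 1
        return s

    def noise_patterns(n_bits=7):
        """Yield noise patterns from most to least likely: no error first,
        then all single-bit flips, then all double-bit flips, and so on."""
        for weight in range(n_bits + 1):
            for positions in combinations(range(n_bits), weight):
                pattern = 0
                for p in positions:
                    pattern |= 1 << p
                yield pattern

    def grand_decode(received, n_bits=7):
        """Guess a noise pattern, remove it (XOR), and check the code book."""
        for pattern in noise_patterns(n_bits):
            candidate = received ^ pattern
            if syndrome(candidate, n_bits) == 0:
                return candidate, pattern
        return None, None

    sent = 0b0000111              # a valid code word (syndrome is zero)
    received = sent ^ 0b0100000   # the channel flips one bit
    decoded, guessed = grand_decode(received)
    print(f"decoded={decoded:07b}, guessed noise={guessed:07b}")  # recovers the code word
    ```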

    Data are often received with reliability information, also called soft information, that helps a decoder figure out which pieces are errors. The new decoding chip, called ORBGRAND (Ordered Reliability Bits GRAND), uses this reliability information to sort data based on how likely each bit is to be an error.

    But it isn’t as simple as ordering single bits. While the most unreliable bit might be the likeliest error, perhaps the third and fourth most unreliable bits together are as likely to be an error as the seventh-most unreliable bit. ORBGRAND uses a new statistical model that can sort bits in this fashion, considering that multiple bits together are as likely to be an error as some single bits.
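    One way to picture this ordering is in terms of reliability ranks: if the bits are ranked from least to most reliable, an ORBGRAND-style generator tries error patterns in order of the sum of the ranks they flip, so flipping ranks {3, 4} is tried at the same point as flipping rank {7} alone. The sketch below illustrates only that ordering idea, with made-up reliability values; it is not the chip's pattern generator.

    ```python
    # A simplified illustration of ordered-reliability pattern generation.
    # The per-bit reliabilities are made-up values for this example.
    def rank_patterns(max_weight, n_bits):
        """Yield sets of reliability ranks (1 = least reliable bit) in order of
        increasing rank sum, enumerating partitions into distinct parts."""
        def partitions(target, smallest):
            if target == 0:
                yield []
                return
            for part in range(smallest, target + 1):
                if part > n_bits:
                    break
                for rest in partitions(target - part, part + 1):
                    yield [part] + rest
        for weight in range(1, max_weight + 1):
            for ranks in partitions(weight, 1):
                yield ranks

    reliabilities = [0.9, 0.2, 0.7, 0.4, 0.95, 0.3, 0.1]   # assumed soft information
    order = sorted(range(len(reliabilities)), key=lambda i: reliabilities[i])

    for ranks in rank_patterns(max_weight=7, n_bits=len(reliabilities)):
        positions = [order[r - 1] for r in ranks]           # ranks -> bit positions
        print(f"rank sum {sum(ranks):2d}: flip ranks {ranks} -> bit positions {positions}")
    ```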

    “If your car isn’t working, soft information might tell you that it is probably the battery. But if it isn’t the battery alone, maybe it is the battery and the alternator together that are causing the problem. This is how a rational person would troubleshoot — you’d say that it could actually be these two things together before going down the list to something that is much less likely,” Médard says.

    This is a much more efficient approach than that of traditional decoders, which instead examine the code structure and are generally designed for worst-case performance.

    “With a traditional decoder, you’d pull out the blueprint of the car and examine each and every piece. You’ll find the problem, but it will take you a long time and you’ll get very frustrated,” Médard explains.

    ORBGRAND stops sorting as soon as a code word is found, which is often very soon. The chip also employs parallelization, generating and testing multiple noise patterns simultaneously so it finds the code word faster. Because the decoder stops working once it finds the code word, its energy consumption stays low even though it runs multiple processes simultaneously.

    Record-breaking efficiency

    When they compared their approach to other chips, ORBGRAND decoded with maximum accuracy while consuming only 0.76 picojoules of energy per bit, breaking the previous performance record. ORBGRAND consumes between 10 and 100 times less energy than other devices.

    One of the biggest challenges of developing the new chip came from this reduced energy consumption, Médard says. With ORBGRAND, generating noise sequences is now so energy-efficient that other processes the researchers hadn’t focused on before, like checking the code word in a code book, consume most of the effort.

    “Now, this checking process, which is like turning on the car to see if it works, is the hardest part. So, we need to find more efficient ways to do that,” she says.

    The team is also exploring ways to change the modulation of transmissions so they can take advantage of the improved efficiency of the ORBGRAND chip. They also plan to see how their technique could be utilized to more efficiently manage multiple transmissions that overlap.

    The research is funded, in part, by the U.S. Defense Advanced Research Projects Agency (DARPA) and Science Foundation Ireland.

  • Study: Carbon-neutral pavements are possible by 2050, but rapid policy and industry action are needed

    The United States has almost 2.8 million lane-miles, or about 4.6 million lane-kilometers, of paved roads.

    Roads and streets form the backbone of our built environment. They take us to work or school, take goods to their destinations, and much more.

    However, a new study by MIT Concrete Sustainability Hub (CSHub) researchers shows that the annual greenhouse gas (GHG) emissions of all construction materials used in the U.S. pavement network are 11.9 to 13.3 megatons. This is equivalent to the emissions of a gasoline-powered passenger vehicle driving about 30 billion miles in a year.

    As roads are built, repaved, and expanded, new approaches and thoughtful material choices are necessary to dampen their carbon footprint. 

    The CSHub researchers found that, by 2050, mixtures for pavements can be made carbon-neutral if industry and governmental actors help to apply a range of solutions — like carbon capture — to reduce, avoid, and neutralize embodied impacts. (A neutralization solution is any compensation mechanism in the value chain of a product that permanently removes the global warming impact of the processes after avoiding and reducing the emissions.) Furthermore, nearly half of pavement-related greenhouse gas (GHG) savings can be achieved in the short term with a negative or nearly net-zero cost.

    The research team, led by Hessam AzariJafari, MIT CSHub’s deputy director, closed gaps in our understanding of the impacts of pavement decisions by developing a dynamic model that quantifies the embodied impact of future pavement materials demand for the U.S. road network.

    The team first split the U.S. road network into 10-mile (about 16 kilometer) segments, forecasting the condition and performance of each. They then developed a pavement management system model to create benchmarks for understanding the current level of emissions and the efficacy of different decarbonization strategies.

    This model considered factors such as annual traffic volume and surface conditions, budget constraints, regional variation in pavement treatment choices, and pavement deterioration. The researchers also used a life-cycle assessment to calculate annual state-level emissions from acquiring pavement construction materials, considering future energy supply and materials procurement.
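    In spirit, the accounting behind such a model reduces to material demand multiplied by an embodied-carbon factor, summed over the network and projected forward under each scenario. The sketch below illustrates only that bookkeeping; every quantity, factor, and reduction rate in it is a made-up placeholder, not data from the CSHub study.

    ```python
    # A purely illustrative sketch of scenario-based embodied-emissions accounting.
    # All quantities and factors below are placeholders, not the study's data.
    SEGMENTS = [
        # (material, tonnes of material needed this year)
        ("asphalt", 1200.0),
        ("concrete", 800.0),
        ("asphalt", 450.0),
    ]

    # Assumed embodied-carbon factors, in tonnes CO2e per tonne of material.
    BASE_FACTORS = {"asphalt": 0.05, "concrete": 0.12}

    # Assumed annual reduction in embodied carbon under each scenario.
    SCENARIOS = {"business_as_usual": 0.0, "projected": 0.01, "ambitious": 0.03}

    def annual_emissions(year_offset, scenario):
        """Embodied GHG of one year's material demand, given a decarbonization rate."""
        rate = SCENARIOS[scenario]
        total = 0.0
        for material, tonnes in SEGMENTS:
            factor = BASE_FACTORS[material] * (1.0 - rate) ** year_offset
            total += tonnes * factor
        return total

    for scenario in SCENARIOS:
        print(scenario, round(annual_emissions(year_offset=27, scenario=scenario), 1))
    ```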

    The team considered three scenarios for the U.S. pavement network: A business-as-usual scenario in which technology remains static, a projected improvement scenario aligned with stated industry and national goals, and an ambitious improvement scenario that intensifies or accelerates projected strategies to achieve carbon neutrality. 

    If no steps are taken to decarbonize pavement mixtures, the team projected that GHG emissions of construction materials used in the U.S. pavement network would increase by 19.5 percent by 2050. Under the projected scenario, there was an estimated 38 percent embodied impact reduction for concrete and 14 percent embodied impact reduction for asphalt by 2050.

    The keys to making the pavement network carbon neutral by 2050 lie in multiple places. Fully renewable energy sources should be used for pavement materials production, transportation, and other processes. The federal government must contribute to the development of these low-carbon energy sources and carbon capture technologies, as it would be nearly impossible to achieve carbon neutrality for pavements without them. 

    Additionally, increasing pavements’ recycled content and improving their design and production efficiency can lower GHG emissions to an extent. Still, neutralization is needed to achieve carbon neutrality.

    Making the right pavement construction and repair choices would also contribute to the carbon neutrality of the network. For instance, concrete pavements can offer GHG savings across the whole life cycle as they are stiffer and stay smoother for longer, meaning they require less maintenance and have a lesser impact on the fuel efficiency of vehicles. 

    Concrete pavements have other use-phase benefits, including a cooling effect from their intrinsically high albedo, meaning they reflect more sunlight than regular pavements. They can therefore help combat extreme heat and benefit the earth’s energy balance through negative radiative forcing, making albedo a potential neutralization mechanism.

    At the same time, a mix of fixes, including using concrete and asphalt in different contexts and proportions, could produce significant GHG savings for the pavement network; decision-makers must consider scenarios on a case-by-case basis to identify optimal solutions. 

    In addition, it may appear as though the GHG emissions of materials used in local roads are dwarfed by the emissions of interstate highway materials. However, the study found that the two road types have a similar impact. In fact, all road types contribute heavily to the total GHG emissions of pavement materials in general. Therefore, stakeholders at the federal, state, and local levels must be involved if our roads are to become carbon neutral. 

    The path to pavement network carbon-neutrality is, therefore, somewhat of a winding road. It demands regionally specific policies and widespread investment to help implement decarbonization solutions, just as renewable energy initiatives have been supported. Providing subsidies and covering the costs of premiums, too, are vital to avoid shifts in the market that would derail environmental savings.

    When planning for these shifts, we must recall that pavements have impacts not just in their production, but across their entire life cycle. As pavements are used, maintained, and eventually decommissioned, they have significant impacts on the surrounding environment.

    If we are to meet climate goals such as the Paris Agreement, which demands that we reach carbon-neutrality by 2050 to avoid the worst impacts of climate change, we — as well as industry and governmental stakeholders — must come together to take a hard look at the roads we use every day and work to reduce their life cycle emissions. 

    The study was published in the International Journal of Life Cycle Assessment. In addition to AzariJafari, the authors include Fengdi Guo of the MIT Department of Civil and Environmental Engineering; Jeremy Gregory, executive director of the MIT Climate and Sustainability Consortium; and Randolph Kirchain, director of the MIT CSHub.

  • A new way for quantum computing systems to keep their cool

    Heat causes errors in the qubits that are the building blocks of a quantum computer, so quantum systems are typically kept inside refrigerators that keep the temperature just above absolute zero (-459 degrees Fahrenheit).

    But quantum computers need to communicate with electronics outside the refrigerator, in a room-temperature environment. The metal cables that connect these electronics bring heat into the refrigerator, which has to work even harder and draw extra power to keep the system cold. Plus, more qubits require more cables, so the size of a quantum system is limited by how much heat the fridge can remove.

    To overcome this challenge, an interdisciplinary team of MIT researchers has developed a wireless communication system that enables a quantum computer to send and receive data to and from electronics outside the refrigerator using high-speed terahertz waves.

    A transceiver chip placed inside the fridge can receive and transmit data. Terahertz waves generated outside the refrigerator are beamed in through a glass window. Data encoded onto these waves can be received by the chip. The chip also acts as a mirror, encoding data from the qubits onto the terahertz waves it reflects back to their source.

    This reflection process also bounces back much of the power sent into the fridge, so the process generates only a minimal amount of heat. The contactless communication system consumes up to 10 times less power than systems with metal cables.

    “By having this reflection mode, you really save the power consumption inside the fridge and leave all those dirty jobs on the outside. While this is still just a preliminary prototype and we have some room to improve, even at this point, we have shown low power consumption inside the fridge that is already better than metallic cables. I believe this could be a way to build large-scale quantum systems,” says senior author Ruonan Han, an associate professor in the Department of Electrical Engineering and Computer Science (EECS) who leads the Terahertz Integrated Electronics Group.

    Han and his team, with expertise in terahertz waves and electronic devices, joined forces with associate professor Dirk Englund and the Quantum Photonics Laboratory team, who provided quantum engineering expertise and joined in conducting the cryogenic experiments.

    Joining Han and Englund on the paper are first author and EECS graduate student Jinchen Wang; Mohamed Ibrahim PhD ’21; Isaac Harris, a graduate student in the Quantum Photonics Laboratory; Nathan M. Monroe PhD ’22; Wasiq Khan PhD ’22; and Xiang Yi, a former postdoc who is now a professor at the South China University of Technology. The paper will be presented at the International Solid-State Circuits Conference.

    Tiny mirrors

    The researchers’ square transceiver chip, measuring about 2 millimeters on each side, is placed on a quantum computer inside the refrigerator, which is called a cryostat because it maintains cryogenic temperatures. These super-cold temperatures don’t damage the chip; in fact, they enable it to run more efficiently than it would at room temperature.

    The chip sends and receives data from a terahertz wave source outside the cryostat using a passive communication process known as backscatter, which involves reflections. An array of antennas on top of the chip, each only about 200 micrometers in size, acts as a set of tiny mirrors. These mirrors can be “turned on” to reflect waves or “turned off.”

    The terahertz wave generation source encodes data onto the waves it sends into the cryostat, and the antennas in their “off” state can receive those waves and the data they carry.

    When the tiny mirrors are turned on, they can be set so they either reflect a wave in its current form or invert its phase before bouncing it back. If the reflected wave has the same phase, that represents a 0, but if the phase is inverted, that represents a 1. Electronics outside the cryostat can interpret those binary signals to decode the data.
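    This phase-flip encoding amounts to binary phase-shift keying performed by reflection. The sketch below models that idea numerically; the tone, sample counts, and bit pattern are arbitrary toy values rather than the chip's actual terahertz parameters.

    ```python
    # A minimal numerical sketch of backscatter phase encoding: an unchanged phase
    # represents a 0, an inverted phase represents a 1. Values are illustrative.
    import numpy as np

    SAMPLES_PER_BIT = 64
    t = np.arange(SAMPLES_PER_BIT)
    carrier = np.cos(2 * np.pi * t / 16)          # incoming tone (toy scale)

    def reflect(bits):
        """Model the on-chip mirrors: multiply the carrier by +1 (same phase)
        for a 0 and by -1 (inverted phase) for a 1."""
        return np.concatenate([(-1 if b else 1) * carrier for b in bits])

    def demodulate(waveform):
        """Receiver outside the cryostat: correlate each bit slot with the carrier."""
        bits = []
        for i in range(0, len(waveform), SAMPLES_PER_BIT):
            corr = np.dot(waveform[i:i + SAMPLES_PER_BIT], carrier)
            bits.append(0 if corr > 0 else 1)
        return bits

    sent = [1, 0, 1, 1, 0]
    print(demodulate(reflect(sent)))   # -> [1, 0, 1, 1, 0]
    ```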

    “This backscatter technology is not new. For instance, RFIDs are based on backscatter communication. We borrow that idea and bring it into this very unique scenario, and I think this leads to a good combination of all these technologies,” Han says.

    Terahertz advantages

    The data are transmitted using high-speed terahertz waves, which are located on the electromagnetic spectrum between radio waves and infrared light.

    Because terahertz waves are much smaller than radio waves, the chip and its antennas can be smaller, too, which would make the device easier to manufacture at scale. Terahertz waves also have higher frequencies than radio waves, so they can transmit data much faster and move larger amounts of information.

    But because terahertz waves have lower frequencies than the light waves used in photonic systems, the terahertz waves carry less quantum noise, which leads to less interference with quantum processors.

    Importantly, the transceiver chip and terahertz link can be fully constructed with standard fabrication processes on a CMOS chip, so they can be integrated into many current systems and techniques.

    “CMOS compatibility is important. For example, one terahertz link could deliver a large amount of data and feed it to another cryo-CMOS controller, which can split the signal to control multiple qubits simultaneously, so we can reduce the quantity of RF cables dramatically. This is very promising,” Wang says.

    The researchers were able to transmit data at 4 gigabits per second with their prototype, but Han says the sky is nearly the limit when it comes to boosting that speed. The downlink of the contactless system posed about 10 times less heat load than a system with metallic cables, and the temperature of the cryostat fluctuated up to a few millidegrees during experiments.

    Now that the researchers have demonstrated this wireless technology, they want to improve the system’s speed and efficiency using special terahertz fibers, which are only a few hundred micrometers wide. Han’s group has shown that these plastic wires can transmit data at a rate of 100 gigabits per second and have much better thermal insulation than fatter, metal cables.

    The researchers also want to refine the design of their transceiver to improve scalability and continue boosting its energy efficiency. Generating terahertz waves requires a lot of power, but Han’s group is studying more efficient methods that utilize low-cost chips. Incorporating this technology into the system could make the device more cost-effective.

    The transceiver chip was fabricated through the Intel University Shuttle Program.

  • New chip for mobile devices knocks out unwanted signals

    Imagine sitting in a packed stadium for a pivotal football game — tens of thousands of people are using mobile phones at the same time, perhaps video chatting with friends or posting photos on social media. The radio frequency signals being sent and received by all these devices could cause interference, which slows device performance and drains batteries.

    Designing devices that can efficiently block unwanted signals is no easy task, especially as 5G networks become more universal and future generations of wireless communication systems are developed. Conventional techniques utilize many filters to block a range of signals, but filters are bulky, expensive, and drive up production costs.

    MIT researchers have developed a circuit architecture that targets and blocks unwanted signals at a receiver’s input without hurting its performance. They borrowed a technique from digital signal processing and used a few tricks that enable it to work effectively in a radio frequency system across a wide frequency range.

    Their receiver blocked even high-power unwanted signals without introducing more noise, or inaccuracies, into the signal processing operations. The chip, which performed about 40 times better than other wideband receivers at blocking a special type of interference, does not require any additional hardware or circuitry. This would make the chip easier to manufacture at scale.

    “We are interested in developing electronic circuits and systems that meet the demands of 5G and future generations of wireless communication systems. In designing our circuits, we look for inspirations from other domains, such as digital signal processing and applied electromagnetics. We believe in circuit elegance and simplicity and try to come up with multifunctional hardware that doesn’t require additional power and chip area,” says senior author Negar Reiskarimian, the X-Window Consortium Career Development Assistant Professor in the Department of Electrical Engineering and Computer Science (EECS) and a core faculty member of the Microsystems Technology Laboratories.

    Reiskarimian wrote the paper with EECS graduate students Soroush Araei, who is the lead author, and Shahabeddin Mohin. The work is being presented at the International Solid-State Circuits Conference.

    Harmonic interference

    The researchers developed the receiver chip using what is known as a mixer-first architecture. This means that when a radio frequency signal is received by the device, it is immediately converted to a lower-frequency signal before being passed on to the analog-to-digital converter to extract the digital bits that it is carrying. This approach enables the radio to cover a wide frequency range while filtering out interference located close to the operation frequency.

    While effective, mixer-first receivers are susceptible to a particular kind of interference known as harmonic interference. Harmonic interference comes from signals that have frequencies which are multiples of a device’s operating frequency. For instance, if a device operates at 1 gigahertz, then signals at 2 gigahertz, 3 gigahertz, 5 gigahertz, etc., will cause harmonic interference. These harmonics can be indistinguishable from the original signal during the frequency conversion process.    
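    The effect can be reproduced with a few lines of simulation: a switching mixer behaves like multiplication by a square wave, and a square wave contains energy at harmonics of the operating frequency, so tones near those harmonics also land at baseband. The sketch below assumes an ideal 50-percent-duty square-wave oscillator, which responds mainly to odd harmonics; real mixer configurations can also be sensitive to even harmonics. All frequencies are arbitrary toy values.

    ```python
    # Toy demonstration of harmonic downconversion in a switching (mixer-first) receiver.
    import numpy as np

    fs = 100_000                      # sample rate of the simulation
    f_lo = 1_000                      # receiver "operating frequency"
    t = np.arange(0, 0.05, 1 / fs)
    lo = np.sign(np.cos(2 * np.pi * f_lo * t))       # square-wave local oscillator

    def baseband_power(f_signal):
        """Mix a tone at f_signal with the square-wave LO and measure the power
        left near DC after a crude one-period moving-average low-pass filter."""
        mixed = np.cos(2 * np.pi * f_signal * t) * lo
        window = int(fs / f_lo)
        lowpassed = np.convolve(mixed, np.ones(window) / window, mode="same")
        return float(np.mean(lowpassed ** 2))

    for f in (f_lo, 2 * f_lo, 3 * f_lo, 5 * f_lo):
        # the desired 1 kHz tone and its 3rd/5th harmonics all leave energy at DC;
        # with this idealized LO the 2 kHz (even) harmonic largely cancels
        print(f"{f} Hz tone -> baseband power {baseband_power(f):.4f}")
    ```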

    “A lot of other wideband receivers don’t do anything about the harmonics until it is time to see what the bits mean. They do it later in the chain, but this doesn’t work well if you have high-power signals at the harmonic frequencies. Instead, we want to remove harmonics as soon as possible to avoid losing information,” Araei says.

    To do this, the researchers were inspired by a concept from digital signal processing known as block digital filtering. They adapted this technique to the analog domain using capacitors, which hold electric charges. The capacitors are charged up at different times as the signal is received, then they are switched off so that charge can be held and used later for processing the data.  

    These capacitors can be connected to each other in various ways, including connecting them in parallel, which enables the capacitors to exchange the stored charges. While this technique can target harmonic interference, the process results in significant signal loss. Stacking capacitors is another possibility, but this method alone is not enough to provide harmonic resilience.
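    Both operations follow directly from charge conservation, as in the minimal sketch below (the capacitor values and voltages are arbitrary): connecting capacitors in parallel produces a capacitance-weighted average of their sampled voltages, the filtering-style operation described above, while stacking them in series adds the voltages.

    ```python
    # A minimal sketch of the two capacitor operations, using ideal circuit relations.
    def charge_share(caps_and_volts):
        """Connect capacitors in parallel: total charge is conserved, so the shared
        voltage is the capacitance-weighted average of the individual voltages."""
        total_charge = sum(c * v for c, v in caps_and_volts)
        total_cap = sum(c for c, _ in caps_and_volts)
        return total_charge / total_cap

    def stack(volts):
        """Connect charged capacitors in series ("stacking"): voltages simply add."""
        return sum(volts)

    # Three equal capacitors sampled at different times hold different voltages.
    samples = [(1e-12, 0.30), (1e-12, 0.50), (1e-12, 0.10)]
    print(charge_share(samples))             # 0.3 -> an averaging (filtering) operation
    print(stack([v for _, v in samples]))    # 0.9 -> stacking adds the sampled voltages
    ```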

    Most radio receivers already use switched-capacitor circuits to perform frequency conversion. This frequency conversion circuitry can be combined with block filtering to target harmonic interference.

    A precise arrangement

    The researchers found that arranging capacitors in a specific layout, by connecting some of them in series and then performing charge sharing, enabled the device to block harmonic interference without losing any information.

    “People have used these techniques, charge sharing and capacitor stacking, separately before, but never together. We found that both techniques must be done simultaneously to get this benefit. Moreover, we have found out how to do this in a passive way within the mixer without using any additional hardware while maintaining signal integrity and keeping the costs down,” he says.

    They tested the device by simultaneously sending a desired signal and harmonic interference. Their chip was able to block harmonic signals effectively with only a slight reduction in signal strength. It was able to handle signals that were 40 times more powerful than previous, state-of-the-art wideband receivers.

  • MIT community members elected to the National Academy of Engineering for 2023

    Seven MIT researchers are among the 106 new members and 18 international members elected to the National Academy of Engineering (NAE) this week. Fourteen additional MIT alumni, including one member of the MIT Corporation, were also elected as new members.

    One of the highest professional distinctions for engineers, membership in the NAE is given to individuals who have made outstanding contributions to “engineering research, practice, or education, including, where appropriate, significant contributions to the engineering literature” and to “the pioneering of new and developing fields of technology, making major advancements in traditional fields of engineering, or developing/implementing innovative approaches to engineering education.”

    The seven MIT researchers elected this year include:

    Regina Barzilay, the School of Engineering Distinguished Professor for AI and Health in the Department of Electrical Engineering and Computer Science, principal investigator at the Computer Science and Artificial Intelligence Laboratory, and faculty lead for the MIT Abdul Latif Jameel Clinic for Machine Learning in Health, for machine learning models that understand structures in text, molecules, and medical images.

    Markus J. Buehler, the Jerry McAfee (1940) Professor in Engineering from the Department of Civil and Environmental Engineering, for implementing the use of nanomechanics to model and design fracture-resistant bioinspired materials.

    Elfatih A.B. Eltahir SM ’93, ScD ’93, the H.M. King Bhumibol Professor in the Department of Civil and Environmental Engineering, for advancing understanding of how climate and land use impact water availability, environmental and human health, and vector-borne diseases.

    Neil Gershenfeld, director of the Center for Bits and Atoms, for eliminating boundaries between digital and physical worlds, from quantum computing to digital materials to the internet of things.

    Roger D. Kamm SM ’73, PhD ’77, the Cecil and Ida Green Distinguished Professor of Biological and Mechanical Engineering, for contributions to the understanding of mechanics in biology and medicine, and leadership in biomechanics.

    David W. Miller ’82, SM ’85, ScD ’88, the Jerome C. Hunsaker Professor in the Department of Aeronautics and Astronautics, for contributions in control technology for space-based telescope design, and leadership in cross-agency guidance of space technology.

    David Simchi-Levi, professor of civil and environmental engineering, core faculty member in the Institute for Data, Systems, and Society, and principal investigator at the Laboratory for Information and Decision Systems, for contributions using optimization and stochastic modeling to enhance supply chain management and operations.

    Fariborz Maseeh ScD ’90, life member of the MIT Corporation and member of the School of Engineering Dean’s Advisory Council, was also elected as a member for leadership and advances in efficient design, development, and manufacturing of microelectromechanical systems, and for empowering engineering talent through public service.

    Thirteen additional alumni were elected to the National Academy of Engineering this year. They are: Mark George Allen SM ’86, PhD ’89; Shorya Awtar ScD ’04; Inderjit Chopra ScD ’77; David Huang ’85, SM ’89, PhD ’93; Eva Lerner-Lam SM ’78; David F. Merrion SM ’59; Virginia Norwood ’47; Martin Gerard Plys ’80, SM ’81, ScD ’84; Mark Prausnitz PhD ’94; Anil Kumar Sachdev ScD ’77; Christopher Scholz PhD ’67; Melody Ann Swartz PhD ’98; and Elias Towe ’80, SM ’81, PhD ’87.

    “I am delighted that seven members of MIT’s faculty and many members of the wider MIT community were elected to the National Academy of Engineering this year,” says Anantha Chandrakasan, the dean of the MIT School of Engineering and the Vannevar Bush Professor of Electrical Engineering and Computer Science. “My warmest congratulations on this recognition of their many contributions to engineering research and education.”

    Including this year’s inductees, 156 members of the National Academy of Engineering are current or retired members of the MIT faculty and staff, or members of the MIT Corporation.

  • Efficient technique improves machine-learning models’ reliability

    Powerful machine-learning models are being used to help people tackle tough problems such as identifying disease in medical images or detecting road obstacles for autonomous vehicles. But machine-learning models can make mistakes, so in high-stakes settings it’s critical that humans know when to trust a model’s predictions.

    Uncertainty quantification is one tool that improves a model’s reliability; the model produces a score along with the prediction that expresses a confidence level that the prediction is correct. While uncertainty quantification can be useful, existing methods typically require retraining the entire model to give it that ability. Training involves showing a model millions of examples so it can learn a task. Retraining then requires millions of new data inputs, which can be expensive and difficult to obtain, and also uses huge amounts of computing resources.

    Researchers at MIT and the MIT-IBM Watson AI Lab have now developed a technique that enables a model to perform more effective uncertainty quantification, while using far fewer computing resources than other methods, and no additional data. Their technique, which does not require a user to retrain or modify a model, is flexible enough for many applications.

    The technique involves creating a simpler companion model that assists the original machine-learning model in estimating uncertainty. This smaller model is designed to identify different types of uncertainty, which can help researchers drill down on the root cause of inaccurate predictions.

    “Uncertainty quantification is essential for both developers and users of machine-learning models. Developers can utilize uncertainty measurements to help develop more robust models, while for users, it can add another layer of trust and reliability when deploying models in the real world. Our work leads to a more flexible and practical solution for uncertainty quantification,” says Maohao Shen, an electrical engineering and computer science graduate student and lead author of a paper on this technique.

    Shen wrote the paper with Yuheng Bu, a former postdoc in the Research Laboratory of Electronics (RLE) who is now an assistant professor at the University of Florida; Prasanna Sattigeri, Soumya Ghosh, and Subhro Das, research staff members at the MIT-IBM Watson AI Lab; and senior author Gregory Wornell, the Sumitomo Professor in Engineering who leads the Signals, Information, and Algorithms Laboratory in RLE and is a member of the MIT-IBM Watson AI Lab. The research will be presented at the AAAI Conference on Artificial Intelligence.

    Quantifying uncertainty

    In uncertainty quantification, a machine-learning model generates a numerical score with each output to reflect its confidence in that prediction’s accuracy. Incorporating uncertainty quantification by building a new model from scratch or retraining an existing model typically requires a large amount of data and expensive computation, which is often impractical. What’s more, existing methods sometimes have the unintended consequence of degrading the quality of the model’s predictions.

    The MIT and MIT-IBM Watson AI Lab researchers have thus zeroed in on the following problem: Given a pretrained model, how can they enable it to perform effective uncertainty quantification?

    They solve this by creating a smaller and simpler model, known as a metamodel, that attaches to the larger, pretrained model and uses the features the larger model has already learned to help it make uncertainty quantification assessments.

    “The metamodel can be applied to any pretrained model. It is better to have access to the internals of the model, because we can get much more information about the base model, but it will also work if you just have a final output. It can still predict a confidence score,” Sattigeri says.

    They design the metamodel to produce the uncertainty quantification output using a technique that includes both types of uncertainty: data uncertainty and model uncertainty. Data uncertainty is caused by corrupted data or inaccurate labels and can only be reduced by fixing the dataset or gathering new data. In model uncertainty, the model is not sure how to explain the newly observed data and might make incorrect predictions, most likely because it hasn’t seen enough similar training examples. This issue is an especially challenging but common problem when models are deployed. In real-world settings, they often encounter data that are different from the training dataset.
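    As a rough illustration of the metamodel concept, and not the architecture or training recipe from the paper, the sketch below freezes a small base classifier and trains a companion model to predict, from the base model's inputs and output probabilities (standing in here for learned features), whether the base prediction is correct; that probability then serves as a confidence score.

    ```python
    # An illustrative metamodel sketch: a frozen base model plus a small companion
    # model that predicts whether the base prediction is correct.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    # 1) "Pretrained" base model: trained once and then left frozen.
    base = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    base.fit(X_train, y_train)

    # 2) Metamodel inputs: the raw features plus the base model's output probabilities.
    def meta_features(X_batch):
        return np.hstack([X_batch, base.predict_proba(X_batch)])

    # Train the metamodel to predict correctness of the base model. In practice the
    # metamodel would be fit on data held out from whatever it is later evaluated on.
    base_correct = (base.predict(X_val) == y_val).astype(int)
    meta = LogisticRegression(max_iter=1000)
    meta.fit(meta_features(X_val), base_correct)

    # 3) At deployment, the metamodel's probability acts as a confidence score.
    confidence = meta.predict_proba(meta_features(X_val[:5]))[:, 1]
    print(np.round(confidence, 3))
    ```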

    “Has the reliability of your decisions changed when you use the model in a new setting? You want some way to have confidence in whether it is working in this new regime or whether you need to collect training data for this particular new setting,” Wornell says.

    Validating the quantification

    Once a model produces an uncertainty quantification score, the user still needs some assurance that the score itself is accurate. Researchers often validate accuracy by creating a smaller dataset, held out from the original training data, and then testing the model on the held-out data. However, this technique does not work well in measuring uncertainty quantification because the model can achieve good prediction accuracy while still being over-confident, Shen says.

    They created a new validation technique by adding noise to the data in the validation set — this noisy data is more like out-of-distribution data that can cause model uncertainty. The researchers use this noisy dataset to evaluate uncertainty quantifications.
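    Continuing the illustrative sketch above, that validation idea can be mimicked by perturbing the held-out data and comparing confidence scores on clean versus noisy inputs; the noise level below is an arbitrary choice for demonstration.

    ```python
    # Continues the sketch above: noisy copies of the validation data stand in for
    # out-of-distribution inputs when checking the uncertainty scores.
    rng = np.random.default_rng(0)
    X_noisy = X_val + rng.normal(scale=2.0, size=X_val.shape)

    clean_conf = meta.predict_proba(meta_features(X_val))[:, 1].mean()
    noisy_conf = meta.predict_proba(meta_features(X_noisy))[:, 1].mean()
    print(f"mean confidence on clean data: {clean_conf:.3f}")
    print(f"mean confidence on noisy data: {noisy_conf:.3f}")
    ```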

    They tested their approach by seeing how well a metamodel could capture different types of uncertainty for various downstream tasks, including out-of-distribution detection and misclassification detection. Their method not only outperformed all the baselines in each downstream task but also required less training time to achieve those results.

    This technique could help researchers enable more machine-learning models to effectively perform uncertainty quantification, ultimately aiding users in making better decisions about when to trust predictions.

    Moving forward, the researchers want to adapt their technique for newer classes of models, such as large language models that have a different structure than a traditional neural network, Shen says.

    The work was funded, in part, by the MIT-IBM Watson AI Lab and the U.S. National Science Foundation.

  • Helping companies deploy AI models more responsibly

    Companies today are incorporating artificial intelligence into every corner of their business. The trend is expected to continue until machine-learning models are incorporated into most of the products and services we interact with every day.

    As those models become a bigger part of our lives, ensuring their integrity becomes more important. That’s the mission of Verta, a startup that spun out of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL).

    Verta’s platform helps companies deploy, monitor, and manage machine-learning models safely and at scale. Data scientists and engineers can use Verta’s tools to track different versions of models, audit them for bias, test them before deployment, and monitor their performance in the real world.

    “Everything we do is to enable more products to be built with AI, and to do that safely,” Verta founder and CEO Manasi Vartak SM ’14, PhD ’18 says. “We’re already seeing with ChatGPT how AI can be used to generate data, artefacts — you name it — that look correct but aren’t correct. There needs to be more governance and control in how AI is being used, particularly for enterprises providing AI solutions.”

    Verta is currently working with large companies in health care, finance, and insurance to help them understand and audit their models’ recommendations and predictions. It’s also working with a number of high-growth tech companies looking to speed up deployment of new, AI-enabled solutions while ensuring those solutions are used appropriately.

    Vartak says the company has been able to decrease the time it takes customers to deploy AI models by orders of magnitude while ensuring those models are explainable and fair — an especially important factor for companies in highly regulated industries.

    Health care companies, for example, can use Verta to improve AI-powered patient monitoring and treatment recommendations. Such systems need to be thoroughly vetted for errors and biases before they’re used on patients.

    “Whether it’s bias or fairness or explainability, it goes back to our philosophy on model governance and management,” Vartak says. “We think of it like a preflight checklist: Before an airplane takes off, there’s a set of checks you need to do before you get your airplane off the ground. It’s similar with AI models. You need to make sure you’ve done your bias checks, you need to make sure there’s some level of explainability, you need to make sure your model is reproducible. We help with all of that.”

    From project to product

    Before coming to MIT, Vartak worked as a data scientist for a social media company. In one project, after spending weeks tuning machine-learning models that curated content to show in people’s feeds, she learned an ex-employee had already done the same thing. Unfortunately, there was no record of what they did or how it affected the models.

    For her PhD at MIT, Vartak decided to build tools to help data scientists develop, test, and iterate on machine-learning models. Working in CSAIL’s Database Group, Vartak recruited a team of graduate students and participants in MIT’s Undergraduate Research Opportunities Program (UROP).

    “Verta would not exist without my work at MIT and MIT’s ecosystem,” Vartak says. “MIT brings together people on the cutting edge of tech and helps us build the next generation of tools.”

    The team worked with data scientists in the CSAIL Alliances program to decide what features to build and iterated based on feedback from those early adopters. Vartak says the resulting project, named ModelDB, was the first open-source model management system.

    Vartak also took several business classes at the MIT Sloan School of Management during her PhD and worked with classmates on startups that recommended clothing and tracked health, spending countless hours in the Martin Trust Center for MIT Entrepreneurship and participating in the center’s delta v summer accelerator.

    “What MIT lets you do is take risks and fail in a safe environment,” Vartak says. “MIT afforded me those forays into entrepreneurship and showed me how to go about building products and finding first customers, so by the time Verta came around I had done it on a smaller scale.”

    ModelDB helped data scientists train and track models, but Vartak quickly saw the stakes were higher once models were deployed at scale. At that point, trying to improve (or accidentally breaking) models can have major implications for companies and society. That insight led Vartak to begin building Verta.

    “At Verta, we help manage models, help run models, and make sure they’re working as expected, which we call model monitoring,” Vartak explains. “All of those pieces have their roots back to MIT and my thesis work. Verta really evolved from my PhD project at MIT.”

    Verta’s platform helps companies deploy models more quickly, ensure they continue working as intended over time, and manage the models for compliance and governance. Data scientists can use Verta to track different versions of models and understand how they were built, answering questions like how data were used and which explainability or bias checks were run. They can also vet them by running them through deployment checklists and security scans.

    “Verta’s platform takes the data science model and adds half a dozen layers to it to transform it into something you can use to power, say, an entire recommendation system on your website,” Vartak says. “That includes performance optimizations, scaling, and cycle time, which is how quickly you can take a model and turn it into a valuable product, as well as governance.”

    Supporting the AI wave

    Vartak says large companies often use thousands of different models that influence nearly every part of their operations.

    “An insurance company, for example, will use models for everything from underwriting to claims, back-office processing, marketing, and sales,” Vartak says. “So, the diversity of models is really high, there’s a large volume of them, and the level of scrutiny and compliance companies need around these models are very high. They need to know things like: Did you use the data you were supposed to use? Who were the people who vetted it? Did you run explainability checks? Did you run bias checks?”

    Vartak says companies that don’t adopt AI will be left behind. The companies that ride AI to success, meanwhile, will need well-defined processes in place to manage their ever-growing list of models.

    “In the next 10 years, every device we interact with is going to have intelligence built in, whether it’s a toaster or your email programs, and it’s going to make your life much, much easier,” Vartak says. “What’s going to enable that intelligence are better models and software, like Verta, that help you integrate AI into all of these applications very quickly.”

  • 3 Questions: Leo Anthony Celi on ChatGPT and medicine

    Launched in November 2022, ChatGPT is a chatbot that can not only engage in human-like conversation, but also provide accurate answers to questions in a wide range of knowledge domains. The chatbot, created by the firm OpenAI, is based on a family of “large language models” — algorithms that can recognize, predict, and generate text based on patterns they identify in datasets containing hundreds of millions of words.

    In a study appearing in PLOS Digital Health this week, researchers report that ChatGPT performed at or near the passing threshold of the U.S. Medical Licensing Exam (USMLE) — a comprehensive, three-part exam that doctors must pass before practicing medicine in the United States. In an editorial accompanying the paper, Leo Anthony Celi, a principal research scientist at MIT’s Institute for Medical Engineering and Science, a practicing physician at Beth Israel Deaconess Medical Center, and an associate professor at Harvard Medical School, and his co-authors argue that ChatGPT’s success on this exam should be a wake-up call for the medical community.

    Q: What do you think the success of ChatGPT on the USMLE reveals about the nature of medical education and the evaluation of students?

    A: The framing of medical knowledge as something that can be encapsulated into multiple choice questions creates a cognitive framing of false certainty. Medical knowledge is often taught as fixed model representations of health and disease. Treatment effects are presented as stable over time despite constantly changing practice patterns. Mechanistic models are passed on from teachers to students with little emphasis on how robustly those models were derived, the uncertainties that persist around them, and how they must be recalibrated to reflect advances worthy of incorporation into practice. 

    ChatGPT passed an examination that rewards memorizing the components of a system rather than analyzing how it works, how it fails, how it was created, and how it is maintained. Its success demonstrates some of the shortcomings in how we train and evaluate medical students. Critical thinking requires appreciation that ground truths in medicine continually shift, and more importantly, an understanding of how and why they shift.

    Q: What steps do you think the medical community should take to modify how students are taught and evaluated?  

    A: Learning is about leveraging the current body of knowledge, understanding its gaps, and seeking to fill those gaps. It requires being comfortable with and being able to probe the uncertainties. We fail as teachers by not teaching students how to understand the gaps in the current body of knowledge. We fail them when we preach certainty over curiosity, and hubris over humility.  

    Medical education also requires being aware of the biases in the way medical knowledge is created and validated. These biases are best addressed by optimizing the cognitive diversity within the community. More than ever, there is a need to inspire cross-disciplinary collaborative learning and problem-solving. Medical students need data science skills that will allow every clinician to contribute to, continually assess, and recalibrate medical knowledge.

    Q: Do you see any upside to ChatGPT’s success in this exam? Are there beneficial ways that ChatGPT and other forms of AI can contribute to the practice of medicine? 

    A: There is no question that large language models (LLMs) such as ChatGPT are very powerful tools in sifting through content beyond the capabilities of experts, or even groups of experts, and extracting knowledge. However, we will need to address the problem of data bias before we can leverage LLMs and other artificial intelligence technologies. The body of knowledge that LLMs train on, both medical and beyond, is dominated by content and research from well-funded institutions in high-income countries. It is not representative of most of the world.

    We have also learned that even mechanistic models of health and disease may be biased. These inputs are fed to encoders and transformers that are oblivious to these biases. Ground truths in medicine are continuously shifting, and currently, there is no way to determine when ground truths have drifted. LLMs do not evaluate the quality and the bias of the content they are being trained on. Neither do they provide the level of uncertainty around their output. But the perfect should not be the enemy of the good. There is tremendous opportunity to improve the way health care providers currently make clinical decisions, which we know are tainted with unconscious bias. I have no doubt AI will deliver its promise once we have optimized the data input. More