More stories

  • in

    This tiny chip can safeguard user data while enabling efficient computing on a smartphone

    Health-monitoring apps can help people manage chronic diseases or stay on track with fitness goals, using nothing more than a smartphone. However, these apps can be slow and energy-inefficient because the vast machine-learning models that power them must be shuttled between a smartphone and a central memory server.

    Engineers often speed things up using hardware that reduces the need to move so much data back and forth. While these machine-learning accelerators can streamline computation, they are susceptible to attackers who can steal secret information.

    To reduce this vulnerability, researchers from MIT and the MIT-IBM Watson AI Lab created a machine-learning accelerator that is resistant to the two most common types of attacks. Their chip can keep a user’s health records, financial information, or other sensitive data private while still enabling huge AI models to run efficiently on devices.

    The team developed several optimizations that enable strong security while only slightly slowing the device. Moreover, the added security does not impact the accuracy of computations. This machine-learning accelerator could be particularly beneficial for demanding AI applications like augmented and virtual reality or autonomous driving.

    While implementing the chip would make a device slightly more expensive and less energy-efficient, that is sometimes a worthwhile price to pay for security, says lead author Maitreyi Ashok, an electrical engineering and computer science (EECS) graduate student at MIT.

    “It is important to design with security in mind from the ground up. If you are trying to add even a minimal amount of security after a system has been designed, it is prohibitively expensive. We were able to effectively balance a lot of these tradeoffs during the design phase,” says Ashok.

    Her co-authors include Saurav Maji, an EECS graduate student; Xin Zhang and John Cohn of the MIT-IBM Watson AI Lab; and senior author Anantha Chandrakasan, MIT’s chief innovation and strategy officer, dean of the School of Engineering, and the Vannevar Bush Professor of EECS. The research will be presented at the IEEE Custom Integrated Circuits Conference.

    Side-channel susceptibility

    The researchers targeted a type of machine-learning accelerator called digital in-memory compute. A digital IMC chip performs computations inside a device’s memory, where pieces of a machine-learning model are stored after being moved over from a central server.

    The entire model is too big to store on the device, but by breaking it into pieces and reusing those pieces as much as possible, IMC chips reduce the amount of data that must be moved back and forth.

    But IMC chips can be susceptible to hackers. In a side-channel attack, a hacker monitors the chip’s power consumption and uses statistical techniques to reverse-engineer data as the chip computes. In a bus-probing attack, the hacker can steal bits of the model and dataset by probing the communication between the accelerator and the off-chip memory.

    Digital IMC speeds computation by performing millions of operations at once, but this complexity makes it tough to prevent attacks using traditional security measures, Ashok says.

    She and her collaborators took a three-pronged approach to blocking side-channel and bus-probing attacks.

    First, they employed a security measure where data in the IMC are split into random pieces. For instance, a bit zero might be split into three bits that still equal zero after a logical operation. The IMC never computes with all pieces in the same operation, so a side-channel attack could never reconstruct the real information.
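
    The splitting idea can be illustrated with a textbook masking scheme. The sketch below is not the chip's circuit: it assumes the "logical operation" is XOR, uses three shares, and draws the shares from a software random source. All function names are hypothetical.

```python
import secrets

def split_bit(bit: int, n_shares: int = 3) -> list[int]:
    """Split one bit into n_shares bits whose XOR equals the original bit."""
    # Assumption: shares are drawn from a software RNG for illustration only.
    shares = [secrets.randbits(1) for _ in range(n_shares - 1)]
    last = bit
    for s in shares:
        last ^= s
    return shares + [last]

def recombine(shares: list[int]) -> int:
    """XOR all shares together to recover the original bit."""
    out = 0
    for s in shares:
        out ^= s
    return out

def masked_xor(a_shares: list[int], b_shares: list[int]) -> list[int]:
    """XOR two shared bits share by share; no single step sees a whole secret."""
    return [a ^ b for a, b in zip(a_shares, b_shares)]

zero = split_bit(0)
one = split_bit(1)
print("shares of 0:", zero, "recombined:", recombine(zero))
print("0 XOR 1 computed on shares:", recombine(masked_xor(zero, one)))
```

    Any single share looks like a coin flip, so observing one share at a time reveals nothing about the underlying bit in this simplified model.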

    But for this technique to work, random bits must be added to split the data. Because digital IMC performs millions of operations at once, generating so many random bits would involve too much computing. For their chip, the researchers found a way to simplify computations, making it easier to effectively split data while eliminating the need for random bits.

    Second, they prevented bus-probing attacks using a lightweight cipher that encrypts the model stored in off-chip memory. This lightweight cipher only requires simple computations. In addition, they only decrypted the pieces of the model stored on the chip when necessary.

    Third, to improve security, they generated the key that decrypts the cipher directly on the chip, rather than moving it back and forth with the model. They generated this unique key from random variations in the chip that are introduced during manufacturing, using what is known as a physically unclonable function.

    “Maybe one wire is going to be a little bit thicker than another. We can use these variations to get zeros and ones out of a circuit. For every chip, we can get a random key that should be consistent because these random properties shouldn’t change significantly over time,” Ashok explains.

    They reused the memory cells on the chip, leveraging the imperfections in these cells to generate the key. This requires less computation than generating a key from scratch.
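
    A toy software model can show why manufacturing variation yields a key that is stable on one chip but different across chips. Everything here is an assumption made for illustration: cells are modeled as fixed Gaussian offsets, readout noise is Gaussian, and majority voting is used to stabilize bits. The article does not describe the chip's actual key-extraction circuit.

```python
import random

def make_chip(n_cells: int, seed: int) -> list[float]:
    """Model a chip as per-cell offsets that are fixed at manufacturing time."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_cells)]

def read_bits(chip: list[float], noise: float, rng: random.Random) -> list[int]:
    """One noisy readout: a cell reads 1 if its offset plus noise is positive."""
    return [1 if offset + rng.gauss(0.0, noise) > 0 else 0 for offset in chip]

def derive_key(chip: list[float], reads: int = 15, noise: float = 0.05,
               seed: int = 0) -> list[int]:
    """Majority-vote several noisy readouts to get a stable key."""
    rng = random.Random(seed)
    ones = [0] * len(chip)
    for _ in range(reads):
        for i, bit in enumerate(read_bits(chip, noise, rng)):
            ones[i] += bit
    return [1 if 2 * count > reads else 0 for count in ones]

def hamming(a: list[int], b: list[int]) -> int:
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

chip_a = make_chip(128, seed=1)
chip_b = make_chip(128, seed=2)
key_a_first = derive_key(chip_a, seed=10)   # first power-up
key_a_again = derive_key(chip_a, seed=11)   # later power-up, different noise
key_b = derive_key(chip_b, seed=10)
print("same chip, two power-ups:", hamming(key_a_first, key_a_again), "bits differ")
print("two different chips:", hamming(key_a_first, key_b), "bits differ")
```

    In this model a few cells with offsets near zero can still flip between readouts, which is why practical designs typically add error correction on top of the raw bits.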

    “As security has become a critical issue in the design of edge devices, there is a need to develop a complete system stack focusing on secure operation. This work focuses on security for machine-learning workloads and describes a digital processor that uses cross-cutting optimization. It incorporates encrypted data access between memory and processor, approaches to preventing side-channel attacks using randomization, and exploiting variability to generate unique codes. Such designs are going to be critical in future mobile devices,” says Chandrakasan.

    Safety testing

    To test their chip, the researchers took on the role of hackers and tried to steal secret information using side-channel and bus-probing attacks.

    Even after making millions of attempts, they couldn’t reconstruct any real information or extract pieces of the model or dataset. The cipher also remained unbreakable. By contrast, it took only about 5,000 samples to steal information from an unprotected chip.

    The addition of security did reduce the energy efficiency of the accelerator, and it also required a larger chip area, which would make it more expensive to fabricate.

    The team is planning to explore methods that could reduce the energy consumption and size of their chip in the future, which would make it easier to implement at scale.

    “As it becomes too expensive, it becomes harder to convince someone that security is critical. Future work could explore these tradeoffs. Maybe we could make it a little less secure but easier to implement and less expensive,” Ashok says.

    The research is funded, in part, by the MIT-IBM Watson AI Lab, the National Science Foundation, and a Mathworks Engineering Fellowship.

  • in

    Startup accelerates progress toward light-speed computing

    Our ability to cram ever-smaller transistors onto a chip has enabled today’s age of ubiquitous computing. But that approach is finally running into limits, with some experts declaring an end to Moore’s Law and a related principle known as Dennard scaling.

    Those developments couldn’t be coming at a worse time. Demand for computing power has skyrocketed in recent years thanks in large part to the rise of artificial intelligence, and it shows no signs of slowing down.

    Now Lightmatter, a company founded by three MIT alumni, is continuing the remarkable progress of computing by rethinking the lifeblood of the chip. Instead of relying solely on electricity, the company also uses light for data processing and transport. The company’s first two products, a chip specializing in artificial intelligence operations and an interconnect that facilitates data transfer between chips, use both photons and electrons to drive more efficient operations.

    “The two problems we are solving are ‘How do chips talk?’ and ‘How do you do these [AI] calculations?’” Lightmatter co-founder and CEO Nicholas Harris PhD ’17 says. “With our first two products, Envise and Passage, we’re addressing both of those questions.”

    In a nod to the size of the problem and the demand for AI, Lightmatter raised just north of $300 million in 2023 at a valuation of $1.2 billion. Now the company is demonstrating its technology with some of the largest technology companies in the world in hopes of reducing the massive energy demand of data centers and AI models.

    “We’re going to enable platforms on top of our interconnect technology that are made up of hundreds of thousands of next-generation compute units,” Harris says. “That simply wouldn’t be possible without the technology that we’re building.”

    From idea to $100K

    Prior to MIT, Harris worked at the semiconductor company Micron Technology, where he studied the fundamental devices behind integrated chips. The experience made him see how the traditional approach for improving computer performance — cramming more transistors onto each chip — was hitting its limits.

    “I saw how the roadmap for computing was slowing, and I wanted to figure out how I could continue it,” Harris says. “What approaches can augment computers? Quantum computing and photonics were two of those pathways.”

    Harris came to MIT to work on photonic quantum computing for his PhD under Dirk Englund, an associate professor in the Department of Electrical Engineering and Computer Science. As part of that work, he built silicon-based integrated photonic chips that could send and process information using light instead of electricity.

    The work led to dozens of patents and more than 80 research papers in prestigious journals like Nature. But another technology also caught Harris’s attention at MIT.

    “I remember walking down the hall and seeing students just piling out of these auditorium-sized classrooms, watching relayed live videos of lectures to see professors teach deep learning,” Harris recalls, referring to the artificial intelligence technique. “Everybody on campus knew that deep learning was going to be a huge deal, so I started learning more about it, and we realized that the systems I was building for photonic quantum computing could actually be leveraged to do deep learning.”

    Harris had planned to become a professor after his PhD, but he realized he could attract more funding and innovate more quickly through a startup, so he teamed up with Darius Bunandar PhD ’18, who was also studying in Englund’s lab, and Thomas Graham MBA ’18. The co-founders successfully launched into the startup world by winning the 2017 MIT $100K Entrepreneurship Competition.

    Seeing the light

    Lightmatter’s Envise chip takes the part of computing that electrons do well, like memory, and combines it with what light does well, like performing the massive matrix multiplications of deep-learning models.

    “With photonics, you can perform multiple calculations at the same time because the data is coming in on different colors of light,” Harris explains. “In one color, you could have a photo of a dog. In another color, you could have a photo of a cat. In another color, maybe a tree, and you could have all three of those operations going through the same optical computing unit, this matrix accelerator, at the same time. That drives up operations per area, and it reuses the hardware that’s there, driving up energy efficiency.”

    Passage takes advantage of light’s latency and bandwidth advantages to link processors in a manner similar to how fiber optic cables use light to send data over long distances. It also enables chips as big as entire wafers to act as a single processor. Sending information between chips is central to running the massive server farms that power cloud computing and run AI systems like ChatGPT.

    Both products are designed to bring energy efficiencies to computing, which Harris says are needed to keep up with rising demand without bringing huge increases in power consumption.

    “By 2040, some predict that around 80 percent of all energy usage on the planet will be devoted to data centers and computing, and AI is going to be a huge fraction of that,” Harris says. “When you look at computing deployments for training these large AI models, they’re headed toward using hundreds of megawatts. Their power usage is on the scale of cities.”

    Lightmatter is currently working with chipmakers and cloud service providers for mass deployment. Harris notes that because the company’s equipment runs on silicon, it can be produced by existing semiconductor fabrication facilities without massive changes in process.

    The ambitious plans are designed to open up a new path forward for computing that would have huge implications for the environment and economy.

    “We’re going to continue looking at all of the pieces of computers to figure out where light can accelerate them, make them more energy efficient, and faster, and we’re going to continue to replace those parts,” Harris says. “Right now, we’re focused on interconnect with Passage and on compute with Envise. But over time, we’re going to build out the next generation of computers, and it’s all going to be centered around light.”

  • in

    Self-powered sensor automatically harvests magnetic energy

    MIT researchers have developed a battery-free, self-powered sensor that can harvest energy from its environment.

    Because it requires no battery that must be recharged or replaced, and because it requires no special wiring, such a sensor could be embedded in a hard-to-reach place, like inside the inner workings of a ship’s engine. There, it could automatically gather data on the machine’s power consumption and operations for long periods of time.

    The researchers built a temperature-sensing device that harvests energy from the magnetic field generated in the open air around a wire. One could simply clip the sensor around a wire that carries electricity — perhaps the wire that powers a motor — and it will automatically harvest and store energy which it uses to monitor the motor’s temperature.

    “This is ambient power — energy that I don’t have to make a specific, soldered connection to get. And that makes this sensor very easy to install,” says Steve Leeb, the Emanuel E. Landsman Professor of Electrical Engineering and Computer Science (EECS) and professor of mechanical engineering, a member of the Research Laboratory of Electronics, and senior author of a paper on the energy-harvesting sensor.

    In the paper, which appeared as the featured article in the January issue of the IEEE Sensors Journal, the researchers offer a design guide for an energy-harvesting sensor that lets an engineer balance the available energy in the environment with their sensing needs.

    The paper lays out a roadmap for the key components of a device that can sense and control the flow of energy continually during operation.

    The versatile design framework is not limited to sensors that harvest magnetic field energy, and can be applied to those that use other power sources, like vibrations or sunlight. It could be used to build networks of sensors for factories, warehouses, and commercial spaces that cost less to install and maintain.

    “We have provided an example of a battery-less sensor that does something useful, and shown that it is a practically realizable solution. Now others will hopefully use our framework to get the ball rolling to design their own sensors,” says lead author Daniel Monagle, an EECS graduate student.

    Monagle and Leeb are joined on the paper by EECS graduate student Eric Ponce.

    John Donnal, an associate professor of weapons and controls engineering at the U.S. Naval Academy who was not involved with this work, studies techniques to monitor ship systems. Getting access to power on a ship can be difficult, he says, since there are very few outlets and strict restrictions on what equipment can be plugged in.

    “Persistently measuring the vibration of a pump, for example, could give the crew real-time information on the health of the bearings and mounts, but powering a retrofit sensor often requires so much additional infrastructure that the investment is not worthwhile,” Donnal adds. “Energy-harvesting systems like this could make it possible to retrofit a wide variety of diagnostic sensors on ships and significantly reduce the overall cost of maintenance.”

    A how-to guide

    The researchers had to meet three key challenges to develop an effective, battery-free, energy-harvesting sensor.

    First, the system must be able to cold start, meaning it can fire up its electronics with no initial voltage. They accomplished this with a network of integrated circuits and transistors that allow the system to store energy until it reaches a certain threshold. The system will only turn on once it has stored enough power to fully operate.

    Second, the system must store and convert the energy it harvests efficiently, and without a battery. While the researchers could have included a battery, that would add extra complexities to the system and could pose a fire risk.

    “You might not even have the luxury of sending out a technician to replace a battery. Instead, our system is maintenance-free. It harvests energy and operates itself,” Monagle adds.

    To avoid using a battery, they incorporate internal energy storage that can include a series of capacitors. Simpler than a battery, a capacitor stores energy in the electric field between conductive plates. Capacitors can be made from a variety of materials, and their capabilities can be tuned to a range of operating conditions, safety requirements, and available space.

    The team carefully designed the capacitors so they are big enough to store the energy the device needs to turn on and start harvesting power, but small enough that the charge-up phase doesn’t take too long.

    In addition, since a sensor might go weeks or even months before turning on to take a measurement, they ensured the capacitors can hold enough energy even if some leaks out over time.
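
    The sizing tradeoff described above can be made concrete with the standard capacitor relations: stored energy is one half of capacitance times voltage squared, and self-discharge through a leakage resistance decays exponentially. The sketch below is an idealized model, and every component value in it is a made-up example rather than a figure from the paper.

```python
import math

def stored_energy(c_farads: float, volts: float) -> float:
    """Energy in joules held by an ideal capacitor: E = 1/2 * C * V^2."""
    return 0.5 * c_farads * volts ** 2

def charge_time(c_farads: float, v_target: float, harvest_watts: float) -> float:
    """Seconds to charge from empty to v_target at constant harvested power.

    Idealized: ignores conversion losses and leakage during charging.
    """
    return stored_energy(c_farads, v_target) / harvest_watts

def voltage_after_leak(v0: float, c_farads: float, r_leak_ohms: float,
                       seconds: float) -> float:
    """Voltage remaining after self-discharge through a leakage resistance."""
    return v0 * math.exp(-seconds / (r_leak_ohms * c_farads))

# Hypothetical example values, chosen only to show the tradeoff.
C = 0.1          # farads
V_ON = 3.0       # volts needed to start the electronics
P_HARVEST = 1e-3 # watts harvested
R_LEAK = 1e7     # ohms of leakage resistance
WEEK = 7 * 24 * 3600

print("energy at turn-on (J):", stored_energy(C, V_ON))
print("charge-up time (s):", charge_time(C, V_ON, P_HARVEST))
print("voltage after one idle week (V):", voltage_after_leak(V_ON, C, R_LEAK, WEEK))
```

    Doubling the capacitance in this model doubles both the stored energy and the charge-up time, which is the tension the designers had to balance.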

    Finally, they developed a series of control algorithms that dynamically measure and budget the energy collected, stored, and used by the device. A microcontroller, the “brain” of the energy management interface, constantly checks how much energy is stored and infers whether to turn the sensor on or off, take a measurement, or kick the harvester into a higher gear so it can gather more energy for more complex sensing needs.

    “Just like when you change gears on a bike, the energy management interface looks at how the harvester is doing, essentially seeing whether it is pedaling too hard or too soft, and then it varies the electronic load so it can maximize the amount of power it is harvesting and match the harvest to the needs of the sensor,” Monagle explains.
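
    The budgeting logic can be pictured as a small decision function that the microcontroller runs each time it wakes. This is only a sketch of the idea: the thresholds, action names, and the rule ordering are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Thresholds:
    """Hypothetical energy levels, in joules, that gate each action."""
    measure_j: float    # enough to take one measurement
    transmit_j: float   # enough to measure and send the result

def choose_action(stored_j: float, t: Thresholds) -> str:
    """Pick the most useful action the stored energy can pay for."""
    if stored_j >= t.transmit_j:
        return "measure_and_transmit"
    if stored_j >= t.measure_j:
        return "measure_only"
    # Not enough energy for sensing: keep the sensor off and ask the
    # harvester to "shift gears" so it gathers energy faster.
    return "sensor_off_boost_harvest"

limits = Thresholds(measure_j=0.05, transmit_j=0.30)
for level in (0.01, 0.10, 0.45):
    print(level, "J ->", choose_action(level, limits))
```

    Checking the most expensive action first means the device never starts a transmission it cannot finish in this simplified model.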

    Self-powered sensor

    Using this design framework, they built an energy management circuit for an off-the-shelf temperature sensor. The device harvests magnetic field energy and uses it to continually sample temperature data, which it sends to a smartphone interface using Bluetooth.

    The researchers used super-low-power circuits to design the device, but quickly found that these circuits have tight restrictions on how much voltage they can withstand before breaking down. Harvesting too much power could cause the device to explode.

    To avoid that, their energy harvester operating system in the microcontroller automatically adjusts or reduces the harvest if the amount of stored energy becomes excessive.

    They also found that communication — transmitting data gathered by the temperature sensor — was by far the most power-hungry operation.

    “Ensuring the sensor has enough stored energy to transmit data is a constant challenge that involves careful design,” Monagle says.

    In the future, the researchers plan to explore less energy-intensive means of transmitting data, such as using optics or acoustics. They also want to more rigorously model and predict how much energy might be coming into a system, or how much energy a sensor might need to take measurements, so a device could effectively gather even more data.

    “If you only make the measurements you think you need, you may miss something really valuable. With more information, you might be able to learn something you didn’t expect about a device’s operations. Our framework lets you balance those considerations,” Leeb says.  

    “This paper is well-documented regarding what a practical self-powered sensor node should internally entail for realistic scenarios. The overall design guidelines, particularly on the cold-start issue, are very helpful,” says Jinyeong Moon, an assistant professor of electrical and computer engineering at Florida State University College of Engineering who was not involved with this work. “Engineers planning to design a self-powering module for a wireless sensor node will greatly benefit from these guidelines, easily ticking off traditionally cumbersome cold-start-related checklists.”

    The work is supported, in part, by the Office of Naval Research and The Grainger Foundation.

  • in

    Accelerating AI tasks while preserving data security

    With the proliferation of computationally intensive machine-learning applications, such as chatbots that perform real-time language translation, device manufacturers often incorporate specialized hardware components to rapidly move and process the massive amounts of data these systems demand.

    Choosing the best design for these components, known as deep neural network accelerators, is challenging because they can have an enormous range of design options. This difficult problem becomes even thornier when a designer seeks to add cryptographic operations to keep data safe from attackers.

    Now, MIT researchers have developed a search engine that can efficiently identify optimal designs for deep neural network accelerators that preserve data security while boosting performance.

    Their search tool, known as SecureLoop, is designed to consider how the addition of data encryption and authentication measures will impact the performance and energy usage of the accelerator chip. An engineer could use this tool to obtain the optimal design of an accelerator tailored to their neural network and machine-learning task.

    When compared to conventional scheduling techniques that don’t consider security, SecureLoop can improve performance of accelerator designs while keeping data protected.  

    Using SecureLoop could help a user improve the speed and performance of demanding AI applications, such as autonomous driving or medical image classification, while ensuring sensitive user data remains safe from some types of attacks.

    “If you are interested in doing a computation where you are going to preserve the security of the data, the rules that we used before for finding the optimal design are now broken. So all of that optimization needs to be customized for this new, more complicated set of constraints. And that is what [lead author] Kyungmi has done in this paper,” says Joel Emer, an MIT professor of the practice in computer science and electrical engineering and co-author of a paper on SecureLoop.

    Emer is joined on the paper by lead author Kyungmi Lee, an electrical engineering and computer science graduate student; Mengjia Yan, the Homer A. Burnell Career Development Assistant Professor of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and senior author Anantha Chandrakasan, dean of the MIT School of Engineering and the Vannevar Bush Professor of Electrical Engineering and Computer Science. The research will be presented at the IEEE/ACM International Symposium on Microarchitecture.

    “The community passively accepted that adding cryptographic operations to an accelerator will introduce overhead. They thought it would introduce only a small variance in the design trade-off space. But, this is a misconception. In fact, cryptographic operations can significantly distort the design space of energy-efficient accelerators. Kyungmi did a fantastic job identifying this issue,” Yan adds.

    Secure acceleration

    A deep neural network consists of many layers of interconnected nodes that process data. Typically, the output of one layer becomes the input of the next layer. Data are grouped into units called tiles for processing and transfer between off-chip memory and the accelerator. Each layer of the neural network can have its own data tiling configuration.

    A deep neural network accelerator is a processor with an array of computational units that parallelizes operations, like multiplication, in each layer of the network. The accelerator schedule describes how data are moved and processed.

    Since space on an accelerator chip is at a premium, most data are stored in off-chip memory and fetched by the accelerator when needed. But because data are stored off-chip, they are vulnerable to an attacker who could steal information or change some values, causing the neural network to malfunction.

    “As a chip manufacturer, you can’t guarantee the security of external devices or the overall operating system,” Lee explains.

    Manufacturers can protect data by adding authenticated encryption to the accelerator. Encryption scrambles the data using a secret key. Then authentication cuts the data into uniform chunks and assigns a cryptographic hash to each chunk of data, which is stored along with the data chunk in off-chip memory.

    When the accelerator fetches an encrypted chunk of data, known as an authentication block, it uses a secret key to recover and verify the original data before processing it.

    But the sizes of authentication blocks and tiles of data don’t match up, so there could be multiple tiles in one block, or a tile could be split between two blocks. The accelerator can’t arbitrarily grab a fraction of an authentication block, so it may end up grabbing extra data, which uses additional energy and slows down computation.

    Plus, the accelerator still must run the cryptographic operation on each authentication block, adding even more computational cost.
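
    The mismatch between tiles and authentication blocks comes down to integer arithmetic on addresses. The sketch below assumes data laid out as a flat byte range with fixed-size blocks starting at address zero, and it ignores the storage taken by the hashes themselves. The function names are hypothetical.

```python
def blocks_touched(tile_start: int, tile_size: int, block_size: int) -> int:
    """Number of whole authentication blocks that overlap a tile."""
    first = tile_start // block_size
    last = (tile_start + tile_size - 1) // block_size
    return last - first + 1

def redundant_bytes(tile_start: int, tile_size: int, block_size: int) -> int:
    """Bytes fetched beyond the tile because blocks cannot be split."""
    fetched = blocks_touched(tile_start, tile_size, block_size) * block_size
    return fetched - tile_size

# A 64-byte tile aligned to 64-byte blocks needs exactly one block.
print(blocks_touched(128, 64, 64), redundant_bytes(128, 64, 64))
# The same tile shifted by 32 bytes straddles two blocks.
print(blocks_touched(96, 64, 64), redundant_bytes(96, 64, 64))
```

    In the misaligned case the accelerator fetches and verifies twice as much data as the tile contains, which is the overhead the article describes.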

    An efficient search engine

    With SecureLoop, the MIT researchers sought a method that could identify the fastest and most energy efficient accelerator schedule — one that minimizes the number of times the device needs to access off-chip memory to grab extra blocks of data because of encryption and authentication.  

    They began by augmenting an existing search engine Emer and his collaborators previously developed, called Timeloop. First, they added a model that could account for the additional computation needed for encryption and authentication.

    Then, they reformulated the search problem into a simple mathematical expression, which enables SecureLoop to find the ideal authentication block size in a much more efficient manner than searching through all possible options.

    “Depending on how you assign this block, the amount of unnecessary traffic might increase or decrease. If you assign the cryptographic block cleverly, then you can just fetch a small amount of additional data,” Lee says.
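
    For contrast, the brute-force baseline that SecureLoop avoids would simply try every candidate block size and measure the resulting traffic. The sketch below shows that baseline only, not SecureLoop's closed-form method. It assumes each tile is fetched independently, that every fetched block carries a fixed-size hash, and that the tile layout and candidate sizes are made-up examples.

```python
def traffic(tiles: list[tuple[int, int]], block_size: int, hash_bytes: int = 8) -> int:
    """Total bytes moved to fetch every tile, including per-block hashes."""
    total = 0
    for start, size in tiles:
        first = start // block_size
        last = (start + size - 1) // block_size
        n_blocks = last - first + 1
        total += n_blocks * (block_size + hash_bytes)
    return total

def best_block_size(tiles: list[tuple[int, int]], candidates: list[int],
                    hash_bytes: int = 8) -> int:
    """Exhaustively pick the candidate block size with the least traffic."""
    return min(candidates, key=lambda b: traffic(tiles, b, hash_bytes))

# Four hypothetical 48-byte tiles stored back to back: (start, size).
tiles = [(0, 48), (48, 48), (96, 48), (144, 48)]
candidates = [16, 32, 48, 64, 96]
for b in candidates:
    print("block size", b, "-> bytes moved", traffic(tiles, b))
print("best:", best_block_size(tiles, candidates))
```

    Small blocks pay for many hashes while large or misaligned blocks drag in unneeded data, so the minimum sits where the block size matches the tile layout.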

    Finally, they incorporated a heuristic technique that ensures SecureLoop identifies a schedule that maximizes the performance of the entire deep neural network, rather than only a single layer.

    At the end, the search engine outputs an accelerator schedule, which includes the data tiling strategy and the size of the authentication blocks, that provides the best possible speed and energy efficiency for a specific neural network.

    “The design spaces for these accelerators are huge. What Kyungmi did was figure out some very pragmatic ways to make that search tractable so she could find good solutions without needing to exhaustively search the space,” says Emer.

    When tested in a simulator, SecureLoop identified schedules that were up to 33.2 percent faster and exhibited 50.2 percent better energy delay product (a metric related to energy efficiency) than other methods that didn’t consider security.

    The researchers also used SecureLoop to explore how the design space for accelerators changes when security is considered. They learned that allocating a bit more of the chip’s area for the cryptographic engine and sacrificing some space for on-chip memory can lead to better performance, Lee says.

    In the future, the researchers want to use SecureLoop to find accelerator designs that are resilient to side-channel attacks, which occur when an attacker has access to physical hardware. For instance, an attacker could monitor the power consumption pattern of a device to obtain secret information, even if the data have been encrypted. They are also extending SecureLoop so it could be applied to other kinds of computation.

    This work is funded, in part, by Samsung Electronics and the Korea Foundation for Advanced Studies.

  • in

    New techniques efficiently accelerate sparse tensors for massive AI models

    Researchers from MIT and NVIDIA have developed two techniques that accelerate the processing of sparse tensors, a type of data structure that’s used for high-performance computing tasks. The complementary techniques could result in significant improvements to the performance and energy-efficiency of systems like the massive machine-learning models that drive generative artificial intelligence.

    Tensors are data structures used by machine-learning models. Both of the new methods seek to efficiently exploit what’s known as sparsity, meaning zero values, in the tensors. When processing these tensors, one can skip over the zeros and save on both computation and memory. For instance, anything multiplied by zero is zero, so that operation can be skipped. The tensor can also be compressed (zeros don’t need to be stored) so a larger portion can be stored in on-chip memory.
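
    Both savings can be seen in a few lines of software. The sketch below uses a simple list of (index, value) pairs as the compressed form, which is an illustrative choice and not the format used by the researchers' hardware.

```python
def compress(values: list[float]) -> list[tuple[int, float]]:
    """Store only the nonzero entries as (index, value) pairs."""
    return [(i, v) for i, v in enumerate(values) if v != 0]

def sparse_dot(compressed: list[tuple[int, float]], dense: list[float]) -> float:
    """Dot product that multiplies only where the sparse operand is nonzero."""
    return sum(v * dense[i] for i, v in compressed)

weights = [0, 3, 0, 0, 2, 0, 0, 1]       # mostly zeros
activations = [5, 1, 7, 2, 4, 9, 6, 3]

packed = compress(weights)
dense_result = sum(w * a for w, a in zip(weights, activations))
print("stored entries:", len(packed), "of", len(weights))
print("sparse result:", sparse_dot(packed, activations))
print("dense result:", dense_result)
```

    Here three multiplications replace eight and only three values need to be stored, at the cost of also storing their positions.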

    However, there are several challenges to exploiting sparsity. Finding the nonzero values in a large tensor is no easy task. Existing approaches often limit the locations of nonzero values by enforcing a sparsity pattern to simplify the search, but this limits the variety of sparse tensors that can be processed efficiently.

    Another challenge is that the number of nonzero values can vary in different regions of the tensor. This makes it difficult to determine how much space is required to store different regions in memory. To make sure the region fits, more space is often allocated than is needed, causing the storage buffer to be underutilized. This increases off-chip memory traffic, which increases energy consumption.

    The MIT and NVIDIA researchers crafted two solutions to address these problems. For one, they developed a technique that allows the hardware to efficiently find the nonzero values for a wider variety of sparsity patterns.

    For the other solution, they created a method that can handle the case where the data do not fit in memory, which increases the utilization of the storage buffer and reduces off-chip memory traffic.

    Both methods boost the performance and reduce the energy demands of hardware accelerators specifically designed to speed up the processing of sparse tensors.

    “Typically, when you use more specialized or domain-specific hardware accelerators, you lose the flexibility that you would get from a more general-purpose processor, like a CPU. What stands out with these two works is that we show that you can still maintain flexibility and adaptability while being specialized and efficient,” says Vivienne Sze, associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS), a member of the Research Laboratory of Electronics (RLE), and co-senior author of papers on both advances.

    Her co-authors include lead authors Yannan Nellie Wu PhD ’23 and Zi Yu Xue, an electrical engineering and computer science graduate student; and co-senior author Joel Emer, an MIT professor of the practice in computer science and electrical engineering and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), as well as others at NVIDIA. Both papers will be presented at the IEEE/ACM International Symposium on Microarchitecture.

    HighLight: Efficiently finding zero values

    Sparsity can arise in the tensor for a variety of reasons. For example, researchers sometimes “prune” unnecessary pieces of the machine-learning models by replacing some values in the tensor with zeros, creating sparsity. The degree of sparsity (percentage of zeros) and the locations of the zeros can vary for different models.

    To make it easier to find the remaining nonzero values in a model with billions of individual values, researchers often restrict the location of the nonzero values so they fall into a certain pattern. However, each hardware accelerator is typically designed to support one specific sparsity pattern, limiting its flexibility.  

    By contrast, the hardware accelerator the MIT researchers designed, called HighLight, can handle a wide variety of sparsity patterns and still perform well when running models that don’t have any zero values.

    They use a technique they call “hierarchical structured sparsity” to efficiently represent a wide variety of sparsity patterns that are composed of several simple sparsity patterns. This approach divides the values in a tensor into smaller blocks, where each block has its own simple sparsity pattern (perhaps two zeros and two nonzeros in a block with four values).

    Then, they combine the blocks into a hierarchy, where each collection of blocks also has its own simple sparsity pattern (perhaps one zero block and three nonzero blocks in a level with four blocks). They continue combining blocks into larger levels, but the patterns remain simple at each step.
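
    As a rough illustration of the two-level example in the text, the sketch below checks whether a flat list of values fits such a pattern. The function names, the default parameters, and the reading of the pattern as an upper bound ("at most two nonzeros per block, at most three nonzero blocks per group") are assumptions made for this example, not details of the HighLight design.

```python
def chunks(seq, size):
    """Split a list into consecutive pieces of the given size."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def fits_two_level_pattern(values, block=4, max_nonzero=2,
                           group=4, max_nonzero_blocks=3):
    """Check a two-level structured sparsity pattern on a flat list."""
    blocks = chunks(values, block)
    # Lower level: each block of values holds at most `max_nonzero` nonzeros.
    if any(sum(v != 0 for v in b) > max_nonzero for b in blocks):
        return False
    # Upper level: each group of blocks holds at most `max_nonzero_blocks`
    # blocks that contain any nonzero value.
    for g in chunks(blocks, group):
        if sum(any(v != 0 for v in b) for b in g) > max_nonzero_blocks:
            return False
    return True


# Made-up tensor: blocks with 2, 0, 2, and 2 nonzeros (one all-zero block).
example = [1, 0, 2, 0,  0, 0, 0, 0,  0, 3, 0, 4,  5, 0, 0, 6]
density = sum(v != 0 for v in example) / len(example)
```

    In this example the overall density is 6/16, which is the product of the per-level densities (2/4 within blocks and 3/4 across blocks). That is one way to see how simple patterns at each level can combine into a wider range of overall sparsity degrees.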

    This simplicity enables HighLight to more efficiently find and skip zeros, so it can take full advantage of the opportunity to cut excess computation. On average, their accelerator design had about six times better energy-delay product (a metric related to energy efficiency) than other approaches.

    “In the end, the HighLight accelerator is able to efficiently accelerate dense models because it does not introduce a lot of overhead, and at the same time it is able to exploit workloads with different amounts of zero values based on hierarchical structured sparsity,” Wu explains.

    In the future, she and her collaborators want to apply hierarchical structured sparsity to more types of machine-learning models and different types of tensors in the models.

    Tailors and Swiftiles: Effectively “overbooking” to accelerate workloads

    Researchers can also leverage sparsity to more efficiently move and process data on a computer chip.

    Since the tensors are often larger than what can be stored in the memory buffer on chip, the chip only grabs and processes a chunk of the tensor at a time. The chunks are called tiles.

    To maximize the utilization of that buffer and limit the number of times the chip must access off-chip memory, which often dominates energy consumption and limits processing speed, researchers seek to use the largest tile that will fit into the buffer.

    But in a sparse tensor, many of the data values are zero, so an even larger tile can fit into the buffer than one might expect based on its capacity. Zero values don’t need to be stored.

    But the number of zero values can vary across different regions of the tensor, so they can also vary for each tile. This makes it difficult to determine a tile size that will fit in the buffer. As a result, existing approaches often conservatively assume there are no zeros and end up selecting a smaller tile, which results in wasted blank spaces in the buffer.

    To address this uncertainty, the researchers propose the use of “overbooking” to allow them to increase the tile size, as well as a way to tolerate it if the tile doesn’t fit the buffer.

    This is similar to how an airline overbooks tickets for a flight. If all the passengers show up, the airline must compensate the ones who are bumped from the plane. But usually, not all the passengers show up.

    In a sparse tensor, a tile size can be chosen such that usually the tiles will have enough zeros that most still fit into the buffer. But occasionally, a tile will have more nonzero values than will fit. In this case, those data are bumped out of the buffer.

    The researchers enable the hardware to only re-fetch the bumped data without grabbing and processing the entire tile again. They modify the “tail end” of the buffer to handle this, hence the name of this technique, Tailors.

    They also created an approach for finding a tile size that takes advantage of overbooking. This method, called Swiftiles, swiftly estimates the ideal tile size so that a specific percentage of tiles, set by the user, are overbooked. (The names “Tailors” and “Swiftiles” pay homage to Taylor Swift, whose recent Eras tour was fraught with overbooked presale codes for tickets.)
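
    The overbooking criterion can be made concrete with a small sketch. Note that this is a brute-force search over every tile size, which is exactly the kind of repeated checking Swiftiles is meant to avoid; it only illustrates what "a specific percentage of tiles are overbooked" means. The function names, the row-based tiling, the per-row nonzero counts, and the buffer capacity are all hypothetical.

```python
def tile_nonzeros(row_nnz, rows_per_tile):
    """Nonzero count of each tile when consecutive rows are grouped into tiles."""
    return [sum(row_nnz[i:i + rows_per_tile])
            for i in range(0, len(row_nnz), rows_per_tile)]


def overbooked_fraction(row_nnz, rows_per_tile, capacity):
    """Fraction of tiles whose nonzeros exceed the buffer capacity."""
    tiles = tile_nonzeros(row_nnz, rows_per_tile)
    return sum(n > capacity for n in tiles) / len(tiles)


def largest_tile(row_nnz, capacity, max_overbooked):
    """Largest rows-per-tile whose overbooked fraction stays within the target.

    Falls back to 1 row per tile if no size meets the target.
    """
    best = 1
    for rows_per_tile in range(1, len(row_nnz) + 1):
        if overbooked_fraction(row_nnz, rows_per_tile, capacity) <= max_overbooked:
            best = rows_per_tile
    return best


# Made-up nonzero counts for 12 rows of a sparse tensor; one row is unusually dense.
row_nnz = [2, 1, 3, 9, 2, 1, 2, 2, 1, 3, 2, 2]
capacity = 10  # hypothetical buffer capacity, in nonzero values

conservative = largest_tile(row_nnz, capacity, max_overbooked=0.0)
overbooked = largest_tile(row_nnz, capacity, max_overbooked=0.25)
```

    With this made-up data, requiring that no tile ever overflows limits the tile to one row, while tolerating overflow in a quarter of the tiles allows three rows per tile. The single dense row is what forces the conservative choice.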

    Swiftiles reduces the number of times the hardware needs to check the tensor to identify an ideal tile size, saving on computation. The combination of Tailors and Swiftiles more than doubles the speed while requiring only half the energy of existing hardware accelerators, which cannot handle overbooking.

    “Swiftiles allows us to estimate how large these tiles need to be without requiring multiple iterations to refine the estimate. This only works because overbooking is supported. Even if you are off by a decent amount, you can still extract a fair bit of speedup because of the way the non-zeros are distributed,” Xue says.

    In the future, the researchers want to apply the idea of overbooking to other aspects in computer architecture and also work to improve the process for estimating the optimal level of overbooking.

    This research is funded, in part, by the MIT AI Hardware Program.

  • in

    A new chip for decoding data transmissions demonstrates record-breaking energy efficiency

    Imagine using an online banking app to deposit money into your account. Like all information sent over the internet, those communications could be corrupted by noise that inserts errors into the data.

    To overcome this problem, senders encode data before they are transmitted, and then a receiver uses a decoding algorithm to correct errors and recover the original message. In some instances, data are received with reliability information that helps the decoder figure out which parts of a transmission are likely errors.

    Researchers at MIT and elsewhere have developed a decoder chip that employs a new statistical model to use this reliability information in a way that is much simpler and faster than conventional techniques.

    Their chip uses a universal decoding algorithm the team previously developed, which can unravel any error correcting code. Typically, decoding hardware can only process one particular type of code. This new, universal decoder chip has broken the record for energy-efficient decoding, performing between 10 and 100 times better than other hardware.

    This advance could enable mobile devices with fewer chips, since they would no longer need separate hardware for multiple codes. This would reduce the amount of material needed for fabrication, cutting costs and improving sustainability. By making the decoding process less energy intensive, the chip could also improve device performance and lengthen battery life. It could be especially useful for demanding applications like augmented and virtual reality and 5G networks.

    “This is the first time anyone has broken below the 1 picojoule-per-bit barrier for decoding. That is roughly the same amount of energy you need to transmit a bit inside the system. It had been a big symbolic threshold, but it also changes the balance in the receiver of what might be the most pressing part from an energy perspective — we can move that away from the decoder to other elements,” says Muriel Médard, the School of Science NEC Professor of Software Science and Engineering, a professor in the Department of Electrical Engineering and Computer Science, and a co-author of a paper presenting the new chip.

    Médard’s co-authors include lead author Arslan Riaz, a graduate student at Boston University (BU); Rabia Tugce Yazicigil, assistant professor of electrical and computer engineering at BU; and Ken R. Duffy, then director of the Hamilton Institute at Maynooth University and now a professor at Northeastern University, as well as others from MIT, BU, and Maynooth University. The work is being presented at the International Solid-State Circuits Conference.

    Smarter sorting

    Digital data are transmitted over a network in the form of bits (0s and 1s). A sender encodes data by adding an error-correcting code, which is a redundant string of 0s and 1s that can be viewed as a hash. Information about this hash is held in a specific code book. A decoding algorithm at the receiver, designed for this particular code, uses its code book and the hash structure to retrieve the original information, which may have been jumbled by noise. Since each algorithm is code-specific, and most require dedicated hardware, a device would need many chips to decode different codes.

    The researchers previously demonstrated GRAND (Guessing Random Additive Noise Decoding), a universal decoding algorithm that can crack any code. GRAND works by guessing the noise that affected the transmission, subtracting that noise pattern from the received data, and then checking what remains in a code book. It guesses a series of noise patterns in the order they are likely to occur.
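
    The guess-subtract-check loop described above can be sketched in a few lines. This is a toy illustration, not the researchers' implementation: it assumes a channel in which fewer bit flips are always more likely than more, and it uses the small (7,4) Hamming code as a stand-in code book. For binary data, "subtracting" a noise pattern is a bitwise XOR.

```python
from itertools import combinations

# Parity-check matrix of the (7,4) Hamming code, used only as an example
# code book: a word is a code word when every parity check comes out even.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def is_codeword(word):
    """Code book membership test via the parity checks."""
    return all(sum(h * w for h, w in zip(row, word)) % 2 == 0 for row in H)


def noise_patterns(n):
    """Yield noise patterns from fewest flipped bits to most."""
    for weight in range(n + 1):
        for positions in combinations(range(n), weight):
            yield [1 if i in positions else 0 for i in range(n)]


def grand_decode(received):
    """Return (code word, number of guesses) for a received word."""
    for guesses, noise in enumerate(noise_patterns(len(received)), start=1):
        candidate = [r ^ e for r, e in zip(received, noise)]  # remove guessed noise
        if is_codeword(candidate):
            return candidate, guesses
    raise RuntimeError("no code word found")  # not reached for a linear code


sent = [1, 1, 1, 0, 0, 0, 0]
received = [1, 1, 1, 0, 1, 0, 0]  # bit at index 4 flipped by noise
decoded, guesses = grand_decode(received)
```

    Only the membership test depends on the code, so swapping in a different parity-check matrix changes the code without changing the guessing loop. That separation is what makes this style of decoder code-agnostic.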

    Data are often received with reliability information, also called soft information, that helps a decoder figure out which pieces are errors. The new decoding chip, called ORBGRAND (Ordered Reliability Bits GRAND), uses this reliability information to sort data based on how likely each bit is to be an error.

    But it isn’t as simple as ordering single bits. While the most unreliable bit might be the likeliest error, perhaps the third and fourth most unreliable bits together are as likely to be an error as the seventh-most unreliable bit. ORBGRAND uses a new statistical model that can sort bits in this fashion, considering that multiple bits together are as likely to be an error as some single bits.
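
    One way to read the example above is that a set of bits is scored by the sum of its reliability ranks (1 for the least reliable bit), so that ranks 3 and 4 together score the same as rank 7 alone. The sketch below orders guesses by that score. The scoring rule is an assumption drawn from the example in the text, the names are hypothetical, and the brute-force enumeration is only practical for short words; a hardware decoder would generate patterns on the fly.

```python
from itertools import combinations


def patterns_by_rank_sum(n, max_weight):
    """Subsets of ranks 1..n, ordered by the sum of their ranks."""
    patterns = []
    for size in range(n + 1):
        for ranks in combinations(range(1, n + 1), size):
            if sum(ranks) <= max_weight:
                patterns.append(ranks)
    # Ties in rank sum are broken in favor of fewer flipped bits.
    patterns.sort(key=lambda ranks: (sum(ranks), len(ranks)))
    return patterns


def ordered_noise_guesses(reliabilities, max_weight):
    """Yield lists of bit positions to flip, most plausible guess first."""
    # order[0] is the position of the least reliable bit (rank 1).
    order = sorted(range(len(reliabilities)), key=lambda i: reliabilities[i])
    for ranks in patterns_by_rank_sum(len(reliabilities), max_weight):
        yield sorted(order[r - 1] for r in ranks)


# Made-up reliability values for a five-bit word (higher means more trusted).
reliabilities = [0.9, 0.1, 0.8, 0.3, 0.7]
first_guesses = list(ordered_noise_guesses(reliabilities, max_weight=3))
```

    Here the first guess is "no errors", followed by the single least reliable bit, and the two least reliable bits together are tried alongside the third least reliable bit alone.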

    “If your car isn’t working, soft information might tell you that it is probably the battery. But if it isn’t the battery alone, maybe it is the battery and the alternator together that are causing the problem. This is how a rational person would troubleshoot — you’d say that it could actually be these two things together before going down the list to something that is much less likely,” Médard says.

    This is a much more efficient approach than traditional decoders, which would instead look at the code structure and have a performance that is generally designed for the worst case.

    “With a traditional decoder, you’d pull out the blueprint of the car and examine each and every piece. You’ll find the problem, but it will take you a long time and you’ll get very frustrated,” Médard explains.

    ORBGRAND stops sorting as soon as a code word is found, which is often very soon. The chip also employs parallelization, generating and testing multiple noise patterns simultaneously so it finds the code word faster. Because the decoder stops working once it finds the code word, its energy consumption stays low even though it runs multiple processes simultaneously.

    Record-breaking efficiency

    When the researchers compared their chip with others, ORBGRAND decoded with maximum accuracy while consuming only 0.76 picojoules of energy per bit, breaking the previous performance record. ORBGRAND consumes between 10 and 100 times less energy than other devices.

    One of the biggest challenges of developing the new chip came from this reduced energy consumption, Médard says. With ORBGRAND, generating noise sequences is now so energy-efficient that other processes the researchers hadn’t focused on before, like checking the code word in a code book, consume most of the effort.

    “Now, this checking process, which is like turning on the car to see if it works, is the hardest part. So, we need to find more efficient ways to do that,” she says.

    The team is also exploring ways to change the modulation of transmissions so they can take advantage of the improved efficiency of the ORBGRAND chip. They also plan to see how their technique could be utilized to more efficiently manage multiple transmissions that overlap.

    The research is funded, in part, by the U.S. Defense Advanced Research Projects Agency (DARPA) and Science Foundation Ireland.

  • in

    A new way for quantum computing systems to keep their cool

    Heat causes errors in the qubits that are the building blocks of a quantum computer, so quantum systems are typically kept inside refrigerators that keep the temperature just above absolute zero (-459 degrees Fahrenheit).

    But quantum computers need to communicate with electronics outside the refrigerator, in a room-temperature environment. The metal cables that connect these electronics bring heat into the refrigerator, which has to work even harder and draw extra power to keep the system cold. Plus, more qubits require more cables, so the size of a quantum system is limited by how much heat the fridge can remove.

    To overcome this challenge, an interdisciplinary team of MIT researchers has developed a wireless communication system that enables a quantum computer to send and receive data to and from electronics outside the refrigerator using high-speed terahertz waves.

    A transceiver chip placed inside the fridge can receive and transmit data. Terahertz waves generated outside the refrigerator are beamed in through a glass window. Data encoded onto these waves can be received by the chip. That chip also acts as a mirror, delivering data from the qubits on the terahertz waves it reflects to their source.

    This reflection process also bounces back much of the power sent into the fridge, so the process generates only a minimal amount of heat. The contactless communication system consumes up to 10 times less power than systems with metal cables.

    “By having this reflection mode, you really save the power consumption inside the fridge and leave all those dirty jobs on the outside. While this is still just a preliminary prototype and we have some room to improve, even at this point, we have shown low power consumption inside the fridge that is already better than metallic cables. I believe this could be a way to build large-scale quantum systems,” says senior author Ruonan Han, an associate professor in the Department of Electrical Engineering and Computer Science (EECS) who leads the Terahertz Integrated Electronics Group.

    Han and his team, with expertise in terahertz waves and electronic devices, joined forces with associate professor Dirk Englund and the Quantum Photonics Laboratory team, who provided quantum engineering expertise and joined in conducting the cryogenic experiments.

    Joining Han and Englund on the paper are first author and EECS graduate student Jinchen Wang; Mohamed Ibrahim PhD ’21; Isaac Harris, a graduate student in the Quantum Photonics Laboratory; Nathan M. Monroe PhD ’22; Wasiq Khan PhD ’22; and Xiang Yi, a former postdoc who is now a professor at the South China University of Technology. The paper will be presented at the International Solid-State Circuits Conference.

    Tiny mirrors

    The researchers’ square transceiver chip, measuring about 2 millimeters on each side, is placed on a quantum computer inside the refrigerator, which is called a cryostat because it maintains cryogenic temperatures. These super-cold temperatures don’t damage the chip; in fact, they enable it to run more efficiently than it would at room temperature.

    The chip sends data to and receives data from a terahertz wave source outside the cryostat using a passive communication process known as backscatter, which involves reflections. An array of antennas on top of the chip, each of which is only about 200 micrometers in size, acts as tiny mirrors. These mirrors can be “turned on” to reflect waves or “turned off.”

    The terahertz wave generation source encodes data onto the waves it sends into the cryostat, and the antennas in their “off” state can receive those waves and the data they carry.

    When the tiny mirrors are turned on, they can be set so they either reflect a wave in its current form or invert its phase before bouncing it back. If the reflected wave has the same phase, that represents a 0, but if the phase is inverted, that represents a 1. Electronics outside the cryostat can interpret those binary signals to decode the data.
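
    The phase convention described above (same phase for 0, inverted phase for 1) can be sketched as follows. This is an idealized, noise-free model with hypothetical function names; it says nothing about how the actual chip or the outside electronics are built.

```python
import math


def reflect(bit, carrier_phase):
    """Phase of the reflected wave: unchanged for 0, inverted for 1."""
    return carrier_phase + (math.pi if bit else 0.0)


def decode(reflected_phase, carrier_phase):
    """Compare the reflection with the outgoing wave to recover the bit."""
    # cos of the phase difference is +1 for "same phase", -1 for "inverted".
    return 0 if math.cos(reflected_phase - carrier_phase) > 0 else 1


carrier_phase = 0.3  # arbitrary phase of the incoming wave, in radians
bits = [1, 0, 0, 1, 1, 0]
recovered = [decode(reflect(b, carrier_phase), carrier_phase) for b in bits]
```

    Because the receiver outside the cryostat generated the original wave, it knows the reference phase, so only the comparison is needed to read each bit.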

    “This backscatter technology is not new. For instance, RFIDs are based on backscatter communication. We borrow that idea and bring it into this very unique scenario, and I think this leads to a good combination of all these technologies,” Han says.

    Terahertz advantages

    The data are transmitted using high-speed terahertz waves, which are located on the electromagnetic spectrum between radio waves and infrared light.

    Because terahertz waves have much shorter wavelengths than radio waves, the chip and its antennas can be smaller, too, which would make the device easier to manufacture at scale. Terahertz waves also have higher frequencies than radio waves, so they can transmit data much faster and move larger amounts of information.

    But because terahertz waves have lower frequencies than the light waves used in photonic systems, the terahertz waves carry less quantum noise, which leads to less interference with quantum processors.

    Importantly, the transceiver chip and terahertz link can be fully constructed with standard fabrication processes on a CMOS chip, so they can be integrated into many current systems and techniques.

    “CMOS compatibility is important. For example, one terahertz link could deliver a large amount of data and feed it to another cryo-CMOS controller, which can split the signal to control multiple qubits simultaneously, so we can reduce the quantity of RF cables dramatically. This is very promising,” Wang says.

    The researchers were able to transmit data at 4 gigabits per second with their prototype, but Han says the sky is nearly the limit when it comes to boosting that speed. The downlink of the contactless system imposed about 10 times less heat load than a system with metallic cables, and the temperature of the cryostat fluctuated by up to a few millidegrees during experiments.

    Now that the researchers have demonstrated this wireless technology, they want to improve the system’s speed and efficiency using special terahertz fibers, which are only a few hundred micrometers wide. Han’s group has shown that these plastic wires can transmit data at a rate of 100 gigabits per second and have much better thermal insulation than fatter, metal cables.

    The researchers also want to refine the design of their transceiver to improve scalability and continue boosting its energy efficiency. Generating terahertz waves requires a lot of power, but Han’s group is studying more efficient methods that utilize low-cost chips. Incorporating this technology into the system could make the device more cost-effective.

    The transceiver chip was fabricated through the Intel University Shuttle Program.

  • in

    New chip for mobile devices knocks out unwanted signals

    Imagine sitting in a packed stadium for a pivotal football game — tens of thousands of people are using mobile phones at the same time, perhaps video chatting with friends or posting photos on social media. The radio frequency signals being sent and received by all these devices could cause interference, which slows device performance and drains batteries.

    Designing devices that can efficiently block unwanted signals is no easy task, especially as 5G networks become more universal and future generations of wireless communication systems are developed. Conventional techniques utilize many filters to block a range of signals, but filters are bulky, expensive, and drive up production costs.

    MIT researchers have developed a circuit architecture that targets and blocks unwanted signals at a receiver’s input without hurting its performance. They borrowed a technique from digital signal processing and used a few tricks that enable it to work effectively in a radio frequency system across a wide frequency range.

    Their receiver blocked even high-power unwanted signals without introducing more noise, or inaccuracies, into the signal processing operations. The chip, which performed about 40 times better than other wideband receivers at blocking a special type of interference, does not require any additional hardware or circuitry. This would make the chip easier to manufacture at scale.

    “We are interested in developing electronic circuits and systems that meet the demands of 5G and future generations of wireless communication systems. In designing our circuits, we look for inspirations from other domains, such as digital signal processing and applied electromagnetics. We believe in circuit elegance and simplicity and try to come up with multifunctional hardware that doesn’t require additional power and chip area,” says senior author Negar Reiskarimian, the X-Window Consortium Career Development Assistant Professor in the Department of Electrical Engineering and Computer Science (EECS) and a core faculty member of the Microsystems Technology Laboratories.

    Reiskarimian wrote the paper with EECS graduate students Soroush Araei, who is the lead author, and Shahabeddin Mohin. The work is being presented at the International Solid-State Circuits Conference.

    Harmonic interference

    The researchers developed the receiver chip using what is known as a mixer-first architecture. This means that when a radio frequency signal is received by the device, it is immediately converted to a lower-frequency signal before being passed on to the analog-to-digital converter to extract the digital bits that it is carrying. This approach enables the radio to cover a wide frequency range while filtering out interference located close to the operation frequency.

    While effective, mixer-first receivers are susceptible to a particular kind of interference known as harmonic interference. Harmonic interference comes from signals that have frequencies which are multiples of a device’s operating frequency. For instance, if a device operates at 1 gigahertz, then signals at 2 gigahertz, 3 gigahertz, 5 gigahertz, etc., will cause harmonic interference. These harmonics can be indistinguishable from the original signal during the frequency conversion process.    
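
    A simplified way to see why harmonics become indistinguishable is to model frequency conversion as taking one sample per cycle of the operating frequency. This idealized sampling model is an assumption chosen for illustration and is not a description of the researchers' circuit; the 10 megahertz offset and the function name are likewise made up.

```python
import math


def sample_tone(freq_hz, sample_rate_hz, count):
    """Samples of a unit-amplitude cosine taken at the given rate."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]


f0 = 1e9        # operating frequency: 1 gigahertz, as in the example above
offset = 10e6   # hypothetical wanted signal sits 10 megahertz above f0

wanted = sample_tone(f0 + offset, f0, 16)
# Interferers sitting the same distance above the 2, 3, and 5 gigahertz harmonics.
interferers = {k: sample_tone(k * f0 + offset, f0, 16) for k in (2, 3, 5)}

# Largest sample-by-sample difference between each interferer and the wanted tone.
worst_mismatch = {
    k: max(abs(a - b) for a, b in zip(samples, wanted))
    for k, samples in interferers.items()
}
```

    In this model the mismatch is zero up to floating-point rounding, so after conversion nothing in the samples reveals whether the energy arrived near 1 gigahertz or near one of its multiples. That is why such interference has to be removed before or during conversion rather than afterward.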

    “A lot of other wideband receivers don’t do anything about the harmonics until it is time to see what the bits mean. They do it later in the chain, but this doesn’t work well if you have high-power signals at the harmonic frequencies. Instead, we want to remove harmonics as soon as possible to avoid losing information,” Araei says.

    To do this, the researchers were inspired by a concept from digital signal processing known as block digital filtering. They adapted this technique to the analog domain using capacitors, which hold electric charges. The capacitors are charged up at different times as the signal is received, then they are switched off so that charge can be held and used later for processing the data.  

    These capacitors can be connected to each other in various ways, including connecting them in parallel, which enables the capacitors to exchange the stored charges. While this technique can target harmonic interference, the process results in significant signal loss. Stacking capacitors is another possibility, but this method alone is not enough to provide harmonic resilience.

    Most radio receivers already use switched-capacitor circuits to perform frequency conversion. This frequency conversion circuitry can be combined with block filtering to target harmonic interference.

    A precise arrangement

    The researchers found that arranging capacitors in a specific layout, by connecting some of them in series and then performing charge sharing, enabled the device to block harmonic interference without losing any information.

    “People have used these techniques, charge sharing and capacitor stacking, separately before, but never together. We found that both techniques must be done simultaneously to get this benefit. Moreover, we have found out how to do this in a passive way within the mixer without using any additional hardware while maintaining signal integrity and keeping the costs down,” Araei says.

    They tested the device by simultaneously sending a desired signal and harmonic interference. Their chip was able to block harmonic signals effectively with only a slight reduction in signal strength. It was able to handle harmonic signals 40 times more powerful than those that previous state-of-the-art wideband receivers could handle.