More stories

  •

    Gaining real-world industry experience through Break Through Tech AI at MIT

    Taking what they learned conceptually about artificial intelligence and machine learning (ML) this year, students from across the Greater Boston area had the opportunity to apply their new skills to real-world industry projects as part of an experiential learning opportunity offered through Break Through Tech AI at MIT.

    Hosted by the MIT Schwarzman College of Computing, Break Through Tech AI is a pilot program that aims to bridge the talent gap for women and underrepresented genders in computing fields by providing skills-based training, industry-relevant portfolios, and mentoring to undergraduate students in regional metropolitan areas in order to position them more competitively for careers in data science, machine learning, and artificial intelligence.

    “Programs like Break Through Tech AI gives us opportunities to connect with other students and other institutions, and allows us to bring MIT’s values of diversity, equity, and inclusion to the learning and application in the spaces that we hold,” says Alana Anderson, assistant dean of diversity, equity, and inclusion for the MIT Schwarzman College of Computing.

    The inaugural cohort of 33 undergraduates from 18 Greater Boston-area schools, including Salem State University, Smith College, and Brandeis University, began the free, 18-month program last summer with an eight-week, online skills-based course to learn the basics of AI and machine learning. Students then split into small groups in the fall to collaborate on six machine learning challenge projects presented to them by MathWorks, MIT-IBM Watson AI Lab, and Replicate. The students dedicated five hours or more each week to meet with their teams, teaching assistants, and project advisors, including convening once a month at MIT, while juggling their regular academic course load with other daily activities and responsibilities.

    The challenges gave the undergraduates the chance to contribute to actual projects that industry organizations are working on and to put their machine learning skills to the test. Members from each organization also served as project advisors, providing encouragement and guidance to the teams throughout.

    “Students are gaining industry experience by working closely with their project advisors,” says Aude Oliva, director of strategic industry engagement at the MIT Schwarzman College of Computing and the MIT director of the MIT-IBM Watson AI Lab. “These projects will be an add-on to their machine learning portfolio that they can share as a work example when they’re ready to apply for a job in AI.”

    Over the course of 15 weeks, teams delved into large-scale, real-world datasets to train, test, and evaluate machine learning models in a variety of contexts.

    In December, the students celebrated the fruits of their labor at a showcase event held at MIT in which the six teams gave final presentations on their AI projects. The projects not only allowed the students to build up their AI and machine learning experience, but also helped to “improve their knowledge base and skills in presenting their work to both technical and nontechnical audiences,” Oliva says.

    For a project on traffic data analysis, students were trained in MATLAB, a programming and numeric computing platform developed by MathWorks, to create a model that enables decision-making in autonomous driving by predicting future vehicle trajectories. “It’s important to realize that AI is not that intelligent. It’s only as smart as you make it and that’s exactly what we tried to do,” said Brandeis University student Srishti Nautiyal as she introduced her team’s project to the audience. With companies already making autonomous vehicles from planes to trucks a reality, Nautiyal, a physics and mathematics major, shared that her team was also highly motivated to consider the ethical issues of the technology in their model for the safety of passengers, drivers, and pedestrians.

    Using census data to train a model can be tricky because such data are often messy and full of holes. In a project on algorithmic fairness for the MIT-IBM Watson AI Lab, the hardest task for the team was cleaning up mountains of unorganized data in a way that still allowed them to gain insights from the data. The project, which aimed to create a demonstration of fairness applied to a real dataset in order to evaluate and compare the effectiveness of different fairness interventions and fair metric learning techniques, could eventually serve as an educational resource for data scientists interested in learning about fairness in AI and using it in their work, as well as promote the practice of evaluating the ethical implications of machine learning models in industry.

    Other challenge projects included an ML-assisted whiteboard for nontechnical people to interact with ready-made machine learning models, and a sign language recognition model to help disabled people communicate with others. A team that worked on a visual language app set out to include over 50 languages in their model to increase access for the millions of people who are visually impaired throughout the world. According to the team, similar apps on the market currently only offer up to 23 languages.

    Throughout the semester, students persisted and demonstrated grit in order to cross the finish line on their projects. With the final presentations marking the conclusion of the fall semester, students will return to MIT in the spring to continue their Break Through Tech AI journey to tackle another round of AI projects. This time, the students will work with Google on new machine learning challenges that will enable them to hone their AI skills even further with an eye toward launching a successful career in AI.

  •

    Unpacking the “black box” to build better AI models

    When deep learning models are deployed in the real world, perhaps to detect financial fraud from credit card activity or identify cancer in medical images, they are often able to outperform humans.

    But what exactly are these deep learning models learning? Does a model trained to spot skin cancer in clinical images, for example, actually learn the colors and textures of cancerous tissue, or is it flagging some other features or patterns?

    These powerful machine-learning models are typically based on artificial neural networks that can have millions of nodes that process data to make predictions. Due to their complexity, researchers often call these models “black boxes” because even the scientists who build them don’t understand everything that is going on under the hood.

    Stefanie Jegelka isn’t satisfied with that “black box” explanation. A newly tenured associate professor in the MIT Department of Electrical Engineering and Computer Science, Jegelka is digging deep into deep learning to understand what these models can learn and how they behave, and how to build certain prior information into these models.

    “At the end of the day, what a deep-learning model will learn depends on so many factors. But building an understanding that is relevant in practice will help us design better models, and also help us understand what is going on inside them so we know when we can deploy a model and when we can’t. That is critically important,” says Jegelka, who is also a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Institute for Data, Systems, and Society (IDSS).

    Jegelka is particularly interested in optimizing machine-learning models when input data are in the form of graphs. Graph data pose specific challenges: For instance, the data carry information about individual nodes and edges as well as about the structure, that is, what is connected to what. In addition, graphs have mathematical symmetries that need to be respected by the machine-learning model so that, for instance, the same graph always leads to the same prediction. Building such symmetries into a machine-learning model is usually not easy.
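
    The symmetry requirement described above can be illustrated with a small sketch. It assumes a toy graph stored as integer node features plus an edge list; the names `graph_readout` and `relabel` are hypothetical, and the code is an illustration rather than any method from Jegelka's research. Because every step is a sum over neighbors or over nodes, renumbering the nodes leaves the output unchanged, while rewiring the edges changes it:

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def graph_readout(features: Dict[int, int], edges: List[Edge]) -> int:
    """One round of sum aggregation over neighbors, then a sum over all nodes.

    Sums ignore ordering, so the result does not depend on node labels.
    """
    # Build an undirected adjacency list.
    neighbors: Dict[int, List[int]] = {node: [] for node in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    # Each node scales its own feature by (1 + sum of its neighbors' features).
    updated = {
        node: features[node] * (1 + sum(features[n] for n in neighbors[node]))
        for node in features
    }
    # Graph-level readout: a sum over nodes, again independent of ordering.
    return sum(updated.values())


def relabel(
    features: Dict[int, int], edges: List[Edge], mapping: Dict[int, int]
) -> Tuple[Dict[int, int], List[Edge]]:
    """Rename the nodes of a graph without changing its structure."""
    new_features = {mapping[node]: value for node, value in features.items()}
    new_edges = [(mapping[u], mapping[v]) for u, v in edges]
    return new_features, new_edges


# Toy graph (hypothetical values): a three-node path 0 - 1 - 2.
features = {0: 1, 1: 6, 2: 8}
path_edges = [(0, 1), (1, 2)]

# The same graph with its nodes renumbered.
renamed_features, renamed_edges = relabel(features, path_edges, {0: 2, 1: 0, 2: 1})

# Same node features, different wiring: node 0 is now the center.
star_edges = [(0, 1), (0, 2)]

print(graph_readout(features, path_edges))          # original labeling
print(graph_readout(renamed_features, renamed_edges))  # same value as above
print(graph_readout(features, star_edges))          # different structure, different value
```

    The third call shows the other half of the point: the output depends on what is connected to what, not only on the node features.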

    Take molecules, for instance. Molecules can be represented as graphs, with vertices that correspond to atoms and edges that correspond to chemical bonds between them. Drug companies may want to use deep learning to rapidly predict the properties of many molecules, narrowing down the number they must physically test in the lab.

    Jegelka studies methods to build mathematical machine-learning models that can effectively take graph data as an input and output something else, in this case a prediction of a molecule’s chemical properties. This is particularly challenging since a molecule’s properties are determined not only by the atoms within it, but also by the connections between them.  

    Other examples of machine learning on graphs include traffic routing, chip design, and recommender systems.

    Designing these models is made even more difficult by the fact that data used to train them are often different from data the models see in practice. Perhaps the model was trained using small molecular graphs or traffic networks, but the graphs it sees once deployed are larger or more complex.

    In this case, what can researchers expect this model to learn, and will it still work in practice if the real-world data are different?

    “Your model is not going to be able to learn everything because of some hardness problems in computer science, but what you can learn and what you can’t learn depends on how you set the model up,” Jegelka says.

    She approaches this question by combining her passion for algorithms and discrete mathematics with her excitement for machine learning.

    From butterflies to bioinformatics

    Jegelka grew up in a small town in Germany and became interested in science when she was a high school student; a supportive teacher encouraged her to participate in an international science competition. She and her teammates from the U.S. and Singapore won an award for a website they created about butterflies, in three languages.

    “For our project, we took images of wings with a scanning electron microscope at a local university of applied sciences. I also got the opportunity to use a high-speed camera at Mercedes Benz — this camera usually filmed combustion engines — which I used to capture a slow-motion video of the movement of a butterfly’s wings. That was the first time I really got in touch with science and exploration,” she recalls.

    Intrigued by both biology and mathematics, Jegelka decided to study bioinformatics at the University of Tübingen and the University of Texas at Austin. She had a few opportunities to conduct research as an undergraduate, including an internship in computational neuroscience at Georgetown University, but wasn’t sure what career to follow.

    When she returned for her final year of college, Jegelka moved in with two roommates who were working as research assistants at the Max Planck Institute in Tübingen.

    “They were working on machine learning, and that sounded really cool to me. I had to write my bachelor’s thesis, so I asked at the institute if they had a project for me. I started working on machine learning at the Max Planck Institute and I loved it. I learned so much there, and it was a great place for research,” she says.

    She stayed on at the Max Planck Institute to complete a master’s thesis, and then embarked on a PhD in machine learning at the Max Planck Institute and the Swiss Federal Institute of Technology.

    During her PhD, she explored how concepts from discrete mathematics can help improve machine-learning techniques.

    Teaching models to learn

    The more Jegelka learned about machine learning, the more intrigued she became by the challenges of understanding how models behave, and how to steer this behavior.

    “You can do so much with machine learning, but only if you have the right model and data. It is not just a black-box thing where you throw it at the data and it works. You actually have to think about it, its properties, and what you want the model to learn and do,” she says.

    After completing a postdoc at the University of California at Berkeley, Jegelka was hooked on research and decided to pursue a career in academia. She joined the faculty at MIT in 2015 as an assistant professor.

    “What I really loved about MIT, from the very beginning, was that the people really care deeply about research and creativity. That is what I appreciate the most about MIT. The people here really value originality and depth in research,” she says.

    That focus on creativity has enabled Jegelka to explore a broad range of topics.

    In collaboration with other faculty at MIT, she studies machine-learning applications in biology, imaging, computer vision, and materials science.

    But what really drives Jegelka is probing the fundamentals of machine learning, and most recently, the issue of robustness. Often, a model performs well on training data, but its performance deteriorates when it is deployed on slightly different data. Building prior knowledge into a model can make it more reliable, but understanding what information the model needs to be successful and how to build it in is not so simple, she says.

    She is also exploring methods to improve the performance of machine-learning models for image classification.

    Image classification models are everywhere, from the facial recognition systems on mobile phones to tools that identify fake accounts on social media. These models need massive amounts of data for training, but since it is expensive for humans to hand-label millions of images, researchers often use unlabeled datasets to pretrain models instead.

    These models then reuse the representations they have learned when they are fine-tuned later for a specific task.

    Ideally, researchers want the model to learn as much as it can during pretraining, so it can apply that knowledge to its downstream task. But in practice, these models often learn only a few simple correlations — like that one image has sunshine and one has shade — and use these “shortcuts” to classify images.

    “We showed that this is a problem in ‘contrastive learning,’ which is a standard technique for pre-training, both theoretically and empirically. But we also show that you can influence the kinds of information the model will learn to represent by modifying the types of data you show the model. This is one step toward understanding what models are actually going to do in practice,” she says.
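
    For readers unfamiliar with contrastive learning, the sketch below shows one commonly used contrastive objective, an InfoNCE-style loss. It is an illustration under assumptions, not the specific formulation analyzed in Jegelka's work: the function names, the two-dimensional toy vectors, and the temperature value are all hypothetical. The loss is small when an anchor's representation is closer to its "positive" (another view of the same image) than to the "negatives" (other images):

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.1,  # assumed value, chosen only for illustration
) -> float:
    """InfoNCE-style loss: -log( e^{s_pos/t} / (e^{s_pos/t} + sum_k e^{s_neg_k/t}) )."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    largest = max(logits)
    log_denominator = largest + math.log(sum(math.exp(l - largest) for l in logits))
    return log_denominator - logits[0]


anchor = [1.0, 0.0]
similar_view = [0.9, 0.1]
unrelated = [[0.0, 1.0], [-1.0, 0.0]]

# Positive is close to the anchor: low loss.
loss_aligned = info_nce(anchor, similar_view, unrelated)
# Positive is far from the anchor while a negative is close: high loss.
loss_misaligned = info_nce(anchor, [0.0, 1.0], [similar_view, [-1.0, 0.0]])

print(loss_aligned, loss_misaligned)
```

    Nothing in this objective says which features must separate positives from negatives, which is consistent with the observation above that a model can satisfy it using only a few simple "shortcut" correlations.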

    Researchers still don’t understand everything that goes on inside a deep-learning model, or details about how they can influence what a model learns and how it behaves, but Jegelka looks forward to continuing to explore these topics.

    “Often in machine learning, we see something happen in practice and we try to understand it theoretically. This is a huge challenge. You want to build an understanding that matches what you see in practice, so that you can do better. We are still just at the beginning of understanding this,” she says.

    Outside the lab, Jegelka is a fan of music, art, traveling, and cycling. But these days, she enjoys spending most of her free time with her preschool-aged daughter.

  •

    Simulating discrimination in virtual reality

    Have you ever been advised to “walk a mile in someone else’s shoes”? Considering another person’s perspective can be a challenging endeavor, but recognizing our errors and biases is key to building understanding across communities. By challenging our preconceptions, we confront prejudice, such as racism and xenophobia, and potentially develop a more inclusive perspective about others.

    To assist with perspective-taking, MIT researchers have developed “On the Plane,” a virtual reality role-playing game (VR RPG) that simulates discrimination. In this case, the game portrays xenophobia directed against a Malaysian American woman, but the approach can be generalized. Situated on an airplane, players can take on the role of characters from different backgrounds, engaging in dialogue with others while making in-game choices in response to a series of prompts. In turn, players’ decisions control the outcome of a tense conversation between the characters about cultural differences.

    As a VR RPG, “On the Plane” encourages players to take on new roles that may be outside of their personal experiences in the first person, allowing them to confront in-group/out-group bias by incorporating new perspectives into their understanding of different cultures. Players can take on one of three characters: Sarah, a first-generation Muslim American of Malaysian ancestry who wears a hijab; Marianne, a white woman from the Midwest with little exposure to other cultures and customs; or a flight attendant. Sarah represents the out group, Marianne is a member of the in group, and the flight staffer is a bystander witnessing an exchange between the two passengers.

    “This project is part of our efforts to harness the power of virtual reality and artificial intelligence to address social ills, such as discrimination and xenophobia,” says Caglar Yildirim, an MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) research scientist who is a co-author and co-game designer on the project. “Through the exchange between the two passengers, players experience how one passenger’s xenophobia manifests itself and how it affects the other passenger. The simulation engages players in critical reflection and seeks to foster empathy for the passenger who was ‘othered’ due to her outfit being not so ‘prototypical’ of what an American should look like.”

    Yildirim worked alongside the project’s principal investigator, D. Fox Harrell, MIT professor of digital media and AI at CSAIL, the Program in Comparative Media Studies/Writing (CMS), and the Institute for Data, Systems, and Society (IDSS) and founding director of the MIT Center for Advanced Virtuality. “It is not possible for a simulation to give someone the life experiences of another person, but while you cannot ‘walk in someone else’s shoes’ in that sense, a system like this can help people recognize and understand the social patterns at work when it comes to issues like bias,” says Harrell, who is also co-author and designer on this project. “An engaging, immersive, interactive narrative can also impact people emotionally, opening the door for users’ perspectives to be transformed and broadened.”

    This simulation also utilizes an interactive narrative engine that creates several options for responses to in-game interactions based on a model of how people are categorized socially. The tool grants players a chance to alter their standing in the simulation through their reply choices to each prompt, affecting their affinity toward the other two characters. For example, if you play as the flight attendant, you can react to Marianne’s xenophobic expressions and attitudes toward Sarah, changing your affinities. The engine will then provide you with a different set of narrative events based on your changes in standing with others.

    To animate each avatar, “On the Plane” incorporates artificial intelligence knowledge representation techniques controlled by probabilistic finite state machines, a tool commonly used in machine learning systems for pattern recognition. With the help of these machines, characters’ body language and gestures are customizable: if you play as Marianne, the game will customize her mannerisms toward Sarah based on user inputs, impacting how comfortable she appears in front of a member of a perceived out group. Similarly, players can do the same from Sarah’s or the flight attendant’s point of view.

    In a 2018 paper based on work done in a collaboration between MIT CSAIL and the Qatar Computing Research Institute, Harrell and co-author Sercan Şengün advocated for virtual system designers to be more inclusive of Middle Eastern identities and customs. They claimed that if designers allowed users to customize virtual avatars more representative of their background, it might empower players to engage in a more supportive experience. Four years later, “On the Plane” accomplishes a similar goal, incorporating a Muslim’s perspective into an immersive environment.
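
    The probabilistic finite state machines mentioned above for avatar animation can be sketched in a few lines. This is a generic illustration, not the game's implementation: the gesture states, the transition probabilities, and the function names are all hypothetical. In a system like the one described, the probabilities would presumably shift with player choices; here they are fixed for simplicity.

```python
import random
from typing import Dict, List

# Hypothetical gesture states for one avatar. Each row gives the probability
# of moving from the current state to each possible next state.
TRANSITIONS: Dict[str, Dict[str, float]] = {
    "relaxed": {"relaxed": 0.7, "guarded": 0.3},
    "guarded": {"relaxed": 0.2, "guarded": 0.5, "turned_away": 0.3},
    "turned_away": {"guarded": 0.6, "turned_away": 0.4},
}


def next_state(state: str, rng: random.Random) -> str:
    """Sample the next state according to the current state's probabilities."""
    options = TRANSITIONS[state]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def simulate(start: str, steps: int, seed: int = 0) -> List[str]:
    """Run the machine for a fixed number of steps; seeded for repeatability."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return path


print(simulate("relaxed", steps=10, seed=1))
```

    Unlike a deterministic state machine, the same starting state can yield different gesture sequences on different runs, which is one way to keep an avatar's body language from looking scripted.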

    “Many virtual identity systems, such as avatars, accounts, profiles, and player characters, are not designed to serve the needs of people across diverse cultures. We have used statistical and AI methods in conjunction with qualitative approaches to learn where the gaps are,” they note. “Our project helps engender perspective transformation so that people will treat each other with respect and enhanced understanding across diverse cultural avatar representations.”

    Harrell and Yildirim’s work is part of the MIT IDSS’s Initiative on Combatting Systemic Racism (ICSR). Harrell is on the initiative’s steering committee and is the leader of the newly forming Antiracism, Games, and Immersive Media vertical, which studies behavior, cognition, social phenomena, and computational systems related to race and racism in video games and immersive experiences.

    The researchers’ latest project is part of the ICSR’s broader goal to launch and coordinate cross-disciplinary research that addresses racially discriminatory processes across American institutions. Using big data, members of the research initiative develop and employ computing tools that drive racial equity. Yildirim and Harrell accomplish this goal by depicting a frequent, problematic scenario that illustrates how bias creeps into our everyday lives.

    “In a post-9/11 world, Muslims often experience ethnic profiling in American airports. ‘On the Plane’ builds off of that type of in-group favoritism, a well-established finding in psychology,” says MIT Professor Fotini Christia, director of the Sociotechnical Systems Research Center (SSRC) and associate director of IDSS. “This game also takes a novel approach to analyzing hardwired bias by utilizing VR instead of field experiments to simulate prejudice. Excitingly, this research demonstrates that VR can be used as a tool to help us better measure bias, combating systemic racism and other forms of discrimination.”

    “On the Plane” was developed on the Unity game engine using the XR Interaction Toolkit and Harrell’s Chimeria platform for authoring interactive narratives that involve social categorization. The game will be deployed for research studies later this year on both desktop computers and the standalone, wireless Meta Quest headsets. A paper on the work was presented in December at the 2022 IEEE International Conference on Artificial Intelligence and Virtual Reality.

  •

    Subtle biases in AI can influence emergency decisions

    It’s no secret that people harbor biases — some unconscious, perhaps, and others painfully overt. The average person might suppose that computers — machines typically made of plastic, steel, glass, silicon, and various metals — are free of prejudice. While that assumption may hold for computer hardware, the same is not always true for computer software, which is programmed by fallible humans and can be fed data that is, itself, compromised in certain respects.

    Artificial intelligence (AI) systems — those based on machine learning, in particular — are seeing increased use in medicine for diagnosing specific diseases, for example, or evaluating X-rays. These systems are also being relied on to support decision-making in other areas of health care. Recent research has shown, however, that machine learning models can encode biases against minority subgroups, and the recommendations they make may consequently reflect those same biases.

    A new study by researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT Jameel Clinic, which was published last month in Communications Medicine, assesses the impact that discriminatory AI models can have, especially for systems that are intended to provide advice in urgent situations. “We found that the manner in which the advice is framed can have significant repercussions,” explains the paper’s lead author, Hammaad Adam, a PhD student at MIT’s Institute for Data, Systems, and Society. “Fortunately, the harm caused by biased models can be limited (though not necessarily eliminated) when the advice is presented in a different way.” The other co-authors of the paper are Aparna Balagopalan and Emily Alsentzer, both PhD students, and the professors Fotini Christia and Marzyeh Ghassemi.

    AI models used in medicine can suffer from inaccuracies and inconsistencies, in part because the data used to train the models are often not representative of real-world settings. Different kinds of X-ray machines, for instance, can record things differently and hence yield different results. Models trained predominately on white people, moreover, may not be as accurate when applied to other groups. The Communications Medicine paper is not focused on issues of that sort but instead addresses problems that stem from biases and ways to mitigate the adverse consequences.

    A group of 954 people (438 clinicians and 516 nonexperts) took part in an experiment to see how AI biases can affect decision-making. The participants were presented with call summaries from a fictitious crisis hotline, each involving a male individual undergoing a mental health emergency. The summaries contained information as to whether the individual was Caucasian or African American and would also mention his religion if he happened to be Muslim. A typical call summary might describe a circumstance in which an African American man was found at home in a delirious state, indicating that “he has not consumed any drugs or alcohol, as he is a practicing Muslim.” Study participants were instructed to call the police if they thought the patient was likely to turn violent; otherwise, they were encouraged to seek medical help.

    The participants were randomly divided into a control or “baseline” group plus four other groups designed to test responses under slightly different conditions. “We want to understand how biased models can influence decisions, but we first need to understand how human biases can affect the decision-making process,” Adam notes. What they found in their analysis of the baseline group was rather surprising: “In the setting we considered, human participants did not exhibit any biases. That doesn’t mean that humans are not biased, but the way we conveyed information about a person’s race and religion, evidently, was not strong enough to elicit their biases.”

    The other four groups in the experiment were given advice that came from either a biased or an unbiased model, and that advice was presented in either a “prescriptive” or a “descriptive” form. A biased model would be more likely to recommend police help in a situation involving an African American or Muslim person than would an unbiased model. Participants in the study, however, did not know which kind of model their advice came from, or even that models delivering the advice could be biased at all. Prescriptive advice spells out what a participant should do in unambiguous terms, telling them they should call the police in one instance or seek medical help in another. Descriptive advice is less direct: A flag is displayed to show that the AI system perceives a risk of violence associated with a particular call; no flag is shown if the threat of violence is deemed small.
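
    The difference between the two framings can be made concrete with a short sketch. It assumes the model produces a numeric risk score and that a fixed threshold decides when violence is considered likely; the function name `frame_advice`, the score scale, the threshold, and the message wording are all hypothetical and are not taken from the study.

```python
def frame_advice(risk_score: float, style: str, threshold: float = 0.5) -> str:
    """Turn the same model output into prescriptive or descriptive advice.

    risk_score: assumed model estimate, between 0 and 1, that a call turns violent.
    style: "prescriptive" tells the reader what to do;
           "descriptive" only shows a flag, or nothing at all.
    """
    flagged = risk_score >= threshold
    if style == "prescriptive":
        # An instruction that leaves little room for the reader's own judgment.
        return "Call the police." if flagged else "Seek medical help."
    if style == "descriptive":
        # A flag when risk is perceived; an empty string means no flag is shown.
        return "FLAG: risk of violence" if flagged else ""
    raise ValueError(f"unknown style: {style!r}")


for score in (0.8, 0.2):
    print(repr(frame_advice(score, "prescriptive")), repr(frame_advice(score, "descriptive")))
```

    Both styles carry the same underlying model judgment, so any bias in the score is present either way. Only the presentation changes, which is the variable the experiment manipulates.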

    A key takeaway of the experiment is that participants “were highly influenced by prescriptive recommendations from a biased AI system,” the authors wrote. But they also found that “using descriptive rather than prescriptive recommendations allowed participants to retain their original, unbiased decision-making.” In other words, the bias incorporated within an AI model can be diminished by appropriately framing the advice that’s rendered. Why the different outcomes, depending on how advice is posed? When someone is told to do something, like call the police, that leaves little room for doubt, Adam explains. However, when the situation is merely described — classified with or without the presence of a flag — “that leaves room for a participant’s own interpretation; it allows them to be more flexible and consider the situation for themselves.”

    Second, the researchers found that the language models that are typically used to offer advice are easy to bias. Language models represent a class of machine learning systems that are trained on text, such as the entire contents of Wikipedia and other web material. When these models are “fine-tuned” by relying on a much smaller subset of data for training purposes — just 2,000 sentences, as opposed to 8 million web pages — the resultant models can be readily biased.  

    Third, the MIT team discovered that decision-makers who are themselves unbiased can still be misled by the recommendations provided by biased models. Medical training (or the lack thereof) did not change responses in a discernible way. “Clinicians were influenced by biased models as much as non-experts were,” the authors stated.

    “These findings could be applicable to other settings,” Adam says, and are not necessarily restricted to health care situations. When it comes to deciding which people should receive a job interview, a biased model could be more likely to turn down Black applicants. The results could be different, however, if instead of explicitly (and prescriptively) telling an employer to “reject this applicant,” a descriptive flag is attached to the file to indicate the applicant’s “possible lack of experience.”

    The implications of this work are broader than just figuring out how to deal with individuals in the midst of mental health crises, Adam maintains. “Our ultimate goal is to make sure that machine learning models are used in a fair, safe, and robust way.”

  •

    Meet the 2022-23 Accenture Fellows

    Launched in October 2020, the MIT and Accenture Convergence Initiative for Industry and Technology underscores the ways in which industry and technology can collaborate to spur innovation. The five-year initiative aims to achieve its mission through research, education, and fellowships. To that end, Accenture has once again awarded five annual fellowships to MIT graduate students working on research in industry and technology convergence who are underrepresented, including by race, ethnicity, and gender.

    This year’s Accenture Fellows work across research areas including telemonitoring, human-computer interactions, operations research, AI-mediated socialization, and chemical transformations. Their research covers a wide array of projects, including designing low-power processing hardware for telehealth applications; applying machine learning to streamline and improve business operations; improving mental health care through artificial intelligence; and using machine learning to understand the environmental and health consequences of complex chemical reactions.

    As part of the application process, student nominations were invited from each unit within the School of Engineering, as well as from the Institute’s four other schools and the MIT Schwarzman College of Computing. Five exceptional students were selected as fellows for the initiative’s third year.

    Drew Buzzell is a doctoral candidate in electrical engineering and computer science whose research concerns telemonitoring, a fast-growing sphere of telehealth in which information is collected through internet-of-things (IoT) connected devices and transmitted to the cloud. Currently, the high volume of information involved in telemonitoring, along with the time and energy costs of processing it, makes data analysis difficult. Buzzell’s work is focused on edge computing, a new computing architecture that seeks to address these challenges by managing data closer to the source, in a distributed network of IoT devices. Buzzell earned his BS in physics and engineering science and his MS in engineering science from the Pennsylvania State University.

    Mengying (Cathy) Fang is a master’s student in the MIT School of Architecture and Planning. Her research focuses on augmented reality and virtual reality platforms. Fang is developing novel sensors and machine components that combine computation, materials science, and engineering. Moving forward, she will explore topics including soft robotics techniques that could be integrated with clothes and wearable devices and haptic feedback in order to develop interactions with digital objects. Fang earned a BS in mechanical engineering and human-computer interaction from Carnegie Mellon University.

    Xiaoyue Gong is a doctoral candidate in operations research at the MIT Sloan School of Management. Her research aims to harness the power of machine learning and data science to reduce inefficiencies in the operation of businesses, organizations, and society. With the support of an Accenture Fellowship, Gong seeks to find solutions to operational problems by designing reinforcement learning methods and other machine learning techniques for embedded operational problems. Gong earned a BS in honors mathematics and interactive media arts from New York University.

    Ruby Liu is a doctoral candidate in medical engineering and medical physics. Their research addresses the growing pandemic of loneliness among older adults, which leads to poor health outcomes and presents particularly high risks for historically marginalized people, including members of the LGBTQ+ community and people of color. Liu is designing a network of interconnected AI agents that foster connections between user and agent, offering mental health care while strengthening and facilitating human-human connections. Liu received a BS in biomedical engineering from Johns Hopkins University.

    Joules Provenzano is a doctoral candidate in chemical engineering. Their work integrates machine learning and liquid chromatography-high resolution mass spectrometry (LC-HRMS) to improve our understanding of complex chemical reactions in the environment. As an Accenture Fellow, Provenzano will build upon recent advances in machine learning and LC-HRMS, including novel algorithms for processing real, experimental HR-MS data and new approaches in extracting structure-transformation rules and kinetics. Their research could speed the pace of discovery in the chemical sciences and benefit industries including oil and gas, pharmaceuticals, and agriculture. Provenzano earned a BS in chemical engineering and international and global studies from the Rochester Institute of Technology.

  • in

    Large language models help decipher clinical notes

    Electronic health records (EHRs) need a new public relations manager. Ten years ago, the U.S. government passed a law that required hospitals to digitize their health records with the intent of improving and streamlining care. The enormous amount of information in these now-digital records could be used to answer very specific questions beyond the scope of clinical trials: What’s the right dose of this medication for patients with this height and weight? What about patients with a specific genomic profile?

    Unfortunately, most of the data that could answer these questions is trapped in doctors’ notes, full of jargon and abbreviations. These notes are hard for computers to understand using current techniques: extracting information requires training multiple machine learning models. Also, models trained for one hospital don’t work well at others, and training each model requires domain experts to label lots of data, a time-consuming and expensive process.

    An ideal system would use a single model that can extract many types of information, work well at multiple hospitals, and learn from a small amount of labeled data. But how? Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) believed that to disentangle the data, they needed to call on something bigger: large language models. To pull that important medical information, they used a very big, GPT-3 style model to do tasks like expand overloaded jargon and acronyms and extract medication regimens. 

    For example, the system takes an input, which in this case is a clinical note, and “prompts” the model with a question about the note, such as “expand this abbreviation, C-T-A.” The system returns an output such as “clear to auscultation,” as opposed to, say, a CT angiography. The objective of extracting this clean data, the team says, is to eventually enable more personalized clinical recommendations.
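
    The prompting step described above can be sketched in a few lines of Python. This is only an illustration: the prompt wording, the sample note, and the `fake_model` stand-in are all assumptions made for this example, not the prompt format or model interface used by the researchers.

```python
def build_expansion_prompt(note: str, abbreviation: str) -> str:
    """Build a zero-shot prompt asking a model to expand one abbreviation."""
    return (
        "Clinical note:\n"
        f"{note.strip()}\n\n"
        f'Expand this abbreviation as it is used in the note: "{abbreviation}"\n'
        "Expansion:"
    )


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to a large language model."""
    return "clear to auscultation"


# Made-up note for illustration only; not taken from the team's dataset.
note = "Lungs: CTA bilaterally, no wheezes."
prompt = build_expansion_prompt(note, "CTA")
answer = fake_model(prompt)
print(prompt)
print(answer)
```

    Ending the prompt with a fixed cue such as "Expansion:" is one simple way to nudge a model toward a short answer in a predictable place.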

    Medical data is, understandably, a pretty tricky resource to navigate freely. There’s plenty of red tape around using public resources for testing the performance of large models because of data use restrictions, so the team decided to scrape together their own. Using a set of short, publicly available clinical snippets, they cobbled together a small dataset to enable evaluation of the extraction performance of large language models. 

    “It’s challenging to develop a single general-purpose clinical natural language processing system that will solve everyone’s needs and be robust to the huge variation seen across health datasets. As a result, until today, most clinical notes are not used in downstream analyses or for live decision support in electronic health records. These large language model approaches could potentially transform clinical natural language processing,” says David Sontag, MIT professor of electrical engineering and computer science, principal investigator in CSAIL and the Institute for Medical Engineering and Science, and supervising author on a paper about the work, which will be presented at the Conference on Empirical Methods in Natural Language Processing. “The research team’s advances in zero-shot clinical information extraction makes scaling possible. Even if you have hundreds of different use cases, no problem — you can build each model with a few minutes of work, versus having to label a ton of data for that particular task.”

    For example, without any labels at all, the researchers found these models could achieve 86 percent accuracy at expanding overloaded acronyms, and the team developed additional methods to boost this further to 90 percent accuracy, with still no labels required.

    Imprisoned in an EHR 

    Experts have been steadily building up large language models (LLMs) for quite some time, but they burst into the mainstream with GPT-3’s widely covered ability to complete sentences. These LLMs are trained on a huge amount of text from the internet to finish sentences and predict the next most likely word.

    While previous, smaller models like earlier GPT iterations or BERT have pulled off a good performance for extracting medical data, they still require substantial manual data-labeling effort. 

    For example, a note, “pt will dc vanco due to n/v” means that this patient (pt) was taking the antibiotic vancomycin (vanco) but experienced nausea and vomiting (n/v) severe enough for the care team to discontinue (dc) the medication. The team’s research avoids the status quo of training separate machine learning models for each task (extracting medication, side effects from the record, disambiguating common abbreviations, etc.). In addition to expanding abbreviations, they investigated four other tasks, including whether the models could parse clinical trials and extract detail-rich medication regimens.

    “Prior work has shown that these models are sensitive to the prompt’s precise phrasing. Part of our technical contribution is a way to format the prompt so that the model gives you outputs in the correct format,” says Hunter Lang, CSAIL PhD student and author on the paper. “For these extraction problems, there are structured output spaces. The output space is not just a string. It can be a list. It can be a quote from the original input. So there’s more structure than just free text. Part of our research contribution is encouraging the model to give you an output with the correct structure. That significantly cuts down on post-processing time.”

    The approach can’t be applied to out-of-the-box health data at a hospital: that requires sending private patient information across the open internet to an LLM provider like OpenAI. The authors showed that it’s possible to work around this by distilling the model into a smaller one that could be used on-site.

    The model — sometimes just like humans — is not always beholden to the truth. Here’s what a potential problem might look like: Let’s say you’re asking the reason why someone took medication. Without proper guardrails and checks, the model might just output the most common reason for that medication, if nothing is explicitly mentioned in the note. This led to the team’s efforts to force the model to extract more quotes from data and less free text.
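
    One simple guardrail in this spirit is to accept an answer only when it appears verbatim in the note. The sketch below is an assumption about how such a check could look, not the team's actual method; the function name `grounded_quote` is hypothetical, and the note is the example quoted earlier in this article.

```python
from typing import Optional


def grounded_quote(note: str, model_output: str) -> Optional[str]:
    """Return the output only if it appears verbatim in the note, else None."""
    candidate = model_output.strip().strip('"')
    if candidate and candidate.lower() in note.lower():
        return candidate
    return None


note = "pt will dc vanco due to n/v"
print(grounded_quote(note, "n/v"))        # the reason is quoted from the note
print(grounded_quote(note, "infection"))  # a plausible guess the note never states
```

    A guess such as "infection" may be a common reason for prescribing an antibiotic, but because the note never says so, the check rejects it.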

    Future work for the team includes extending to languages other than English, creating additional methods for quantifying uncertainty in the model, and pulling off similar results with open-sourced models. 

    “Clinical information buried in unstructured clinical notes has unique challenges compared to general domain text mostly due to large use of acronyms, and inconsistent textual patterns used across different health care facilities,” says Sadid Hasan, AI lead at Microsoft and former executive director of AI at CVS Health, who was not involved in the research. “To this end, this work sets forth an interesting paradigm of leveraging the power of general domain large language models for several important zero-/few-shot clinical NLP tasks. Specifically, the proposed guided prompt design of LLMs to generate more structured outputs could lead to further developing smaller deployable models by iteratively utilizing the model generated pseudo-labels.”

    “AI has accelerated in the last five years to the point at which these large models can predict contextualized recommendations with benefits rippling out across a variety of domains such as suggesting novel drug formulations, understanding unstructured text, code recommendations or create works of art inspired by any number of human artists or styles,” says Parminder Bhatia, who was formerly Head of Machine Learning at AWS Health AI and is currently Head of ML for low-code applications leveraging large language models at AWS AI Labs. “One of the applications of these large models [the team has] recently launched is Amazon CodeWhisperer, which is [an] ML-powered coding companion that helps developers in building applications.”

    As part of the MIT Abdul Latif Jameel Clinic for Machine Learning in Health, Agrawal, Sontag, and Lang wrote the paper alongside Yoon Kim, MIT assistant professor and CSAIL principal investigator, and Stefan Hegselmann, a visiting PhD student from the University of Muenster. First-author Agrawal’s research was supported by a Takeda Fellowship, the MIT Deshpande Center for Technological Innovation, and the MLA@CSAIL Initiatives.

  • in

    Busy GPUs: Sampling and pipelining method speeds up deep learning on large graphs

    A graph, a potentially extensive web of nodes connected by edges, can be used to express and interrogate relationships between data, like social connections, financial transactions, traffic, energy grids, and molecular interactions. As researchers collect more data and build out these graphical pictures, they will need faster and more efficient methods, as well as more computational power, to conduct deep learning on them, in the form of graph neural networks (GNNs).

    Now, a new method, called SALIENT (SAmpling, sLIcing, and data movemeNT), developed by researchers at MIT and IBM Research, improves the training and inference performance by addressing three key bottlenecks in computation. This dramatically cuts down on the runtime of GNNs on large datasets, which, for example, contain on the scale of 100 million nodes and 1 billion edges. Further, the team found that the technique scales well as computational power is increased from one to 16 graphics processing units (GPUs). The work was presented at the Fifth Conference on Machine Learning and Systems.

    “We started to look at the challenges current systems experienced when scaling state-of-the-art machine learning techniques for graphs to really big datasets. It turned out there was a lot of work to be done, because a lot of the existing systems were achieving good performance primarily on smaller datasets that fit into GPU memory,” says Tim Kaler, the lead author and a postdoc in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

    By vast datasets, experts mean scales like the entire Bitcoin network, where certain patterns and data relationships could spell out trends or foul play. “There are nearly a billion Bitcoin transactions on the blockchain, and if we want to identify illicit activities inside such a joint network, then we are facing a graph of such a scale,” says co-author Jie Chen, senior research scientist and manager of IBM Research and the MIT-IBM Watson AI Lab. “We want to build a system that is able to handle that kind of graph and allows processing to be as efficient as possible, because every day we want to keep up with the pace of the new data that are generated.”

    Kaler and Chen’s co-authors include Nickolas Stathas MEng ’21 of Jump Trading, who developed SALIENT as part of his graduate work; former MIT-IBM Watson AI Lab intern and MIT graduate student Anne Ouyang; MIT CSAIL postdoc Alexandros-Stavros Iliopoulos; MIT CSAIL Research Scientist Tao B. Schardl; and Charles E. Leiserson, the Edwin Sibley Webster Professor of Electrical Engineering at MIT and a researcher with the MIT-IBM Watson AI Lab.     

    For this problem, the team took a systems-oriented approach in developing their method, SALIENT, says Kaler. To do this, the researchers implemented what they saw as important, basic optimizations of components that fit into existing machine-learning frameworks, such as PyTorch Geometric and the Deep Graph Library (DGL), which are interfaces for building a machine-learning model. Stathas says the process is like swapping out engines to build a faster car. Their method was designed to fit into existing GNN architectures, so that domain experts could easily apply this work to their specific fields to expedite model training and tease out insights during inference faster. The trick, the team determined, was to keep all of the hardware (CPUs, data links, and GPUs) busy at all times: while the CPU samples the graph and prepares mini-batches of data that will then be transferred through the data link, the more critical GPU is working to train the machine-learning model or conduct inference.
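
    The idea of keeping every stage busy can be illustrated with a small producer-consumer pipeline. The sketch below uses only Python's standard library and placeholder arithmetic in place of real sampling and training; the function names and the `prefetch_depth` parameter are assumptions for illustration, and none of this is SALIENT's code.

```python
import queue
import threading


def prepare_batches(num_batches, out_queue):
    """CPU-side producer: stands in for sampling and mini-batch preparation."""
    for i in range(num_batches):
        batch = [i * 10 + j for j in range(4)]  # placeholder "mini-batch"
        out_queue.put(batch)  # blocks while the queue is full
    out_queue.put(None)  # sentinel: no more batches


def train(in_queue):
    """GPU-side consumer: stands in for one training step per batch."""
    processed = []
    while True:
        batch = in_queue.get()
        if batch is None:
            break
        processed.append(sum(batch))  # placeholder for the real computation
    return processed


def run_pipeline(num_batches=5, prefetch_depth=2):
    """Overlap batch preparation with training using a bounded queue."""
    q = queue.Queue(maxsize=prefetch_depth)
    producer = threading.Thread(target=prepare_batches, args=(num_batches, q))
    producer.start()
    results = train(q)
    producer.join()
    return results


results = run_pipeline()
print(results)
```

    The bounded queue lets the producer run a few batches ahead of the consumer without preparing data far earlier than it is needed.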

    The researchers began by analyzing the performance of a commonly used machine-learning library for GNNs (PyTorch Geometric), which showed a startlingly low utilization of available GPU resources. Applying simple optimizations, the researchers improved GPU utilization from 10 to 30 percent, resulting in a 1.4- to twofold performance improvement relative to public benchmark codes. This fast baseline code could execute one complete pass over a large training dataset through the algorithm (an epoch) in 50.4 seconds.

    Seeking further performance improvements, the researchers set out to examine the bottlenecks that occur at the beginning of the data pipeline: the algorithms for graph sampling and mini-batch preparation. Unlike other neural networks, GNNs perform a neighborhood aggregation operation, which computes information about a node using information present in other nearby nodes in the graph (for example, in a social network graph, information from friends of friends of a user). As the number of layers in the GNN increases, the number of nodes the network has to reach out to for information can explode, exceeding the limits of a computer. Neighborhood sampling algorithms help by selecting a smaller random subset of nodes to gather; however, the researchers found that current implementations of this were too slow to keep up with the processing speed of modern GPUs. In response, they identified a mix of data structures, algorithmic optimizations, and so forth that improved sampling speed, ultimately improving the sampling operation alone by about three times, taking the per-epoch runtime from 50.4 to 34.6 seconds. They also found that sampling, at an appropriate rate, can be done during inference, improving overall energy efficiency and performance, a point that had been overlooked in the literature, the team notes.
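
    Neighborhood sampling with a fixed fanout can be sketched as follows. This is a plain-Python illustration of the general technique on a made-up ten-node graph; the function name, the graph, and the fanout values are assumptions, and the code does not reflect SALIENT's optimized data structures.

```python
import random


def sample_neighborhood(adjacency, seeds, fanouts, rng):
    """Sample a multi-hop neighborhood, keeping at most `fanout` neighbors per node per hop."""
    frontier = set(seeds)
    visited = set(seeds)
    for fanout in fanouts:
        next_frontier = set()
        for node in frontier:
            neighbors = adjacency.get(node, [])
            k = min(fanout, len(neighbors))
            next_frontier.update(rng.sample(neighbors, k))
        visited |= next_frontier
        frontier = next_frontier
    return visited


# Toy graph, invented for this example: node 0 has four neighbors.
adjacency = {
    0: [1, 2, 3, 4],
    1: [0, 5, 6],
    2: [0, 7],
    3: [0, 8, 9],
    4: [0],
    5: [1], 6: [1], 7: [2], 8: [3], 9: [3],
}
rng = random.Random(0)
sampled = sample_neighborhood(adjacency, seeds=[0], fanouts=[2, 2], rng=rng)
print(sorted(sampled))
```

    With a fanout of 2 at each of two hops, at most 1 + 2 + 4 = 7 nodes are gathered for the seed, no matter how many neighbors each node has; that cap is what keeps deep GNNs from touching most of the graph.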

    In previous systems, this sampling step was a multi-process approach, creating extra data and unnecessary data movement between the processes. The researchers made their SALIENT method more nimble by creating a single process with lightweight threads that kept the data on the CPU in shared memory. Further, SALIENT takes advantage of the cache of modern processors, says Stathas, parallelizing feature slicing, which extracts relevant information from nodes of interest and their surrounding neighbors and edges, within the shared memory of the CPU core cache. This again reduced the overall per-epoch runtime from 34.6 to 27.8 seconds.

    The last bottleneck the researchers addressed was to pipeline mini-batch data transfers between the CPU and GPU using a prefetching step, which would prepare data just before it’s needed. The team calculated that this would maximize bandwidth usage in the data link and bring the method up to perfect utilization; however, they only saw around 90 percent. They identified and fixed a performance bug in a popular PyTorch library that caused unnecessary round-trip communications between the CPU and GPU. With this bug fixed, the team achieved a 16.5 second per-epoch runtime with SALIENT.
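
    The per-epoch runtimes reported above can be turned into step-by-step and cumulative speedups with a little arithmetic. The stage labels below are shorthand chosen for this example; the four runtimes are the ones given in the article.

```python
epoch_seconds = [
    ("optimized PyTorch Geometric baseline", 50.4),
    ("faster neighborhood sampling", 34.6),
    ("shared-memory threads and parallel slicing", 27.8),
    ("pipelined transfers with the library bug fixed", 16.5),
]

baseline = epoch_seconds[0][1]
previous = baseline
for stage, seconds in epoch_seconds:
    step = previous / seconds   # gain over the previous stage
    total = baseline / seconds  # gain over the optimized baseline
    print(f"{stage}: {seconds:.1f} s, x{step:.2f} step, x{total:.2f} total")
    previous = seconds

overall = baseline / epoch_seconds[-1][1]
```

    The overall ratio, 50.4 / 16.5, works out to a little over 3, which is consistent with the roughly threefold single-GPU speedup over the optimized baseline.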

    “Our work showed, I think, that the devil is in the details,” says Kaler. “When you pay close attention to the details that impact performance when training a graph neural network, you can resolve a huge number of performance issues. With our solutions, we ended up being completely bottlenecked by GPU computation, which is the ideal goal of such a system.”

    SALIENT’s speed was evaluated on three standard datasets (ogbn-arxiv, ogbn-products, and ogbn-papers100M), as well as in multi-machine settings, with different levels of fanout (amount of data that the CPU would prepare for the GPU), and across several architectures, including the most recent state-of-the-art one, GraphSAGE-RI. In each setting, SALIENT outperformed PyTorch Geometric, most notably on the large ogbn-papers100M dataset, containing 100 million nodes and over a billion edges. Here, it was three times faster, running on one GPU, than the optimized baseline that was originally created for this work; with 16 GPUs, SALIENT was an additional eight times faster.

    While other systems had slightly different hardware and experimental setups, so it wasn’t always a direct comparison, SALIENT still outperformed them. Among systems that achieved similar accuracy, representative performance numbers include 99 seconds using one GPU and 32 CPUs, and 13 seconds using 1,536 CPUs. In contrast, SALIENT’s runtime using one GPU and 20 CPUs was 16.5 seconds and was just two seconds with 16 GPUs and 320 CPUs. “If you look at the bottom-line numbers that prior work reports, our 16 GPU runtime (two seconds) is an order of magnitude faster than other numbers that have been reported previously on this dataset,” says Kaler. The researchers attributed their performance improvements, in part, to their approach of optimizing their code for a single machine before moving to the distributed setting. Stathas says that the lesson here is that for your money, “it makes more sense to use the hardware you have efficiently, and to its extreme, before you start scaling up to multiple computers,” which can provide significant savings on cost and carbon emissions that can come with model training.

    This new capacity will now allow researchers to tackle and dig deeper into bigger and bigger graphs. For example, the Bitcoin network that was mentioned earlier contained 100,000 nodes; the SALIENT system can capably handle a graph 1,000 times (or three orders of magnitude) larger.

    “In the future, we would be looking at not just running this graph neural network training system on the existing algorithms that we implemented for classifying or predicting the properties of each node, but we also want to do more in-depth tasks, such as identifying common patterns in a graph (subgraph patterns), [which] may be actually interesting for indicating financial crimes,” says Chen. “We also want to identify nodes in a graph that are similar in a sense that they possibly would be corresponding to the same bad actor in a financial crime. These tasks would require developing additional algorithms, and possibly also neural network architectures.”

    This research was supported by the MIT-IBM Watson AI Lab and in part by the U.S. Air Force Research Laboratory and the U.S. Air Force Artificial Intelligence Accelerator.

  • in

    Breaking the scaling limits of analog computing

    As machine-learning models become larger and more complex, they require faster and more energy-efficient hardware to perform computations. Conventional digital computers are struggling to keep up.

    An analog optical neural network could perform the same tasks as a digital one, such as image classification or speech recognition, but because computations are performed using light instead of electrical signals, optical neural networks can run many times faster while consuming less energy.

    However, these analog devices are prone to hardware errors that can make computations less precise. Microscopic imperfections in hardware components are one cause of these errors. In an optical neural network that has many connected components, errors can quickly accumulate.

    Even with error-correction techniques, due to fundamental properties of the devices that make up an optical neural network, some amount of error is unavoidable. A network that is large enough to be implemented in the real world would be far too imprecise to be effective.

    MIT researchers have overcome this hurdle and found a way to effectively scale an optical neural network. By adding a tiny hardware component to the optical switches that form the network’s architecture, they can reduce even the uncorrectable errors that would otherwise accumulate in the device.

    Their work could enable a super-fast, energy-efficient, analog neural network that can function with the same accuracy as a digital one. With this technique, as an optical circuit becomes larger, the amount of error in its computations actually decreases.  

    “This is remarkable, as it runs counter to the intuition of analog systems, where larger circuits are supposed to have higher errors, so that errors set a limit on scalability. This present paper allows us to address the scalability question of these systems with an unambiguous ‘yes,’” says lead author Ryan Hamerly, a visiting scientist in the MIT Research Laboratory of Electronics (RLE) and Quantum Photonics Laboratory and senior scientist at NTT Research.

    Hamerly’s co-authors are graduate student Saumil Bandyopadhyay and senior author Dirk Englund, an associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS), leader of the Quantum Photonics Laboratory, and member of the RLE. The research is published today in Nature Communications.

    Multiplying with light

    An optical neural network is composed of many connected components that function like reprogrammable, tunable mirrors. These tunable mirrors are called Mach-Zehnder interferometers (MZIs). Neural network data are encoded into light, which is fired into the optical neural network from a laser.

    A typical MZI contains two mirrors and two beam splitters. Light enters the top of an MZI, where it is split into two parts which interfere with each other before being recombined by the second beam splitter and then reflected out the bottom to the next MZI in the array. Researchers can leverage the interference of these optical signals to perform complex linear algebra operations, known as matrix multiplication, which is how neural networks process data.

    But errors that can occur in each MZI quickly accumulate as light moves from one device to the next. One can avoid some errors by identifying them in advance and tuning the MZIs so earlier errors are cancelled out by later devices in the array.

    “It is a very simple algorithm if you know what the errors are. But these errors are notoriously difficult to ascertain because you only have access to the inputs and outputs of your chip,” says Hamerly. “This motivated us to look at whether it is possible to create calibration-free error correction.”

    Hamerly and his collaborators previously demonstrated a mathematical technique that went a step further. They could successfully infer the errors and correctly tune the MZIs accordingly, but even this didn’t remove all the error.

    Due to the fundamental nature of an MZI, there are instances where it is impossible to tune a device so all light flows out the bottom port to the next MZI. If the device loses a fraction of light at each step and the array is very large, by the end there will only be a tiny bit of power left.

    “Even with error correction, there is a fundamental limit to how good a chip can be. MZIs are physically unable to realize certain settings they need to be configured to,” he says.
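
    A textbook two-port model makes this limit concrete. The sketch below treats an MZI as two beam splitters with a tunable phase shifter between them and asks how much of the light entering the top port can be steered out the bottom port. The matrix convention, the port labeling, the error values (0.1 radian per splitter), and the 100-stage chain are all assumptions chosen for illustration; this is not the authors' simulation.

```python
import cmath
import math


def beam_splitter(error=0.0):
    """2x2 beam splitter matrix; error=0 gives an ideal 50:50 split."""
    angle = math.pi / 4 + error
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 1j * s], [1j * s, c]]


def phase_shifter(theta):
    """Tunable phase shift applied to the top arm only."""
    return [[cmath.exp(1j * theta), 0], [0, 1]]


def matmul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def cross_power(theta, err1, err2):
    """Fraction of light entering the top port that leaves the bottom port."""
    u = matmul(beam_splitter(err2),
               matmul(phase_shifter(theta), beam_splitter(err1)))
    return abs(u[1][0]) ** 2


def best_cross_power(err1, err2, steps=3600):
    """Best achievable bottom-port power over a sweep of the phase setting."""
    return max(cross_power(2 * math.pi * k / steps, err1, err2)
               for k in range(steps))


ideal = best_cross_power(0.0, 0.0)    # perfect splitters
flawed = best_cross_power(0.1, 0.1)   # both splitters slightly off
after_100_stages = flawed ** 100      # same loss compounded over a long chain
print(ideal, flawed, after_100_stages)
```

    In this model, no phase setting recovers the last few percent of light once the splitters are imperfect, and compounding that shortfall over many stages leaves only a small fraction of the original power.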

    So, the team developed a new type of MZI. The researchers added an additional beam splitter to the end of the device, calling it a 3-MZI because it has three beam splitters instead of two. Due to the way this additional beam splitter mixes the light, it becomes much easier for an MZI to reach the setting it needs to send all light out through its bottom port.

    Importantly, the additional beam splitter is only a few micrometers in size and is a passive component, so it doesn’t require any extra wiring. Adding additional beam splitters doesn’t significantly change the size of the chip.

    Bigger chip, fewer errors

    When the researchers conducted simulations to test their architecture, they found that it can eliminate much of the uncorrectable error that hampers accuracy. And as the optical neural network becomes larger, the amount of error in the device actually drops — the opposite of what happens in a device with standard MZIs.

    Using 3-MZIs, they could potentially create a device big enough for commercial uses with error that has been reduced by a factor of 20, Hamerly says.

    The researchers also developed a variant of the MZI design specifically for correlated errors. These occur due to manufacturing imperfections — if the thickness of a chip is slightly wrong, the MZIs may all be off by about the same amount, so the errors are all about the same. They found a way to change the configuration of an MZI to make it robust to these types of errors. This technique also increased the bandwidth of the optical neural network so it can run three times faster.

    Now that they have showcased these techniques using simulations, Hamerly and his collaborators plan to test these approaches on physical hardware and continue driving toward an optical neural network they can effectively deploy in the real world.

    This research is funded, in part, by a National Science Foundation graduate research fellowship and the U.S. Air Force Office of Scientific Research.