Facebook today formally launched one of Instagram’s secret tools for finding and fixing bugs in the app’s vast Python codebase.
Named Pysa, the tool is a so-called static analyzer. It works by scanning code in a “static” form, before the code is compiled or run, looking for known patterns that may indicate a bug, and then flagging potential issues to the developer.
Facebook says the tool was developed internally, and, through constant refinement, Pysa has now reached maturity. For example, Facebook said that in the first half of 2020, Pysa detected 44% of all security bugs in Instagram’s server-side Python code.
Developed for security teams
Behind this success stands the work of the Facebook security team. Even though Pysa was based on the open-source code of the Pyre project, the tool has been built around the needs of a security team.
While most static analyzers look for a wide range of bugs, Pysa was specifically developed to look for security-related issues. More particularly, Pysa tracks “flows of data through a program.”
How data flows through a program’s code is very important. Most security exploits today take advantage of unfiltered or uncontrolled data flows.
For example, a remote code execution (RCE) bug, one of today’s worst types of bugs, is, when stripped down, basically user input that reaches portions of a codebase it was never meant to reach.
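To make that idea concrete, here is a minimal, hypothetical sketch in Python. It is not taken from Facebook's or Instagram's code, and the function names `unsafe_calculate` and `safe_parse` are invented for illustration. The first function lets user input flow directly into `eval()`, a classic code-execution sink; the second accepts the same input but only parses literal values.

```python
import ast


def unsafe_calculate(user_input: str):
    """Hypothetical handler: user input flows straight into eval()."""
    # Whatever text the user sends is executed as Python code.
    return eval(user_input)


def safe_parse(user_input: str):
    """Same input, but only Python literals (numbers, lists, ...) are accepted."""
    # ast.literal_eval raises ValueError for anything that is not a literal,
    # such as a function call.
    return ast.literal_eval(user_input)
```

A request carrying `__import__('os').getcwd()` would be executed by the first function and rejected by the second. A data-flow analyzer aims to flag the first pattern automatically.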
Under the hood, Pysa aims to bring some insight into how data travels across codebases, and especially large codebases made up of hundreds of thousands or millions of lines of code.
This concept isn’t new. Facebook has already perfected it with Zoncolan, a static analyzer that Facebook released in August 2019 for Hack, the PHP-like language that Facebook uses for the main Facebook app’s codebase.
Both Pysa and Zoncolan look for “sources” (where data enters a codebase) and “sinks” (where data ends up). Both tools track how data moves across a codebase, and find dangerous “sinks,” such as functions that can execute code or retrieve sensitive user data.
When a connection is found between a source and a dangerous sink, Pysa, like Zoncolan, warns developers to investigate.
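The source-to-sink idea can be illustrated with a toy sketch. This is not how Pysa itself is implemented: a real analyzer derives data flows from actual source code, whereas the `FLOWS` graph, the `SOURCES` and `SINKS` sets, and the `find_flows` function below are all hypothetical and written by hand for illustration.

```python
from collections import deque

# Hypothetical data-flow edges: "data produced by the key is passed to each value".
FLOWS = {
    "request.GET": ["parse_query"],
    "parse_query": ["build_command", "render_page"],
    "build_command": ["os.system"],
    "render_page": ["escape_html"],
    "config_file": ["load_settings"],
}
SOURCES = {"request.GET"}       # where user-controlled data enters
SINKS = {"os.system", "eval"}   # dangerous places data can end up


def find_flows(flows, sources, sinks):
    """Return one shortest path from each source to each sink it can reach."""
    found = []
    for source in sorted(sources):
        queue = deque([[source]])
        seen = {source}
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                # A source is connected to a dangerous sink: report the path.
                found.append(path)
                continue
            for nxt in flows.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
    return found


for path in find_flows(FLOWS, SOURCES, SINKS):
    print(" -> ".join(path))
```

In this toy graph, the only reported flow runs from `request.GET` through `parse_query` and `build_command` to `os.system`. The branch that ends in `escape_html` is not reported because it never reaches a listed sink.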
Because the Facebook security team was closely involved in creating Pysa, the tool has already been fine-tuned across months of internal testing to find the source-sink patterns specific to common security issues like cross-site scripting, remote code execution, SQL injection, and more.
Built for speed and large codebases
But as Facebook security engineer Graham Bleaney told ZDNet in a phone call this week, Pysa’s ability to find security issues wouldn’t be that useful if it took days to scan Instagram’s entire codebase.
As such, Pysa was also built for speed, and can go over millions of lines of code in anywhere from 30 minutes to a few hours. This allows Pysa to find bugs in near real-time and lets developer teams feel safe about integrating the tool into their regular workflows and routines, without fearing that it might delay shipping their code or cause them to miss hard deadlines.
This focus on not disrupting Facebook developers and their regular work processes has been a goal for the Facebook security team, as the team said in a recent episode of the Risky Business podcast.
Extendable
But Pysa also has another ace up its sleeve, and that’s extensibility. Instagram, which mostly runs on Python code, was never developed as a cohesive unit from the get-go.
Just like most major platforms, its code was stitched together and improved as the company grew. Currently, its codebase includes lots of different Python frameworks and Python libraries, all running different Instagram components and features.
For Pysa, this also means the tool was created under a plug-and-play model, where the tool can be extended to adapt to new frameworks on the fly.
“Because we use open source Python server frameworks such as Django and Tornado for our own products, Pysa can start finding security issues in projects using these frameworks from the first run,” Bleaney said. “Using Pysa for frameworks we don’t already have coverage for is generally as simple as adding a few lines of configuration to tell Pysa where data enters the server.”
Facebook formally open-sourced Pysa on GitHub today, along with several bug definitions required to help it find security issues. The Zulip server project has already embedded Pysa in its codebase after the tool was used to discover a major security issue last year.