Google has shed more light on how it's using private data from sensors on your Android phone to update its machine-learning features, such as Live Translate, without sending that data to its cloud servers.
Google last September introduced Live Caption, Now Playing and Smart Reply as features within Android’s Private Compute Core (PCC), which lives in an isolated virtual sandbox within Android 12 and onwards, shielding PCC and its features from the OS and apps.
It also introduced and more recently open-sourced Private Compute Services (PCS), a "private path" to update and improve machine-learning models without trampling on user privacy. Any data handled in PCC that reaches Google's cloud must go through PCS.
The company has now given a more detailed description of PCC's architecture in a recently published technical paper, which is aimed at building trust through transparency.
“PCC allows features to communicate with a server to receive model updates and contribute to global model training through Private Compute Services (PCS), the core of which has been open sourced,” Google explains in the paper.
As Google engineers note, PCC can host sophisticated ML features – such as Live Caption and Smart Reply, as well as screen deactivation when the user looks away – because of the boundaries placed on it. PCC handles a lot of sensitive data picked up from the device, including audio, images, text, and app data from the OS, as well as data from sensors such as the microphone, camera, and GPS.
“The hosted features themselves, running inside PCC, can be closed source and updatable. In this way, PCC enables machine learning features to process ambient and OS-level data and improve over time, while restricting the availability of information about individual users to servers or apps,” Google engineers explain.
The ambient and OS-level data includes: raw data from device sensors, such as the camera or microphone or content from the screen; data generated from analysis or inferences based on OS-level data; and metadata.
Google engineers Dave Kleidermacher, Dianne Hackborn, and Eugenio Marchiori explain in a blogpost that Google is using federated learning and federated analytics to update the ML models behind PCC features while keeping the data private. In addition, the network calls made to improve the performance of these models can be monitored via PCS.
“The paradigm of distributed trust, where credibility is built up from verification by multiple trusted sources, continues to extend this core value. Open sourcing the mechanisms for data protection and processes is one step towards making privacy verifiable,” they note.
PCS is an APK that provides application programming interfaces (APIs) for PCC components. The paper notes that PCS's federated learning and federated analytics enable "privacy-preserving machine learning and analytics without centralized data collection."
Android sends data to Google's cloud only in the form of computation results, produced after the computation happens on the device using locally stored data; those results are then aggregated across many devices. Since federated learning is difficult to explain, Google links to its own comic explaining how it works.
“The underlying techniques involve pushing a computation graph (e.g. machine learning model) to the device, computing on the locally stored data, and sending only the computation results back,” Google notes in the technical paper.
“The results from many devices are aggregated together, and used to improve the device features and user experience. Each individual device’s results are protected from being seen by the orchestrating server through the use of the Secure Aggregation multi-party computation protocol, ensuring that only aggregates over many (e.g. thousands) of devices are made available to servers and model/feature developers.”
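To make that round concrete, here is a minimal, illustrative Python sketch of the flow the paper describes: the server pushes a model to devices, each device computes an update on data that never leaves it, and the server only ever sees a masked sum of updates. It is not Google's actual PCS code; the device names and data are hypothetical, and the simple pairwise masks merely stand in for the cryptographic Secure Aggregation protocol.

```python
# Illustrative sketch of one federated-learning round with secure aggregation.
# NOT Google's implementation: device names, data, and masking are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
MODEL_DIM = 4
global_model = np.zeros(MODEL_DIM)            # model the server pushes to devices

def local_update(model, local_data):
    """Compute an update on-device; here, a toy step toward the local data mean."""
    return np.mean(local_data, axis=0) - model

# Hypothetical on-device data that never leaves each device.
device_data = {
    "device_a": rng.normal(1.0, 0.1, size=(10, MODEL_DIM)),
    "device_b": rng.normal(2.0, 0.1, size=(10, MODEL_DIM)),
    "device_c": rng.normal(3.0, 0.1, size=(10, MODEL_DIM)),
}
devices = sorted(device_data)

# Stand-in for Secure Aggregation: each pair of devices shares a random mask;
# one adds it and the other subtracts it, so the masks cancel in the sum and
# the server can only recover the aggregate, never an individual update.
pair_masks = {
    (a, b): rng.normal(size=MODEL_DIM)
    for i, a in enumerate(devices) for b in devices[i + 1:]
}

masked_updates = []
for d in devices:
    update = local_update(global_model, device_data[d])   # computed on-device
    for (a, b), mask in pair_masks.items():
        if d == a:
            update = update + mask
        elif d == b:
            update = update - mask
    masked_updates.append(update)             # only this masked value is uploaded

# Server side: masks cancel, so only the aggregate over all devices is visible.
aggregate = np.sum(masked_updates, axis=0) / len(devices)
global_model = global_model + aggregate
print("updated global model:", global_model)
```

In the real Secure Aggregation protocol the masks are derived through cryptographic key agreement and the scheme tolerates devices dropping out mid-round, but the principle is the same: the orchestrating server learns only sums over many devices.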
Google is inviting researchers to analyze its claims and its implementations of PCC features detailed in the technical paper.