Posted by Alexey Kurakin, Software Engineer and Roxana Geambasu, Visiting Faculty Researcher, Google Research

Machine learning (ML) models are becoming increasingly valuable for improved performance across a variety of consumer products, from recommendations to automatic image classification. However, despite aggregating large amounts of data, in theory it is possible for models to encode characteristics of individual entries from the training set. For example, experiments in controlled settings have shown that language models trained using email datasets may sometimes encode sensitive information included in the training data and may have the potential to reveal the presence of a particular user's data in the training set. As such, it is important to prevent the encoding of such characteristics from individual training entries. To these ends, researchers are increasingly employing federated learning approaches.

Differential privacy (DP) provides a rigorous mathematical framework that allows researchers to quantify and understand the privacy guarantees of a system or an algorithm. Within the DP framework, privacy guarantees of a system are usually characterized by a positive parameter ε, called the privacy loss bound, with smaller ε corresponding to better privacy. One usually trains a model with DP guarantees using DP-SGD, a specialized training algorithm that provides DP guarantees for the trained model.

However, training with DP-SGD typically has two major drawbacks. First, most existing implementations of DP-SGD are inefficient and slow, which makes them hard to use on large datasets. Second, DP-SGD training often significantly impacts utility (such as model accuracy), to the point that models trained with DP-SGD may become unusable in practice. As a result, most DP research papers evaluate DP algorithms on very small datasets (MNIST, CIFAR-10, or UCI) and don't even attempt evaluation on larger datasets, such as ImageNet.

In "Toward Training at ImageNet Scale with Differential Privacy", we share initial results from our ongoing effort to train a large image classification model on ImageNet using DP while maintaining high accuracy and minimizing computational cost. We show that the combination of various training techniques, such as careful choice of the model and hyperparameters, large batch training, and transfer learning from other datasets, can significantly boost the accuracy of an ImageNet model trained with DP. To substantiate these discoveries and encourage follow-up research, we are also releasing the associated source code.

Exploring multiple architectures and training configurations to research what works for DP can be debilitatingly slow. To streamline our efforts, we used JAX, a high-performance computational library based on XLA that can do efficient auto-vectorization and just-in-time compilation of mathematical computations. Using these JAX features was previously recommended as a good way to speed up DP-SGD in the context of smaller datasets such as CIFAR-10.

We created our own implementation of DP-SGD on JAX and benchmarked it against the large ImageNet dataset (the code is included in our release). The implementation in JAX was relatively simple and resulted in noticeable performance gains simply because of using the XLA compiler. Each step of our DP-SGD implementation takes approximately two forward-backward passes through the network. Compared to other implementations of DP-SGD, such as that in Tensorflow Privacy, the JAX implementation is consistently several times faster. It is typically even faster than the custom-built and optimized PyTorch Opacus.
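To give a sense of why JAX is a good fit here: the core of DP-SGD — computing a gradient for each example, clipping it, then adding Gaussian noise to the sum — maps directly onto `jax.vmap`, `jax.grad`, and `jax.jit`. The sketch below is a minimal illustration of that pattern on a toy linear model; it is not the released implementation, and the model, hyperparameter values, and function names are hypothetical.

```python
# Illustrative single DP-SGD step in JAX (not the released code).
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Toy linear model with squared loss, standing in for a real network.
    pred = jnp.dot(x, params)
    return (pred - y) ** 2

def dp_sgd_step(params, xs, ys, key, lr=0.1, clip_norm=1.0, noise_mult=1.1):
    # Per-example gradients: vmap the gradient function over the batch axis.
    per_ex_grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(params, xs, ys)

    # Clip each example's gradient to l2 norm <= clip_norm.
    norms = jnp.linalg.norm(per_ex_grads, axis=1, keepdims=True)
    clipped = per_ex_grads * jnp.minimum(1.0, clip_norm / (norms + 1e-12))

    # Sum the clipped gradients, add Gaussian noise calibrated to the
    # clipping norm, and average over the batch.
    noise = noise_mult * clip_norm * jax.random.normal(key, params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / xs.shape[0]
    return params - lr * noisy_mean

# jax.jit compiles the whole step (including the vmapped gradients) with XLA.
dp_sgd_step = jax.jit(dp_sgd_step)

params = jnp.zeros(3)
xs = jax.random.normal(jax.random.PRNGKey(0), (8, 3))
ys = jnp.ones(8)
new_params = dp_sgd_step(params, xs, ys, jax.random.PRNGKey(1))
```

Because `vmap` vectorizes the per-example gradient computation instead of looping over the batch, and `jit` hands the fused step to the XLA compiler, this structure avoids much of the overhead that makes naive DP-SGD implementations slow.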