We introduce an end-to-end framework for jointly reducing model size on disk and accelerating inference. Prior pruning-based approaches (▲) optimize for compute in terms of FLOPs (y-axis) but achieve only small reductions in model size on disk (x-axis). Codebook-quantization-based approaches (■), on the other hand, optimize for model size and achieve very little compute reduction. We jointly optimize for both quantities in an end-to-end manner, achieving higher levels of reduction in both.
We introduce LilNetX, an end-to-end trainable technique for neural networks that enables learning models with a specified accuracy-compression-computation trade-off. Prior works approach these problems one at a time and often require post-processing or multi-stage training. Our method, in contrast, constructs a joint training objective that penalizes the self-information of network parameters in a latent representation space to encourage small model size, while also introducing priors that increase structured sparsity in the parameter space to reduce computation. Compared with existing state-of-the-art model compression methods, we achieve up to 50% smaller model size and 98% model sparsity on ResNet-20 on the CIFAR-10 dataset, as well as 31% smaller model size and 81% structured sparsity on ResNet-50 trained on ImageNet, while retaining the same accuracy as these methods. The resulting sparsity can improve inference time by a factor of almost 1.86x over a dense ResNet-50 model.
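
To give a rough sense of what such a joint objective can look like, the sketch below combines a task loss with a rate term (the self-information of quantized latent parameters under a learned prior) and a group-sparsity prior. The function `joint_objective`, the prior model `latent_log_prob`, and the weighting factors are hypothetical placeholders for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def joint_objective(logits, targets, latents, latent_log_prob,
                    lambda_rate=1e-4, lambda_group=1e-4):
    """Hypothetical sketch of a joint accuracy/size/compute objective."""
    # Task loss on the downstream prediction (classification here).
    task = F.cross_entropy(logits, targets)

    # Rate term: negative log-likelihood (self-information) of the quantized
    # latent parameters under a learned prior. Lower values mean the latents
    # are cheaper to entropy-code, i.e. a smaller model on disk.
    rate = sum(-latent_log_prob(z).sum() for z in latents)

    # Structured-sparsity prior: a group-lasso style penalty over the first
    # dimension of each latent tensor, pushing entire slices of the decoded
    # weights toward exactly zero to reduce compute.
    group = sum(z.reshape(z.shape[0], -1).norm(p=2, dim=1).sum()
                for z in latents)

    return task + lambda_rate * rate + lambda_group * group
```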
|
Our approach maintains quantized latent representations for efficient storage, which can be decoded into sparse model weights for faster inference. We utilize a special type of slice sparsity to obtain inference speedups without hardware modifications, as illustrated in the sketch below. A joint loss function is optimized simultaneously for downstream performance, model size, and compute. Our approach is generalizable, requires no post-hoc training stages, and can be applied to any task.
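
The following minimal, hypothetical sketch (not the released implementation) illustrates the slice-sparsity idea: quantized latents are decoded into convolution weights, and output channels ("slices") that decode to all zeros are dropped, leaving a smaller dense layer that runs faster on standard hardware.

```python
import torch
import torch.nn as nn

def decode_and_slice_prune(latent, conv):
    """Illustrative decode of quantized latents into a smaller dense conv."""
    # Quantize latents for storage, then decode back to float weights.
    # A simple rounding step stands in for the learned decoder here.
    w = torch.round(latent).float()

    # A "slice" is one output channel of the kernel; keep only channels with
    # at least one nonzero entry, so the layer stays dense but smaller.
    keep = w.abs().sum(dim=(1, 2, 3)) > 0
    pruned = nn.Conv2d(conv.in_channels, int(keep.sum()),
                       conv.kernel_size, stride=conv.stride,
                       padding=conv.padding, bias=False)
    with torch.no_grad():
        pruned.weight.copy_(w[keep])
    return pruned, keep  # `keep` tells the next layer which inputs to drop
```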
Sharath Girish, Kamal Gupta, Saurabh Singh, Abhinav Shrivastava. LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification. In International Conference on Learning Representations (ICLR), 2023. (hosted on OpenReview)
Acknowledgements |