Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. In Bengio, Y. and LeCun, Y. (eds.), 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016. [12] Gray, S., Radford, A., & Kingma, D. P. (2017). Gpu kernels for block-sparse weights. arXiv preprint arXiv:1711.09224, 3. [13] Elsen, E., Dukhan, M., Gale, T., & Simonyan, K. (2020). Fast sparse convnets. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 14629-14638). [14] Gray, S., Radford, A., & Kingma, D. P. (2017). Gpu kernels for block-sparse weights. arXiv preprint arXiv:1711.09224, 3. [15] Wang, Z. (2020, September). SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference. In Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques (pp. 31-42).