Compressing CNN Kernels for Videos Using Tucker Decompositions: Towards Lightweight CNN Applications
Keywords: CNN, Tucker Decomposition, Video Classification, Early Fusion
Convolutional Neural Networks (CNNs) are the state of the art in visual computing. However, a major problem with CNNs is the large number of floating point operations (FLOPs) required to perform convolutions on large inputs. When CNNs are applied to video data, the convolutional filters become even more complex due to the extra temporal dimension. This is a problem when such applications are to be deployed on devices with limited computational power, such as smartphones, tablets, or micro-controllers.
Kim et al. proposed using a Tucker decomposition to compress the convolutional kernels of a pre-trained network for images in order to reduce the complexity of the network, i.e. the number of FLOPs. In this paper, we generalize this method to videos (and other 3D signals) and evaluate it on a modified version of the THETIS data set, which contains videos of individuals performing tennis shots. We show that the compressed network reaches comparable accuracy while compressing its memory footprint by a factor of 51. However, the actual computational speed-up (factor 1.4) does not meet our theoretically derived expectation (factor 6).
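As a rough illustration of where such compression comes from, the sketch below counts the weights of a dense 3D convolution and of a factorised replacement. It assumes a Tucker-2 style factorisation over the input and output channel modes only, in the spirit of Kim et al.: a 1x1x1 convolution, a small spatio-temporal core convolution, and another 1x1x1 convolution. The layer sizes, the ranks r_in and r_out, and the function names are hypothetical and are not taken from the paper.

```python
from math import prod


def conv3d_params(c_in, c_out, kernel):
    """Number of weights in a dense 3D convolution (bias ignored)."""
    return c_in * c_out * prod(kernel)


def tucker2_conv3d_params(c_in, c_out, kernel, r_in, r_out):
    """Number of weights after an assumed Tucker-2 factorisation
    over the two channel modes (bias ignored)."""
    first = c_in * r_in                  # 1x1x1 conv: c_in -> r_in channels
    core = r_in * r_out * prod(kernel)   # core conv: r_in -> r_out channels
    last = r_out * c_out                 # 1x1x1 conv: r_out -> c_out channels
    return first + core + last


# Hypothetical layer: 64 -> 128 channels, 3x3x3 kernel (time, height, width).
c_in, c_out, kernel = 64, 128, (3, 3, 3)
# Hypothetical Tucker ranks for the input and output channel modes.
r_in, r_out = 16, 32

dense = conv3d_params(c_in, c_out, kernel)
compressed = tucker2_conv3d_params(c_in, c_out, kernel, r_in, r_out)
ratio = dense / compressed

print(f"dense weights:      {dense}")
print(f"compressed weights: {compressed}")
print(f"compression ratio:  {ratio:.1f}")
```

The ratio depends entirely on the chosen ranks, and a reduction in weights or theoretical FLOPs need not translate into an equal wall-clock speed-up, which matches the gap between expected and measured speed-up reported above.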
M. Denil, B. Shakibi, L. Dinh, M. A. Ranzato, and N. de Freitas. Predicting parameters in deep learning. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc., 2013. URL https://proceedings.neurips.cc/paper/2013/file/7fec306d1e665bc9c748b5d2b99a6e97-Paper.pdf.
S. Gourgari, G. Goudelis, K. Karpouzis, and S. Kollias. THETIS: Three dimensional tennis shots a human action dataset. In 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 676–681, 2013. URL https://doi.org/10.1109/CVPRW.2013.102.
M. Jaderberg, A. Vedaldi, and A. Zisserman. Speeding up convolutional neural networks with low rank expansions. In Proceedings of the British Machine Vision Conference. BMVA Press, 2014. URL http://dx.doi.org/10.5244/C.28.88.
A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 1725–1732, 2014. URL https://doi.org/10.1109/CVPR.2014.223.
Y. D. Kim, E. Park, S. Yoo, T. Choi, L. Yang, and D. Shin. Compression of deep convolutional neural networks for fast and low power mobile applications. In 4th International Conference on Learning Representations, ICLR 2016 - Conference Track Proceedings, 2016. URL https://www.scopus.com/inward/record.uri?eid=2-s2.0-85083951289&partnerID=40&md5=8115bfcad3c1b4338ff76ada9045ae94.
V. Lebedev, Y. Ganin, M. Rakhuba, I. Oseledets, and V. Lempitsky. Speeding-up convolutional neural networks using fine-tuned CP-decomposition. In 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 2015. URL https://www.scopus.com/inward/record.uri?eid=2-s2.0-85083952441&partnerID=40&md5=689b44a127d12ad8be206459e1de7187.
M. Mørup. Applications of tensor (multiway array) factorizations and decompositions in data mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1):24–40, 2011. ISSN 1942-4787. URL https://doi.org/10.1002/widm.1.
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019. URL http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
R. Rigamonti, A. Sironi, V. Lepetit, and P. Fua. Learning separable filters. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, pages 2754–2761, 2013. URL https://doi.org/10.1109/ICCV.2013.355.
P. Wang and J. Cheng. Accelerating convolutional neural networks for mobile applications. In Proceedings of the 24th ACM International Conference on Multimedia, MM ’16, pages 541–545, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450336031. URL https://doi.org/10.1145/2964284.2967280.
Copyright (c) 2022 Tobias Engelhardt Rasmussen, Line KH Clemmensen, Andreas Baum
This work is licensed under a Creative Commons Attribution 4.0 International License.