Visual Object Detection For Autonomous UAV Cinematography
Keywords: Visual object detection, Autonomous UAVs, Intelligent cinematography, SSD, YOLOv2, Sports events
The popularization of commercial, battery-powered, camera-equipped Vertical Take-off and Landing (VTOL) Unmanned Aerial Vehicles (UAVs) during the past decade has significantly affected aerial video capturing operations in various domains. UAVs are affordable, agile and flexible, and can access otherwise inaccessible spots. However, their limited resources make it difficult for computational cinematography techniques to operate with high accuracy and at real-time speed on such devices. State-of-the-art object detectors and feature extractors are thus studied in this work, aiming to find a trade-off between detection accuracy and speed that will allow UAVs to be exploited for intelligent cinematography purposes. Experimental evaluation is performed on three newly introduced datasets of rowing boats, cyclists and parkour athletes, and evidence is provided that even limited-resource autonomous UAVs can indeed be used for cinematography applications.