Using Mask R-CNN for Underwater Fish Instance Segmentation as Novel Objects: A Proof of Concept


  • I-Hao Chen NORCE Norwegian Research Centre
  • Nabil Belbachir NORCE Norwegian Research Centre



underwater instance segmentation, digital aquaculture, underwater camera drone, unsupervised learning


Instance segmentation deals with detecting, segmenting, and classifying individual instances of objects in an image. Underwater instance segmentation methods often target aquatic animals such as fish. Training deep learning models for instance segmentation in an underwater environment usually requires rigorous human annotation in the form of labeled instance segmentation masks, since the aquatic environment poses challenges due to dynamic background patterns and optical distortions.
However, annotating instance segmentation masks on images is especially time- and cost-intensive compared to classification tasks.
Here we show an unsupervised instance learning and segmentation approach that introduces a novel class, e.g., "fish", to a pre-trained Mask R-CNN model using its own detection and segmentation capabilities in underwater images.
Our results demonstrate robust detection and segmentation of underwater fish in aquaculture without the need for human annotations.
This proof of concept shows that there is room for novel objects within trained instance segmentation models in the paradigm of supervised learning.
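One way to picture how a model's own predictions could stand in for human annotations is a pseudo-labeling step, sketched below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Detection` container, the `make_pseudo_labels` helper, the score and area thresholds, and the class index `NOVEL_CLASS_ID` are all hypothetical, and the abstract does not specify how detections are actually selected.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical index for the novel "fish" class (an assumption for illustration).
NOVEL_CLASS_ID = 80


@dataclass
class Detection:
    """One instance predicted by a pre-trained model (hypothetical container)."""
    class_id: int                            # class assigned by the pre-trained model
    score: float                             # confidence in [0, 1]
    bbox: Tuple[float, float, float, float]  # (x, y, width, height) in pixels
    mask_area: int                           # number of pixels in the predicted mask


def make_pseudo_labels(
    detections: List[Detection],
    min_score: float = 0.5,   # assumed threshold, not taken from the paper
    min_area: int = 100,      # assumed threshold, not taken from the paper
    novel_class_id: int = NOVEL_CLASS_ID,
) -> List[Dict]:
    """Keep confident, sufficiently large detections and relabel them as the novel class."""
    labels = []
    for det in detections:
        if det.score >= min_score and det.mask_area >= min_area:
            labels.append(
                {
                    "category_id": novel_class_id,  # original class_id is discarded
                    "bbox": list(det.bbox),
                    "area": det.mask_area,
                    "score": det.score,
                }
            )
    return labels
```

In this sketch, the returned records could serve as training annotations for the new class in a later fine-tuning round; whether the original method filters by score, by area, or by something else entirely is left open here.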

