Dense FixMatch: a simple semi-supervised learning method for pixel-wise prediction tasks
DOI:
https://doi.org/10.7557/18.6794
Keywords:
semi-supervised learning, dense prediction, pixel-wise prediction, semantic segmentation
Abstract
We propose Dense FixMatch, a simple method for online semi-supervised learning of dense and structured prediction tasks that combines pseudo-labeling and consistency regularization via strong data augmentation. We enable the application of FixMatch in semi-supervised learning problems beyond image classification by adding a matching operation on the pseudo-labels. This allows us to still use the full strength of data augmentation pipelines, including geometric transformations. We evaluate the method on semi-supervised semantic segmentation on Cityscapes and Pascal VOC with different percentages of labeled data, and ablate design choices and hyper-parameters. Dense FixMatch significantly improves results compared to supervised learning using only labeled data, approaching the performance of supervised learning on the full labeled set with only 1/4 of the labeled samples.
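The role of the matching operation can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`hflip`, `pseudo_labels`, `dense_unsup_loss`), the nested-list image layout, the choice of a horizontal flip as the only geometric transformation, and the confidence threshold of 0.95 are all hypothetical. The sketch also assumes that the weak view keeps the original geometry. The idea shown is that pseudo-labels computed on the weak view are passed through the same geometric transformation as the strong view, so that each pseudo-label is compared with the prediction for the same image location.

```python
import math


def hflip(grid):
    """Horizontally flip a 2D grid (list of rows); stands in for any geometric augmentation."""
    return [row[::-1] for row in grid]


def pseudo_labels(probs, threshold):
    """Per-pixel argmax labels and confidence mask; probs[i][j] lists class probabilities."""
    labels = [[max(range(len(p)), key=p.__getitem__) for p in row] for row in probs]
    mask = [[max(p) >= threshold for p in row] for row in probs]
    return labels, mask


def dense_unsup_loss(weak_probs, strong_probs, transform, threshold=0.95):
    """Masked per-pixel cross-entropy between strong-view predictions and matched pseudo-labels.

    weak_probs:   predictions on the weakly augmented image (original geometry, by assumption).
    strong_probs: predictions on the strongly augmented image (transformed geometry).
    transform:    geometric part of the strong augmentation, re-applied to the
                  pseudo-labels and mask so they align pixel by pixel with strong_probs.
    """
    labels, mask = pseudo_labels(weak_probs, threshold)
    labels, mask = transform(labels), transform(mask)  # the matching step
    total, count = 0.0, 0
    for i, row in enumerate(strong_probs):
        for j, p in enumerate(row):
            if mask[i][j]:  # only confident pseudo-labels contribute
                total += -math.log(p[labels[i][j]])
            count += 1  # average over all pixels, as FixMatch averages over all samples
    return total / count if count else 0.0


# Toy 1x2 image with two classes; the strong view is a horizontal flip.
weak = [[[0.99, 0.01], [0.02, 0.98]]]
strong = [[[0.10, 0.90], [0.80, 0.20]]]

matched = dense_unsup_loss(weak, strong, hflip)
unmatched = dense_unsup_loss(weak, strong, lambda grid: grid)
print(matched < unmatched)
```

In this toy case the strong-view predictions agree with the flipped pseudo-labels, so the matched loss is lower than the loss computed without matching, where each pseudo-label is compared against the wrong pixel.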
License
Copyright (c) 2023 Miquel Martí i Rabadán, Alessandro Pieropan, Hossein Azizpour, Atsuto Maki
This work is licensed under a Creative Commons Attribution 4.0 International License.