Reducing Objective Function Mismatch in Deep Clustering with the Unsupervised Companion Objective
Keywords: Deep clustering, Unsupervised companion objective, Objective function mismatch
Preservation of local similarity structure is a key challenge in deep clustering. Many recent deep clustering methods therefore use autoencoders to guide the model's neural network towards an embedding that better reflects the geometry of the input space. However, recent work has shown that autoencoder-based deep clustering models can suffer from objective function mismatch (OFM). To improve the preservation of local similarity structure while keeping the OFM low, we develop a new auxiliary objective function for deep clustering. Our Unsupervised Companion Objective (UCO) encourages a consistent clustering structure at intermediate layers in the network, helping the network learn an embedding that better reflects the similarity structure of the input space. Since a clustering-based auxiliary objective has the same goal as the main clustering objective, it is less prone to introducing OFM between itself and the main objective. Our experiments show that attaching the UCO to a deep clustering model improves its performance and results in a lower OFM than an analogous autoencoder-based model.
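The abstract describes the UCO only at a high level. As an illustration, the following is a minimal sketch of how a clustering-based companion term could be attached to intermediate layers. It rests on several assumptions that are not taken from the text above: (i) the companion term is a kernel-based cluster-separation loss in the spirit of the Cauchy-Schwarz-divergence term used in deep divergence-based clustering (DDC); (ii) every companion term reuses the network's final soft cluster assignments together with a Gaussian kernel on one intermediate layer's features; (iii) each layer has a scalar weight. The function names (`gaussian_kernel`, `cs_cluster_separation`, `total_loss`) and parameters are hypothetical, and this is not the paper's exact formulation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def gaussian_kernel(features: Sequence[Sequence[float]], sigma: float) -> Matrix:
    """K[a][b] = exp(-||x_a - x_b||^2 / (2 sigma^2)) for every pair of samples."""
    n = len(features)
    kernel = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            sq_dist = sum((u - v) ** 2 for u, v in zip(features[a], features[b]))
            kernel[a][b] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return kernel


def cs_cluster_separation(assign: Matrix, kernel: Matrix, eps: float = 1e-9) -> float:
    """Average normalised overlap alpha_i^T K alpha_j over all cluster pairs i < j.

    `assign` is an n x k matrix of soft assignments (rows sum to 1) and alpha_i is
    its i-th column. Lower is better: the value approaches 0 when the clusters
    are well separated in the feature space that `kernel` was computed from.
    Requires k >= 2.
    """
    n, k = len(assign), len(assign[0])
    cols = [[assign[s][c] for s in range(n)] for c in range(k)]

    def quad(u: List[float], v: List[float]) -> float:
        # Bilinear form u^T K v.
        return sum(u[a] * kernel[a][b] * v[b] for a in range(n) for b in range(n))

    total, pairs = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            denom = math.sqrt(quad(cols[i], cols[i]) * quad(cols[j], cols[j]) + eps)
            total += quad(cols[i], cols[j]) / denom
            pairs += 1
    return total / pairs


def total_loss(
    main_loss: float,
    assign: Matrix,
    hidden_layers: Sequence[Sequence[Sequence[float]]],
    weights: Sequence[float],
    sigma: float,
) -> float:
    """Hypothetical combined objective: main clustering loss plus weighted
    companion terms, one per intermediate layer, all sharing the final assignments."""
    companion = sum(
        w * cs_cluster_separation(assign, gaussian_kernel(h, sigma))
        for h, w in zip(hidden_layers, weights)
    )
    return main_loss + companion


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    good = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # matches the geometry
    bad = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]   # ignores the geometry
    K = gaussian_kernel(feats, sigma=1.0)
    print(cs_cluster_separation(good, K), cs_cluster_separation(bad, K))
```

In this toy example, the assignments that agree with the intermediate-layer geometry give a companion term near 0, while assignments that cut across the two groups give a value close to 1. Because the companion term and the main objective both reward the same clustering, their gradients tend to point in compatible directions, which is the intuition behind the lower OFM claimed above.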