SPHERICAL AND STOCHASTIC CO-CLUSTERING ALGORITHMS

Date

2019-04-17

Abstract

Clustering is a central topic in data mining and machine learning. Because clustering algorithms are needed in so many settings, they have applications in real-life problems ranging from bioinformatics to personalized information delivery. The characteristics of newly generated data call for new approaches to exploring its structure. Single-sided (i.e., one-way) clustering algorithms such as K-means cluster either the rows or the columns of the data matrix. A Coclustering algorithm clusters both the instances and the features of the data matrix simultaneously, and is therefore better suited to discovering patterns hidden in both the row and column dimensions.
Most existing Coclustering algorithms include inexplicit, separate clustering steps for each dimension. In this study, we developed two novel Coclustering algorithms, Spherical Coclustering and Stochastic Coclustering, which build on the existing K-means framework; a specific data construction and two specific data normalizations are included as pre-processing steps. The Coclustering framework resembles an existing Coclustering algorithm, Spectral Coclustering, in that it first applies feature selection using singular value decomposition and then uses one-way clustering to achieve Coclustering. Furthermore, we partially address several well-known practical problems in clustering algorithms, including cluster initialization, the degeneracy problem, local minima, and a NaN (not-a-number) condition in the Kullback-Leibler divergence.
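
The NaN condition mentioned above typically arises when a Kullback-Leibler term evaluates 0 * log(0) or divides by a zero probability. The following is a minimal sketch of one common guard, not the method used in this study: the function name `kl_divergence` and the floor `eps` are illustrative assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two nonnegative vectors, each rescaled to sum to 1.

    Terms with p[i] == 0 are skipped (the 0 * log 0 = 0 convention).
    q is floored at eps so that a zero in q yields a large finite value
    instead of inf or NaN. The eps floor is an illustrative assumption.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0  # only entries where p is positive contribute
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))
```

A naive `np.sum(p * np.log(p / q))` returns NaN as soon as both vectors share a zero entry; the masked version returns a finite value in that case.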

The correctness and efficiency of the two algorithms were validated on publicly available benchmark data in terms of the monotonicity of the objective function value and clustering accuracy. Specifically, we compared the accuracy of the Euclidean K-means, Stochastic K-means, Spherical K-means, Stochastic Coclustering, and Spherical Coclustering algorithms.
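
As an illustration of what a monotonicity check on the objective function can look like, the sketch below runs plain Euclidean K-means (Lloyd iterations) and records the sum of squared distances at each iteration. It is a stand-in under stated assumptions, not the Spherical or Stochastic Coclustering algorithms themselves; the names `kmeans_objective_trace` and `is_monotone_nonincreasing` are hypothetical.

```python
import numpy as np

def kmeans_objective_trace(X, k, n_iter=20, seed=0):
    """Run Lloyd's K-means and return (labels, objective value per iteration)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    trace = []
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Objective after the assignment step.
        trace.append(float(d2[np.arange(len(X)), labels].sum()))
        for j in range(k):
            members = X[labels == j]
            # An empty cluster keeps its previous centroid (a simple
            # way to sidestep degeneracy in this sketch).
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, trace

def is_monotone_nonincreasing(trace, tol=1e-9):
    """True if each objective value is no larger than the previous one."""
    return all(b <= a + tol for a, b in zip(trace, trace[1:]))
```

An increase in the recorded objective between iterations would point to a bug in the assignment or update step, which is why monotonicity serves as a correctness check.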

Keywords

Coclustering algorithm, K-means algorithm, bi-normalization, Stochastic Coclustering, Spherical Coclustering, Sinkhorn-Knopp Normalization, Kullback-Leibler Divergence
