SPHERICAL AND STOCHASTIC CO-CLUSTERING ALGORITHMS
Abstract
Clustering is a central topic in data mining and machine learning. Because clustering
algorithms are needed in so many settings, they have been applied to real-life problems
ranging from bioinformatics to personalized information delivery. The characteristics of
newly generated data call for new approaches to exploring its structure. General
single-sided (i.e., one-way) clustering algorithms, such as K-means, cluster either the
rows or the columns of the data matrix. Coclustering algorithms cluster both the
instances and the features of the data matrix simultaneously and are therefore better
suited to discovering patterns hidden in both the row and column dimensions.
Most existing Coclustering algorithms include implicit clustering steps for each
dimension separately. In this study, we developed two novel Coclustering algorithms,
Spherical Coclustering and Stochastic Coclustering, which build on the existing K-means
framework and add a specific data construction and two specific data normalizations as
pre-processing steps. The Coclustering framework resembles an existing Coclustering
algorithm, Spectral Coclustering, in that it first applies feature selection using
singular value decomposition and then uses one-way clustering to achieve Coclustering.
Furthermore, we partially address several well-known practical problems in clustering
algorithms, including cluster initialization, the degeneracy problem, local minima, and
the NaN (not-a-number) condition in the Kullback-Leibler divergence.
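As a rough illustration of the NaN condition mentioned above, the sketch below shows one common way to guard a Kullback-Leibler divergence, together with two row normalizations. The function names, the `eps` floor, and the reading of "spherical" as unit-length (L2) scaling and "stochastic" as sum-to-one (L1) scaling are assumptions made for this example; the abstract does not specify the normalizations or the guard actually used.

```python
import math

def l2_normalize(row):
    """Scale a vector to unit Euclidean length (assumed 'spherical' view)."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row] if norm > 0 else list(row)

def l1_normalize(row):
    """Scale a non-negative vector to sum to 1 (assumed 'stochastic' view)."""
    total = sum(row)
    return [x / total for x in row] if total > 0 else list(row)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with guards against undefined terms.

    Terms with p_i == 0 are skipped (convention: 0 * log 0 = 0), and
    q_i is floored at eps so that log(p_i / q_i) stays finite.
    The eps floor is an illustrative choice, not taken from the source.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            total += pi * math.log(pi / max(qi, eps))
    return total
```

Without the two guards, a zero entry in `p` or `q` makes the term `p_i * log(p_i / q_i)` undefined, which is the kind of condition a KL-based clustering objective has to handle.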
The correctness and efficiency of the two algorithms were validated on publicly available benchmark data in terms of the monotonicity of the objective function value and clustering accuracy. Specifically, we compared the accuracy of the Euclidean K-means, Stochastic K-means, Spherical K-means, Stochastic Coclustering, and Spherical Coclustering algorithms.
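The two validation criteria can be sketched as follows. This is a minimal example under stated assumptions: `is_monotone` and `clustering_accuracy` are hypothetical helper names, the tolerance is arbitrary, and accuracy is assumed to be scored under the best one-to-one mapping from clusters to classes, which the abstract does not confirm.

```python
from itertools import permutations

def is_monotone(values, increasing=False, tol=1e-9):
    """Check that successive objective values never move the wrong way.

    Use increasing=True for objectives that are maximized (for example a
    cosine-similarity sum) and the default for objectives that are minimized.
    """
    pairs = zip(values, values[1:])
    if increasing:
        return all(b >= a - tol for a, b in pairs)
    return all(b <= a + tol for a, b in pairs)

def clustering_accuracy(true_labels, cluster_ids):
    """Fraction correct under the best one-to-one cluster-to-class mapping.

    Brute force over permutations, so only sensible for a few clusters.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes; mapping is not one-to-one")
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best = max(best, hits)
    return best / len(true_labels)
```

For instance, `clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to `1.0`, because cluster identifiers are arbitrary and only the grouping matters.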