This paper addresses the problem of clustering a dataset given its similarity matrix. By representing this matrix as a weighted graph, we transform the problem into a graph clustering (partitioning) problem, which aims to identify groups of strongly interconnected vertices. We define two distinct criteria with the aim of simultaneously minimizing the cut size and obtaining balanced clusters. The first criterion minimizes the similarity between objects belonging to different clusters, a standard objective in clustering. The second criterion is formulated using generalized entropy. The trade-off between these two objectives is explored using a multi-objective genetic algorithm with enhanced operators. As the experimental results show, the Pareto front offers a visualization of this trade-off.
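To make the two criteria concrete, the following is a minimal sketch under stated assumptions, not the method of the paper. The abstract does not say which generalized entropy is used, so the Havrda-Charvat (Tsallis) form with parameter `alpha` is assumed here, and balance is assumed to be rewarded by higher entropy. The function names, the toy matrix `W`, and the brute-force enumeration of two-cluster labelings (used in place of the multi-objective genetic algorithm) are all illustrative.

```python
from itertools import product


def cut_size(W, labels):
    """Sum of similarities between objects assigned to different clusters."""
    n = len(W)
    return sum(W[i][j]
               for i in range(n) for j in range(i + 1, n)
               if labels[i] != labels[j])


def generalized_entropy(labels, alpha=2.0):
    """Havrda-Charvat entropy of the cluster-size proportions (assumed form)."""
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return (1.0 - sum((m / n) ** alpha for m in counts.values())) / (alpha - 1.0)


def pareto_front(points):
    """Non-dominated (cut, entropy) pairs: cut is minimized, entropy maximized."""
    def dominates(p, q):
        return p[0] <= q[0] and p[1] >= q[1] and p != q
    return sorted(p for p in set(points)
                  if not any(dominates(q, p) for q in points))


# Hypothetical similarity matrix: two tightly linked pairs, weakly linked to each other.
W = [[0, 5, 1, 0],
     [5, 0, 0, 1],
     [1, 0, 0, 5],
     [0, 1, 5, 0]]

# Brute force over all two-cluster labelings (feasible only for tiny inputs).
candidates = [(cut_size(W, lab), generalized_entropy(lab))
              for lab in product([0, 1], repeat=len(W))]
front = pareto_front(candidates)
```

On this toy input the front contains only two points: the trivial single-cluster solution (zero cut, zero entropy) and the balanced split that separates the two pairs. This is the kind of trade-off a Pareto front exposes; a genetic algorithm would approximate the same set when enumeration is infeasible.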
This article is also authored by Synbrain data scientists and collaborators.