This paper is concerned with feature weighting/selection in the context of unsupervised clustering. Since different subspaces of the feature space may lead to different partitions of the data set, an efficient algorithm to tackle multi-modal environments is needed. In this context, the Multi-Niche Crowding Genetic Algorithm is used for searching relevant feature subsets. The proposed method is designed to deal with the inherent biases regarding the number of clusters and the number of features that appear in an unsupervised framework. The first one is eliminated with the aid of a new unsupervised clustering criterion, while the second is tackled with the aid of cross-projection normalization. The method delivers a vector of weights which offers a ranking of features in accordance with their relevance to clustering.
This article is authored also by Synbrain data scientists and collaborators. READ THE FULL ARTICLE