This paper proposes a new method, based on the projection pursuit methodology, for identifying interesting structures in data. Past work reported in the literature uses projection pursuit either as a means of visualizing high-dimensional data or to identify linear combinations of attributes that reveal grouping tendencies or outliers. Projection pursuit is generally formulated as an optimization problem whose goal is to find projection axes that minimize or maximize a projection index. For identifying interesting structure, the existing approaches have a clear limitation: linear models cannot capture more general structures in data, such as circular or curved clusters, or any structure produced by a polynomial or otherwise nonlinear generative model. This paper extends linear projection pursuit to nonlinear projections while preserving the general methodology used to search for projections. In addition, an algorithmic framework based on multi-modal genetic algorithms is proposed to cope with the large search space and to allow the use of non-differentiable projection indices. Experiments on synthetic data demonstrate that the new approach identifies clusters of various shapes that are otherwise undetectable with linear projection pursuit or with popular clustering methods such as k-Means.
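To make the idea concrete, the following minimal sketch contrasts a linear projection with a quadratic one on two concentric rings. It is an illustration under stated assumptions, not the method of the paper: the function names (make_rings, quadratic_features, best_split, evolve) are hypothetical, the projection index is an assumed two-group split criterion (between-group variance over total variance), and the search is a plain elitist evolutionary loop standing in for the multi-modal genetic algorithm mentioned above.

```python
import numpy as np

def make_rings(n=200, seed=0):
    """Synthetic data (assumption): two concentric rings of radius ~1 and ~3."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=2 * n)
    radius = np.concatenate([np.full(n, 1.0), np.full(n, 3.0)])
    radius = radius + rng.normal(0.0, 0.1, size=2 * n)
    X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
    labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return X, labels

def quadratic_features(X):
    """Expand (x1, x2) into all monomials of degree 1 and 2."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 * x1, x1 * x2, x2 * x2])

def best_split(z):
    """Assumed non-differentiable index: best two-group split of a 1-D
    projection, scored as between-group variance / total variance.
    Returns (score in [0, 1], threshold between the two groups)."""
    zs = np.sort(z)
    n = zs.size
    total = zs.var()
    if total == 0.0:
        return 0.0, float(zs[0])
    k = np.arange(1, n)                       # size of the left group
    csum = np.cumsum(zs)
    mean_left = csum[:-1] / k
    mean_right = (csum[-1] - csum[:-1]) / (n - k)
    between = (k / n) * (1.0 - k / n) * (mean_left - mean_right) ** 2
    i = int(np.argmax(between))
    threshold = 0.5 * (zs[i] + zs[i + 1])
    return float(between[i] / total), float(threshold)

def evolve(F, generations=40, pop_size=30, sigma=0.3, seed=0):
    """Elitist evolutionary search (a stand-in, not a multi-modal GA) over
    unit coefficient vectors w, maximizing the index of the projection F @ w."""
    rng = np.random.default_rng(seed)
    score = lambda w: best_split(F @ w)[0]
    pop = rng.normal(size=(pop_size, F.shape[1]))
    pop /= np.linalg.norm(pop, axis=1, keepdims=True)
    history = []
    for _ in range(generations):
        scores = np.array([score(w) for w in pop])
        order = np.argsort(scores)[::-1]
        history.append(float(scores[order[0]]))
        elite = pop[order[: pop_size // 2]]   # best half survives unchanged
        children = elite + sigma * rng.normal(size=elite.shape)
        children /= np.linalg.norm(children, axis=1, keepdims=True)
        pop = np.vstack([elite, children])
    scores = np.array([score(w) for w in pop])
    best = int(np.argmax(scores))
    history.append(float(scores[best]))
    return pop[best], history

if __name__ == "__main__":
    X, labels = make_rings()
    F = quadratic_features(X)
    angles = np.linspace(0.0, np.pi, 36, endpoint=False)
    linear = max(best_split(X @ np.array([np.cos(a), np.sin(a)]))[0] for a in angles)
    radial, _ = best_split(F @ np.array([0.0, 0.0, 1.0, 0.0, 1.0]))  # x1^2 + x2^2
    w, history = evolve(F)
    print("best linear index:", linear)
    print("radial (quadratic) index:", radial)
    print("evolved index:", history[-1], "with coefficients", w)
```

In this sketch the projection x1^2 + x2^2 maps each ring to a tight interval of squared radii, so a single threshold separates the two groups, whereas every linear direction mixes them. Because the index is computed from a sorted split rather than a smooth function, it has no usable gradient, which is why a population-based search is a natural fit here.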
This article is also authored by Synbrain data scientists and collaborators.