Projection Pursuit (PP) is a methodology for deriving meaningful low-dimensional representations of data. It generally aims to extract features as linear combinations of the original attributes, using an algorithm that optimizes a function quantifying the interestingness of the projection. This paper builds on our previous research, extending an algorithmic framework based on evolutionary algorithms in two directions. First, the framework was originally a one-dimensional PP method; we enhance it with two-dimensional search capabilities that provide meaningful planar views (rotations) of the data. Second, we provide a distributed implementation under Apache Spark. The distributed implementation is meant not only to handle large data, but also to exploit the inherent parallelism of the evolutionary paradigm, improving the efficiency of the method both on a cluster and on a single machine, where it uses the available virtual CPUs. Several case studies illustrate the applicability of our framework and demonstrate its efficiency.
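To make the idea concrete, the following is a minimal sketch of one-dimensional projection pursuit driven by a simple elitist evolutionary loop. It is not the framework described in the paper: the projection index (absolute excess kurtosis), the Gaussian mutation operator, and the names `kurtosis_index` and `evolve_projection` are all assumptions chosen for illustration, and the sketch runs on a single machine with NumPy rather than on Apache Spark.

```python
import numpy as np


def kurtosis_index(z):
    """Hypothetical projection index: absolute excess kurtosis of the 1-D projection."""
    z = (z - z.mean()) / z.std()
    return abs(np.mean(z ** 4) - 3.0)


def evolve_projection(X, index=kurtosis_index, pop_size=30,
                      generations=50, sigma=0.2, seed=0):
    """Search for a unit vector w that maximizes index(X @ w).

    Assumed scheme (not the paper's): each parent produces one child by
    Gaussian mutation, and the best pop_size of parents plus children survive.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    # Initial population: random directions on the unit sphere.
    pop = rng.normal(size=(pop_size, d))
    pop /= np.linalg.norm(pop, axis=1, keepdims=True)

    for _ in range(generations):
        # Mutate, then renormalize so every candidate stays a unit vector.
        children = pop + sigma * rng.normal(size=pop.shape)
        children /= np.linalg.norm(children, axis=1, keepdims=True)

        candidates = np.vstack([pop, children])
        # Each candidate is scored independently of the others.
        scores = np.array([index(X @ w) for w in candidates])

        # Keep the highest-scoring candidates, best first.
        best = np.argsort(scores)[::-1][:pop_size]
        pop = candidates[best]

    w = pop[0]
    return w, index(X @ w)


if __name__ == "__main__":
    # Synthetic data: one bimodal attribute hidden among Gaussian noise.
    rng = np.random.default_rng(1)
    n = 500
    X = rng.normal(size=(n, 5))
    X[:, 2] = np.where(rng.random(n) < 0.5, -3.0, 3.0) + 0.3 * rng.normal(size=n)

    w, score = evolve_projection(X)
    print("direction:", np.round(w, 3))
    print("index value:", round(score, 3))
```

Because each candidate direction is scored independently, the scoring loop is the natural place to parallelize; this is the kind of inherent parallelism the abstract refers to, although how the paper distributes the work is not shown here.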
This article is also authored by Synbrain data scientists and collaborators.