Cluster validation using an ensemble of supervised classifiers uri icon

Abstract

  • © 2018A cluster validity index is used to select which clustering algorithm to apply for a given problem. It works by evaluating the quality of a partition, as output by a candidate clustering algorithm, getting around the common case of the lack of an expert in the given domain of discourse. Most existing validity indexes make assumptions, such as each cluster of the partition having an underlying structure, for example, a hypersphere, yielding incorrect evaluations when they do not hold. Here, we propose a new cluster validity index, which attempts to avoid this bias using an ensemble of distinct supervised classifiers; this way the bias is not attributable to a specific classifier, but to a collection thereof, hence alleviating the problem. The rationale behind our index is that a good partition should induce the construction of also a good classifier; the better the classification performance, the better the quality of the partition under evaluation. Notice how we use the partition to be assessed as a sort of labeled dataset, where each object is labeled with the cluster label it belongs to. We have tested our index on 50 numerical datasets, grouped using six different clustering algorithms. In our experiments, our index outperforms five validity indexes, including the most popular ones.

Publication date

  • April 1, 2018