Supervised and Unsupervised Classification of Rankings Using a Kemeny Distance Framework

Heiser, W. J.; D'Ambrosio, Antonio

Rankings and partial rankings are ubiquitous in data analysis, yet there is relatively little work on the their classification that uses the typical properties of rankings. We propose a common framework for both the prediction of rankings and clustering of rankings, which is also valid for partial rankings. This framework is based on the Kemeny distance, defined as the minimum number of interchanges of two adjacent elements required to transform one ranking into another. The Kemeny distance is equivalent to Kendall’s tau for complete rankings, but for partial rankings it is equivalent to Emond and Mason’s extension of tau. For clustering (unsupervised classification), we use the probabilistic distance method proposed by Ben-Israel and Iyigun, and define the disparity between a ranking and the center of a cluster as the Kemeny distance. For prediction (supervised classification), we build a classification tree by recursive partitioning, and define the impurity measure of the subgroups formed as the sum of the within-node Kemeny distances. In both cases, the center of a subgroup of (partial) rankings - also called the consensus ranking - is useful to characterize the subgroup. It is well-known that finding the consensus ranking is an NPhard problem. We use a branch-and-bound algorithm to find approximate solutions. Illustrative examples are given for both procedures.

Supervised and Unsupervised Classification of Rankings Using a Kemeny Distance Framework / Heiser, W. J.; D'Ambrosio, Antonio. - (2011). ( GFKL 2011 Francoforte (Germania) 31 Agosto - 2 Settembre 2011).