Rankings and partial rankings are ubiquitous in data analysis, yet there is relatively little work on the their classification that uses the typical properties of rankings. We propose a common framework for both the prediction of rankings and clustering of rankings, which is also valid for partial rankings. This framework is based on the Kemeny distance, defined as the minimum number of interchanges of two adjacent elements required to transform one ranking into another. The Kemeny distance is equivalent to Kendall’s tau for complete rankings, but for partial rankings it is equivalent to Emond and Mason’s extension of tau. For clustering (unsupervised classification), we use the probabilistic distance method proposed by Ben-Israel and Iyigun, and define the disparity between a ranking and the center of a cluster as the Kemeny distance. For prediction (supervised classification), we build a classification tree by recursive partitioning, and define the impurity measure of the subgroups formed as the sum of the within-node Kemeny distances. In both cases, the center of a subgroup of (partial) rankings - also called the consensus ranking - is useful to characterize the subgroup. It is well-known that finding the consensus ranking is an NPhard problem. We use a branch-and-bound algorithm to find approximate solutions. Illustrative examples are given for both procedures.

Supervised and Unsupervised Classification of Rankings Using a Kemeny Distance Framework / Heiser, W. J.; D'Ambrosio, Antonio. - (2011). (Intervento presentato al convegno GFKL 2011 tenutosi a Francoforte (Germania) nel 31 Agosto - 2 Settembre 2011).

Supervised and Unsupervised Classification of Rankings Using a Kemeny Distance Framework

D'AMBROSIO, ANTONIO
2011

Abstract

Rankings and partial rankings are ubiquitous in data analysis, yet there is relatively little work on the their classification that uses the typical properties of rankings. We propose a common framework for both the prediction of rankings and clustering of rankings, which is also valid for partial rankings. This framework is based on the Kemeny distance, defined as the minimum number of interchanges of two adjacent elements required to transform one ranking into another. The Kemeny distance is equivalent to Kendall’s tau for complete rankings, but for partial rankings it is equivalent to Emond and Mason’s extension of tau. For clustering (unsupervised classification), we use the probabilistic distance method proposed by Ben-Israel and Iyigun, and define the disparity between a ranking and the center of a cluster as the Kemeny distance. For prediction (supervised classification), we build a classification tree by recursive partitioning, and define the impurity measure of the subgroups formed as the sum of the within-node Kemeny distances. In both cases, the center of a subgroup of (partial) rankings - also called the consensus ranking - is useful to characterize the subgroup. It is well-known that finding the consensus ranking is an NPhard problem. We use a branch-and-bound algorithm to find approximate solutions. Illustrative examples are given for both procedures.
2011
Supervised and Unsupervised Classification of Rankings Using a Kemeny Distance Framework / Heiser, W. J.; D'Ambrosio, Antonio. - (2011). (Intervento presentato al convegno GFKL 2011 tenutosi a Francoforte (Germania) nel 31 Agosto - 2 Settembre 2011).
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11588/413166
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact