Distance-based multivariate trees for rankings / D'Ambrosio, Antonio. - (2010).

Distance-based multivariate trees for rankings

D'AMBROSIO, ANTONIO
2010

Abstract

This work introduces a supervised tree-based method that handles preference rankings as the response variable. Tree-based models for multivariate response variables already exist in the literature. The problem with preference rankings is that an ordering is better regarded as a single multidimensional “entity” than as a multivariate distribution. For this reason, the known techniques for defining splits on multivariate response variables do not yield impurity measures that are feasible in this case. Building a tree-based structure with rankings as the response variable requires defining both a specific impurity measure and an assignment rule: the quality of the tree-based classifier for rankings depends on this choice. To define the impurity measure, this paper uses a distance-based approach. In preference ranking theory, the meaning of ties is often debated: those who believe that ties are a positive statement of agreement, and not just declarations of indifference, should accept a set of axioms formulated by Kemeny (Kemeny, 1964) that any distance measure between preference rankings should satisfy. Some experimental designs, however, do not allow ties. We chose the Kemeny distance as the impurity measure because it is sufficiently discriminating to serve as a splitting measure. Moreover, when the experimental design does not allow ties, it is proved to be equivalent to the well-known Kendall distance. The drawback of the latter is that it can be used only with linear orderings (i.e. when ties are not allowed); otherwise, it can be proved that the Kendall distance violates the triangle inequality. The ranking-class assignment rule is the consensus ranking computed by maximizing Emond and Mason’s rank correlation coefficient (Emond and Mason, 2002).
The characterization of the impurity measure led us to call our method Distance Based Multivariate Trees for Rankings (DBMTR).
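
The two quantities named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions: a ranking is encoded as a rank vector (a lower value means more preferred, equal values mean a tie), the function names kemeny_distance, tau_x and consensus are hypothetical, and the consensus is found by exhaustive search over tie-free orderings only, which is workable only for a handful of items. The score-matrix definitions follow the standard formulations of the Kemeny distance and of Emond and Mason's coefficient.

```python
from itertools import permutations


def kemeny_distance(a, b):
    """Kemeny distance between two rank vectors (ties allowed)."""
    def sign(x, y):
        # +1 if the first item is preferred, -1 if the second is, 0 if tied
        return (x < y) - (x > y)

    n = len(a)
    total = sum(
        abs(sign(a[i], a[j]) - sign(b[i], b[j]))
        for i in range(n)
        for j in range(n)
    )
    return total // 2  # the double sum counts every pair twice, so it is even


def tau_x(a, b):
    """Emond and Mason's rank correlation coefficient (ties score +1)."""
    def score(x, y):
        # +1 if the first item is ranked ahead of or tied with the second
        return 1 if x <= y else -1

    n = len(a)
    s = sum(
        score(a[i], a[j]) * score(b[i], b[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return s / (n * (n - 1))


def consensus(rankings):
    """Brute-force consensus: the tie-free ordering with the highest total tau_x.

    Illustrative simplification: candidates with ties are not searched.
    """
    n = len(rankings[0])
    candidates = permutations(range(1, n + 1))
    return max(candidates, key=lambda c: sum(tau_x(c, r) for r in rankings))


if __name__ == "__main__":
    sample = [[1, 2, 3], [1, 2, 3], [2, 1, 3]]
    print(kemeny_distance([1, 2, 3], [1, 2, 2]))
    print(tau_x([1, 2, 3], [1, 2, 2]))
    print(consensus(sample))
```

For n items the two measures are linked by tau_x = 1 - 2d / (n(n - 1)), where d is the Kemeny distance, so maximizing the total tau_x over a set of rankings is the same as minimizing the total Kemeny distance to them.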
Use this identifier to cite or link to this document: https://hdl.handle.net/11588/374436