The degree to which objects differ from each other with respect to observations on a set of variables plays an important role in many statistical methods. Many data analysis methods require a quantification of differences in the observed values, which we can call distances. An appropriate definition of a distance depends on the nature of the data and the problem at hand. For distances between numerical variables, there exist many definitions that depend on the size of the observed differences. For categorical data, the definition of a distance is more complex, as there is no straightforward quantification of the size of the observed differences. In this paper, we introduce a flexible framework for efficiently computing distances between categorical variables, supporting existing and new formulations tailored to specific contexts. In supervised classification, it enhances performance by integrating relationships between response and predictor variables. This framework allows measuring differences among objects across diverse data types and domains.

A general framework for implementing distances for categorical variables / Velden, Michel van de; Iodice D’Enza, Alfonso; Markos, Angelos; Cavicchia, Carlo. - In: PATTERN RECOGNITION. - ISSN 0031-3203. - 153:(2024). [10.1016/j.patcog.2024.110547]

A general framework for implementing distances for categorical variables

Iodice D’Enza, Alfonso;
2024

Files in this item:
File: general_framework_pattern_recognition.pdf
Access: open access
Type: Publisher's version (PDF)
License: Creative Commons
Size: 680.57 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/959850