The degree of inter-rater agreement is usually assessed through κ-type coefficients, and the extent of agreement is then characterized by comparing the value of the adopted coefficient against a benchmark scale. Two motivating examples illustrate how some κ-type coefficients behave differently when marginal frequencies are asymmetrically distributed over categories. An extensive Monte Carlo simulation study has been conducted to investigate the robustness of four κ-type coefficients for nominal and ordinal classifications, and of an inferential benchmarking procedure that, unlike straightforward benchmarking, does not neglect the influence of the experimental conditions. Robustness has been investigated for several scenarios differing in sample size, rating scale dimension, number of raters, frequency distribution of rater classifications, and pattern of agreement across raters. Simulation results reveal more pronounced paradoxical behavior of Fleiss' kappa and Conger's kappa with ordinal than with nominal classifications; the coefficients' robustness improves with increasing sample size and number of raters for both nominal and ordinal classifications, whereas it improves with rating scale dimension only for nominal classifications. By identifying the scenarios (i.e., minimum sample size, number of raters, rating scale dimension) with acceptable robustness, this study provides guidelines for the design of robust agreement studies.
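
The sensitivity to asymmetric marginal frequencies described in the abstract can be illustrated with a minimal sketch. The Python code below computes Fleiss' kappa from its standard textbook formula for two invented data sets; the counts are hypothetical and are not taken from the paper, and the function name `fleiss_kappa` is an assumption made for this illustration. Both data sets have the same observed agreement (90%), but one has balanced and the other skewed category frequencies.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a list of per-item category counts.

    counts[i][j] = number of raters assigning item i to category j.
    Every item is assumed to be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])

    # Observed agreement: mean proportion of agreeing rater pairs per item.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement from the pooled marginal category proportions.
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_cats)]
    p_exp = sum(p * p for p in p_cat)

    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical data: 100 items, 2 raters, 2 categories, 10 disagreements each.
balanced = [[2, 0]] * 45 + [[0, 2]] * 45 + [[1, 1]] * 10
skewed = [[2, 0]] * 85 + [[0, 2]] * 5 + [[1, 1]] * 10

kappa_balanced = fleiss_kappa(balanced)
kappa_skewed = fleiss_kappa(skewed)

print(f"balanced marginals: kappa = {kappa_balanced:.3f}")  # 0.800
print(f"skewed marginals:   kappa = {kappa_skewed:.3f}")    # 0.444
```

With identical observed agreement, the coefficient drops from 0.8 to about 0.44 solely because the skewed marginals inflate the chance-agreement term, which is the kind of paradoxical behavior the study examines.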

Robustness of κ-type coefficients for clinical agreement / Vanacore, A.; Pellegrino, M. S.. - In: STATISTICS IN MEDICINE. - ISSN 0277-6715. - 41:11(2022), pp. 1986-2004. [10.1002/sim.9341]

Robustness of κ-type coefficients for clinical agreement

Vanacore A.; Pellegrino M. S.
2022

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/880982