
Benchmarking rater agreement: probabilistic versus deterministic approach / Vanacore, Amalia; Pellegrino, Maria Sole. - 89:(2018), pp. 365-374. [10.1142/9789813274303_0037]

Benchmarking rater agreement: probabilistic versus deterministic approach

Amalia Vanacore; Maria Sole Pellegrino
2018

Abstract

In several industries, strategic and operational decisions rely on subjective evaluations provided by raters who are asked to score and/or classify groups of items in terms of technical properties (e.g. classification of faulty material by defect type) and/or perception aspects (e.g. comfort, quality, pain, pleasure, aesthetics). Because of the lack of a gold standard for classifying subjective evaluations as “true” or “false”, rater reliability is generally measured by assessing her/his precision via inter/intra-rater agreement coefficients. Agreement coefficients are useful only if their magnitude can be easily interpreted. A common practice is to apply a straightforward procedure that translates the magnitude of the adopted agreement coefficient into an extent of agreement via a benchmark scale. This practice has attracted many criticisms, and in order to address some of them the adoption of a probabilistic approach to characterize the extent of agreement is recommended. In this study some probabilistic benchmarking procedures are discussed and compared via an extensive Monte Carlo simulation study.
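The abstract contrasts deterministic benchmarking (mapping a point estimate of an agreement coefficient straight onto a benchmark scale) with probabilistic benchmarking. The sketch below illustrates the two ideas, assuming Cohen's kappa as the agreement coefficient and the Landis–Koch scale as the benchmark; the bootstrap-based probabilistic check is an illustrative assumption for how uncertainty can be folded in, not the specific procedures compared in the paper.

```python
import random

# Landis & Koch (1977) benchmark scale for kappa-type coefficients:
# one common deterministic benchmark of the kind the abstract criticizes.
LANDIS_KOCH = [
    (0.00, "slight"),
    (0.20, "fair"),
    (0.40, "moderate"),
    (0.60, "substantial"),
    (0.80, "almost perfect"),
]

def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters' categorical ratings of the same items."""
    assert len(r1) == len(r2) and len(r1) > 0
    n = len(r1)
    cats = sorted(set(r1) | set(r2))
    p_o = sum(a == b for a, b in zip(r1, r2)) / n              # observed agreement
    p_e = sum((r1.count(c) / n) * (r2.count(c) / n) for c in cats)  # chance agreement
    if p_e == 1:
        # Both raters used a single identical category: agreement is perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

def deterministic_benchmark(kappa):
    """Map a point estimate straight onto the scale (the straightforward practice)."""
    label = "poor"
    for threshold, name in LANDIS_KOCH:
        if kappa > threshold:
            label = name
    return label

def probabilistic_benchmark(r1, r2, threshold=0.60, n_boot=2000, alpha=0.05, seed=1):
    """Illustrative probabilistic variant (an assumption, not the paper's exact
    procedure): bootstrap the rated items and declare the agreement at least
    'substantial' only if the lower percentile bound exceeds the cutoff."""
    rng = random.Random(seed)
    n = len(r1)
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boots.append(cohen_kappa([r1[i] for i in idx], [r2[i] for i in idx]))
    boots.sort()
    lower = boots[int(alpha * n_boot)]
    return lower > threshold
```

The deterministic mapping ignores sampling uncertainty: two samples with the same kappa estimate but very different sizes receive the same verdict, which is one of the criticisms that motivates the probabilistic alternatives discussed in the paper.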
ISBN: 978-981-327-429-7 / 978-981-327-430-3
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/724477
Citations
  • PMC: not available
  • Scopus: 0
  • Web of Science (ISI): 0