Exploring the Periphery of Data Scatters: Are There Outliers? / Ragozini, Giancarlo; Porzio, G. C. (2000), pp. 235-240.
Exploring the Periphery of Data Scatters: Are There Outliers?
Ragozini, Giancarlo
2000
Abstract
Outliers are observations that are markedly discordant with the rest of the data and hence lie on the periphery of the data region. Many tools have been proposed in the literature for detecting multiple outliers. Most recent methods rely on some measure of the distance of each data point from a center. However, they are effective only if the data scatter is symmetric about that center; otherwise, asymmetry makes these measures misleading. For this reason, we propose a method that explores the periphery of the data scatter directly, without reference to any center. The methodology is a two-step procedure that exploits the sample convex hull and radial projections. It explores gaps in the data scatter and proximities to its boundary, highlighting how sparse the data structure is at its periphery. Finally, a complementary graphical display is offered as a tool to visualize boundary features.
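The idea of inspecting the periphery without a center can be illustrated with a minimal sketch. It is not the authors' two-step procedure, whose details the abstract does not give, and it omits radial projections entirely. It only assumes two-dimensional data, computes the sample convex hull with Andrew's monotone chain, and reports for each hull vertex the gap to its nearest other observation. The names `convex_hull`, `nearest_gap`, and `peripheral_gaps` and the toy data are hypothetical.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain: it repeats the first point of the other
    return lower[:-1] + upper[:-1]

def nearest_gap(p, points):
    """Euclidean distance from p to its nearest distinct observation."""
    return min(math.dist(p, q) for q in points if q != p)

def peripheral_gaps(points):
    """Map each hull vertex to its gap from the rest of the data (illustrative only)."""
    return {v: nearest_gap(v, points) for v in convex_hull(points)}

# Hypothetical toy data: a small cluster plus one isolated point
data = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0.4, 0.6), (5, 5)]
hull = convex_hull(data)
gaps = peripheral_gaps(data)
most_isolated = max(gaps, key=gaps.get)
print(hull)
print(most_isolated, round(gaps[most_isolated], 3))
```

In this toy example the hull vertex with the largest gap is the isolated point, and no location estimate such as a mean or median is ever computed. A hull vertex with a large gap is only a candidate for closer inspection, not a declared outlier.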