
COSINER: COntext SImilarity data augmentation for Named Entity Recognition / Bartolini, I.; Moscato, V.; Postiglione, M.; Sperlì, G.; Vignali, A. - vol. 13590 (2022), pp. 11-24. DOI: 10.1007/978-3-031-17849-8_2

COSINER: COntext SImilarity data augmentation for Named Entity Recognition

Bartolini I.; Moscato V.; Postiglione M.; Sperlì G.; Vignali A.
2022

Abstract

To alleviate the scarcity of manually annotated data in Named Entity Recognition (NER) tasks, data augmentation methods can be applied to automatically generate labeled data and improve the performance of existing methods. However, because they rely on manipulations of the input text, current techniques may generate too many noisy and mislabeled samples. In this paper we propose COntext SImilarity-based data augmentation for NER (COSINER), a method for NER data augmentation based on context similarity, i.e., we replace entity mentions with the most plausible ones according to the available training data and the contexts in which entities usually appear. We conduct experiments on popular benchmark datasets, showing that our method outperforms current baselines in various few-shot scenarios, where training data is assumed to be strongly limited. Experimental results show that COSINER not only outperforms the baselines in terms of NER performance in highly limited scenarios (2% and 5% of the training data), but also has computing times comparable to those of the simplest augmentation methods.
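The core idea stated in the abstract, replacing an entity mention with a plausible mention of the same type chosen by how similar their contexts are, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`entity_spans`, `build_context_vectors`, `cosine`, `augment`), the BIO tagging, the bag-of-words context window, the toy sentences, and the greedy choice of the single most similar mention are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical sketch of context-similarity mention replacement (not COSINER itself).

def entity_spans(tags):
    """Yield (start, end, type) for every BIO entity span; `end` is exclusive."""
    i = 0
    while i < len(tags):
        if tags[i].startswith("B-"):
            etype = tags[i][2:]
            j = i + 1
            while j < len(tags) and tags[j] == "I-" + etype:
                j += 1
            yield i, j, etype
            i = j
        else:
            i += 1  # "O" tokens (and stray "I-" tags) do not start an entity

def build_context_vectors(sentences, window=2):
    """Aggregate the words around each (type, mention) pair into a bag-of-words vector."""
    vectors = {}
    for tokens, tags in sentences:
        for start, end, etype in entity_spans(tags):
            key = (etype, tuple(tokens[start:end]))
            context = tokens[max(0, start - window):start] + tokens[end:end + window]
            vectors.setdefault(key, Counter()).update(w.lower() for w in context)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def augment(tokens, tags, vectors):
    """Replace every mention with the same-type mention whose contexts are most similar."""
    out_tokens, out_tags, prev = [], [], 0
    for start, end, etype in entity_spans(tags):
        out_tokens += tokens[prev:start]
        out_tags += tags[prev:start]
        src = (etype, tuple(tokens[start:end]))
        candidates = [k for k in vectors if k[0] == etype and k != src]
        best = src[1]  # keep the original mention if no alternative is available
        if src in vectors and candidates:
            best = max(candidates, key=lambda k: cosine(vectors[src], vectors[k]))[1]
        out_tokens += list(best)
        # Re-tag in BIO so that multi-token replacements stay well-formed.
        out_tags += ["B-" + etype] + ["I-" + etype] * (len(best) - 1)
        prev = end
    out_tokens += tokens[prev:]
    out_tags += tags[prev:]
    return out_tokens, out_tags

# Toy training data (invented for illustration only).
train = [
    (["Obama", "visited", "Paris", "yesterday"], ["B-PER", "O", "B-LOC", "O"]),
    (["Merkel", "visited", "Berlin", "yesterday"], ["B-PER", "O", "B-LOC", "O"]),
    (["He", "lives", "in", "New", "York"], ["O", "O", "O", "B-LOC", "I-LOC"]),
    (["She", "lives", "in", "Rome"], ["O", "O", "O", "B-LOC"]),
]

vectors = build_context_vectors(train, window=2)
print(augment(*train[3], vectors))
# (['She', 'lives', 'in', 'New', 'York'], ['O', 'O', 'O', 'B-LOC', 'I-LOC'])
```

In this toy example "Rome" is swapped for "New York" because both appear after "lives in", and the labels are regenerated for the two-token replacement. Picking the top-scoring candidate keeps the sketch deterministic; sampling among the most similar candidates would be an equally plausible design choice for producing more varied augmented sentences.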
ISBN: 978-3-031-17848-1, 978-3-031-17849-8


Use this identifier to cite or link to this document: https://hdl.handle.net/11588/915664
Citations
  • PubMed Central: N/A
  • Scopus: 3
  • Web of Science: 2