This paper proposes and discusses the use of text mining techniques for extracting information from clinical records written in Italian. Because annotated material is very difficult and expensive to obtain for languages other than English, we consider only unsupervised approaches, which require no annotated training set. We propose a complete system structured in two steps. In the first step, domain entities are extracted from the clinical records by means of a metathesaurus and standard natural language processing tools. The second step attempts to discover relations between the entity pairs extracted from the whole set of clinical records. For this second step we investigate the performance of unsupervised methods such as clustering in the space of entity pairs, each represented by an ad hoc feature vector. The resulting clusters are then automatically labelled using their most significant features. The system has been tested on a fairly large data set of clinical records in Italian, investigating how performance varies with the similarity measure adopted in the feature space. The results of our experiments show that the proposed unsupervised approach is promising and well suited to semi-automatic labelling of the extracted relations.
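The second step described above (clustering entity pairs and labelling each cluster by its most significant features) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not specify the feature vectors, the clustering algorithm, or the significance criterion, so the sketch uses bag-of-words context counts, a simple greedy threshold clustering, and raw feature frequency as a stand-in for significance. The function names and the toy Italian context words are hypothetical. The `similarity` argument is left pluggable to mirror the comparison of different similarity measures.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_pairs(vectors, threshold=0.5, similarity=cosine):
    """Greedy threshold clustering (an assumed stand-in, not the paper's method).

    Each vector joins the first existing cluster that contains a member whose
    similarity to it is at least `threshold`; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = []
    for idx, vec in enumerate(vectors):
        for members in clusters:
            if any(similarity(vec, vectors[j]) >= threshold for j in members):
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def label_cluster(members, vectors, top_n=2):
    """Label a cluster with its `top_n` most frequent features.

    Raw frequency is an assumed proxy for 'most significant features'.
    """
    total = Counter()
    for j in members:
        total.update(vectors[j])
    return [feature for feature, _ in total.most_common(top_n)]


# Hypothetical feature vectors: counts of context words observed between
# the two entities of each extracted pair.
vectors = [
    Counter({"trattato": 1, "con": 1}),
    Counter({"trattato": 1, "con": 1, "terapia": 1}),
    Counter({"localizzato": 1, "in": 1}),
    Counter({"localizzato": 1, "in": 1, "sede": 1}),
]

clusters = cluster_pairs(vectors, threshold=0.5)
labels = [label_cluster(members, vectors) for members in clusters]
```

Passing a different function as `similarity` (for example a Jaccard overlap on the feature sets) changes how pairs are grouped without touching the rest of the sketch; the labels would then be candidates for a human to confirm, in the spirit of the semi-automatic labelling mentioned above.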

Unsupervised entity and relation extraction from clinical records in Italian / Alicante, Anita; Corazza, Anna; Isgro', Francesco; Silvestri, Stefano. - In: COMPUTERS IN BIOLOGY AND MEDICINE. - ISSN 0010-4825. - (2016). [10.1016/j.compbiomed.2016.01.014]

Unsupervised entity and relation extraction from clinical records in Italian

ALICANTE, ANITA;CORAZZA, ANNA;ISGRO', FRANCESCO;SILVESTRI, STEFANO
2016

Files in this record:
CBM2016sottomessa.pdf (not available)

Type: Pre-print document
License: Private/restricted access
Size: 1.7 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/626737
Citations
  • PMC: not available
  • Scopus: 45
  • ISI: 32