Pesi e Metriche nell'Analisi dei Dati Testuali (Weights and Metrics in Textual Data Analysis)

Balbi, Simona
2005

Abstract

The paper reviews some tools now considered classical in Text Mining procedures and software: Latent Semantic Indexing for dimensionality reduction, and the wide literature on how to weight word importance and how to measure similarities between words and between words and queries. Visualisation is strongly affected by these choices. Here we compare some alternatives from a statistical viewpoint. A corpus consisting of six years of the Italian edition of Le Monde Diplomatique is analysed to show the effects of the different weighting systems, together with the potential of Textual Data Analysis for summarising and representing newspaper information.
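As a rough illustration of why the weighting choice matters, the sketch below compares cosine similarity under raw term counts and under a tf-idf weighting. The toy corpus, the function names, and the choice of tf-idf are all assumptions made for illustration; the abstract does not state which weighting systems or metrics the paper compares, and the Latent Semantic Indexing step is left out.

```python
import math
from collections import Counter

# Toy corpus (hypothetical; not the Le Monde Diplomatique data).
docs = [
    "war peace war economy",
    "economy market economy trade",
    "peace treaty war",
]

def term_counts(doc):
    """Raw term frequencies of a whitespace-tokenised document."""
    return Counter(doc.split())

def tfidf_weights(counts_list):
    """Weight each count by log(N / df), one common tf-idf variant."""
    n_docs = len(counts_list)
    df = Counter()
    for counts in counts_list:
        df.update(counts.keys())  # document frequency of each term
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in counts.items()}
        for counts in counts_list
    ]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

raw = [term_counts(d) for d in docs]
weighted = tfidf_weights(raw)

# Similarity between the first and third documents under each scheme.
raw_sim = cosine(raw[0], raw[2])
tfidf_sim = cosine(weighted[0], weighted[2])
print(f"raw counts: {raw_sim:.3f}  tf-idf: {tfidf_sim:.3f}")
```

In this toy case the two documents look less similar under tf-idf, because the terms they share occur in most documents and are down-weighted, while the rare term in the third document gains weight. Any map or plot built on such similarities would shift accordingly.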
Type: Post-print document
License: Private/restricted access

Use this identifier to cite or link to this document: http://hdl.handle.net/11588/100139