Pesi e Metriche nell'Analisi dei Dati Testuali / Balbi, Simona; Misuraca, M. - In: QUADERNI DI STATISTICA. - ISSN 1594-3739. - Print. - 7:(2005), pp. 55-68.
Pesi e Metriche nell'Analisi dei Dati Testuali (Weights and Metrics in Textual Data Analysis)
Balbi, Simona; Misuraca, M.
2005
Abstract
This paper reviews some tools now considered classical in Text Mining procedures and software: Latent Semantic Indexing for dimensionality reduction, and the extensive literature on how to weight the importance of words and how to measure similarities between words and between words and queries. Visualisation is strongly affected by these choices. Here we compare some alternatives from a statistical viewpoint. A corpus consisting of six years of the Italian edition of Le Monde Diplomatique is analysed to show the effects of the different weighting systems, together with the potential of Textual Data Analysis for summarising and representing newspaper information.

File | Type | License | Size | Format
---|---|---|---|---
pesiemetriche.pdf (not available) | Post-print document | Private/restricted access | 231.71 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.