This paper aims to automatically identify which linguistic phenomena represent barriers to better MT quality. We focus on the translation of news data for two bidirectional language pairs: EN↔ES and EN↔DE. Using the diagnostic MT evaluation toolkit DELiC4MT and a set of human reference translations, we relate translation quality barriers to a selection of 9 source-side PoS-based linguistic checkpoints. Using output from the winning SMT, RbMT, and hybrid systems of the WMT 2013 shared task, we investigate translation quality barriers in relation to the selected checkpoints according to two main variables: (i) the type of MT approach, i.e. statistical, rule-based or hybrid, and (ii) the human evaluation of the MT output, ranked into three quality groups: good, near miss and poor. We show that combining manual quality ranking with automatic diagnostic evaluation on a set of PoS-based linguistic checkpoints can identify the specific quality barriers of different MT system types across the four translation directions under consideration.
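The two-variable analysis described above (MT approach crossed with human quality group) can be pictured as a simple aggregation over per-checkpoint scores. The following is only a minimal sketch under stated assumptions: the record layout, the function names `mean_scores` and `weakest_checkpoint`, and all score values are hypothetical placeholders, and nothing here reflects DELiC4MT's actual interface or the paper's results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (system_type, quality_group, checkpoint, score).
# The scores are invented placeholders, NOT results from the paper.
RECORDS = [
    ("SMT", "good", "noun", 0.8),
    ("SMT", "good", "verb", 0.6),
    ("SMT", "poor", "noun", 0.4),
    ("SMT", "poor", "verb", 0.2),
    ("RbMT", "good", "noun", 0.7),
    ("RbMT", "good", "verb", 0.5),
]


def mean_scores(records):
    """Average the scores for each (system, quality group, checkpoint) cell."""
    grouped = defaultdict(list)
    for system, quality, checkpoint, score in records:
        grouped[(system, quality, checkpoint)].append(score)
    return {key: mean(values) for key, values in grouped.items()}


def weakest_checkpoint(records):
    """For each (system, quality group), return the lowest-scoring checkpoint.

    The lowest mean score is treated here as a candidate quality barrier;
    this is an illustrative assumption, not the paper's stated criterion.
    """
    weakest = {}
    for (system, quality, checkpoint), score in mean_scores(records).items():
        cell = (system, quality)
        if cell not in weakest or score < weakest[cell][1]:
            weakest[cell] = (checkpoint, score)
    return weakest


if __name__ == "__main__":
    for cell, (checkpoint, score) in sorted(weakest_checkpoint(RECORDS).items()):
        print(cell, checkpoint, score)
```

In a real study the records would come from the diagnostic tool's output for each translation direction; the grouping logic stays the same regardless of how the individual checkpoint scores are computed.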

Relating Translation Quality Barriers to Source-Text Properties / Gaspari, F.; Toral, Antonio; Lommel, Arle; Doherty, Stephen; van Genabith, Josef; Way, Andy. - (2014), pp. 61-70. (Paper presented at LREC 2014, Ninth International Conference on Language Resources and Evaluation, held in Reykjavik, 26-31 May 2014).

Relating Translation Quality Barriers to Source-Text Properties

Gaspari, F.
2014



Use this identifier to cite or link to this document: https://hdl.handle.net/11588/894223