Relevance of Parallel Corpora to the Latest Developments of Machine Translation and Computer-assisted Translation

Gaspari F
2003

Abstract

This paper discusses the role played by parallel corpora in the design and implementation of fully automatic machine translation (MT) systems, and also looks at their impact on the use of translation memory software, one of the most popular computer-assisted translation (CAT) tools. The main features of example-based machine translation (EBMT) are outlined and compared with those of conventional rule-based machine translation (RBMT), with particular emphasis on the viability of the corpus-based approach to MT for so-called minority and low-density languages. Pure EBMT systems rely on the examples provided by an aligned parallel corpus stored in a database for a given language pair, and they lack explicitly encoded linguistic information. As a result, their most important knowledge base is a selection of real translation examples drawn from written corpora. The paper looks at the crucial aspects of choosing the parallel corpora that make up the collection of aligned textual data, along with the practical issues raised by the maintenance, improvement, debugging and scaling-up of such example-based MT engines. The second part of the paper focuses on the practical importance of aligned parallel corpora in the neighbouring area of computer-assisted translation, considering in particular the popular translation memory (TM) tool. TMs allow professional translators to store, manage and retrieve bilingual passages and textual units from previous translations, so as to reuse them whenever this is appropriate and helpful. The comprehensiveness and accuracy of the archives containing the multilingual data crucially affect the usefulness of translation memories. The translation units stored in the linguistic database of such tools (an aligned parallel corpus) are therefore a valuable asset that can have a great practical impact on repetitive or similar translation projects.

Use this identifier to cite or link to this document: http://hdl.handle.net/11588/894265