Analisi del discorso attraverso tecniche di ‘Concordancing’. Aspetti teorico metodologici / Venuti, Marco. - (2011).

Analisi del discorso attraverso tecniche di ‘Concordancing’. Aspetti teorico metodologici

Venuti, Marco
2011

Abstract

The seminar aims to provide a theoretical and methodological background to Corpus Linguistics research in terms of corpus creation, annotation and analysis. A corpus is a collection of naturally occurring language text chosen to characterize a state or variety of a language: in other words, a collection of texts representative of a given language, put together for linguistic analysis. Corpus-based approaches use corpora to expound, test or exemplify theories and descriptions that were formulated before large corpora became available to inform language study. Corpus-driven linguists, by contrast, are strictly committed to the integrity of the data as a whole: their theoretical statements are fully consistent with, and directly reflect, the evidence provided by the corpus.

Corpus mark-up is the system of standard codes inserted into a document stored in electronic form to provide information about the text itself. The most widely used mark-up schemes are TEI (Text Encoding Initiative) and CES (Corpus Encoding Standard). Annotation makes extracting information easier and faster, and it enables human analysts to exploit and retrieve analyses that they could not carry out themselves. Annotated corpora are reusable resources: annotation records a linguistic analysis explicitly and provides a standard reference resource, a stable base of linguistic analyses, so that successive studies can be compared and contrasted.

There are different types of corpora: parallel corpora (source texts plus their translations), which can be either unidirectional (from La to Lb, or from Lb to La, alone) or bidirectional (from La to Lb and from Lb to La); comparable corpora (monolingual subcorpora designed using the same sampling techniques); general corpora (e.g. BNC, AMC); specialised corpora (e.g. MICASE); monitor corpora (e.g. the Bank of English); and reference corpora. Corpora can be used for a wide variety of language analyses.
These analyses range from lexicography and terminology to (computational) linguistics, from dictionaries and grammars to (critical) discourse analysis, and from translation practice and theory to language teaching and learning. Basic notions of Corpus Linguistics methodology include: concordance/concordancer, collocation (lexis), colligation (grammar), semantic preference (semantics), discourse prosody (pragmatics), the paradigmatic and syntagmatic dimensions, the lexico-grammar approach, and the idiom principle vs. the open-choice principle. To know a word is to know how to use it, since certain grammatical patterns attract certain words. For example, grammatical words such as “a” and “the” are often selected as part of a larger phrase rather than independently; compare “a free hand” vs. “her free hand”, “hurt his leg” vs. “hit someone in the leg”, and “turn her face” vs. “a slap in the face”. During the seminar, different software tools were presented and their similarities and differences highlighted: Xaira, WordSmith Tools, AntConc and ConcGram, as well as web resources.
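
The concordance, collocation and colligation notions listed above can be illustrated with a minimal Python sketch. It does not reproduce how Xaira, WordSmith Tools, AntConc or ConcGram work internally: the naive regular-expression tokenizer, the context widths, the helper names `tokenize`, `kwic`, `collocates` and `left_neighbours`, and the toy sample text are all hypothetical choices made for illustration. The sketch prints a key-word-in-context (KWIC) concordance for “hand”, counts the words co-occurring with it, and counts the word immediately to the left of “leg”, echoing the “hurt his leg” vs. “hit someone in the leg” contrast.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-case word tokens (a deliberately naive, hypothetical tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, node, width=4):
    """Return key-word-in-context (KWIC) lines for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the node words line up in one column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


def collocates(tokens, node, span=4):
    """Count the word forms occurring within +/- `span` tokens of `node` (raw counts only)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - span):i])
            counts.update(tokens[i + 1:i + 1 + span])
    return counts


def left_neighbours(tokens, node):
    """Count the word immediately preceding `node`, e.g. which determiner it takes."""
    return Counter(tokens[i - 1] for i, tok in enumerate(tokens) if tok == node and i > 0)


# Toy text invented for illustration; it is not drawn from any real corpus.
sample = (
    "The manager gave her a free hand with the budget. "
    "She held the cup in her free hand. "
    "He was hit in the leg and hurt his leg badly."
)
tokens = tokenize(sample)

for line in kwic(tokens, "hand"):
    print(line)
print(collocates(tokens, "hand", span=2).most_common(1))  # [('free', 2)]
print(left_neighbours(tokens, "leg"))
```

Dedicated tools usually go further, for example by ranking collocates with association measures such as mutual information or log-likelihood rather than raw co-occurrence counts; this sketch also ignores sentence boundaries when building its context windows.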

Files in this item: programma.pdf (Adobe PDF, 98.35 kB; open access; type: other attached material; license: public domain).

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/392431