Learner corpus research
The CECL is renowned for its compilation of learner corpora (some of which are still ongoing). These include:
When collecting corpus data, the CECL always takes care to record a series of important variables that are known to influence learner production. Information for each of the variables is obtained via learner questionnaires and subsequently encoded in the learner corpora, where the variables can be used as search criteria. Hence, by selecting variables such as the learners’ L1, the assigned topic, the task setting (i.e. whether the task was timed or untimed and whether reference tools were allowed), the time spent in an English-speaking country, or the learning context (i.e. the amount of exposure to English in the learners’ native countries), researchers can compile their own customised learner corpus and compare, for instance, the number and type of errors in timed and untimed essays, or the use of academic vocabulary by learners from different mother tongue backgrounds.
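The idea of using learner variables as search criteria can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `LearnerText` record, its field names, the `select` helper and the toy figures are hypothetical and do not reflect the actual format or interface of the CECL corpora.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearnerText:
    """Hypothetical metadata record for one learner essay (illustrative only)."""
    text_id: str
    l1: str                # learner's mother tongue
    topic: str             # assigned essay topic
    timed: bool            # task setting: timed or untimed
    reference_tools: bool  # task setting: reference tools allowed or not
    months_abroad: int     # time spent in an English-speaking country
    error_count: int       # number of tagged errors in the text
    word_count: int        # length of the text in words


def select(corpus, **criteria):
    """Return the texts whose metadata match every given criterion."""
    return [
        text for text in corpus
        if all(getattr(text, name) == value for name, value in criteria.items())
    ]


def errors_per_100_words(texts):
    """Normalised error rate for a set of texts (0.0 if the set is empty)."""
    total_words = sum(t.word_count for t in texts)
    total_errors = sum(t.error_count for t in texts)
    return 100 * total_errors / total_words if total_words else 0.0


# Invented toy data, not taken from any real corpus.
corpus = [
    LearnerText("FR001", "French", "education", True, False, 0, 30, 500),
    LearnerText("FR002", "French", "education", False, True, 3, 18, 600),
    LearnerText("DE001", "German", "environment", True, False, 6, 20, 500),
    LearnerText("DE002", "German", "environment", False, True, 0, 12, 400),
]

timed = select(corpus, timed=True)
untimed = select(corpus, timed=False)
print(errors_per_100_words(timed), errors_per_100_words(untimed))
```

Criteria can be combined, e.g. `select(corpus, l1="French", timed=False)`, to build a sub-corpus for one mother tongue background and one task setting.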
Going hand in hand with these learner corpora are two learner-corpus-based methodologies that are widely used by the CECL: contrastive interlanguage analysis (CIA) (Granger 1996) and computer-aided error analysis (CEA) (Dagneaux et al. 1998) (see learner corpus bibliography). CIA involves comparing either learner data with native data (L2 vs L1) or different types of learner data (L2 vs L2), while CEA aims to carry out a detailed analysis of the authentic errors found in learner corpora. These two methodologies have brought to light a number of findings concerning three learner phenomena: overuse and underuse, i.e. elements which learners use significantly more or significantly less than their native speaker counterparts, and misuse, i.e. learners’ authentic errors.
CEA is largely dependent upon prior annotation of the data in the form of error tagging. The CECL has developed its own error tagging system for both L2 English and L2 French. The English 'error toolkit' contains a comprehensive error tagging manual (Dagneaux et al. 2008), which explains each of the 50-plus error tags, and the Université catholique de Louvain Error Editor (UCLEE) software, which helps with the insertion of the error tags and the corrections in the data. Error-tagged learner corpora are a valuable resource, be it for teaching, lexicographical or testing purposes. The get-it-right boxes in the Macmillan English Dictionary for Advanced Learners (MED2), which were developed on the basis of an error-tagged version of ICLE, are testimony to this.
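The L2 vs L1 comparison behind the notions of overuse and underuse can be sketched as a frequency comparison between a learner corpus and a native reference corpus. The text above does not specify which significance test is applied, so the sketch below assumes the log-likelihood statistic that is widely used in corpus linguistics, with 3.84 as the critical value (p < 0.05, one degree of freedom). The function names and the example figures are hypothetical.

```python
import math


def log_likelihood(freq_learner, size_learner, freq_native, size_native):
    """Log-likelihood (G2) for one item's frequency in two corpora."""
    total_freq = freq_learner + freq_native
    total_size = size_learner + size_native
    # Expected frequencies if the item were equally common in both corpora.
    expected_learner = size_learner * total_freq / total_size
    expected_native = size_native * total_freq / total_size
    g2 = 0.0
    for observed, expected in (
        (freq_learner, expected_learner),
        (freq_native, expected_native),
    ):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def classify_use(freq_learner, size_learner, freq_native, size_native,
                 critical=3.84):
    """Label an item as overuse, underuse, or no significant difference."""
    g2 = log_likelihood(freq_learner, size_learner, freq_native, size_native)
    if g2 < critical:
        return "no significant difference"
    relative_learner = freq_learner / size_learner
    relative_native = freq_native / size_native
    return "overuse" if relative_learner > relative_native else "underuse"


# Invented counts: an item occurring 200 times in 100,000 learner words
# and 100 times in 100,000 native words.
print(classify_use(200, 100_000, 100, 100_000))
```

The same function serves L2 vs L2 comparisons by passing counts from two learner sub-corpora instead of a learner and a native corpus. Misuse, by contrast, cannot be detected from frequencies alone and relies on the error tagging described above.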
The CECL organized the first learner corpus symposium in Louvain-la-Neuve in 1995 and co-organized a second symposium with the Chinese University of Hong Kong in Hong Kong in 1998. With a view to helping researchers analyze learner corpus data, the CECL organized three summer schools (2004, 2006, 2007), which brought together both senior and junior researchers from a wide range of countries. In 2008 the CECL organized a colloquium entirely devoted to the collection and analysis of spoken learner corpora.
On 15-17 September 2011 the CECL will host an international conference entitled “20 years of learner corpus research: looking back, moving ahead” to mark the 20th anniversary of its creation.
| 6/08/2010 |