Corpus collection guidelines
The target size for each sub-corpus is 200,000 words. It may be possible to gather the full 200,000 words at one university; alternatively, if there is insufficient time or too few students, collaboration with other universities (where the same language is spoken) may be sought. To reach the 200,000-word target, contributions from a minimum of 200 students are needed, as each student may contribute only up to 1,000 words.
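The arithmetic behind the 200-student minimum can be sketched as follows. This is only an illustration: the function name and constants are hypothetical, and the 500-word lower bound is taken from the essay-length requirement stated later in these guidelines.

```python
import math

# Figures taken from the guidelines; the names themselves are illustrative.
TARGET_WORDS = 200_000          # target size of one sub-corpus
MAX_WORDS_PER_STUDENT = 1_000   # upper limit on one student's contribution
MIN_WORDS_PER_ESSAY = 500       # minimum essay length


def students_needed(target_words: int, words_per_student: int) -> int:
    """Number of contributors needed if each supplies words_per_student words."""
    return math.ceil(target_words / words_per_student)


# Best case: every student contributes the full 1,000 words.
fewest = students_needed(TARGET_WORDS, MAX_WORDS_PER_STUDENT)

# Worst case: every student contributes only a minimum-length essay.
most = students_needed(TARGET_WORDS, MIN_WORDS_PER_ESSAY)

print(f"Between {fewest} and {most} students are needed.")
```

In practice the number of contributors will fall somewhere between these two bounds, depending on how long the collected essays are.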
ICLE data collection involves the following stages:
1. Ask students to fill in a learner profile
The ICLE learner profile has been created to provide researchers with information about contributors, so that meaningful conclusions can be drawn when the corpus is analysed. Using the profile, it will be possible both to draw general conclusions about advanced learner writing and to examine subsections, e.g. Spanish mother tongue learners, learners who speak some English at home, or learners for whom German is the second language and English the third. It will also be possible to examine sociolinguistic aspects, such as male/female comparisons. If the corpus is used as a basis for developing specifically adapted teaching tools, the potential advantages of this facility are clear.
2. Collect the right type of material
The corpus will consist entirely of essay writing. Two types of essay writing are useful:
Argumentative essay writing
Using titles such as the ones below:
These essays may be written by students in their own time (untimed), using language reference tools (dictionaries, grammars, etc.), but should be entirely the students' own work, i.e. students should not draw on articles or books for the essay and should not ask a native speaker of English for help. Alternatively, the essays may be written under examination conditions. Descriptive, narrative or technical subjects are not useful for the corpus. For this reason, the following types of titles should be avoided if possible:
Literature examination papers
These are in some ways easier to collect, but they must be accompanied by the relevant learner profiles. Literature examination papers should not amount to more than 25% of each national corpus. Essays can be completed at home (untimed) and should be between 500 and 1,000 words long. Work should be entirely the students' own: no help should be sought from third parties, although students may use reference tools such as dictionaries and grammar books (use of reference tools should be indicated on the learner profile questionnaire).
Important note: the essays should be at least 500 words long (up to 1,000). Leave all spelling mistakes made by students uncorrected. If you do not receive the essays from the students in electronic form, take care not to introduce spelling mistakes when keying in the data.
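Contributors who want to check a batch of essays against the length and proportion limits above could use something like the sketch below. It rests on assumptions that are not part of the guidelines: words are counted by splitting on whitespace, the 25% limit is measured in words rather than in number of essays, and the `Essay` class and function names are hypothetical.

```python
from dataclasses import dataclass

MIN_WORDS = 500               # minimum essay length from the guidelines
MAX_WORDS = 1_000             # maximum essay length from the guidelines
MAX_LITERATURE_SHARE = 0.25   # literature exam papers: at most 25% of the corpus


@dataclass
class Essay:
    text: str
    is_literature_exam: bool = False


def word_count(text: str) -> int:
    # Assumption: a "word" is any whitespace-separated token.
    return len(text.split())


def length_ok(essay: Essay) -> bool:
    """True if the essay is within the 500 to 1,000 word range."""
    return MIN_WORDS <= word_count(essay.text) <= MAX_WORDS


def literature_share(essays: list[Essay]) -> float:
    """Fraction of all collected words that come from literature exam papers."""
    total = sum(word_count(e.text) for e in essays)
    if total == 0:
        return 0.0
    literature = sum(word_count(e.text) for e in essays if e.is_literature_exam)
    return literature / total


def share_ok(essays: list[Essay]) -> bool:
    """True if literature exam papers stay within the 25% limit."""
    return literature_share(essays) <= MAX_LITERATURE_SHARE
```

The check only reads and counts the text; it does not alter it, so the students' original spelling is left untouched.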
3. Format the files and send them to Louvain
Contributors need to follow precise guidelines to format the files in a standardized way before sending them to Louvain.
6/08/2010