ICLEv2 CD-Rom: Frequently asked questions
What are the differences between ICLEv1 and ICLEv2?
How do I search for multiword units?
The corpus is available from www.i6doc.com
Have a look at http://www-igm.univ-mlv.fr/~unitex/index.html
We conducted a small pilot study at the Centre for English Corpus Linguistics in which we tagged 51 learner essays representing the 16 mother tongue backgrounds available in the International Corpus of Learner English (c. 42,000 words) and examined the success rate of the CLAWS tagger. All essays had an accuracy rate between 95% and 99.1%.
Multiword units can be retrieved with the < > symbols. To extract all occurrences of the multiword unit "as far as", search for <as far as>. To check whether CLAWS has tagged every occurrence of "as far as" as a multiword unit, also search for "as far as" without angle brackets and compare the two sets of results.
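As a rough cross-check outside the concordancer, the raw number of occurrences of a word sequence can be counted in any plain-text copy of an essay and compared with the number of hits returned for the angle-bracket query. The Python sketch below is not part of the ICLEv2 interface; the function name count_sequence and the sample text are illustrative assumptions.

```python
import re

def count_sequence(text: str, phrase: str) -> int:
    """Count case-insensitive, whole-word occurrences of a word sequence."""
    # Allow any run of whitespace (including line breaks) between the words.
    words = [re.escape(w) for w in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Hypothetical sample text, not taken from the corpus.
sample = "As far as I know, he went as far as\nthe station. As far as possible."
raw_hits = count_sequence(sample, "as far as")
print(raw_hits)
```

If the raw count is higher than the number of hits for <as far as>, the difference points to occurrences that were not tagged as a multiword unit.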
Click on ‘Tools’ and ‘View word lists’. All sequences of words tagged as multiword units by CLAWS are listed under ‘compound lexical entries’.
“While lemmatizers are potentially very useful for lexical analyses of interlanguage, researchers have to be aware that only the standard realisations of a lemma will be retrieved, i.e. for the lemma LOSE, the standard forms lose/loses/losing/lost, but not the (sometimes equally frequent!) non-standard forms loose/looses/loosing/loosed” (Granger 2008).
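The point made in the quotation can be illustrated with a small Python sketch that searches a text for an explicit list of word forms rather than relying on a lemma. The two form lists come from the quotation; the function name find_forms and the sample sentence are assumptions made for illustration. Note that 'loose' is also a legitimate English word, so hits for the non-standard forms still need manual checking.

```python
import re

# Forms of the lemma LOSE as listed in Granger (2008).
STANDARD = ["lose", "loses", "losing", "lost"]
NON_STANDARD = ["loose", "looses", "loosing", "loosed"]

def find_forms(text, forms):
    """Return all whole-word matches of the given forms, lowercased."""
    # \b on both sides prevents matches inside longer words.
    alternatives = "|".join(sorted(forms, key=len, reverse=True))
    pattern = r"\b(?:" + alternatives + r")\b"
    return [m.lower() for m in re.findall(pattern, text, flags=re.IGNORECASE)]

# Hypothetical learner sentence, not taken from the corpus.
sample = "They loose their jobs and are loosing hope, but he lost nothing."

print(find_forms(sample, STANDARD))                 # standard forms only
print(find_forms(sample, STANDARD + NON_STANDARD))  # both lists
```

Searching with the standard forms alone misses two of the three relevant hits in this sample sentence.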
It is important to bear in mind that a search for the preposition 'up' (search query: <up.Prep>) will not retrieve instances of 'up' that are part of compound lexical entries, because all the tokens of a compound lexical entry share a single POS tag (e.g. 'fed up' is tagged as an adjective and 'up to' as an adverb). To retrieve all compound lexical entries that include the token 'up', use the following query, which involves a morphological filter:
<CDIC><<up>>: matches any multiword unit that is present in the corpus and which includes the token 'up'
To access the complete list of sequences of words tagged as compound lexical entries in ICLEv2, click on 'Tools' and select 'View word lists'. All sequences of words tagged as multiword units by CLAWS are listed under ‘compound lexical entries’.
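The difference between the two queries can be pictured with a toy model in Python. This is only a simplified sketch of the behaviour described above, not the search engine's actual implementation: the entry list, the tag labels and both function names are hypothetical.

```python
# Toy model: each lexical entry is a (form, tag) pair, and a compound
# lexical entry carries one tag for the whole sequence.
# Tag labels and sample entries are hypothetical, not actual CLAWS output.
entries = [
    ("look", "V"), ("up", "Prep"), ("the", "Det"), ("hill", "N"),
    ("fed up", "Adj"),
    ("up to", "Adv"),
    ("as far as", "Conj"),
]

def simple_query(entries, token, tag):
    """Mimic a query such as <up.Prep>: single-token entries with this tag."""
    return [form for form, t in entries if form == token and t == tag]

def compounds_containing(entries, token):
    """Mimic <CDIC><<up>>: compound entries that include the token."""
    return [form for form, _ in entries
            if " " in form and token in form.split()]

print(simple_query(entries, "up", "Prep"))
print(compounds_containing(entries, "up"))
```

In this model the first query returns only the free-standing 'up', while 'fed up' and 'up to' are found only by the second query.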
Granger, S. (2008). Learner Corpora. In Lüdeling, A. and M. Kytö (eds), Handbook on Corpus Linguistics. Mouton de Gruyter.
5/04/2013