| Corpus |
Target language |
First language |
Medium |
Text type/task type |
Proficiency level |
Size in words |
Project director |
Availability |
The Pilot Arabic Learner Corpus  |
Arabic |
English |
written |
|
Intermediate and advanced |
c. 9,000 |
Ghazi Abuhakema
Reem Faraj
Anna Feldman
Montclair State University, USA |
|
The AKCES/CZESL corpus (Akviziční korpusy češtiny - Acquisition corpora of the Czech language/Czech as a second language)  |
Czech |
various |
written and spoken |
student essays
interviews |
various |
2 m |
Charles University in Prague
Technical University in Liberec, Czech Republic |
under development |
| Leerdercorpus Nederlands als Vreemde Taal |
Dutch |
French |
written |
|
|
|
Université catholique de Louvain, Belgium |
|
The ANGLISH corpus  |
English |
French |
spoken |
Readings of texts and sentences, spontaneous oral language. |
various |
|
University of Provence, France. |
Freely available |
| Asao Kojiro’s Learner Corpus Data |
English |
Japanese |
written |
Essays and stories written or reproduced by Japanese college students. |
|
|
|
Texts available for download |
| The Barcelona English Language Corpus (BELC) |
English |
Spanish
Catalan |
spoken and written |
4 tasks:
Written composition
Oral narrative
Oral interview
Role-play
Longitudinal data (children and young adults learning English)
|
|
|
University of Barcelona, Spain |
Spoken components freely available |
| The Bilingual Corpus of Chinese English Learners (BICCEL) |
English |
Chinese |
spoken and written |
Spoken: National Oral English test.
Written: in-class assignments
|
|
c. 2 m |
National Research Center for Foreign Language Education Beijing Foreign Studies University, China |
|
| The Br-ICLE corpus (Brazilian component of ICLE) |
English |
Brazilian Portuguese |
written |
Argumentative and literary essays |
|
|
Catholic University of São Paulo
University of São Paulo, Brazil |
Restricted online access |
| The British Academic Written English (BAWE) corpus |
English |
Mainly L1 speakers but also includes data produced by L2 speakers |
written |
ESP papers  |
4 levels of study (from undergraduate levels to final year and taught masters level)
|
c. 6.5 m |
Sheena Gardner
Warwick, UK
University of Birmingham, UK
Paul Wickens
Oxford Brookes, UK
|
The BAWE corpus can be accessed through the corpus analysis interface, Sketch Engine.
A prototype interface that allows filtered searching of the BAWE corpus files is available.
|
The BUiD Arab Learner Corpus (BALC)  |
English |
Arabic |
written |
School examination essays |
various |
287,227 |
The British University in Dubai,
United Arab Emirates
University of Birmingham, UK |
At present, copies of the current version of the corpus are available on request. |
| The Cambridge Learner Corpus (CLC) |
English |
various |
written |
Exam scripts |
various |
c. 25 m - still expanding |
Cambridge University Press and Cambridge ESOL, UK |
commercial |
| The Corpus of Academic Learner English (CALE) |
English |
German |
written |
Various academic text types that are typically produced in university courses of English, e.g. term papers, reading reports, research plans, abstracts, reviews, and summaries. |
advanced |
under development |
Johannes-Gutenberg Universität Mainz, Germany |
|
| The Corpus of English Essays Written by Asian University Students (CEEAUS) |
English |
various |
written |
Student essays |
various |
c. 200,000 |
Kobe University, Japan |
Freely downloadable from the website |
| The Chinese Academic Written English (CAWE) corpus |
English |
Chinese |
written |
Dissertations written by Chinese undergraduates majoring in English linguistics or applied linguistics. |
|
407,960 |
City University of Hong Kong, Hong Kong |
|
| The Chinese Learner English Corpus (CLEC) |
English |
Chinese |
written |
|
various |
1 m |
Gui Shichun
Guangdong University of Foreign Studies & Yang Huizhong, Shanghai Jiaotong University, China |
The corpus can only be accessed by users in the Department of English at HKPU. |
| The City University Corpus of Academic Spoken English (CUCASE) |
English |
Chinese (but also includes data produced by L1 speakers) |
multimedia |
|
|
2 m |
City University of Hong Kong, Hong Kong |
|
| The Cologne-Hanover Advanced Learner Corpus (CHALC) |
English |
German |
written |
term papers and essays |
advanced |
c. 210,000 |
University of Michigan, USA |
|
| College Learners’ Spoken English Corpus (COLSEC) |
English |
Chinese |
spoken |
National spoken English test for non-English majors. |
|
700,000 |
Yang and Wei |
|
The Corpus Archive of Learner English in Sabah/Sarawak (CALES)  |
English |
Malay |
written |
Argumentative essays |
various |
c. 400,000 |
Simon Botley@Faizal Hakim
Doreen Dillah
Universiti Teknologi MARA Sarawak, Malaysia |
|
| The Corpus of Young Learner Interlanguage (CYLIL) |
English |
various:
Dutch
French
Greek
Italian
|
spoken |
English L2 data elicited from European School pupils.
Longitudinal data |
various |
c. 500,000 |
Vrije Universiteit Brussel, Belgium |
|
| The Eastern European English learner corpus |
English |
Russian
Ukrainian
Polish
Slovak |
spoken |
Spontaneous spoken production data elicited by means of a semi-structured interview |
various |
c. 60,000 |
Eberhard Karls University of Tübingen, Germany |
|
| The English of Malaysian School Students corpus (EMAS) |
English |
Malay |
written |
Student essays |
various |
c. 500,000 |
et al.
Universiti Putra Malaysia, Malaysia |
|
The English Speech Corpus of Chinese Learners (ESCCL)  |
English |
Chinese |
spoken |
Dialogue reading-aloud |
Middle school and college |
|
Chen Hua
Nantong University, China
Wen Qiufang
Beijing Foreign Studies University, China
Li Aijun
Chinese Academy of Social Sciences, China |
|
The EVA Corpus of Norwegian School English  |
English |
Norwegian |
spoken |
Picture-based tasks |
|
35,000 |
Angela Hasselgren
University of Bergen, Norway |
Searchable online |
| The GICLE corpus (German component of ICLE) |
English |
German |
written |
Mainly non-academic argumentative essays |
advanced |
c. 234,000 |
|
|
The Giessen-Long Beach Chaplin Corpus (GLBCC)  |
English |
German |
spoken |
Transcribed interactions between native English speakers, ESL and EFL speakers |
|
350,000 |
Andreas Jucker
Sara Smith
University of Giessen, Germany |
Restricted use: apply for approval to get a copy. |
| The Hong Kong University of Science & Technology (HKUST) learner corpus |
English |
Chinese - mostly Cantonese |
written |
Untimed assignments written for EFL courses and school leaving exams |
University and advanced high school students |
25 m |
Hong Kong University of Science & Technology, Hong Kong |
|
| The Indianapolis Business Learner Corpus (IBLC) |
English |
various |
written |
Job application letters and résumés of business communication students from the U.S., Belgium, Finland, Germany, and Thailand, spanning the years 1990-1998 |
|
|
Thomas Albin Upton
Indiana University, USA |
|
| The International Corpus of Crosslinguistic Interlanguage (ICCI) |
English |
various |
written |
|
|
|
Yukio Tono
Tokyo University of Foreign Studies, Japan |
Searchable online |
| The International Corpus Network of Asian Learners of English (ICNALE) |
English |
Chinese
Indonesian
Japanese
Korean
Malay
etc. |
written |
Short argumentative essays (topic, time, length and dictionary use are all controlled) |
various |
300,000 (estimated goal: 1 m) |
Kobe University, Japan |
Freely available |
| The International Corpus of Learner English (ICLE) |
English |
various |
written |
Argumentative and literary essays |
High-intermediate to advanced |
3 m |
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium |
CD-Rom: order online. |
| The International Teaching Assistants corpus (ITAcorp) |
English |
various |
spoken |
Learner language from a variety of spoken classroom tasks: office hours role plays, presentations, discussions |
|
c. 500,000 |
Pennsylvania State University, USA |
|
| The ISLE speech corpus |
English |
German
Italian |
spoken |
Each speaker recorded sentences from several blocks of differing types (reading simple sentences, using minimal pairs, giving answers to multiple choice questions). |
Intermediate |
|
|
CD-Rom |
| The Israeli Learner Corpus of Written English |
English |
Hebrew |
written |
Argumentative and descriptive essays |
|
c. 750,000 |
Kibbutzim College of Education, Israel |
|
| The Japanese English as a Foreign Language Learner (JEFLL) Corpus |
English |
Japanese |
written |
Student essays |
From beginning to intermediate |
c. 700,000 |
Yukio Tono, Meikai University, Japan
|
The JEFLL Corpus will be freely available for research, first via the web query system (already available in Japanese); the entire dataset will later be distributed under license. |
| The Janus Pannonius University (JPU) Corpus |
English |
Hungarian |
written |
Essays and research papers |
University students |
c. 500,000 |
University of Pécs, Hungary |
Searchable online |
| Lancaster Corpus of Academic Written English (LANCAWE) |
English |
various |
written |
IELTS academic writing tests (descriptive and argumentative tasks); assignments.
Longitudinal data. |
|
|
|
|
| The LeaP Corpus: Learning Prosody in a Foreign Language |
English |
German |
spoken |
Four types of speech styles were recorded:
- nonsense word lists
- readings of a short story
- retellings of the story
- free speech in an interview situation |
various |
|
Albert-Ludwigs-University Freiburg, Germany |
The annotated corpus is available to the scientific community. Please contact at the University of Augsburg. |
| The Learner Corpus of English for Business Communication |
English |
|
|
Different types of business correspondence written for simulated business situations, including memos, faxes, reports, letters of enquiry and complaint letters. |
|
117,500 |
Hong Kong Polytechnic University, Hong Kong |
Searchable online |
| The Learner Corpus of Essays and Reports |
English |
|
|
Essays and project reports covering a range of topics from Science, IT and New Media to Nursing, Business and Economics, and the Social Sciences. |
|
188,000 |
Sima Sengupta
Hong Kong Polytechnic University, Hong Kong
|
Searchable online |
| A Learners' Corpus of Reading Texts |
English |
French |
spoken |
Unprepared reading of English texts.
The texts are short abstracts of fiction or made-up dialogues. |
|
|
Sophie Herment
Valérie Kerfelec
Laetitia Leonarduzzi
Gabor Turcsan |
Freely available |
| The LONGDALE project: LONGitudinal DAtabase of Learner English |
English |
various |
spoken and written |
Range of text types/task types.
Longitudinal data. |
From intermediate to advanced |
|
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium |
under development |
| The Longman Learners' Corpus |
English |
various |
written |
Essays and exam scripts |
various |
10 m |
Longman |
commercial |
| The Louvain International Database of Spoken English Interlanguage (LINDSEI) |
English |
various |
spoken |
Interviews and picture descriptions |
High-intermediate to advanced |
c. 800,000 |
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium |
CD-Rom: order online |
| The Malaysian Corpus of Learner English (MACLE) |
English |
Malay |
written |
|
|
|
Gerry Knowles
Zuraidah Mohd. Don
University of Malaya, Malaysia |
Available in ICLE |
| The Michigan Corpus of Academic Spoken English (MICASE) |
English |
Mainly L1 speakers but also includes data produced by L2 speakers |
spoken |
Transcripts of academic speech events |
|
c. 1.8 m |
Ute Römer
University of Michigan, USA
|
Searchable online |
| The Michigan Corpus of Upper-level Student Papers (MICUSP) |
English |
semi-balanced sample of native and non-native speakers of English |
written |
ESP papers
A-grade papers or ungraded papers that have been assessed and accepted (such as research proposals), but not published |
|
c. 2.6 m |
Ute Römer
University of Michigan, USA
|
Searchable online |
| The Montclair Electronic Language Database (MELD) |
English |
various |
written |
Student essays |
various |
c. 100,000 |
Montclair State University, USA |
Searchable online |
| The Multimedia Adult ESL Learner Corpus (MAELC) |
English |
ESL environment |
multimedia |
Video of classroom interaction and associated written materials |
From beginning to upper-intermediate |
|
Stephen Reder
Kathryn Harris
Kristen Setzler
Portland State University, USA
|
The Lab School would like to share the extensive resources from MAELC with interested researchers and teacher trainers. Those interested should make inquiries to the Lab School. |
| The Neungyule Interlanguage Corpus of Korean Learners of English (NICKLE) |
English |
Korean |
spoken and written |
Written part: student essays
Spoken part: student interviews and oral speech tests transcriptions
|
Mainly from beginning to intermediate |
c. 890,000 (spoken: c. 100,000) |
Yonsei University, Seoul, Korea |
The corpus will be available to the scientific community for research purposes upon request. |
The NICT JLE (Japanese Learner English) Corpus  |
English |
Japanese |
spoken |
English oral proficiency interview test |
various |
2 m |
National Institute of Information and Communications Technology, Kyoto, Japan. |
CD-Rom (Japanese page) |
| The Łódź Polish English Learner Corpus (LPELC) |
English |
Polish |
spoken and written |
Written: Argumentative, descriptive, narrative and quasi-academic essays; formal letters |
From beginning to post-advanced |
under development: The empirical basis for this project is a 3-million word corpus of spoken (200,000 words) and written texts by Polish learners of English. |
University of Lodz, Poland |
SQL Queries Directory |
| The PICLE corpus (Polish component of ICLE) |
English |
Polish |
written |
Student essays |
advanced |
330,000 |
AMU, Poznan, Poland |
Searchable online |
The Qatar learner corpus  |
English |
Arabic (mostly from Qatar) |
spoken |
spoken interviews with Qatari learners of English |
|
|
Yun Zhao
Carnegie Mellon University, USA |
Freely available |
The Québec learner corpus  |
English |
French (from Québec) |
written |
Argumentative essays |
Intermediate and advanced |
c. 250,000 |
Université du Québec à Montréal, Canada |
|
| The Romanian Corpus of Learner English (RoCLE) |
English |
Romanian |
written |
Student essays. |
|
|
Zurich University, Switzerland |
|
| The Santiago University Learner of English Corpus (SULEC) |
English |
Spanish |
spoken and written |
Written: compositions or argumentative essays.
Spoken: semi-structured interviews, short oral presentations and brief story descriptions.
|
|
|
|
|
| The Scientext English Learner Corpus |
English |
French |
written |
Academic argumentative texts |
|
|
|
Searchable online |
| The Seoul National University Korean-speaking English Learner Corpus (SKELC) |
English |
Korean |
written |
Student essays |
various |
c. 900,000 |
Seoul National University
Korea |
|
| The SILS Learner Corpus of English |
English |
various (mainly Japanese) |
written |
Student essays |
Basic, intermediate and advanced |
|
Waseda University, Japan |
|
| The Soochow Colber Student Corpus (SCSC) |
English |
Chinese |
written |
Student essays |
|
227,000 |
Colman Bernath
Soochow University, Taiwan |
|
| The Spoken and Written English Corpus of Chinese Learners (SWECCL) |
English |
Chinese |
spoken (SECCL) and written (WECCL) |
Written: argumentative and narrative essays.
Spoken: National Spoken English Test – longitudinal data
|
|
c. 2 m |
Wen Qiufang
Liang Maocheng
Wang Lifei |
Searchable online |
The Taiwanese Corpus of Learner English (TLCE)  |
English |
Chinese |
written |
Journals and essays (descriptive, narrative, expository, argumentative) |
from intermediate to advanced |
c. 2 m |
Rebecca Hsue-Huch Shih
Sun Yat-sen University, Taiwan |
|
| The Taiwanese learner academic writing corpus (TaiwanLAWC) |
English |
Chinese |
written |
Theses and dissertations written by Taiwanese graduate students. |
|
|
National Taiwan Normal University, Taiwan |
|
|
The TELEC Secondary Learner Corpus (TSLC)
|
English |
Chinese |
written |
|
|
1.5 m |
University of Hong Kong, Hong Kong |
|
| The Telecollaborative Learner Corpus of English and German Telekorp |
English |
German |
written |
Bilingual, longitudinal database comprising computer-mediated NS-NNS interactions between approximately 200 Americans and Germans collected during six different telecollaborative partnerships from 2000-2005. |
|
c. 1.5 m |
Pennsylvania State University, USA. |
Not publicly available |
| The Tswana Learner English Corpus (TLEC) |
English |
Tswana |
written |
Argumentative essays |
Advanced |
c. 200,000 |
North-West University, South Africa |
Available in ICLE |
| The Uppsala Student English Corpus (USE) |
English |
Swedish |
written |
student essays |
various |
1,221,265 |
Uppsala University, Sweden |
The corpus can be used for research and educational purposes. It can be accessed on the Internet from the Oxford Text Archive. |
| The UPF Learner Translation Corpus |
English |
Catalan |
written |
Translations written by the students of the Translation and Interpreting degree at UPF. |
|
under development |
Pompeu Fabra University, Barcelona, Spain |
|
| The UPV Learner Corpus |
English |
Catalan |
written |
essays |
various |
150,000 |
Universitat Politècnica de València, Spain |
|
| The Varieties of English for Specific Purposes dAtabase (VESPA) learner corpus |
English |
various |
written |
ESP texts (term papers, reports, MA dissertations) |
various |
under development |
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium |
under development |
| The WriCLE (Written Corpus of Learner English) corpus |
English |
Spanish |
written |
essays |
various |
c. 750,000 |
Universidad Autónoma de Madrid, Spain |
The corpus is available for free, and can be downloaded from this website. There is also a search interface to retrieve sentences and clauses. |
| The Estonian Interlanguage Corpus (EIC) of Tallinn University |
Estonian |
Russian
Finnish
English
German
Latvian
Lithuanian
Ukrainian
Belorussian |
written |
Spontaneously produced texts in language learning situations: argumentative and literary essays, written stories, letters, term papers, reading reports. |
A1-C2 |
1,145,794 |
Project director:
Tallinn University, Estonia |
Restricted online access |
| The International Corpus of Learner Finnish (ICLFI) |
Finnish |
various |
written |
Finnish learners’ spontaneously produced texts in language learning situations |
|
under development |
University of Oulu, Finland |
|
| The Chy-FLE (Cypriot Learner Corpus of French) |
French |
Modern Greek
(and Cypriot Greek) |
written |
Argumentative and descriptive essays |
From intermediate to advanced |
c. 250,000 (under development) |
Université de Poitiers, France
In collaboration with the University of Cyprus |
|
The COREIL corpus  |
French
English |
|
spoken |
|
|
|
Université Paris-Diderot, France |
|
| The "Dire Autrement" corpus |
French (Second Language) |
Mainly L1 speakers of English |
written |
Narrative, injunctive, persuasive and informative texts |
|
48,114 |
Jasmina Milicevic
Dalhousie University, Canada |
|
| French Interlanguage Database (FRIDA) |
French |
various |
written |
|
|
|
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium |
|
| French Learner Language Oral Corpora (FLLOC) |
French |
various |
spoken |
See description of the 7 corpora |
various |
|
Newcastle University
University of Southampton, UK |
The contents of the database are being made freely available to the research community, in the form of digital sound files and related transcripts formatted using CHILDES software.
Searchable online
|
| The InterFra corpus |
French |
Swedish |
spoken |
Interviews, retellings of video clips and picture stories |
various |
|
Stockholm University, Sweden.
|
The contents of the database are meant to be available to the research community in the form of digital audio files and related transcripts formatted using XML software. |
| The "Interphonologie du Français Contemporain" (IPFC) corpus |
French |
Cypriot Greek
Dutch
English (Canada)
German
Japanese
Norwegian
Spanish
|
spoken |
Reading aloud, repeating words, guided interviews, interactions between two learners. |
various |
under development |
Waseda University, Japan
Université de Rouen, France
Université de Genève, Switzerland
Tokyo University of Foreign Studies, Japan |
under development |
| The LCF corpus (Learner Corpus French) |
French |
Dutch |
written |
Argumentative essays
Informative texts
Journalistic texts
Formal letters
Summaries
Written compositions by Flemish students of French
|
From intermediate to advanced |
490,000 |
K.U.Leuven Campus Kortrijk, UGent and Lessius
|
under development |
| The Lund CEFLE Corpus (Corpus Écrit de Français Langue Étrangère) |
French |
Swedish |
written |
Descriptive and narrative essays; picture-based stories. |
various |
100,000 |
Lund University, Sweden |
A sub-part of the corpus is available online. |
The UWi (University of the West Indies) learner corpus  |
French |
English and Jamaican Creole |
spoken |
Conversations during oral exams and in informal contexts |
various |
|
University of New South Wales, Sydney, Australia |
|
The LIPS corpus (Lexicon of Spoken Italian by Foreigners)  |
Italian |
various |
spoken |
Proficiency exams of the Certification of Italian as a Foreign Language (CILS) |
A1-C2 |
c. 700,000 |
Università per Stranieri di Siena, Italy |
|
| The AleSKO corpus |
German |
Chinese (but also German L1 data from the FALKO corpus) |
written |
Argumentative essays |
|
|
University of Konstanz, Germany
Vilnius Pedagogical University, Lithuania. |
|
| Analyzing Discourse Strategies: A Computer Learner Corpus |
German |
English
(mainly American English) |
written |
Threaded Discussion
Chat
Essays
Longitudinal data |
From beginner to intermediate-mid |
under development |
University of Pennsylvania, USA |
|
| The FALKO corpus (Fehlerannotiertes Lernerkorpus ‘error annotated learner corpus’) |
German |
various |
written |
1. Falko summaries
2. Falko essays
3. Falko Georgetown: letters, fiction writing, journal articles, book reviews (= longitudinal data from American learners) |
Mainly advanced (Falko Georgetown: beginners – advanced)
|
Summaries: 41,072
Essays: 23,579
Georgetown: 126,105
|
Anke Lüdeling
Maik Walter
Humboldt-Universität zu Berlin
Institut für deutsche Sprache und Linguistik, Germany
|
Online access |
| The Heriot-Watt corpus |
1. German
2. German
3. German
|
1. English
2. German
|
1. Written
2. Written
3. Written
|
1. Essays, examination answers.
Longitudinal and cross-sectional data.
2. Essays
3. Teaching input
|
1. Intermediate to Advanced
2. Advanced
|
Under development |
Heriot-Watt University Edinburgh, UK |
Not currently publicly available
|
| The KOLIPSI corpus |
German |
Italian |
written |
Two written language production tasks of a standardized test (email/letter) |
A2-C1 |
under development |
European Academy Bolzano/Bozen, Italy |
|
| The LeaP Corpus (Learning the Prosody of a foreign language) |
German |
various |
spoken |
The LeaP corpus covers four different types of speech:
- read speech
- prepared speech
- free speech
- nonsense word lists |
various |
|
University of Augsburg, Germany |
The annotated corpus is available to the scientific community. Please contact at the University of Augsburg. |
The LeKo (Lernerkorpus) corpus  |
German |
|
|
|
|
|
Humboldt-Universität Berlin, Germany |
Online access (password protected)
|
| The Telecollaborative Learner Corpus of English and German Telekorp |
German |
English |
written |
Bilingual, longitudinal database comprising computer-mediated NS-NNS interactions between approximately 200 Americans and Germans collected during six different telecollaborative partnerships from 2000-2005. |
|
c. 1.5 m |
Pennsylvania State University, USA.
|
Not publicly available |
| Ursula Weinberger’s learner corpus |
German |
English |
written |
|
|
27,635 |
Lancaster University, UK |
Not publicly available |
The Langman corpus  |
Hungarian |
Chinese |
spoken |
Interviews conducted in 1994 with 11 Chinese immigrants living in Hungary.
Interviews focused on issues related to their arrival in Hungary as well as their daily life activities |
|
|
University of Texas at San Antonio, USA |
Freely available |
| Corpus parlato di italiano L2 |
Italian |
English
German
Japanese |
spoken |
Transcriptions of interviews |
various |
|
Stefania Spina
Silvio Pazzaglia
Mirco Perini
Università per Stranieri di Perugia, Italy |
Searchable online |
| The KOLIPSI corpus |
Italian |
German |
written |
Two written language production tasks of a standardized test (email/letter) |
A2-C1 |
under development |
European Academy Bolzano/Bozen, Italy |
|
| Varietà di Apprendimento della Lingua Italiana: Corpus Online (VALICO) |
Italian |
various |
written |
|
various |
567,437 |
|
Freely available and searchable online. |
The Korean learner corpus  |
Korean |
various |
written |
|
Beginner and intermediate |
c. 10,000 |
Georgetown University, USA
Wellesley College, USA
Yonsei University, South Korea |
|
The ASK (Andrespråkskorpus = Second Language Corpus) corpus  |
Norwegian |
various |
written |
essays |
|
|
University of Bergen, Norway |
|
The PIKUST pilot learner corpus  |
Slovene |
various |
written |
mostly argumentative essays |
Majority advanced – but also intermediate and beginner |
35,000 |
Mojca Stritar
University of Ljubljana, Slovenia |
|
| The Anglia Polytechnic University (APU) Learner Spanish Corpus |
Spanish |
various |
written |
|
|
120,000 |
Anglia Ruskin University, UK |
|
| CEDEL2 (Corpus Escrito del Español L2) |
Spanish |
English |
written |
Written compositions by learners of Spanish |
|
600,000 |
Universidad Autónoma de Madrid, Spain |
|
| The Corpus of Taiwanese Learners of Spanish (Corpus de Aprendices Taiwaneses de Español, CATE) |
Spanish |
Chinese |
written |
Student essays |
various |
337,122 (under development) |
|
|
The DIAZ corpus  |
Spanish |
various:
German
Swedish
Icelandic
Korean
Chinese
|
spoken |
Semi-spontaneous (structured interviews) and experimental (structured questionnaires) Adult Spanish L2/L3 oral data |
various |
|
Universitat Pompeu Fabra, Spain |
Freely available |
| The Japanese learner corpus of Spanish |
Spanish |
Japanese |
written |
Student essays |
|
83,400 |
University of Birmingham, UK |
|
| Spanish Learner Language Oral Corpus (SPLLOC) |
Spanish |
English |
spoken |
Learner narratives, interviews and picture description tasks |
from beginner to advanced |
|
Laura Dominguez
University of Southampton, UK |
Searchable online
Data freely available for download |
| The ASU corpus |
Swedish |
|
spoken and written |
Transcribed audio-recorded conversations and written texts from adult learners of Swedish – longitudinal data |
|
490,000 words
(415,000 spoken and 75,000 written) |
Stockholm University, Sweden |
|
The ESF (European Science Foundation Second Language) Database  |
Multilingual:
Dutch
English
French
German
Swedish
|
various:
Punjabi
Italian
Turkish
Arabic
Spanish
Finnish
|
spoken |
Spontaneous second language acquisition of forty adult immigrant workers living in Western Europe, and their communication with native speakers in the respective host countries |
various |
|
Clive Perdue
Max Planck Institut, Nijmegen, Netherlands |
Freely available |
| The Foreign Language Examination Corpus (FLEC) |
Multilingual |
Polish |
written |
Data from the Warsaw University
Certification Exams |
various |
under development |
Warsaw University, Poland |
|
| The MeLLANGE Learner Translator Corpus (LTC) |
Multilingual |
various |
written |
Legal, technical, administrative and journalistic texts |
Trainee translators |
|
Université Paris Diderot, France.
|
Searchable online |
| The MiLC Corpus |
Multilingual:
Catalan
English
French
Spanish
|
Catalan |
written |
Formal and informal letters, summaries, curriculum vitae, essays, reports, translations, synchronous and asynchronous communication exchanges, business letters |
|
|
et al
Universidad Politécnica de Valencia, Spain |
|
The Multilingual Learner Corpus (MLC)  |
Multilingual:
English
German
Italian
Spanish
|
Brazilian Portuguese |
written |
Argumentative and narrative essays |
|
|
University of São Paulo, Brazil |
Accessible online to registered researchers |
The Padova Learner Corpus  |
Multilingual:
English
French
Spanish
|
Italian |
CMC
(Computer-Mediated Communication) |
Student work produced in blended language courses using FirstClass conferencing software.
Variety of genres: diaries, debate contributions, formal reports, résumés etc.
Longitudinal data
|
|
under development |
University of Padua, Italy |
|
|
The PAROLE corpus
(corpus PARallèle Oral en Langue Etrangère) 
|
Multilingual:
English
French
Italian
(Mainly L2 speakers but also includes data produced by L1 speakers)
|
various |
spoken |
5 oral production tasks |
various |
|
Marie-Jo Derive
Nejma Succo
Jean O'Donnell
Sandra Billard
Sandrine Rutigliano-Daspet
Université de Savoie, France |
|