Learner corpora around the world

This list is very much a work in progress. We would like it to be as comprehensive as possible. If you have a learner corpus or know of one that is not listed on this webpage, please send us a message and we'll add it to the list. We hope you will find the list useful for your research!
Medium Text type/task type Proficiency level Size in words Project director Availability
The Arabic Learner Corpus
Arabic 66 languages written and spoken narrative and discussion intermediate and advanced

written: c. 283,000

audio: c. 3h30

Abdullah Alfaifi & Eric Atwell

The Pilot Arabic Learner Corpus Arabic English written narrative intermediate and advanced c. 9,000 Ghazi Abuhakema
Reem Faraj
Anna Feldman

Montclair State University, USA
The Jinan Chinese Learner Corpus
Chinese 50 languages written exams and assignments beginners, intermediate and advanced

c. 6 million Chinese characters

c. 9,000 texts

Maolin Wang
Shervin Malmasi
Mingxuan Huang

Freely available
The AKCES/CZESL corpus
(Acquisition corpora of Czech/Czech as a second language)
Czech various written and spoken student essays and
various 2 m
Charles University in Prague
Technical University in Liberec, Czech Republic
Leerdercorpus Nederlands als Vreemde Taal Dutch French written      
Université catholique de Louvain, Belgium

The Aachen Corpus of Academic Writing

English German written Academic research writing advanced

c. 240,000 words

+ c. 225,000 words (L1 component)

Elma Kerz, RWTH Aachen University Under development
The Advanced Learner English Corpus
English mainly Swedish written Essays written by university students of English linguistics and English literature advanced 1.3 m. Tove Larsson, Uppsala University Not freely available
The ANGLISH corpus English French spoken Readings of texts and sentences, spontaneous oral language. various  
University of Provence, France.
 Freely available
Asao Kojiro’s Learner Corpus Data English Japanese written Essays and stories written or reproduced by Japanese college students.     Texts available for download
The Barcelona English Language Corpus (BELC) English Spanish
spoken and written

4 tasks:
Written composition
Oral narrative
Oral interview

Longitudinal data (children and young adults learning English)

University of Barcelona, Spain
The BATMAT Corpus English Swedish
written BA dissertations
MA dissertations
Advanced c. 2.5 m (expanding), English language and literature, Åbo Akademi University, Finland Under development
The Bilingual Corpus of Chinese English Learners (BICCEL) English Chinese spoken and written

Spoken: National Oral English test.

Written: in-class assignments

  c. 2 m
National Research Center for Foreign Language Education Beijing Foreign Studies University, China
The Br-ICLE corpus (Brazilian component of ICLE) English Brazilian Portuguese written Argumentative and literary essays    
Catholic University of São Paulo

University of São Paulo, Brazil
Restricted online access
The British Academic Written English (BAWE) corpus English Mainly L1 speakers but also includes data produced by L2 speakers written ESP papers

4 levels of study (from undergraduate levels to final year and taught masters level)


c. 6.5 m
Sheena Gardner
Warwick, UK

University of Birmingham, UK
Paul Wickens
Oxford Brookes, UK

The BAWE corpus can be accessed through the corpus analysis interface, Sketch Engine.

A prototype interface that allows filtered searching of the BAWE corpus files is also available.

The BUiD Arab Learner Corpus (BALC) English Arabic written School examination essays various 287,227
The British University in Dubai,
United Arab Emirates

University of Birmingham, UK
At present, copies of the current version of the corpus are available on request.
The Cambridge Learner Corpus (CLC) English various written Exam scripts various c. 25 m - still expanding Cambridge University Press and Cambridge ESOL, UK Commercial
The Corpus of Academic Learner English (CALE) English German written Various academic text types that are typically produced in university courses of English, e.g. term papers, reading reports, research plans, abstract, reviews, and summaries. advanced under development
University of Bremen, Germany
The Corpus of English Essays Written by Asian University Students (CEEAUS) English various written Student essays various c. 200,000
Kobe University, Japan
Freely downloadable from the website
The Chinese Academic Written English (CAWE) corpus English Chinese written Dissertations written by Chinese undergraduates majoring in English linguistics or applied linguistics.   407,960
City University of Hong Kong, Hong Kong
The Chinese Learner English Corpus (CLEC) English Chinese written   various 1 m Gui Shichun
Guangdong University of Foreign Studies & Yang Huizhong, Shanghai Jiaotong University, China
The corpus can only be accessed by users in the Department of English at HKPU.
The City University Corpus of Academic Spoken English (CUCASE) English Chinese (but also includes data produced by L1 speakers) multimedia     2 m
City University of Hong Kong, Hong Kong
The Cologne-Hanover Advanced Learner Corpus (CHALC) English German written term papers and essays advanced c. 210,000
University of Michigan, USA
College Learners’ Spoken English Corpus (COLSEC) English Chinese spoken National spoken English test for non-English majors.   700,000 Yang and Wei  
The Corpus Archive of Learner English in Sabah/Sarawak (CALES) English Malay written Argumentative essays various c. 400,000 Simon Botley@Faizal Hakim
Doreen Dillah
Universiti Teknologi MARA Sarawak, Malaysia
The Corpus of Young Learner Interlanguage (CYLIL) English



spoken English L2 data elicited from European School pupils.
Longitudinal data
various c. 500,000
Vrije Universiteit Brussel, Belgium
The Eastern European English learner corpus English Russian
spoken Spontaneous spoken production data elicited by means of a semi-structured interview various c. 60,000
Eberhard Karls University of Tübingen, Germany
The EFL Teacher Corpus (ETC) English Korean
spoken Teacher talks in language classrooms Upper-intermediate to advanced 123,000
Eun-Joo Lee
Under development
The English of Malaysian School Students corpus (EMAS) English Malay written Student essays various c. 500,000 et al.
Universiti Putra Malaysia, Malaysia
The English Speech Corpus of Chinese Learners (ESCCL) English Chinese spoken Dialogue reading-aloud Middle school and college   Chen Hua
Nantong University, China
Wen Qiufang
Beijing Foreign Studies University, China
Li Aijun
Chinese Academy of Social Sciences, China
The ETS Corpus of Non-Native Written English English 11 languages written 12,100 TOEFL English essays / / Daniel Blanchard Information about the score level is available for each essay
The EVA Corpus of Norwegian School English English Norwegian spoken Picture-based tasks  / 35,000 Angela Hasselgren
University of Bergen, Norway
Searchable online
The Gachon Learner Corpus English Korean
(+ a few Chinese & Spanish speaking students) 
written Written Journal Assignments Lower intermediate 2.5 million Brian Carlstrom Freely available
The GICLE corpus (German component of ICLE) English German written Mainly non-academic argumentative essays advanced c. 234,000    
The Giessen-Long Beach Chaplin Corpus (GLBCC) English German spoken Transcribed interactions between native English speakers, ESL and EFL speakers   350,000 Andreas Jucker
Sara Smith
University of Giessen, Germany
Restricted use: apply for approval to get a copy.
The Hong Kong University of Science & Technology (HKUST) learner corpus English Chinese - mostly Cantonese written Untimed assignments written for EFL courses and school leaving exams University and advanced high school students 25 m
Hong Kong University of Science & Technology, Hong Kong
The Indianapolis Business Learner Corpus (IBLC) English various written Job application letters and résumés of business communication students from the U.S., Belgium, Finland, Germany, and Thailand, spanning the years 1990-1998    

Thomas Albin Upton
Indiana University, USA
The International Corpus of Crosslinguistic Interlanguage (ICCI) English various written Essays (20-min in-class tasks without the use of a dictionary)  beginner to lower-intermediate   Yukio Tono
Tokyo University of Foreign Studies, Japan
Publicly available
The International Corpus Network of Asian Learners of English (ICNALE) English Chinese
written Short argumentative essays (topic, time, length and dictionary use are all controlled) various 300,000 (estimated goal: 1 m)
Kobe University, Japan
Freely available
The International Corpus of Learner English (ICLE) English various written Argumentative and literary essays High-intermediate to advanced 3 m
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
CD-Rom: order online.
The International Teaching Assistants corpus (ITAcorp) English various spoken Learner language from a variety of spoken classroom tasks: office hours role plays, presentations, discussions   c. 500,000

Pennsylvania State University, USA
The ISLE speech corpus English German
spoken Each speaker recorded sentences from several blocks of differing types (reading simple sentences, using minimal pairs, giving answers to multiple choice questions). Intermediate   CD-Rom
The Israeli Learner Corpus of Written English English Hebrew written Argumentative and descriptive essays   c. 750,000
Kibbutzim College of Education, Israel
The Japanese English as a Foreign Language Learner (JEFLL) Corpus English Japanese written Student essays From beginning to intermediate c. 700,000

Yukio Tono, Meikai University, Japan

The JEFLL Corpus will be freely available for research, first via the web query system (already available in Japanese) and then the entire data will be distributed under license in the future.
The Janus Pannonius University (JPU) Corpus English Hungarian written Essays and research papers University students c. 500,000
University of Pécs, Hungary
Searchable online
Lancaster Corpus of Academic Written English (LANCAWE) English various written IELTS academic writing tests (descriptive and argumentative tasks); assignments.
Longitudinal data.
The Lang-8 Learner Corpora English various written texts from Lang-8, a social networking site for language learning / / Toshikazu Tajiri & Mamoru Komachi Available
The LeaP Corpus: Learning Prosody in a Foreign Language English German spoken Four speech styles were recorded:
- nonsense word lists
- readings of a short story
- retellings of the story
- free speech in an interview situation
Albert-Ludwigs-University Freiburg, Germany
The annotated corpus is available to the scientific community on request from the University of Augsburg.
The Learner Corpus of Engineering Abstracts
English Malaysian written Abstracts of the Computer and Communication Systems Engineering Final Year Projects various

c. 550,000

998 abstracts

Helen Tan, University Putra Malaysia

Chan Swee Heng

Ain Nadzimah

Syamsiah bt Mashohor

The Learner Corpus of English for Business Communication English     Different types of business correspondence written for simulated business situations, including memos, faxes, reports, letters of enquiry and complaint letters.   c. 117,500
Hong Kong Polytechnic University, Hong Kong
Searchable online
The Learner Corpus of Essays and Reports English     Essays and project reports covering a range of topics from Science, IT and New Media to Nursing, Business and Economics, and the Social Sciences.   c. 188,000

Sima Sengupta
Hong Kong Polytechnic University, Hong Kong


Searchable online
A Learners' Corpus of Reading Texts English French spoken Unprepared reading of English texts.
The texts are short abstracts of fiction or made-up dialogues.
    Sophie Herment
Valérie Kerfelec
Laetitia Leonarduzzi
Gabor Turcsan
Freely available
The LONGDALE project: LONGitudinal DAtabase of Learner English English various spoken and written Range of text types/task types.
Longitudinal data.
From intermediate to advanced  
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
Under development
The Longman Learners' Corpus English various written Essays and exam scripts various c. 10 m Longman Commercial
The Louvain International Database of Spoken English Interlanguage (LINDSEI) English various spoken Interviews and picture descriptions High-intermediate to advanced c. 800,000
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
CD-Rom: order online
The Malaysian Corpus of Learner English (MACLE) English Malay written       Gerry Knowles
Zuraidah Mohd. Don
University of Malaya, Malaysia
The Malaysian Corpus of Students' Argumentative Writing (MCSAW) English Malay
Chinese Indian
written Argumentative essays

 Form 4
Form 5

c. 565,500

University Putra Malaysia

Available from developers
The Michigan Corpus of Academic Spoken English (MICASE) English Mainly L1 speakers but also includes data produced by L2 speakers spoken Transcripts of academic speech events   c. 1.8 m

Ute Römer
University of Michigan, USA

Searchable online
The Michigan Corpus of Upper-level Student Papers (MICUSP) English semi-balanced sample of native and non-native speakers of English written ESP papers
A-grade papers or ungraded papers that have been assessed and accepted (such as research proposals), but not published
  c. 2.6 m

Ute Römer
University of Michigan, USA

Searchable online
The Montclair Electronic Language Database (MELD) English various written Student essays various c. 100,000

Montclair State University, USA
Searchable online
The Multimedia Adult ESL Learner Corpus (MAELC) English ESL environment multimedia Video of classroom interaction and associated written materials From beginning to upper-intermediate  

Stephen Reder
Kathryn Harris
Kristen Setzler
Portland State University, USA

The Lab School would like to share the extensive resources from MAELC with interested researchers and teacher trainers. Those interested should make inquiries to the Lab School.
The Neungyule Interlanguage Corpus of Korean Learners of English (NICKLE) English Korean spoken and written

Written part: student essays
Spoken part: student interviews and oral speech tests transcriptions

Mainly from beginning to intermediate  c. 890,000 (spoken: c. 100,000)  
Yonsei University, Seoul, Korea
The corpus will be available to the scientific community for research purposes upon request.
The NICT JLE (Japanese Learner English) Corpus English Japanese spoken English oral proficiency interview test various 2 m

National Institute of Information and Communications Technology, Kyoto, Japan.
CD-Rom (Japanese page)
The NOn-native Spanish corpus of English (NOSE)
English Spanish written Argumentative and descriptive student essays Intermediate and upper-intermediate c. 300,000 words  
Universidad de Granada, Spain
The NUS Corpus of Learner English English Several East Asian languages, predominantly Chinese written Student essays on a wide range of topics including environmental pollution, healthcare, etc.   various c. 1 m

National University of Singapore, Singapore.
Freely available
The PELCRA Learner English Corpus (PLEC) English Polish spoken and written Written: Argumentative, descriptive, narrative and quasi-academic essays; formal letters From beginning to post-advanced under development: The empirical basis for this project is a 3-million word corpus of spoken (200,000 words) and written (2.8 million words) texts by Polish learners of English (by 2013)

Piotr Pęzik (piotr.pezik@gmail.com)

University of Lodz, Poland

Online search engine and corpus analysis tools
The PICLE corpus (Polish component of ICLE) English Polish written Student essays Advanced 330,000
AMU, Poznan, Poland
Searchable online
The Qatar learner corpus English Arabic (mostly from Qatar) spoken Spoken interviews with Qatari learners of English     Yun Zhao Helen
Carnegie Mellon University, USA
Freely available
The Québec learner corpus English French (from Québec) written Argumentative essays Intermediate and advanced c. 250,000
Université du Québec à Montréal, Canada
The Romanian Corpus of Learner English (RoCLE) English Romanian written Student essays    
Zurich University, Switzerland
The Russian Learner Translator Corpus
Russian written Translations produced by trainee translators Trainee translators c. 1 million tokens Project directors: Andrey Kutuzov and Maria Kunilovskaya Freely available
The Santiago University Learner of English Corpus (SULEC) English Spanish spoken and written

Written: compositions or argumentative essays.

Spoken: semi-structured interviews, short oral presentations and brief story descriptions.

The Scientext English Learner Corpus English French written Academic argumentative texts     Searchable online
Second Language Research Tasks
English Various



written paragraphs

various oral tasks

Various c. 300,000

Bill Crawford (Northern Arizona University)

Kim McDonough (Concordia University)

Under development
The Seoul National University Korean-speaking English Learner Corpus (SKELC) English Korean written Student essays Various c. 900,000
Seoul National University
The SILS Learner Corpus of English English various (mainly Japanese) written Student essays Basic, intermediate and advanced  
Waseda University, Japan
The Soochow Colber Student Corpus (SCSC) English Chinese written Student essays   227,000 Colman Bernath
Soochow University, Taiwan
The Spoken and Written English Corpus of Chinese Learners (SWECCL) English Chinese spoken (SECCL) and written (WECCL)

Written: argumentative and narrative essays.

Spoken: National Spoken English Test – longitudinal data

  c. 2 m Wen Qiufang
Liang Maocheng
Wang Lifei
Searchable online
The Taiwanese Corpus of Learner English (TLCE) English Chinese written Journals and essays (descriptive, narrative, expository, argumentative) from intermediate to advanced c. 2 m Rebecca Hsue-Hueh Shih
Sun Yat-sen University, Taiwan
The Taiwanese learner academic writing corpus (TaiwanLAWC) English Chinese written Theses and dissertations written by Taiwanese graduate students.
National Taiwan Normal University, Taiwan

The TELEC Secondary Learner Corpus (TSLC)


English Chinese written     1.5 m
University of Hong Kong, Hong Kong
The Telecollaborative Learner Corpus of English and German Telekorp English German written Bilingual, longitudinal database comprising computer-mediated NS-NNS interactions between approximately 200 Americans and Germans collected during six different telecollaborative partnerships from 2000-2005.   c. 1.5 m
Pennsylvania State University, USA.
Not publicly available
The Tswana Learner English Corpus (TLEC) English Tswana written Argumentative essays Advanced c. 200,000
North-West University, South Africa
Available in ICLE
The Uppsala Student English Corpus (USE) English Swedish written student essays various 1,221,265

Uppsala University, Sweden
The corpus can be used for research and educational purposes. It can be accessed on the Internet from the Oxford Text Archive.
The UPF Learner Translation Corpus English Catalan written Translations written by the students of the Translation and Interpreting degree at UPF.    under development
Pompeu Fabra University, Barcelona, Spain 
The UPV Learner Corpus English Catalan written essays various 150,000 Universitat Politècnica de València, Spain  
The Varieties of English for Specific Purposes dAtabase (VESPA) learner corpus English various written ESP texts (term papers, reports, MA dissertations) various under development
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
Under development
The WriCLE (Written Corpus of Learner English) corpus English Spanish written essays various c. 750,000
Universidad Autónoma de Madrid, Spain
The corpus is available for free, and can be downloaded from this website. There is also a search interface to retrieve sentences and clauses.
The Yonsei English Learner Corpus (YELC) English Korean written Yonsei University English Diagnostic Tests (Part 1: Descriptive task, max. 100 words; Part 2: Argumentative task, max. 300 words) 9 levels
(A1, A1+, A2, B1, B1+, B2, B2+, C1, C2)
1,085,879 Seok-Chae Rhee

Yonsei University, Korea
The YELC corpus will be available to the scientific community for research purposes from 31 March 2012.
The Young Learner Corpus of English
English Greek spoken Pedagogic Corpus of video-recorded EFL language classes.  

170 school hours/126  hours of videotaped material

1.5 million types

Project director: Marina Mattheoudakis, Aristotle University of Thessaloniki, Greece

Thomas Zapounidis

The Estonian Interlanguage Corpus (EIC) of Tallinn University Estonian Russian
written Spontaneously produced texts in language learning situations: argumentative and literary essays, written stories, letters, term papers, reading reports. A1-C2 1,145,794 Project director:
Tallinn University, Estonia
Restricted online access
Linguistic Basis of the Common European Framework for L2 English and L2 Finnish
Various written Various Various  

Maisa Martin, University of Jyväskylä, Finland

Paths in Second Language Acquisition
Various written Various Various  

Maisa Martin, University of Jyväskylä, Finland

The Advanced Finnish Learner Corpus
Finnish  Russian
written Exam essays, theses, essays and writings Advanced c. 582,000

Kirsti Siitonen, University of Turku, Finland

Ilmari Ivaska, University of Turku, Finland

The Finnish National Foreign Language Certificate Corpus (YKI) Finnish

Lappish (Sami)



Various Beginner, intermediate and advanced  

Ari Maijanen, Centre for Applied Language Studies, University of Jyväskylä, Finland

Tiina Lammervo, Centre for Applied Language Studies, University of Jyväskylä, Finland

Available with user ID and Password
The International Corpus of Learner Finnish
Finnish various written Finnish learners’ spontaneously produced texts in language learning situations, large variety of text types Beginner, intermediate and advanced under development

University of Oulu, Finland

Free download after applying for a user licence
The Chy-FLE (Cypriot Learner Corpus of French) French Modern Greek
(and Cypriot Greek)
written Argumentative and descriptive essays From intermediate to advanced c. 250,000 (under development)
Université de Poitiers, France
In collaboration with the University of Cyprus
The COREIL corpus French

Université Paris-Diderot, France
The "Dire Autrement" corpus French (Second Language) Mainly L1 speakers of English written Narrative, injunctive, persuasive and informative texts   48,114
Jasmina Milicevic
Dalhousie University, Canada
French Interlanguage Database (FRIDA) French various written      
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
French Learner Language Oral Corpora (FLLOC) French various spoken See description of the 7 corpora various  
Newcastle University

University of Southampton, UK

The contents of the database are being made freely available to the research community, in the form of digital sound files and related transcripts formatted using CHILDES software.

Searchable online

The InterFra corpus French Swedish spoken Interviews, retellings of video clips and picture stories various  

Stockholm University, Sweden.

The contents of the database are meant to be available to the research community in the form of digital audio files and related transcripts formatted using XML software.
The "Interphonologie du Français Contemporain" (IPFC) corpus French Cypriot Greek
English (Canada)

spoken Reading aloud, repeating words, guided interviews, interactions between two learners. various under development
Waseda University, Japan
Université de Rouen, France

Université de Genève, Switzerland

Tokyo University of Foreign Studies, Japan
Under development
The LCF corpus (Learner Corpus French) French Dutch written

Argumentative essays
Informative texts
Journalistic texts
Formal letters

Written compositions by Flemish students of French

From intermediate to advanced 490,000 K.U.Leuven Campus Kortrijk, UGent and Lessius
Under development
The Lund CEFLE Corpus (Corpus Écrit de Français Langue Étrangère) French Swedish written Descriptive and narrative essays; picture-based stories. various 100,000
Lund University, Sweden
A sub-part of the corpus is available online.
The UWi (University of the West Indies) learner corpus French English and Jamaican Creole spoken Conversations during oral exams and in informal contexts various  
University of New South Wales, Sydney, Australia
Comasan Labhairt ann an Gàidhlig (CLAG)
Gaelic Adult Proficiency
Gaelic various spoken

Conversation task


Elicited oral imitation task

Question and answer activity


Roibeard Ó Maolalaigh (University of Glasgow)

Nicola Carty (University of Glasgow)

The AleSKO corpus German Chinese (but also German L1 data from the FALKO corpus) written Argumentative essays    
University of Konstanz, Germany

Vilnius Pedagogical University, Lithuania.
Analyzing Discourse Strategies: A Computer Learner Corpus German English
(mainly American English)
written Threaded Discussion
Longitudinal data
From beginner to intermediate-mid under development

University of Pennsylvania, USA
The Corpus of Learner German (CLEG13) German English written Argumentative, free compositions
Longitudinal over 4 years, undergraduate students
Intermediate to advanced c. 320,000

Online access through the FALKO platform.
The corpus is also available as txt files to the scientific community on request.

The deL1L2IM corpus German

Russian-Belorussian bilinguals

written Instant messaging dialogues Advanced c. 52,000

Sviatlana Höhn
University of Luxembourg

The FALKO corpus (Fehlerannotiertes Lernerkorpus ‘error annotated learner corpus’) German

Learner subcorpus: various

Native subcorpus: German


1. Summaries

2. Essays

3. Letters, fiction writing, journal articles, book reviews (= longitudinal data from American learners)

1. Advanced

2. Advanced

3. Beginners - advanced


1. 40,638 (learner subcorpus) + 21,211 (native subcorpus)

2. 144,619 (learner subcorpus) + 70,615 (native subcorpus)

3. 78,151 (learner subcorpus)

Anke Lüdeling
Maik Walter
Humboldt-Universität zu Berlin
Institut für deutsche Sprache und Linguistik, Germany

Online access
The KOLIPSI corpus German Italian written Two written language production tasks of a standardized test (email/letter) A2-C1 under development

European Academy Bolzano/Bozen, Italy
Eurac research on learner corpora
The LeaP Corpus (Learning the Prosody of a foreign language) German various spoken The LeaP corpus covers four different types of speech:
- read speech
- prepared speech
- free speech
- nonsense word lists
University of Augsburg, Germany
The annotated corpus is available to the scientific community on request from the University of Augsburg.
The LeKo (Lernerkorpus) corpus German           , Humboldt-Universität Berlin, Germany

Online access (password protected)


The LINCS Corpus

1. German

2. German

3. German

1. English

2. German

1. Written

2. Written

3. Written

1. Essays, examination, answers.
Longitudinal and cross-sectional data.

2. Essays

3. Teaching output

1. Intermediate to Advanced

2. Advanced

Under development
Heriot-Watt University Edinburgh, UK
Not currently publicly available
(Multilingual Platform for the European Reference Levels: Exploring Interlanguage in Context)




various written writing tasks from standardized tests (telc/UJOP) A1 to C1 (CEFR) c. 280,000 Katrin Wisniewski
The Telecollaborative Learner Corpus of English and German Telekorp German English written Bilingual, longitudinal database comprising computer-mediated NS-NNS interactions between approximately 200 Americans and Germans collected during six different telecollaborative partnerships from 2000-2005.   c. 1.5 m

Pennsylvania State University, USA.


Not publicly available
The Langman corpus Hungarian Chinese spoken Interviews conducted in 1994 with 11 Chinese immigrants living in Hungary.
Interviews focused on issues related to their arrival in Hungary as well as their daily life activities
University of Texas at San Antonio, USA
Freely available
Corpus parlato di italiano L2 Italian English
spoken Transcriptions of interviews various   Stefania Spina
Silvio Pazzaglia
Mirco Perini
Università per Stranieri di Perugia, Italy
Searchable online
The KOLIPSI corpus Italian German written Two written language production tasks of a standardized test (email/letter) A2-C1 under development
European Academy Bolzano/Bozen, Italy
The LIPS Corpus (Lexicon of Spoken Italian by Foreigners) Italian various spoken Proficiency exams of the Certification of Italian as a Foreign Language (CILS) A1-C2 c. 700,000

Università per Stranieri di Siena, Italy

Varietà di Apprendimento della Lingua Italiana: Corpus Online (VALICO) Italian various written   various 567,437
Freely available and searchable online.
The Korean learner corpus Korean various written   Beginner and intermediate c. 10,000
Georgetown University, USA

Wellesley College, USA

Yonsei University, South Korea
The ASK (Andrespråkskorpus = Second Language Corpus) corpus Norwegian various written essays    
University of Bergen, Norway
The PIKUST pilot learner corpus Slovene various written mostly argumentative essays Majority advanced – but also intermediate and beginner 35,000 Mojca Stritar
University of Ljubljana, Slovenia
The Anglia Polytechnic University (APU) Learner Spanish Corpus Spanish various written     120,000
Anglia Ruskin University, UK
Aprescrilov ("Aprender a Escribir en Lovaina") Spanish Dutch written Written assignments and tests; several text types (letters, expository, descriptive, argumentative, narrative) from A1 to C1 c. 1 m

KU Leuven, Belgium

Restricted online access


(Corpus de aprendices de español)

Spanish various written   from A1 to C1

c. 575,000

CAES team

Universidade de Santiago de Compostela

Online access
CEDEL2 (Corpus Escrito del Español L2) Spanish English written Written compositions by learners of Spanish   600,000

Universidad Autónoma de Madrid, Spain

Universidad de Granada, Spain

A free sample of the corpus is available on request.


(Corpus de textos escritos para el análisis de errores de aprendices de E/LE)

Spanish various written essays from A2 to C1 /

Cestero Mancera, A. M.
Penadés Martínez, I.

Universidad de Alcalá de Henares

CD-ROM available
The Corpus of Taiwanese Learners of Spanish (Corpus de Aprendices Taiwaneses de Español, CATE) Spanish Chinese written Student essays various 337,122 (under development)  
The DIAZ corpus Spanish



spoken Semi-spontaneous (structured interviews) and experimental (structured questionnaires) Adult Spanish L2/L3 oral data various  
Universitat Pompeu Fabra, Spain
Freely available
The Japanese learner corpus of Spanish Spanish Japanese written Student essays   83,400
University of Birmingham, UK
The Spanish Corpus Proficiency Level Training
Spanish English (heritage language learners) spoken Dialogues about a given set of questions beginner to advanced   Dr Dale Koike, University of Texas, Austin Liberal Arts Instructional Technology Center

Videos are available

Spanish Learner Language Oral Corpus (SPLLOC) Spanish English spoken Learner narratives, interviews and picture description tasks beginner to advanced   Laura Dominguez
University of Southampton, UK
Searchable online
Data freely available for download
Spanish Learner Oral Corpus Spanish various
(9+ languages - especially Portuguese, French, Italian)
spoken Semi-spontaneous interviews, narrative and descriptive tasks A2-B1 more than 50,000 words
Laboratorio de Lingüística Informática
Universidad Autónoma de Madrid, Spain
Online access
The ASU corpus Swedish   spoken and written Transcribed audio-recorded conversations and written texts from adult learners of Swedish – longitudinal data   490,000 words
(415,000 spoken and 75,000 written)

Stockholm University, Sweden
The ESF (European Science Foundation Second Language) Database





spoken Spontaneous second language acquisition of forty adult immigrant workers living in Western Europe, and their communication with native speakers in the respective host countries various  
Clive Perdue
Max Planck Institut, Nijmegen, Netherlands
Freely available
The Foreign Language Examination Corpus (FLEC) Multilingual Polish written Data from the Warsaw University
Certification Exams
various under development

Warsaw University, Poland
The MeLLANGE Learner Translator Corpus (LTC) Multilingual various written Legal, technical, administrative and journalistic texts Trainee translators  

Université Paris Diderot, France.

Searchable online
The MiLC Corpus



Catalan written Formal and informal letters, summaries, curriculum vitae, essays, reports, translations, synchronous and asynchronous communication exchanges, business letters     et al
Universidad Politécnica de Valencia, Spain
The Multilingual Learner Corpus (MLC)



Brazilian Portuguese written Argumentative and narrative essays
University of São Paulo, Brazil
Accessible online to registered researchers
The Padova Learner Corpus



Italian CMC
(Computer-Mediated Communication)

Student work produced in blended language courses using FirstClass conferencing software.
Variety of genres: diaries, debate contributions, formal reports, résumés etc. 
Longitudinal data


  under development

University of Padua, Italy

The PAROLE corpus
(corpus PARallèle Oral en Langue Etrangère)




(Mainly L2 speakers but also includes data produced by L1 speakers)

various spoken 5 oral production tasks various  

Marie-Jo Derive
Nejma Succo
Jean O'Donnell
Sandra Billard
Sandrine Rutigliano-Daspet
Université de Savoie, France
The University of Toronto Romance Phonetics Database (RPD)



(including English, Mandarin, Russian, Spanish, etc.)
spoken Elicited production - sentence and passage reading, story narration, description of favourite meal various  

University of Toronto, Canada
Password available from directors


| 9/11/2015 |