Learner corpora around the world

This list is very much a work in progress. We would like it to be as comprehensive as possible. If you have a learner corpus or know of one that is not listed on this webpage, please send a message to Amandine Dumont or Sylviane Granger and we will add it to the list. We hope you will find the list useful for your research!

The list only contains learner corpora, i.e. electronic collections of continuous written or spoken data produced by foreign or second language learners.
For a list of learner corpus-based datasets (treebanks, error lists, etc.), click here.


Learner corpora


Corpus Target language First language Medium Text type / task type Proficiency level Size in words Project director Availability
The Arabic Learner Corpus
Arabic 66 languages written and spoken Narrative and discussion Intermediate and advanced

c. 283,000

c. 3h30

Abdullah Alfaifi & Eric Atwell

The Pilot Arabic Learner Corpus Arabic English written Narrative Intermediate and advanced c. 9,000 Ghazi Abuhakema
Reem Faraj
Anna Feldman

Montclair State University, USA
The Jinan Chinese Learner Corpus
Chinese 50 languages written Exams and assignments Beginners, intermediate and advanced

c. 6 m. Chinese characters

c. 9,000 texts

Maolin Wang
Shervin Malmasi
Mingxuan Huang

The AKCES/CZESL corpus
(Acquisition corpora of Czech/Czech as a second language)
Czech Various written and spoken Student essays and
Various 2 m.
Charles University in Prague
Technical University in Liberec, Czech Republic
Leerdercorpus Nederlands als Vreemde Taal Dutch French written      
Université catholique de Louvain, Belgium

The Aachen Corpus of Academic Writing

English German written Academic research writing Advanced

c. 240,000 words

c. 225,000 words (L1 component)

Elma Kerz, RWTH Aachen University Under development
The Advanced Learner English Corpus
English Mainly Swedish written Essays written by university students of English linguistics and English literature Advanced c. 1,3 m. Tove Larsson, Uppsala University Not freely available
The ANGLISH corpus English French spoken Readings of texts and sentences, spontaneous oral language. Various c. 5h30
University of Provence, France.
 Freely available
Asao Kojiro’s Learner Corpus Data English Japanese written Essays and stories written or reproduced by Japanese college students.     Texts available for download
The Barcelona English Language Corpus
English Spanish
spoken and written

4 tasks:
Written composition
Oral narrative
Oral interview

Longitudinal data (children and young adults learning English)

University of Barcelona, Spain
The BATMAT Corpus English Swedish
written BA dissertations
MA dissertations
Advanced c. 2,5 m. (expanding) English language and literature, Åbo Akademi University, Finland Under development
The Bilingual Corpus of Chinese English Learners
English Chinese spoken and written

Spoken: National Oral English test.

Written: in-class assignments

  c. 2 m.
National Research Center for Foreign Language Education, Beijing Foreign Studies University, China
The Br-ICLE corpus (Brazilian component of ICLE) English Brazilian Portuguese written Argumentative and literary essays    c. 200,000
Catholic University of São Paulo

University of São Paulo, Brazil
Restricted online access
The British Academic Written English (BAWE) corpus English

Mainly L1 speakers

Also includes data produced by L2 speakers

written ESP papers

4 levels of study (from undergraduate levels to final year and taught masters level)


c. 6,5 m.
Sheena Gardner
Warwick, UK

University of Birmingham, UK
Paul Wickens
Oxford Brookes, UK

The BAWE corpus can be accessed through the corpus analysis interface, Sketch Engine.

A prototype interface that allows filtered searching of the BAWE corpus files is also available.

The BUiD Arab Learner Corpus (BALC) English Arabic written School examination essays Various c. 290,000
The British University in Dubai,
United Arab Emirates

University of Birmingham, UK
At present, copies of the current version of the corpus are available on request.
The Cambridge Learner Corpus (CLC) English Various written Exam scripts Various c. 50 m. Cambridge University Press and Cambridge ESOL, UK Commercial
The Corpus of Academic Learner English
English German written Various academic text types that are typically produced in university courses of English, e.g. term papers, reading reports, research plans, abstracts, reviews, and summaries. Advanced under development
University of Bremen, Germany
The Corpus of English Essays Written by Asian University Students (CEEAUS) English Various written Student essays Various c. 200,000
Kobe University, Japan
Freely downloadable from the website
The Chinese Academic Written English corpus
English Chinese written Dissertations written by Chinese undergraduates majoring in English linguistics or applied linguistics.   c. 400,000
City University of Hong Kong, Hong Kong
The Chinese Learner English Corpus
English Chinese written   Various c. 1 m. Gui Shichun
Guangdong University of Foreign Studies & Yang Huizhong, Shanghai Jiaotong, China
The corpus can only be accessed by users in the Department of English at HKPU.
The City University Corpus of Academic Spoken English (CUCASE) English


Also includes data produced by L1 speakers

multimedia     c. 2 m.
City University of Hong Kong, Hong Kong
The Cologne-Hanover Advanced Learner Corpus (CHALC) English German written term papers and essays Advanced c. 210,000
University of Michigan, USA
The College Learners’ Spoken English Corpus
English Chinese spoken National spoken English test for non-English majors.   c. 700,000 Yang and Wei  
The Corpus Archive of Learner English in Sabah/Sarawak (CALES) English Malay written Argumentative essays Various c. 400,000 Simon Botley@Faizal Hakim
Doreen Dillah
Universiti Teknologi MARA Sarawak, Malaysia
The Corpus of Business Letters English Italian written

Tagged part: BEC1 writing tests (letters, emails, faxes, memos, reports)

Untagged part: business writing exam tests

  c. 32,000 Anna Romagnuolo  
The Corpus of Young Learner Interlanguage (CYLIL) English


spoken English L2 data elicited from European School pupils.
Longitudinal data
Various c. 500,000
Vrije Universiteit Brussel, Belgium
The Eastern European English learner corpus English Russian
spoken Spontaneous spoken production data elicited by means of a semi-structured interview Various c. 60,000
Eberhard Karls University of Tübingen, Germany
The EFL Teacher Corpus
English Korean
spoken Teacher talks in language classrooms Upper-intermediate to advanced c. 123,000
Eun-Joo Lee
Under development
The English of Malaysian School Students corpus (EMAS) English Malay written Student essays + oral interviews various c. 500,000 et al.
Universiti Putra Malaysia, Malaysia
The English Speech Corpus of Chinese Learners
English Chinese spoken Dialogue reading-aloud Middle school and college   Chen Hua
Nantong University, China
Wen Qiufang
Beijing Foreign Studies University, China
Li Aijun
Chinese Academy of Social Sciences, China
The ETS Corpus of Non-Native Written English English 11 languages written 12,100 TOEFL English essays /   Daniel Blanchard

Information about the score level is available for each essay

Samples are available

The Europarl corpus of Native, Non-native and Translated Texts
English 24 EU languages written Proceedings of the European Parliament Advanced

NNS: c. 780,000

NS: c. 3 m.

Translated: c. 22 m.

Sergiu Nisioi Available
The EVA Corpus of Norwegian School English English Norwegian spoken Picture-based tasks  / c. 35,000 Angela Hasselgren
University of Bergen, Norway
The Gachon Learner Corpus English Korean
(+ a few Chinese & Spanish speaking students) 
written Written Journal Assignments Lower intermediate c. 2,5 m. Brian Carlstrom Freely available
The GICLE corpus (German component of ICLE) English German written Mainly non-academic argumentative essays Advanced c. 234,000    
The Giessen-Long Beach Chaplin Corpus
English German spoken Transcribed interactions between native English speakers, ESL and EFL speakers Various c. 350,000 Andreas Jucker
Sara Smith
University of Giessen, Germany
Restricted use: apply for approval to get a copy.
The Hong Kong University of Science & Technology learner corpus
English Chinese - mostly Cantonese written Untimed assignments written for EFL courses and school leaving exams University and advanced high school students c. 25 m.
Hong Kong University of Science & Technology, Hong Kong
The Indianapolis Business Learner Corpus
English Various written Job application letters and résumés of business communication students from the U.S., Belgium, Finland, Germany, and Thailand, spanning the years 1990-1998    

Thomas Albin Upton
Indiana University, USA
The International Corpus of Crosslinguistic Interlanguage (ICCI) English Various written Essays (20-min in-class tasks without the use of a dictionary)  Beginner to lower-intermediate 9,000 essays Yukio Tono
Tokyo University of Foreign Studies, Japan
Freely available
The International Corpus Network of Asian Learners of English
English Chinese
written and spoken

Controlled speeches and essays

L1 productions by 350 NS

Various c. 1,8 m.
Kobe University, Japan
Freely available
The International Corpus of Learner English
English Various written Argumentative and literary essays High-intermediate to advanced c. 3 m.
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
CD-Rom + handbook: order online.
The International Teaching Assistants corpus
English Various spoken Learner language from a variety of spoken classroom tasks: office hours role plays, presentations, discussions   c. 500,000

Pennsylvania State University, USA
The ISLE speech corpus English German
spoken Recorded sentences from several blocks of differing types (reading simple sentences, using minimal pairs, giving answers to multiple choice questions) Intermediate  c. 18h CD-Rom
The Israeli Learner Corpus of Written English English Hebrew written Argumentative and descriptive essays   c. 750,000
Kibbutzim College of Education, Israel
The Japanese English as a Foreign Language Learner Corpus
English Japanese written Student essays From beginning to intermediate c. 700,000

Yukio Tono, Meikai University, Japan

The JEFLL Corpus will be freely available for research, first via the web query system (already available in Japanese); the entire data set will later be distributed under license.
The Janus Pannonius University Corpus
English Hungarian written Essays and research papers University students c. 500,000
University of Pécs, Hungary
Searchable online
Lancaster Corpus of Academic Written English
English various written IELTS academic writing tests (descriptive and argumentative tasks); assignments.
Longitudinal data.
The Lang-8 Learner Corpora English Various written texts from Lang-8, a social networking site for language learning / / Toshikazu Tajiri & Mamoru Komachi Available
The LeaP Corpus: Learning Prosody in a Foreign Language English German spoken Four speech styles were recorded:
  • nonsense word lists
  • readings of a short story
  • retellings of the story
  • free speech in an interview situation
Various  c. 12h
Albert-Ludwigs-University Freiburg, Germany

The annotated corpus is available to the scientific community. Please contact at the University of Augsburg.

LeaP manual

The Learner Corpus of Engineering Abstracts
English Malaysian written Abstracts of the Computer and Communication Systems Engineering Final Year Projects Various

c. 550,000

998 abstracts

Helen Tan, University Putra Malaysia

Chan Swee Heng

Ain Nadzimah

Syamsiah bt Mashohor

The Learner Corpus of English for Business Communication English Chinese written Different types of business correspondence written for simulated business situations, including memos, faxes, reports, letters of enquiry and complaint letters   c. 117,500
Hong Kong Polytechnic University, Hong Kong
Searchable online
The Learner Corpus of Essays and Reports English  Chinese written Essays and project reports covering a range of topics from Science, IT and New Media to Nursing, Business and Economics, and the Social Sciences   c. 188,000

Sima Sengupta
Hong Kong Polytechnic University, Hong Kong


Searchable online
A Learners' Corpus of Reading Texts English French spoken Unprepared reading of English texts.
The texts are short abstracts of fiction or made-up dialogues.
 University students   Sophie Herment
Valérie Kerfelec
Laetitia Leonarduzzi
Gabor Turcsan
Freely available
The LONGDALE project: LONGitudinal DAtabase of Learner English English Various spoken and written Range of text types/task types.
Longitudinal data.
From intermediate to advanced  
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
Under development
The Longman Learners' Corpus English Various written Essays and exam scripts Various c. 10 m. Longman Commercial
The Louvain International Database of Spoken English Interlanguage (LINDSEI) English Various spoken Interviews and picture descriptions High-intermediate to advanced c. 800,000
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
CD-Rom and handbook: order online
The Malaysian Corpus of Learner English
English Malay written       Gerry Knowles
Zuraidah Mohd. Don
University of Malaya, Malaysia
The Malaysian Corpus of Students' Argumentative Writing
English Malay
Chinese Indian
written Argumentative essays

Form 4
Form 5

c. 565,500

University Putra Malaysia

Available from developers
The Michigan Corpus of Academic Spoken English (MICASE) English Mainly L1 speakers but also includes data produced by L2 speakers spoken Transcripts of academic speech events   c. 1,8 m.

Ute Römer
University of Michigan, USA

Searchable online
The Michigan Corpus of Upper-level Student Papers (MICUSP) English Semi-balanced sample of native and non-native speakers of English written ESP papers
A-grade papers or ungraded papers that have been assessed and accepted (such as research proposals), but not published
  c. 2,6 m.

Ute Römer
University of Michigan, USA

Searchable online
The Montclair Electronic Language Database
English Various written Student essays Various c. 100,000

Montclair State University, USA

Searchable online

Includes error annotations

The Multimedia Adult ESL Learner Corpus
English ESL environment multimedia Video of classroom interaction and associated written materials Beginner to upper-intermediate  

Stephen Reder
Kathryn Harris
Kristen Setzler
Portland State University, USA

The Lab School would like to share the extensive resources from MAELC with interested researchers and teacher trainers. Those interested should make inquiries to the Lab School.
The Neungyule Interlanguage Corpus of Korean Learners of English (NICKLE) English Korean spoken and written

Written part: student essays
Spoken part: student interviews and oral speech tests transcriptions

Mainly from beginning to intermediate 

c. 890,000

c. 100,000

Yonsei University, Seoul, Korea
The corpus will be available to the scientific community for research purposes upon request.
The Japanese Learner English Corpus
English Japanese spoken English oral proficiency interview test various 2 m.

National Institute of Information and Communications Technology, Kyoto, Japan.
Freely available (downloadable)
The NOn-native Spanish corpus of English
English Spanish written Argumentative and descriptive student essays Intermediate and upper-intermediate c. 300,000 words  
Universidad de Granada, Spain
The NUS Corpus of Learner English English Several East Asian languages, predominantly Chinese written Student essays on a wide range of topics including environmental pollution, healthcare, etc.   various c. 1 m.

National University of Singapore, Singapore.
Freely available
The PELCRA Learner English Corpus
English Polish spoken and written Written: Argumentative, descriptive, narrative and quasi-academic essays; formal letters From beginning to post-advanced

Under development

Aim spoken:
c. 200,000

Aim written:
c. 2,8 m.

Piotr Pęzik
Barbara Lewandowska-Tomaszczyk
University of Lodz, Poland

Online search engine and corpus analysis tools
The PICLE corpus (Polish component of ICLE) English Polish written Student essays Advanced c. 330,000
AMU, Poznan, Poland
Searchable online
The Qatar learner corpus English Arabic (mostly from Qatar) spoken Spoken interviews with Qatari learners of English     Yun Zhao Helen
Carnegie Mellon University, USA
Freely available
The Québec learner corpus English French (from Québec) written Argumentative essays Intermediate and advanced c. 250,000
Université du Québec à Montréal, Canada
The Romanian Corpus of Learner English
English Romanian written Student essays    
Zurich University, Switzerland
The Russian Learner Translator Corpus
Russian written Translations produced by trainee translators Trainee translators c. 1.5 m. tokens Project directors: Andrey Kutuzov and Maria Kunilovskaya Freely available
The Santiago University Learner of English Corpus (SULEC) English Spanish spoken and written

Written: compositions or argumentative essays.

Spoken: semi-structured interviews, short oral presentations and brief story descriptions.

Various Aim: c. 1 m. words Ignacio M. Palacios Martínez, Santiago University Available after registration
The Scientext English Learner Corpus English French written Academic argumentative texts    c. 1.1 m. Searchable online
Second Language Research Tasks
English Various



written paragraphs

various oral tasks

Various c. 300,000

Bill Crawford (Northern Arizona University)

Kim McDonough (Concordia University)

Under development
The Seoul National University Korean-speaking English Learner Corpus (SKELC) English Korean written Student essays Various c. 900,000
Seoul National University
The SILS Learner Corpus of English English Various (mainly Japanese) written Student essays Basic, intermediate and advanced

 c. 3.2 m.

(first and second drafts included)

Waseda University, Japan
The Soochow Colber Student Corpus (SCSC) English Chinese written Student essays   c. 227,000 Colman Bernath
Soochow University, Taiwan
The Spoken and Written English Corpus of Chinese Learners
English Chinese spoken (SECCL)
and written (WECCL)

Written: argumentative and narrative essays.

Spoken: National Spoken English Test – longitudinal data

  c. 2 m. Wen Qiufang
Liang Maocheng
Wang Lifei


The Taiwanese Corpus of Learner English
English Chinese written Journals and essays (descriptive, narrative, expository, argumentative) Intermediate to advanced c. 2 m. Rebecca Hsue-Hueh Shih
Sun Yat-sen University, Taiwan
The Taiwanese learner academic writing corpus (TaiwanLAWC) English Chinese written Theses and dissertations written by Taiwanese graduate students.
National Taiwan Normal University, Taiwan

The TELEC Secondary Learner Corpus

English Chinese written and spoken Compositions from secondary classroom   c. 2 m.
University of Hong Kong, Hong Kong
The Telecollaborative Learner Corpus of English and German Telekorp English German written Bilingual, longitudinal database comprising computer-mediated NS-NNS interactions between approximately 200 Americans and Germans collected during six different telecollaborative partnerships from 2000-2005.   c. 1,5 m.
Pennsylvania State University, USA.
Not publicly available
The Ten-Thousand English Compositions of Chinese Learners
English Chinese written Essays (various topics) written in and after class, and in testing context. Also contains some collaborative writing samples. Various (mainly undergraduates) c. 1,8 m. Project initiator: Jiajin Xu, National Research Centre for Foreign Language Education, Beijing Foreign Studies University Raw texts and part-of-speech tagged texts are available
The Tswana Learner English Corpus (TLEC) English Tswana written Argumentative essays Advanced c. 200,000
North-West University, South Africa
Available in ICLE
The Uppsala Student English Corpus
English Swedish written Student essays Various c. 1,200,000

Uppsala University, Sweden
The corpus can be used for research and educational purposes. It can be accessed on the Internet from the Oxford Text Archive.
The UPF Learner Translation Corpus English Catalan written Translations written by the students of the Translation and Interpreting degree at UPF.    c. 200,000
Pompeu Fabra University, Barcelona, Spain 
The UPV Learner Corpus English Catalan written essays Various c. 150,000 Universitat Politècnica de València, Spain  
The Varieties of English for Specific Purposes dAtabase learner corpus
English Various written ESP texts (term papers, reports, MA dissertations) Various c. 220,000 (under development)
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
The Written Corpus of Learner English
English Spanish written Essays Various c. 750,000
Universidad Autónoma de Madrid, Spain
The corpus is available for free, and can be downloaded from this website. There is also a search interface to retrieve sentences and clauses.
The Yonsei English Learner Corpus (YELC) English Korean written Yonsei University English Diagnostic Tests (Part 1: Descriptive task, max. 100 words; Part 2: Argumentative task, max. 300 words) 9 levels
(A1, A1+, A2, B1, B1+, B2, B2+, C1, C2)
c. 1 m. Seok-Chae Rhee

Yonsei University, Korea
The YELC corpus will be available to the scientific community for research purposes from 31 March 2012.
The Young Learner Corpus of English
English Greek spoken Pedagogic Corpus of video-recorded EFL language classes.  

170 school hours (126  hours of videotaped material)

1,5 m. types

Project director: Marina Mattheoudakis, Aristotle University of Thessaloniki, Greece

Thomas Zapounidis

The Estonian Interlanguage Corpus of Tallinn University
Estonian Russian
written Spontaneously produced texts in language learning situations: argumentative and literary essays, written stories, letters, term papers, reading reports. A1-C2 c. 1 m. Project director:
Tallinn University, Estonia
Restricted online access
Linguistic Basis of the Common European Framework for L2 English and L2 Finnish
Various written Various Various  

Maisa Martin, University of Jyväskylä, Finland

Paths in Second Language Acquisition
Various written Various Various  

Maisa Martin, University of Jyväskylä, Finland

The Advanced Finnish Learner Corpus
Finnish  Russian
written Exam essays, theses, essays and writings Advanced c. 630,000

Kirsti Siitonen, University of Turku, Finland

Ilmari Ivaska, University of Turku, Finland

The Finnish National Foreign Language Certificate Corpus (YKI) Finnish

Lappish (Sami)



Various Beginner, intermediate and advanced  

Ari Maijanen, Centre for Applied Language Studies, University of Jyväskylä, Finland

Tiina Lammervo, Centre for Applied Language Studies, University of Jyväskylä, Finland

Available with user ID and Password
The International Corpus of Learner Finnish
Finnish Various written Finnish learners’ spontaneously produced texts in language learning situations, large variety of text types Beginner, intermediate and advanced Under development

University of Oulu, Finland

Free download after applying for a user licence
The Chy-FLE (Cypriot Learner Corpus of French) French Modern Greek
(and Cypriot Greek)
written Argumentative and descriptive essays From intermediate to advanced c. 250,000 (under development)
Université de Poitiers, France
In collaboration with the University of Cyprus
The COREIL corpus French

Université Paris-Diderot, France
The "Dire Autrement" corpus French (Second Language) Mainly L1 speakers of English written Narrative, injunctive, persuasive and informative texts   c. 50,000
Jasmina Milicevic
Dalhousie University, Canada
Available after registration
French Interlanguage Database
French Various written Free compositions: descriptive, argumentative and narrative texts, news & mail  Intermediate  
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
French Learner Language Oral Corpora
French Various spoken See description of the 7 corpora Various  
Newcastle University

University of Southampton, UK

The contents of the database are being made freely available to the research community, in the form of digital sound files and related transcripts formatted using CHILDES software.

Searchable online

The InterFra corpus French Swedish spoken Interviews, retellings of video clips and picture stories Various  

Stockholm University, Sweden.

The "Interphonologie du Français Contemporain" corpus
French Cypriot Greek
English (Canada)

spoken Reading aloud, repeating words, guided interviews, interactions between two learners. Various Under development
Waseda University, Japan
Université de Rouen, France

Université de Genève, Switzerland

Tokyo University of Foreign Studies, Japan
Under development; samples available
The Learner Corpus French
French Dutch written

Argumentative essays
Informative texts
Journalistic texts
Formal letters

Written compositions by Flemish students of French

Intermediate to advanced c. 500,000 K.U.Leuven Campus Kortrijk, UGent and Lessius
Under development
The Lund CEFLE Corpus (Corpus Écrit de Français Langue Étrangère) French Swedish written Descriptive and narrative essays; picture-based stories. Various c. 100,000
Lund University, Sweden
A sub-part of the corpus is available online.
The University of the West Indies learner corpus


Jamaican Creole

spoken Conversations during oral exams and in informal contexts Various  
University of New South Wales, Sydney, Australia
Comasan Labhairt ann an Gàidhlig (CLAG)
Gaelic Adult Proficiency
Gaelic Various spoken

Conversation task


Elicited oral imitation task

Question and answer activity


Roibeard Ó Maolalaigh (University of Glasgow)

Nicola Carty (University of Glasgow)

The AleSKO corpus German


Also German L1 data from the FALKO corpus

written Argumentative essays    c. 13,600
University of Konstanz, Germany

Vilnius Pedagogical University, Lithuania.
Analyzing Discourse Strategies: A Computer Learner Corpus German English
(mainly American English)
written Threaded Discussion
Longitudinal data
From beginner to intermediate-mid Under development

University of Pennsylvania, USA
The Corpus of Learner German (CLEG13) German English written Argumentative, free compositions
Longitudinal over 4 years, undergraduate students
Intermediate to advanced c. 320,000

Online access through the FALKO platform.
The corpus is also available as txt files to the scientific community. Please contact

The deL1L2IM corpus German

Russian-Belorussian bilinguals

written Instant messaging dialogues Advanced c. 52,000

Sviatlana Höhn
University of Luxembourg

The Fehlerannotiertes Lernerkorpus (‘error annotated learner corpus’)

Learner subcorpus: various

Native subcorpus: German


1. Summaries

2. Essays

3. Letters, fiction writing, journal articles, book reviews (= longitudinal data from American learners)

1. Advanced

2. Advanced

3. Beginners - advanced


1. c. 40,000 (learner subcorpus) + c. 20,000 (native subcorpus)

2. c. 150,000 (learner corpus) + c. 70,000 (native subcorpus)

3. c. 78,000 (learner subcorpus)

Anke Lüdeling
Maik Walter
Humboldt-Universität zu Berlin
Institut für deutsche Sprache und Linguistik, Germany

Online access
The KOLIPSI corpus German Italian written Two written language production tasks of a standardized test (email/letter) A2-C1 under development

European Academy Bolzano/Bozen, Italy
The Learning the Prosody of a Foreign Language (LeaP) corpus
German Various spoken The LeaP corpus covers four different types of speech:
- read speech
- prepared speech
- free speech
- nonsense word lists
Various  62 speakers
University of Augsburg, Germany

The annotated corpus is available to the scientific community. Please contact at the University of Augsburg.


The LeKo (Lernerkorpus) corpus German         c. 55,000 Humboldt-Universität Berlin, Germany

Online access (password protected)

Register here

The LINCS Corpus

1. German

2. German

3. German

1. English

2. German

1. Written

2. Written

3. Written

1. Essays, examination, answers.
Longitudinal and cross-sectional data.

2. Essays

3. Teaching output

1. Intermediate to Advanced

2. Advanced

Under development
Heriot-Watt University Edinburgh, UK
Not currently publicly available
Multilingual Platform for the European Reference Levels: Exploring Interlanguage in Context




Various written writing tasks from standardized tests (telc/UJOP) A1 to C1 c. 280,000 Katrin Wisniewski Available
The Telecollaborative Learner Corpus of English and German Telekorp German English written Bilingual, longitudinal database comprising computer-mediated NS-NNS interactions between approximately 200 Americans and Germans collected during six different telecollaborative partnerships from 2000-2005.   c. 1,5 m.

Pennsylvania State University, USA.


Not publicly available
The Langman corpus Hungarian Chinese spoken Interviews conducted in 1994 with 11 Chinese immigrants living in Hungary.
Interviews focused on issues related to their arrival in Hungary as well as their daily life activities
University of Texas at San Antonio, USA
Freely available
Corpus di Apprendenti di Italiano L2
Italian Various written Essays Intermediate to advanced c. 237,000 Stefania Spina, Università per Stranieri di Perugia Searchable via CQPweb
Corpus parlato di italiano L2 Italian English
spoken Transcriptions of interviews Various   Stefania Spina
Silvio Pazzaglia
Mirco Perini
Università per Stranieri di Perugia, Italy
Searchable online
The KOLIPSI corpus Italian German written Two written language production tasks of a standardized test (email/letter) A2-C1 Under development
European Academy Bolzano/Bozen, Italy
The Lexicon of Spoken Italian by Foreigners
Italian Various spoken Proficiency exams of the Certification of Italian as a Foreign Language (CILS) A1-C2 c. 700,000

Università per Stranieri di Siena, Italy

Freely available
Varietà di Apprendimento della Lingua Italiana: Corpus Online
Italian Various written   Various c. 570,000
Freely available and searchable online.
The Korean learner corpus Korean Various written Various: letters, essays, formal writing... Beginner and intermediate c. 10,000
Georgetown University, USA

Wellesley College, USA

Yonsei University, South Korea
The Andrespråkskorpus ('Second Language Corpus')
Norwegian German
written Essays from language tests  B1 and B2  
University of Bergen, Norway
The Persian Learner Corpus
Persian (Farsi) Various written Narratives and essays Intermediate and advanced Under development

Saeed Safari

University of Belgrade, Faculty of Philology

Under development
The Salam Farsi Learner Corpus
Persian (Farsi) Serbian written Narratives, descriptive essays Beginner and upper-intermediate Under development

Saeed Safari

University of Belgrade, Faculty of Philology

Academic, under development
The PIKUST pilot learner corpus Slovene Various written Mostly argumentative essays Majority advanced – but also intermediate and beginner c. 35,000 Mojca Stritar
University of Ljubljana, Slovenia
The Anglia Polytechnic University (APU) Learner Spanish Corpus Spanish Various written     c. 120,000
Anglia Ruskin University, UK
Aprescrilov ("Aprender a Escribir en Lovaina") Spanish Dutch written Written assignments and tests; several text types (letters, expository, descriptive, argumentative, narrative) A1 to C1 c. 1 m.

KU Leuven, Belgium

Restricted online access

The Corpus de aprendices de español

Spanish Various written   A1 to C1

c. 575,000

CAES team

Universidade de Santiago de Compostela

Online access
Corpus Escrito del Español L2
Spanish English written Written compositions by learners of Spanish   c. 730,000

Universidad Autónoma de Madrid, Spain

Universidad de Granada, Spain

 Please contact to get a free sample of the corpus

Corpus de textos escritos para el análisis de errores de aprendices de E/LE

Spanish Various written Essays A2 to C1 /

Cestero Mancera, A. M.
Penadés Martínez, I.

Universidad de Alcalá de Henares

CD-ROM available
The Corpus of Taiwanese Learners of Spanish (Corpus de Aprendices Taiwaneses de Español)
Spanish Chinese written Student essays Various c. 340,000 Under development
The DIAZ corpus Spanish


spoken Semi-spontaneous (structured interviews) and experimental (structured questionnaires) Adult Spanish L2/L3 oral data Various  
Universitat Pompeu Fabra, Spain
Freely available
The Japanese learner corpus of Spanish Spanish Japanese written Student essays   c. 83,400
University of Birmingham, UK
The Spanish Corpus Proficiency Level Training
Spanish English (heritage language learners) spoken Dialogues about a given set of questions Beginner to advanced   Dr Dale Koike, University of Texas, Austin Liberal Arts Instructional Technology Center

Videos are available

Spanish Learner Language Oral Corpus
Spanish English spoken Learner narratives, interviews and picture description tasks Beginner to advanced c. 50,000 Laura Dominguez
University of Southampton, UK
Searchable online
Data freely available for download
Spanish Learner Oral Corpus Spanish Various
(9+ languages - especially Portuguese, French, Italian)
spoken Semi-spontaneous interviews, narrative and descriptive tasks A2-B1 c. 50,000 words
Laboratorio de Lingüística Informática
Universidad Autónoma de Madrid, Spain
Online access
The Tartu Learner Corpus of Spanish as a L3+ Spanish Estonian written Academic research writing Advanced c. 885,000 Mari Kruse, University of Tartu, Estonia  
The ASU corpus Swedish  Chinese
spoken and written Transcribed audio-recorded conversations and written texts from adult learners of Swedish – longitudinal data   c. 490,000 words
(c. 415,000 spoken and c. 75,000 written)

Stockholm University, Sweden

The European Science Foundation Second Language Database
(ESF database)




spoken Spontaneous second language acquisition of forty adult immigrant workers living in Western Europe, and their communication with native speakers in the respective host countries Various  
Clive Perdue
Max Planck Institut, Nijmegen, Netherlands
Freely available
The Foreign Language Examination Corpus
Multilingual Polish written Data from the Warsaw University Certification Exams
Various Under development

Warsaw University, Poland
The MeLLANGE Learner Translator Corpus
Multilingual various written Legal, technical, administrative and journalistic texts Trainee translators  

Université Paris Diderot, France

Searchable online
The MiLC Corpus



Catalan written Formal and informal letters, summaries, curriculum vitae, essays, reports, translations, synchronous and asynchronous communication exchanges, business letters   c. 150,000 et al
Universidad Politécnica de Valencia, Spain
The Multilingual Learner Corpus (MLC)



Brazilian Portuguese written Argumentative and narrative essays    Aim: c. 200,000
University of São Paulo, Brazil
Accessible online to registered researchers
The Padova Learner Corpus



Italian CMC
(Computer-Mediated Communication)

Student work produced in blended language courses using FirstClass conferencing software.
Variety of genres: diaries, debate contributions, formal reports, résumés etc. 
Longitudinal data


  Under development

University of Padua, Italy

The corpus PARallèle Oral en Langue Etrangère




(Mainly L2 speakers but also includes data produced by L1 speakers)

Various spoken 5 oral production tasks Various  

Marie-Jo Derive
Nejma Succo
Jean O'Donnell
Sandra Billard
Sandrine Rutigliano-Daspet
Université de Savoie, France
The University of Toronto Romance Phonetics Database



(including English, Mandarin, Russian, Spanish, etc.)
spoken Elicited production - sentence and passage reading, story narration, description of favourite meal Various  

University of Toronto, Canada
Password available from directors


Learner corpus-based datasets

Corpus Target language First language Medium Text type / task type Proficiency level Size in words Project director Availability
The Treebank of Learner English
English Various written Sentences from the CLC FCE (annotated with syntactic trees) Upper-intermediate

(5,124 sentences)

Yevgeni Berzak Publicly available through the UD repository ('English-ESL')


| 15/09/2016 |