Learner corpora around the world

This list is still a work in progress. We would like it to be as comprehensive as possible. If you have a learner corpus or know of one that is not listed on this webpage, send a message to Magali Paquot and we will add it to the list. We hope you will find the list useful for your research!

The list below only contains learner corpora, i.e. electronic collections of continuous written or spoken data produced by foreign or second language learners.

For a list of learner corpus-based datasets (treebanks, error lists, etc.), click here.

To refer to this list:

Centre for English Corpus Linguistics (date of access): Learner Corpora around the World. Louvain-la-Neuve: Université catholique de Louvain. https://uclouvain.be/en/research-institutes/ilc/cecl/learner-corpora-around-the-world.html


© 2019, Université catholique de Louvain

Learner corpora 

Last updated 4 April 2023


Corpus Target language First language Medium Text type / task type Proficiency level Size in words Project director Availability
The Arabic Learner Corpus (ALC) Arabic 66 languages Written and spoken Narrative and discussion Intermediate and advanced Written: c. 283,000
Audio: c. 3h30
Abdullah Alfaifi (Al Imam University, Saudi Arabia)
Eric Atwell (University of Leeds, UK)
Available
The Pilot Arabic Learner Corpus Arabic English Written Narrative Intermediate and advanced c. 9,000 Ghazi Abuhakema (College of Charleston, USA)
Reem Faraj (Columbia University, USA)
Anna Feldman (Montclair State University, USA)
Eileen Fitzpatrick (Montclair State University, USA)
/
The Jinan Chinese Learner Corpus (JCLC) Chinese 50 languages Written Exams and assignments Beginners, intermediate and advanced c. 6 m. Chinese characters
c. 9,000 texts
Maolin Wang (Jinan University, China)
Shervin Malmasi (Macquarie University, Australia)
Mingxuan Huang (Guangxi University of Finance and Economics, China)
Free download upon contact with researchers
Croatian Learner Text Corpus (CroLTeC) Croatian 36 languages Written Exam essays, argumentative and literary essays, letters, diaries, picture descriptions, book reviews, short dialogues, etc. A1-C2 c. 1 m. Nives Mikelić Preradović (University of Zagreb, Croatia) Freely available
The AKCES/CZESL (Acquisition corpora of Czech/Czech as a second language) corpus Czech Various Written and spoken Student essays and interviews Various 2 m. Karel Šebesta (Charles University/Technical University, Czech Republic) Available
Leerdercorpus Nederlands als Vreemde Taal Dutch French Written       Liesbeth Degand (Université catholique de Louvain, Belgium)  
Arab Learner English Corpus (ALEC) English Arabic Written Essays written by freshman students as part of a first-level college writing course University students (second language learners) Analysis: 184,749
Narrative: 67,527
Synthesis: 66,015
Argumentation: 192,298
Inas Mahfouz (American University of Kuwait, Kuwait)

Available upon request for research purposes

A part-of-speech tagged version of the corpus is now available

The Aachen Corpus of Academic Writing (ACAW) English German Written Academic research writing Advanced c. 240,000
c. 225,000 (L1 component)
Elma Kerz (RWTH Aachen University, Germany) Under development
The Advanced Learner English Corpus (ALEC) English Mainly Swedish Written Essays written by university students of English linguistics and English literature Advanced c. 1.3 m. Tove Larsson (Uppsala University, Sweden; Université catholique de Louvain, Belgium) Not freely available
The ANGLISH corpus English French Spoken Readings of texts and sentences, spontaneous oral language Various c. 5h30 Anne Tortel (University of Provence, France) Freely available
Asao Kojiro’s Learner Corpus Data English Japanese Written Essays and stories written or reproduced by Japanese college students     Asao Kojiro (Ritsumeikan University, Japan) Texts available for download
The Barcelona English Language Corpus (BELC) English Spanish
Catalan
Spoken and written 4 tasks: written composition, oral narrative, oral interview, role-play
Longitudinal data (children and young adults learning English)
Various   Carmen Muñoz (University of Barcelona, Spain) Available
The BATMAT Corpus English Swedish
Finnish
Written BA dissertations, MA dissertations Advanced c. 2.5 m. Tuija Virtanen-Ulfhielm (Åbo Akademi University, Finland) Not publicly available

Belarusian Learner Corpus of English (BELLCE)

English Russian
Belarusian
Written Argumentative essays High intermediate to advanced Unknown Anastasia Rakhuba  
The Bilingual Corpus of Chinese English Learners (BICCEL) English Chinese Spoken and written Spoken: National Oral English test
Written: in-class assignments
  c. 2 m. Wen Qiufang (Beijing Foreign Studies University, China)  
The Brazilian Spoken Corpus of English Learners (BraSCEL) English Portuguese Spoken Informal interview + thought-provoking picture discussion A1-C2 benchmarked to the CEFR Under development Mateus Miranda (Mary Immaculate College/University of Limerick, Ireland) The corpus (transcriptions of audio files) will be available to the scientific community upon request
The British Academic Written English (BAWE) corpus English Mainly L1 speakers
Also includes data produced by L2 speakers
Written ESP papers 4 levels of study (from undergraduate levels to final year and taught masters level) c. 6.5 m. Hilary Nesi (Coventry University, UK)
Sheena Gardner (Coventry University, UK)
Paul Thompson (University of Birmingham, UK)
Paul Wickens (Oxford Brookes University, UK)
The BAWE corpus can be accessed through the corpus analysis interface, Sketch Engine.
The BUiD Arab Learner Corpus (BALC) English Arabic Written School examination essays Various c. 290,000 Mick Randall (The British University in Dubai, United Arab Emirates)
Nicholas Groom (University of Birmingham, UK)
At present, copies of the current version of the corpus are available on request from mick.randall@buid.ac.ae
The Cambridge Learner Corpus (CLC) English Various Written Exam scripts Various c. 50 m. Cambridge University Press and Cambridge ESOL (Cambridge University, UK)

Commercial

Available on Sketch Engine

Canadian job cover letter corpus English Multiple L1s (permanent residents of Canada). Self-reported L1s of those who consented to archive their letters include: Mandarin (33), Farsi (25), Punjabi (17), Korean (15), Chinese (11), Spanish (9), Arabic (5), Tagalog (5), Cantonese (4), Russian (3), French (2), Hindi (2), Taiwanese (2), Turkish (2), Albanian (1), Armenian (1), Bengali (1), Bulgarian (1), Cebuano (1), Czech (1), Hakka (1), Ilocano (1), Japanese (1), Karen (1), Khmer (1), Kurdish (1), Pashto (1), Serbian (1), Vietnamese (1), and Waray (1). Written Job application cover letters (obtained in a simulated task conducted from 2015 to 2016 in ESL classrooms in British Columbia, Canada; 201 letters were collected, and 151 were archived with consent) Low to high intermediate English (CLB 3 to 8) c. 29,000 (for the 151 archived letters) Dr. Terri Everest, teverest@alumni.ubc.ca (under PhD supervisor Dr. Grisel Garcia Perez). Dr. Everest obtained English cover letters from learners and native English speakers (NES), as well as model letters from L1 English books, in both a pilot study and a dissertation study. Creative Commons Non-Commercial ShareAlike 4.0 International license. See https://open.library.ubc.ca/cIRcle/collections/ubctheses/33426/items/1.0417287 (folder 5, Everest_cover_letter_corpora_meta_data, in the zip file). Users may adapt the materials but must attribute Dr. Everest and share them under the same terms. Commercial use is not permitted. [Three L1 English cover letter corpora from the same project are also available: (1) 21 NES letters from pilot study participants in one community; (2) 40 NES letters from dissertation study participants, university students and alumni; and (3) 100 sample/model letters from Canadian and American books on cover letter writing, with 10 letters each from 10 job fields, all by different authors or editors.]
CEFR-ASAG corpus English French Written Short answers to an open-ended question targeting different proficiency levels A1-C 712 learner texts ALTISSIA International & CENTAL, UCLouvain, Belgium · cental@uclouvain.be Available
The CELI corpus Italian Various Written Written tasks from language certification exams (article, blog, email, letter, story, report, essay) B1-C2 c. 600,000 Stefania Spina, Irene Fioravanti, Luciana Forti, Valentino Santucci, Angela Scerra, Fabio Zanda (University for Foreigners of Perugia, Italy) Freely searchable via CQPweb (registration required) from https://www.unistrapg.it/cqpwebnew/celi/
Corpus de ELE en Japón (CELEN) Spanish Japanese Written

Texts from a) University courses of Spanish in Japan (exams and assignments) and b) Informal learning contexts on the Internet (electronic blogs and forums)

A1 to C2 c. 658,000 Pilar Valverde Ibañez (Kansai Gaidai University, Japan)  Online access
The Chinese/English Political Interpreting Corpus (CEPIC)   English/Chinese [Cantonese/Putonghua] Chinese [Cantonese/Putonghua] / English Spoken and Written Political speeches Professional (Near-Native) 6,393,994 Jun PAN (janicepan@hkbu.edu.hk) Open Access
The Corpus of Academic Learner English (CALE) English German Written Various academic text types that are typically produced in university courses of English (e.g. term papers, reading reports, research plans, abstract, reviews, and summaries) Advanced Under development Marcus Callies (University of Bremen, Germany)  
The Chinese Academic Written English corpus (CAWE) English Chinese Written Dissertations written by Chinese undergraduates majoring in English linguistics or applied linguistics   c. 400,000 David Yong Wey Lee (City University of Hong Kong, Hong Kong)  
The Chinese Learner English Corpus (CLEC) English Chinese Written   Various c. 1 m. Gui Shichun (Guangdong University of Foreign Studies, China)
Yang Huizhong (Shanghai Jiao Tong University, China)
The corpus can only be accessed by users in the Department of English at HKPU
The City University Corpus of Academic Spoken English (CUCASE) English Chinese
Also includes data produced by L1 speakers
Multimedia     c. 2 m. David Yong Wey Lee (City University of Hong Kong, Hong Kong)  
The Cologne-Hanover Advanced Learner Corpus (CHALC) English German Written Term papers and essays Advanced c. 210,000 Ute Römer (University of Michigan, USA)  
The College Learners’ Spoken English Corpus (COLSEC) English Chinese Spoken National spoken English test for non-English majors   c. 700,000 Yang Huizhong (Shanghai Jiao Tong University, China)
Wei Naixing (Beihang University, China)
 
The Corpus Archive of Learner English in Sabah/Sarawak (CALES) English Malay Written Argumentative essays Various c. 400,000 Simon Botley@Faizal Hakim, Doreen Dillah (Universiti Teknologi MARA Sarawak, Malaysia)  
Corpus Oral de Português como Língua Adicional-Brasil (CoPLA-BR)/Oral Corpus of Brazilian Portuguese as an Additional Language Portuguese Various Spoken Informal interview + thought-provoking picture discussion Basic, intermediate, advanced Under development Mateus Miranda (Mary Immaculate College/University of Limerick, Ireland) The corpus (transcriptions of audio files) will be available to the scientific community upon request.
Corpus Escrito de Aprendices de Inglés como Lengua Extranjera en Ecuador (COREAILE) English Spanish (Ecuadorian) Written Narrative Beginners and intermediate 44,352 (210 texts) Miguel A. Macías Loor (Universidad Técnica de Manabí, Ecuador) Available upon contact with researcher (miguel.macias@utm.edu.ec)
CORpus del ESPañol de los Italianos (CORESPI) Spanish Italian Written Written compositions A1 to B2 c. 125,000 Sonia Bailini (Università Cattolica del Sacro Cuore, Italy) Online access
CORpus del ITaliano de los Españoles (CORITE) Italian Spanish Written Written compositions A1 to B2 c. 103,000 Sonia Bailini (Università Cattolica del Sacro Cuore, Italy) Online access
The Corpus of Business Letters English Italian Written Tagged part: BEC1 writing tests (letters, emails, faxes, memos, reports)
Untagged part: business writing exam tests
  c. 32,000 Anna Romagnuolo (University of La Tuscia, Italy)  
The Corpus of Multilingual Opinion Essays by College Students (MOECS) English Varied Written Opinion essays College students Unknown Megumi Okugiri (University of the Sacred Heart, Japan) Available
Corpus of writing, pronunciation, reading, and listening by learners of English as a Foreign Language English Japanese Written and spoken Varied Beginners to advanced 29h audio
30,000 words
Katsunori Kotani (Kansai Gaidai University, Japan)
Takehiko Yoshimi (Ryukoku University, Japan)
Hiroaki Nanjo (Ryukoku University, Japan)
Hitoshi Isahara (Toyohashi University of Technology, Japan)
 
Corpus of Written Spanish, L2 and Heritage Speakers (COWS-L2H) Spanish English
Mandarin
Other
Written Personal essays Beginner, intermediate, advanced, and heritage 1,138,097 Claudia H. Sánchez Gutiérrez (University of California, Davis, USA) Available on Github (https://github.com/ucdaviscl/cowsl2h)
The Corpus of Young Learner Interlanguage (CYLIL) English Dutch
French
Greek
Italian
Spoken English L2 data elicited from European School pupils – longitudinal data Various c. 500,000 Alex Housen (Vrije Universiteit Brussel, Belgium)  
Corpus and Repository of Writing (Crow) English 24 languages, predominantly Chinese and Arabic Written Analysis, narrative, literature review, argument, empathy writing, proposal, reflection High intermediate/advanced (TOEFL overall score 80-105); international undergraduate students in first-year writing classes 9 m. (in March 2020) Shelley Staples (University of Arizona, USA)
Bradley Dilger (Purdue University, USA)
Open access after registration
DISKO (Deutsch im Studium: Lernerkorpus/German at university: Learner Corpus) German Various Written Standardized writing task from a university admission language test (TestDaF), c. 400 tokens per text B1-C2 c. 240,000 (DISKO_L2), c. 55,000 (DISKO L1), c. 12,000 (DISKO_DSH), c. 90,000 (DISKO_WebTestDaF) Katrin Wisniewski (University of Leipzig, Germany) Available online under the ANNIS architecture; please refer to the corpus handbook
The Eastern European English learner corpus English Russian
Ukrainian
Polish
Slovak
Spoken Spontaneous spoken production data elicited by means of a semi-structured interview Various c. 60,000 Elena Salakhian (Eberhard Karls University of Tübingen, Germany)  
The EFL Teacher Corpus (ETC) English Korean Spoken Teacher talks in language classrooms Upper-intermediate to advanced c. 123,000 Ye-eun Kwon, Eun-Joo Lee (Kunsan National University, South Korea) Complete. Available at https://www.lextutor.ca/conc/eng/
The English of Malaysian School Students corpus (EMAS) English Malay Written Student essays and oral interviews Various c. 500,000 Arshad Abd. Samad (Universiti Putra Malaysia, Malaysia)  
The English Speech Corpus of Chinese Learners (ESCCL) English Chinese Spoken Dialogue reading-aloud Middle school and college   Chen Hua (Nantong University, China)
Wen Qiufang (Beijing Foreign Studies University, China)
Li Aijun (Chinese Academy of Social Sciences, China)
 
The ETS Corpus of Non-Native Written English English 11 languages Written 12,100 TOEFL English essays /   Daniel Blanchard Information about the score level is available for each essay. Samples are available
The Europarl corpus of Native Non-native and Translated Texts (ENNTT) English 24 EU languages Written Proceedings of the European Parliament Advanced NNS: c. 780,000
NS: c. 3 m.
Translated: c. 22 m.
Sergiu Nisioi (University of Bucharest, Romania) Available
English Students’ Oral Corpus in Chile (ESOC-Chile) English Spanish Spoken Student Interviews B1, B2, C1 73,631 Chinger Zapata (Universidad Católica del Norte, Chile) The corpus (audio files or plain transcriptions of audio files in txt format) will be available to the scientific community upon request to Chinger Zapata
The EVA Corpus of Norwegian School English English Norwegian Spoken Picture-based tasks / c. 35,000 Angela Hasselgren (University of Bergen, Norway)  
The FUSE corpus (Finnish Upper Secondary School Corpus of Spoken English) English Finnish (possibly other L1s too; information not collected) Spoken Role-tasks or mind-map tasks as part of a low-stakes course examination in Finnish upper secondary/high schools CEFR: A2-C1 N/A Lasse Ehrnrooth (University of Helsinki, Finland) Online access
The Gachon Learner Corpus English Korean (+ a few Chinese and Spanish speaking students) Written Written journal assignments Lower intermediate c. 2.5 m. Brian Carlstrom (Gachon University, South Korea) Freely available
The Gesprochene Wissenschaftssprache kontrastiv/Multilingual corpus of spoken academic language (GeWiss) German English, Polish, Bulgarian and diverse other L1 languages Spoken Academic papers, student presentations and academic oral examinations in German philology / Applied Linguistics / Language pedagogy as well as in Polish, English, and Italian philology B2, C1 1.4 m. Christian Fandrych (Leipzig University, Germany) Freely available upon registration: https://gewiss.uni-leipzig.de/index.php?id=home&L=1
The GICLE corpus (German component of ICLE) English German Written Mainly non-academic argumentative essays Advanced c. 234,000    
The Giessen-Long Beach Chaplin Corpus (GLBCC) English German Spoken Transcribed interactions between native English speakers, ESL and EFL speakers Various c. 350,000 Andreas Jucker, Sara Smith (University of Giessen, Germany) Restricted use: apply for approval to get a copy
The Hong Kong University of Science & Technology (HKUST) learner corpus English Chinese (mostly Cantonese) Written Untimed assignments written for EFL courses and school leaving exams University and advanced high school students c. 25 m. John Milton (Hong Kong University of Science & Technology, Hong Kong)  
The Indianapolis Business Learner Corpus (IBLC) English Various Written Job application letters and résumés of business communication students from the U.S., Belgium, Finland, Germany, and Thailand, spanning the years 1990-1998     Ulla Connor, Kristen Precht, Thomas Albin Upton (Indiana University, USA)  
The International Corpus of Crosslinguistic Interlanguage (ICCI) English Various Written Essays (20-min in-class tasks without the use of a dictionary) Beginner to lower-intermediate 9,000 essays Yukio Tono (Tokyo University of Foreign Studies, Japan) Freely available
The Icelandic L2 Error Corpus (IceL2EC) Icelandic 13 languages Written Student essays and assignments Various c. 125,000 Anton Karl Ingason, Lilja Björk Stefánsdóttir, Xindan Xu, Isidora Glišić (University of Iceland, Iceland) Open access
The International Corpus Network of Asian Learners of English (ICNALE) English Chinese, Filipino, Indonesian, Japanese, Korean, Malay, Thai, and Urdu Written and spoken Essays / monologues / dialogues A2, B1, B2+ c. 3.5 m. Shin'ichiro Ishikawa (Kobe University, Japan) Open access
The International Corpus of Learner English (ICLE) English Various Written Argumentative and literary essays High-intermediate to advanced c. 3 m. Sylviane Granger (Centre for English Corpus Linguistics, Université catholique de Louvain, Belgium) CD-Rom + handbook: order online
The International Teaching Assistants corpus (ITAcorp) English Various Spoken Learner language from a variety of spoken classroom tasks: office hours role plays, presentations, discussions   c. 500,000 Steven L. Thorne, Paula Golombek, Jonathon Reinhardt (Pennsylvania State University, USA)  
The « Interphonology of Contemporary English » corpus English French
Italian
Chinese
Spanish
Spoken Reading aloud, repeating words, guided interviews, interactions between two learners Various Under development Nadine Herry-Bénit (Université Paris Nanterre, France)
Stéphanie Lopez (Northwestern Polytechnical University, China)
Jeff Tennant (University of Western Ontario, Canada)
Under development; samples available
The Iranian Corpus of Learner English English Farsi Written Expository essays University students (English majors) 436,035 Parviz Maftoon, Parviz Birjandi, Hossein Khazaee (Islamic Azad University, Iran) CD-ROM; data gathered for a PhD dissertation by Hossein Khazaee; this corpus is the intellectual property of the Science and Research Branch, Islamic Azad University, Tehran, Iran
The ISLE speech corpus English German
Italian
Spoken Recorded sentences from several blocks of differing types (reading simple sentences, using minimal pairs, giving answers to multiple choice questions) Intermediate c. 18h ecisle@nats.informatik.uni-hamburg.de (University of Hamburg, Germany) CD-Rom
The Israeli Learner Corpus of Written English English Hebrew Written Argumentative and descriptive essays   c. 750,000 Tina Waldman (Kibbutzim College of Education, Israel)  
The Janus Pannonius University (JPU) Corpus English Hungarian Written Essays and research papers University students c. 500,000 József Horváth (University of Pécs, Hungary) Searchable online
The Japanese English as a Foreign Language Learner (JEFLL) Corpus English Japanese Written Student essays From beginning to intermediate c. 700,000 Yukio Tono (Meikai University, Japan)
jefll.inquiry@corpuscobo.net
The JEFLL Corpus will be freely available for research, first via the web query system (already available in Japanese) and then the entire data will be distributed under license in the future

Korean English Learners’ Spoken Corpus (KELSC)

URL: http://icr.or.kr/miscellaneous

English Korean Spoken

1. Two speaking tests using real-time video conferencing software.

2. Integrated Tasks.

2.1 Listen to a passage (60 seconds, 90-100 words) and summarize its content. Preparation time: 60 seconds; response time: 60 seconds.

2.2 Read a passage (60 seconds, 110-120 words) and summarize its content. Preparation time: 60 seconds; response time: 60 seconds.

CEFR: A1, A2, B1, B2, C1, C2 36,588

CK Jung & Kory Lauzon (Institute for Corpus Research, Incheon National University, South Korea)

Email: ckjung@inu.ac.kr

Available upon request for research purposes: Institute for Corpus Research, Incheon National University

URL: http://icr.or.kr

Kolipsi Corpus Family Italian
German
Italian
German
Written Written productions from upper secondary school pupils (narrative and argumentative texts)   c. 1 m.

 

All sub-corpora of the Kolipsi Corpus Family can be queried via the ANNIS interface or downloaded on the Eurac Research Clarin Repository (from mid 2021 onwards)
Korpus slovenščine kot tujega jezika (KOST 1.0) Slovene Albanian, Bosnian, Chinese, Croatian, Czech, English, French, German, Greek, Hungarian, Italian, Japanese, Korean, Macedonian, Polish, Russian, Slovak, Serbian, Spanish, Ukrainian Written Essays (homework assignments and exams) Various 1,000,000 (6,311 texts)

Mojca Stritar Kučuk
Email: mojca.stritarkucuk@ff.uni-lj.si

Available here: https://www.clarin.si/repository/xmlui/handle/11356/1753 
The L2 component of the Spoken Chinese Corpus Chinese (Putonghua used in mainland China) English (12 New Zealanders and two Australians who were native English speakers of non-Chinese ethnicity) Spoken Informal interaction (non-task/test settings) Intermediate to advanced 220,792 Lin Li Available via GitHub https://github.com/blculyn
Lancaster Corpus of Academic Written English (LANCAWE) English Various Written IELTS academic writing tests (descriptive and argumentative tasks); assignments – longitudinal data        
LANGSNAP Spanish and French English Spoken and written

Oral interviews, story retelling, argumentative writing 

  700,000 Nicole Tracy-Ventura & colleagues  
The Lang-8 Learner Corpora English Various Written Texts from Lang-8, a social networking site for language learning / / Toshikazu Tajiri, Mamoru Komachi (Nara Institute of Science and Technology, Japan) Available here
Learner Corpus of Latvian (LaVA) Latvian 35 different languages (German (37%), Swedish (11%), Finnish (9%), Norwegian, Italian, Arabic, Turkish, Portuguese, Russian, Persian, Urdu, Spanish, Sinhala, French, Tamil, Hindi, Punjabi, Chinese, Flemish, Hebrew, etc.) Written (handwritten texts) Student essays A1, A2 c. 192,000 Ilze Auziņa 

1) freely available on corpus website: https://lava.korpuss.lv/en/

2) noSketchEngine: 
http://nosketch.korpuss.lv/#dashboard?corpname=lava 

3) CLARIN-lv: 
http://hdl.handle.net/20.500.12574/42

The LeaP (Learning Prosody in a Foreign Language) Corpus English German Spoken Four types of speech styles were recorded: nonsense word lists, readings of a short story, retellings of the story, free speech in an interview situation Various c. 12h Ulrike Gut (Albert-Ludwigs-University Freiburg, Germany) The annotated corpus is available to the scientific community. Please contact Ulrike Gut at the University of Augsburg
The Learner Corpus of Engineering Abstracts (LCEA) English Malaysian Written Abstracts of the Computer and Communication Systems Engineering Final Year Projects Various c. 550,000
998 abstracts
Helen Tan (University Putra Malaysia, Malaysia)
Ain Nadzimah Abdullah (University Putra Malaysia, Malaysia)
Syamsiah bt Mashohor (University Putra Malaysia, Malaysia)
Chan Swee Heng (Taylor's University, Malaysia)
Available. Contact: Helen Tan
The Learner Corpus of English for Business Communication English Chinese Written Different types of business correspondence written for simulated business situations, including memos, faxes, reports, letters of enquiry and complaint letters   c. 117,500 Li Lan (Hong Kong Polytechnic University, Hong Kong) Searchable online
The Learner Corpus of Essays and Reports English Chinese Written Essays and project reports covering a range of topics from Science, IT and New Media to Nursing, Business and Economics, and the Social Sciences   c. 188,000 Sima Sengupta (Hong Kong Polytechnic University, Hong Kong) Searchable online
A Learners' Corpus of Reading Texts English French Spoken Unprepared reading of English texts (the texts are short abstracts of fiction or made-up dialogues) University students   Sophie Herment, Valérie Kerfelec, Laetitia Leonarduzzi, Gabor Turcsan (Aix-Marseille University, France) Freely available
The Longitudinal LEarner COrpus iN Italiano, Deutsch, English (LEONIDE) Italian
German
English
Italian
German
Written Written productions from secondary school pupils (narrative and opinion texts)   237,000   The Corpus can be queried via the ANNIS interface. It will be available for download on the Eurac Research Clarin Repository in summer 2021.
The LONGDALE (LONGitudinal DAtabase of Learner English) project English Various Spoken and written Range of text types/task types – longitudinal data From intermediate to advanced   Fanny Meunier (Centre for English Corpus Linguistics, Université catholique de Louvain, Belgium) Under development
The Longman Learners' Corpus English Various Written Essays and exam scripts Various c. 10 m. Longman Commercial
Learner of Persian Spoken Corpus (LoPSC) Persian English Spoken Informal conversations Upper-Intermediate c. 30,000 (ongoing)

Sepideh Daghbandan (University of Edinburgh, UK)

Please contact project director for access to the corpus
The Louvain International Database of Spoken English Interlanguage (LINDSEI) English Various Spoken Interviews and picture descriptions High-intermediate to advanced c. 800,000 Gaëtanelle Gilquin (Centre for English Corpus Linguistics, Université catholique de Louvain, Belgium) CD-Rom and handbook: order online
The MERLIN Corpus Italian
German
Czech
Various Written Various (informal and formal email/letter for different purposes, opinion text on different topics), based on standardised language tests   c. 340,000   The MERLIN Corpus can be queried via the ANNIS interface or downloaded on the Eurac Research Clarin Repository.
Mexican Learners Corpus (MexLeC) English Mexican Spanish Spoken Semi-structured interview on spare time, occupation, friends and family; monologue: narratives and opinion questions A1-B1 Longitudinal (1st stage) Up to 200,000 (under development) Abigahil Flores, Conacyt Postdoctoral Researcher / Pauline Moore, Universidad Autónoma del Estado de México Available soon at: MexLeC
Moroccan Learner English Corpus (MoLEC) English Various Written Argumentative essays Undergraduate EFL students 44,783 (185 texts) Ennaciri El Mehdi, Iabdounane Yassine  
Multilingual Academic Corpus of Assignments - Writing and Speech (MACAWS) Portuguese
Russian
15 languages (predominantly English and Spanish) Written and spoken Classroom assignments and exams organized by Macrogenre (e.g., Analysis, Description, Evaluation, Exposition, Narration) and Topic (e.g., Art, Culture, Literature, Family, Food, Future Plans, Trip) Beginner, intermediate and advanced 212,064 (in March 2020) Shelley Staples (University of Arizona, USA)
Aleksey Novikov
Adriana Picoral
Bruna Sommer-Farias
Open access after registration
Multilingual Corpus of Second Language Speech (MuSSeL) Mandarin Chinese, French, Portuguese, Spanish Mainly English Spoken Recordings in response to the Interpersonal Listening/Speaking (ILS) section of ACTFL Assessment of Performance toward Proficiency in Languages (AAPPL) and ACTFL’s Oral Proficiency Interview by Computer (OPIc) Novice to Advanced 111,267 words (2,597 texts) collected from 152 learners (as of Nov 17, 2021) Fernando Rubio Publicly available via L2TReC and Talkbank
The Malaysian Corpus of Learner English (MACLE) English Malay Written       Gerry Knowles, Zuraidah Mohd. Don (University of Malaya, Malaysia) /
The Malaysian Corpus of Students' Argumentative Writing (MCSAW) English Malay
Chinese
Indian
Written Argumentative essays Form 4, Form 5, college c. 565,500 Seyed Ali Rezvani Kalajahi, Jayakaran Mukundan (University Putra Malaysia, Malaysia) Available from developers
The Michigan Corpus of Academic Spoken English (MICASE) English Mainly L1 speakers but also includes data produced by L2 speakers Spoken Transcripts of academic speech events   c. 1.8 m. Ute Römer (University of Michigan, USA)
micase@umich.edu
Searchable online
The Michigan Corpus of Upper-level Student Papers (MICUSP) English Semi-balanced sample of native and non-native speakers of English Written ESP papers A-grade papers or ungraded papers that have been assessed and accepted (such as research proposals), but not published   c. 2.6 m. Ute Römer (University of Michigan, USA)
micase@umich.edu
Searchable online
The Montclair Electronic Language Database (MELD) English Various Written Student essays Various c. 100,000 Eileen Fitzpatrick, Milton S. Seegmiller (Montclair State University, USA) Contact Eileen Fitzpatrick
Includes error annotations
The Multimedia Adult ESL Learner Corpus (MAELC) English ESL environment Multimedia Video of classroom interaction and associated written materials Beginner to upper-intermediate   Stephen Reder, Kathryn Harris, Kristen Setzler (Portland State University, USA)
labschool@pdx.edu
The Lab School would like to share the extensive resources from MAELC with interested researchers and teacher trainers. Those interested should make inquiries to the Lab School by e-mail
The Neungyule Interlanguage Corpus of Korean Learners of English (NICKLE) English Korean Spoken and written Written part: student essays Spoken part: student interviews and oral speech tests transcriptions Mainly from beginning to intermediate Written: c. 890,000
Spoken: c. 100,000
Ji-Myoung Choi (Yonsei University, South Korea) The corpus will be available to the scientific community for research purposes upon request
The Japanese Learner English Corpus (NICT JLE) English Japanese Spoken English oral proficiency interview test Various 2 m. Emi Izumi, Kiyotaka Uchimoto, Hitoshi Isahara (National Institute of Information and Communications Technology, Japan) Freely available (downloadable)
The NOn-native Spanish corpus of English (NOSE) English Spanish Written Argumentative and descriptive student essays Intermediate and upper-intermediate c. 300,000 Ana Diaz-Negrillo (Universidad de Granada, Spain)  

The NORINT Corpus. The NORINT Corpus consists of three sub-corpora: NORINT Speech, NORINT Recited, NORINT Text  

Norwegian Various Spoken and written

NORINT Speech: interviews with and conversations between informants

NORINT Recited: the informants read out a short story as well as 60 non-contextualized sentences

NORINT Text: written language corpus comprising exam papers

B1 or higher

NORINT Speech: 103,719 tokens

NORINT Recited: 36,873 tokens

NORINT Text: 53,247 tokens

Annely Tomson
https://www.hf.uio.no/iln/english/people/aca/norwegian-for-international-students/tenured/annelyt/index.html 
Glossa (search and post-processing tool) supports login with CLARIN and Feide. Contact the Text Laboratory (tekstlab-post@iln.uio.no) if you don’t have the possibility to access via Feide or CLARIN
The NUS Corpus of Learner English (NUCLE) English Several East Asian languages, predominantly Chinese Written Student essays on a wide range of topics including environmental pollution, healthcare, etc. Various c. 1 m. Hwee Tou Ng, Siew Mei Wu, Daniel Dahlmeier (National University of Singapore, Singapore) Freely available
The PELCRA Learner English Corpus (PLEC) English Polish Spoken and written Written: argumentative, descriptive, narrative and quasi-academic essays; formal letters From beginning to post-advanced Under development
Aim spoken: c. 200,000
Aim written: c. 2.8 m.
Piotr Pęzik, Barbara Lewandowska-Tomaszczyk (University of Lodz, Poland) Online search engine and corpus analysis tools
The PICLE corpus (Polish component of ICLE) English Polish Written Student essays Advanced c. 330,000 Przemyslaw Kaszubski (Adam Mickiewicz University, Poland) Searchable online
The Polish Learner Corpus PoLKo Polish Various Written Essays, descriptions, argumentative essays, private and official letters, reviews, short messages, interviews etc. Various c. 8,000 (as of 26.03.21)
Under development
Adrian Jan Zasina (Charles University, Czech Republic)
Elżbieta Kaczmarska (University of Warsaw, Poland)
Available upon request
The Qatar learner corpus English Arabic (mostly from Qatar) Spoken Spoken interviews with Qatari learners of English     Yun Zhao Helen (Carnegie Mellon University, USA) Freely available
The Québec learner corpus English French (from Québec) Written Argumentative essays Intermediate and advanced c. 250,000 Tom Cobb (Université du Québec à Montréal, Canada) /
The Romanian Corpus of Learner English (RoCLE) English Romanian Written Student essays     Chitez Madalina (Zurich University, Switzerland)  
Russian Error-Annotated English Learner Corpus English Russian Written Examination essays similar to IELTS Task 1 and Task 2, with manually annotated errors Intermediate to Advanced c. 800,000 by November 2017 and growing (c. 2,000,000 including the older, less consistently annotated or unannotated part of the corpus, available at http://realec.org/index.xhtml#/) Olga Vinogradova (National Research University Higher School of Economics, Russia) Freely available
The Russian Learner Translator Corpus (RusLTC) English
Russian
Russian Written Translations produced by trainee translators Trainee translators c. 1.5 m. tokens Andrey Kutuzov (University of Oslo, Norway)
Maria Kunilovskaya (Tyumen State University, Russia)
Freely available

The Santiago University Learner of English Corpus (SULEC)

English Spanish Spoken and written Written: compositions or argumentative essays
Spoken: semi-structured interviews, short oral presentations and brief story descriptions
Various Aim: c. 1 m. Ignacio M. Palacios Martínez (University of Santiago de Compostela, Spain) Available after registration

The Scientext English Learner Corpus

English French Written Academic argumentative texts   c. 1.1 m. scientext@u-grenoble3.fr (Université Stendhal/Grenoble-III, France) Searchable online
Second Language Research Tasks (SLRT) English Various Written and spoken Written paragraphs Various oral tasks Various c. 300,000 Bill Crawford (Northern Arizona University, USA)
Kim McDonough (Concordia University, Canada)
Under development
The Seoul National University Korean-speaking English Learner Corpus (SKELC) English Korean Written Student essays Various c. 900,000 Heokseung Kwon (Seoul National University, South Korea) /
The SILS Learner Corpus of English English Various (mainly Japanese) Written Student essays Basic, intermediate and advanced c. 3.2 m. (first and second drafts included) Victoria Muehleisen (Waseda University, Japan)  
The Soochow Colber Student Corpus (SCSC) English Chinese Written Student essays   c. 227,000 Colman Bernath (Soochow University, Taiwan)  
The Spoken and Written English Corpus of Chinese Learners (SWECCL) English Chinese Written (WECCL) and spoken (SECCL) Written: argumentative and narrative essays
Spoken: National Spoken English Test – longitudinal data
  c. 2 m. Wei Qiufang, Liang Maocheng, Wang Lifei (Beijing Foreign Studies University, China) CD-ROM
The Taiwanese Corpus of Learner English (TLCE) English Chinese Written Journals and essays (descriptive, narrative, expository, argumentative) Intermediate to advanced c. 2 m. Rebecca Hsue-Huch Shih (Sun Yat-sen University, Taiwan)  
The Taiwanese learner academic writing corpus (TaiwanLAWC) English Chinese Written Theses and dissertations written by Taiwanese graduate students.     Howard Chen (National Taiwan Normal University, Taiwan)  
The TELEC Secondary Learner Corpus (TSLC) English Chinese Written and spoken Compositions from secondary classrooms   c. 2 m. Quentin Allan (University of Hong Kong, Hong Kong)  
The Telecollaborative Learner Corpus of English and German Telekorp English German Written Bilingual, longitudinal database comprising computer-mediated NS-NNS interactions between approximately 200 Americans and Germans collected during six different telecollaborative partnerships from 2000-2005   c. 1.5 m. Julie Belz (Pennsylvania State University, USA) Not publicly available
The Ten-Thousand English Compositions of Chinese Learners (TECCL) English Chinese Written Essays (various topics) written in and after class, and in testing context. Also contains some collaborative writing samples Various (mainly undergraduates) c. 1.8 m. Jiajin Xu (Beijing Foreign Studies University, China) Raw texts and part-of-speech tagged texts are available
Tracking Written Learner Language (TRAWL) Multilingual:
English
French
German
Spanish
Norwegian Written Texts written as part of regular class work (tests, in-school writing, homework) Longitudinal corpus (beginners/advanced)   Hildegunn Dirdal (University of Oslo, Norway)  
Trinity Lancaster Learner Spoken Corpus English Various Spoken

Presentation
Interactive task
Discussion
Conversation

B1-C2 c. 4 million Dana Gablasova, Vaclav Brezina  
The Tswana Learner English Corpus (TLEC) English Tswana Written Argumentative essays Advanced c. 200,000 Bertus Van Rooy (University of Amsterdam, Netherlands) Available in ICLE
The Undergraduate Learner Translator Corpus (ULTC) Bidirectional:
English-Arabic
French-Arabic
Bidirectional:
English-Arabic
French-Arabic
Arabic is the native language of the learners and the main target language
Written and spoken Translations produced by learners of translation from and into Arabic and a reference subcorpus of published translations From beginners to advanced levels Under development Reem Alfuraih (Princess Nora bint Abdul Rahman University, Saudi Arabia) Available via https://arabicparallelultc.com/
The Uppsala Student English Corpus (USE) English Swedish Written Student essays Various c. 1,200,000 Ylva Berglund Prytz, Margareta Westergren Axelsson (Uppsala University, Sweden) The corpus can be used for research and educational purposes. It can be accessed on the Internet from the Oxford Text Archive.
The Uppsala WordReference Corpus English
Spanish
French
Italian
Various Written Forum posts   English learner subcorpus: 38 m., English native subcorpus: 50 m., Spanish learner subcorpus: 5 m., Spanish native subcorpus: 22 m., French learner subcorpus: 4 m., French native subcorpus: 7 m., Italian learner subcorpus: 1 m., Italian native subcorpus: 3 m. Aleksandrs Berdicevskis (Uppsala University, Sweden) Freely available
The UPF Learner Translation Corpus English Catalan Written Translations written by the students of the Translation and Interpreting degree at UPF   c. 200,000 Anna Espunya (Pompeu Fabra University, Spain)  
The UPV Learner Corpus English Catalan Written Essays Various c. 150,000 Angeles Andreu Andrés (Universitat Politècnica de València, Spain)  
The Varieties of English for Specific Purposes dAtabase learner corpus (VESPA) English Various Written ESP texts (term papers, reports, MA dissertations) Various c. 220,000 (under development) Magali Paquot (Centre for English Corpus Linguistics, Université catholique de Louvain, Belgium)  
The Written Corpus of Learner English (WriCLE) English Spanish Written Essays Various c. 750,000 Paul Rollinson (Universidad Autónoma de Madrid, Spain)  
The Yonsei English Learner Corpus (YELC) English Korean Written Yonsei University English Diagnostic Tests (Part 1: descriptive task, max. 100 words; Part 2: argumentative task, max. 300 words) 9 levels (A1, A1+, A2, B1, B1+, B2, B2+, C1, C2) c. 1 m. Seok-Chae Rhee (Yonsei University, South Korea), CK Jung (Incheon National University, South Korea) The YELC corpus will be available to the scientific community for research purposes from 31 March 2012
The Young Learner Corpus of English (YOLECORE) English Greek Spoken Pedagogic Corpus of video-recorded EFL language classes   170 school hours (126 hours of videotaped material)
1.5 m. types
Marina Mattheoudakis, Thomas Zapounidis (Aristotle University of Thessaloniki, Greece)  
The Estonian Interlanguage Corpus of Tallinn University (EIC) Estonian Russian
Finnish
English
German
Latvian
Lithuanian
Ukrainian
Belorussian
Written Spontaneously produced texts in language learning situations: argumentative and literary essays, written stories, letters, term papers, reading reports. A1-C2 c. 1 m. Pille Eslon (Tallinn University, Estonia)  
Linguistic Basis of the Common European Framework for L2 English and L2 Finnish (CEFLING) Finnish
English
Various Written Various Various   Maisa Martin (University of Jyväskylä, Finland)  
Paths in Second Language Acquisition (TOPLING) Finnish
English
Swedish
Various Written Various Various   Maisa Martin (University of Jyväskylä, Finland) Available (see here for instructions on how to access the corpora)
The Advanced Finnish Learner Corpus (LAS2) Finnish Russian
Czech
Swedish
Estonian
Lithuanian
Komi
English
Hungarian
German
Icelandic
Japanese
Written Exam essays, theses, essays and writings Advanced c. 630,000 Kirsti Siitonen, Ilmari Ivaska (University of Turku, Finland) Available
The Finnish National Foreign Language Certificate Corpus (YKI) Finnish English
Finnish
French
German
Italian
Lappish (Sami)
Spanish
Swedish
Russian
Written and spoken Various Beginner, intermediate and advanced   Ari Maijanen, Tiina Lammervo (University of Jyväskylä, Finland) Available with user ID and password
The International Corpus of Learner Finnish (ICLFI) Finnish Various Written Finnish learners’ spontaneously produced texts in language learning situations, large variety of text types Beginner, intermediate and advanced Under development Jarmo Harri Jantunen (University of Oulu, Finland) Free download after applying for a user licence
The Chy-FLE (Cypriot Learner Corpus of French) French Modern Greek (and Cypriot Greek) Written Argumentative and descriptive essays From intermediate to advanced c. 250,000 (under development) Freiderikos Valetopoulos (Université de Poitiers, France, in collaboration with the University of Cyprus)  
The COREIL corpus French English   Spoken       Elisabeth Delais-Roussarie, Hiyon Yoo (Université Paris-Diderot, France)  
The "Dire Autrement" corpus French (Second Language) Mainly L1 speakers of English Written Narrative, injunctive, persuasive and informative texts   c. 50,000 Marie-Josée Hamel, Jasmina Milićević (Dalhousie University, Canada) Available after registration
French Interlanguage Database (FRIDA) French Various Written Free compositions: descriptive, argumentative and narrative texts, news & mail Intermediate   Sylviane Granger (Centre for English Corpus Linguistics, Université catholique de Louvain, Belgium)  
French Learner Language Oral Corpora (FLLOC) French Various Spoken See description of the 7 corpora Various   Florence Myles (Newcastle University, UK)
Rosamund Mitchell (University of Southampton, UK)
The contents of the database are being made freely available to the research community, in the form of digital sound files and related transcripts formatted using CHILDES software. Searchable online
The InterFra corpus French Swedish Spoken Interviews, retellings of video clips and picture stories Various   Inge Bartning (Stockholm University, Sweden)
interfra@fraita.su.se
Available
The "Interphonologie du Français Contemporain" corpus (IPFC) French Cypriot Greek
Dutch
English (Canada)
German
Japanese
Norwegian
Spanish
Spoken Reading aloud, repeating words, guided interviews, interactions between two learners Various Under development Sylvain Detey (Waseda University, Japan; Université de Rouen, France)
Isabelle Racine (Université de Genève, Switzerland)
Yuji Kawaguchi (Tokyo University of Foreign Studies, Japan)
Under development; samples available
The Learner Corpus French (LCF) French Dutch Written Argumentative essays, informative texts, journalistic texts, formal letters, summaries, written compositions by Flemish students of French Intermediate to advanced c. 500,000 Hans Paulussen (K.U.Leuven/Ugent/Lessius, Belgium) Under development
The Lund CEFLE Corpus (Corpus Écrit de Français Langue Étrangère) French Swedish Written Descriptive and narrative essays; picture-based stories Various c. 100,000 Malin Ågren (Lund University, Sweden) A sub-part of the corpus is available online.
The University of the West Indies learner corpus (UWi) French English
Jamaican Creole
Spoken Conversations during oral exams and in informal contexts Various   Hugues Peters (University of New South Wales, Australia) Corpus is available freely here (last updated 2017)
Comasan Labhairt ann an Gàidhlig (CLAG) Gaelic Adult Proficiency (GAP) Gaelic Various Spoken Conversation task Narrative Elicited oral imitation task Question and answer activity Various   Roibeard Ó Maolalaigh, Nicola Carty (University of Glasgow, UK)  
The AleSKO corpus German Chinese
Also German L1 data from the FALKO corpus
Written Argumentative essays   c. 13,600 Heike Zinsmeister (University of Konstanz, Germany)
Margrit Breckle (Vilnius Pedagogical University, Lithuania)
 
Analyzing Discourse Strategies: A Computer Learner Corpus German English (mainly American English) Written Threaded discussion, chat, essays – longitudinal data From beginner to intermediate-mid Under development Christina Frei, Edward Nixon (University of Pennsylvania, USA)  
The Corpus of Learner German (CLEG13) German English Written Argumentative, free compositions
Longitudinal over 4 years, undergraduate students
Intermediate to advanced c. 320,000 Ursula Maden-Weinberger (Edge Hill University, UK) Online access through the FALKO platform. The corpus is also available as txt files to the scientific community. Please contact U. Maden-Weinberger at uschi@miralis.co.uk
The deL1L2IM corpus German Russian-Belorussian bilinguals Written Instant messaging dialogues Advanced c. 52,000 Sviatlana Höhn (University of Luxembourg, Luxembourg) Available
The Fehlerannotiertes Lernerkorpus (‘error annotated learner corpus’) (FALKO) German Learner subcorpus: various
Native subcorpus: German
Written 1. Summaries
2. Essays
3. Letters, fiction writing, journal articles, book reviews (= longitudinal data from American learners)
1. Advanced
2. Advanced
3. Beginners - advanced
1. c. 40,000 (learner subcorpus) + c. 20,000 (native subcorpus)
2. c. 150,000 (learner subcorpus) + c. 70,000 (native subcorpus)
3. c. 78,000 (learner subcorpus)
Anke Lüdeling, Maik Walter (Humboldt-Universität zu Berlin, Germany)
falko-korpus@hu-berlin.de
Online access
The KOLIPSI corpus German Italian Written Two written language production tasks of a standardized test (email/letter) A2-C1 Under development Andrea Abel, Aivars Glaznieks (European Academy Bolzano/Bozen, Italy)  
The Learning the Prosody of a Foreign Language (LeaP) German Various Spoken The LeaP corpus covers four different types of speech: read speech, prepared speech, free speech, nonsense word lists Various 62 speakers Ulrike Gut (University of Augsburg, Germany) The annotated corpus is available to the scientific community. Please contact Ulrike Gut at the University of Augsburg Manual
The LeKo (Lernerkorpus) corpus German         c. 55,000 Anke Lüdeling (Humboldt-Universität zu Berlin, Germany) Online access (password protected) Register here
The LINCS Corpus 1. German
2. German
3. German
1. English
2. German
1. Written
2. Written
3. Written
1. Essays, examination, answers (longitudinal and cross-sectional data)
2. Essays
3. Teaching output
1. Intermediate to Advanced
2. Advanced
Under development Elizabeth Thoday (Heriot-Watt University Edinburgh, UK) Not currently publicly available
Multilingual Platform for the European Reference Levels: Exploring Interlanguage in Context (MERLIN) German
Italian
Czech
Various Written Writing tasks from standardized tests (telc/UJOP) A1 to C1 c. 280,000 Katrin Wisniewski (Leipzig University, Germany) Available
Rhodes University Deutsch als Fremdsprache (RUDaF) German English
Afrikaans
isiXhosa
XiTsonga
Written Short descriptive and argumentative writing paragraphs (300 words each) A2-B2 34,000 Gwyndolen Ortner, Undine S. Weber (Rhodes University, South Africa) Not available
The Telecollaborative Learner Corpus of English and German Telekorp German English Written Bilingual, longitudinal database comprising computer-mediated NS-NNS interactions between approximately 200 Americans and Germans collected during six different telecollaborative partnerships from 2000-2005   c. 1,5 m. Julie Belz (Pennsylvania State University, USA) Not publicly available
The Langman corpus Hungarian Chinese Spoken Interviews conducted in 1994 with 11 Chinese immigrants living in Hungary. Interviews focused on issues related to their arrival in Hungary as well as their daily life activities     Juliet Langman (University of Texas at San Antonio, USA) Freely available
Corpus di Apprendenti di Italiano L2 (CAIL2) Italian Various Written Essays Intermediate to advanced c. 237,000 Stefania Spina (Università per Stranieri di Perugia, Italy)  
Corpus parlato di italiano L2 Italian English
German
Japanese
Spoken Transcriptions of interviews Various   Stefania Spina, Silvio Pazzaglia, Mirco Perini (Università per Stranieri di Perugia, Italy)  
The KOLIPSI corpus Italian German Written Two written language production tasks of a standardized test (email/letter) A2-C1 Under development Andrea Abel (European Academy Bolzano/Bozen, Italy)  
The Lexicon of Spoken Italian by Foreigners (LIPS) Italian Various Spoken Proficiency exams of the Certification of Italian as a Foreign Language (CILS) A1-C2 c. 700,000 Francesca Gallina (Università per Stranieri di Siena, Italy) Freely available
MISTiC (Multiple Italian Student TranslatIon Corpus) Italian English
French
Written Translations produced by trainee translators (mainly specialised texts) Post-graduate trainee translators ca. 125,000 (English-Italian), ca. 50,000 (French-Italian) Sara Castagnoli (University of Macerata, Italy) Not available
Varietà di Apprendimento della Lingua Italiana: Corpus Online (VALICO) Italian Various Written   Various c. 570,000 Manuel Barbera, Carla Marello, Elisa Corino (University of Turin, Italy)  
Longitudinal Corpus of Chinese Learners of Italian (LOCCLI) Italian Chinese Written Essays Beginners and pre-intermediate 97,000 Stefania Spina (Università per Stranieri di Perugia, Italy)
Anna Siyanova-Chanturia (Victoria University of Wellington, New Zealand)
It is freely searchable via CQPweb (registration required) from https://www.unistrapg.it/cqpweb/
Corpus of Chinese Learners of Italian (COLI) Italian Chinese Written and spoken Essays and answers to open questions, interviews Intermediate and advanced 82,300 Stefania Spina (Università per Stranieri di Perugia, Italy)  
The Korean learner corpus Korean Various Written Various: letters, essays, formal writing, etc. Beginner and intermediate c. 10,000

Jungyeul Park (University of British Columbia, Vancouver)
Jung Hee Lee (Kyung Hee University)

Available through Jungyeul Park's GitHub: https://github.com/jungyeul/korean-learner-corpus
ESAM Latvian and Lithuanian Latvian and Lithuanian Written   Beginner 52,000 Inga Znotiņa (Rīga Stradiņš University, Latvia) Available online
The ASK corpus Norwegian German
Dutch
English
Spanish
Russian
Polish
Bosnian-Croatian-Serbian
Albanian
Vietnamese
Somali
Written Essays from language tests B1 and B2   Kari Tenfjord (University of Bergen, Norway) Apply for a licence here
The Persian Learner Corpus (PLC) Persian (Farsi) Various Written Narratives and essays Intermediate and advanced Academic/Restricted online access Saeed Safari (University of Belgrade, Serbia)  
The Salam Farsi Learner Corpus (SFLC) Persian (Farsi) Serbian Written Narratives, descriptive essays Beginner and upper-intermediate Under development Saeed Safari (University of Belgrade, Serbia) Academic, under development
Learner Corpus of Portuguese L2 (COPLE2) Portuguese 15 languages: Chinese, English, Spanish, German, Russian, French, Japanese, Italian, Dutch, Tetum, Arabic, Polish, Korean, Romanian and Swedish Written and spoken Exams and assignments A1-C1 Written: 171,461
Oral: 25,783
Iria del Río (Universidade de Lisboa, Portugal) Available
Russian Learner Corpus Russian Varied Written and spoken Academic and non-academic Teachers and heritage speakers Unknown Ekaterina Rakhilina (National Research University Higher School of Economics, Russia) Available online
The University of Pittsburgh English Language Institute Corpus (PELIC) English 30 languages Written (spoken data to be released in the future) Variety of General English and EAP tasks and text types Pre-Intermediate to Advanced 4.2 m. Alan Juffs (University of Pittsburgh, USA) Publicly available on GitHub
                 
The Anglia Polytechnic University (APU) Learner Spanish Corpus Spanish Various Written     c. 120,000 Anne Ife (Anglia Ruskin University, UK)  
Aprescrilov ("Aprender a Escribir en Lovaina") Spanish Dutch Written Written assignments and tests; several text types (letters, expository, descriptive, argumentative, narrative) A1 to C1 c. 1 m. Kris Buyse (KU Leuven, Belgium) Restricted online access
The Corpus de aprendices de español (CAES) Spanish Various Written   A1 to C1 c. 575,000 CAES team (Universidade de Santiago de Compostela, Spain) Online access
Corpus Escrito del Español L2 (CEDEL2 version 2.0) Spanish English
German
Dutch
French
Portuguese
Italian
Greek
Russian
Japanese
Chinese
Arabic
Written (and some spoken) Written (and some spoken) compositions by learners of Spanish All proficiency levels (lower beginner to upper advanced) 1,105,936 words coming from 4,399 participants Cristobal Lozano (Universidad de Granada, Spain) Downloadable/browsable via the CEDEL2 webpage: http://cedel2.learnercorpora.com/
Corpus de textos escritos para el análisis de errores de aprendices de E/LE (CORANE) Spanish Various Written Essays A2 to C1 / Ana M. Cestero Mancera, Inmaculada Penadés Martínez (Universidad de Alcalá, Spain) CD-ROM available
The Corpus of Taiwanese Learners of Spanish (Corpus de Aprendices Taiwaneses de Español) (CATE) Spanish Chinese Written Student essays Various c. 340,000 hclu@mail.ncku.edu.tw (National Cheng Kung University, Taiwan) Under development
The DIAZ corpus Spanish German
Swedish
Icelandic
Korean
Chinese
Spoken Semi-spontaneous (structured interviews) and experimental (structured questionnaires) Adult Spanish L2/L3 oral data Various   Lourdes Díaz Rodríguez (Universitat Pompeu Fabra, Spain)  
The Japanese learner corpus of Spanish Spanish Japanese Written Student essays   c. 87,000 Yoshihito Kamakura (University of Birmingham, UK) Online access
The Spanish Corpus Proficiency Level Training (SPT) Spanish English (heritage language learners) Spoken Dialogues about a given set of questions Beginner to advanced   Dale Koike (University of Texas at Austin/Liberal Arts Instructional Technology Center, USA) Videos are available
Spanish Learner Language Oral Corpus (SPLLOC) Spanish English Spoken Learner narratives, interviews and picture description tasks Beginner to advanced c. 50,000 Laura Domínguez (University of Southampton, UK) Searchable online Data freely available for download
Spanish Learner Oral Corpus Spanish Various (9+ languages, especially Portuguese, French, Italian) Spoken Semi-spontaneous interviews, narrative and descriptive tasks A2-B1 c. 50,000 Leonardo Campillos Llanos (Universidad Autónoma de Madrid, Spain) Online access
The Tartu Learner Corpus of Spanish as a L3+ Spanish Estonian Written Academic research writing Advanced c. 885,000 Mari Kruse (University of Tartu, Estonia)  
The ASU corpus Swedish Chinese
English
German
Greek
Polish
Portuguese
Spanish
...
Spoken and written Transcribed audio-recorded conversations and written texts from adult learners of Swedish – longitudinal data   c. 490,000 (c. 415,000 spoken and c. 75,000 written) Björn Hammarberg (Stockholm University, Sweden) Available
Leiden Learner Corpus Multilingual:
Dutch
French
Italian
Portuguese
Spanish
Various Written and spoken Written data: short essays; oral data: picture-based story telling Various 200 participants M. Carmen Parafita Couto (University of Leiden, Netherlands)  
The European Science Foundation Second Language Database (ESF database) Multilingual:
Dutch
English
French
German
Swedish
Punjabi
Italian
Turkish
Arabic
Spanish
Finnish
Spoken Spontaneous second language acquisition of forty adult immigrant workers living in Western Europe, and their communication with native speakers in the respective host countries Various   Wolfgang Klein, Clive Perdue (Max Planck Institut, Netherlands) Freely available
The Foreign Language Examination Corpus (FLEC) Multilingual Polish Written Data from the Warsaw University Certification Exams Various Under development Piotr Banski, Romuald Gozdawa-Golebiowski (Warsaw University, Poland)  
The MeLLANGE Learner Translator Corpus (LTC) Multilingual Various Written Legal, technical, administrative and journalistic texts Trainee translators   Natalie Kübler (Université Paris Diderot, France)
mellange_p7@eila.univ-paris-diderot.fr
Searchable online
The MiLC Corpus Multilingual:
Catalan
English
French
Spanish
Catalan Written Formal and informal letters, summaries, curriculum vitae, essays, reports, translations, synchronous and asynchronous communication exchanges, business letters   c. 150,000 Angeles Andreu Andrés (Universitat Politècnica de València, Spain)  
The Multilingual Learner Corpus (MLC) Multilingual:
English
German
Italian
Spanish
Brazilian Portuguese Written Argumentative and narrative essays   Aim: c. 200,000 Stella E.O. Tagnin (University of São Paulo, Brazil) Accessible online to registered researchers
The Padova Learner Corpus Multilingual:
English
French
Spanish
Italian CMC (Computer-Mediated Communication) Student work produced in blended language courses using FirstClass conferencing software. Variety of genres: diaries, debate contributions, formal reports, résumés, etc. Longitudinal data   Under development Fiona Dalziel, Francesca Helm (University of Padua, Italy)  
The corpus PARallèle Oral en Langue Etrangère (PAROLE) Multilingual:
English
French
Italian
(Mainly L2 speakers but also includes data produced by L1 speakers)
Various Spoken 5 oral production tasks Various   Heather Hilton, John Osborne, Marie-Jo Derive, Nejma Succo, Jean O'Donnell, Sandra Billard, Sandrine Rutigliano-Daspet (Université de Savoie, France)  
The University of Toronto Romance Phonetics Database (RPD) Multilingual:
English
French
Italian
Portuguese
Romanian
Spanish
Various (including English, Mandarin, Russian, Spanish, etc.) Spoken Elicited production: sentence and passage reading, story narration, description of favourite meal Various   Laura Colantoni, Jeffrey Steele (University of Toronto, Canada) Password available from directors

  

Learner corpus-based datasets

  

Corpus Target language First language Medium Text type / task type Proficiency level Size in words Project director Availability
The Treebank of Learner English (TLE) English Various Written Sentences from the CLC FCE (annotated with syntactic trees) Upper-intermediate 97,681 (5,124 sentences) Yevgeni Berzak Publicly available through the UD repository ('English-ESL')
VALICO-UD Italian German, English, French, Spanish Written Comic strip elicited texts From first year to fourth year of study of Italian (various proficiencies) 6,784 (learner texts) + 6,832 (target hypotheses) Elisa Di Nuovo, Cristina Bosco, Elisa Corino Released in the Universal Dependencies repository