Learner corpora around the world

This list is very much a work in progress, and we would like it to be as comprehensive as possible. If you have a learner corpus or know of one that is not listed on this webpage, please send us a message and we'll add it to the list. We hope you will find the list useful for your research!
Columns: Corpus; Target language; First language; Medium; Text type / task type; Proficiency level; Size in words; Project director; Availability
The Arabic Learner Corpus
(ALC)
Arabic 66 languages written and spoken / /

written: c. 283,000

audio: c. 3h30

Abdullah Alfaifi & Eric Atwell

available
The Pilot Arabic Learner Corpus Arabic English written narrative intermediate and advanced c. 9,000 Ghazi Abuhakema
Reem Faraj
Anna Feldman

Montclair State University, USA
 
The AKCES/CZESL corpus (Akviziční korpusy češtiny - Acquisition corpora of the Czech language/Czech as a second language) Czech various written and spoken student essays
interviews
various 2 m
Charles University in Prague
Technical University in Liberec, Czech Republic
under development
Leerdercorpus Nederlands als Vreemde Taal Dutch French written      
Université catholique de Louvain, Belgium
 
The Advanced Learner English Corpus
(ALEC)
English mainly Swedish written Essays written by university students of English linguistics and English literature advanced 1.3 m. Tove Larsson, Uppsala University Not freely available
The ANGLISH corpus English French spoken Readings of texts and sentences, spontaneous oral language. various  
University of Provence, France.
 Freely available
Asao Kojiro’s Learner Corpus Data English Japanese written Essays and stories written or reproduced by Japanese college students.     Texts available for download
The Barcelona English Language Corpus (BELC) English Spanish
Catalan
spoken and written

4 tasks:
Written composition
Oral narrative
Oral interview
Role-play

Longitudinal data (children and young adults learning English)

   
University of Barcelona, Spain
 
The BATMAT Corpus English Swedish
Finnish
written BA dissertations
MA dissertations
Advanced c. 2.5 m (expanding) English language and literature, Åbo Akademi University, Finland Under development
The Bilingual Corpus of Chinese English Learners (BICCEL) English Chinese spoken and written

Spoken: National Oral English test.

Written: in-class assignments

  c. 2 m
National Research Center for Foreign Language Education Beijing Foreign Studies University, China
 
The Br-ICLE corpus (Brazilian component of ICLE) English Brazilian Portuguese written Argumentative and literary essays    
Catholic University of São Paulo

University of São Paulo, Brazil
Restricted online access
The British Academic Written English (BAWE) corpus English Mainly L1 speakers but also includes data produced by L2 speakers written ESP papers

4 levels of study (from undergraduate levels to final year and taught masters level)

 

c. 6.5 m
Sheena Gardner
Warwick, UK

University of Birmingham, UK
Paul Wickens
Oxford Brookes, UK

The BAWE corpus can be accessed through the corpus analysis interface, Sketch Engine.


A prototype interface that allows filtered searching of the BAWE corpus files is also available.

The BUiD Arab Learner Corpus (BALC) English Arabic written School examination essays various 287,227
The British University in Dubai,
United Arab Emirates

University of Birmingham, UK
At present, copies of the current version of the corpus are available on request.
The Cambridge Learner Corpus (CLC) English various written Exam scripts various c. 25 m - still expanding Cambridge University Press and Cambridge ESOL, UK commercial
The Corpus of Academic Learner English (CALE) English German written Various academic text types that are typically produced in university courses of English, e.g. term papers, reading reports, research plans, abstract, reviews, and summaries. advanced under development
Johannes-Gutenberg Universität Mainz, Germany
 
The Corpus of English Essays Written by Asian University Students (CEEAUS) English various written Student essays various c. 200,000
Kobe University, Japan
Freely downloadable from the website
The Chinese Academic Written English (CAWE) corpus English Chinese written Dissertations written by Chinese undergraduates majoring in English linguistics or applied linguistics.   407,960
City University of Hong Kong, Hong Kong
 
The Chinese Learner English Corpus (CLEC) English Chinese written   various 1 m Gui Shichun
Guangdong University of Foreign Studies & Yang Huizhong, Shanghai Jiaotong, China
The corpus can only be accessed by users in the Department of English at HKPU.
The City University Corpus of Academic Spoken English (CUCASE) English Chinese (but also includes data produced by L1 speakers) multimedia     2 m
City University of Hong Kong, Hong Kong
 
The Cologne-Hanover Advanced Learner Corpus (CHALC) English German written term papers and essays advanced c. 210,000
University of Michigan, USA
 
College Learners’ Spoken English Corpus (COLSEC) English Chinese spoken National spoken English test for non-English majors.   700,000 Yang and Wei  
The Corpus Archive of Learner English in Sabah/Sarawak (CALES) English Malay written Argumentative essays various c. 400,000 Simon Botley@Faizal Hakim
Doreen Dillah
Universiti Teknologi MARA Sarawak, Malaysia
 
The Corpus of Young Learner Interlanguage (CYLIL) English

various:

Dutch
French
Greek
Italian

spoken English L2 data elicited from European School pupils.
Longitudinal data
various c. 500,000
Vrije Universiteit Brussel, Belgium
 
The Eastern European English learner corpus English Russian
Ukrainian
Polish
Slovak
spoken Spontaneous spoken production data elicited by means of a semi-structured interview various c. 60,000
Eberhard Karls University of Tübingen, Germany
 
The EFL Teacher Corpus (ETC) English Korean
 
spoken Teacher talks in language classrooms Upper-intermediate to advanced 123,000
Eun-Joo Lee
under development
The English of Malaysian School Students corpus (EMAS) English Malay written Student essays various c. 500,000 et al.
Universiti Putra Malaysia, Malaysia
 
The English Speech Corpus of Chinese Learners (ESCCL) English Chinese spoken Dialogue reading-aloud Middle school and college   Chen Hua
Nantong University, China
Wen Qiufang
Beijing Foreign Studies University, China
Li Aijun
Chinese Academy of Social Sciences, China
 
The ETS Corpus of Non-Native Written English English 11 languages written 12,100 TOEFL English essays / / Daniel Blanchard Information about the score level is available for each essay
The EVA Corpus of Norwegian School English English Norwegian spoken Picture-based tasks  / 35,000 Angela Hasselgren
University of Bergen, Norway
Searchable online
The Gachon Learner Corpus English Korean
(+ a few Chinese & Spanish speaking students) 
written Written Journal Assignments Lower intermediate 1,277,077 (ongoing) Brian Carlstrom Freely available
The GICLE corpus (German component of ICLE) English German written Mainly non-academic argumentative essays advanced c. 234,000    
The Giessen-Long Beach Chaplin Corpus (GLBCC) English German spoken Transcribed interactions between native English speakers, ESL and EFL speakers   350,000 Andreas Jucker
Sara Smith
University of Giessen, Germany
Restricted use: apply for approval to get a copy.
The Hong Kong University of Science & Technology (HKUST) learner corpus English Chinese - mostly Cantonese written Untimed assignments written for EFL courses and school leaving exams University and advanced high school students 25 m
Hong Kong University of Science & Technology, Hong Kong
 
The Indianapolis Business Learner Corpus (IBLC) English various written Job application letters and résumés of business communication students from the U.S., Belgium, Finland, Germany, and Thailand, spanning the years 1990-1998    

Thomas Albin Upton
Indiana University, USA
 
The International Corpus of Crosslinguistic Interlanguage (ICCI) English various written Essays (20-min in-class tasks without the use of a dictionary)  beginner to lower-intermediate   Yukio Tono
Tokyo University of Foreign Studies, Japan
Publicly available
The International Corpus Network of Asian Learners of English (ICNALE) English Chinese
Indonesian
Japanese
Korean
Malay
etc.
written Short argumentative essays (topic, time, length and dictionary use are all controlled) various 300,000 (estimated goal: 1 m)
Kobe University, Japan
Freely available
The International Corpus of Learner English (ICLE) English various written Argumentative and literary essays High-intermediate to advanced 3 m
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
CD-Rom: order online.
The International Teaching Assistants corpus (ITAcorp) English various spoken Learner language from a variety of spoken classroom tasks: office hours role plays, presentations, discussions   c. 500,000


Pennsylvania State University, USA
 
The ISLE speech corpus English German
Italian
spoken Each speaker recorded sentences from several blocks of differing types (reading simple sentences, using minimal pairs, giving answers to multiple choice questions). Intermediate   CD-Rom
The Israeli Learner Corpus of Written English English Hebrew written Argumentative and descriptive essays   c. 750,000
Kibbutzim College of Education, Israel
 
The Japanese English as a Foreign Language Learner (JEFLL) Corpus English Japanese written Student essays From beginning to intermediate c. 700,000

Yukio Tono, Meikai University, Japan

The JEFLL Corpus will be freely available for research, first via the web query system (already available in Japanese) and then the entire data will be distributed under license in the future.
The Janus Pannonius University (JPU) Corpus English Hungarian written Essays and research papers University students c. 500,000
University of Pécs, Hungary
Searchable online
Lancaster Corpus of Academic Written English (LANCAWE) English various written IELTS academic writing tests (descriptive and argumentative tasks); assignments.
Longitudinal data.
       
The Lang-8 Learner Corpora English various written texts from Lang-8, a social networking site for language learning / / Toshikazu Tajiri & Mamoru Komachi Available
The LeaP Corpus: Learning Prosody in a Foreign Language English German spoken Four types of speech styles were recorded:
- nonsense word lists
- readings of a short story
- retellings of the story
- free speech in an interview situation
various  
Albert-Ludwigs-University Freiburg, Germany
The annotated corpus is available to the scientific community on request from the University of Augsburg.
The Learner Corpus of Engineering Abstracts
(LCEA)
English Malaysian written Abstracts of the Computer and Communication Systems Engineering Final Year Projects various

c. 550,000

998 abstracts

Helen Tan, University Putra Malaysia

Chan Swee Heng

Ain Nadzimah

Syamsiah bt Mashohor

Available
The Learner Corpus of English for Business Communication English     Different types of business correspondence written for simulated business situations, including memos, faxes, reports, letters of enquiry and complaint letters.   c. 117,500
Hong Kong Polytechnic University, Hong Kong
Searchable online
The Learner Corpus of Essays and Reports English     Essays and project reports covering a range of topics from Science, IT and New Media to Nursing, Business and Economics, and the Social Sciences.   c. 188,000

Sima Sengupta
Hong Kong Polytechnic University, Hong Kong

 

Searchable online
A Learners' Corpus of Reading Texts English French spoken Unprepared reading of English texts.
The texts are short abstracts of fiction or made-up dialogues.
    Sophie Herment
Valérie Kerfelec
Laetitia Leonarduzzi
Gabor Turcsan
Freely available
The LONGDALE project: LONGitudinal DAtabase of Learner English English various spoken and written Range of text types/task types.
Longitudinal data.
From intermediate to advanced  
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
under development
The Longman Learners' Corpus English various written Essays and exam scripts various c. 10 m Longman commercial
The Louvain International Database of Spoken English Interlanguage (LINDSEI) English various spoken Interviews and picture descriptions High-intermediate to advanced c. 800,000
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
CD-Rom: order online
The Malaysian Corpus of Learner English (MACLE) English Malay written       Gerry Knowles
Zuraidah Mohd. Don
University of Malaya, Malaysia
 
The Malaysian Corpus of Students' Argumentative Writing (MCSAW) English Malay
Chinese Indian
written Argumentative essays

 Form 4
Form 5
College

c. 565,500



University Putra Malaysia

 Available from developers
The Michigan Corpus of Academic Spoken English (MICASE) English Mainly L1 speakers but also includes data produced by L2 speakers spoken Transcripts of academic speech events   c. 1.8 m

Ute Römer
University of Michigan, USA

Searchable online
The Michigan Corpus of Upper-level Student Papers (MICUSP) English semi-balanced sample of native and non-native speakers of English written ESP papers
A-grade papers or ungraded papers that have been assessed and accepted (such as research proposals), but not published
  c. 2.6 m

Ute Römer
University of Michigan, USA

Searchable online
The Montclair Electronic Language Database (MELD) English various written Student essays various c. 100,000

Montclair State University, USA
Searchable online
The Multimedia Adult ESL Learner Corpus (MAELC) English ESL environment multimedia Video of classroom interaction and associated written materials From beginning to upper-intermediate  

Stephen Reder
Kathryn Harris
Kristen Setzler
Portland State University, USA

The Lab School would like to share the extensive resources from MAELC with interested researchers and teacher trainers. Those interested should make inquiries to the Lab School.
The Neungyule Interlanguage Corpus of Korean Learners of English (NICKLE) English Korean spoken and written

Written part: student essays
Spoken part: student interviews and oral speech tests transcriptions

Mainly from beginning to intermediate  c. 890,000 (spoken: c. 100,000)  
Yonsei University, Seoul, Korea
The corpus will be available to the scientific community for research purposes upon request.
The NICT JLE (Japanese Learner English) Corpus English Japanese spoken English oral proficiency interview test various 2 m


National Institute of Information and Communications Technology, Kyoto, Japan.
CD-Rom (Japanese page)
The NOn-native Spanish corpus of English (NOSE)
English Spanish written Argumentative and descriptive student essays Intermediate and upper-intermediate c. 300,000 words  
Universidad de Granada, Spain
 
The NUS Corpus of Learner English English Several East Asian languages, predominantly Chinese written Student essays on a wide range of topics including environmental pollution, healthcare, etc.   various c. 1 m


National University of Singapore, Singapore.
Freely available
The PELCRA Learner English Corpus (PLEC) English Polish spoken and written Written: Argumentative, descriptive, narrative and quasi-academic essays; formal letters From beginning to post-advanced under development: The empirical basis for this project is a 3-million word corpus of spoken (200,000 words) and written (2.8 million words) texts by Polish learners of English (by 2013)

Piotr Pęzik

University of Lodz, Poland

 Online search engine and corpus analysis tools
The PICLE corpus (Polish component of ICLE) English Polish written Student essays Advanced 330,000
AMU, Poznan, Poland
Searchable online
The Qatar learner corpus English Arabic (mostly from Qatar) spoken Spoken interviews with Qatari learners of English     Yun Zhao Helen
Carnegie Mellon University, USA
Freely available
The Québec learner corpus English French (from Québec) written Argumentative essays Intermediate and advanced c. 250,000
Université du Québec à Montréal, Canada
 
The Romanian Corpus of Learner English (RoCLE) English Romanian written Student essays    
Zurich University, Switzerland
 
The Russian Learner Translator Corpus
(RusLTC)
English
Russian
Russian written Translations produced by trainee translators Trainee translators c. 1 million tokens Project directors: Andrey Kutuzov and Maria Kunilovskaya Freely available
The Santiago University Learner of English Corpus (SULEC) English Spanish spoken and written

Written: compositions or argumentative essays.

Spoken: semi-structured interviews, short oral presentations and brief story descriptions.

       
The Scientext English Learner Corpus English French written Academic argumentative texts     Searchable online
Second Language Research Tasks
(SLRT)
English Various

written

spoken

written paragraphs

various oral tasks

Various c. 300,000

Bill Crawford (Northern Arizona University)

Kim McDonough (Concordia University)

under development
The Seoul National University Korean-speaking English Learner Corpus (SKELC) English Korean written Student essays Various c. 900,000
Seoul National University
Korea
 
The SILS Learner Corpus of English English various (mainly Japanese) written Student essays Basic, intermediate and advanced  
Waseda University, Japan
 
The Soochow Colber Student Corpus (SCSC) English Chinese written Student essays   227,000 Colman Bernath
Soochow University, Taiwan
 
The Spoken and Written English Corpus of Chinese Learners (SWECCL) English Chinese spoken (SECCL) and written (WECCL)

Written: argumentative and narrative essays.

Spoken: National Spoken English Test – longitudinal data

  c. 2 m Wen Qiufang
Liang Maocheng
Wang Lifei
Searchable online
The Taiwanese Corpus of Learner English (TLCE) English Chinese written Journals and essays (descriptive, narrative, expository, argumentative) from intermediate to advanced c. 2 m Rebecca Hsue-Huch Shih
Sun Yat-sen University, Taiwan
 
The Taiwanese learner academic writing corpus (TaiwanLAWC) English Chinese written Theses and dissertations written by Taiwanese graduate students.    
National Taiwan Normal University, Taiwan
 

The TELEC Secondary Learner Corpus (TSLC)

 

English Chinese written     1.5 m
University of Hong Kong, Hong Kong
 
The Telecollaborative Learner Corpus of English and German Telekorp English German written Bilingual, longitudinal database comprising computer-mediated NS-NNS interactions between approximately 200 Americans and Germans collected during six different telecollaborative partnerships from 2000-2005.   c. 1.5 m
Pennsylvania State University, USA.
Not publicly available
The Tswana Learner English Corpus (TLEC) English Tswana written Argumentative essays Advanced c. 200,000
North-West University, South Africa
Available in ICLE
The Uppsala Student English Corpus (USE) English Swedish written student essays various 1,221,265

Uppsala University, Sweden
The corpus can be used for research and educational purposes. It can be accessed on the Internet from the Oxford Text Archive.
The UPF Learner Translation Corpus English Catalan written Translations written by the students of the Translation and Interpreting degree at UPF.    under development
Pompeu Fabra University, Barcelona, Spain 
 
The UPV Learner Corpus English Catalan written essays various 150,000 Universitat Politècnica de València, Spain  
The Varieties of English for Specific Purposes dAtabase (VESPA) learner corpus English various written ESP texts (term papers, reports, MA dissertations) various under development
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
under development
The WriCLE (Written Corpus of Learner English) corpus English Spanish written essays various c. 750,000
Universidad Autónoma de Madrid, Spain
The corpus is available for free, and can be downloaded from this website. There is also a search interface to retrieve sentences and clauses.
The Yonsei English Learner Corpus (YELC) English Korean written Yonsei University English Diagnostic Tests (Part 1: Descriptive task, max. 100 words; Part 2: Argumentative task, max. 300 words) 9 levels
(A1, A1+, A2, B1, B1+, B2, B2+, C1, C2)
1,085,879 Seok-Chae Rhee

Yonsei University, Korea
The YELC corpus will be available to the scientific community for research purposes from 31 March 2012.
The Young Learner Corpus of English
(YOLECORE)
English Greek spoken Pedagogic Corpus of video-recorded EFL language classes.   120 hours

Project director: Marina Mattheoudakis, Aristotle University of Thessaloniki, Greece

Thomas Zapounidis

Under development
The Estonian Interlanguage Corpus (EIC) of Tallinn University Estonian Russian
Finnish
English
German
Latvian
Lithuanian
Ukrainian
Belorussian
written Spontaneously produced texts in language learning situations: argumentative and literary essays, written stories, letters, term papers, reading reports. A1-C2 1,145,794 Project director:
Tallinn University, Estonia
Restricted online access
Linguistic Basis of the Common European Framework for L2 English and L2 Finnish
(CEFLING)
Finnish
English
Various written Various Various  

Maisa Martin, University of Jyväskylä, Finland

 
Paths in Second Language Acquisition
(TOPLING)
Finnish
English
Swedish
Various written Various Various  

Maisa Martin, University of Jyväskylä, Finland

 
The Advanced Finnish Learner Corpus
(LAS2)
Finnish  Russian
Czech
Swedish
Estonian
Lithuanian
Komi
English
Hungarian
German
Icelandic
Japanese
written Exam essays, theses, essays and writings Advanced c. 582,000

Kirsti Siitonen, University of Turku, Finland

Ilmari Ivaska, University of Turku, Finland

 
The Finnish National Foreign Language Certificate Corpus (YKI) Finnish

English
Finnish
French
German
Italian
Lappish (Sami)
Spanish
Swedish
Russian

written

spoken

Various Beginner, intermediate and advanced  

Ari Maijanen, Centre for Applied Language Studies, University of Jyväskylä, Finland

Tiina Lammervo, Centre for Applied Language Studies, University of Jyväskylä, Finland

Available with user ID and Password
The International Corpus of Learner Finnish
(ICLFI)
Finnish various written Finnish learners’ spontaneously produced texts in language learning situations, large variety of text types Beginner, intermediate and advanced under development


University of Oulu, Finland

 Free download after applying for a user licence
The Chy-FLE (Cypriot Learner Corpus of French) French Modern Greek
(and Cypriot Greek)
written Argumentative and descriptive essays From intermediate to advanced c. 250,000 (under development)
Université de Poitiers, France
In collaboration with the University of Cyprus
 
The COREIL corpus French
English
  spoken      

Université Paris-Diderot, France
 
The "Dire Autrement" corpus French (Second Language) Mainly L1 speakers of English written Narrative, injunctive, persuasive and informative texts   48,114
Jasmina Milicevic
Dalhousie University, Canada
 
French Interlanguage Database (FRIDA) French various written      
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
 
French Learner Language Oral Corpora (FLLOC) French various spoken See description of the 7 corpora various  
Newcastle University

University of Southampton, UK

The contents of the database are being made freely available to the research community, in the form of digital sound files and related transcripts formatted using CHILDES software.

Searchable online

The InterFra corpus French Swedish spoken Interviews, retellings of video clips and picture stories various  

 
Stockholm University, Sweden.

The contents of the database are meant to be available to the research community in the form of digital audio files and related transcripts formatted using XML software.
The "Interphonologie du Français Contemporain" (IPFC) corpus French Cypriot Greek
Dutch
English (Canada)
German
Japanese
Norwegian
Spanish

 
spoken Reading aloud, repeating words, guided interviews, interactions between two learners. various under development
Waseda University, Japan
Université de Rouen, France

Université de Genève, Switzerland

Tokyo University of Foreign Studies, Japan
 under development
The LCF corpus (Learner Corpus French) French Dutch written

Argumentative essays
Informative texts
Journalistic texts
Formal letters
Summaries

Written compositions by Flemish students of French

From intermediate to advanced 490,000 K.U.Leuven Campus Kortrijk, UGent and Lessius
under development
The Lund CEFLE Corpus (Corpus Écrit de Français Langue Étrangère) French Swedish written Descriptive and narrative essays; picture-based stories. various 100,000
Lund University, Sweden
A sub-part of the corpus is available online.
The UWi (University of the West Indies) learner corpus French English and Jamaican Creole spoken Conversations during oral exams and in informal contexts various  
University of New South Wales, Sydney, Australia
 
Comasan Labhairt ann an Gàidhlig (CLAG)
-
Gaelic Adult Proficiency
Gaelic various spoken

Conversation task

Narrative

Elicited oral imitation task

Question and answer activity

various  

Roibeard Ó Maolalaigh (University of Glasgow)

Nicola Carty (University of Glasgow)

 
The AleSKO corpus German Chinese (but also German L1 data from the FALKO corpus) written Argumentative essays    
University of Konstanz, Germany

Vilnius Pedagogical University, Lithuania.
 
Analyzing Discourse Strategies: A Computer Learner Corpus German English
(mainly American English)
written Threaded Discussion
Chat
Essays
Longitudinal data
From beginner to intermediate-mid under development

University of Pennsylvania, USA
 
The Corpus of Learner German (CLEG13) German English written Argumentative, free compositions
Longitudinal over 4 years, undergraduate students
Intermediate to advanced c. 320,000

Online access through the FALKO platform.
The corpus is also available as txt files to the scientific community on request.

The FALKO corpus (Fehlerannotiertes Lernerkorpus ‘error annotated learner corpus’) German

Learner subcorpus: various

Native subcorpus: German

written

1. Summaries

2. Essays

3. Letters, fiction writing, journal articles, book reviews (= longitudinal data from American learners)

1. Advanced

2. Advanced

3. Beginners - advanced

 

1. 40,638 (learner subcorpus) + 21,211 (native subcorpus)

2. 144,619 (learner subcorpus) + 70,615 (native subcorpus)

3. 78,151 (learner subcorpus)

Anke Lüdeling
Maik Walter
Humboldt-Universität zu Berlin
Institut für deutsche Sprache und Linguistik, Germany

Online access
The KOLIPSI corpus German Italian written Two written language production tasks of a standardized test (email/letter) A2-C1 under development

European Academy Bolzano/Bozen, Italy
Eurac research on learner corpora
The LeaP Corpus (Learning the Prosody of a foreign language) German various spoken The LeaP corpus covers four different types of speech:
- read speech
- prepared speech
- free speech
- nonsense word lists
various  
University of Augsburg, Germany
The annotated corpus is available to the scientific community on request from the University of Augsburg.
The LeKo (Lernerkorpus) corpus German           Humboldt-Universität Berlin, Germany

Online access (password protected)

Register here

The LINCS Corpus

1. German

2. German

3. German

1. English

2. German

1. Written

2. Written

3. Written

1. Essays, examination, answers.
Longitudinal and cross-sectional data.

2. Essays

3. Teaching output

1. Intermediate to Advanced

2. Advanced

Under development
Heriot-Watt University Edinburgh, UK
Not currently publicly available
The Telecollaborative Learner Corpus of English and German Telekorp German English written Bilingual, longitudinal database comprising computer-mediated NS-NNS interactions between approximately 200 Americans and Germans collected during six different telecollaborative partnerships from 2000-2005.   c. 1.5 m


Pennsylvania State University, USA.

 

Not publicly available
The Langman corpus Hungarian Chinese spoken Interviews conducted in 1994 with 11 Chinese immigrants living in Hungary.
Interviews focused on issues related to their arrival in Hungary as well as their daily life activities
   
University of Texas at San Antonio, USA
Freely available
Corpus parlato di italiano L2 Italian English
German
Japanese
spoken Transcriptions of interviews various   Stefania Spina
Silvio Pazzaglia
Mirco Perini
Università per Stranieri di Perugia, Italy
Searchable online
The KOLIPSI corpus Italian German written Two written language production tasks of a standardized test (email/letter) A2-C1 under development
European Academy Bolzano/Bozen, Italy
 
The LIPS Corpus (Lexicon of Spoken Italian by Foreigners) Italian various spoken Proficiency exams of the Certification of Italian as a Foreign Language (CILS) A1-C2 c. 700,000


Università per Stranieri di Siena, Italy

 
Varietà di Apprendimento della Lingua Italiana: Corpus Online (VALICO) Italian various written   various 567,437
Freely available and searchable online.
The Korean learner corpus Korean various written   Beginner and intermediate c. 10,000
Georgetown University, USA

Wellesley College, USA

Yonsei University, South Korea
 
The ASK (Andrespråkskorpus = Second Language Corpus) corpus Norwegian various written essays    
University of Bergen, Norway
 
The PIKUST pilot learner corpus Slovene various written mostly argumentative essays Majority advanced – but also intermediate and beginner 35,000 Mojca Stritar
University of Ljubljana, Slovenia
 
The Anglia Polytechnic University (APU) Learner Spanish Corpus Spanish various written     120,000
Anglia Ruskin University, UK
 
Aprescrilov ("Aprender a Escribir en Lovaina") Spanish Dutch written Written assignments and tests; several text types (letters, expository, descriptive, argumentative, narrative) from A1 to C1 c. 1 m


KU Leuven, Belgium

Restricted online access
CEDEL2 (Corpus Escrito del Español L2) Spanish English written Written compositions by learners of Spanish   600,000


Universidad Autónoma de Madrid, Spain

Universidad de Granada, Spain

 A free sample of the corpus is available on request
The Corpus of Taiwanese Learners of Spanish (Corpus de Aprendices Taiwaneses de Español, CATE) Spanish Chinese written Student essays various 337,122 (under development)  
The DIAZ corpus Spanish

various:

German
Swedish
Icelandic
Korean
Chinese

spoken Semi-spontaneous (structured interviews) and experimental (structured questionnaires) Adult Spanish L2/L3 oral data various  
Universitat Pompeu Fabra, Spain
Freely available
The Japanese learner corpus of Spanish Spanish Japanese written Student essays   83,400
University of Birmingham, UK
 
Spanish Learner Language Oral Corpus (SPLLOC) Spanish English spoken Learner narratives, interviews and picture description tasks from beginner to advanced   Laura Dominguez
University of Southampton, UK
Searchable online
Data freely available for download
Spanish Learner Oral Corpus Spanish various
(9+ languages - especially Portuguese, French, Italian)
spoken Semi-spontaneous interviews, narrative and descriptive tasks A2-B1 more than 50,000 words
Laboratorio de Lingüística Informática
Universidad Autónoma de Madrid, Spain
Online access
The ASU corpus Swedish   spoken and written Transcribed audio-recorded conversations and written texts from adult learners of Swedish – longitudinal data   490,000 words
(415,000 spoken and 75,000 written)

Stockholm University, Sweden
 
The ESF (European Science Foundation Second Language) Database

Multilingual:

Dutch
English
French
German
Swedish

various:

Punjabi
Italian
Turkish
Arabic
Spanish
Finnish

spoken Spontaneous second language acquisition of forty adult immigrant workers living in Western Europe, and their communication with native speakers in the respective host countries various  
Clive Perdue
Max Planck Institut, Nijmegen, Netherlands
Freely available
The Foreign Language Examination Corpus (FLEC) Multilingual Polish written Data from the Warsaw University
Certification Exams
various under development

Warsaw University, Poland
 
The MeLLANGE Learner Translator Corpus (LTC) Multilingual various written Legal, technical, administrative and journalistic texts Trainee translators  


Université Paris Diderot, France.

Searchable online
The MiLC Corpus

Multilingual:

Catalan
English
French
Spanish

Catalan written Formal and informal letters, summaries, curriculum vitae, essays, reports, translations, synchronous and asynchronous communication exchanges, business letters     et al
Universidad Politécnica de Valencia, Spain
 
The Multilingual Learner Corpus (MLC)

Multilingual:

English
German
Italian
Spanish

Brazilian Portuguese written Argumentative and narrative essays    
University of São Paulo, Brazil
Accessible online to registered researchers
The Padova Learner Corpus

Multilingual:

English
French
Spanish

Italian CMC
(Computer-Mediated Communication)

Student work produced in blended language courses using FirstClass conferencing software.
Variety of genres: diaries, debate contributions, formal reports, résumés etc. 
Longitudinal data

 

  under development

University of Padua, Italy
 

The PAROLE corpus
(corpus PARallèle Oral en Langue Etrangère)

 

Multilingual:

English
French
Italian

(Mainly L2 speakers but also includes data produced by L1 speakers)

various spoken 5 oral production tasks various  

Marie-Jo Derive
Nejma Succo
Jean O'Donnell
Sandra Billard
Sandrine Rutigliano-Daspet
Université de Savoie, France
 
The University of Toronto Romance Phonetics Database (RPD)

Multilingual:

English
French
Italian
Portuguese
Romanian
Spanish

various
(including English, Mandarin, Russian, Spanish, etc.)
spoken Elicited production - sentence and passage reading, story narration, description of favourite meal various  

University of Toronto, Canada
Password available from directors

 

| 12/12/2014 |