
Calfa-GREgORI Patrologia Graeca


Presentation and aim

Last updated : April 21st, 2026

The project, led by the GREgORI project (UCLouvain) and Calfa (Paris) under the academic supervision of Professor Jean-Marie Auwers (UCLouvain), aims to provide scholars with a digital version of texts from the Patrologia Graeca (PG) that have not yet been digitised or are not yet available online in open access.

Text transcription (OCR; word accuracy 94.60%) and linguistic analysis (lemmatization and POS-tagging; lemma-POS accuracy 94.74%) are performed with specialized AI models developed within the scope of the project, with minimal manual proofreading of the results.

This OCR software, specially developed for this purpose, preserves the complex layout of the pages of the PG volumes. The resulting text is mostly, but not entirely, reliable because of the well-known, occasionally unclear typography of J.-P. Migne's publications. Despite this drawback and the remaining imperfectly recognized words, the output is a searchable version of the texts. Users will need to check, and where necessary complete, the text they need, and are invited to send in their corrections.

In addition, linguistic analysis, based on linguistic resources, computer tools, and AI models jointly developed by GREgORI and Calfa, assigns a lemma and a part of speech to each word attested in the processed texts.

An evaluation of the results, giving scholars an accurate assessment of the effectiveness of the AI models, will be presented in a forthcoming paper.

Scholars interested in acquiring Greek texts from the PG (with or without linguistic analysis) are invited to email us (info-gregori@uclouvain.be or contact@calfa.fr) for terms and conditions.

For details on input and output files (results), see below.

Funding

This project has received funding from (alphabetical order):

  • ASBL Byzantion
  • UCLouvain - FSS - Fondation Sedes Sapientiae
  • UCLouvain GREgORI Project
  • UCLouvain - INCAL - Institut des Civilisations Arts et Lettres
  • UCLouvain - RSCS - Institut de recherche pluridisciplinaire Religions Spiritualités Cultures Sociétés

And other private funding.

[Logos: Byzantion, Calfa, CIOL, Fondation Sedes Sapientiae, GREgORI, INCAL, RSCS]

Members

  • Professor Emeritus Jean-Marie Auwers (UCLouvain, RSCS)
  • Professor Sébastien Moureau (UCLouvain/CIOL)
  • Doctor Véronique Somers (UCLouvain/CIOL) 
  • Doctor Bastien Kindt (UCLouvain/CIOL)
  • Chahan Vidal-Gorène (Université Paris Sciences & Lettres, École nationale des Chartes, and Calfa)

Related bibliography

Kindt B., Auwers J.-M., La Fondation Sedes Sapientiae soutient le projet de valorisation numérique de la Patrologie grecque, in Bulletin de la Fondation Sedes Sapientiae, 45 (January 2024), p. 19-21 (WEB version).

Kindt B., Vidal-Gorène C., Delle Donne S., Analyse automatique du grec ancien par réseau de neurones. Évaluation sur le corpus De Thessalonica Capta, in BABELAO, 10-11 (2022), p. 525-550 (WEB version).

Kindt B., Vidal-Gorène C., From manuscript to tagged corpora. An automated process for Ancient Armenian or other under resourced languages of the Christian East, in Armeniaca. International Journal of Armenian Studies, 1 (2022), p. 73-96 (WEB version).

Vidal-Gorène C., Cafiero F., Kindt B., Under-resourced studies of under-resourced languages: lemmatization and POS-tagging with LLM annotators for historical Armenian, Georgian, Greek and Syriac, 2025, published online on the HAL Science ouverte portal (WEB version).

Vidal-Gorène C., Kindt B., The Patrologia Graeca Corpus: OCR, Annotation, and Open Release of Noisy Nineteenth-Century Polytonic Greek Editions, 2026, published online on the HAL Science ouverte portal (WEB version).

Vidal-Gorène C., La reconnaissance automatique d'écriture à l'épreuve des langues peu dotées, Programming Historian en français, 5 (2023) (WEB version).

Vidal-Gorène C., Reconhecimento automático de manuscritos para o teste de idiomas não latinos, O Programming Historian em português, 5 (2024) (WEB version; translated from the French original published in 2023).

Input files

Input files processed by the OCR are PDF files available from the Patristica.net portal or from the Roger Pearse weblog. These files were mainly digitized by Google and are therefore also available from the Google Books portal. See also the Archive.org portal.

Output files and results 

File formats description

All files are encoded in UTF-8 plain text format (this format ensures data interoperability).

  • [file_name]_text.txt : texts with markup (volume number, page number, and page of the processed PDF file), without end-of-line hyphenation and with empty-line detection.
  • [file_name]_tagged_text.vert : vertical texts enriched with intuitive form, lemma, intuitive lemma, and POS for each word form (analysis performed by AI with minimal manual proofreading of the results). These *.vert files can be uploaded to Sketch Engine (see screenshot below).

[Screenshot: *.vert file in Sketch Engine]
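
As a rough illustration of how a *.vert file might be read outside Sketch Engine, here is a minimal Python sketch. It assumes one token per line with five tab-separated columns in the order word form, intuitive form, lemma, intuitive lemma, POS, and structural markup on lines of its own starting with "<". The column order, the tag names, and the sample values below are assumptions made for illustration; they are not taken from the released files, so check a real file before relying on them.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    # Assumed column order; verify against an actual *.vert file.
    form: str
    intuitive_form: str
    lemma: str
    intuitive_lemma: str
    pos: str


def read_vert(lines: Iterable[str]) -> Iterator[Token]:
    """Yield one Token per token line, skipping markup and malformed lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Structural lines such as <doc ...> or </s> carry no token columns.
        if not line.strip() or (line.startswith("<") and "\t" not in line):
            continue
        fields = line.split("\t")
        if len(fields) != 5:  # assumed column count; ignore anything else
            continue
        yield Token(*fields)


def pos_frequencies(tokens: Iterable[Token]) -> Counter:
    """Count how often each part-of-speech tag occurs."""
    return Counter(token.pos for token in tokens)


# Invented sample data (hypothetical tags and "intuitive" values).
SAMPLE = (
    '<doc id="example">\n'
    "ἐν\tεν\tἐν\tεν\tPREP\n"
    "ἀρχῇ\tαρχη\tἀρχή\tαρχη\tNOUN\n"
    "ἦν\tην\tεἰμί\tειμι\tVERB\n"
    "ὁ\tο\tὁ\tο\tART\n"
    "λόγος\tλογος\tλόγος\tλογος\tNOUN\n"
    "</doc>\n"
)

tokens = list(read_vert(SAMPLE.splitlines()))
print(len(tokens), pos_frequencies(tokens).most_common(1))
```

To process a downloaded file, an open file handle can be passed in place of `SAMPLE.splitlines()`, for example `open(path, encoding="utf-8")`, where `path` points to one of the [file_name]_tagged_text.vert files.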

List of processed texts

Total size of the CGPG corpus : 6,735,144 tokens (5,605,015 words).

Click here to download the corpus (texts only)
Click here to download the tagged corpus (lemmatisation and POS-tagging) 

This list has been compiled using data from the Roger Pearse weblog and from the Patrologia Graeca entry on Academia.

PG 3
Date : Pre-Nicaean
Authors and Works : Dionysius the Areopagite (vol. 1)
Processed PDF file
Word count : 134,866

PG 5
Date : Pre-Nicaean
Authors and Works : Ignatius, Polycarp, Popes of 2nd c., Melito, others
Processed PDF file
Word count : 46,164

PG 6
Date : Pre-Nicaean
Authors and Works : Justin, Tatian, Athenagoras, Theophilus, Hermias
Processed PDF file
Word count : 170,482

PG 8
Date : Pre-Nicaean
Authors and Works : Clement of Alexandria (vol. 1); Cohortatio, Paedagogus, Stromata
Processed PDF file
Word count : 168,277

PG 9
Date : Pre-Nicaean
Authors and Works : Clement of Alexandria (vol. 2); Stromata, Quis dives, Excerpta, Eclogae, old scholia, diss. by Le Nourry
Processed PDF file
Word count : 82,135

PG 16.3
Date : Pre-Nicaean
Authors and Works : Origen (vol. 6.3); Hexapla (contd); Hippolytus, Philosophumena
Processed PDF file
Word count : 60,921

PG 21
Date : 4th century
Authors and Works : Eusebius (vol. 3); Praeparatio Evangelica
Processed PDF file
Word count : 236,625

PG 42
Date : 4th century
Authors and Works : Epiphanius (vol. 2); Panarion (contd), Expositio fidei, Anacephalaeosis; Appendix: dissertations
Processed PDF file
Word count : 161,237

PG 67
Date : 5th century
Authors and Works : Socrates, Historia Ecclesiastica; Sozomen, Historia Ecclesiastica
Processed PDF file
Word count : 170,445

PG 71
Date : 5th century
Authors and Works : Cyril of Alexandria (vol. 4): Commentaries on Hosea, Joel, Amos, Jonah, Obadiah, Micah, Nahum, Habakkuk, Haggai, etc.
Processed PDF file
Word count : 210,957

PG 73
Date : 5th century
Authors and Works : Cyril of Alexandria (vol. 6): Commentary on John
Processed PDF file
Word count : 191,303

PG 87.1
Date : 7th century
Authors and Works : Procopius of Gaza (vol. 1): Vetus Testamentum commentaries
Processed PDF file
Word count : 151,167

PG 101
Date : 9th century
Authors and Works : Photius (vol. 1): Exegetica: Quaestiones Amphilochiana, Commentary on Novum Testamentum
Processed PDF file
Word count : 178,850

PG 107
Date : 10th century
Authors and Works : Leo the Emperor: Theologica: 19 homilies and panegyrics, Letter to Omar, king of the Saracens, others, Juridical and canonical works – and Poems, Apologia, Epigrams, Tactica sive de re militari, Oracula
Processed PDF file
Word count : 196,727

PG 109
Date : 10th century
Authors and Works : Historical works: 4 books of continuation of Theophanes; Constantine Porphyrogenitus, De vita et rebus avi sui Basilii Macedonis; 2 books on the lives of the emperors Leo, Alexander, and Romanus; Dissertatio steliteutica contra iconomachos (7th century); John of Jerusalem, Narratio de origine motuum iconoclastarum; John Cameniates, Narratio de excidio Thessalonicae; Gregory the monk, ex vita Basil Junioris; Symeon Magister et Logothetes, Annales a Leo Armenio ad Nicephorum Phocam; George the monk, Lives of the recent emperors; Josephus Genesius, History of Constantinople
Processed PDF file
Word count : 148,584

PG 112
Date : 10th century
Authors and Works : Constantine Porphyrogenitus (vol. 1): De ceremoniis
Processed PDF file
Word count : 129,56

PG 113
Date : 10th century
Authors and Works : Constantine Porphyrogenitus (vol. 2): De thematibus, Hieroclis Grammatici Synecdemus, De administrando imperio, Vita Basilii Macedonis, etc.; Anon., Acta S. Nicon in Creta; Theodosius Diaconus, De expugnatione Cretae, etc.
Processed PDF file
Word count : 104,371

PG 118
Date : 10th century
Authors and Works : Oecumenius (vol. 1): Commentary on Acts, Commentary on Paul’s letters, Commentary on the Catholic letters
Processed PDF file
Word count : 208,448

PG 121
Date : 11th century
Authors and Works : George Cedrenus (vol. 1): Compendium Historiarum
Processed PDF file
Word count : 160,853

PG 122
Date : 11th century
Authors and Works : George Cedrenus (vol. 2): Compendium Historiarum (contd); John Scylitzes, Breviarium historicum; Michael Psellus, many works
Processed PDF file
Word count : 150,647

PG 123
Date : 11th century
Authors and Works : Theophylact of Bulgaria (vol. 1): Enarratio in Evangelium Matthaei / Marci / Lucae / Joannis
Processed PDF file
Word count : 208,024

PG 124
Date : 11th century
Authors and Works : Theophylact of Bulgaria (vol. 2): Commentarius in Joannis Evangelium (contd.); Commentary on Paul’s letters
Processed PDF file
Word count : 210,302

PG 125
Date : 11th century
Authors and Works : Theophylact of Bulgaria (vol. 3): Commentary on Paul’s letters (contd); 1 and 2 Peter; alternative versions of commentaries
Processed PDF file
Word count : 172,696

PG 126
Date : 11th century
Authors and Works : Theophylact of Bulgaria (vol. 4): More Novum Testamentum commentaries; Orations, Letters, Commentaries on minor prophets. Indexes
Processed PDF file
Word count : 164,706

PG 134
Date : 12th century
Authors and Works : John Zonaras (vol. 1): Annales
Processed PDF file
Word count : 196,859

PG 139
Date : 13th century
Authors and Works : Isidore of Thessalonica, Sermons; Nicetas Maroneae; John of Citrus; Marcus of Alexandria; Joel Chronographus, Chronologia compendiaria; Nicetas Choniates, Historia Byzantina (from John Comnenus to 1204), On ancient statues destroyed by the Franks after the fall of the city, Thesaurus in 14 books (1-5)
Processed PDF file
Word count : 134,703

PG 146
Date : 14th century
Authors and Works : Nicephorus Callistus (vol. 2); Ecclesiastical History books 8-14
Processed PDF file
Word count : 156,848

PG 148
Date : 14th century
Authors and Works : Nicephorus Gregoras (vol. 1): Historia Byzantina, books 1-24
Processed PDF file
Word count : 234,855

PG 151
Date : 14th century
Authors and Works : Gregory Palamas (vol. 2); Gregory Acindynus, Barlaam
Processed PDF file
Word count : 399,518

PG 153
Date : 14th century
Authors and Works : John Cantacuzene (vol. 1): Historia Byzantina in 4 books (events from 1320-1354)
Processed PDF file
Word count : 230,239

PG 155
Date : 15th century
Authors and Works : (1430 AD) Simeon of Thessalonica
Processed PDF file : http://books.google.com/books?id=_McUAAAAQAAJ
Word count : 175,482

PG 157
Date : 15th century
Authors and Works : (1400-1462) George Codinus, works about Constantinople, including De sepulchris imperatorum quae sunt in templo SS. Apostolorum; Ducas, Historia Byzantina 1341-1462, Chronicon breve (contd to 1523)
Processed PDF file
Word count : 95,020

PG 158
Date : 15th century
Authors and Works : (1448-1453) Michael Glyca, Annals, Letters; Others
Processed PDF file
Word count : 163,148