Corpus

Here you can download files containing data from the corpus. The .xlsx files contain the source and target texts, segmented into clauses and syntactically annotated at the clause level. The ENHIGLA corpus includes the following texts:

1) The Book of Genesis (Latin - Old English)

2) Bede's Historia Ecclesiastica Gentis Anglorum (Latin - Old English)

3) The Gospel of Luke from West-Saxon Gospels (Latin - Old English)

4) Tatian Gospel Translation (Latin - Old High German)

5) Isidor's De fide catholica (Latin - Old High German)

6) Physiologus (Latin - Old High German)

OE texts

Title      | Type     | Time of composition | Length of sample                   | No. of target clauses | No. of source clauses
Heptateuch | biblical | Late 10th cent.     | 25 chapters of Book of Genesis     | 1767                  | 2355
Bede       | secular  | Late 9th cent.      | Book 1 and four chapters of Book 2 | 2101                  | 2343
WS Gospels | biblical | Late 10th cent.     | 10 chapters of Gospel of Luke      | 2092                  | 1955

OHG texts

Title       | Type      | Time of composition | Length of sample | No. of target clauses | No. of source clauses
Tatian      | biblical  | 8th cent.           | 74 chapters      | 3029                  | 2980
Isidor      | religious | 8th cent.           | whole text       | 825                   | 902
Physiologus | religious | Early 11th cent.    | whole text       | 298                   | 265
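
The clause counts listed above can be compared across texts, for example by dividing the number of target clauses by the number of source clauses for each sample. The following is only a minimal sketch of that arithmetic: the counts are copied from the tables, while the names `CLAUSE_COUNTS`, `target_to_source_ratio`, and `summarize` are illustrative assumptions, and the ratio itself is not a measure defined by the corpus.

```python
# Clause counts copied from the OE and OHG tables: (target clauses, source clauses).
# The dictionary and function names are illustrative, not part of the corpus files.
CLAUSE_COUNTS = {
    "Heptateuch": (1767, 2355),
    "Bede": (2101, 2343),
    "WS Gospels": (2092, 1955),
    "Tatian": (3029, 2980),
    "Isidor": (825, 902),
    "Physiologus": (298, 265),
}


def target_to_source_ratio(target: int, source: int) -> float:
    """Return target clauses per source clause for one sample."""
    return target / source


def summarize(counts: dict) -> dict:
    """Map each title to its target/source clause ratio, rounded to 3 places."""
    return {
        title: round(target_to_source_ratio(target, source), 3)
        for title, (target, source) in counts.items()
    }


if __name__ == "__main__":
    for title, ratio in summarize(CLAUSE_COUNTS).items():
        print(f"{title:<12} {ratio:.3f}")
```

A ratio below 1 means the sample has fewer target clauses than source clauses; a ratio above 1 means it has more.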

The next set of .xlsx files contains the source and target texts aligned at the clause level:

1) The Book of Genesis (Latin - Old English aligned)

2) Bede's Historia Ecclesiastica Gentis Anglorum (Latin - Old English aligned)

3) The Gospel of Luke from West-Saxon Gospels (Latin - Old English aligned)

4) Tatian Gospel Translation (Latin - Old High German aligned)

5) Isidor's De fide catholica (Latin - Old High German aligned)

6) Physiologus (Latin - Old High German aligned)

 

The complete ENHIGLA database (in the form of an .sql file) may be downloaded here.