Pelcra

Polish & English Language Corpora for Research & Applications

Conversational Spoken Corpus of Polish

The most recent version of our conversational spoken corpus is available at spokes.clarin-pl.eu and http://pelcra.clarin-pl.eu/spokes2-web/.

Offline time-aligned corpus 

http://pelcra.pl/resources/spoken/pelcra_sp_2.tgz

Please contact us if you would like to download the media files (16 GB).

The following citation is required to fulfill the CC attribution condition of the license:

  • Piotr Pęzik. 2012. Język mówiony w NKJP. In Narodowy Korpus Języka Polskiego. Wydawnictwo Naukowe PWN, Warsaw.

More recent corpora

A number of more recent corpora, listed below, can be downloaded together with their recordings.

The following paper should be cited to fulfill the CC attribution condition of the license for these resources:

PELCRA_EMO

A corpus of focused interviews (people reflecting upon their emotions).

https://uniwersytetlodzki-my.sharepoint.com/:f:/g/personal/pelcra_uni_lodz_pl/EtxBCk44jGZIs24XDRds4lgB9JsAzOyuVIWPh0xQDnJrPw?e=dDhvIp

PELCRA_LUZ

A corpus of open interviews.

https://uniwersytetlodzki-my.sharepoint.com/:f:/g/personal/pelcra_uni_lodz_pl/EnMgq0aOpPVGvYeR_SbjZIMBmjUdXW_GYsKX1HnmQEbWAg?e=gxIegQ

PELCRA_PARL

Samples of spoken parliamentary data.

https://uniwersytetlodzki-my.sharepoint.com/:f:/g/personal/pelcra_uni_lodz_pl/EpPehikqGqZJltrAKlVp3k0BPXe0_Nhweb4cc0imfDS-vg?e=KByc6Y

PELCRA_EMI

A corpus of Polish emigrants to Scotland.

https://uniwersytetlodzki-my.sharepoint.com/:f:/g/personal/pelcra_uni_lodz_pl/EgbGrvTeG65Kjw4eYNAZiuYBMFAPJFkdhb7ttPBo1wSLww?e=tvWL9S