dc.contributor.author | Wallenberg, Joel C. |
dc.contributor.author | Ingason, Anton Karl |
dc.contributor.author | Sigurðsson, Einar Freyr |
dc.contributor.author | Rögnvaldsson, Eiríkur |
dc.date.accessioned | 2024-03-26T10:05:06Z |
dc.date.available | 2024-03-26T10:05:06Z |
dc.date.issued | 2024-03 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/325 |
dc.description | The Icelandic Parsed Historical Corpus (IcePaHC) is a manually corrected treebank, parsed according to the annotation guidelines of the Penn Parsed Corpora of Historical English (PPCHE), with minor modifications specific to Icelandic (see https://linguist.is/wiki/ for further details). It consists of about 1 million words of running text from the 12th century to the 21st. The samples are close to evenly distributed over this period. Most of the texts are narratives or religious material, but some samples from other genres are also included. The file format is labeled bracketing, encoded in UTF-8. The corpus is released under a CC BY 4.0 license. |
dc.language.iso | isl |
dc.publisher | University of Iceland |
dc.relation.replaces | http://hdl.handle.net/20.500.12537/62 |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://github.com/antonkarl/icecorpus/ |
dc.source.uri | https://linguist.is/wiki/ |
dc.subject | treebank |
dc.subject | corpus |
dc.title | Icelandic Parsed Historical Corpus (IcePaHC) 2024.03 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | Clarin IS Repository |
contact.person | Joel C. Wallenberg; joel.wallenberg@york.ac.uk; University of York |
contact.person | Anton Karl Ingason; antoni@hi.is; University of Iceland |
contact.person | Einar Freyr Sigurðsson; einar.freyr.sigurdsson@arnastofnun.is; The Árni Magnússon Institute for Icelandic Studies |
sponsor | Icelandic Research Fund (RANNÍS); 090662011; Viable Language Technology beyond English – Icelandic as a test case; nationalFunds |
sponsor | The U.S. National Science Foundation (NSF) International Research Fellowship Program (IRFP); OISE-0853114; Evolution of Language Systems: a comparative study of grammatical change in Icelandic and English; Other |
sponsor | The ICT Policy Support Programme (EU 7th Framework); 270899; META-NORD, Baltic and Nordic Parts of the European Open Linguistic Infrastructure; euFunds |
size.info | 1000000 words |
files.size | 13025504 |
files.count | 1 |
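The description states that the corpus files use labeled bracketing. As a rough illustration of what reading such a file involves, here is a minimal sketch that parses one bracketed tree into nested Python lists and collects its words. The sample tree, its labels, and the helper names (`tokenize`, `parse`, `leaves`) are made up for this example and are not taken from the corpus; real files may carry extra material (for instance lemmas or ID nodes) that this sketch does not treat specially.

```python
import re

def tokenize(text):
    # Split into "(", ")", and runs of characters that are neither
    # whitespace nor parentheses.
    return re.findall(r"\(|\)|[^\s()]+", text)

def parse(tokens):
    """Parse one bracketed tree from a token list into nested lists."""
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with '('"
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return children

    return node()

def leaves(tree):
    """Yield the word of every [TAG, word] pair, left to right."""
    if len(tree) == 2 and all(isinstance(item, str) for item in tree):
        yield tree[1]
    else:
        for child in tree:
            if isinstance(child, list):
                yield from leaves(child)

# Hypothetical sample tree, for illustration only.
sample = "(IP-MAT (NP-SBJ (PRO-N hann)) (VBDI kom) (ADVP (ADV heim)))"
tree = parse(tokenize(sample))
words = list(leaves(tree))
```

A recursive-descent reader like this is enough for exploration; for serious queries over the treebank, a dedicated treebank search tool is likely the better choice.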
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
- Name: icepahc-v2024.03.zip
- Size: 12.42 MB
- Format: application/zip
- Description: A zip file containing the parsed, tagged, info and raw files
- MD5: 4b6fc35d53e1abc9fe76b0619c69f551
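The MD5 value listed for the archive can be used to check a download. The following is a minimal sketch using Python's standard `hashlib`; the function names and the assumption that the zip file sits in the current directory are illustrative, not part of this record.

```python
import hashlib

# Checksum as listed in this record for icepahc-v2024.03.zip.
EXPECTED_MD5 = "4b6fc35d53e1abc9fe76b0619c69f551"

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify("icepahc-v2024.03.zip")` would return `True` for an intact download stored under that (assumed) local name. MD5 guards against accidental corruption only; it is not a security guarantee.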
- icepahc-v2024.03
- txt
- 1720.vidalin.rel-ser.txt (123 kB)
- 1540.ntacts.rel-bib.txt (88 kB)
- 1593.eintal.rel-oth.txt (125 kB)
- 1250.thetubrot.nar-sag.txt (18 kB)
- 1675.armann.nar-fic.txt (59 kB)
- 1350.bandamennM.nar-sag.txt (65 kB)
- 1835.jonasedli.sci-nat.txt (18 kB)
- 1250.sturlunga.nar-sag.txt (123 kB)
- 1907.leysing.nar-fic.txt (114 kB)
- 1859.hugvekjur.rel-ser.txt (109 kB)
- 1611.okur.rel-oth.txt (88 kB)
- 1675.magnus.bio-oth.txt (18 kB)
- 1628.olafuregils.bio-tra.txt (87 kB)
- 1150.homiliubok.rel-ser.txt (213 kB)
- 1908.ofurefli.nar-fic.txt (106 kB)
- 1985.sagan.nar-fic.txt (118 kB)
- 1790.fimmbraedra.nar-sag.txt (100 kB)
- 1861.orrusta.nar-fic.txt (109 kB)
- 1450.ectorssaga.nar-sag.txt (113 kB)
- 1260.jomsvikingar.nar-sag.txt (107 kB)
- 1400.viglundur.nar-sag.txt (71 kB)
- 1888.grimur.nar-fic.txt (37 kB)
- 1210.jartein.rel-sag.txt (54 kB)
- 2008.ofsi.nar-sag.txt (116 kB)
- 1888.vordraumur.nar-fic.txt (57 kB)
- 1725.biskupasogur.nar-rel.txt (132 kB)
- 1400.gunnar2.nar-sag.txt (16 kB)
- 1450.judit.rel-bib.txt (35 kB)
- 1791.jonsteingrims.bio-aut.txt (118 kB)
- 1275.morkin.nar-his.txt (121 kB)
- 2008.mamma.nar-fic.txt (119 kB)
- 1883.voggur.nar-fic.txt (10 kB)
- 1985.margsaga.nar-fic.txt (124 kB)
- 1475.aevintyri.nar-rel.txt (89 kB)
- 1480.jarlmann.nar-sag.txt (73 kB)
- 1525.erasmus.nar-sag.txt (45 kB)
- 1350.finnbogi.nar-sag.txt (119 kB)
- 1210.thorlakur.rel-sag.txt (59 kB)
- 1680.skalholt.nar-rel.txt (55 kB)
- 1450.vilhjalmur.nar-sag.txt (121 kB)
- 1525.georgius.nar-rel.txt (109 kB)
- 1830.hellismenn.nar-sag.txt (80 kB)
- 1150.firstgrammar.sci-lin.txt (22 kB)
- 1300.alexander.nar-sag.txt (125 kB)
- 1902.fossar.nar-fic.txt (109 kB)
- 1400.gunnar.nar-sag.txt (43 kB)
- 1325.arni.nar-sag.txt (115 kB)
- 1850.piltur.nar-fic.txt (94 kB)
- 1745.klim.nar-fic.txt (124 kB)
- 1675.modars.nar-fic.txt (20 kB)
- 1630.gerhard.rel-oth.txt (70 kB)
- 1540.ntjohn.rel-bib.txt (106 kB)
- 1650.illugi.nar-sag.txt (107 kB)
- 1920.arin.rel-ser.txt (116 kB)
- 1661.indiafari.bio-tra.txt (126 kB)
- 1270.gragas.law-law.txt (30 kB)
- 1310.grettir.nar-sag.txt (107 kB)
- 1882.torfhildur.nar-fic.txt (138 kB)
- 1450.bandamenn.nar-sag.txt (54 kB)
- 1350.marta.rel-sag.txt (94 kB)
- 1659.pislarsaga.bio-aut.txt (55 kB)
- psd
- 1920.arin.rel-ser.psd (692 kB)
- 1661.indiafari.bio-tra.psd (766 kB)
- 1270.gragas.law-law.psd (204 kB)
- 1310.grettir.nar-sag.psd (669 kB)
- 1882.torfhildur.nar-fic.psd (857 kB)
- 1450.bandamenn.nar-sag.psd (378 kB)
- 1350.marta.rel-sag.psd (561 kB)
- 1659.pislarsaga.bio-aut.psd (349 kB)
- 1720.vidalin.rel-ser.psd (783 kB)
- 1593.eintal.rel-oth.psd (792 kB)
- 1250.thetubrot.nar-sag.psd (120 kB)
- 1540.ntacts.rel-bib.psd (579 kB)
- 1675.armann.nar-fic.psd (371 kB)
- 1350.bandamennM.nar-sag.psd (445 kB)
- 1835.jonasedli.sci-nat.psd (112 kB)
- 1250.sturlunga.nar-sag.psd (753 kB)
- 1907.leysing.nar-fic.psd (656 kB)
- 1859.hugvekjur.rel-ser.psd (668 kB)
- 1611.okur.rel-oth.psd (534 kB)
- 1675.magnus.bio-oth.psd (109 kB)
- 1628.olafuregils.bio-tra.psd (550 kB)
- 1150.homiliubok.rel-ser.psd (1 MB)
- 1908.ofurefli.nar-fic.psd (668 kB)
- 1985.sagan.nar-fic.psd (718 kB)
- 1790.fimmbraedra.nar-sag.psd (634 kB)
- 1861.orrusta.nar-fic.psd (658 kB)
- 1450.ectorssaga.nar-sag.psd (709 kB)
- 1260.jomsvikingar.nar-sag.psd (688 kB)
- 1400.viglundur.nar-sag.psd (443 kB)
- 1888.grimur.nar-fic.psd (233 kB)
- 1210.jartein.rel-sag.psd (338 kB)
- 2008.ofsi.nar-sag.psd (708 kB)
- 1888.vordraumur.nar-fic.psd (357 kB)
- 1725.biskupasogur.nar-rel.psd (732 kB)
- 1400.gunnar2.nar-sag.psd (104 kB)
- 1450.judit.rel-bib.psd (222 kB)
- 1791.jonsteingrims.bio-aut.psd (749 kB)
- 1275.morkin.nar-his.psd (786 kB)
- 2008.mamma.nar-fic.psd (747 kB)
- 1883.voggur.nar-fic.psd (64 kB)
- 1985.margsaga.nar-fic.psd (750 kB)
- 1480.jarlmann.nar-sag.psd (463 kB)
- 1525.erasmus.nar-sag.psd (283 kB)
- 1475.aevintyri.nar-rel.psd (587 kB)
- 1350.finnbogi.nar-sag.psd (764 kB)
- 1210.thorlakur.rel-sag.psd (371 kB)
- 1680.skalholt.nar-rel.psd (331 kB)
- 1450.vilhjalmur.nar-sag.psd (784 kB)
- 1525.georgius.nar-rel.psd (703 kB)
- 1830.hellismenn.nar-sag.psd (486 kB)
- 1300.alexander.nar-sag.psd (781 kB)
- 1150.firstgrammar.sci-lin.psd (151 kB)
- 1902.fossar.nar-fic.psd (658 kB)
- 1400.gunnar.nar-sag.psd (276 kB)
- 1325.arni.nar-sag.psd (679 kB)
- 1850.piltur.nar-fic.psd (584 kB)
- 1745.klim.nar-fic.psd (737 kB)
- 1675.modars.nar-fic.psd (127 kB)
- 1630.gerhard.rel-oth.psd (423 kB)
- 1540.ntjohn.rel-bib.psd (744 kB)
- 1650.illugi.nar-sag.psd (664 kB)
- tagged
- 2008.ofsi.nar-sag.tagged (321 kB)
- 1675.magnus.bio-oth.tagged (49 kB)
- 1270.gragas.law-law.tagged (86 kB)
- 1450.bandamenn.nar-sag.tagged (156 kB)
- 1725.biskupasogur.nar-rel.tagged (361 kB)
- 1680.skalholt.nar-rel.tagged (155 kB)
- 1210.thorlakur.rel-sag.tagged (163 kB)
- 1745.klim.nar-fic.tagged (339 kB)
- 1659.pislarsaga.bio-aut.tagged (153 kB)
- 1400.viglundur.nar-sag.tagged (198 kB)
- 1593.eintal.rel-oth.tagged (354 kB)
- 1400.gunnar2.nar-sag.tagged (47 kB)
- 1630.gerhard.rel-oth.tagged (197 kB)
- 1250.sturlunga.nar-sag.tagged (346 kB)
- 1790.fimmbraedra.nar-sag.tagged (280 kB)
- 1350.finnbogi.nar-sag.tagged (336 kB)
- 1210.jartein.rel-sag.tagged (154 kB)
- 1661.indiafari.bio-tra.tagged (350 kB)
- 2008.mamma.nar-fic.tagged (337 kB)
- 1907.leysing.nar-fic.tagged (316 kB)
- 1611.okur.rel-oth.tagged (241 kB)
- 1150.firstgrammar.sci-lin.tagged (62 kB)
- 1888.vordraumur.nar-fic.tagged (161 kB)
- 1883.voggur.nar-fic.tagged (29 kB)
- 1310.grettir.nar-sag.tagged (300 kB)
- 1525.georgius.nar-rel.tagged (303 kB)
- 1675.modars.nar-fic.tagged (57 kB)
- 1325.arni.nar-sag.tagged (313 kB)
- 1902.fossar.nar-fic.tagged (306 kB)
- 1540.ntacts.rel-bib.tagged (246 kB)
- 1985.sagan.nar-fic.tagged (332 kB)
- 1850.piltur.nar-fic.tagged (265 kB)
- 1400.gunnar.nar-sag.tagged (123 kB)
- 1835.jonasedli.sci-nat.tagged (51 kB)
- 1861.orrusta.nar-fic.tagged (305 kB)
- 1350.bandamennM.nar-sag.tagged (185 kB)
- 1888.grimur.nar-fic.tagged (106 kB)
- 1720.vidalin.rel-ser.tagged (345 kB)
- 1525.erasmus.nar-sag.tagged (125 kB)
- 1475.aevintyri.nar-rel.tagged (253 kB)
- 1260.jomsvikingar.nar-sag.tagged (301 kB)
- 1450.ectorssaga.nar-sag.tagged (318 kB)
- 1250.thetubrot.nar-sag.tagged (51 kB)
- 1920.arin.rel-ser.tagged (321 kB)
- 1540.ntjohn.rel-bib.tagged (301 kB)
- 1859.hugvekjur.rel-ser.tagged (304 kB)
- 1830.hellismenn.nar-sag.tagged (223 kB)
- 1675.armann.nar-fic.tagged (166 kB)
- 1882.torfhildur.nar-fic.tagged (385 kB)
- 1650.illugi.nar-sag.tagged (302 kB)
- 1985.margsaga.nar-fic.tagged (346 kB)
- 1350.marta.rel-sag.tagged (259 kB)
- 1275.morkin.nar-his.tagged (345 kB)
- 1300.alexander.nar-sag.tagged (344 kB)
- 1450.judit.rel-bib.tagged (96 kB)
- 1628.olafuregils.bio-tra.tagged (247 kB)
- 1450.vilhjalmur.nar-sag.tagged (344 kB)
- 1150.homiliubok.rel-ser.tagged (588 kB)
- 1908.ofurefli.nar-fic.tagged (298 kB)
- 1791.jonsteingrims.bio-aut.tagged (330 kB)
- 1480.jarlmann.nar-sag.tagged (205 kB)
- cc-by-4.0.txt (18 kB)
- README (2 kB)
- info
- 1791.jonsteingrims.bio-aut.info (516 B)
- 1260.jomsvikingar.nar-sag.info (336 B)
- 1859.hugvekjur.rel-ser.info (413 B)
- 1630.gerhard.rel-oth.info (611 B)
- 1659.pislarsaga.bio-aut.info (329 B)
- 1720.vidalin.rel-ser.info (416 B)
- 1920.arin.rel-ser.info (263 B)
- 1985.margsaga.nar-fic.info (206 B)
- 1150.firstgrammar.sci-lin.info (971 B)
- 1270.gragas.law-law.info (342 B)
- 1611.okur.rel-oth.info (303 B)
- 1275.morkin.nar-his.info (404 B)
- 1902.fossar.nar-fic.info (324 B)
- 1850.piltur.nar-fic.info (499 B)
- 1725.biskupasogur.nar-rel.info (348 B)
- 1985.sagan.nar-fic.info (201 B)
- 1210.jartein.rel-sag.info (364 B)
- 1835.jonasedli.sci-nat.info (417 B)
- 1250.thetubrot.nar-sag.info (551 B)
- 1480.jarlmann.nar-sag.info (291 B)
- 1675.modars.nar-fic.info (271 B)
- 1250.sturlunga.nar-sag.info (351 B)
- 1400.gunnar2.nar-sag.info (354 B)
- 1400.viglundur.nar-sag.info (291 B)
- 1540.ntjohn.rel-bib.info (1 kB)
- 1882.torfhildur.nar-fic.info (576 B)
- 1790.fimmbraedra.nar-sag.info (306 B)
- 1650.illugi.nar-sag.info (270 B)
- 1525.erasmus.nar-sag.info (325 B)
- 1350.finnbogi.nar-sag.info (338 B)
- 1680.skalholt.nar-rel.info (282 B)
- 1888.vordraumur.nar-fic.info (289 B)
- 1888.grimur.nar-fic.info (291 B)
- 1210.thorlakur.rel-sag.info (379 B)
- 1661.indiafari.bio-tra.info (707 B)
- 1745.klim.nar-fic.info (302 B)
- 1300.alexander.nar-sag.info (511 B)
- 1450.ectorssaga.nar-sag.info (295 B)
- 1350.marta.rel-sag.info (402 B)
- 1525.georgius.nar-rel.info (339 B)
- 1400.gunnar.nar-sag.info (355 B)
- 1310.grettir.nar-sag.info (306 B)
- 2008.ofsi.nar-sag.info (184 B)
- 1675.magnus.bio-oth.info (264 B)
- 1325.arni.nar-sag.info (585 B)
- 1830.hellismenn.nar-sag.info (285 B)
- 1675.armann.nar-fic.info (350 B)
- 1593.eintal.rel-oth.info (275 B)
- 1883.voggur.nar-fic.info (285 B)
- 1540.ntacts.rel-bib.info (1 kB)
- 1861.orrusta.nar-fic.info (227 B)
- 1150.homiliubok.rel-ser.info (1 kB)
- 1907.leysing.nar-fic.info (382 B)
- 1350.bandamennM.nar-sag.info (362 B)
- 1450.judit.rel-bib.info (283 B)
- 1628.olafuregils.bio-tra.info (374 B)
- 1450.bandamenn.nar-sag.info (357 B)
- 1475.aevintyri.nar-rel.info (277 B)
- 2008.mamma.nar-fic.info (240 B)
- 1908.ofurefli.nar-fic.info (248 B)
- 1450.vilhjalmur.nar-sag.info (295 B)
- txt
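The sample files in the listing share a visible naming pattern: a year, a short text name, a two-part genre code, and an extension, separated by dots. The sketch below splits such names into fields so the samples can be grouped by period or genre. The field names and the `parse_sample_name` helper are assumptions made for this example, and the meaning of the individual genre codes is not decoded here. Names that do not follow the pattern (such as `cc-by-4.0.txt` or `README`) are not handled.

```python
from collections import Counter
from typing import NamedTuple

class SampleName(NamedTuple):
    year: int
    text_id: str
    genre: str
    subgenre: str
    extension: str

def parse_sample_name(filename):
    """Split a name like '1310.grettir.nar-sag.psd' into its parts."""
    year, text_id, genre_code, extension = filename.split(".")
    genre, subgenre = genre_code.split("-")
    return SampleName(int(year), text_id, genre, subgenre, extension)

# A few names copied from the listing above.
names = [
    "1150.firstgrammar.sci-lin.psd",
    "1270.gragas.law-law.psd",
    "1310.grettir.nar-sag.psd",
    "2008.ofsi.nar-sag.psd",
]
samples = [parse_sample_name(name) for name in names]

# Count samples per top-level genre code and list the centuries covered.
by_genre = Counter(sample.genre for sample in samples)
centuries = sorted({sample.year // 100 + 1 for sample in samples})
```

Applied to the full `psd` directory, the same grouping gives a quick view of how the samples spread across centuries and genres.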