dc.contributor.author |
Barkarson, Starkaður |
dc.contributor.author |
Andrésdóttir, Þórdís Dröfn |
dc.contributor.author |
Hafsteinsdóttir, Hildur |
dc.contributor.author |
Magnússon, Árni Davíð |
dc.contributor.author |
Rúnarsson, Kristján |
dc.contributor.author |
Steingrímsson, Steinþór |
dc.contributor.author |
Jónsson, Haukur Páll |
dc.contributor.author |
Loftsson, Hrafn |
dc.contributor.author |
Sigurðsson, Einar Freyr |
dc.contributor.author |
Rögnvaldsson, Eiríkur |
dc.contributor.author |
Helgadóttir, Sigrún |
dc.date.accessioned |
2021-06-03T07:58:13Z |
dc.date.available |
2021-06-03T07:58:13Z |
dc.date.issued |
2021-06-03 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12537/114 |
dc.description |
[ENGLISH] Training and testing sets from MIM-GOLD 21.05, which is a gold standard for PoS-tagging and lemmatization of Icelandic texts. The gold standard contains approximately 1 million running words with manually annotated PoS-tags and lemmas. The texts are from The Tagged Icelandic Corpus (MÍM), which was published in 2013. The tagset was revised in 2019-2020. It builds upon a tagging scheme created for the Icelandic Frequency Dictionary in 1991. All changes to the tagging scheme are described in the package.
[ICELANDIC] Þjálfunar- og prófunargögn úr MÍM-GULL 21.05 sem er gullstaðall fyrir málfræðilega mörkun og lemmun íslenskra texta. Gullstaðallinn inniheldur u.þ.b. 1 milljón orða og eru mörkin og lemmurnar handyfirfarin. Textarnir eru úr Markaðri íslenskri málheild (MÍM), sem var gefin út 2013. Markamengið var endurskoðað 2019-2020. Það byggir á markaskrá sem var gerð fyrir Íslenska orðtíðnibók árið 1991. Öllum breytingum á markamenginu er lýst í skrá sem fylgir gullstaðlinum. |
dc.language.iso |
isl |
dc.publisher |
The Árni Magnússon Institute for Icelandic Studies |
dc.relation.isreferencedby |
http://www.ru.is/~hrafn/papers/corpusTagging.final.pdf |
dc.rights |
Icelandic Mim Gold Standard for PoS Tagging |
dc.rights.uri |
https://repository.clarin.is/repository/xmlui/page/license-mim-gold |
dc.rights.label |
PUB |
dc.source.uri |
http://www.malfong.is/index.php?lang=en&pg=gull |
dc.subject |
corpora |
dc.subject |
gold standard |
dc.subject |
pos-tagging |
dc.subject |
morphosyntactic tagging |
dc.subject |
lemmatization |
dc.subject |
training sets |
dc.subject |
test sets |
dc.title |
MIM-GOLD 21.05 - train/test |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
Clarin IS Repository |
contact.person |
Steinþór Steingrímsson, steinthor.steingrimsson@arnastofnun.is, The Árni Magnússon Institute for Icelandic Studies |
sponsor |
Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið); A Gold Standard for PoS Tagging (G10); Language Technology for Icelandic 2019-2023; nationalFunds |
size.info |
1000218 tokens |
size.info |
897443 words |
size.info |
58412 sentences |
files.size |
43232069 |
files.count |
1 |