
 
dc.contributor.author Barkarson, Starkaður
dc.contributor.author Andrésdóttir, Þórdís Dröfn
dc.contributor.author Hafsteindóttir, Hildur
dc.contributor.author Magnússon, Árni Davíð
dc.contributor.author Rúnarsson, Kristján
dc.contributor.author Steingrímsson, Steinþór
dc.contributor.author Jónsson, Haukur Páll
dc.contributor.author Loftsson, Hrafn
dc.contributor.author Sigurðsson, Einar Freyr
dc.contributor.author Rögnvaldsson, Eiríkur
dc.contributor.author Helgadóttir, Sigrún
dc.date.accessioned 2021-06-03T07:58:13Z
dc.date.available 2021-06-03T07:58:13Z
dc.date.issued 2021-06-03
dc.identifier.uri http://hdl.handle.net/20.500.12537/114
dc.description [ENGLISH] Training and testing sets from MIM-GOLD 21.05, which is a gold standard for PoS-tagging and lemmatization of Icelandic texts. The gold standard contains approximately 1 million running words with manually annotated PoS-tags and lemmas. The texts are from The Tagged Icelandic Corpus (MÍM), which was published in 2013. The tagset was revised in 2019-2020. It builds upon a tagging scheme created for the Icelandic Frequency Dictionary in 1991. All changes to the tagging scheme are described in the package. [ICELANDIC] Þjálfunar- og prófunargögn úr MÍM-GULL 21.05 sem er gullstaðall fyrir málfræðilega mörkun og lemmun íslenskra texta. Gullstaðallinn inniheldur u.þ.b. 1 milljón orða og eru mörkin og lemmurnar handyfirfarin. Textarnir eru úr Markaðri íslenskri málheild (MÍM), sem var gefin út 2013. Markamengið var endurskoðað 2019-2020. Það byggir á markaskrá sem var gerð fyrir Íslenska orðtíðnibók árið 1991. Öllum breytingum á markamenginu er lýst í skrá sem fylgir gullstaðlinum.
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.relation.isreferencedby http://www.ru.is/~hrafn/papers/corpusTagging.final.pdf
dc.rights Icelandic Mim Gold Standard for PoS Tagging
dc.rights.uri https://repository.clarin.is/repository/xmlui/page/license-mim-gold
dc.rights.label PUB
dc.source.uri http://www.malfong.is/index.php?lang=en&pg=gull
dc.subject corpora
dc.subject gold standard
dc.subject pos-tagging
dc.subject morphosyntactic tagging
dc.subject lemmatization
dc.subject training sets
dc.subject test sets
dc.title MIM-GOLD 21.05 - train/test
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
contact.person Steinþór Steingrímsson, steinthor.steingrimsson@arnastofnun.is, The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið); A Gold Standard for PoS Tagging (G10); Language Technology for Icelandic 2019-2023; nationalFunds
size.info 1000218 tokens
size.info 897443 words
size.info 58412 sentences
files.size 43232069
files.count 1


 Files in this item

This item is Publicly Available and licensed under: Icelandic Mim Gold Standard for PoS Tagging
Name MIM-GOLD-SETS-21.05.zip
Size 41.23 MB
Format application/zip
Description MIM-GOLD-SETS-21.05
MD5 0a377525f5cabb60f4283db1d0254b29
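The MD5 value above can be used to check that a downloaded copy of the archive is intact. The following is a minimal sketch, assuming the file was saved locally under the name given in this record; the function names `md5_of_file` and `verify_download` are hypothetical helpers, not part of the package.

```python
import hashlib

# MD5 value as listed in this record for MIM-GOLD-SETS-21.05.zip.
EXPECTED_MD5 = "0a377525f5cabb60f4283db1d0254b29"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("MIM-GOLD-SETS-21.05.zip")` on the downloaded file should return `True` if the copy matches the checksum published here. MD5 detects accidental corruption only; it is not a safeguard against deliberate tampering.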
 File Preview  
  • MIM-GOLD-SETS.21.05
    • MIM_GOLD_DESCRIPTION_EN.pdf
    • MIM_GOLD_DESCRIPTION_IS.pdf
    • sets
      • 03TM.tsv
      • 02PM.tsv
      • 01PM.tsv
      • 09PM.tsv
      • 08PM.tsv
      • 10PM.tsv
      • 07TM.tsv
      • 06TM.tsv
      • 05TM.tsv
      • 05PM.tsv
      • 04PM.tsv
      • 03PM.tsv
      • 02TM.tsv
      • 09TM.tsv
      • 01TM.tsv
      • 08TM.tsv
      • 10TM.tsv
      • 07PM.tsv
      • 06PM.tsv
      • 04TM.tsv
    • README
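
The preview lists ten numbered pairs of files in the sets directory, NNTM.tsv and NNPM.tsv. A minimal sketch for grouping them by fold number follows. It assumes that TM and PM mark the training and test halves of a fold, which this record does not state (the README and description PDFs in the archive are the authority), and the function names are hypothetical.

```python
import re
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

# Matches names such as "03TM.tsv" or "10PM.tsv" as shown in the file preview.
FOLD_FILE = re.compile(r"^(\d{2})(TM|PM)\.tsv$")


def pair_folds(file_names):
    """Group fold files by number: {fold: {"TM": name, "PM": name}}.

    Assumption: TM is the training part and PM the test part of a fold.
    Names that do not match the pattern (README, PDFs, directories) are skipped.
    """
    folds = defaultdict(dict)
    for name in file_names:
        match = FOLD_FILE.match(PurePosixPath(name).name)
        if match:
            fold, kind = match.groups()
            folds[int(fold)][kind] = name
    incomplete = sorted(f for f, kinds in folds.items() if set(kinds) != {"TM", "PM"})
    if incomplete:
        raise ValueError(f"folds missing a TM or PM file: {incomplete}")
    return dict(sorted(folds.items()))


def folds_in_archive(zip_path):
    """List the member names of the zip archive and pair them by fold."""
    with zipfile.ZipFile(zip_path) as archive:
        return pair_folds(archive.namelist())
```

With the archive from this record, `folds_in_archive("MIM-GOLD-SETS-21.05.zip")` would be expected to return ten entries keyed 1 to 10. The column layout of the .tsv files is not described here, so the sketch stops at pairing file names.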
