
 
dc.contributor.author Barkarson, Starkaður
dc.contributor.author Andrésdóttir, Þórdís Dröfn
dc.contributor.author Hafsteinsdóttir, Hildur
dc.contributor.author Magnússon, Árni Davíð
dc.contributor.author Rúnarsson, Kristján
dc.contributor.author Steingrímsson, Steinþór
dc.contributor.author Jónsson, Haukur Páll
dc.contributor.author Loftsson, Hrafn
dc.contributor.author Sigurðsson, Einar Freyr
dc.contributor.author Rögnvaldsson, Eiríkur
dc.contributor.author Helgadóttir, Sigrún
dc.date.accessioned 2021-06-03T07:32:36Z
dc.date.available 2021-06-03T07:32:36Z
dc.date.issued 2021-06-02
dc.identifier.uri http://hdl.handle.net/20.500.12537/113
dc.description MIM-GOLD 21.05 (Icelandic: MÍM-GULL 21.05) is a gold standard for PoS tagging and lemmatization of Icelandic texts. This version contains the same texts as version 20.05, but lemmas have been added and some PoS tags have been corrected. The gold standard contains approximately 1 million running words with manually annotated PoS tags and lemmas. The texts are from the Tagged Icelandic Corpus (MÍM; Icelandic: Mörkuð íslensk málheild), which was published in 2013. The tagset was revised in 2019-2020 and builds upon a tagging scheme created for the Icelandic Frequency Dictionary (Íslensk orðtíðnibók) in 1991. The tagging scheme is described in a file included in the package.
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.relation.isreferencedby http://www.ru.is/~hrafn/papers/corpusTagging.final.pdf
dc.rights Icelandic Mim Gold Standard for PoS Tagging
dc.rights.uri https://repository.clarin.is/repository/xmlui/page/license-mim-gold
dc.rights.label PUB
dc.subject gold standard
dc.subject pos-tagging
dc.subject morphosyntactic tagging
dc.subject lemmatization
dc.title MIM-GOLD 21.05
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
contact.person Steinþór Steingrímsson; steinthor.steingrimsson@arnastofnun.is; The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Education, Science and Culture; A Gold Standard for PoS Tagging (G10); Language Technology for Icelandic 2019-2023; nationalFunds
size.info 1000218 tokens
size.info 58412 sentences
files.size 9284697
files.count 1


 Files in this item

This item is Publicly Available and licensed under: Icelandic Mim Gold Standard for PoS Tagging
Name MIM-GOLD-21.05.zip
Size 8.85 MB
Format application/zip
Description MIM-GOLD-21.05
MD5 46aefe051f79300c58fd2b63056749b5
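
The MD5 value listed above can be used to check that a downloaded copy of the archive is intact. The following is a minimal sketch rather than an official tool: the function names `md5_of_file` and `verify_download` are hypothetical, and it assumes that the `files.size` value (9284697) is the byte size of the single archive MIM-GOLD-21.05.zip.

```python
import hashlib
from pathlib import Path

# Values copied from this record. Treating files.size as the byte size
# of the single archive is an assumption.
EXPECTED_MD5 = "46aefe051f79300c58fd2b63056749b5"
EXPECTED_SIZE = 9284697


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if the file matches the expected size and MD5 digest."""
    path = Path(path)
    # Cheap check first: a wrong size means the digest cannot match either.
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()


# Example use, once the archive is in the working directory:
#   verify_download("MIM-GOLD-21.05.zip")
```

A result of `False` suggests an incomplete or altered download; fetching the file again is the usual remedy.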
 File Preview  
  • MIM-GOLD21.05
    • MIM_GOLD_DESCRIPTION_EN.pdf
    • MIM_GOLD_DESCRIPTION_IS.pdf
    • mim.tsv
    • README
    • data
      • fbl.tsv
      • websites.tsv
      • blog.tsv
      • webmedia.tsv
      • school-essays.tsv
      • scienceweb.tsv
      • emails.tsv
      • written-to-be-spoken.tsv
      • books.tsv
      • adjucations.tsv
      • laws.tsv
      • mbl.tsv
      • radio-tv-news.tsv
