
 
dc.contributor.author Barkarson, Starkaður
dc.contributor.author Sigurðsson, Einar Freyr
dc.contributor.author Rögnvaldsson, Eiríkur
dc.contributor.author Hafsteinsdóttir, Hildur
dc.contributor.author Loftsson, Hrafn
dc.contributor.author Steingrímsson, Steinþór
dc.contributor.author Andrésdóttir, Þórdís Dröfn
dc.date.accessioned 2020-06-12T14:02:38Z
dc.date.available 2020-06-12T14:02:38Z
dc.date.issued 2020-05-29
dc.identifier.uri http://hdl.handle.net/20.500.12537/40
dc.description Training and testing sets from MIM-GOLD 20.05, a gold standard for PoS tagging of Icelandic texts. This version uses a revised tagset. The gold standard contains approximately 1 million running words with manually annotated PoS tags. The texts are from the Tagged Icelandic Corpus (MÍM), which was published in 2013. The tagset was revised in 2019-2020 and builds upon the tagging scheme created for the Icelandic Frequency Dictionary in 1991. All changes to the tagging scheme are described in a file included in the package.
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.relation.isreferencedby http://www.ru.is/~hrafn/papers/corpusTagging.final.pdf
dc.rights Icelandic Mim Gold Standard for PoS Tagging
dc.rights.uri https://repository.clarin.is/repository/xmlui/page/license-mim-gold
dc.rights.label PUB
dc.source.uri http://www.malfong.is/index.php?lang=en&pg=gull
dc.subject text corpus
dc.subject gold standard
dc.subject pos-tagging
dc.subject morphosyntactic tagging
dc.subject lemmatization
dc.subject training sets
dc.subject test sets
dc.title MIM-GOLD 20.05 - train/test
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
contact.person Steinþór Steingrímsson; steinthor.steingrimsson@arnastofnun.is; The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Education, Science and Culture; A Gold Standard for PoS Tagging (G10); Language Technology for Icelandic 2019-2023; nationalFunds
size.info 1000218 tokens
size.info 897443 words
size.info 58412 sentences
files.size 30156395 bytes
files.count 1


 Files in this item

This item is Publicly Available and licensed under: Icelandic Mim Gold Standard for PoS Tagging
Name: MIM-GOLD-SETS.20.05.zip
Size: 28.76 MB
Format: application/zip
Description: MIM-GOLD-SETS.20.05
MD5: 7d775b5faa3cf5edc8098f6c25b845e0
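
The MD5 value listed above can be used to check that a downloaded copy of the archive is intact. The following is a minimal sketch, not part of the record itself: the local file path and the helper names `md5_of_file` and `matches_record` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# MD5 value as listed in this record.
EXPECTED_MD5 = "7d775b5faa3cf5edc8098f6c25b845e0"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 equals the expected value (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical location of the downloaded archive; adjust as needed.
    archive = Path("MIM-GOLD-SETS.20.05.zip")
    if archive.exists():
        print("checksum OK" if matches_record(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

MD5 is adequate here for detecting a corrupted or truncated download; it is not a guarantee against deliberate tampering.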
 File Preview  
  • sets
    • 09PM.plain (1 MB)
    • 08TM.plain (9 MB)
    • 08PM.plain (1 MB)
    • 07TM.plain (9 MB)
    • 06PM.plain (1 MB)
    • 04PM.plain (1 MB)
    • 10PM.plain (1 MB)
    • 05TM.plain (9 MB)
    • 03TM.plain (9 MB)
    • 01TM.plain (9 MB)
    • 01PM.plain (1 MB)
    • 09TM.plain (9 MB)
    • 07PM.plain (1 MB)
    • 05PM.plain (1 MB)
    • 06TM.plain (9 MB)
    • 04TM.plain (9 MB)
    • 10TM.plain (9 MB)
    • 03PM.plain (1 MB)
    • 02TM.plain (9 MB)
    • 02PM.plain (1 MB)
    • MIM_GOLD_DESCRIPTION_EN.pdf (123 kB)
    • MIM_GOLD_DESCRIPTION_IS.pdf (123 kB)
    • README (1 kB)
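
The file names in the preview suggest ten numbered pairs of files. The sketch below groups them by their two-digit prefix. It assumes, based only on the description ("training and testing sets") and the relative file sizes, that `NNTM.plain` is the training part and `NNPM.plain` the test part of fold `NN`; the record does not state this explicitly, and the package README should be treated as authoritative. The function name `pair_folds` and the archive's internal paths are likewise assumptions.

```python
import re

# Matches names such as "sets/03TM.plain" or "03PM.plain".
FOLD_PATTERN = re.compile(r"(?:^|/)(\d{2})(TM|PM)\.plain$")


def pair_folds(names):
    """Group file names into {fold: (tm_file, pm_file)}, skipping incomplete folds.

    Assumption: TM is the training file and PM the test file of each fold.
    """
    folds = {}
    for name in names:
        match = FOLD_PATTERN.search(name)
        if match is None:
            continue  # e.g. README or the PDF descriptions
        fold, kind = match.groups()
        folds.setdefault(fold, {})[kind] = name
    return {
        fold: (parts["TM"], parts["PM"])
        for fold, parts in sorted(folds.items())
        if "TM" in parts and "PM" in parts
    }


if __name__ == "__main__":
    import zipfile
    from pathlib import Path

    # Hypothetical location of the downloaded archive; adjust as needed.
    archive = Path("MIM-GOLD-SETS.20.05.zip")
    if archive.exists():
        with zipfile.ZipFile(archive) as zf:
            for fold, (tm_file, pm_file) in pair_folds(zf.namelist()).items():
                print(fold, tm_file, pm_file)
```

Matching on the file name rather than a fixed directory keeps the sketch independent of how the `sets` folder is nested inside the archive.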
