
 
dc.contributor.author Loftsson, Hrafn
dc.contributor.author Yngvason, Jökull H.
dc.contributor.author Helgadóttir, Sigrún
dc.contributor.author Rögnvaldsson, Eiríkur
dc.contributor.author Barkarson, Starkaður
dc.date.accessioned 2020-06-15T11:34:56Z
dc.date.available 2020-06-15T11:34:56Z
dc.date.issued 2018-09-06
dc.identifier.uri http://hdl.handle.net/20.500.12537/44
dc.description MIM-GOLD version 1.0 consists of 13 files of tagged Icelandic text sampled from 13 text domains of the 25-million-word Tagged Icelandic Corpus (MIM, http://malfong.is/?pg=mim). The texts were cleaned extensively and then tagged automatically; the tags were subsequently corrected by semi-automatic and manual methods. This version is based on version 0.9 from 2013 and incorporates corrections to tokenization and tagging made between 2013 and 2017. The corpus is intended as a gold standard for training data-driven taggers for Icelandic.
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.relation.isreferencedby http://www.ru.is/~hrafn/papers/corpusTagging.final.pdf
dc.relation.isreplacedby http://hdl.handle.net/20.500.12537/39
dc.rights Icelandic Mim Gold Standard for PoS Tagging
dc.rights.uri https://repository.clarin.is/repository/xmlui/page/license-mim-gold
dc.rights.label PUB
dc.source.uri http://www.malfong.is/index.php?lang=en&pg=gull
dc.subject text corpus
dc.subject gold standard
dc.subject pos-tagging
dc.subject morphosyntactic tagging
dc.subject lemmatization
dc.title MIM-GOLD 1.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
contact.person Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies
sponsor The Icelandic Student Innovation Fund 903171091 Mörkun og leiðrétting nýrrar málheildar nationalFunds
sponsor Icelandic Research Fund (RANNÍS) 090662011 Viable Language Technology beyond English – Icelandic as a test case nationalFunds
sponsor The Icelandic Student Innovation Fund 104540000 Íslensk staðalmálheild nationalFunds
size.info 1005688 tokens
size.info 901198 words
size.info 58783 sentences
size.info 13 files
files.size 3705608
files.count 1


 Files in this item

This item is publicly available and licensed under: Icelandic Mim Gold Standard for PoS Tagging
Name MIM-GOLD-1_0.zip
Size 3.53 MB
Format application/zip
Description MIM-GOLD-1_0
MD5 2c3f557200f5da451b72523e8d62e021
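
The MD5 value listed for MIM-GOLD-1_0.zip can be used to check a downloaded copy. The following is a minimal sketch, assuming the archive has already been saved locally; the function names and the local path are illustrative and not part of the repository.

```python
import hashlib

# MD5 value copied from this record's file listing.
EXPECTED_MD5 = "2c3f557200f5da451b72523e8d62e021"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 equals the checksum listed in the record."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_record("MIM-GOLD-1_0.zip")` would return True only if the local file (hypothetical path) is byte-identical to the one this record describes.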
 File Preview  
  • MIM-GOLD-1_0
    • laws.txt 453 kB
    • radio_tv_news.txt 124 kB
    • mbl.txt 2 MB
    • websites.txt 743 kB
    • fbl.txt 998 kB
    • school_essays.txt 373 kB
    • blog.txt 1 MB
    • webmedia.txt 95 kB
    • scienceweb.txt 975 kB
    • written-to-be-spoken.txt 203 kB
    • emails.txt 57 kB
    • books.txt 2 MB
    • adjucations.txt 134 kB
    • userlicense_mim_gold_download_en.pdf 526 kB
    • readme_mim-gold_1_0 3 kB
    • userlicense_mim_gold_download_is.pdf 101 kB
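
The archive contents shown in the preview can also be listed programmatically. This is a small sketch using Python's standard `zipfile` module, assuming a local copy of the archive; the function names are illustrative, and nothing here depends on the internal format of the tagged text files.

```python
import zipfile


def list_members(zip_path):
    """Return (name, uncompressed size in bytes) for each file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries such as the top folder
        ]


def corpus_text_files(zip_path):
    """Return only the .txt members, i.e. the candidate corpus files."""
    return [name for name, _ in list_members(zip_path) if name.endswith(".txt")]
```

Filtering on the `.txt` suffix separates the 13 domain files from the license PDFs and the readme, which have other or no extensions in the listing above.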
