Show simple item record

 
dc.contributor.author Barkarson, Starkaður
dc.contributor.author Steingrímsson, Steinþór
dc.contributor.author Andrésdóttir, Þórdís Dröfn
dc.contributor.author Hafsteinsdóttir, Hildur
dc.date.accessioned 2020-09-14T11:20:41Z
dc.date.available 2020-09-14T11:20:41Z
dc.date.issued 2020-09-14
dc.identifier.uri http://hdl.handle.net/20.500.12537/51
dc.description The evaluation set contains 101,261 tokens and is divided into nine subcorpora: adjudications, books, educational websites, legal texts, news, opinions, parliamentary speeches, sports news, and radio and TV news scripts. The texts were sampled at random from the Icelandic Gigaword Corpus (version 2018) and PoS-tagged with the ABL-tagger, which had been trained on MIM-GOLD 20.05. Questionable tags were then flagged using four complementary methods, and the flagged tags were checked manually. Each line contains a word and a PoS tag, separated by a tab. Sentences are separated by a newline.
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.rights Icelandic Mim Gold Standard for PoS Tagging
dc.rights.uri https://repository.clarin.is/repository/xmlui/page/license-mim-gold
dc.rights.label PUB
dc.source.uri http://igc.arnastofnun.is
dc.subject evaluation set
dc.subject morphosyntactic tagging
dc.subject gigaword corpus
dc.subject pos-tagging
dc.title IGC - evaluation set 20.09
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
contact.person Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Education, Science and Culture Icelandic Gigaword Corpus (G1) Language Technology for Icelandic 2019-2023 nationalFunds
size.info 101261 tokens
files.size 329580
files.count 1
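
The description above states that each line holds a word and a PoS tag separated by a tab, with sentences separated by a newline. A minimal sketch of reading that layout follows. It assumes that the sentence separator is an empty line and that the files are UTF-8 encoded; neither detail is confirmed by this record. The function name `read_sentences` and the tags `TAG_A`, `TAG_B`, `TAG_C` are hypothetical placeholders, not tags from the actual tagset.

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str]  # (word, tag)


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of (word, tag) pairs.

    Assumption: an empty line marks a sentence boundary.
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        # Split on the first tab only; the tag is everything after it.
        word, _, tag = line.partition("\t")
        sentence.append((word, tag))
    if sentence:
        # Last sentence when the input does not end with a blank line.
        yield sentence


# Made-up sample in the described layout (placeholder tags).
sample = "Þetta\tTAG_A\ner\tTAG_B\n\nJá\tTAG_C\n"
sentences = list(read_sentences(sample.splitlines()))
```

For a real file, the same function could be fed an open file handle, for example `open("eval_set/news.plain", encoding="utf-8")`, assuming that path and encoding match the unpacked archive.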


 Files in this item

This item is Publicly Available and licensed under: Icelandic Mim Gold Standard for PoS Tagging
Name eval_igc_20.09.zip
Size 321.86 KB
Format application/zip
Description eval_igc_20.09
MD5 4f6507987fd384e104cb201ce580451c
 File Preview  
  • eval_set
    • books.plain (109 kB)
    • tv_radio.plain (121 kB)
    • opinions.plain (129 kB)
    • adjudications.plain (135 kB)
    • law.plain (131 kB)
    • news.plain (127 kB)
    • parliamentary_speaches.plain (130 kB)
    • educational_websites.plain (116 kB)
    • sport_news.plain (98 kB)
    • README (2 kB)
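
The record publishes an MD5 checksum for eval_igc_20.09.zip, so a downloaded copy can be compared against it before use. The sketch below shows one way to do that with Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_published_md5` and the local file path are assumptions for illustration, not part of the repository.

```python
import hashlib

# MD5 value as published in this item record.
EXPECTED_MD5 = "4f6507987fd384e104cb201ce580451c"


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a local file's MD5 digest with the published value."""
    return md5_of_file(path) == expected.lower()
```

A call such as `matches_published_md5("eval_igc_20.09.zip")` would return `True` only if the local copy is byte-identical to the file the checksum was computed from. MD5 is suitable here as a check against download corruption, not as protection against deliberate tampering.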
