Show simple item record
| dc.contributor.author |
Barkarson, Starkaður |
| dc.contributor.author |
Steingrímsson, Steinþór |
| dc.contributor.author |
Andrésdóttir, Þórdís Dröfn |
| dc.contributor.author |
Hafsteinsdóttir, Hildur |
| dc.date.accessioned |
2020-09-14T11:20:41Z |
| dc.date.available |
2020-09-14T11:20:41Z |
| dc.date.issued |
2020-09-14 |
| dc.identifier.uri |
http://hdl.handle.net/20.500.12537/51 |
| dc.description |
The evaluation set contains 101,261 tokens and is divided into nine subcorpora: adjudications, books, educational websites, legal texts, news, opinions, parliamentary speeches, sports news, and radio and TV news scripts. The texts were sampled at random from the Icelandic Gigaword Corpus (version 2018) and PoS-tagged with the ABL-tagger, which had been trained on MIM-GOLD 20.05. Doubtful tags were then flagged using four complementary methods, and the flagged tags were checked manually.
Each line contains a word and a PoS tag, separated by a tab. Each sentence is separated by a newline. |
| dc.language.iso |
isl |
| dc.publisher |
The Árni Magnússon Institute for Icelandic Studies |
| dc.rights |
Icelandic Mim Gold Standard for PoS Tagging |
| dc.rights.uri |
https://repository.clarin.is/repository/xmlui/page/license-mim-gold |
| dc.rights.label |
PUB |
| dc.source.uri |
http://igc.arnastofnun.is |
| dc.subject |
evaluation set |
| dc.subject |
morphosyntactic tagging |
| dc.subject |
gigaword corpus |
| dc.subject |
pos-tagging |
| dc.title |
IGC - evaluation set 20.09 |
| dc.type |
corpus |
| metashare.ResourceInfo#ContentInfo.mediaType |
text |
| has.files |
yes |
| branding |
Clarin IS Repository |
| contact.person |
Steinþór Steingrímsson; steinthor.steingrimsson@arnastofnun.is; The Árni Magnússon Institute for Icelandic Studies |
| sponsor |
Ministry of Education, Science and Culture; Icelandic Gigaword Corpus (G1); Language Technology for Icelandic 2019-2023; nationalFunds |
| size.info |
101261 tokens |
| files.size |
329580 |
| files.count |
1 |
Files in this item
This item is Publicly Available and licensed under: Icelandic Mim Gold Standard for PoS Tagging
- Name: eval_igc_20.09.zip
- Size: 321.86 KB
- Format: application/zip
- Description: eval_igc_20.09
- MD5: 4f6507987fd384e104cb201ce580451c
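The MD5 value listed for the archive can be used to check a download for corruption. The following is a minimal sketch using only the Python standard library; the helper name `md5_of_file` and the local path in the usage comment are assumptions for illustration, not part of the repository.

```python
import hashlib

# MD5 checksum as listed in this item record for eval_igc_20.09.zip.
EXPECTED_MD5 = "4f6507987fd384e104cb201ce580451c"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (assumes the archive was saved under this hypothetical local path):
#   ok = md5_of_file("eval_igc_20.09.zip") == EXPECTED_MD5
```

MD5 is suitable here only as an integrity check against accidental corruption, not as protection against deliberate tampering.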
Preview
- eval_set
- books.plain (109 kB)
- tv_radio.plain (121 kB)
- opinions.plain (129 kB)
- adjudications.plain (135 kB)
- law.plain (131 kB)
- news.plain (127 kB)
- parliamentary_speaches.plain (130 kB)
- educational_websites.plain (116 kB)
- sport_news.plain (98 kB)
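The description states that each line of a `.plain` file holds a word and a PoS tag separated by a tab. A reader for that layout might look like the sketch below. It assumes that sentences are delimited by an empty line, which is one reading of "separated by a newline", and that the files are UTF-8 encoded; the function name `read_tagged_sentences` and the sample tags are made up for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str]  # (word, PoS tag)


def read_tagged_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of (word, tag) pairs.

    Assumption: a line without content marks the end of a sentence.
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Empty line: close the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        # Split on the first tab only; the tag is everything after it.
        word, _, tag = line.partition("\t")
        sentence.append((word, tag))
    if sentence:
        # Emit the last sentence when the input lacks a trailing empty line.
        yield sentence


# Made-up sample; TAG_A etc. are placeholders, not real tags from the tagset.
sample = "Hún\tTAG_A\nles\tTAG_B\n\nJá\tTAG_C\n"
sentences = list(read_tagged_sentences(sample.splitlines()))
```

To read one of the files listed above, an open file object can be passed directly, for example `read_tagged_sentences(open("eval_set/news.plain", encoding="utf-8"))`, where the path and encoding are again assumptions.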