| dc.contributor.author | 
Helgadóttir, Sigrún | 
| dc.contributor.author | 
Barkarson, Starkaður | 
| dc.contributor.author | 
Hafsteinsdóttir, Hildur | 
| dc.contributor.author | 
Andrésdóttir, Þórdís Dröfn | 
| dc.date.accessioned | 
2020-06-11T13:56:48Z | 
| dc.date.available | 
2020-06-11T13:56:48Z | 
| dc.date.issued | 
2020-05-31 | 
| dc.identifier.uri | 
http://hdl.handle.net/20.500.12537/38 | 
| dc.description | 
Testing and training sets for PoS tagging, derived from IFD 2020.05 (Icelandic Frequency Dictionary), which contains fragments of 100 texts published between 1980 and 1989.
The test/training pairs were created by dividing the 100 texts that constitute the corpus into ten roughly equal parts. Each part serves as one test set, and the corresponding training set contains the other nine parts.
The PoS tags were mapped to the MIM-GOLD 2.0 tagset (see the discussion at http://hdl.handle.net/20.500.12537/26).
----------------
Training and test sets for PoS tagging, derived from Orðtíðnibókin (Icelandic Frequency Dictionary, 2020.05), which contains fragments of 100 texts published between 1980 and 1989.
The pairs were created by splitting each file into ten roughly equal parts. Each of the ten parts forms one test set, and the corresponding training set holds the other nine parts.
The tags were mapped to a new tagset, MIM-GULL 2.0 (see the discussion at http://hdl.handle.net/20.500.12537/26). | 
| dc.language.iso | 
isl | 
| dc.publisher | 
The Árni Magnússon Institute for Icelandic Studies | 
| dc.rights | 
Icelandic Frequency Dictionary | 
| dc.rights.uri | 
https://repository.clarin.is/repository/xmlui/page/license-frequency-dictionary | 
| dc.rights.label | 
PUB | 
| dc.source.uri | 
http://www.malfong.is/index.php?lang=is&pg=ordtidnibok | 
| dc.subject | 
test sets | 
| dc.subject | 
training sets | 
| dc.subject | 
lemmatized | 
| dc.subject | 
pos-tagged | 
| dc.title | 
Icelandic Frequency Dictionary 2020.05 - training/testing sets | 
| dc.type | 
corpus | 
| metashare.ResourceInfo#ContentInfo.mediaType | 
text | 
| has.files | 
yes | 
| branding | 
Clarin IS Repository | 
| contact.person | 
Steinþór Steingrímsson; steinthor.steingrimsson@arnastofnun.is; The Árni Magnússon Institute for Icelandic Studies | 
| sponsor | 
Ministry of Education, Science and Culture; A Gold Standard for PoS Tagging (G10); Language Technology for Icelandic 2019-2023; nationalFunds | 
| size.info | 
589771 tokens | 
| size.info | 
518652 words | 
| size.info | 
37181 sentences | 
| files.size | 
17820840 | 
| files.count | 
1 |
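
The description in this record says the corpus was divided into ten roughly equal parts, with each part serving once as a test set against a training set made of the other nine. The sketch below illustrates that general scheme only. It is not the script used to build the published files: the function names `split_into_parts` and `train_test_pairs` are hypothetical, the items are assumed to be sentences held in a list, and contiguous splitting is an assumption, since the record does not say how the parts were drawn.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_into_parts(items: Sequence[T], k: int = 10) -> List[List[T]]:
    """Split items into k contiguous parts whose sizes differ by at most one."""
    if k <= 0:
        raise ValueError("k must be positive")
    base, extra = divmod(len(items), k)
    parts: List[List[T]] = []
    start = 0
    for i in range(k):
        # The first `extra` parts take one additional item each.
        size = base + (1 if i < extra else 0)
        parts.append(list(items[start:start + size]))
        start += size
    return parts


def train_test_pairs(
    items: Sequence[T], k: int = 10
) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs: each part is the test set exactly once."""
    parts = split_into_parts(items, k)
    for i, test in enumerate(parts):
        # Training data is everything outside the held-out part.
        train = [x for j, part in enumerate(parts) if j != i for x in part]
        yield train, test


if __name__ == "__main__":
    # Placeholder sentence IDs; the real data would be tagged sentences.
    sentences = [f"sentence-{n}" for n in range(25)]
    for fold, (train, test) in enumerate(train_test_pairs(sentences), start=1):
        print(fold, len(train), len(test))
```

Under this scheme every item lands in exactly one test set and in nine training sets, which is what makes the ten pairs usable for ten-fold cross-validation of a tagger.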