Show simple item record
dc.contributor.author |
Helgadóttir, Sigrún |
dc.contributor.author |
Barkarson, Starkaður |
dc.contributor.author |
Hafsteinsdóttir, Hildur |
dc.contributor.author |
Andrésdóttir, Þórdís Dröfn |
dc.date.accessioned |
2020-06-11T13:56:48Z |
dc.date.available |
2020-06-11T13:56:48Z |
dc.date.issued |
2020-05-31 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12537/38 |
dc.description |
Testing and training sets for PoS tagging from IFD 2020.05 (Icelandic Frequency Dictionary), which contains fragments from 100 texts published between 1980 and 1989.
The testing and training pairs were created by dividing the 100 texts that constitute the corpus into ten roughly equal parts. Each of these ten parts forms one test set, and the corresponding training set contains the other nine parts.
The pos-tags were mapped to Tagset MIM-GOLD 2.0 (see discussion in http://hdl.handle.net/20.500.12537/26).
----------------
Training and test sets for part-of-speech tagging, derived from the Icelandic Frequency Dictionary (Orðtíðnibók, 2020.05), which contains excerpts from 100 texts published between 1980 and 1989.
The pairs were created by dividing each file into ten roughly equal parts. Each of these ten parts forms one test set, and the corresponding training set contains the other nine parts in each case.
The tags were mapped to a new tagset, MIM-GOLD 2.0 (see the discussion at http://hdl.handle.net/20.500.12537/26). |
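The ten-way split described above can be illustrated with a minimal sketch. It is not the script used to build the released sets: the function name `ten_fold_pairs`, the contiguous (unshuffled) partitioning, and the placeholder text identifiers are all assumptions made for illustration. The record does not say how the parts were actually drawn.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def ten_fold_pairs(units: Sequence[T], k: int = 10) -> List[Tuple[List[T], List[T]]]:
    """Return k (training, test) pairs; every unit lands in exactly one test set.

    Hypothetical helper: `units` may be texts, sentences, or any other unit.
    """
    # Split into k contiguous parts whose sizes differ by at most one.
    base, extra = divmod(len(units), k)
    parts: List[List[T]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        parts.append(list(units[start:start + size]))
        start += size

    # Each part serves once as the test set; the other k-1 parts form the training set.
    pairs: List[Tuple[List[T], List[T]]] = []
    for i, test in enumerate(parts):
        train = [u for j, part in enumerate(parts) if j != i for u in part]
        pairs.append((train, test))
    return pairs


if __name__ == "__main__":
    # Placeholder identifiers standing in for the 100 texts (assumption).
    texts = [f"text_{n:03d}" for n in range(1, 101)]
    for fold, (train, test) in enumerate(ten_fold_pairs(texts), start=1):
        print(f"fold {fold:02d}: {len(train)} training units, {len(test)} test units")
```

With 100 units and ten folds, each pair holds 90 training units and 10 test units, and no unit appears in both halves of the same pair.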
dc.language.iso |
isl |
dc.publisher |
The Árni Magnússon Institute for Icelandic Studies |
dc.rights |
Icelandic Frequency Dictionary |
dc.rights.uri |
https://repository.clarin.is/repository/xmlui/page/license-frequency-dictionary |
dc.rights.label |
PUB |
dc.source.uri |
http://www.malfong.is/index.php?lang=is&pg=ordtidnibok |
dc.subject |
test sets |
dc.subject |
training sets |
dc.subject |
lemmatized |
dc.subject |
pos-tagged |
dc.title |
Icelandic Frequency Dictionary 2020.05 - training/testing sets |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
Clarin IS Repository |
contact.person |
Steinþór Steingrímsson, steinthor.steingrimsson@arnastofnun.is, The Árni Magnússon Institute for Icelandic Studies |
sponsor |
Ministry of Education, Science and Culture; A Gold Standard for PoS Tagging (G10); Language Technology for Icelandic 2019-2023; nationalFunds |
size.info |
589771 tokens |
size.info |
518652 words |
size.info |
37181 sentences |
files.size |
17820840 |
files.count |
1 |
Files in this item
This item is Publicly Available and licensed under: Icelandic Frequency Dictionary
- Name
- IFD3_SETS.zip
- Size
- 17 MB
- Format
- application/zip
- Description
- IFD3_SETS
- MD5
- 3b61859bbff1ac16d163f14624563d58
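The MD5 value listed above can be used to check a downloaded copy of the archive. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_expected` and the idea of a local file path are assumptions for illustration, not part of the record.

```python
import hashlib

# Checksum as listed in this record for IFD3_SETS.zip.
EXPECTED_MD5 = "3b61859bbff1ac16d163f14624563d58"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a large archive is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare the file's digest with the expected value, ignoring case."""
    return md5_of_file(path).lower() == expected.lower()
```

Calling `matches_expected("IFD3_SETS.zip")` on a local copy (the path is hypothetical) returns `True` only when the download is byte-identical to the file the checksum was computed from.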
Preview
- 09PM.plain (626 kB)
- 08TM.plain (5 MB)
- 08PM.plain (627 kB)
- 07TM.plain (5 MB)
- 06PM.plain (627 kB)
- 10PM.plain (619 kB)
- 04PM.plain (626 kB)
- 05TM.plain (5 MB)
- 03TM.plain (5 MB)
- 01TM.plain (5 MB)
- 01PM.plain (635 kB)
- README.txt (508 B)
- 09TM.plain (5 MB)
- 07PM.plain (627 kB)
- 05PM.plain (626 kB)
- 06TM.plain (5 MB)
- 10TM.plain (5 MB)
- 04TM.plain (5 MB)
- 03PM.plain (625 kB)
- 02TM.plain (5 MB)
- 02PM.plain (625 kB)