dc.contributor.author Helgadóttir, Sigrún
dc.contributor.author Barkarson, Starkaður
dc.contributor.author Hafsteinsdóttir, Hildur
dc.contributor.author Andrésdóttir, Þórdís Dröfn
dc.date.accessioned 2020-06-11T13:56:48Z
dc.date.available 2020-06-11T13:56:48Z
dc.date.issued 2020-05-31
dc.identifier.uri http://hdl.handle.net/20.500.12537/38
dc.description Training and test sets for PoS tagging built from IFD 2020.05 (the Icelandic Frequency Dictionary), which contains fragments of 100 texts published between 1980 and 1989. The test/training pairs were created by dividing the 100 texts that make up the corpus into ten roughly equal parts. Each of the ten parts serves as one test set, and the corresponding training set contains the other nine parts. The PoS tags were mapped to the MIM-GOLD 2.0 tagset (see the discussion at http://hdl.handle.net/20.500.12537/26).
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.rights Icelandic Frequency Dictionary
dc.rights.uri https://repository.clarin.is/repository/xmlui/page/license-frequency-dictionary
dc.rights.label PUB
dc.source.uri http://www.malfong.is/index.php?lang=is&pg=ordtidnibok
dc.subject test sets
dc.subject training sets
dc.subject lemmatized
dc.subject pos-tagged
dc.title Icelandic Frequency Dictionary 2020.05 - training/testing sets
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
contact.person Steinþór Steingrímsson; steinthor.steingrimsson@arnastofnun.is; The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Education, Science and Culture; A Gold Standard for PoS Tagging (G10); Language Technology for Icelandic 2019-2023; nationalFunds
size.info 589771 tokens
size.info 518652 words
size.info 37181 sentences
files.size 17820840
files.count 1
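
The scheme in dc.description (ten parts, each used once as a test set while the other nine form the training set) can be sketched as follows. This is a minimal illustration only, not the script used to build IFD3_SETS.zip. The helper name `make_folds` is hypothetical, and splitting into contiguous chunks is an assumption; the record does not say whether the unit of division was the sentence, the text fragment, or something else.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def make_folds(items: Sequence[T], k: int = 10) -> List[Tuple[List[T], List[T]]]:
    """Return k (train, test) pairs (hypothetical helper, not from the record).

    The items are cut into k contiguous, roughly equal parts. Part i is the
    test set of pair i; the remaining k - 1 parts are concatenated into the
    matching training set.
    """
    if k < 2 or k > len(items):
        raise ValueError("k must be between 2 and the number of items")

    # The first `extra` parts receive one additional item each.
    base, extra = divmod(len(items), k)
    parts: List[List[T]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        parts.append(list(items[start:start + size]))
        start += size

    pairs: List[Tuple[List[T], List[T]]] = []
    for i in range(k):
        test = parts[i]
        train = [x for j, part in enumerate(parts) if j != i for x in part]
        pairs.append((train, test))
    return pairs
```

Every item ends up in exactly one test set, so results averaged over the ten pairs cover the whole corpus.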


 Files in this item

This item is Publicly Available and licensed under: Icelandic Frequency Dictionary
Name IFD3_SETS.zip
Size 17 MB
Format application/zip
Description IFD3_SETS
MD5 3b61859bbff1ac16d163f14624563d58
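
The MD5 value listed for the archive can be used to check a download. The sketch below uses Python's standard `hashlib`; the helper name `md5_of_file` and the local path `IFD3_SETS.zip` (the archive saved in the current directory) are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# MD5 value copied from this record.
EXPECTED_MD5 = "3b61859bbff1ac16d163f14624563d58"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("IFD3_SETS.zip")  # assumed local download location
    if archive.exists():
        matches = md5_of_file(archive) == EXPECTED_MD5
        print("checksum OK" if matches else "checksum mismatch")
    else:
        print(f"{archive} not found")
```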
 File Preview  
    • 09PM.plain (626 kB)
    • 08TM.plain (5 MB)
    • 08PM.plain (627 kB)
    • 07TM.plain (5 MB)
    • 06PM.plain (627 kB)
    • 10PM.plain (619 kB)
    • 04PM.plain (626 kB)
    • 05TM.plain (5 MB)
    • 03TM.plain (5 MB)
    • 01TM.plain (5 MB)
    • 01PM.plain (635 kB)
    • README.txt (508 B)
    • 09TM.plain (5 MB)
    • 07PM.plain (627 kB)
    • 05PM.plain (626 kB)
    • 06TM.plain (5 MB)
    • 10TM.plain (5 MB)
    • 04TM.plain (5 MB)
    • 03PM.plain (625 kB)
    • 02TM.plain (5 MB)
    • 02PM.plain (625 kB)
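
The listing contains one `TM` and one `PM` file for each number from 01 to 10. Judging only by the file sizes (about 5 MB versus about 0.6 MB), the `TM` files appear to be the training sets and the `PM` files the test sets; this is an assumption, and the README.txt in the archive should be treated as authoritative. Under that assumption, a hypothetical helper `pair_folds` could match the files up like this:

```python
import re
from typing import Dict, Iterable, Tuple

# Two-digit fold number, then TM or PM, then the .plain extension.
FOLD_PATTERN = re.compile(r"^(\d{2})(TM|PM)\.plain$")


def pair_folds(names: Iterable[str]) -> Dict[int, Tuple[str, str]]:
    """Map fold number -> (training file, test file).

    Assumes TM marks training files and PM marks test files, which is
    inferred from the file sizes and is not stated in the record.
    """
    train: Dict[int, str] = {}
    test: Dict[int, str] = {}
    for name in names:
        match = FOLD_PATTERN.match(name)
        if match is None:
            continue  # skip unrelated files such as README.txt
        fold = int(match.group(1))
        if match.group(2) == "TM":
            train[fold] = name
        else:
            test[fold] = name

    unpaired = set(train) ^ set(test)
    if unpaired:
        raise ValueError(f"folds without a matching pair: {sorted(unpaired)}")
    return {fold: (train[fold], test[fold]) for fold in sorted(train)}
```

For example, `pair_folds(["03TM.plain", "03PM.plain", "README.txt"])` returns `{3: ("03TM.plain", "03PM.plain")}`.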
