
 
dc.contributor.author Pind, Jörgen
dc.contributor.author Magnússon, Friðrik
dc.contributor.author Briem, Stefán
dc.date.accessioned 2020-06-11T13:48:55Z
dc.date.available 2020-06-11T13:48:55Z
dc.date.issued 2018-10-05
dc.identifier.uri http://hdl.handle.net/20.500.12537/36
dc.description A special text corpus was created for the compilation of the Icelandic Frequency Dictionary (Pind, Magnússon and Briem, 1991), published by The Institute of Lexicography in 1991. Preparations for the corpus started in 1985, and a detailed description of the work can be found in the preface to the book. The corpus contains fragments from 100 texts, all published between 1980 and 1989, each about 5,000 running words long. The texts were selected from five categories: Icelandic fiction (20 texts), translated fiction (20 texts), biographies and memoirs (20 texts), non-fiction (10 in the humanities, 10 in science) and books for children and teenagers (10 original texts, 10 translations). Each token is followed by a lemma and a POS tag. The POS tags were assigned automatically and then manually corrected. The version available here has been corrected further, automatically (http://aclweb.org/anthology-new/E/E09/E09-1060.pdf). For this version (2018.10) the tagset has been reduced: a) The tagging of proper nouns has been simplified: all proper nouns are tagged the same way, with no distinction between place names, person names and other names. b) Only one tag (ta) is used for numerical constants, i.e. these numerals are not tagged as cardinal numbers that can be declined.
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.relation.replaces https://repository.clarin.is/repository/xmlui/handle/20.500.12537/35
dc.rights Icelandic Frequency Dictionary
dc.rights.uri https://repository.clarin.is/repository/xmlui/page/license-frequency-dictionary
dc.rights.label PUB
dc.source.uri http://www.malfong.is/?pg=ordtidnibok
dc.subject text corpus
dc.subject lemmatized
dc.subject pos-tagged
dc.subject IFD
dc.title Icelandic Frequency Dictionary 2018.10
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
demo.uri https://malheildir.arnastofnun.is/?mode=otb
contact.person Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies
size.info 590299 tokens
size.info 519180 words
size.info 36912 sentences
files.size 3617430
files.count 1


 Files in this item

This item is Publicly Available and licensed under: Icelandic Frequency Dictionary

Name: IFD_2.zip
Size: 3.45 MB
Format: application/zip
Description: IFD_2
MD5: fc7156deb728ffeb0bba10ba5d3a7032
 File Preview
    • A1A.xml
    • A4N.xml
    • A2G.xml
    • A5T.xml
    • A3M.xml
    • A1F.xml
    • A5D.xml
    • A4S.xml
    • A2L.xml
    • A4C.xml
    • A3R.xml
    • A1K.xml
    • A5I.xml
    • A3B.xml
    • A2Q.xml
    • A4H.xml
    • A2A.xml
    • A1P.xml
    • A5N.xml
    • A3G.xml
    • A4M.xml
    • A2F.xml
    • A5S.xml
    • A3L.xml
    • A1E.xml
    • A5C.xml
    • A4R.xml
    • A2K.xml
    • A4B.xml
    • otbHdr.xml
    • A3Q.xml
    • A1J.xml
    • A5H.xml
    • A3A.xml
    • A2P.xml
    • A4G.xml
    • A1O.xml
    • A5M.xml
    • A3F.xml
    • A4L.xml
    • A2E.xml
    • A1T.xml
    • A5R.xml
    • A3K.xml
    • A1D.xml
    • A5B.xml
    • A4Q.xml
    • A2J.xml
    • A4A.xml
    • A3P.xml
    • A1I.xml
    • A5G.xml
    • A2O.xml
    • A4F.xml
    • A1N.xml
    • A5L.xml
    • A3E.xml
    • A2T.xml
    • A4K.xml
    • A2D.xml
    • A1S.xml
    • A5Q.xml
    • A3J.xml
    • A1C.xml
    • A5A.xml
    • A4P.xml
    • A2I.xml
    • A3O.xml
    • A1H.xml
    • A5F.xml
    • A2N.xml
    • A4E.xml
    • A3T.xml
    • A1M.xml
    • A5K.xml
    • A3D.xml
    • A2S.xml
    • A4J.xml
    • A2C.xml
    • A1R.xml
    • A5P.xml
    • A3I.xml
    • A1B.xml
    • A4O.xml
    • A2H.xml
    • A3N.xml
    • A5E.xml
    • A4T.xml
    • A2M.xml
    • A4D.xml
    • A3S.xml
    • A1L.xml
    • A5J.xml
    • A3C.xml
    • A2R.xml
    • A2B.xml
    • A1Q.xml
    • A5O.xml
    • A3H.xml
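
The MD5 checksum and file size listed in this record can be used to check a downloaded copy of IFD_2.zip. The following is a minimal sketch, not part of the record itself: it assumes the archive has been saved locally as IFD_2.zip and that the files.size value (3617430) is given in bytes. The function names md5_of_file and verify_download are hypothetical helpers introduced for illustration.

```python
import hashlib
from pathlib import Path

# Values copied from this record. Interpreting files.size as bytes is an assumption.
EXPECTED_MD5 = "fc7156deb728ffeb0bba10ba5d3a7032"
EXPECTED_SIZE = 3617430


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if both the byte size and the MD5 digest match."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    archive = Path("IFD_2.zip")  # assumed local file name
    if archive.exists():
        print("OK" if verify_download(archive) else "MISMATCH")
    else:
        print("IFD_2.zip not found in the current directory")
```

A mismatch would suggest an incomplete or altered download, in which case fetching the archive again from the repository is the simplest remedy.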
