Icelandic Frequency Dictionary 2012.11

Pind, Jörgen; Magnússon, Friðrik; Briem, Stefán

Clarin IS Repository

Authors: Pind, Jörgen; Magnússon, Friðrik; Briem, Stefán

Item identifier: http://hdl.handle.net/20.500.12537/35

Project URL: http://www.malfong.is/?pg=ordtidnibok

Date issued: 2012

Type: corpus, text

Size: 590299 tokens, 519180 words, 36912 sentences

Language(s): Icelandic

Description: A dedicated text corpus was compiled for the Icelandic Frequency Dictionary (Pind, Magnússon and Briem, 1991), published by the Institute of Lexicography in 1991. Work on the corpus began in 1985, and the preface to the book describes it in detail. The corpus contains fragments of 100 texts, all published between 1980 and 1989, each about 5,000 running words long. The texts were drawn from five categories: Icelandic fiction (20 texts), translated fiction (20 texts), biographies and memoirs (20 texts), non-fiction (10 in the humanities, 10 in the sciences), and books for children and teenagers (10 original texts, 10 translations). Each token is followed by its lemma and a part-of-speech (POS) tag. The POS tags were assigned automatically and then corrected manually. The version available here has undergone further automatic correction (http://aclweb.org/anthology-new/E/E09/E09-1060.pdf).

Publisher: The Árni Magnússon Institute for Icelandic Studies

Subject(s): text corpus, lemmatized, POS-tagged

Collection(s): Clarin IS

This item is replaced by a newer submission:

https://repository.clarin.is/repository/xmlui/handle/20.500.12537/36

This item is publicly available and licensed under: Icelandic Frequency Dictionary

Files in this item

Name: IFD_1.zip
Size: 3.46 MB
Format: application/zip
Description: IFD_1
MD5: a774e5badc9b1e3a55102b8ec88105a0
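
The MD5 value above can be used to check that a downloaded copy of the archive is intact. The following is a minimal sketch in Python, assuming the file was saved locally as IFD_1.zip in the current directory; the function names `md5_of_file` and `matches_expected` are illustrative and not part of the repository.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this record for IFD_1.zip.
EXPECTED_MD5 = "a774e5badc9b1e3a55102b8ec88105a0"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Compare the file's digest with the expected hex string."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("IFD_1.zip")  # assumed local download location
    if archive.exists():
        print("checksum OK" if matches_expected(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.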

File Preview

- A1A.xml
- A4N.xml
- A2G.xml
- A5T.xml
- A3M.xml
- A1F.xml
- A5D.xml
- A4S.xml
- A2L.xml
- A4C.xml
- A3R.xml
- A1K.xml
- A5I.xml
- A3B.xml
- A2Q.xml
- A4H.xml
- A2A.xml
- A1P.xml
- A5N.xml
- A3G.xml
- A4M.xml
- A2F.xml
- A5S.xml
- A3L.xml
- A1E.xml
- A5C.xml
- A4R.xml
- A2K.xml
- A4B.xml
- otbHdr.xml
- A3Q.xml
- A1J.xml
- A5H.xml
- A3A.xml
- A2P.xml
- A4G.xml
- A1O.xml
- A5M.xml
- A3F.xml
- A4L.xml
- A2E.xml
- A1T.xml
- A5R.xml
- A3K.xml
- A1D.xml
- A5B.xml
- A4Q.xml
- A2J.xml
- A4A.xml
- A3P.xml
- A1I.xml
- A5G.xml
- A2O.xml
- A4F.xml
- A1N.xml
- A5L.xml
- A3E.xml
- A2T.xml
- A4K.xml
- A2D.xml
- A1S.xml
- A5Q.xml
- A3J.xml
- A1C.xml
- A5A.xml
- A4P.xml
- A2I.xml
- A3O.xml
- A1H.xml
- A5F.xml
- A2N.xml
- A4E.xml
- A3T.xml
- A1M.xml
- A5K.xml
- A3D.xml
- A2S.xml
- A4J.xml
- A2C.xml
- A1R.xml
- A5P.xml
- A3I.xml
- A1B.xml
- A4O.xml
- A2H.xml
- A3N.xml
- A5E.xml
- A4T.xml
- A2M.xml
- A4D.xml
- A3S.xml
- A1L.xml
- A5J.xml
- A3C.xml
- A2R.xml
- A2B.xml
- A1Q.xml
- A5O.xml
- A3H.xml
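
The file names in the preview follow a visible pattern: a prefix from A1 to A5 followed by a letter, plus a separate otbHdr.xml. As a rough sketch, the members of the downloaded archive could be listed and counted by prefix with Python's standard `zipfile` module. The helper names `xml_members` and `count_by_prefix` are illustrative, and the idea that the A1 to A5 prefixes correspond to the five text categories is an assumption that this record does not confirm.

```python
import zipfile
from collections import Counter


def xml_members(archive_path):
    """Return the names of all .xml members in a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return [n for n in archive.namelist() if n.lower().endswith(".xml")]


def count_by_prefix(names, prefix_len=2):
    """Count .xml file names by their leading characters (e.g. 'A1')."""
    counts = Counter()
    for name in names:
        base = name.rsplit("/", 1)[-1]  # drop any directory part
        if base.lower().endswith(".xml"):
            counts[base[:prefix_len]] += 1
    return counts


if __name__ == "__main__":
    # Assumes IFD_1.zip has been downloaded to the current directory.
    try:
        members = xml_members("IFD_1.zip")
    except FileNotFoundError:
        print("IFD_1.zip not found")
    else:
        for prefix, count in sorted(count_by_prefix(members).items()):
            print(prefix, count)
```

Comparing the per-prefix counts against the category sizes given in the description is one quick way to test the assumed mapping before relying on it.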
