dc.contributor.author | Arnardóttir, Þórunn |
dc.contributor.author | Ingason, Anton Karl |
dc.date.accessioned | 2023-06-23T15:31:15Z |
dc.date.available | 2023-06-23T15:31:15Z |
dc.date.issued | 2023-06-01 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/314 |
dc.description | The data consist of frequency lists for three Icelandic corpora: the Icelandic Parsed Historical Corpus (IcePaHC; http://hdl.handle.net/20.500.12537/62), the Tagged Icelandic Corpus (MÍM; http://hdl.handle.net/20.500.12537/195), and the Icelandic Gigaword Corpus (IGC; http://hdl.handle.net/20.500.12537/254). The frequency information for each corpus is stored in a TSV file with three columns: the lemma, its word category (including the gender if the word is a noun), and the lemma's frequency. Lemmas are sorted by descending frequency. |
dc.language.iso | isl |
dc.publisher | University of Iceland |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://github.com/thorunna/LemmaFrequency |
dc.subject | lemma frequency |
dc.subject | frequency lists |
dc.subject | icepahc |
dc.subject | mím |
dc.subject | igc |
dc.title | Frequency lists for Icelandic 23.06 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | Clarin IS Repository |
contact.person | Þórunn Arnardóttir thar@hi.is University of Iceland |
files.size | 68936611 |
files.count | 1 |
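The three-column layout given in dc.description (lemma, word category, frequency) could be read along the following lines. This is only a minimal sketch: the record does not say whether the files carry a header row or which encoding they use, so the code assumes UTF-8 and simply skips any row whose third column is not an integer. The `LemmaFreq` type, the `read_frequency_list` helper, and the sample rows (including the category labels) are invented for illustration and are not taken from the data set.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class LemmaFreq(NamedTuple):
    lemma: str
    category: str  # word category, with gender for nouns (per the description)
    frequency: int


def read_frequency_list(handle: TextIO) -> Iterator[LemmaFreq]:
    """Yield one record for each well-formed three-column row."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # assumption: blank or malformed lines are skipped
        lemma, category, count = row
        try:
            yield LemmaFreq(lemma, category, int(count))
        except ValueError:
            continue  # assumption: a non-numeric count indicates a header row


# Invented sample rows; the category labels are placeholders, not the real tagset.
SAMPLE = "vera\tverb\t120\nhestur\tnoun-masc\t45\nbók\tnoun-fem\t45\n"

records = list(read_frequency_list(io.StringIO(SAMPLE)))
top = records[0]

# The description states that lemmas are sorted by descending frequency.
is_sorted = all(a.frequency >= b.frequency for a, b in zip(records, records[1:]))
```

For one of the real files, the same function could be called on `open("giga_simple_freq.tsv", encoding="utf-8", newline="")`, again assuming UTF-8. Reading row by row avoids loading the largest list into memory at once.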
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Files in this item
- Name: Frequency_lists.zip
- Size: 65.74 MB
- Format: application/zip
- Description: A zip file containing the three frequency lists
- MD5: 46a53aee186de58dad65a507c76068ac
- Frequency lists for Icelandic
  - mim_simple_freq.tsv (9 MB)
  - icepahc_simple_freq.tsv (641 kB)
  - giga_simple_freq.tsv (146 MB)
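The size and MD5 values recorded for Frequency_lists.zip can be used to check a downloaded copy. The sketch below shows one way to do that with Python's standard library. The helper names `md5_of_file` and `verify_download` are hypothetical, and the local file path is whatever the archive was saved as. The expected values are copied from the files.size and MD5 fields of this record.

```python
import hashlib
from pathlib import Path

# Values copied from this record (files.size and the MD5 entry).
EXPECTED_SIZE = 68936611
EXPECTED_MD5 = "46a53aee186de58dad65a507c76068ac"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path) -> bool:
    """Check a local copy against the size and MD5 listed in the record."""
    path = Path(path)
    if path.stat().st_size != EXPECTED_SIZE:
        return False  # cheap check first; avoids hashing a truncated file
    return md5_of_file(path) == EXPECTED_MD5
```

A call such as `verify_download("Frequency_lists.zip")` would return `True` only if both values match. MD5 is suitable here for detecting corrupted or incomplete downloads, not as a security guarantee.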