What's New

 corpus 
corpus
Description:
The Icelandic Gigaword corpus (IGC) is a tagged and lemmatized corpus. Version 2 (2018) consists of approximately 1,400 million running words of text (punctation marks not included). Each running word is accompanied by a ...
 This item contains 1 file (6.31 GB).
 
Publicly Available
 corpus 
corpus
Description:
The Icelandic Gigaword corpus (IGC) is a tagged and lemmatized corpus. Version 2 (2018) consists of approximately 1,400 million running words of text (punctation marks not included). Each running word is accompanied by a ...
 This item contains 1 file (9.76 GB).
 
Publicly Available
 toolService 
toolService
Description:
A simple command-line tool that uses Pyphen (https://pyphen.org) to hyphenate text according to the newest hyphenation patterns from the Icelandic Hyphenation Dictionary (http://hdl.handle.net/20.500.12537/86). Can also ...
 This item contains 1 file (749.12 KB).
 
Publicly Available

Most Viewed Items

Top Last Week
 corpus 
corpus
Description:
This Icelandic named entity (NE) corpus, MIM-GOLD-NER, is a version of the MIM-GOLD corpus tagged for NEs. Over 48 thousand NEs are tagged in this corpus of one million tokens, which can be used for training named entity ...
 This item contains 13 files (8.53 MB).
 
Publicly Available
 toolService 
toolService
Description:
A web application for the editing of pronunciation dictionaries. The tool offers detailed annotation of entries, e.g. on compounds, prefixes, dialects and part-of-speech. Exports dictionaries in .tsv format for use in ...
 This item contains 1 file (1.37 MB).
 
Publicly Available
 toolService 
toolService
Description:
Grapheme-to-phoneme (g2p) models for Icelandic, trained on an encoder-decoder LSTM neural network. The models are delivered with scripts for automatic transcription of Icelandic in the standard pronunciation variation, in ...
 This item contains 1 file (312.91 MB).
 
Publicly Available