What's New

corpus
Description:
The Icelandic Gigaword corpus (IGC) is a tagged and lemmatized corpus. Version 1 (2017) consists of approximately 1,250 million running words of text. Each running word is accompanied by a morphosyntactic tag and lemma and ...
 This item contains 1 file (5.63 GB).
 
Publicly Available
corpus
Description:
The Icelandic Gigaword corpus (IGC) is a tagged and lemmatized corpus. Version 2 (2018) consists of approximately 1,400 million running words of text (punctuation marks not included). Each running word is accompanied by a ...
 This item contains 1 file (6.31 GB).
 
Publicly Available

Most Viewed Items

Top Last Week
corpus
Description:
The Icelandic Parsed Historical Corpus (IcePaHC) is a manually corrected treebank, parsed according to the annotation guidelines of The Penn Parsed Corpora of Historical English (PPCHE), with minor modifications that are ...
 This item contains 1 file (11.91 MB).
 
Publicly Available
corpus
Description:
The Icelandic Contemporary Corpus (IceConTree) is a machine-parsed treebank parsed according to the IcePaHC annotation scheme. It consists of texts from the Icelandic Gigaword Corpus, parsed using the IceNeuralParsingPipeline. ...
 This item contains 1 file (3.92 GB).
 
Publicly Available
corpus
Description:
This Icelandic named entity (NE) corpus, MIM-GOLD-NER, is a version of the MIM-GOLD corpus tagged for NEs. Over 48 thousand NEs are tagged in this corpus of one million tokens, which can be used for training named entity ...
 This item contains 13 files (8.53 MB).
 
Publicly Available