What's New

corpus
Description:
The Icelandic Gigaword corpus (IGC) is a tagged and lemmatized corpus. Version 1 (2017) consists of approximately 1,250 million running words of text. Each running word is accompanied by a morphosyntactic tag and lemma and ...
 This item contains 1 file (5.63 GB).
 
Publicly Available
corpus
Description:
The Icelandic Gigaword corpus (IGC) is a tagged and lemmatized corpus. Version 2 (2018) consists of approximately 1,400 million running words of text (punctuation marks not included). Each running word is accompanied by a ...
 This item contains 1 file (6.31 GB).
 
Publicly Available

Most Viewed Items

Top Last Week
corpus
Description:
MIM-GOLD 20.05 is a gold standard for PoS-tagging Icelandic texts. This new version uses a revised tagset. The gold standard contains approximately 1 million running words with manually annotated PoS-tags. The texts are ...
 This item contains 1 file (5.99 MB).
 
Publicly Available
corpus
Description:
ParIce is an English-Icelandic parallel corpus. This is the first parallel corpus built for the purposes of language technology development and research for Icelandic. It includes 3.5 million translation segment pairs from ...
 This item contains 1 file (696.19 MB).
 
Publicly Available
corpus
Description:
The Icelandic Contemporary Corpus (IceConTree) is a machine-parsed treebank parsed according to the IcePaHC annotation scheme. It consists of texts from the Icelandic Gigaword Corpus, parsed using the IceNeuralParsingPipeline. ...
 This item contains 1 file (3.92 GB).
 
Publicly Available