What's New

 corpus 
corpus
Description:
The Icelandic Gigaword corpus (IGC) is a tagged and lemmatized corpus. Version 1 (2017) consists of approximately 1,250 million running words of text. Each running word is accompanied by a morphosyntactic tag and lemma and ...
 This item contains 1 file (5.63 GB).
 
Publicly Available
 corpus 
corpus
Description:
The Icelandic Gigaword corpus (IGC) is a tagged and lemmatized corpus. Version 2 (2018) consists of approximately 1,400 million running words of text (punctation marks not included). Each running word is accompanied by a ...
 This item contains 1 file (6.31 GB).
 
Publicly Available

Most Viewed Items

Top Last Week
 toolService 
toolService
Description:
Tokenizer is a compact pure-Python (2 and 3) executable program and module for tokenizing Icelandic text. It converts input text to streams of tokens, where each token is a separate word, punctuation sign, number/amount, ...
 This item contains 1 file (239.62 KB).
 corpus 
corpus
Description:
The Icelandic Parsed Historical Corpus (IcePaHC) is a manually corrected treebank, parsed according to the annotation guidelines of The Penn Parsed Corpora of Historical English (PPCHE), with minor modifications that are ...
 This item contains 1 file (11.91 MB).
 
Publicly Available
 corpus 
corpus
Description:
A list of words in Icelandic that may in some way be considered inappropriate, taboo and/or loaded in use or meaning. These can be words such as; words that are biased against certain minorities (i.e. people of different ...
 This item contains 1 file (312 KB).
 
Publicly Available