What's New

corpus
Description:
The RUV TV set consists of 6 hours and 43 minutes of TV data from RÚV, taken from two talk shows and the news: Kastljós (news commentary), Kiljan (literature discussions), and the prime-time news (Fréttir kl. 19:00). The data contains ...
 This item contains 1 file (5.58 GB).
 
Publicly Available
corpus
Description:
The Faroese Parsed Historical Corpus (FarPaHC) is a manually corrected treebank, parsed according to the annotation guidelines of The Penn Parsed Corpora of Historical English (PPCHE) and The Icelandic Parsed Historical ...
 This item contains 1 file (507.15 KB).
 
Publicly Available
corpus
Description:
The Icelandic Gigaword corpus (IGC) is a tagged and lemmatized corpus. Version 1 (2017) consists of approximately 1,250 million running words of text. Each running word is accompanied by a morphosyntactic tag and lemma and ...
 This item contains 1 file (5.63 GB).
 
Publicly Available

Most Viewed Items

Top Last Week
toolService
Description:
Tokenizer is a compact pure-Python (2 and 3) executable program and module for tokenizing Icelandic text. It converts input text to streams of tokens, where each token is a separate word, punctuation sign, number/amount, ...
 This item contains 1 file (239.62 KB).
corpus
Description:
A list of Icelandic words that may in some way be considered inappropriate, taboo, and/or loaded in use or meaning. These include words that are biased against certain minorities (i.e. people of different ...
 This item contains 1 file (312 KB).
 
Publicly Available
corpus
Description:
The Icelandic Contemporary Corpus (IceConTree) is a machine-parsed treebank parsed according to the IcePaHC annotation scheme. It consists of texts from the Icelandic Gigaword Corpus, parsed using the IceNeuralParsingPipeline. ...
 This item contains 1 file (3.92 GB).
 
Publicly Available