What's New

 toolService 
Description:
A set of speaker diarization recipes that depend on the Kaldi speech toolkit. There are two types of recipes: the first is for decoding unseen audio, and the second is for training ...
 This item contains no files.
 
Publicly Available
 toolService 
Description:
A Python package that punctuates Icelandic text. It takes unpunctuated text as input and returns punctuated text. The user can choose between two punctuation models, a BERT-based Transformer and a bidirectional RNN ...
 This item contains no files.
 
Publicly Available
 corpus 
Description:
A corpus of:
* 70,000 sentences taken from general text, both before normalization and normalized using the Regína normalizer
* 70,000 sentences taken from sports news, both before normalization and normalized using Regína ...
 This item contains 1 file (16.97 MB).
 
Publicly Available

Most Viewed Items

Top Last Week
 corpus 
Description:
The Icelandic Gigaword corpus (IGC) is a tagged and lemmatized corpus. The 20.05 version consists of approximately 1550 million running words of text. Each running word is accompanied by a morphosyntactic tag and lemma and ...
 This item contains 1 file (10.31 GB).
 
Publicly Available
 corpus 
Description:
Three dev/test sets for MT quality estimation created from subcorpora of ParIce. The dev/test sets contain English-Icelandic segment pairs. One of the three sets is made up of subtitle segments from OpenSubtitles, one of ...
 This item contains 1 file (135.37 MB).
 
Publicly Available
 corpus 
Description:
MIM-GOLD 20.05 is a gold standard for PoS-tagging Icelandic texts. This new version uses a revised tagset. The gold standard contains approximately 1 million running words with manually annotated PoS-tags. The texts are ...
 This item contains 1 file (5.99 MB).
 
Publicly Available