What's New

 toolService 
Description:
A set of speaker diarization recipes that depend on the Kaldi speech toolkit. There are two types of recipes: the first are used for decoding unseen audio, and the second are for training ...
 This item contains no files.
 
Publicly Available
 toolService 
Description:
A Python package that punctuates Icelandic text. It takes unpunctuated text as input and returns punctuated text. The user can choose between two punctuation models, a BERT-based Transformer and a bidirectional RNN ...
 This item contains no files.
 
Publicly Available
 corpus 
Description:
A corpus of: * 70,000 sentences taken from general text, both before normalization and normalized using Regína normalizer * 70,000 sentences taken from sports news, both before normalization and normalized using Regína ...
 This item contains 1 file (16.97 MB).
 
Publicly Available

Most Viewed Items

Top Last Week
 toolService 
Description:
A dockerized Named Entity Recognition (NER) API for Icelandic. It uses an ELECTRA-base language model that has been fine-tuned for NER using MIM-GOLD-NER. It achieves an F1-score of ~91.9 on the test set for MIM-GOLD-NER. ...
 This item contains 1 file (389.74 MB).
 
Publicly Available
 corpus 
Description:
Three dev/test sets for MT quality estimation created from subcorpora of ParIce. The dev/test sets contain English-Icelandic segment pairs. One of the three sets is made up of subtitle segments from OpenSubtitles, one of ...
 This item contains 1 file (135.37 MB).
 
Publicly Available
 corpus 
Description:
The Icelandic Gigaword corpus (IGC) is a tagged and lemmatized corpus. The 20.05 version consists of approximately 1550 million running words of text. Each running word is accompanied by a morphosyntactic tag and lemma and ...
 This item contains 1 file (10.31 GB).
 
Publicly Available