What's New

corpus
Description:
IGC-Parla is part of the IGC project (Icelandic Gigaword Corpus), which aims to collect as many Icelandic texts as possible that can be published under an open or restricted license. IGC-Parla contains ...
 This item contains 1 file (3.06 GB).
 
Publicly Available
toolService
Description:
Tokenizer is a compact pure-Python (2.7 and 3) executable program and module for tokenizing Icelandic text. It converts input text to streams of tokens, where each token is a separate word, punctuation mark, number/amount, ...
 This item contains 2 files (526.52 KB).
 
Publicly Available
toolService
Description:
BinPackage is a Python package that embeds the vocabulary of the DMII (bin.arnastofnun.is) and offers various lookups and queries of the data. The database, maintained by The Árni Magnússon Institute for Icelandic Studies, ...
 This item contains 2 files (18.47 MB).
 
Publicly Available

Most Viewed Items

Top Last Week
corpus
Description:
GreynirCorpus is a corpus of 7 million sentences, mostly from news sources, that have been parsed into full constituency trees by an automatic rule-based parser. It also contains a gold standard corpus of 2,610 hand-annotated ...
 This item contains 2 files (1.52 GB).
 
Publicly Available
languageDescription
Description:
Icegrams is a Python 3 package that encapsulates a large trigram library for Icelandic. 14 million unique trigrams and their frequency counts are heavily compressed using radix tries and quasi-succinct indices employing ...
 This item contains 2 files (149.38 KB).
 
Publicly Available
corpus
Description:
Talrómur 2 is a public domain speech corpus for Text-To-Speech (TTS) research and development. The corpus consists of 56,225 audio clips of forty different speakers reading short sentences. The audio was recorded in 2021 ...
 This item contains 3 files (8.6 GB).
 
Publicly Available