What's New
toolService

Description:
Annotald is a program for annotating parsed corpora in the Penn Treebank format. For more information on the format (as instantiated by the Penn Parsed Corpora of Historical English), see the documentation by Beatrice ...
This item contains 2 files (2.89 MB).
Publicly Available
toolService

Description:
Yfirlestur.is is a public website where you can enter or submit your Icelandic text and have it checked for spelling and grammar errors.
The tool also gives hints on words and structures that might not be appropriate, ...
This item contains 2 files (1.27 MB).
Publicly Available
toolService

Description:
This is a pipeline for creating GreynirSeq domain-aware translation models. A valid checkpoint of a base translation model based on mBART25 can be fine-tuned as a domain translation model. The resulting model can be queried ...
This item contains 2 files (4.54 MB).
Publicly Available
Most Viewed Items
Top Last Week
corpus

Description:
Talrómur is a public domain speech corpus for text-to-speech research and development.
The corpus consists of 122,417 short audio clips of eight different speakers reading short sentences.
The audio was recorded in 2020 ...
This item contains 11 files (19.99 GB).
Publicly Available
corpus

Description:
ParIce is an English-Icelandic parallel corpus. This is the first parallel corpus built for the purposes of language technology development and research for Icelandic. It includes 3.5 million translation segment pairs from ...
This item contains 1 file (696.19 MB).
Publicly Available
corpus

Description:
The Icelandic Gigaword Corpus (IGC) is a tagged and lemmatized corpus. Version 20.05 consists of approximately 1,530 million running words of text. Each running word is accompanied by a morphosyntactic tag and lemma and ...
This item contains 1 file (7.55 GB).
Publicly Available