What's New
corpus
Description:
[English]
This is a JSONL version of the 2024 release of the Icelandic Gigaword Corpus (IGC), prepared for language model training. The archive contains training and validation sets of unannotated documents from the ...
This item contains 2 files (2.48 GB).
Publicly Available
corpus
Description:
[English]
This is a JSONL version of the 2024 release of the Icelandic Gigaword Corpus (IGC), prepared for language model training. The archive contains training and validation sets of unannotated, CC-BY-licensed documents ...
This item contains 2 files (2.01 GB).
Publicly Available
lexicalConceptualResource
Description:
This dataset contains terminology from basketball, chess, football, golf, and gymnastics. The vocabulary originates from a test suite at WMT25 (Conference on Machine Translation), where sports segments were translated ...
This item contains 2 files (16.36 KB).
Publicly Available