dc.contributor.author | Helgadóttir, Inga Rún |
dc.contributor.author | Kjaran, Róbert |
dc.contributor.author | Nikulásdóttir, Anna Björk |
dc.contributor.author | Guðnason, Jón |
dc.date.accessioned | 2022-09-29T08:51:45Z |
dc.date.available | 2022-09-29T08:51:45Z |
dc.date.issued | 2017-07-01 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/277 |
dc.description | [ENGLISH] Althingi's Parliamentary Speeches is an aligned and segmented corpus of 6493 speech recordings from Althingi, the Icelandic parliament, with 196 speakers. The corpus is split into training, development and evaluation sets. |
dc.description | [ICELANDIC, translated] The Althingi data are aligned speech and text data derived from speeches delivered in Althingi. The data comprise 6493 Althingi speeches from 196 speakers. They are aligned and divided into units of a suitable size for training. The dataset is split into a training set and two test sets, “dev” and “eval”. |
dc.language.iso | isl |
dc.publisher | The Árni Magnússon Institute for Icelandic Studies |
dc.relation.isreferencedby | https://clarin.is/media/uploads/building-asr-corpus_final.pdf |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://clarin.is/gogn/althingisgognin/ |
dc.subject | parliamentary speeches |
dc.subject | speeches |
dc.subject | recordings |
dc.title | Althingi's Parliamentary Speeches |
dc.title | Alþingisgögnin (til talgreiningar) |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | audio |
hasMetadata | false |
has.files | yes |
branding | Clarin IS Repository |
contact.person | Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies |
size.info | 199614 elements |
size.info | 542 hours |
size.info | 4583751 words |
files.size | 19374308379 bytes |
files.count | 6 |
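The size fields can be cross-checked with some quick arithmetic. The sketch below is a minimal illustration, assuming that the 199614 "elements" are the aligned segments and that the 542 hours refer to the audio covered by those segments; the record states neither point explicitly, and the hour count is rounded, so the derived averages are approximate. The helper name `size_summary` is hypothetical.

```python
# Hypothetical helper: derive approximate averages from the size.info and
# files.size fields. Assumes "elements" = aligned segments and that the
# 542 hours cover those segments (the record does not say so explicitly).


def size_summary(hours: float, segments: int, words: int, total_bytes: int) -> dict:
    seconds = hours * 3600
    return {
        "sec_per_segment": seconds / segments,
        "words_per_segment": words / segments,
        "words_per_minute": words / (seconds / 60),
        # 1 GiB = 2**30 bytes; the item's download size appears to use this unit.
        "gib": total_bytes / 2**30,
    }


if __name__ == "__main__":
    stats = size_summary(hours=542, segments=199614, words=4583751, total_bytes=19374308379)
    for key, value in stats.items():
        print(f"{key}: {value:.2f}")
```

Run as a script, it reports about 9.77 s of audio and 22.96 words per segment, a speaking rate of roughly 141 words per minute, and 18.04 GiB for the 19374308379 bytes, which agrees with the 18.04 GB download size shown for the item.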
Files in this item (18.04 GB in total)
This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).

- Name: althingi_texti.tar.gz
- Size: 464.13 MB
- Format: application/gzip
- Description: Unknown
- MD5: 79f957eec8ac54e6f4c26d38c04dfd5e
- Contents:
- malfong/
- lang_5glarge/
- L_disambig.fst (51 MB)
- topo (1 kB)
- G.carpa (736 MB)
- phones/
- disambig.int (24 B)
- wdisambig_words.int (7 B)
- sets.int (860 B)
- silence.int (21 B)
- roots.txt (2 kB)
- align_lexicon.int (8 MB)
- extra_questions.txt (1 kB)
- optional_silence.int (2 B)
- word_boundary.int (2 kB)
- nonsilence.txt (1 kB)
- nonsilence.csl (839 B)
- context_indep.txt (56 B)
- disambig.txt (18 B)
- context_indep.csl (21 B)
- sets.txt (1 kB)
- silence.txt (56 B)
- disambig.csl (24 B)
- silence.csl (21 B)
- wdisambig.txt (3 B)
- align_lexicon.txt (12 MB)
- roots.int (1 kB)
- wdisambig_phones.int (4 B)
- extra_questions.int (860 B)
- optional_silence.txt (4 B)
- optional_silence.csl (2 B)
- word_boundary.txt (2 kB)
- nonsilence.int (839 B)
- context_indep.int (21 B)
- phones.txt (2 kB)
- oov.txt (6 B)
- words.txt (3 MB)
- L.fst (51 MB)
- oov.int (2 B)
- dev/
- utt2spk (187 kB)
- spk2utt (161 kB)
- spk2gender (396 B)
- segments (363 kB)
- text (971 kB)
- eval/
- utt2spk (187 kB)
- spk2utt (161 kB)
- spk2gender (396 B)
- segments (363 kB)
- text (964 kB)
- pron_dict.txt (6 MB)
- metadata.csv (626 kB)
- lang_3gsmall/
- L_disambig.fst (51 MB)
- topo (1 kB)
- phones/
- disambig.int (24 B)
- wdisambig_words.int (7 B)
- sets.int (860 B)
- silence.int (21 B)
- roots.txt (2 kB)
- align_lexicon.int (8 MB)
- extra_questions.txt (1 kB)
- optional_silence.int (2 B)
- word_boundary.int (2 kB)
- nonsilence.txt (1 kB)
- nonsilence.csl (839 B)
- context_indep.txt (56 B)
- disambig.txt (18 B)
- context_indep.csl (21 B)
- sets.txt (1 kB)
- silence.txt (56 B)
- disambig.csl (24 B)
- silence.csl (21 B)
- wdisambig.txt (3 B)
- align_lexicon.txt (12 MB)
- roots.int (1 kB)
- wdisambig_phones.int (4 B)
- extra_questions.int (860 B)
- optional_silence.txt (4 B)
- optional_silence.csl (2 B)
- word_boundary.txt (2 kB)
- nonsilence.int (839 B)
- context_indep.int (21 B)
- phones.txt (2 kB)
- oov.txt (6 B)
- words.txt (3 MB)
- L.fst (51 MB)
- oov.int (2 B)
- kenlm_3g_cs_023pruned.arpa.gz (21 MB)
- G.fst (41 MB)
- train/
- utt2spk (6 MB)
- spk2utt (5 MB)
- spk2gender (1 kB)
- segments (12 MB)
- text (34 MB)
- name_id_gender.tsv (5 kB)
- lang_5glarge/

- Name: althingi_upptokur.zip
- Size: 2.94 GB
- Format: application/zip
- Description: althingi_upptokur.zip (file 1/4)
- MD5: 7c1837e4bddf4de55d3cbd216069711f

- Name: althingi_upptokur.z01
- Size: 4.88 GB
- Format: Unknown
- Description: althingi_upptokur.zip (file 2/4)
- MD5: 98b6c617e1b756d649e8c1fc50e8d3d3

- Name: althingi_upptokur.z02
- Size: 4.88 GB
- Format: Unknown
- Description: althingi_upptokur.zip (file 3/4)
- MD5: 44160831b65cfb6a9be210558b7713cd

- Name: althingi_upptokur.z03
- Size: 4.88 GB
- Format: Unknown
- Description: althingi_upptokur.zip (file 4/4)
- MD5: 13a1a4fb5d905e2a4ab50845257a3400

- Name: README.txt
- Size: 7.23 KB
- Format: Text file
- Description: README
- MD5: d093eb58f7e5c8efd4bc2c01aad9a0a8
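Because the recordings are split across four large parts, it is worth checking each download against the MD5 sums listed above before extracting anything. A minimal sketch in Python, assuming all files were saved under their listed names in one directory; the script and its function names (`md5_of`, `verify`) are hypothetical and not part of the corpus:

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in the repository record above.
EXPECTED_MD5 = {
    "althingi_texti.tar.gz": "79f957eec8ac54e6f4c26d38c04dfd5e",
    "althingi_upptokur.zip": "7c1837e4bddf4de55d3cbd216069711f",
    "althingi_upptokur.z01": "98b6c617e1b756d649e8c1fc50e8d3d3",
    "althingi_upptokur.z02": "44160831b65cfb6a9be210558b7713cd",
    "althingi_upptokur.z03": "13a1a4fb5d905e2a4ab50845257a3400",
    "README.txt": "d093eb58f7e5c8efd4bc2c01aad9a0a8",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB archives are never fully loaded into memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(download_dir: Path) -> dict:
    """Return {filename: 'ok' | 'MISMATCH' | 'missing'} for every expected file."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = download_dir / name
        if not path.exists():
            results[name] = "missing"
        else:
            results[name] = "ok" if md5_of(path) == expected else "MISMATCH"
    return results


if __name__ == "__main__":
    import sys

    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name, status in verify(target).items():
        print(f"{status:9} {name}")
```

The four althingi_upptokur parts form one split archive, so all of them must be present and intact. With Info-ZIP, one way to join them should be `zip -s 0 althingi_upptokur.zip --out althingi_upptokur_full.zip` followed by an ordinary `unzip` (the output name here is illustrative).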
#############################################################################
######### Althingi Parliamentary speech corpus #########
######### http://hdl.handle.net/20.500.12537/277 #########
#############################################################################

THE FILES

The corpus contains two packages, delivered in two compressed files:

A) althingi_texti.tar.gz: This file contains the training, development and evaluation sets and two language models: a pruned trigram model, used in decoding, and an unpruned 5-gram model in constant ARPA format, used for rescoring the decoding results. The three sets are located in the folders train, dev and eval. Each folder contains five files:
- segments: links each text segment to its place in the audio files
- spk2gender: lists all the speakers and their gender
- spk2utt: lists all the speakers and their utterances/segments
- text: lists the ID of each segment and its text
- utt2spk: . . .
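The five file names match the standard Kaldi data-directory layout, and the lang_* folders hold Kaldi-style files (L.fst, G.fst, G.carpa, phones/). Assuming the usual Kaldi conventions (one whitespace-separated record per line, keyed by its first field; segments lines of the form `<utterance-id> <recording-id> <start> <end>` in seconds), the sketch below loads one split, totals its duration and checks that spk2utt is the inverse of utt2spk. The README excerpt is truncated, so these formats are an assumption, and all function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# Sketch assuming standard Kaldi data-directory formats (not confirmed by the
# truncated README): one record per line, whitespace-separated, key first.


def read_table(path: Path) -> dict[str, list[str]]:
    """Map the first field of each non-empty line to the remaining fields."""
    table: dict[str, list[str]] = {}
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            fields = line.split()
            if fields:
                table[fields[0]] = fields[1:]
    return table


def spk2utt_from_utt2spk(utt2spk: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert utt2spk ('<utt> <spk>') into spk2utt ('<spk> <utt1> <utt2> ...')."""
    spk2utt: defaultdict[str, list[str]] = defaultdict(list)
    for utt, (spk,) in sorted(utt2spk.items()):
        spk2utt[spk].append(utt)  # utterances come out in sorted order
    return dict(spk2utt)


def total_hours(segments: dict[str, list[str]]) -> float:
    """Sum segment durations; assumes '<utt> <recording> <start-sec> <end-sec>'."""
    seconds = sum(float(end) - float(start) for _rec, start, end in segments.values())
    return seconds / 3600


def summarize(data_dir: Path) -> dict:
    """Load one split (e.g. train/) and run basic consistency checks."""
    utt2spk = read_table(data_dir / "utt2spk")
    spk2utt = read_table(data_dir / "spk2utt")
    segments = read_table(data_dir / "segments")
    text = read_table(data_dir / "text")
    return {
        "utterances": len(utt2spk),
        "speakers": len(spk2utt),
        "hours": total_hours(segments),
        # In a consistent data directory both checks should be True.
        "spk2utt_matches_utt2spk": {s: sorted(u) for s, u in spk2utt.items()}
        == spk2utt_from_utt2spk(utt2spk),
        "segments_match_text": segments.keys() == text.keys(),
    }


if __name__ == "__main__":
    import sys

    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for split in ("train", "dev", "eval"):
        if (root / split).is_dir():
            print(split, summarize(root / split))
```

Pointed at the folder that contains train, dev and eval, it prints one summary per split; the summed durations can then be compared with the 542 hours reported in the record.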