Files in this item
Download all files in item (18.04 GB)
This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
- Name
- althingi_texti.tar.gz
- Size
- 464.13 MB
- Format
- application/gzip
- Description
- Unknown
- MD5
- 79f957eec8ac54e6f4c26d38c04dfd5e
- malfong
- lang_5glarge
- L_disambig.fst (51 MB)
- topo (1 kB)
- G.carpa (736 MB)
- phones
- disambig.int (24 B)
- wdisambig_words.int (7 B)
- sets.int (860 B)
- silence.int (21 B)
- roots.txt (2 kB)
- align_lexicon.int (8 MB)
- extra_questions.txt (1 kB)
- optional_silence.int (2 B)
- word_boundary.int (2 kB)
- nonsilence.txt (1 kB)
- nonsilence.csl (839 B)
- context_indep.txt (56 B)
- disambig.txt (18 B)
- context_indep.csl (21 B)
- sets.txt (1 kB)
- silence.txt (56 B)
- disambig.csl (24 B)
- silence.csl (21 B)
- wdisambig.txt (3 B)
- align_lexicon.txt (12 MB)
- roots.int (1 kB)
- wdisambig_phones.int (4 B)
- extra_questions.int (860 B)
- optional_silence.txt (4 B)
- optional_silence.csl (2 B)
- word_boundary.txt (2 kB)
- nonsilence.int (839 B)
- context_indep.int (21 B)
- phones.txt (2 kB)
- oov.txt (6 B)
- words.txt (3 MB)
- L.fst (51 MB)
- oov.int (2 B)
- dev
- utt2spk (187 kB)
- spk2utt (161 kB)
- spk2gender (396 B)
- segments (363 kB)
- text (971 kB)
- eval
- utt2spk (187 kB)
- spk2utt (161 kB)
- spk2gender (396 B)
- segments (363 kB)
- text (964 kB)
- pron_dict.txt (6 MB)
- metadata.csv (626 kB)
- lang_3gsmall
- L_disambig.fst (51 MB)
- topo (1 kB)
- phones
- disambig.int (24 B)
- wdisambig_words.int (7 B)
- sets.int (860 B)
- silence.int (21 B)
- roots.txt (2 kB)
- align_lexicon.int (8 MB)
- extra_questions.txt (1 kB)
- optional_silence.int (2 B)
- word_boundary.int (2 kB)
- nonsilence.txt (1 kB)
- nonsilence.csl (839 B)
- context_indep.txt (56 B)
- disambig.txt (18 B)
- context_indep.csl (21 B)
- sets.txt (1 kB)
- silence.txt (56 B)
- disambig.csl (24 B)
- silence.csl (21 B)
- wdisambig.txt (3 B)
- align_lexicon.txt (12 MB)
- roots.int (1 kB)
- wdisambig_phones.int (4 B)
- extra_questions.int (860 B)
- optional_silence.txt (4 B)
- optional_silence.csl (2 B)
- word_boundary.txt (2 kB)
- nonsilence.int (839 B)
- context_indep.int (21 B)
- phones.txt (2 kB)
- oov.txt (6 B)
- words.txt (3 MB)
- L.fst (51 MB)
- oov.int (2 B)
- kenlm_3g_cs_023pruned.arpa.gz (21 MB)
- G.fst (41 MB)
- train
- utt2spk (6 MB)
- spk2utt (5 MB)
- spk2gender (1 kB)
- segments (12 MB)
- text (34 MB)
- name_id_gender.tsv (5 kB)
- lang_5glarge
- Name
- althingi_upptokur.zip
- Size
- 2.94 GB
- Format
- application/zip
- Description
- althingi_upptokur.zip (file 1/4)
- MD5
- 7c1837e4bddf4de55d3cbd216069711f
- Name
- althingi_upptokur.z01
- Size
- 4.88 GB
- Format
- Unknown
- Description
- althingi_upptokur.zip (file 2/4)
- MD5
- 98b6c617e1b756d649e8c1fc50e8d3d3
- Name
- althingi_upptokur.z02
- Size
- 4.88 GB
- Format
- Unknown
- Description
- althingi_upptokur.zip (file 3/4)
- MD5
- 44160831b65cfb6a9be210558b7713cd
- Name
- althingi_upptokur.z03
- Size
- 4.88 GB
- Format
- Unknown
- Description
- althingi_upptokur.zip (file 4/4)
- MD5
- 13a1a4fb5d905e2a4ab50845257a3400
- Name
- README.txt
- Size
- 7.23 KB
- Format
- Text file
- Description
- README
- MD5
- d093eb58f7e5c8efd4bc2c01aad9a0a8
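Since the listing publishes an MD5 checksum for every file, a download can be checked before it is unpacked. The following is a minimal sketch, not part of the corpus distribution: the helper names `md5_of_file` and `verify_downloads` are hypothetical, and the code assumes all six files have been saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "althingi_texti.tar.gz": "79f957eec8ac54e6f4c26d38c04dfd5e",
    "althingi_upptokur.zip": "7c1837e4bddf4de55d3cbd216069711f",
    "althingi_upptokur.z01": "98b6c617e1b756d649e8c1fc50e8d3d3",
    "althingi_upptokur.z02": "44160831b65cfb6a9be210558b7713cd",
    "althingi_upptokur.z03": "13a1a4fb5d905e2a4ab50845257a3400",
    "README.txt": "d093eb58f7e5c8efd4bc2c01aad9a0a8",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks
    so that multi-gigabyte archives are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True if it exists in `directory`
    and its MD5 matches the listing, otherwise False."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'MISSING or MISMATCH'}")
```

A file reported as `MISSING or MISMATCH` is either absent from the directory or differs from the published copy and should be downloaded again before the archives are extracted.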
#############################################################################
#########           Althingi Parliamentary Speech corpus            #########
#########          http://hdl.handle.net/20.500.12537/277           #########
#############################################################################

THE FILES

The corpus contains two packages, delivered in two compressed files:

A) althingi_texti.tar.gz:
The file althingi_texti.tar.gz contains the training, evaluation and development sets and two language models (a pruned trigram model, used in decoding, and an unpruned 5-gram model in constant ARPA format, used for rescoring decoding results). The three sets are located in the folders train, dev and eval. Each folder contains five files:
- segments: links each text segment to its place in the audio files
- spk2gender: lists all the speakers and their gender
- spk2utt: lists all the speakers and their utterances/segments
- text: lists the ID of each segment and its text
- utt2spk: . . .