Files in this item

Total size of all files in item: 18.04 GB
This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
Name: althingi_texti.tar.gz
Size: 464.13 MB
Format: application/gzip
Description: Unknown
MD5: 79f957eec8ac54e6f4c26d38c04dfd5e
File preview (althingi_texti.tar.gz):
  • malfong
    • lang_5glarge
      • L_disambig.fst (51 MB)
      • topo (1 kB)
      • G.carpa (736 MB)
      • phones
        • disambig.int (24 B)
        • wdisambig_words.int (7 B)
        • sets.int (860 B)
        • silence.int (21 B)
        • roots.txt (2 kB)
        • align_lexicon.int (8 MB)
        • extra_questions.txt (1 kB)
        • optional_silence.int (2 B)
        • word_boundary.int (2 kB)
        • nonsilence.txt (1 kB)
        • nonsilence.csl (839 B)
        • context_indep.txt (56 B)
        • disambig.txt (18 B)
        • context_indep.csl (21 B)
        • sets.txt (1 kB)
        • silence.txt (56 B)
        • disambig.csl (24 B)
        • silence.csl (21 B)
        • wdisambig.txt (3 B)
        • align_lexicon.txt (12 MB)
        • roots.int (1 kB)
        • wdisambig_phones.int (4 B)
        • extra_questions.int (860 B)
        • optional_silence.txt (4 B)
        • optional_silence.csl (2 B)
        • word_boundary.txt (2 kB)
        • nonsilence.int (839 B)
        • context_indep.int (21 B)
      • phones.txt (2 kB)
      • oov.txt (6 B)
      • words.txt (3 MB)
      • L.fst (51 MB)
      • oov.int (2 B)
    • dev
      • utt2spk (187 kB)
      • spk2utt (161 kB)
      • spk2gender (396 B)
      • segments (363 kB)
      • text (971 kB)
    • eval
      • utt2spk (187 kB)
      • spk2utt (161 kB)
      • spk2gender (396 B)
      • segments (363 kB)
      • text (964 kB)
    • pron_dict.txt (6 MB)
    • metadata.csv (626 kB)
    • lang_3gsmall
      • L_disambig.fst (51 MB)
      • topo (1 kB)
      • phones
        • disambig.int (24 B)
        • wdisambig_words.int (7 B)
        • sets.int (860 B)
        • silence.int (21 B)
        • roots.txt (2 kB)
        • align_lexicon.int (8 MB)
        • extra_questions.txt (1 kB)
        • optional_silence.int (2 B)
        • word_boundary.int (2 kB)
        • nonsilence.txt (1 kB)
        • nonsilence.csl (839 B)
        • context_indep.txt (56 B)
        • disambig.txt (18 B)
        • context_indep.csl (21 B)
        • sets.txt (1 kB)
        • silence.txt (56 B)
        • disambig.csl (24 B)
        • silence.csl (21 B)
        • wdisambig.txt (3 B)
        • align_lexicon.txt (12 MB)
        • roots.int (1 kB)
        • wdisambig_phones.int (4 B)
        • extra_questions.int (860 B)
        • optional_silence.txt (4 B)
        • optional_silence.csl (2 B)
        • word_boundary.txt (2 kB)
        • nonsilence.int (839 B)
        • context_indep.int (21 B)
      • phones.txt (2 kB)
      • oov.txt (6 B)
      • words.txt (3 MB)
      • L.fst (51 MB)
      • oov.int (2 B)
      • kenlm_3g_cs_023pruned.arpa.gz (21 MB)
      • G.fst (41 MB)
    • train
      • utt2spk (6 MB)
      • spk2utt (5 MB)
      • spk2gender (1 kB)
      • segments (12 MB)
      • text (34 MB)
    • name_id_gender.tsv (5 kB)
Name: althingi_upptokur.zip
Size: 2.94 GB
Format: application/zip
Description: althingi_upptokur.zip (file 1/4)
MD5: 7c1837e4bddf4de55d3cbd216069711f
Name: althingi_upptokur.z01
Size: 4.88 GB
Format: Unknown
Description: althingi_upptokur.zip (file 2/4)
MD5: 98b6c617e1b756d649e8c1fc50e8d3d3
Name: althingi_upptokur.z02
Size: 4.88 GB
Format: Unknown
Description: althingi_upptokur.zip (file 3/4)
MD5: 44160831b65cfb6a9be210558b7713cd
Name: althingi_upptokur.z03
Size: 4.88 GB
Format: Unknown
Description: althingi_upptokur.zip (file 4/4)
MD5: 13a1a4fb5d905e2a4ab50845257a3400
Name: README.txt
Size: 7.23 KB
Format: Text file
Description: README
MD5: d093eb58f7e5c8efd4bc2c01aad9a0a8
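
The MD5 values listed for each file can be used to check that a download completed without corruption. The following is a minimal sketch, assuming the files were saved under their listed names in a single directory; the names `EXPECTED_MD5`, `md5_of_file` and `check_downloads` are hypothetical helpers introduced here, not part of the corpus.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "althingi_texti.tar.gz": "79f957eec8ac54e6f4c26d38c04dfd5e",
    "althingi_upptokur.zip": "7c1837e4bddf4de55d3cbd216069711f",
    "althingi_upptokur.z01": "98b6c617e1b756d649e8c1fc50e8d3d3",
    "althingi_upptokur.z02": "44160831b65cfb6a9be210558b7713cd",
    "althingi_upptokur.z03": "13a1a4fb5d905e2a4ab50845257a3400",
    "README.txt": "d093eb58f7e5c8efd4bc2c01aad9a0a8",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(download_dir="."):
    """Map each listed file name to 'ok', 'MISMATCH' or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(download_dir) / name  # assumed location of the download
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "MISMATCH"
    return results


if __name__ == "__main__":
    for name, status in check_downloads().items():
        print(f"{name}: {status}")
```

Reading in chunks matters here because the audio archive parts are several gigabytes each.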
File preview (README.txt):
#############################################################################
#########            Althingi Parliamentary Speech Corpus           #########
#########            http://hdl.handle.net/20.500.12537/277         #########
#############################################################################

THE FILES

The corpus contains two packages, delivered in two compressed files:

A) althingi_texti.tar.gz:

The file althingi_texti.tar.gz contains the training, evaluation and
development sets and two language models (a pruned trigram model, used in
decoding, and an unpruned constant-ARPA 5-gram model, used for rescoring
the decoding results).

The three sets are located in the folders train, dev and eval. Each folder contains five files:

- segments: links each text segment to its place in the audio files 
- spk2gender: lists all the speakers and their gender 
- spk2utt: lists all the speakers and their utterances/segments
- text: lists the ID of each segment and its text
- utt2spk: . . .
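
The per-set files listed above can be loaded with a few lines of Python. The following is a minimal sketch, assuming the files follow the usual Kaldi data-directory layout that their names suggest: whitespace-separated lines of the form `<utt-id> <transcript>` in text, `<utt-id> <speaker-id>` in utt2spk, and `<utt-id> <recording-id> <start> <end>` (times in seconds) in segments. That layout is an assumption and has not been checked against the corpus; the names `read_key_value`, `read_segments` and `load_set` are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict, NamedTuple


class Segment(NamedTuple):
    recording_id: str
    start: float  # seconds, assuming the Kaldi convention
    end: float


def read_key_value(path) -> Dict[str, str]:
    """Read lines of the form '<key> <rest of line>' into a dict."""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if not parts:
                continue  # skip blank lines
            table[parts[0]] = parts[1] if len(parts) > 1 else ""
    return table


def read_segments(path) -> Dict[str, Segment]:
    """Read '<utt-id> <recording-id> <start> <end>' lines (assumed layout)."""
    segments = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            utt_id, rec_id, start, end = parts[:4]
            segments[utt_id] = Segment(rec_id, float(start), float(end))
    return segments


def load_set(set_dir):
    """Load text, utt2spk and segments from one of train, dev or eval."""
    set_dir = Path(set_dir)
    return {
        "text": read_key_value(set_dir / "text"),
        "utt2spk": read_key_value(set_dir / "utt2spk"),
        "segments": read_segments(set_dir / "segments"),
    }
```

For example, `load_set("malfong/dev")["text"]` would map each segment ID in the development set to its transcript, given the directory names shown in the archive preview.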