Show simple item record

 
dc.contributor.author Helgadóttir, Inga Rún
dc.contributor.author Kjaran, Róbert
dc.contributor.author Nikulásdóttir, Anna Björk
dc.contributor.author Guðnason, Jón
dc.date.accessioned 2022-09-29T08:51:45Z
dc.date.available 2022-09-29T08:51:45Z
dc.date.issued 2017-07-01
dc.identifier.uri http://hdl.handle.net/20.500.12537/277
dc.description [ENGLISH] Althingi's Parliamentary Speeches is an aligned and segmented corpus of 6493 Althingi speech recordings from 196 speakers. The corpus is split into training, development and evaluation sets.
dc.description [ICELANDIC, translated] The Althingi data are aligned speech and text data, derived from speeches in Althingi. The data consist of 6493 Althingi speeches from 196 speakers. They are aligned and split into suitably sized units for training. The dataset is split into a training set and two test sets, “dev” and “eval”.
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.relation.isreferencedby https://clarin.is/media/uploads/building-asr-corpus_final.pdf
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://clarin.is/gogn/althingisgognin/
dc.subject parliamentary speeches
dc.subject speeches
dc.subject recordings
dc.title Althingi's Parliamentary Speeches
dc.title Alþingisgögnin (til talgreiningar)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType audio
hasMetadata false
has.files yes
branding Clarin IS Repository
contact.person Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies
size.info 199614 elements
size.info 542 hours
size.info 4583751 words
files.size 19374308379
files.count 6


 Files in this item

 Download all files in item (18.04 GB)
This item is publicly available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name: althingi_texti.tar.gz
Size: 464.13 MB
Format: application/gzip
Description: Unknown
MD5: 79f957eec8ac54e6f4c26d38c04dfd5e
 File Preview  
  • malfong
    • lang_5glarge
      • L_disambig.fst (51 MB)
      • topo (1 kB)
      • G.carpa (736 MB)
      • phones
        • disambig.int (24 B)
        • wdisambig_words.int (7 B)
        • sets.int (860 B)
        • silence.int (21 B)
        • roots.txt (2 kB)
        • align_lexicon.int (8 MB)
        • extra_questions.txt (1 kB)
        • optional_silence.int (2 B)
        • word_boundary.int (2 kB)
        • nonsilence.txt (1 kB)
        • nonsilence.csl (839 B)
        • context_indep.txt (56 B)
        • disambig.txt (18 B)
        • context_indep.csl (21 B)
        • sets.txt (1 kB)
        • silence.txt (56 B)
        • disambig.csl (24 B)
        • silence.csl (21 B)
        • wdisambig.txt (3 B)
        • align_lexicon.txt (12 MB)
        • roots.int (1 kB)
        • wdisambig_phones.int (4 B)
        • extra_questions.int (860 B)
        • optional_silence.txt (4 B)
        • optional_silence.csl (2 B)
        • word_boundary.txt (2 kB)
        • nonsilence.int (839 B)
        • context_indep.int (21 B)
      • phones.txt (2 kB)
      • oov.txt (6 B)
      • words.txt (3 MB)
      • L.fst (51 MB)
      • oov.int (2 B)
    • dev
      • utt2spk (187 kB)
      • spk2utt (161 kB)
      • spk2gender (396 B)
      • segments (363 kB)
      • text (971 kB)
    • eval
      • utt2spk (187 kB)
      • spk2utt (161 kB)
      • spk2gender (396 B)
      • segments (363 kB)
      • text (964 kB)
    • pron_dict.txt (6 MB)
    • metadata.csv (626 kB)
    • lang_3gsmall
      • L_disambig.fst (51 MB)
      • topo (1 kB)
      • phones
        • disambig.int (24 B)
        • wdisambig_words.int (7 B)
        • sets.int (860 B)
        • silence.int (21 B)
        • roots.txt (2 kB)
        • align_lexicon.int (8 MB)
        • extra_questions.txt (1 kB)
        • optional_silence.int (2 B)
        • word_boundary.int (2 kB)
        • nonsilence.txt (1 kB)
        • nonsilence.csl (839 B)
        • context_indep.txt (56 B)
        • disambig.txt (18 B)
        • context_indep.csl (21 B)
        • sets.txt (1 kB)
        • silence.txt (56 B)
        • disambig.csl (24 B)
        • silence.csl (21 B)
        • wdisambig.txt (3 B)
        • align_lexicon.txt (12 MB)
        • roots.int (1 kB)
        • wdisambig_phones.int (4 B)
        • extra_questions.int (860 B)
        • optional_silence.txt (4 B)
        • optional_silence.csl (2 B)
        • word_boundary.txt (2 kB)
        • nonsilence.int (839 B)
        • context_indep.int (21 B)
      • phones.txt (2 kB)
      • oov.txt (6 B)
      • words.txt (3 MB)
      • L.fst (51 MB)
      • oov.int (2 B)
      • kenlm_3g_cs_023pruned.arpa.gz (21 MB)
      • G.fst (41 MB)
    • train
      • utt2spk (6 MB)
      • spk2utt (5 MB)
      • spk2gender (1 kB)
      • segments (12 MB)
      • text (34 MB)
    • name_id_gender.tsv (5 kB)
Name: althingi_upptokur.zip
Size: 2.94 GB
Format: application/zip
Description: althingi_upptokur.zip (file 1/4)
MD5: 7c1837e4bddf4de55d3cbd216069711f
Name: althingi_upptokur.z01
Size: 4.88 GB
Format: Unknown
Description: althingi_upptokur.zip (file 2/4)
MD5: 98b6c617e1b756d649e8c1fc50e8d3d3
Name: althingi_upptokur.z02
Size: 4.88 GB
Format: Unknown
Description: althingi_upptokur.zip (file 3/4)
MD5: 44160831b65cfb6a9be210558b7713cd
Name: althingi_upptokur.z03
Size: 4.88 GB
Format: Unknown
Description: althingi_upptokur.zip (file 4/4)
MD5: 13a1a4fb5d905e2a4ab50845257a3400
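The MD5 values listed for the files in this item can be used to check that each download arrived intact, which matters most for the multi-gigabyte parts of the split audio archive. The following is a minimal sketch, not part of the corpus distribution: the checksums are copied from this page, while the helper names `md5_of` and `verify_downloads` and the download directory are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing on this page.
EXPECTED_MD5 = {
    "althingi_texti.tar.gz": "79f957eec8ac54e6f4c26d38c04dfd5e",
    "althingi_upptokur.zip": "7c1837e4bddf4de55d3cbd216069711f",
    "althingi_upptokur.z01": "98b6c617e1b756d649e8c1fc50e8d3d3",
    "althingi_upptokur.z02": "44160831b65cfb6a9be210558b7713cd",
    "althingi_upptokur.z03": "13a1a4fb5d905e2a4ab50845257a3400",
    "README.txt": "d093eb58f7e5c8efd4bc2c01aad9a0a8",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == want:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Calling `verify_downloads("downloads")` on a hypothetical `downloads` folder returns one status per file, so a missing or corrupted archive part can be fetched again before extraction is attempted.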
Name: README.txt
Size: 7.23 KB
Format: Text file
Description: README
MD5: d093eb58f7e5c8efd4bc2c01aad9a0a8
 File Preview  
#############################################################################
#########            Althingi Parliamentary speech corpus           #########
#########            http://hdl.handle.net/20.500.12537/277         #########
#############################################################################

THE FILES

The corpus contains two packages, delivered in two compressed files:

A) althingi_texti.tar.gz:

The file althingi_texti.tar.gz contains the training, evaluation and 
development sets and two language models (a pruned trigram model, used in 
decoding, and an unpruned constant-ARPA 5-gram model, used for rescoring 
decoding results).

The three sets are located in the folders train, dev and eval. Each folder contains five files:

- segments: links each text segment to its place in the audio files 
- spk2gender: lists all the speakers and their gender 
- spk2utt: lists all the speakers and their utterances/segments
- text: lists the ID of each segment and its text
- utt2spk: . . .
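
The files described above can be read line by line. The sketch below is an illustration only and assumes the usual Kaldi data-directory conventions, which the README does not spell out: `text` lines hold a segment ID followed by the transcript, `segments` lines hold a segment ID, a recording ID, and start and end times in seconds, and `spk2utt` lines hold a speaker ID followed by that speaker's segment IDs. The function names and the sample IDs are hypothetical.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    utt_id: str   # segment (utterance) ID
    rec_id: str   # ID of the recording the segment is cut from
    start: float  # start time in seconds (assumed unit)
    end: float    # end time in seconds (assumed unit)


def parse_text_line(line):
    """Split one line of `text` into (segment ID, transcript)."""
    utt_id, _, transcript = line.strip().partition(" ")
    return utt_id, transcript.strip()


def parse_segments_line(line):
    """Parse one line of `segments` (assumed: ID, recording ID, start, end)."""
    utt_id, rec_id, start, end = line.split()
    return Segment(utt_id, rec_id, float(start), float(end))


def parse_spk2utt_line(line):
    """Split one line of `spk2utt` into (speaker ID, list of segment IDs)."""
    speaker, *utterances = line.split()
    return speaker, utterances


# Hypothetical sample lines; real IDs in the corpus will look different.
segment = parse_segments_line("spk1-rec1_00001 rec1 1.50 11.25")
duration = segment.end - segment.start  # length of the segment in seconds
```

Joining the parsed `text` and `segments` entries on the segment ID gives, for each transcript, the recording and time span it was aligned to.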
                                            
