Althingi's Parliamentary Speeches

Helgadóttir, Inga Rún; Kjaran, Róbert; Nikulásdóttir, Anna Björk; Guðnason, Jón

dc.contributor.author	Helgadóttir, Inga Rún
dc.contributor.author	Kjaran, Róbert
dc.contributor.author	Nikulásdóttir, Anna Björk
dc.contributor.author	Guðnason, Jón
dc.date.accessioned	2022-09-29T08:51:45Z
dc.date.available	2022-09-29T08:51:45Z
dc.date.issued	2017-07-01
dc.identifier.uri	http://hdl.handle.net/20.500.12537/277
dc.description	[ENGLISH] Althingi's Parliamentary Speeches is an aligned and segmented corpus of 6493 speech recordings from Althingi, the Icelandic parliament, with 196 speakers. The corpus is split into training, development and evaluation sets.
dc.description	[ICELANDIC, translated] The Althingi data are aligned speech and text data derived from speeches given in Althingi. The data comprise 6493 parliamentary speeches from 196 speakers. They are aligned and divided into suitably sized units for training. The dataset is split into a training set and two test sets, "dev" and "eval".
dc.language.iso	isl
dc.publisher	The Árni Magnússon Institute for Icelandic Studies
dc.relation.isreferencedby	https://clarin.is/media/uploads/building-asr-corpus_final.pdf
dc.rights	Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.rights.label	PUB
dc.source.uri	https://clarin.is/gogn/althingisgognin/
dc.subject	parliamentary speeches
dc.subject	speeches
dc.subject	recordings
dc.title	Althingi's Parliamentary Speeches
dc.title	Alþingisgögnin (til talgreiningar)
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	audio
hasMetadata	false
has.files	yes
branding	Clarin IS Repository
contact.person	Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies
size.info	199614 elements
size.info	542 hours
size.info	4583751 words
files.size	19374308379
files.count	6

Files in this item

Name: althingi_texti.tar.gz
Size: 464.13 MB
Format: application/gzip
Description: Unknown
MD5: 79f957eec8ac54e6f4c26d38c04dfd5e

malfong
- lang_5glarge
  - L_disambig.fst (51 MB)
  - topo (1 kB)
  - G.carpa (736 MB)
  - phones
    - disambig.int (24 B)
    - wdisambig_words.int (7 B)
    - sets.int (860 B)
    - silence.int (21 B)
    - roots.txt (2 kB)
    - align_lexicon.int (8 MB)
    - extra_questions.txt (1 kB)
    - optional_silence.int (2 B)
    - word_boundary.int (2 kB)
    - nonsilence.txt (1 kB)
    - nonsilence.csl (839 B)
    - context_indep.txt (56 B)
    - disambig.txt (18 B)
    - context_indep.csl (21 B)
    - sets.txt (1 kB)
    - silence.txt (56 B)
    - disambig.csl (24 B)
    - silence.csl (21 B)
    - wdisambig.txt (3 B)
    - align_lexicon.txt (12 MB)
    - roots.int (1 kB)
    - wdisambig_phones.int (4 B)
    - extra_questions.int (860 B)
    - optional_silence.txt (4 B)
    - optional_silence.csl (2 B)
    - word_boundary.txt (2 kB)
    - nonsilence.int (839 B)
    - context_indep.int (21 B)
  - phones.txt (2 kB)
  - oov.txt (6 B)
  - words.txt (3 MB)
  - L.fst (51 MB)
  - oov.int (2 B)
- dev
  - utt2spk (187 kB)
  - spk2utt (161 kB)
  - spk2gender (396 B)
  - segments (363 kB)
  - text (971 kB)
- eval
  - utt2spk (187 kB)
  - spk2utt (161 kB)
  - spk2gender (396 B)
  - segments (363 kB)
  - text (964 kB)
- pron_dict.txt (6 MB)
- metadata.csv (626 kB)
- lang_3gsmall
  - L_disambig.fst (51 MB)
  - topo (1 kB)
  - phones
    - disambig.int (24 B)
    - wdisambig_words.int (7 B)
    - sets.int (860 B)
    - silence.int (21 B)
    - roots.txt (2 kB)
    - align_lexicon.int (8 MB)
    - extra_questions.txt (1 kB)
    - optional_silence.int (2 B)
    - word_boundary.int (2 kB)
    - nonsilence.txt (1 kB)
    - nonsilence.csl (839 B)
    - context_indep.txt (56 B)
    - disambig.txt (18 B)
    - context_indep.csl (21 B)
    - sets.txt (1 kB)
    - silence.txt (56 B)
    - disambig.csl (24 B)
    - silence.csl (21 B)
    - wdisambig.txt (3 B)
    - align_lexicon.txt (12 MB)
    - roots.int (1 kB)
    - wdisambig_phones.int (4 B)
    - extra_questions.int (860 B)
    - optional_silence.txt (4 B)
    - optional_silence.csl (2 B)
    - word_boundary.txt (2 kB)
    - nonsilence.int (839 B)
    - context_indep.int (21 B)
  - phones.txt (2 kB)
  - oov.txt (6 B)
  - words.txt (3 MB)
  - L.fst (51 MB)
  - oov.int (2 B)
  - kenlm_3g_cs_023pruned.arpa.gz (21 MB)
  - G.fst (41 MB)
- train
  - utt2spk (6 MB)
  - spk2utt (5 MB)
  - spk2gender (1 kB)
  - segments (12 MB)
  - text (34 MB)
- name_id_gender.tsv (5 kB)

Name: althingi_upptokur.zip
Size: 2.94 GB
Format: application/zip
Description: althingi_upptokur.zip (file 1/4)
MD5: 7c1837e4bddf4de55d3cbd216069711f

Name: althingi_upptokur.z01
Size: 4.88 GB
Format: Unknown
Description: althingi_upptokur.zip (file 2/4)
MD5: 98b6c617e1b756d649e8c1fc50e8d3d3

Name: althingi_upptokur.z02
Size: 4.88 GB
Format: Unknown
Description: althingi_upptokur.zip (file 3/4)
MD5: 44160831b65cfb6a9be210558b7713cd

Name: althingi_upptokur.z03
Size: 4.88 GB
Format: Unknown
Description: althingi_upptokur.zip (file 4/4)
MD5: 13a1a4fb5d905e2a4ab50845257a3400
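
The MD5 values listed for the archive files above can be used to check that each download arrived intact. The following is a minimal sketch, assuming the files were saved under their listed names in a single directory; the function names `md5_of_file` and `verify_downloads` are illustrative and not part of the corpus distribution.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "althingi_texti.tar.gz": "79f957eec8ac54e6f4c26d38c04dfd5e",
    "althingi_upptokur.zip": "7c1837e4bddf4de55d3cbd216069711f",
    "althingi_upptokur.z01": "98b6c617e1b756d649e8c1fc50e8d3d3",
    "althingi_upptokur.z02": "44160831b65cfb6a9be210558b7713cd",
    "althingi_upptokur.z03": "13a1a4fb5d905e2a4ab50845257a3400",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, expected_md5 in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected_md5:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumes the downloads are in the current working directory.
    for name, status in verify_downloads(".").items():
        print(f"{status:9s} {name}")
```

Reading in chunks matters here because the audio parts are several gigabytes each; loading a whole part into memory just to hash it would be wasteful.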

Name: README.txt
Size: 7.23 KB
Format: Text file
Description: README
MD5: d093eb58f7e5c8efd4bc2c01aad9a0a8

#############################################################################
#########            Althingi Parliamentary speech corpus           #########
#########            http://hdl.handle.net/20.500.12537/277         #########
#############################################################################

THE FILES

The corpus contains two packages, delivered in two compressed files:

A) althingi_texti.tar.gz:

The file althingi_texti.tar.gz contains the training, evaluation and
development sets and two language models (a pruned trigram model, used in
decoding, and an unpruned constant ARPA 5-gram model, used for rescoring
decoding results).

The three sets are located in the folders train, dev and eval. Each folder contains five files:

- segments: links each text segment to its place in the audio files 
- spk2gender: lists all the speakers and their gender 
- spk2utt: lists all the speakers and their utterances/segments
- text: lists the ID of each segment and its text
- utt2spk: . . .
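
The per-set files described above can be read with a few lines of Python. This is a minimal sketch under the assumption that the files follow the usual Kaldi data-directory layout, which the README excerpt does not spell out: `segments` lines of the form `<segment-id> <recording-id> <start-seconds> <end-seconds>`, `text` lines of the form `<segment-id> <transcript>`, and `utt2spk` lines of the form `<segment-id> <speaker-id>`. All function names are illustrative.

```python
from collections import defaultdict


def parse_segments(lines):
    """Map segment ID -> (recording ID, start, end), times in seconds.

    Assumes four whitespace-separated fields per line (Kaldi convention).
    """
    segments = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        seg_id, rec_id, start, end = parts
        segments[seg_id] = (rec_id, float(start), float(end))
    return segments


def parse_text(lines):
    """Map segment ID -> transcript (everything after the first field)."""
    transcripts = {}
    for line in lines:
        parts = line.split(maxsplit=1)
        if not parts:
            continue
        transcripts[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return transcripts


def parse_utt2spk(lines):
    """Map segment ID -> speaker ID (assumes two fields per line)."""
    utt2spk = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            utt2spk[parts[0]] = parts[1]
    return utt2spk


def invert_utt2spk(utt2spk):
    """Build the speaker -> segment list mapping that spk2utt stores."""
    spk2utt = defaultdict(list)
    for utt_id, spk_id in utt2spk.items():
        spk2utt[spk_id].append(utt_id)
    return dict(spk2utt)


def total_hours(segments):
    """Sum the segment durations and convert seconds to hours."""
    return sum(end - start for _, start, end in segments.values()) / 3600.0
```

A typical call, assuming the archive was unpacked in the current directory, would be `with open("malfong/dev/segments", encoding="utf-8") as f: segments = parse_segments(f)`, after which `total_hours(segments)` gives the amount of audio in the development set.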
