dc.contributor.author Mena, Carlos
dc.contributor.author Borsky, Michal
dc.contributor.author Mollberg, David Erik
dc.contributor.author Guðmundsson, Smári Freyr
dc.contributor.author Hedström, Staffan
dc.contributor.author Pálsson, Ragnar
dc.contributor.author Jónsson, Ólafur Helgi
dc.contributor.author Þorsteinsdóttir, Sunneva
dc.contributor.author Guðmundsdóttir, Jóhanna Vigdís
dc.contributor.author Magnúsdóttir, Eydís Huld
dc.contributor.author Þórhallsdóttir, Ragnheiður
dc.contributor.author Gudnason, Jon
dc.date.accessioned 2022-01-31T13:20:41Z
dc.date.available 2022-01-31T13:20:41Z
dc.date.issued 2021-09-01
dc.identifier.uri http://hdl.handle.net/20.500.12537/185
dc.description This release of data from the Samrómur collection focuses on children (4-17 years old). It contains more than 137,000 validated speech recordings (131 hours) uttered by children in Icelandic. The corpus is the result of a crowd-sourcing effort run by the Language and Voice Lab (LVL) of the Artificial Intelligence Center (Gervigreindarsetur) at Reykjavik University, in cooperation with Almannarómur, Center for Language Technology in Iceland. Recording started in October 2019 and was still ongoing as of January 2022. The corpus contains 3,175 speakers and has been split into train, dev, and test sets. The lengths of the sets are: train = 127h25m, test = 1h50m, dev = 1h50m. Each subset contains folders that correspond to speaker IDs, and the audio files inside use the following naming convention: {speaker_ID}-{utterance_ID}.flac. The average recording length is 3.4 seconds.
dc.language.iso isl
dc.publisher Reykjavík University
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/cadia-lvl/samromur
dc.subject audio corpus
dc.subject speech recognition
dc.subject automatic speech recognition
dc.subject children
dc.title Samromur Children 21.09
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType audio
has.files yes
branding Clarin IS Repository
contact.person Jon Gudnason, jg@ru.is, Reykjavík University
sponsor Ministry of Education, Science and Culture; Data recording using Eyra/Samrómur (H1); Language Technology for Icelandic 2019-2023; nationalFunds
size.info 8 GB
size.info 131 hours
size.info 137597 utterances
files.size 6829364691
files.count 4
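
The naming convention given in dc.description ({speaker_ID}-{utterance_ID}.flac, inside a folder named after the speaker ID) can be unpacked with a few lines of Python. This is a minimal sketch, not part of the corpus tooling: the function name parse_recording_path and the example IDs are hypothetical, and it assumes that the speaker ID itself contains no hyphen.

```python
from pathlib import PurePosixPath


def parse_recording_path(path):
    """Return (speaker_id, utterance_id) for a corpus audio path.

    Assumes the layout described in the record:
    <subset>/<speaker_ID>/<speaker_ID>-<utterance_ID>.flac
    and that the speaker ID contains no hyphen (an assumption).
    """
    p = PurePosixPath(path)
    if p.suffix != ".flac":
        raise ValueError(f"not a .flac file: {path}")

    # Split the file name (without extension) at the first hyphen.
    speaker_id, sep, utterance_id = p.stem.partition("-")
    if not (sep and speaker_id and utterance_id):
        raise ValueError(f"unexpected file name: {p.name}")

    # If a parent folder is present, it should match the speaker ID.
    folder = p.parent.name
    if folder and folder != speaker_id:
        raise ValueError(f"folder {folder!r} does not match speaker {speaker_id!r}")

    return speaker_id, utterance_id


# Hypothetical IDs, for illustration only.
print(parse_recording_path("train/012345/012345-0678901.flac"))
```

Grouping utterances by the returned speaker ID is one way to keep speakers from leaking between custom splits, should the provided train, dev, and test sets be re-partitioned.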


 Files in this item

 Total download size: 6.36 GB
 This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
Name                         Size       Format           Description           MD5
samromur_children_21.09.zip  512.99 MB  application/zip  main file for corpus  c89cbabaafaf68f71f6d01aed1030826
samromur_children_21.09.z01  1.95 GB    Unknown          part 1                43ad12f3eaadb25a24ed1dfa9c5090a3
samromur_children_21.09.z02  1.95 GB    Unknown          part 2                ad0def6d195d53e02de4323b729299f2
samromur_children_21.09.z03  1.95 GB    Unknown          part 3                f5ae6da99404fd5260ede7de68170a78
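
The MD5 values listed for the four files can be used to check that each download completed intact. The following is a minimal sketch using Python's standard hashlib module; the function names md5_of_file and verify_downloads are hypothetical, and it assumes all four files were saved under their original names in one directory.

```python
import hashlib
import os

# File names and MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "samromur_children_21.09.zip": "c89cbabaafaf68f71f6d01aed1030826",
    "samromur_children_21.09.z01": "43ad12f3eaadb25a24ed1dfa9c5090a3",
    "samromur_children_21.09.z02": "ad0def6d195d53e02de4323b729299f2",
    "samromur_children_21.09.z03": "f5ae6da99404fd5260ede7de68170a78",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Return {file name: True/False}; a missing file counts as False."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = os.path.join(directory, name)
        results[name] = os.path.isfile(path) and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'MISSING OR CORRUPT'}")
```

Reading in chunks keeps memory use flat even for the 1.95 GB parts. MD5 is used here only because it is the checksum the repository publishes; it detects accidental corruption, not deliberate tampering.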
